ESTO Collaborative Astronomical Image Mosaics (using Montage) + TeraGrid Highlights!

Daniel S. Katz (d.katz@ieee.org)
Director for Cyberinfrastructure Development, Center for Computation & Technology (CCT), Louisiana State University (LSU)
University of Chicago (as of April 27th)
TeraGrid GIG Director of Science

Montage
• An astronomical image mosaic service for the National Virtual Observatory
• Project web site: http://montage.ipac.caltech.edu/
• Core team at IPAC and JPL
• Grid architecture developed in collaboration with ISI
  • Attila Bergou - JPL
  • Nathaniel Anagnostou - IPAC
  • Bruce Berriman - IPAC
  • Ewa Deelman - ISI
  • John Good - IPAC
  • Joseph C. Jacob - JPL
  • Daniel S. Katz - JPL/LSU/U. Chicago
  • Carl Kesselman - ISI
  • Anastasia Laity - IPAC
  • Thomas Prince - Caltech
  • Gurmeet Singh - ISI
  • Mei-Hui Su - ISI
  • Roy Williams - CACR
• Also heavily used in computer science as an exemplar workflow application

Non-Montage contributors
• Roy Williams (CACR)
  • Developer of Atlasmaker
  • Co-lead of NVO (-> USVO)
• Alex Szalay (JHU)
  • Co-lead of NVO (-> USVO)
• Joe Jacob (JPL)
  • Developer of yourSky (predecessor to Montage)

Variables/images in astronomy
• Right Ascension
• Declination
• Energy/flux
• Date/time
• Wavelength
• Instrument
• Hold all but two constant and plot energy/flux to make a B&W image; use color to represent another variable
• Spectra: energy vs. frequency with all others constant

Data and IVOA Standards
• How is data stored?
  • Images (normally in FITS, could be in VOTable)
  • Tables (VOTable, uses XML)
• What data exists?
  • From a survey (e.g., 2MASS, DPOSS, SDSS):
    • Catalogs (lists of sources, stored in VOTable)
    • Spectra (stored in VOTable)
    • Images
• Where is it?
  • Produced by the project, maybe with replicas elsewhere
• How is it cataloged/accessed?
  • Cone search – given a location and radius, return a table containing sources that overlap that circle
  • Simple Image Access – given a location and size, return a table containing images that {overlap, …} that region

WCS (World Coordinate System)
• Various projections to transform (celestial) spherical space to 2-dimensional space
  • Plate carrée (Cartesian)
  • Mercator
  • Hammer-Aitoff
(Projection images from http://howdyyall.com/Texas/Members/Bob/GPS/Projectn.htm)
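The projections listed above map points on the celestial sphere (RA, Dec) onto a plane. As a rough illustration only (this is not Montage code, and the function names are made up for the example), here are two such mappings in Python: the Plate carrée (CAR) mapping used for the sample mosaics later in this deck, and the gnomonic/tangent-plane (TAN) projection that appears in the example FITS header shown with the mProject slide:

```python
# Illustrative sketch only -- not taken from the Montage source. Maps
# celestial (RA, Dec) coordinates, in degrees, onto a 2-D plane.
import math

def plate_carree(ra_deg, dec_deg):
    """Plate carree (Cartesian/CAR): longitude and latitude are used
    directly as planar x, y (still in degrees)."""
    return ra_deg, dec_deg

def gnomonic(ra_deg, dec_deg, ra0_deg, dec0_deg):
    """Gnomonic (tangent-plane / TAN) projection about a tangent point
    (ra0, dec0); returns the standard coordinates (xi, eta) in radians."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(ra0_deg), math.radians(dec0_deg)
    # Cosine of the angular distance from the tangent point.
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(ra - ra0))
    xi = math.cos(dec) * math.sin(ra - ra0) / cos_c
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(ra - ra0)) / cos_c
    return xi, eta

if __name__ == "__main__":
    # Project a point 0.1 degrees from the tangent point used in the
    # example FITS header later in this deck (CRVAL1/CRVAL2).
    print(gnomonic(265.91334 + 0.1, -29.35778, 265.91334, -29.35778))
```

Note that Montage itself handles reprojection more carefully than this: mProject (described later) works with pixel areas on the celestial sphere, and mProjectPP transforms plane-to-plane.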
What is Montage?
• Delivers science-grade astronomical image mosaics
  • An image mosaic is a combination of individual pixel data in many images so that they appear to be a single image measured with a telescope or spacecraft
  • Preserves astrometry (position) and photometry (flux) of the input images
  • Backgrounds rectified to a common level (sky emission, instrument signatures, ...)
• Scientific use cases
  • Structures in the sky are usually larger than individual images
  • High signal-to-noise images for studies of faint sources
  • Multiwavelength image federation
    • Images at different wavelengths have differing parameters (coordinates, projections, spatial samplings, ...)
    • Place multiwavelength images on a common set of image parameters to support faint-source extraction

What is Montage? (cont.)
• Delivers custom mosaics
  • User-specified projection, coordinates, spatial sampling, mosaic size, image rotation
• Highly scalable - use on desktops for small mosaics or on a grid or cluster for large mosaics
• Extensible "toolkit" design
  • Stand-alone (loosely coupled) engines for image reprojection, background rectification, and co-addition provide flexibility to users, e.g.:
    • Use only as a reprojection and co-registration engine
    • Apply custom algorithms for, e.g., co-addition or background rectification with no impact on the other engines
• Implemented in ANSI C for portability
• Low testing cost
• Public service
  • Order mosaics through a web portal
  • Small cluster at IPAC for small jobs
  • Uses public resources for larger jobs (specifics later)

Sample Mosaics
• 100 µm sky; aggregation of COBE and IRAS maps (Schlegel, Finkbeiner and Davis, 1998)
  • 360 x 180 degrees; CAR projection
• 2MASS 3-color mosaic of the galactic plane
  • 44 x 8 degrees; 36.5 GB per band; CAR projection
  • 158,400 x 28,800 pixels; covers 0.8% of the sky
  • 4 hours wall clock time on a cluster of 4 x 1.4-GHz Linux boxes

Montage Users
Montage adopted by Spitzer Legacy Teams
• GLIMPSE uses Montage for:
  • Large-scale infrared mosaics of the Galactic Plane
  • Quality assurance of GLIMPSE Spitzer data by co-registration of 2MASS J, H, K and MSX 8 µm images
• SWIRE uses Montage for:
  • Building sky simulations for use in mission planning
  • Fast background rectification and co-addition of in-flight images
(Figure, right: Spitzer IRAC 3-channel mosaic (3.6 µm in green, 4.5 µm in red, and i-band optical in blue); high-redshift non-stellar objects are visible in the full-resolution view (yellow box).)
(Figure: color composite of co-registered 2MASS and MSX; each square is 0.5 x 0.5 degrees.)

Montage Users (cont.)
• Spitzer Space Telescope Outreach
• IRSA (NASA's InfraRed Science Archive)
• COSMOS (a Hubble Treasury Program)
• Michelson Science Center Navigator Program ground-based archives
• NSF National Virtual Observatory (NVO) Atlasmaker project
• UK AstroGrid project

Montage in Computer Science
• Pegasus/DAGMan – coming in this talk
• ASKALON - Marek Wieczorek, Radu Prodan, and Thomas Fahringer. Scheduling of scientific workflows in the ASKALON grid environment. ACM SIGMOD Record, 34(3):52–62, 2005.
• QoS-enabled GridFTP - Marty Humphrey and Sang-Min Park. Data throttling for data-intensive workflows. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, 2008.
• SWIFT - Borja Sotomayor, Kate Keahey, Ian Foster, and Tim Freeman. Enabling cost-effective resource leases with virtual machines. In Proceedings of HPDC 2007, 2007.
• SCALEA-G - Hong-Linh Truong, Thomas Fahringer, and Schahram Dustdar. Dynamic instrumentation, performance monitoring and analysis of grid scientific workflows. Journal of Grid Computing, 3:1–18, 2005.
• VGRaDS - VGRaDS: Montage, a project providing a portable, compute-intensive service delivering custom mosaics on demand. http://vgrads.rice.edu/research/applications/montage
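One reason Montage works as an exemplar workflow is the extensible toolkit design described a few slides back: stand-alone engines for reprojection, background rectification, and co-addition that can be recombined or replaced. The sketch below is purely illustrative (the function names are hypothetical stand-ins, not the real Montage module interfaces) and shows the two usage patterns named on that slide: reprojection/co-registration only, and swapping in a custom co-addition step.

```python
# Hypothetical sketch of Montage's loosely coupled "toolkit" usage patterns.
# These functions are illustrative stand-ins, NOT the real Montage modules
# (which are stand-alone ANSI C programs such as mProject, mBackground, mAdd).
from typing import Callable, List

Image = List[List[float]]  # toy stand-in for a FITS image

def reproject(img: Image, header: dict) -> Image:
    """Stand-in for the reprojection engine (cf. mProject)."""
    return img  # identity here; the real engine resamples onto 'header'

def rectify_background(imgs: List[Image]) -> List[Image]:
    """Stand-in for background rectification (cf. mDiff/mFitplane/mBgModel/mBackground)."""
    return imgs

def coadd_mean(imgs: List[Image]) -> Image:
    """Stand-in for the default co-addition engine (cf. mAdd)."""
    rows, cols = len(imgs[0]), len(imgs[0][0])
    return [[sum(im[r][c] for im in imgs) / len(imgs) for c in range(cols)]
            for r in range(rows)]

def mosaic(raw: List[Image], header: dict,
           coadd: Callable[[List[Image]], Image] = coadd_mean,
           rectify: bool = True) -> Image:
    """Because the engines are loosely coupled, a user can run only the
    reprojection/co-registration step (rectify=False) or pass in a custom
    co-addition algorithm without touching the other engines."""
    reprojected = [reproject(im, header) for im in raw]
    imgs = rectify_background(reprojected) if rectify else reprojected
    return coadd(imgs)

if __name__ == "__main__":
    toy = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 3.0], [4.0, 5.0]]]
    print(mosaic(toy, header={"CTYPE1": "RA---TAN"}))
```

In the real toolkit the engines are separate C executables chained together by a script or by the workflow shown later in this deck, but the composition idea is the same.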
First Public Release of Montage
• Version 1 emphasized accuracy in photometry and astrometry
  • Images processed serially
  • Tested and validated on 2MASS 2IDR images on Red Hat Linux 8.0 (kernel release 2.4.18-14) on a 32-bit processor
  • Tested on 10 WCS projections, with mosaics smaller than 2 x 2 degrees and coordinate transformations from Equatorial J2000 to Galactic and Ecliptic
• Extensively tested
  • 2,595 test cases executed
  • 119 defects reported and 116 corrected
  • 3 remaining defects renamed caveats
    • Corrected in the Montage v2 release

Later Public Releases of Montage
• Second release: Montage version 2.2
  • More efficient reprojection algorithm: up to 30x speedup
  • Improved memory efficiency: capable of building larger mosaics
  • Enabled for parallel computation with MPI
  • Enabled for processing on the TeraGrid using standard grid tools (TRL 7)
• Third release: Montage version 3.0
  • Data access modules
  • Tiled output
  • Outreach tool to build multi-band JPEG images
  • Other improvements in processing speed and accuracy
  • Bug fixes
• Code and User's Guide available for download at http://montage.ipac.caltech.edu/

Montage v1.7 Reprojection: mProject module
• Central to the algorithm is accurate calculation of the area of spherical polygon intersection between two pixels (assumes great circle segments are adequate between pixel vertices)
• An arbitrary input image has its pixels projected onto the celestial sphere, the output pixels are likewise projected onto the celestial sphere, and a FITS header defines the output projection of the reprojected image, for example:

    SIMPLE  = T
    BITPIX  = -64
    NAXIS   = 2
    NAXIS1  = 3000
    NAXIS2  = 3000
    CDELT1  = 3.333333E-4
    CDELT2  = 3.333333E-4
    CRPIX1  = 1500.5
    CRPIX2  = 1500.5
    CTYPE1  = 'RA---TAN'
    CTYPE2  = 'DEC--TAN'
    CRVAL1  = 265.91334
    CRVAL2  = -29.35778
    CROTA2  = 0.
    END

Montage v2.2 Reprojection: mProjectPP module
• Transform directly from input pixels to output pixels
  • Approach developed by Spitzer for tangent-plane projections
  • Performance improvement in reprojection of about 30x
• Montage version 2.2 includes a module, mTANHdr, to compute "distorted" gnomonic projections to make this approach more general
  • Allows the Spitzer algorithm to be used for other projections (in certain cases)
  • For "typical" size images, pixel locations are distorted by a small distance relative to the image projection plane
  • Not applicable to wide-area regions

Montage Background Rectification
(Example figure: three overlapping reprojected 2MASS images and the differences in their overlap areas.)
• A correction is calculated for each image based on all the differences between it and its neighbors
• The correction is an approximation to a least squares fit to the difference data, with brightness outlier pixels excluded
• The correction is currently a plane but could be a higher-order surface
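The plane fit described above can be written down concretely. The following is a minimal numpy sketch of the idea only (it is not the mFitplane/mBgModel code): fit a plane a*x + b*y + c to the pixel values of a difference image by least squares, excluding bright outlier pixels with a simple sigma clip.

```python
# Minimal sketch of fitting a background plane a*x + b*y + c to a difference
# image by least squares, excluding brightness outliers. Illustrative only;
# the real Montage modules (mFitplane, mBgModel) implement their own logic in C.
import numpy as np

def fit_plane(diff: np.ndarray, n_sigma: float = 3.0, n_iter: int = 3):
    """Return (a, b, c) such that a*x + b*y + c approximates `diff`.
    NaN pixels (non-overlapping areas) are ignored; outliers are sigma-clipped."""
    ny, nx = diff.shape
    y, x = np.mgrid[0:ny, 0:nx]
    good = np.isfinite(diff)
    for _ in range(n_iter):
        # Design matrix for z = a*x + b*y + c over the currently "good" pixels.
        A = np.column_stack([x[good], y[good], np.ones(good.sum())])
        coeffs, *_ = np.linalg.lstsq(A, diff[good], rcond=None)
        resid = diff - (coeffs[0] * x + coeffs[1] * y + coeffs[2])
        sigma = np.nanstd(resid[good])
        # Exclude bright outlier pixels (e.g., stars in the overlap region).
        good = np.isfinite(diff) & (np.abs(resid) < n_sigma * sigma)
    return coeffs  # a, b, c

if __name__ == "__main__":
    ny, nx = 64, 64
    y, x = np.mgrid[0:ny, 0:nx]
    diff = 0.01 * x - 0.02 * y + 5.0 + np.random.normal(0, 0.1, (ny, nx))
    diff[10, 10] += 50.0  # a bright outlier that should be clipped
    print(fit_plane(diff))  # roughly [0.01, -0.02, 5.0]
```

In Montage, the fitted planes for all overlapping image pairs then feed a global model (mBgModel) that assigns each image its own correction plane, as the workflow on the next slide shows.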
Montage Background Rectification Results
(Figure: reprojected and background-rectified images.)

Montage Workflow
(Workflow diagram: each input image is reprojected by mProject; mDiff computes a difference image (e.g., D12, D23) for each overlapping pair; mFitplane fits a plane (ax + by + c) to each difference; mConcatFit gathers the fits and mBgModel solves for a correction plane per image; mBackground applies the corrections; mAdd co-adds the corrected images into the final mosaic, which is built as overlapping tiles.)

Montage on the Grid
• "Grid" is an abstraction
  • Array of processors, grid of clusters, ...
• Use a methodology for running on any "grid environment"
  • Exploit Montage's modular design in an approach applicable to any grid environment
  • Describe the flow of data and processing (in a Directed Acyclic Graph - DAG), including:
    • Which data are needed by which part of the job
    • What is to be run and when
• Build an architecture for ordering a mosaic through a web portal
  • Use standard grid tools to exploit the parallelization inherent in the Montage design
  • Request can be processed on a grid
    • Currently using the TeraGrid
• This is just one example of how Montage could run on a grid

Montage on the Grid Using Pegasus (Planning for Execution on Grids)
(Architecture diagram, with an example DAG for 10 input files built from mProject, mDiff, mFitplane, mConcatFit, mBgModel, mBackground, and mAdd nodes: Pegasus (http://pegasus.isi.edu/) maps the abstract workflow to an executable form, querying grid information systems for available resources and data locations and adding data stage-in, data stage-out, and registration nodes around the Montage compute nodes; Condor DAGMan executes the workflow; MyProxy holds the user's grid credentials.)

Grid Montage: TeraGrid Portal
(Architecture diagram: the user portal (JPL) accepts a location, size, and band; the abstract workflow service, mDAGFiles (JPL), builds the abstract workflow; the 2MASS image list service, m2MASSList (IPAC), supplies the image list; the grid scheduling and execution service, mGridExec (ISI), uses Pegasus to produce a concrete workflow, which Condor DAGMan runs on the computational grid (TeraGrid clusters at SDSC and NCSA, and the ISI Condor pool); the user notification service, mNotify (IPAC), notifies the user.)

Montage Performance
• MPI version is the baseline
  • Processing is done stage by stage
  • Each stage uses one directory for input, another for output
  • Dynamic – examines the input directory at runtime
  • Parallelism provided by an MPI executive using round robin processing
  • Best performance
  • No fault tolerance, other than restarting from the last successful stage
• Grid version
  • No stages, just dependencies based on data
  • Completely static – the DAG is generated before the run starts
  • Good enough performance (>90%) for large mosaics
  • Fault tolerance inherent – just the failed task is restarted
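The "MPI executive using round robin processing" mentioned above can be illustrated with a short mpi4py sketch. The real Montage MPI executives are C/MPI programs; the task list and work function below are made up for the example.

```python
# Toy mpi4py illustration of round-robin task distribution within one
# workflow stage. Not Montage code; the file names and reproject_one()
# function are hypothetical.
# Run with, e.g.:  mpiexec -n 4 python round_robin.py
from mpi4py import MPI

def reproject_one(image_name: str) -> str:
    """Hypothetical stand-in for reprojecting a single input image."""
    return f"reprojected {image_name}"

def main() -> None:
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    tasks = [f"input_{i:04d}.fits" for i in range(100)]  # hypothetical inputs

    # Round robin: task i goes to rank i % size. Each rank processes its
    # share of the stage's inputs independently.
    results = [reproject_one(t) for i, t in enumerate(tasks) if i % size == rank]

    # All ranks synchronize at the end of the stage before the next begins.
    all_results = comm.gather(results, root=0)
    if rank == 0:
        done = [r for chunk in all_results for r in chunk]
        print(f"{len(done)} images processed by {size} ranks")

if __name__ == "__main__":
    main()
```

This mirrors the stage-by-stage structure described above: parallelism within a stage, with a synchronization point between stages, whereas the grid version expresses the same work as a DAG of data dependencies.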
Montage and Web 2.0
• Outreach mosaics for image sharing
  • Flickr/blog/etc.
• Output mosaics to be entered in an open, multiple-contributor catalog (atlas)
• All public images/data can be tagged
  • By design, to build the atlas
  • As generated by Montage, as products of research questions
  • Using the virtual data concept
  • Automated metadata tagging too?
• Mashups to be made from Montage components
  • First need to make the components into services
  • Allows other services to be used in the Montage workflow
• RSS feed to announce new images?
• Collaborative building of specific atlases
• Wiki/blog for discussion of how to use Montage, successes and failures, and user feedback
• Integration with Google Sky – generate Montage mosaics on the fly, including caching/building atlases

Montage Summary
• Montage is a custom astronomical image mosaicking service that emphasizes astrometric and photometric accuracy
• Montage version 3 is available for download at the Montage website: http://montage.ipac.caltech.edu/
• Montage is being actively used by important science projects for observation planning and generation of science and E/PO products (e.g., SWIRE, GLIMPSE, IRSA, COSMOS, Spitzer Outreach)
• Montage employs a modular design for flexibility and ease of use
  • Modules are standalone and may be run sequentially or in parallel on cluster or grid computers
  • A prototype Montage service has been deployed on the TeraGrid, tying together distributed services at JPL, Caltech IPAC, and ISI
  • A prototype web portal interface has been deployed at IPAC
• Developed in the HPC/Grid world; Cloud and Web 2.0 usage coming?

What is the TeraGrid?
• World's largest open scientific discovery infrastructure
• Leadership-class resources at eleven partner sites combined to create an integrated, persistent computational resource
  • High-performance networks
  • High-performance computers (>1 Pflops (~100,000 cores), growing to 1.75 Pflops)
    • And a Condor pool (with ~13,000 CPUs)
  • Visualization systems
  • Data collections (>30 PB, >100 discipline-specific databases)
  • Science gateways
  • User portal
  • User services: help desk, training, advanced application support
• Allocated to US researchers and their collaborators through a national peer-review process
  • Generally, review of the computing, not the science
• Extremely user-driven
  • MPI jobs, ssh or grid (GRAM) access, etc.

TeraGrid Future
• Current RP agreements end in March 2010
  • Except Track 2 centers (current and future)
• TG XD (eXtreme Digital) starts in April 2010 or 2011
  • Potential interoperation with OSG and others
• Current TG GIG continues through July 2010
  • Allows four months of overlap in coordination
  • Probable overlap between GIG and XD members
• Current TG (RPs and GIG) 1-year extension proposal being reviewed now

Who Uses TeraGrid (2007)
• Molecular Biosciences: 31%
• Physics: 17%
• Chemistry: 17%
• Astronomical Sciences: 12%
• Materials Research: 6%
• Chemical, Thermal Systems: 5%
• All 19 Others: 4%
• Atmospheric Sciences: 3%
• Earth Sciences: 3%
• Advanced Scientific Computing: 2%

How TeraGrid Is Used (2006)
• Batch computing on individual resources: 850 (+10%/year)
• Exploratory and application porting: 650 (+20%/year)
• Workflow, ensemble, and parameter sweep: 250 (+15%/year)
• Science gateway access: 500 (+200%/year)
• Remote interactive steering and visualization: 35 (+10%/year)
• Tightly-coupled distributed computation: 10 (+0%/year)

Conclusions
• Montage has proven useful to astronomers and computer scientists
  • Experiments with Web 2.0 technologies are starting
• TeraGrid
  • Very large open platform for US academics and partners
  • Most cycles used by traditional HPC users
  • Most users coming through gateways
  • Transitions coming – may include using/exposing Web 2.0 technologies