Collaborative Astronomical Image Mosaics
(using Montage) + TeraGrid Highlights!
Daniel S. Katz (d.katz@ieee.org)
Director for Cyberinfrastructure Development
Center for Computation & Technology (CCT)
Louisiana State University (LSU)
University of Chicago (as of April 27th)
TeraGrid GIG Director of Science
Montage
• An astronomical image mosaic service for the National Virtual Observatory
  • Project web site: http://montage.ipac.caltech.edu/
  • Core team at IPAC and JPL
  • Grid architecture developed in collaboration with ISI
• Team:
  • Attila Bergou - JPL
  • Nathaniel Anagnostou - IPAC
  • Bruce Berriman - IPAC
  • Ewa Deelman - ISI
  • John Good - IPAC
  • Joseph C. Jacob - JPL
  • Daniel S. Katz - JPL/LSU/U.Chicago
  • Carl Kesselman - ISI
  • Anastasia Laity - IPAC
  • Thomas Prince - Caltech
  • Gurmeet Singh - ISI
  • Mei-Hui Su - ISI
  • Roy Williams - CACR
• Also heavily used in computer science as an exemplar workflow application
Non-Montage contributors
• Roy Williams (CACR)
  • Developer of Atlasmaker
  • Co-lead of NVO (-> USVO)
• Alex Szalay (JHU)
  • Co-lead of NVO (-> USVO)
• Joe Jacob (JPL)
  • Developer of yourSky (predecessor to Montage)
Variables/images in astronomy
• Right Ascension
• Declination
• Energy/flux
• Date/time
• Wavelength
• Instrument
• Hold all but 2 constant, plot vs. energy to make a B&W image; use color to represent another variable
• Spectra: energy vs. frequency with all others constant
Data and IVOA Standards
• How is data stored?
  • Images (normally in FITS, could be in VOTable)
  • Tables (VOTable, uses XML)
• What data exists?
  • From a survey (e.g., 2MASS, DPOSS, SDSS)
    • Catalog (lists of sources, stored in VOTable)
    • Spectra (stored in VOTable)
    • Images
• Where is it?
  • Produced by project, maybe with replicas elsewhere
• How is it cataloged/accessed?
  • Cone search - given a location and radius, return a table containing sources that overlap that circle
  • Simple Image Access - given a location and size, return a table containing images that {overlap, …} that region
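A cone search request is just an HTTP GET with RA, DEC (J2000, degrees) and SR (search radius, degrees) parameters; the service replies with a VOTable. A minimal sketch of building such a request (the endpoint URL is hypothetical, chosen only for illustration):

```python
from urllib.parse import urlencode

def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build an IVOA Simple Cone Search request URL.

    RA/DEC are J2000 coordinates in degrees; SR is the search
    radius in degrees. The response would be a VOTable document.
    """
    return base_url + "?" + urlencode(
        {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})

# Hypothetical service endpoint, for illustration only:
url = cone_search_url("https://example.org/scs", 265.91334, -29.35778, 0.5)
print(url)
```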
WCS (World Coordinate System)
• Various projections to transform (celestial) spherical space to 2-dimensional space
  • Plate carrée (Cartesian)
  • Mercator
  • Hammer-Aitoff
Images from http://howdyyall.com/Texas/Members/Bob/GPS/Projectn.htm
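As an example of such a projection, the standard equal-area Hammer (Hammer-Aitoff) equations map longitude/latitude onto a 2:1 ellipse. A small sketch of the forward transform:

```python
import math

def hammer_aitoff(lon_deg, lat_deg):
    """Forward Hammer-Aitoff projection of spherical coordinates.

    Longitude in [-180, 180], latitude in [-90, 90], both in degrees;
    returns (x, y) in projection-plane units.
    """
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    z = math.sqrt(1.0 + math.cos(phi) * math.cos(lam / 2.0))
    x = 2.0 * math.sqrt(2.0) * math.cos(phi) * math.sin(lam / 2.0) / z
    y = math.sqrt(2.0) * math.sin(phi) / z
    return x, y

# The origin maps to the center; (180, 0) to the ellipse's right edge.
print(hammer_aitoff(0.0, 0.0))    # -> (0.0, 0.0)
print(hammer_aitoff(180.0, 0.0))  # x = 2*sqrt(2)
```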
What is Montage?
• Deliver science grade astronomical image mosaics
  • An image mosaic is a combination of individual pixel data in many images so that they appear to be a single image measured with a telescope or spacecraft
  • Preserve astrometry (position) and photometry (flux) of images
  • Backgrounds rectified to a common level (sky emission, instrument signatures, . . .)
• Scientific use cases
  • Structures in the sky are usually larger than individual images
  • High signal-to-noise images for studies of faint sources
  • Multiwavelength image federation
    • Images at different wavelengths have differing parameters (coordinates, projections, spatial samplings, . . .)
    • Place multiwavelength images on a common set of image parameters to support faint source extraction
What is Montage? (cont.)
• Delivers custom mosaics
  • User-specified projection, coordinates, spatial sampling, mosaic size, image rotation
• Highly scalable - use on desktops for small mosaics or on a grid or cluster for large mosaics
• Extensible "toolkit" design
  • Stand-alone (loosely coupled) engines for image reprojection, background rectification, and co-addition provide flexibility to users; e.g.,
    • Use only as a reprojection and co-registration engine
    • Apply custom algorithms for, e.g., co-addition or background rectification with no impact on other engines
  • Implemented in ANSI C for portability
  • Low testing cost
• Public service
  • Order mosaics through web portal
  • Small cluster at IPAC for small jobs
  • Uses public resources for larger jobs (specifics later)
Sample Mosaics
• 100 µm sky; aggregation of COBE and IRAS maps (Schlegel, Finkbeiner and Davis, 1998)
  • 360 x 180 degrees; CAR projection
• 2MASS 3-color mosaic of galactic plane
  • 44 x 8 degrees; 36.5 GB per band; CAR projection
  • 158,400 x 28,800 pixels; covers 0.8% of the sky
  • 4 hours wall clock time on cluster of 4 x 1.4-GHz Linux boxes
Montage Users
Montage adopted by Spitzer Legacy teams:
• GLIMPSE uses Montage for:
  • Large scale infrared mosaics of the Galactic Plane
  • Quality assurance of GLIMPSE Spitzer data by co-registration of 2MASS J, H, K and MSX 8 µm images
• SWIRE uses Montage for:
  • Building sky simulations for use in mission planning
  • Fast background rectification and co-addition of in-flight images
[Figures: color composite of co-registered 2MASS and MSX, each square 0.5 x 0.5 degrees; and (right) Spitzer IRAC 3 channel mosaic (3.6 µm in green, 4.5 µm in red, and i-band optical in blue) - high redshift non-stellar objects are visible in the full resolution view (yellow box).]
Montage Users (cont.)
• Spitzer Space Telescope Outreach
• IRSA (NASA's InfraRed Science Archive)
• COSMOS (a Hubble Treasury Program)
• Michelson Science Center Navigator Program ground-based archives
• NSF National Virtual Observatory (NVO) Atlasmaker project
• UK Astrogrid project
Montage in Computer Science
• Pegasus/DAGMan - coming in this talk
• ASKALON - Marek Wieczorek, Radu Prodan, and Thomas Fahringer. Scheduling of scientific workflows in the ASKALON grid environment. ACM SIGMOD Record, 34(3):52-62, 2005.
• QoS-enabled GridFTP - Marty Humphrey and Sang-Min Park. Data throttling for data-intensive workflows. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, 2008.
• SWIFT - Borja Sotomayor, Kate Keahey, Ian Foster, and Tim Freeman. Enabling cost-effective resource leases with virtual machines. In Proceedings of HPDC 2007, 2007.
• SCALEA-G - Hong-Linh Truong, Thomas Fahringer, and Schahram Dustdar. Dynamic instrumentation, performance monitoring and analysis of grid scientific workflows. Journal of Grid Computing, 2005(3):1-18, 2005.
• VGRaDS - VGRaDS: Montage, a project providing a portable, compute-intensive service delivering custom mosaics on demand. http://vgrads.rice.edu/research/applications/montage.
First Public Release of Montage
• Version 1 emphasized accuracy in photometry and astrometry
  • Images processed serially
  • Tested and validated on 2MASS 2IDR images on Red Hat Linux 8.0 (kernel release 2.4.18-14) on a 32-bit processor
  • Tested on 10 WCS projections with mosaics smaller than 2 x 2 degrees and coordinate transformations Equ J2000 to Galactic and Ecliptic
• Extensively tested
  • 2,595 test cases executed
  • 119 defects reported and 116 corrected
  • 3 remaining defects renamed caveats
    • Corrected in Montage v2 release
Later Public Releases of Montage
• Second release: Montage version 2.2
  • More efficient reprojection algorithm: up to 30x speedup
  • Improved memory efficiency: capable of building larger mosaics
  • Enabled for parallel computation with MPI
  • Enabled for processing on TeraGrid using standard grid tools (TRL 7)
• Third release: Montage version 3.0
  • Data access modules
  • Tiled output
  • Outreach tool to build multi-band jpeg images
  • Other improvements in processing speed and accuracy
  • Bug fixes
• Code and User's Guide available for download at http://montage.ipac.caltech.edu/
Montage v1.7 Reprojection: mProject module
• Takes an arbitrary input image; a FITS header defines the output projection:

  SIMPLE  =                    T /
  BITPIX  =                  -64 /
  NAXIS   =                    2 /
  NAXIS1  =                 3000 /
  NAXIS2  =                 3000 /
  CDELT1  =          3.333333E-4 /
  CDELT2  =          3.333333E-4 /
  CRPIX1  =               1500.5 /
  CRPIX2  =               1500.5 /
  CTYPE1  = 'RA---TAN'
  CTYPE2  = 'DEC--TAN'
  CRVAL1  =            265.91334 /
  CRVAL2  =            -29.35778 /
  CROTA2  =                   0. /
  END

• Input pixels and output pixels are both projected on the celestial sphere
• Central to the algorithm is accurate calculation of the area of spherical polygon intersection between two pixels (assumes great circle segments are adequate between pixel vertices)
• Result: the reprojected image
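The spherical-polygon intersection area reduces to summing areas of spherical triangles bounded by great-circle arcs. A sketch of that core geometric step via l'Huilier's theorem (this is the general textbook formula, not Montage's exact implementation):

```python
import math

def spherical_triangle_area(a, b, c):
    """Area (solid angle, steradians) of a spherical triangle on the
    unit sphere, given its three side lengths a, b, c as great-circle
    arcs in radians, using l'Huilier's theorem. A spherical polygon
    bounded by great-circle arcs can be decomposed into such triangles
    and their areas summed.
    """
    s = 0.5 * (a + b + c)  # spherical semi-perimeter
    t = (math.tan(s / 2) * math.tan((s - a) / 2) *
         math.tan((s - b) / 2) * math.tan((s - c) / 2))
    return 4.0 * math.atan(math.sqrt(t))  # spherical excess = area

# One octant of the sphere: all three sides are 90-degree arcs,
# so the area is (4*pi)/8 = pi/2 steradians.
print(spherical_triangle_area(math.pi / 2, math.pi / 2, math.pi / 2))
```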
Montage v2.2 Reprojection: mProjectPP module
• Transforms directly from input pixels to output pixels
  • Approach developed by Spitzer for tangent plane projections
  • Performance improvement in reprojection of ~30x
• Montage version 2.2 includes a module, mTANHdr, to compute "distorted" gnomonic projections to make this approach more general
  • Allows the Spitzer algorithm to be used for other projections (in certain cases)
  • For "typical" size images, pixel locations are distorted by a small distance relative to the image projection plane
  • Not applicable to wide area regions
Montage Background Rectification
Example: three overlapping reprojected 2MASS images; differences are computed in the overlap areas.
• A correction is calculated for each image based on all the differences between it and its neighbors
• Approximation to a least squares fit to the difference data, with brightness outlier pixels excluded
• The correction is currently a plane but could be a higher order surface
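The plane correction comes from a least-squares fit to the pixel differences in an overlap region. A minimal sketch of an unweighted plane fit via the 3x3 normal equations (Montage's mFitplane additionally excludes brightness outliers; the function and data here are illustrative):

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) samples,
    solving the 3x3 normal equations by Gaussian elimination with
    partial pivoting. Returns [a, b, c]."""
    sxx = sxy = sx = syy = sy = n = sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x; sxy += x * y; sx += x
        syy += y * y; sy += y; n += 1.0
        sxz += x * z; syz += y * z; sz += z
    # Augmented matrix of the normal equations A @ [a, b, c] = rhs.
    A = [[sxx, sxy, sx, sxz],
         [sxy, syy, sy, syz],
         [sx,  sy,  n,  sz]]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))  # pivot row
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for col in range(i, 4):
                A[r][col] -= f * A[i][col]
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):  # back substitution
        coef[i] = (A[i][3] - sum(A[i][j] * coef[j]
                                 for j in range(i + 1, 3))) / A[i][i]
    return coef

# Synthetic difference data lying exactly on z = 2x - y + 3:
data = [(x, y, 2 * x - y + 3) for x in range(3) for y in range(3)]
print(fit_plane(data))  # recovers approximately [2.0, -1.0, 3.0]
```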
Montage Background Rectification Results
Reprojected and background-rectified images
Montage Workflow
[Workflow diagram: mProject 1, 2, 3 reproject the three input images. mDiff 1 2 and mDiff 2 3 compute difference images D12 and D23 in the overlap regions. mFitplane D12 and mFitplane D23 each fit a plane (a_i x + b_i y + c_i = 0) to a difference image. mConcatFit collects the fits and mBgModel solves for a consistent background correction plane (ax + by + c = 0; dx + ey + f = 0; ...) per image. mBackground 1, 2, 3 apply the corrections, and mAdd 1 and mAdd 2 co-add the results into the final mosaic (overlapping tiles).]
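The module dependencies on this slide form a DAG, so any valid execution order is a topological sort. A minimal Python sketch (module names follow the slide; mAdd is simplified to a single node for illustration):

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Each module maps to the set of modules whose outputs it consumes.
deps = {
    "mDiff_12": {"mProject_1", "mProject_2"},
    "mDiff_23": {"mProject_2", "mProject_3"},
    "mFitplane_D12": {"mDiff_12"},
    "mFitplane_D23": {"mDiff_23"},
    "mConcatFit": {"mFitplane_D12", "mFitplane_D23"},
    "mBgModel": {"mConcatFit"},
    "mBackground_1": {"mBgModel"},
    "mBackground_2": {"mBgModel"},
    "mBackground_3": {"mBgModel"},
    "mAdd": {"mBackground_1", "mBackground_2", "mBackground_3"},
}

# Any valid schedule runs predecessors first; mAdd must come last.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Tasks with no path between them (e.g., the three mProject runs) can execute in parallel, which is exactly the parallelism the grid and MPI versions exploit.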
Montage on the Grid
• "Grid" is an abstraction
  • Array of processors, grid of clusters, …
• Use a methodology for running in any "grid environment"
  • Exploit Montage's modular design in an approach applicable to any grid environment
  • Describe flow of data and processing (in a Directed Acyclic Graph - DAG), including:
    • Which data are needed by which part of the job
    • What is to be run and when
  • Use standard grid tools to exploit the parallelization inherent in the Montage design
• Build an architecture for ordering a mosaic through a web portal
  • Request can be processed on a grid
  • Currently using TeraGrid
• This is just one example of how Montage could run on a grid
Montage on the Grid Using Pegasus (Planning for Execution on Grids)
• Pegasus (http://pegasus.isi.edu/) maps an abstract workflow to an executable form, using grid information systems for information about available resources and data location
• Condor DAGMan executes the resulting workflow on the grid
• MyProxy holds the user's grid credentials
[Diagram: example DAG for 10 input files - levels of mProject, mDiff, mFitPlane, mConcatFit, mBgModel, mBackground, and mAdd tasks, augmented with data stage-in nodes, Montage compute nodes, data stage-out nodes, and registration nodes.]
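The DAGMan input that a planner like Pegasus emits is, schematically, a list of JOB declarations plus PARENT/CHILD dependency lines. A hypothetical fragment for part of the Montage workflow (submit-file names invented for illustration; each .sub file would be a Condor submit description for one module run):

```
# Illustrative DAGMan file for a two-image fragment of the workflow
JOB  proj1   mProject_1.sub
JOB  proj2   mProject_2.sub
JOB  diff12  mDiff_1_2.sub
JOB  fit12   mFitplane_D12.sub

# diff12 needs both reprojected images; fit12 needs the difference
PARENT proj1 proj2 CHILD diff12
PARENT diff12 CHILD fit12
```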
Montage TeraGrid Portal
[Architecture diagram: a user portal (JPL) accepts a request (location, size, band). An Abstract Workflow Service (JPL, mDAGFiles) builds the abstract workflow, drawing on a 2MASS Image List Service (IPAC, m2MASSList) for the image list. A Grid Scheduling and Execution Service (mGridExec) hands the abstract workflow to Pegasus (ISI) to produce a concrete workflow, which Condor DAGMan executes on the computational grid (TeraGrid clusters at SDSC and NCSA, plus IPAC and ISI Condor pool resources). A User Notification Service (IPAC, mNotify) informs the user when the mosaic is ready.]
Montage Performance
• MPI version is baseline
  • Processing is done stage by stage
    • Each stage uses one directory for input, another for output
    • Dynamic - examines input directory at runtime
    • Parallelism -> MPI executive using round robin processing
  • Best performance
  • No fault tolerance, other than restart from last successful stage
• Grid version
  • No stages, just dependencies based on data
  • Completely static - DAG generated before run starts
  • Good enough performance (>90%) for large mosaics
  • Fault tolerance inherent - just the failed task is restarted
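The round-robin distribution used by the MPI executive can be sketched as a simple strided slice (illustrative only; in the real MPI code the rank and communicator size would come from MPI_Comm_rank/MPI_Comm_size, and tasks are the files found in the stage's input directory):

```python
def round_robin(tasks, rank, size):
    """Round-robin task assignment: process `rank` out of `size`
    handles every size-th task, starting at its own rank."""
    return tasks[rank::size]

# 10 reprojection tasks spread across 4 MPI ranks:
for r in range(4):
    print(r, round_robin(list(range(10)), r, 4))
# rank 0 -> [0, 4, 8], rank 1 -> [1, 5, 9], rank 2 -> [2, 6], rank 3 -> [3, 7]
```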
Montage and Web 2.0
• Outreach mosaics for image sharing
  • Flickr/blog/etc.
  • Output mosaics to be entered in an open/multiple-contributor catalog (atlas)
• All public images/data can be tagged
  • By design, to build an atlas
  • As generated by Montage as products of research questions
  • Using virtual data concept
  • Automated metadata tagging too?
• Mashups to be made from Montage components
  • First need to make components into services
  • Allows other services to be used in Montage workflow
• RSS feed to announce new images?
• Collaborative building of specific atlases
• Wiki/blog for discussion of how to use Montage, successes and failures, and user feedback
• Integration with Google Sky - generate Montage mosaics on the fly, including caching/building atlases
Montage Summary
• Montage is a custom astronomical image mosaicking service that emphasizes astrometric and photometric accuracy
  • Montage version 3 is available for download at the Montage website: http://montage.ipac.caltech.edu/
• Montage is being actively used by important science projects for observation planning and generation of science and E/PO products (e.g., SWIRE, GLIMPSE, IRSA, COSMOS, Spitzer Outreach)
• Montage employs a modular design for flexibility and ease of use
  • Modules are standalone and may be run sequentially or in parallel on cluster or grid computers
• A prototype Montage service has been deployed on the TeraGrid, tying together distributed services at JPL, Caltech IPAC, and ISI
  • A prototype Web portal interface has been deployed at IPAC
• Developed in the HPC/Grid world; Cloud & Web 2.0 usage coming?
What is the TeraGrid?
• World's largest open scientific discovery infrastructure
• Leadership class resources at eleven partner sites combined to create an integrated, persistent computational resource
  • High-performance networks
  • High-performance computers (>1 Pflops (~100,000 cores) -> 1.75 Pflops)
    • And a Condor pool (w/ ~13,000 CPUs)
  • Visualization systems
  • Data Collections (>30 PB, >100 discipline-specific databases)
  • Science Gateways
  • User portal
  • User services - help desk, training, advanced app support
• Allocated to US researchers and their collaborators through national peer-review process
  • Generally, review of computing, not science
• Extremely user-driven
  • MPI jobs, ssh or grid (GRAM) access, etc.
TeraGrid
TeraGrid Future
• Current RP agreements end in March 2010
  • Except track 2 centers (current and future)
• TG XD (eXtreme Digital) starts in April 2011 (originally April 2010)
  • Potential interoperation with OSG and others
• Current TG GIG continues through July 2010
  • Allows four months of overlap in coordination
  • Probable overlap between GIG and XD members
• Current TG (RPs and GIG) 1-year extension proposal being reviewed now
Who Uses TeraGrid (2007)
[Pie chart of usage by discipline:]
• Molecular Biosciences - 31%
• Physics - 17%
• Chemistry - 17%
• Astronomical Sciences - 12%
• Materials Research - 6%
• Chemical, Thermal Systems - 5%
• All 19 Others - 4%
• Earth Sciences - 3%
• Atmospheric Sciences - 3%
• Advanced Scientific Computing - 2%
How TeraGrid Is Used (2006)
• Batch Computing on Individual Resources: 850, +10%/year
• Exploratory and Application Porting: 650, +20%/year
• Workflow, Ensemble, and Parameter Sweep: 250, +15%/year
• Science Gateway Access: 500, +200%/year
• Remote Interactive Steering and Visualization: 35, +10%/year
• Tightly-Coupled Distributed Computation: 10, +0%/year
Conclusions
• Montage has proven useful to astronomers and computer scientists
  • Experiments with Web 2.0 technologies starting
• TeraGrid
  • Very large open platform for US academics and partners
  • Most cycles used by traditional HPC users
  • Most users coming through gateways
  • Transitions coming - may include using/exposing Web 2.0 technologies