VIRTUAL OBSERVATORY TECHNOLOGIES
Tamás Budavári / The Johns Hopkins University
7/30/2010

Moore’s Law, Big Data!

Outline
  SQL for Big Data
    Computing where the bytes are
  Database and GPU integration
    CUDA from SQL
  Data intensive Web services
    Behind the scenes
  Working examples
    Sloan Digital Sky Survey
    Virtual Observatory tools and services

The Virtual Observatory
  “The Virtual Observatory is a framework that enables new astronomical research by greatly enhancing access to worldwide data and computing resources.”
  http://us-vo.org/
  How it works
  How to build it
  How to use it
  What’s next

Hierarchy of Services
  Atomic services
    Access to observations, simulations
    Access to models
  Higher level services
    Combine for more functionality
  User and analysis tools
    Can be a high level service, too

Heterogeneous Datasets
  Blobs: images, spectra, etc.
    Access, transfer
  Catalogs
    Fast searches, indexes

Structured Query Language
  SQL'92 standard
    Almost in English
      SELECT <columns> FROM <table> WHERE <conditions>
    For example:
      SELECT RA, Dec FROM Stars WHERE r < 15
  Astronomical Data Query Language
    An extended subset
    GIS-like spatial

Joining Tables
  Sources in observation fields: 2 tables
    SELECT f.FieldID, … s.ObjID, s.RA, s.Dec, …
    FROM Fields AS f
    INNER JOIN Sources AS s ON s.FieldID=f.FieldID
    WHERE f.ExpTime > 1000 AND s.Rmag > 16

Calculations in SQL
  Computed columns
    Use J-H in SELECT and/or WHERE
    Similarly functions, e.g., POWER(10,-0.4*Rmag)
  Grouping
    SELECT FieldID, AVG(J), STDEV(J) FROM Sources GROUP BY FieldID
    Can use for histogramming, etc.
  E.g., the SDSS Catalog Archive

Surveys in Astronomy
  Sloan Digital Sky Survey
    2001-2008, 8TB
    Catalog Archive Server
    Custom tools and indices
  Upcoming Surveys
    PanSTARRS: 100TB, 2010
    LSST: 1PB+, 201?

New Moore’s Law
  In the number of cores
  Faster than ever (for now)

New Programming Paradigm
  100s of cores, 27k parallel threads per GPU
    Running a billion threads a second
  Forget the fancy old algorithms
    Built on wrong assumptions
  Today
    CPU is free, RAM is slow
    GPU has >50GB/s bandwidth
    Still difficult to occupy the cores

Hybrid Architecture
  [Figure: launch, run, sync]

Extending SQL Server
  Dedicated service for direct access
  Shared memory IPC w/ on-the-fly data transform
  [Figure: SQL, IPC]

Spatial Statistics
  Correlation functions
    From pair-counts
  State of the art
    Dual-tree traversal (8 bins)
  High resolution bins?
    Just like brute force

Sloan DR7
  800×800 bins

All Done Inside the Database
  Pair counts computed on GPU
  Returns 2D histogram as a table (i, j, cts)
  Calculate the correlation fn in SQL
  Can also do async parallel GPU jobs

Distributed Data

Data at the Projects
  Exponential growth
    Projects last 3-5 years, data sent upwards at the end
    Data will never be centralized
  Most data at projects
  More responsibility on projects
    Bring analysis close to the data

Data Federation
  Metcalfe’s Law
    Utility of computer networks grows as the number of possible connections: O(N²)
  The Virtual Observatory
    The federation of N astronomy archives has utility O(N²), i.e.
    possibilities for making discoveries
  The whole is more than the sum of the parts

Interoperability Challenges
  Metadata standards
  Data discovery
  Data requests
  Data delivery
  Units
  Database queries
  Distributed applications
  Authentication and authorization

US National Virtual Observatory
  NVO Research, 2002-2007
    NSF ITR Program: $10M for 5 years
    17 organizations: Astro, CS, IT
  VAO Facility, 2010
    NSF $20M for 5 years
    Operational phase!
  http://us-vo.org/

http://ivoa.net/

IVOA Specifications

First Standards
  VOTable
    Universal container for tables (in XML)
    First VO standard (from the DTD era)
  ConeSearch
    Simple catalog access based on location
    First VO standard interface (HTTP GET)
    Many implemented them!

Early Standards
  Simple Image Access Protocol (SIAP)
    HTTP request, similar to opening a web page
    Returns links to the matching images in a VOTable
    Assumes we know how to deal with FITS images
  Universal Content Descriptor (UCD)
    Crystallized set of keywords from literature
    For data discovery, not queries

Components
  Discovery
    Directory, Sky coverage
  Access
    Tables, Catalogs
    Images, Spectra
    Events
  Distributed Storage
    VOSpace
    Authentication
  Distributed Computing
    Web & Grid services
    VOStat
  Messaging
    SAMP, VOPipe
  User Interfaces
    Aladin
    Topcat
    Mirage, etc.

VO Examples
  VO Applications and Services

NVO Quick Start

Ready, Steady…

DataScope
  Collect info in VO
    On a particular object
    Or a part of the sky
    GRBs, transients, etc.
  VO plotting tools
    FITS images
    Catalog data
    And more…

Bandpass Services
  Public repository
  Web site
    Search by keyword or eff
    Extract in various formats
    Register & submit yours
    On-the-fly plotting
    Easy access to all
  Web services
    To code against

Spectrum Services
  Public repository
  Web site
    SDSS, 2dF spectra, etc.
    Spatial and SQL search
    Register & submit yours
    On-the-fly plotting
    Building composites
    De-reddening
    Line analysis
  Web services

Open SkyQuery
  SkyNode interface to archives
    Implements ADQL, returns VOTable
    Basic node understands “REGION”
    Full node understands “XMATCH”
  SkyQuery portal
    Knows the SkyNodes from Registry
    Understands federated query
  http://openskyquery.net/

WESIX
  Web Enabled Source-Identification with Crossmatching
  Higher level astronomy services built on other existing VO services: SExtractor service and Open SkyQuery
  Result can be sent to plotting tool for quick inspection.
  http://nvogre.astro.washington.edu:8080/wesix/

VOStat
  Enabling R for VO data

Sky Coverage
  Discovery

Transients: VOEvent

Help!

VO for Developers
  Automated tools for analysis
  Advanced services

Web Services
  Simple HTTP requests
    ConeSearch
    Simple Image Access
  Standard SOAP and REST
    Interoperable across platforms
    IVOA compliant XML messages
    Programming toolkits exist

Command Line: VO-CLI
  VOTool

Future
  New features
  Better integration

VOSpace 2.0
  Storage instances soon everywhere
    Save intermediate data products
    Arrange for their transfer to other places
  VOPipe
    Chain VOSpaces for data flow between services
    Async execution of custom processing steps

Summary
  More and Moore data: new opportunities
  No central data store but at projects
  On-site processing: CPU + GPU
  Hierarchical Services
    Standardized interfaces
    Data federation
  New “VxOs”
    VaO: Virtual Astronomical Observatory
    VsO,

Sites to Explore
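
As a supplement to the "Joining Tables" and "Calculations in SQL" slides, the following is a minimal sketch that runs the same kind of queries against an in-memory SQLite database from Python. The table and column names (Fields, Sources, FieldID, ExpTime, Rmag, J) follow the slides; the rows are made-up toy values for illustration only, and SQLite is an assumption standing in for the SQL Server setup the talk describes. SQLite has no STDEV aggregate, so the sketch derives a population standard deviation from AVG(J) and AVG(J*J), which is not the same as SQL Server's sample-based STDEV.

```python
import math
import sqlite3

# Join from the "Joining Tables" slide, with the elided columns left out.
JOIN_SQL = """
    SELECT f.FieldID, s.ObjID, s.RA, s.Dec
    FROM Fields AS f
    INNER JOIN Sources AS s ON s.FieldID = f.FieldID
    WHERE f.ExpTime > 1000 AND s.Rmag > 16
"""

# Grouping from the "Calculations in SQL" slide; AVG(J * J) replaces STDEV(J),
# which SQLite does not provide.
GROUP_SQL = """
    SELECT FieldID, COUNT(*), AVG(J), AVG(J * J)
    FROM Sources
    GROUP BY FieldID
    ORDER BY FieldID
"""


def build_toy_db():
    """Create an in-memory database with hypothetical toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Fields  (FieldID INTEGER PRIMARY KEY, ExpTime REAL);
        CREATE TABLE Sources (ObjID INTEGER PRIMARY KEY, FieldID INTEGER,
                              RA REAL, Dec REAL, Rmag REAL, J REAL);
    """)
    conn.executemany("INSERT INTO Fields VALUES (?, ?)",
                     [(1, 1200.0), (2, 600.0)])
    conn.executemany("INSERT INTO Sources VALUES (?, ?, ?, ?, ?, ?)", [
        (10, 1, 150.10, 2.20, 17.5, 15.0),
        (11, 1, 150.12, 2.25, 15.0, 13.0),
        (12, 2, 210.50, -1.00, 18.2, 16.0),
    ])
    return conn


def long_exposure_faint_sources(conn):
    """Sources fainter than Rmag 16 in fields exposed longer than 1000."""
    return conn.execute(JOIN_SQL).fetchall()


def j_stats_per_field(conn):
    """Map FieldID -> (count, mean J, population standard deviation of J)."""
    stats = {}
    for field_id, n, mean, mean_sq in conn.execute(GROUP_SQL):
        variance = max(mean_sq - mean * mean, 0.0)  # guard against rounding
        stats[field_id] = (n, mean, math.sqrt(variance))
    return stats


if __name__ == "__main__":
    db = build_toy_db()
    print(long_exposure_faint_sources(db))
    print(j_stats_per_field(db))
```

Keeping the filter and the aggregation inside the SQL statements, rather than looping over rows in Python, mirrors the deck's point about bringing the analysis close to the data.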