GALT at NeSI
Data-driven Demands for Better Languages
Prof. Malcolm Atkinson, Director
www.nesc.ac.uk
16th October 2003

Outline
• What are the Common Factors for HL WFLs?
• Uniform to Rich Type Transitions
• Mobile Computation needs Safety
• Dynamic re-factoring to optimise enactment

Sloan Digital Sky Survey Production System
Slide from Ian Foster's SSDBM 03 keynote

Global Knowledge Communities Often Driven by Data: E.g., Astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins

Architecture of Service Interaction
• Packaging to avoid round trips
• Unit for data movement services to handle
[Diagram: CLIENT / API and REQUESTOR / STUB, with data sets numbered 1 and 2 and a link labelled "dr"]
[Build 2: each data set is shown as a list of (Ident, Type, Value) records]
[Build 3: a data set is labelled "Request", PerformRequestDocument.xsd: <performRequest> … </performRequest>]
[Build 4: a data set is labelled "TableOfTargetGalaxies", WebRowSet.xsd: <table> … </table>]

Architecture (2)
[Diagram: CLIENT / API and REQUESTOR / STUB, plus CLIENT / CONSUMER / API / STUB; data sets numbered 1 to 4; link labelled "dr"]

Tera → Peta Bytes

                     Terabyte                  Petabyte
RAM time to move     15 minutes                2 months
1Gb WAN move time    10 hours ($1000)          14 months ($1 million)
Disk Cost            7 disks = $5000 (SCSI)    6800 disks + 490 units + 32 racks = $7 million
Disk Power           100 Watts                 100 Kilowatts
Disk Weight          5.6 Kg                    33 Tonnes
Disk Footprint       Inside machine            60 m²

Approximately correct, May 2003
See also Distributed Computing Economics, Jim Gray, Microsoft Research, MSR-TR-2003-24

Data Access & Integration Services
1a. Request to Registry for sources of data about "x"
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML
[Diagram: Client, Registry, Factory, Grid Data Service, XML / Relational database; arrows labelled SOAP/HTTP, service creation, API interactions]

Future DAI Services
1a. Request to Registry for sources of data about "x" & "y"
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits sequence of scripts, each has a set of queries to GDS with XPath, SQL, etc
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
[Diagram: Analyst; Client; Problem Solving Environment; "scientific" Application coding scientific insights; Application Code; Data Registry; Data Access & Integration master; Semantic Meta data; GDS1, GDS2, GDS3; GDTS1, GDTS2; resources Sx and Sy; XML database; Relational database; arrows labelled SOAP/HTTP, service creation, API interactions]

Take Home Message
Information Grids
• How do we describe components / services / data?
  • Economic generation of those descriptions
  • Reliability and at the right level
• How do we describe data & compute processes?
  • Characterising code behaviour
  • Characterising message content
  • Safe assembly of services and data operations
• How do we provide Integrated Enactment?
  • Safe code movement
  • Optimisation
Plenty of Challenges
• Face "dirty complexity"; deliver performance & safety
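
As a back-of-envelope check on the "Tera → Peta Bytes" slide, the sketch below derives the effective WAN throughput implied by "10 hours" for a terabyte and scales it to a petabyte. It also estimates how long the astronomy holdings (about 200 TB, doubling every 12 months) would take to reach a petabyte if that growth simply continued. The decimal units, the 730-hour month, and all function names are assumptions made for this illustration, not figures from the talk.

```python
import math

# Assumptions for this sketch: decimal units and an average month of 730 hours.
TB_BYTES = 10**12
PB_BYTES = 10**15
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def effective_rate_bps(size_bytes: int, hours: float) -> float:
    """Throughput in bits per second implied by moving size_bytes in `hours`."""
    return size_bytes * 8 / (hours * SECONDS_PER_HOUR)


def move_time_hours(size_bytes: int, rate_bps: float) -> float:
    """Hours needed to move size_bytes at a sustained rate_bps."""
    return size_bytes * 8 / rate_bps / SECONDS_PER_HOUR


def years_to_reach(current: float, target: float, doubling_years: float) -> float:
    """Years for `current` to grow to `target`, assuming steady doubling."""
    return math.log2(target / current) * doubling_years


# Slide figure: 1 TB over a 1Gb WAN takes 10 hours.
rate = effective_rate_bps(TB_BYTES, hours=10)
pb_hours = move_time_hours(PB_BYTES, rate)
pb_months = pb_hours / HOURS_PER_MONTH

# Slide figures: about 200 TB in mid-2002, doubling every 12 months.
growth_years = years_to_reach(200, 1000, doubling_years=1.0)

print(f"implied effective rate: {rate / 1e6:.0f} Mb/s")
print(f"1 PB at that rate: {pb_hours:.0f} hours, about {pb_months:.1f} months")
print(f"200 TB reaches 1 PB after about {growth_years:.1f} years of doubling")
```

Under these assumptions the implied rate is well below the nominal 1 Gb/s, and the petabyte transfer comes out just under 14 months, which is consistent with the slide's WAN row scaling linearly with data size.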
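
The numbered steps on the "Data Access & Integration Services" slide (Registry, then Factory, then Grid Data Service) can be sketched as a small in-memory model. This is only an illustration of the interaction pattern: the class and method names are hypothetical, SQLite stands in for the real database, and plain method calls stand in for SOAP/HTTP. It does not reproduce the actual service interfaces.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: manages access to one database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def perform(self, sql: str) -> str:
        # 3b. The GDS interacts with the database.
        cursor = self._connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        # 3c. Results are returned to the client as XML.
        table = ET.Element("table")
        for row in cursor:
            row_element = ET.SubElement(table, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        return ET.tostring(table, encoding="unicode")


class Factory:
    """Hypothetical factory that creates a GDS on request."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def create_service(self) -> GridDataService:
        # 2b. Create the GDS; 2c. return its handle to the client.
        return GridDataService(self._connection)


class Registry:
    """Hypothetical registry mapping a topic to a Factory handle."""

    def __init__(self) -> None:
        self._factories: dict[str, Factory] = {}

    def register(self, topic: str, factory: Factory) -> None:
        self._factories[topic] = factory

    def find(self, topic: str) -> Factory:
        # 1a. Request for sources of data about a topic; 1b. Factory handle returned.
        return self._factories[topic]


def demo() -> str:
    # Made-up sample data, used only to exercise the steps.
    database = sqlite3.connect(":memory:")
    database.execute("CREATE TABLE galaxies (ident TEXT, redshift REAL)")
    database.execute("INSERT INTO galaxies VALUES ('g1', 0.12)")

    registry = Registry()
    registry.register("galaxies", Factory(database))

    factory = registry.find("galaxies")   # steps 1a, 1b
    service = factory.create_service()    # steps 2a, 2b, 2c
    return service.perform("SELECT ident, redshift FROM galaxies")  # steps 3a to 3c


print(demo())
```

Returning the whole result set as one XML document, rather than row by row, mirrors the "packaging to avoid round trips" point made on the Architecture of Service Interaction slide.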