Algorithms and the Grid
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
March 18 2005
gcf@indiana.edu
http://www.infomall.org

Trends in Simulation Research
1990-2000: the HPCC (High Performance Computing and Communication) Initiative
• Established parallel computing
• Developed wonderful algorithms, especially in the partial differential equation and particle dynamics areas
• Produced almost no useful software except for MPI (messaging between parallel computer nodes)
1995-now: Internet explosion and development of the Web Service distributed system model
• Replaces CORBA, Java RMI, HLA, COM etc.
2000-now: almost no USA academic work in core simulation
• Major projects like ASCI (DoE) and HPCMO (DoD) thrive
2003-?: the Data Deluge is apparent, and the Grid links Internet and HPCC with a focus on data-simulation integration

e-Business, e-Science and the Grid
e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
e-Science is the similar vision for scientific research, with international participation in large accelerators, satellites or distributed gene analyses.
The Grid or CyberInfrastructure integrates the best of the Web, Agents, traditional enterprise software, high performance computing and peer-to-peer systems to provide the information technology e-infrastructure for e-moreorlessanything.
A deluge of data of unprecedented and inevitable size must be managed and understood.
People, computers, data and instruments must be linked.
On-demand assignment of experts, computers, networks and storage resources must be supported.
[Figure: the Grid linking imaging instruments, large-scale databases, computational resources, data acquisition, and advanced visualization and analysis]
Some Important Styles of Grids
Computational Grids were the origin of the concepts and link computers across the globe; high latency stops them from being used as a parallel machine
Knowledge and Information Grids link sensors and information repositories, as in Virtual Observatories or BioInformatics
• More detail on next slide
Collaborative Grids link multidisciplinary researchers across laboratories and universities
Community Grids focus on Grids involving large numbers of peers rather than on linking major resources; they link Grid and peer-to-peer network concepts
Semantic Grid links the Grid and AI communities with the Semantic Web (ontology/meta-data enriched resources) and Agent concepts
Grid Service Farms supply services-on-demand, as in collaboration, GIS support and image processing filters

Information/Knowledge Grids
Distributed (10’s to 1000’s of) data sources (instruments, file systems, curated databases …)
Data Deluge: 1 (now) to 100’s of petabytes/year (2012)
• Moore’s law for Sensors
Possible filters assigned dynamically (on-demand)
• Run an image processing algorithm on a telescope image
• Run a gene sequencing algorithm on compiled data
Needs a decision support front end with “what-if” simulations
Metadata (provenance) is critical to annotate data
Integrate across experiments, as in multi-wavelength astronomy
The Data Deluge comes from the pixels/year available

Virtual Observatory Astronomy Grid: Integrate Experiments
[Figure: sky maps in Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map and Galaxy Density Map]

e-Business and (Virtual) Organizations
Enterprise Grid supports the information system for an organization; includes “university computer center”, “(digital) library”, sales, marketing, manufacturing …
Outsourcing Grid links different parts of an enterprise together
• Manufacturing plants with designers
• Animators with electronic game or film designers and producers
• Coaches with aspiring players (e-NCAA or e-NFL etc.)
• Outsourcing will become easier …
Customer Grid links businesses and their customers, as in many web sites such as amazon.com
e-Multimedia can use secure peer-to-peer Grids to link creators, distributors and consumers of digital music, games and films while respecting rights
Distance education Grid links a teacher at one place, students all over the place, mentors and graders; shared curriculum, homework, live classes …

DAME: Distributed Aircraft Maintenance Environment
Rolls Royce and UK e-Science Program
In-flight data: ~5000 engines; about a Gigabyte per aircraft per engine per transatlantic flight
[Figure: data path from aircraft via an Airline Global Network (such as SITA) and Ground Station to the Engine Health (Data) Center and Maintenance Centre; Internet, e-mail, pager]

NASA Aerospace Engineering Grid
[Figure: aircraft sub-system models. Wing Models: lift capabilities, drag capabilities, responsiveness. Airframe Models. Stabilizer Models: deflection capabilities, responsiveness. Human Models: crew capabilities (accuracy, perception, stamina, reaction times, SOP’s). Engine Models: thrust performance, reverse thrust performance, responsiveness, fuel consumption. Landing Gear Models: braking performance, steering capabilities, traction, dampening capabilities]
It takes a distributed virtual organization to design, simulate and build a complex system like an aircraft
Whole system simulations are produced by coupling all of the sub-system simulations

e-Defense and e-Crisis
Grids support Command and Control and provide Global Situational Awareness
• Link commanders and frontline troops to each other and to archival and real-time data; link to what-if simulations
• Dynamic heterogeneous wired and wireless networks
• Security and fault tolerance are essential
System of Systems; Grid of Grids
• The command and information infrastructure of each ship is a Grid; each fleet is linked together by a Grid; the President is informed by and informs the national defense Grid
• Grids must be heterogeneous and federated
Crisis Management and Response enabled by a Grid linking sensors, disaster managers and first responders with decision support
Define and build DoD-relevant Services: Collaboration, Sensors, GIS, Database etc.

Old Style Metacomputing Grid
[Figure: large-scale parallel computers, large disks, large-scale databases and analysis and visualization resources combined as a single metacomputer]
Spread a single large problem over multiple supercomputers

Classes of Computing Grid Applications
Running “Pleasingly Parallel Jobs”, as in United Devices and Entropia (Desktop Grid) “cycle stealing systems”
• These can be managed (“inside” the enterprise, as in Condor) or more informal (as in SETI@Home)
Computing-on-demand in industry, where the jobs spawned are perhaps very large (SAP, Oracle …)
Supporting distributed file systems, as in Legion (Avaki) and Globus, with a (web-enhanced) UNIX programming paradigm
• Particle Physics will run some 30,000 simultaneous jobs this way
Pipelined applications linking data/instruments, compute and visualization
Seamless Access, where Grid portals allow one to choose one of multiple resources with a common interface

What is Happening?
Grid ideas are being developed in (at least) two communities
• Web Service: W3C, OASIS
• Grid Forum (High Performance Computing, e-Science)
• Open Middleware Infrastructure Institute (OMII): currently only in the UK but may spread to the EU and USA
Service Standards are being debated
Grid Operational Infrastructure is being deployed
Grid Architecture and core software are being developed
Particular System Services are being developed “centrally”; OGSA is the framework for this
Lots of fields are setting domain-specific standards and building domain-specific services
Grids are viewed differently in different areas
• Largely “computing-on-demand” in industry (IBM, Oracle, HP, Sun)
• Largely distributed collaboratories in academia

A typical Web Service
In principle, services can be in any language (Fortran, Java, Perl, Python) and the interfaces can be method calls, Java RMI messages or CGI Web invocations, or be totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python
[Figure: a Portal Service and Web Services for Security, Payment (Credit Card), Catalog and Warehouse Shipping control, each with WSDL interfaces]

Services and Distributed Objects
A web service is a computer program running on either the local or a remote machine with a set of well-defined interfaces (ports) specified in XML (WSDL)
Web Services (WS) have many similarities with Distributed Object (DO) technology, but there are some (important) technical and religious differences (not easy to distinguish)
• CORBA, Java and COM are typical DO technologies
• Agents are typically SOA (Service Oriented Architecture)
Both involve distributed entities, but Web Services are more loosely coupled
• WS interact with messages; DO with RPC (Remote Procedure Call)
• DO have “factories”; WS manage instances internally, and interaction-specific state is not exposed and hence need not be managed
• DO have explicit state (stateful services); WS use context in the messages to link interactions (stateful interactions)
Claim: DOs do NOT scale; WS build on experience (with CORBA) and do scale

Grid impact on Algorithms I
Your favorite parallel algorithm will often run untouched on a Grid node linked to other simulations
Algorithms tolerant of high latency
Algorithms for new applications enabled by the Grid
Data assimilation for data-deluged science, generalizing data mining
• Where and how to process data
• Incorporation of data in simulation
Complex Systems algorithms for non-traditional simulations, as in biology and social systems
• Cellular automata

Grid impact on Algorithms II
The MPI software model is not suited to the Grid; use SOAP and publish/subscribe
• Microsecond versus millisecond latency
Grid workflow needs “integration algorithms”
• Multidisciplinary algorithms for loose code coupling
• Workflow scheduling algorithms (data oriented)
• Data caching algorithms
Algorithms like distributed hash tables for distributed storage and lookup of data
Algorithms for Grid security
• Efficient support of group keys for multicast
• Detection of Denial of Service attacks
Much better software is available for building toolkits and Problem Solving Environments, i.e. for using algorithms

Data Deluged Science
In the past, we worried about data in the form of parallel I/O or MPI-IO, but we didn’t consider it as an enabler of new algorithms and new ways of computing
Data assimilation was not central to HPCC
DoE ASCI was set up because it didn’t want test data!
Now particle physics will get 100 petabytes from CERN
• Nuclear physics (Jefferson Lab) is in the same situation
• Uses around 30,000 CPUs simultaneously 24X7
Weather, climate, solid earth (EarthScope)
Bioinformatics curated databases (Biocomplexity has only 1000’s of data points at present)
Virtual Observatory and SkyServer in Astronomy
Environmental Sensor nets

[Figure: Weather Requirements]

[Figure: Data Deluged Science Computing Paradigm. Labels: Data, Assimilation, Information, Simulation, Informatics, Model, Ideas, Computational Science, Datamining, Reasoning]

[Figure: USArray Seismic Sensors]

[Figure: Site-specific irregular scalar measurements and constellations for plate boundary-scale vector measurements. Labels: Ice Sheets, Greenland, Volcanoes, Long Valley CA, PBO, Topography (1 km), Stress Change, Northridge CA, Earthquakes, Hector Mine CA]

Geoscience Research and Education Grids
[Figure: SERVOGrid linking repositories, federated databases, sensors with streaming data, field trip data, data filter services and research simulations; customization and discovery services (“From Research to Education?”) lead to an Education Grid with an analysis and visualization portal and a computer farm]

SERVOGrid Requirements
Seamless access to data repositories and large-scale computers
Integration of multiple data sources, including sensors, databases and file systems, with the analysis system
• Including filtered OGSA-DAI (Grid database access)
Rich meta-data generation and access, with a SERVOGrid-specific Schema extending openGIS (Geography as a Web service) standards and using the Semantic Grid
Portals with a component model for user interfaces and web control of all capabilities
Collaboration to support world-wide work
Basic Grid tools: workflow and notification
NOT metacomputing

Data Deluged Science Computing Architecture
[Figure: OGSA-DAI Grid Services, Grid, Data Assimilation, HPC Simulation, Analysis, Control, Visualize]
Distributed filters massage data for simulation

Data Assimilation
Data assimilation implies one is solving some optimization problem, which might have a Kalman Filter-like structure:
$$\min_{\text{Theoretical Unknowns}} \sum_{i=1}^{N_{obs}} \frac{\left(\mathrm{Data}_i(\text{position}, \text{time}) - \mathrm{Simulated\_Value}_i\right)^2}{\mathrm{Error}_i^2}$$
Due to the data deluge, one will become more and more dominated by the data ($N_{obs}$ much larger than the number of simulation points).
The natural approach is to form, for each local (position, time) patch, the “important” data combinations so that the optimization doesn’t waste time on large-error or insensitive data.
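One way to form the “important” combinations for a patch is sketched below. This is a minimal illustration, not the author’s code. It assumes the simulated values depend linearly on the unknowns (Simulated_Value = A x) and that the errors are independent. The matrix `A`, the synthetic data and the function names are hypothetical. Under those assumptions, the $N_{obs}$ weighted observations can be replaced by $n$ combinations $U^T b$, where $n$ is the number of unknowns and $U$ comes from the SVD of the error-weighted matrix. These combinations give exactly the same minimizer of the objective above.

```python
import numpy as np


def chi_square(A, x, data, err):
    """Objective above, assuming a linear model Simulated_Value = A @ x:
    sum_i ((Data_i - Simulated_Value_i) / Error_i)^2."""
    r = (data - A @ x) / err
    return float(r @ r)


def filter_patch(A, data, err):
    """Reduce one local patch from Nobs observations to n filtered values
    (n = number of unknowns) using the SVD of the error-weighted matrix."""
    B = A / err[:, None]                                # row i scaled by 1 / Error_i
    U, s, Vt = np.linalg.svd(B, full_matrices=False)    # U is Nobs x n
    filtered = U.T @ (data / err)                       # n "important" combinations
    return filtered, s, Vt


def solve_from_filtered(filtered, s, Vt):
    """Least-squares unknowns recovered from the filtered values alone."""
    return Vt.T @ (filtered / s)


# Hypothetical synthetic patch: 2000 noisy observations, 5 unknowns
rng = np.random.default_rng(0)
n_obs, n_unknowns = 2000, 5
A = rng.normal(size=(n_obs, n_unknowns))
x_true = rng.normal(size=n_unknowns)
err = rng.uniform(0.5, 2.0, size=n_obs)
data = A @ x_true + err * rng.normal(size=n_obs)

# Reference: weighted least squares on all Nobs observations
x_full, *_ = np.linalg.lstsq(A / err[:, None], data / err, rcond=None)

filtered, s, Vt = filter_patch(A, data, err)
x_filtered = solve_from_filtered(filtered, s, Vt)
print(n_obs, "observations ->", filtered.size, "filtered values; same fit:",
      np.allclose(x_full, x_filtered))
```

The filter $U^T$ depends only on the model matrix and the errors, not on the measured values. It could therefore be computed once and applied wherever the data live. The residual in the discarded directions is lost, so the value of the objective cannot be rebuilt from the filtered data alone. Its minimizer, however, is unchanged.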
Data reduction is done in a natural distributed fashion, NOT on the HPC machine, as distributed computing is most cost effective if the calculations are essentially independent
• Filter functions must be transmitted from the HPC machine

Distributed Filtering
$$N_{obs}^{\text{local patch}} \gg N_{filtered}^{\text{local patch}} \approx \text{Number\_of\_Unknowns}^{\text{local patch}}$$
In the simplest approach, the filtered data are obtained by linear transformations of the original data based on the Singular Value Decomposition of the least squares matrix
[Figure: geographically distributed sensor patches (patch 1, patch 2, …) on a distributed machine, each reducing $N_{obs}$ to $N_{filtered}$; the HPC machine sends the needed filter and receives the filtered data; the matrix is factorized into a product of local patches]

Non-Traditional Applications: Critical Infrastructure Simulations
These include electrical/gas/water grids and the Internet, transportation, and cell/wired phone dynamics.
One has some “classic SPICE-style” network simulations in areas like the power grid (although load and infrastructure data are incomplete)
• 6000 to 17000 generators
• 50000 to 140000 transmission lines
• 40000 to 100000 substations
Need algorithms both for simulating infrastructures and for linking them

Non-Traditional Applications: Critical Infrastructure Simulations
Activity data for people/institutions are essential for detailed dynamics, but again these are not “classic” data and need to be “fitted” in data assimilation style in terms of some assumed lower level model.
• They tell you the goals of people but not their low-level movement
Disease and Internet virus spread and social network simulations can be built on dynamics coming from infrastructure simulations
• Many results, like the “small world” Internet connection structure, are qualitative, and it is unclear whether they can be extended to detailed simulations
• A lot of interest in (regulatory) networks in Biology

(Non) Traditional Structure
1) Traditional: known equations plus boundary values
2) Data assimilation: somewhat uncertain initial conditions and approximations corrected by data assimilation
3) Data deluged Science: phenomenological degrees of freedom swimming in a sea of data
[Figure: known equations on agreed DoF plus known data give a prediction, contrasted with phenomenological degrees of freedom swimming in a sea of data]

Some Questions for Non-Traditional Applications
There has been no systematic study of how best to represent data deluged sciences without known equations
Obviously data assimilation is very relevant
Role of Cellular Automata (CA) and refinements like A New Kind of Science by Wolfram
• Can CA or the Potts model parameterize any system?
Relationship to back propagation and other neural network representations
Relationship to “just” interpolating data and then extrapolating a little
Role of Uncertainty Analysis: everything (equations, model, data) is uncertain!
Relationship of data mining and simulation
A new trade-off: how to split funds between sensors and simulation engines

When is a High Performance Computer?
We might wish to consider three classes of multi-node computers:
1) Classic MPP with microsecond latency and scalable internode bandwidth (tcomm/tcalc ~ 10 or so)
2) Classic Cluster, which can vary from configurations like 1) to 3) but typically has millisecond latency and modest bandwidth
3) Classic Grid or distributed systems of computers around the network
• Latencies of inter-node communication are 100’s of milliseconds, but bandwidth can be good
All have the same peak CPU performance, but synchronization costs increase as one goes from 1) to 3)
The cost of the system (dollars per gigaflop) decreases by a factor of 2 at each step from 1) to 2) to 3)
One should NOT use a classic MPP if class 2) or 3) suffices, unless some security or data issue dominates over cost-performance
One should not use a Grid as a true parallel computer; it can link parallel computers together for convenient access etc.

Building PSE’s with the Rule of the Millisecond I
Typical Web Services are used in situations with interaction delays (network transit) of 100’s of milliseconds
But the basic message-based interaction architecture only incurs a fraction of a millisecond of delay
Thus use Web Services to build ALL PSE components
• Use messages and NOT method/subroutine calls or RPC
[Figure: Nugget1 to Nugget4 and Data linked by message-based interactions]

Building PSE’s with the Rule of the Millisecond II
Messaging has several advantages over scripting languages
• Collaboration is trivial, by sharing messages
• Software engineering benefits from greater modularity
• Web Services do/will have wonderful support
“Loose” application coupling uses workflow technologies
Find the characteristic interaction time (milliseconds for programs; microseconds for MPI and particle codes) and use the best supported architecture at this level
• Two levels: Web Service (Grid) and C/C++/C#/Fortran/Java/Python
The major difficulty in frameworks is NOT building them but rather supporting them
• IMHO the only hope is to always minimize life-cycle support risks
• Simulation/science is too small a field to support much!
Expect to use DIFFERENT technologies at each level, even though it is possible to do everything with one technology
• Trade off support versus performance/customization

Requirements for MPI Messaging
[Figure: timeline of alternating tcalc and tcomm phases]
MPI and SOAP messaging both send data from a source to a destination
• MPI supports multicast (broadcast) communication
• MPI specifies the destination and a context (in the comm parameter)
• MPI specifies the data to send
• MPI has a tag to allow flexibility in processing in the destination processor
• MPI has calls to understand the context (number of processors etc.)
MPI requires very low latency and high bandwidth so that tcomm/tcalc is at most 10
• BlueGene/L has bandwidth between 0.25 and 3 Gigabytes/sec/node and a latency of about 5 microseconds
• The required latency is set by the condition Message Size/Bandwidth > Latency, i.e. transmission time should exceed latency

Requirements for SOAP Messaging
Web Services have many of the same requirements as MPI, with two differences where MPI is more stringent than SOAP
• Latencies are inevitably 1 (local) to 100 milliseconds, which is 200 to 20,000 times that of BlueGene/L
1) 0.000001 ms: CPU does a calculation
2) 0.001 to 0.01 ms: MPI latency
3) 1 to 10 ms: wake up a thread or process
4) 10 to 1000 ms: Internet delay
• Bandwidths for many business applications are low, as one just needs to send enough information for the ATM and bank to define transactions
SOAP has MUCH greater flexibility in areas like security, fault tolerance and “virtualizing addressing” because one can run a lot of software in 100 milliseconds
• It typically takes 1-3 milliseconds to gobble up a modest message in Java and “add value”

Structure of SOAP
SOAP defines a very obvious message structure with a header and a body, just like email
The header contains information used by the “Internet operating system”
• Destination, Source, Routing, Context, Sequence Number …
The message body is partly further information used by the operating system and partly information for the application, which is not looked at by the “operating system” except to encrypt it, compress it, etc.
• Note WS-Security supports separate encryption for different parts of a document
Much discussion in the field revolves around what is referenced in the header
This structure makes it possible to define VERY sophisticated messaging

MPI and SOAP Integration
Note SOAP specifies the format and, through WSDL, the interfaces
MPI only specifies the interface, so interoperability between different MPIs requires additional work
• IMPI http://impi.nist.gov/IMPI/
Pervasive networks can support high bandwidth (Terabits/sec soon), but the latency issue is not resolvable in a general way
One can combine MPI interfaces with SOAP messaging, but I don’t think this has been done
Just as walking, cars, planes and phones coexist with different properties, so SOAP and MPI are both good and should be used where appropriate

NaradaBrokering http://www.naradabrokering.org
We have built a messaging system that is designed to support traditional Web Services but has an architecture that allows it to support high performance data transport as required for scientific applications
• We suggest using this system whenever your application can tolerate 1-10 millisecond latency in linking components
• Use MPI when you need much lower latency
Use the SOAP approach when MPI interfaces are required but latency is high
• As in linking two parallel applications at remote sites
Technically it forms an overlay network, supporting in software features often done at the IP level

[Figure: Mean transit delay for message samples in NaradaBrokering for different communication hops (hop-2, hop-3, hop-5, hop-7); transit delay (milliseconds) versus message payload size (bytes). Pentium-3, 1GHz, 256 MB RAM, 100 Mbps LAN, JRE 1.3, Linux]

[Figure: Standard deviation for message samples in NaradaBrokering for different communication hops (hop-2, hop-3, hop-5, hop-7), internal machines; standard deviation versus message payload size (bytes)]

Fast Web Service Communication I
• Internet messaging systems allow one to optimize message streams at the cost of “startup time”
• Web Services can deliver the fastest possible interconnections with or without reliable messaging
• Typical results from Grossman (UIC) comparing slow SOAP over TCP with binary and UDP transport (the latter gains a factor of 1000)

Record    | SOAP/XML (Pure SOAP)  | WS-DMX/ASCII (SOAP over UDP) | WS-DMX/Binary (Binary over UDP)
count     | MB     µ      σ/µ     | MB     µ      σ/µ            | MB     µ      σ/µ
10000     | 0.93   2.04   6.45%   | 0.5    1.47   0.61%          | 0.28   1.45   0.38%
50000     | 4.65   8.21   1.57%   | 2.4    1.79   0.50%          | 1.4    1.63   0.27%
150000    | 13.9   26.4   0.30%   | 7.2    2.09   0.62%          | 4.2    1.94   0.85%
375000    | 34.9   75.4   0.25%   | 18     3.08   0.29%          | 10.5   2.11   1.11%
1000000   | 93     278    0.11%   | 48     3.88   1.73%          | 28     3.32   0.25%
5000000   | 465    7020   2.23%   | 242    8.45   6.92%          | 140    5.60   8.12%
(µ = mean time; σ/µ = relative standard deviation)

Fast Web Service Communication II
• The mechanism only works for streams: sets of related messages
• The SOAP header in streams is constant except for the sequence number (Message ID), time-stamp …
• One needs two types of new Web Service Specification
• “WS-StreamNegotiation” to define how one can use WS-Policy to send messages at the start of a stream to define the methodology for treating the remaining messages in the stream
• “WS-FlexibleRepresentation” to define new encodings of messages

Fast Web Service Communication III
• Then use “WS-StreamNegotiation” to negotiate the stream in Tortoise SOAP: ASCII XML over HTTP and TCP
– Deposit the basic SOAP header through the connection; it is part of the context for the stream (linking of 2 services)
– Agree on firewall penetration, reliability mechanism, binary representation and fast transport protocol
– Naturally transport UDP plus WS-RM
• Use “WS-FlexibleRepresentation” to define the encoding of a fast transport (on a different port), with messages just having a “FlexibleRepresentationContextToken”, Sequence Number and Time stamp if needed
– RTP packets have essentially this structure
– Could add stream termination status
• Can monitor and control with the original negotiation stream
• Can generate different streams optimized for different end-points

Role of Workflow
[Figure: Service-1, Service-2 and Service-3 linked by workflow]
Programming SOAP and Web Services (the Grid): workflow describes the linkage between services
As the services are distributed, linkage must be by messages
Linkage is two-way and has both control and data
Apply to multi-disciplinary, multi-scale and multi-program linkage: link visualization to simulation, GIS to simulations, and visualization filters to each other
The Microsoft-IBM specification BPEL is the current preferred Web Service XML specification of workflow

Example workflow
Here a sensor feeds a datamining application (we are extending datamining in DoD applications with Grossman from UIC)
The data-mining application drives a visualization

Example Flood Simulation workflow
[Figure: data archives, runoff models and flow models linked by SOAP messages and events; GIS Grid Services link distributed data and applications]

SERVOGrid Codes, Relationships
[Figure: Elastic Dislocation Inversion, Viscoelastic FEM, Viscoelastic Layered BEM, Elastic Dislocation, Pattern Recognizers, Fault Model BEM]
This linkage is called Workflow in Grid/Web Service parlance

Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
– C++, Java or Fortran Monte Carlo module
– Data streaming from a sensor or satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users’ files and databases
[Figure: a Service exchanging Data]
• The Grid is built by coordinating such services, assuming we have solved the problem of programming the service

Two-level Programming II
The Grid is discussing the composition of distributed services with the runtime interfaces to the Grid, as opposed to UNIX pipes/data streams
[Figure: Service1 to Service4 composed into an application]
This is familiar from the use of the UNIX Shell, PERL or Python scripts to produce real applications from core programs
Such interpretative environments are the single-processor analog of Grid Programming
Some projects, like GrADS from Rice University, are looking at integration between the service and composition levels, but the dominant effort looks at each level separately

3 Layer Programming Model
[Figure: Level 1 programming: Application (MPI, Fortran, C++ etc.). Level 2 “programming”: Application Semantics (Metadata, Ontology), Semantic Web. Level 3 programming: Workflow (BPEL) over the Basic Web Service Infrastructure linking Web Services 1 to 4]
Workflow will be built on top of NaradaBrokering as the messaging layer

Conclusions
Grids are inevitable and pervasive
We can expect Web Services and Grids to merge, with a common set of general principles but different implementations with different scaling and functionality trade-offs
We will be flooded with data, information and purported knowledge
Develop algorithms that exploit and support the data deluge
Software infrastructure for building tools is getting much better
Use MPI where it’s appropriate
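The latency and bandwidth figures quoted for BlueGene/L and SOAP can be combined into a rough guide for where MPI is appropriate. The sketch below is illustrative only. It assumes the usual linear cost model, time = latency + message size / bandwidth, which the slides do not state explicitly. The constant and function names are hypothetical.

```python
# Linear message-cost model, t = latency + size / bandwidth (an assumption;
# only the latency and bandwidth figures below come from the slides).
BGL_LATENCY_S = 5e-6                  # BlueGene/L: about 5 microseconds
BGL_BANDWIDTH_BPS = (0.25e9, 3e9)     # BlueGene/L: 0.25 to 3 gigabytes/sec/node
SOAP_LATENCY_S = (1e-3, 100e-3)       # SOAP: 1 (local) to 100 milliseconds


def transfer_time(n_bytes, latency_s, bandwidth_bps):
    """Seconds to deliver one message of n_bytes under the linear model."""
    return latency_s + n_bytes / bandwidth_bps


def crossover_bytes(latency_s, bandwidth_bps):
    """Message size at which Message Size / Bandwidth equals Latency;
    larger messages are bandwidth-dominated, smaller ones latency-dominated."""
    return latency_s * bandwidth_bps


for bw in BGL_BANDWIDTH_BPS:
    print(f"BlueGene/L at {bw / 1e9:g} GB/s: latency dominates below "
          f"{crossover_bytes(BGL_LATENCY_S, bw):,.0f} bytes")

for lat in SOAP_LATENCY_S:
    print(f"SOAP latency of {lat * 1e3:g} ms is "
          f"{lat / BGL_LATENCY_S:,.0f}x the BlueGene/L latency")
```

The script reproduces the quoted factor of 200 to 20,000 between SOAP and BlueGene/L latency. It also shows that on BlueGene/L, messages above roughly 1.25 to 15 kilobytes already satisfy Message Size/Bandwidth > Latency.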