Grid Challenges
It’s the vision, stupid
…but it NEEDS TO be followed by operational standards based on real applications…
The Global Grid Forum, 25 June 2003
Gordon Bell, Microsoft Corporation

A quick look at some past visions and a challenge
• NREN >> Internet
• WWW
• Challenge: Will match any Grid-enabled application that wins a Gordon Bell Prize for parallelism

FCCSET NREN Plan 11/1987
[Chart: bandwidth (log scale, 10K to 10G) versus year, 1988 to 2000. Labels: Phase 1, Phase 2, Optical; 56K, 1.5 M, 45 M, 3G.]
a factor of 1000 makes a difference

U.S. Interstate Comm. traffic, L Roberts ’92
ARPAnet Goals c1972 = Grid Goals
[Chart: Originating Bandwidth (Gb/s), log scale from 1 to 10,000, versus year, 1990 to 2020. Labels: Video Conf., Voice, Video on Demand, NSF bb, FAX, Email, Broadcast TV.]

Growth in hype vs reality
[Chart. Labels: WWW books, Infoway newspapers; regulation; Infoway speculation “how great it’ll be” (politicians, telecoms & futurists); Infoway addiction conferences; lawsuits.]
c 1995. Data from Gordon’s WAG

Articles per newspaper versus orders per second sent via Internet
[Chart. Series: orders per second; articles per newspaper.]
c 1995. Data from Gordon’s WAG

Articles about security, privacy, & fraud versus commerce ($M)
[Chart. Labels: actual commerce; articles about risk and NOT doing commerce; organized crime on Internet.]
c 1995. Data from Gordon’s WAG

The virtuous cycle of bandwidth supply and demand
[Cycle diagram. Stages: Increased Demand; Increase Capacity (circuits & bw); Lower response time; Create new service. Other labels: Standards; IP; Telnet & FTP; EMAIL; WWW; Audio; Video; Voice!; Grids; Video Conf.; FTP; Web Svcs.]

For More Information
• Grid Book (c1998 from 1996; 651 pp., 22 chapters, 41 authors) - www.mkp.com/grids
• The Globus Project™ - www.globus.org
• OGSA - www.globus.org/ogsa
• Global Grid Forum - www.gridforum.org
• Grid Computing (2003; 1080 pages, 43 chapters, O(100) authors)
From Carl Kesselman, ISI

Progress...a review
• Grid started out with great promise…c1998
• Interesting use at NASA for coupled programs
• NMI (NSF Middleware Initiative) …State_Tools.gov, funded by NSF.gov
• clearly open, clearly not “free”
• not IETF model
• Tools vs. standards & evolving working code
• Some examples:
  - c1980: Seti@home, folding@home, >> Napster p2p
  - 2001: 15 TB TerraServer > TerraService w/Web Services
  - 2003: Alex Szalay & Jim Gray: SkyServer/SkyService
  - Cornell Theory Center: Web Services based apps
  - NEES: good poster child. An XML task
  - GrADS and TeraGrid… dream or research or just $$s?

TerraServer c1998: The “Whole Earth” Database

TerraServer Experience c2001
Successful Web Site
To the rescue!
• 50,000 daily users satisfied with “human-accessible” data
• 59 GB imagery transmitted daily
New Feature Requests
• Programmable access to meta-data
• User selectable image sizes, i.e. “a map server”
• Permission to use TerraServer data within server applications

.NET TerraService Architecture
[Architecture diagram. Labels: Standard Browsers; Smart Clients (Windows Forms, .NET Framework); Map UI (Web Forms); Map Server; Http Handler; TerraServer Web Service; ADO.NET; OLEDB; Existing DB Server with three SQL 2000 databases (1.0 TB Db each); 668 m Rows.]

Data Intensive Science: the Next Frontier
The W.M. Keck Fellows in Advanced Scientific Data Analysis
Alex Szalay
The Johns Hopkins University, Department of Physics and Astronomy

National Virtual Observatory
• NSF ITR project, “Building the Framework for the National Virtual Observatory”, is a collaboration of 17 funded and 3 unfunded organizations
  - Astronomy data centers
  - National observatories
  - Supercomputer centers
  - University departments
  - Computer science/information technology specialists
• PI and project director: Alex Szalay (JHU)
• CoPI: Roy Williams (Caltech/CACR)

Scientific Data Exploration
1. A thousand years ago: science was empirical, describing natural phenomena
2. Last few hundred years: a theoretical branch, using models and generalizations
3. Last few decades: a computational branch, simulating complex phenomena
4. Today: data exploration is emerging, synthesizing theory, experiment and computation with advanced data management and statistics

Living in an Exponential World
• Astronomers have a few hundred TB now
  - 1 pixel (byte) / sq arc second ~ 4TB
  - Multi-spectral, temporal, … → 1PB
• They mine it looking for
  - new (kinds of) objects, or more of the interesting ones (quasars)
  - density variations in 400-D space
  - correlations in 400-D space
• Data doubles every year
• Data is public after 1 year
• So, 50% of the data is public
• Same trend appears in all sciences
[Chart: log scale from 0.1 to 1000 versus year, 1970 to 2000. Series: CCDs, Glass.]

Why Is Astronomy Special?
• It has no commercial value
  - No privacy concerns, freely share results with others
  - Great for experimenting with algorithms
• It is real and well documented
  - High-dimensional (with confidence intervals)
  - Spatial, temporal
• Diverse and distributed
  - Many different instruments from many different places and many different times
• The questions are interesting
• There is a lot of it (soon petabytes)
• GB: It is not over-funded, aka it’s poor
[Sky images. Labels: ROSAT ~keV; DSS Optical; 2MASS 2μ; IRAS 25μ; IRAS 100μ; GB 6cm; NVSS 20cm; WENSS 92cm.]

Making Discoveries
• When and where are discoveries made?
  - Always at the edges and boundaries
  - Going deeper, collecting more data, using more colors…
• Metcalfe’s law
  - Utility of computer networks grows as the number of possible connections: O(N^2)
• VO: Federation of N archives
  - Possibilities for new discoveries grow as O(N^2)
• Current sky surveys have proven this
  - Very early discoveries from SDSS, 2MASS, DPOSS

What can be learned from Sky Server?
• It’s about data, not about harvesting flops
• 1-2 hr. query programs versus 1 wk programs based on grep
• 10 minute runs versus 3 day compute & searches
• Database viewpoint. 100x speed-ups
• Avoid costly re-computation and searches
• Use indices and PARALLEL I/O. Read / Write >>1.
• Parallelism is automatic, transparent, and just depends on the number of computers/disks.
• Limited experience and talent to use databases.

Soon: The Virtual Observatory
• Many new surveys are coming
  - SDSS is a dry run for the next ones
  - LSST will be 5TB/night
• All the data will be on the Internet
  - ftp, web services…
• Data and applications will be associated with the instruments
  - Distributed world wide, cross-indexed
  - Federation is a must
• Will be the best telescope in the world
  - World Wide Telescope
• Finds the “needle in the haystack”
• Successful demonstrations in Jan ’03

Emerging Concepts
• Standardizing distributed data access
  - Web Services, supported on all platforms
  - XML: Extensible Markup Language
  - SOAP: Simple Object Access Protocol
  - WSDL: Web Services Description Language
• Standardizing distributed computing
  - Grid Services
  - Custom configure remote computing dynamically
  - Build your own remote computer, and discard
  - Virtual Data: new data sets on demand
• Both needed for Data Exploration

Computational Science Simulations based on Web Services
Gerd Heber, Cornell Theory Center
heber@tc.cornell.edu
International Conference on Computational Science 2003

Three Flavors of Adaptivity
• Application-level
  - Mathematical model
  - High/low confidence
• Algorithm-level
  - Discretization method
  - Solution technique
• System-level
  - Resource availability
  - Fault tolerance

The Problem
• Do distributed, coupled and adaptive multi-physics simulations of
  - Mechanics of chemically-reacting flows
  - (Damage) Thermo-Mechanics of solids
• Components provided as Web Services

Geography
• Cornell University
  - Theory Center
  - Department of Computer Science
  - Department of Civil Engineering
• University of Alabama
• Mississippi State University
• College of William and Mary

Workflow
[Workflow diagram.]

Components
• MiniCAD
• Meshers
  - Surface (Delaunay, quality guarantees)
  - Volume (Dmesh, Jmesh, Gmesh)
• Fluid/Thermal simulation (Loci, CHEM)
• Thermo-mechanical component (CPTC)
• Fracture mechanics
• Visualization (OpenDX + SQL Server)

Web Services
• “Web Services are self-contained, modular applications that can be described, published, located, and invoked over a network, …” (IBM)
• Service oriented architecture: Publish, find, bind
• XML, SOAP, UDDI, WSDL

Features and Requirements
• Distributed expertise
• Platform and language neutrality
• No porting
• Network accessibility (“firewall compliant”)
• Security
• Industry standards
• Metadata
• State
• Students shouldn’t waste too much time with coding!

GrADS Vision
• Build a National Problem-Solving System on the Grid
  - Transparent to the user, who sees a problem-solving system
• Software Support for Application Development on Grids
  - Goal: Design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment
• Challenges:
  - Presenting a high-level application development interface*
    If programming is hard, the Grid will not reach its potential
  - Designing and constructing applications for adaptability
  - Late mapping of applications to Grid resources
  - Monitoring and control of performance
    When should the application be interrupted and remapped?
*GB note: This is a superset of the previously unsolved clusters programming problem!

GrADSoft Architecture
[Architecture diagram. Labels: Source Application; Software Components; Libraries; Whole-Program Compiler; Configurable Object Program; Resource Negotiator; Negotiation; Scheduler; Binder; Grid Runtime System; Real-time Performance Monitor; Performance Problem; Performance Feedback.]

Network for Earthquake Eng. Simulation
• NEESgrid: US national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
• On-demand access to experiments, data streams, computing, archives, collaboration
• NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
Carl Kesselman, ISI. From www.neesgrid.org

A Universal Architecture for Web Services… Microsoft Vision
• “Scales Up” on large systems
• “Scales In” on a machine
• “Scales Down” to devices
• “Scales Away” spans organizations & geographies
• “Scales Out” by adding machines
[Diagram labels: Security; Reliable Messaging; Transactions; Routing; …; Messaging Infrastructure; Distributed applications; Vertical processes; Embedded systems; Network equipment; …]

Web Services: Level I
Foundation to Build Upon
• Basic profile
• Defined by WS-I
• XML, SOAP, WSDL, UDDI
• Broad vendor support
• WS-I assures widespread compatibility

Level II
Secure, Reliable, Transacted Connected Applications
[Diagram labels: Secure; Reliable Messaging; XML; Transports; …; Transacted; Metadata; Management; Business Process.]

Level III
From Infrastructure to Solutions
• Application schemas
• Domain specific profiles
• Vertical industry services

Vision: Community/Data-Centric Computing Versus Machine-Centered Centers
• Goal: Enable technical communities to create and take responsibility for their own computing environments of personal, data, and program collaboration and distribution
• Design based on technology and cost, e.g. networking, apps programs maintenance, databases, and providing 24x7 web and other services
• Many alternative styles and locations are possible
  - Service from existing centers, including many state centers
  - Software vendors could be encouraged to supply apps web services
  - NCAR style center based on shared data and apps
  - Instrument- and model-based databases. Both central & distributed when multiple viewpoints create the whole.
  - Wholly distributed services supplied by many individual groups

Community/Data Centric: “web service”
• Community is responsible
  - Planned & budgeted as resources
  - Responsible for its infrastructure
  - Apps are from community
  - Computing is integral to work
• In sync with technologies
  - 1-3 Tflops/$M; 1-3 PBytes/$M to buy smallish Tflops & PBytes.
• New scalables are “centers”
  - Community can afford and evolve
  - Dedicated to a community
  - Program, data & database centric
  - May be aligned with instruments or other community activities
• Output = web service
• Can communities form that can supply services?

Commitment to standards
• A general architecture comes largely from understanding the problems
• Understanding the problems comes from actually solving such problems
• This is bottom-up, based on experience
• Microsoft is committed to developing community-wide web services standards…
• Is the Grid Forum equally committed?

The End
How can GRIDs become a real, useful computer structure?
• Get a life.
• Use the standards and tools.
• Adopt an application and/or community…now!
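
Backup: arithmetic behind two astronomy claims
A minimal sketch of the reasoning behind "data doubles every year, data is public after 1 year, so 50% of the data is public" and "possibilities for new discoveries grow as O(N^2)". It assumes an idealized archive whose total volume doubles exactly once per doubling period, and it counts federation opportunities simply as distinct pairs of archives, N(N-1)/2. The function names are illustrative and do not come from the talk.

```python
def public_fraction(doubling_years: float = 1.0, embargo_years: float = 1.0) -> float:
    """Fraction of an exponentially growing archive that is past its embargo.

    Assumption: total volume D(t) = D0 * 2**(t / doubling_years), so the data
    that already existed embargo_years ago is D(t) * 2**(-embargo_years / doubling_years).
    """
    return 2.0 ** (-embargo_years / doubling_years)


def archive_pairs(n: int) -> int:
    """Distinct pairs among n federated archives: n(n-1)/2, which grows as O(N^2)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Yearly doubling with a one-year embargo
    print(f"public fraction: {public_fraction():.0%}")
    # Pairings grow much faster than the number of archives
    for n in (2, 5, 10, 20):
        print(f"{n:2d} archives -> {archive_pairs(n):3d} possible pairings")
```

Under these assumptions the public share is one half whatever the archive's age, and a slower doubling rate with the same embargo raises it.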