Welcome to CW 2008!!!
www.cs.wisc.edu/~miron

The Condor Project (Established ‘85)
Distributed Computing research performed by a team of ~35 faculty, full-time staff and students who
› face software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment,
› are involved in national and international collaborations,
› interact with users in academia and industry,
› maintain and support a distributed production environment (more than 4000 CPUs at UW), and
› educate and train students.

“… Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing subsystems can be integrated into multicomputer ‘communities’. …”
Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems”, Ph.D. thesis, July 1983.

Main Threads of Activities
› Distributed Computing Research – develop and evaluate new concepts, frameworks and technologies
› Keep Condor “flight worthy” and support our users
› The Open Science Grid (OSG) – build and operate a national High Throughput Computing infrastructure
› The Grid Laboratory Of Wisconsin (GLOW) – build, maintain and operate a distributed computing and storage infrastructure on the UW campus
› The NSF Middleware Initiative – develop, build and operate a national Build and Test facility powered by Metronome

Future of Grid Computing
Miron Livny
Computer Sciences Department
University of Wisconsin-Madison
miron@cs.wisc.edu

The Talmud says in the name of Rabbi Yochanan, “Since the destruction of the Temple, prophecy has been taken from prophets and given to fools and children.” (Baba Batra 12b)

The Grid Computing Movement
I believe that, as a movement, grid computing has run its course.
› No more an easy source of funding
› No more an easy way to get the “troops” mobilized
› No more an easy sell of software tools
› No more an easy way to get your papers published or your press releases posted

Introduction
“The term ‘the Grid’ was coined in the mid 1990s to denote a proposed distributed computing infrastructure for advanced science and engineering [27]. Considerable progress has since been made on the construction of such an infrastructure (e.g., [10, 14, 36, 47]) but the term ‘Grid’ has also been conflated, at least in popular perception, to embrace everything from advanced networking to artificial intelligence. One might wonder if the term has any real substance and meaning. Is there really a distinct ‘Grid problem’ and hence a need for new ‘Grid technologies’? If so, what is the nature of these technologies and what is their domain of applicability? While numerous groups have interest in Grid concepts and share, to a significant extent, a common vision of Grid architecture, we do not see consensus on the answers to these questions.”
“The Anatomy of the Grid - Enabling Scalable Virtual Organizations”, Ian Foster, Carl Kesselman and Steven Tuecke, 2001.

Distributed Computing
› Distributed computing is here to stay and will continue to evolve as processing, storage and communication resources get more powerful and cheaper
› Big science is inherently distributed
› Most scientific disciplines (and many commercial sectors) depend on High Throughput Computing (HTC) capabilities

Keynote 3: When All Computing Becomes Grid Computing
Speaker: Prof. Daniel A. Reed
Chancellor’s Eminent Professor
Director, Renaissance Computing Institute
University of North Carolina at Chapel Hill

Abstract: Scientific computing is moving rapidly from a world of “reliable, secure parallel systems” to a world of distributed software, virtual organizations and high-performance, though unreliable parallel and distributed systems with few guarantees of availability and quality of service. In addition, a tsunami of new experimental and computational data poses equally vexing problems in analysis, transport, visualization and collaboration. This transformation poses daunting scaling and reliability challenges and necessitates new approaches to collaboration, software development, performance measurement, system reliability and coordination. This talk describes Renaissance approaches to solving some of today’s most challenging scientific and societal problems using Grids and parallel systems, supported by rich tools for performance analysis, reliability assessment and workflow management.

As we return to the fundamentals and stay away from hype and the technologies of the moment, we will advance the state of the art in distributed computing.

Our HTC Community is Stronger than Ever

Downloads per month

Fractions per month

Language Weaver

Executive Summary
• Incorporated in 2002
  – USC/ISI startup that commercializes statistical-based machine translation software
• Continuously improved language pair offering in terms of language-pair coverage and translation quality
  – More than 50 language pairs
  – Center of excellence in Statistical Machine Translation and Natural Language Processing

IT Needs
• The Language Weaver Machine Translation systems are trained automatically on large amounts of parallel data.
• Training/learning processes implement workflows with hundreds of steps, which use thousands of CPU hours and generate hundreds of gigabytes of data.
• Robust, fast workflows are essential for rapid experimentation cycles.

Solution: Condor
• Condor-specific workflows adequately manage thousands of atomic computational steps per day.
• Advantages:
  – Robustness: good recovery from failures
  – Well-balanced utilization of existing IT infrastructure

The Road Ahead
› Green Computing
› Computing in the Clouds
› “Launch and Leave” Computing
› Turn-on of the LHC
› Broader and larger community of contributors
› More and bigger campus grids
› Fetching work from “other” sources
› Multi-Core nodes
› Low latency and short jobs
› Staging data through Storage Elements

Thank you for building such a wonderful community