OSG Integration - Computer Sciences Dept.

Welcome to Condor Week 2008!
The Condor Project (Established '85)
Distributed Computing research performed by a team of ~35 faculty, full-time staff and students who
› face software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment,
› are involved in national and international collaborations,
› interact with users in academia and industry,
› maintain and support a distributed production environment (more than 4000 CPUs at UW),
› and educate and train students.
“ … Since the early days of mankind the
primary motivation for the establishment of
communities has been the idea that by being
part of an organized group the capabilities
of an individual are improved. The great
progress in the area of inter-computer
communication led to the development of
means by which stand-alone processing subsystems can be integrated into multicomputer ‘communities’. … “
Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems,” Ph.D. thesis, July 1983.
Main Threads of Activities
› Distributed Computing Research – develop and evaluate new concepts, frameworks and technologies
› Keep Condor “flight worthy” and support our users
› The Open Science Grid (OSG) – build and operate a national High Throughput Computing infrastructure
› The Grid Laboratory Of Wisconsin (GLOW) – build, maintain and operate a distributed computing and storage infrastructure on the UW campus
› The NSF Middleware Initiative – develop, build and operate a national Build and Test facility powered by Metronome
Future of Grid Computing
Miron Livny
Computer Sciences Department
University of Wisconsin-Madison
miron@cs.wisc.edu
The Talmud says in the name of Rabbi Yochanan,
“Since the destruction of the
Temple, prophecy has been
taken from prophets and
given to fools and children.”
(Baba Batra 12b)
The Grid Computing Movement
I believe that, as a movement, grid computing has run its course.
› No more an easy source of funding
› No more an easy way to get the “troops” mobilized
› No more an easy sell of software tools
› No more an easy way to get your papers published or your press releases posted
Introduction
“The term “the Grid” was coined in the mid 1990s to denote a proposed
distributed computing infrastructure for advanced science and
engineering [27]. Considerable progress has since been made on the
construction of such an infrastructure (e.g., [10, 14, 36, 47]) but the term
“Grid” has also been conflated, at least in popular perception, to embrace
everything from advanced networking to artificial intelligence. One might
wonder if the term has any real substance and meaning. Is there really a
distinct “Grid problem” and hence a need for new “Grid technologies”? If so,
what is the nature of these technologies and what is their domain of
applicability? While numerous groups have interest in Grid concepts and
share, to a significant extent, a common vision of Grid architecture, we do not
see consensus on the answers to these questions.”
“The Anatomy of the Grid – Enabling Scalable Virtual Organizations,” Ian Foster, Carl Kesselman and Steven Tuecke, 2001.
Distributed Computing
› Distributed computing is here to stay and will continue to evolve as processing, storage and communication resources get more powerful and cheaper
› Big science is inherently distributed
› Most scientific disciplines (and many commercial sectors) depend on High Throughput Computing (HTC) capabilities
Keynote 3: When All Computing Becomes Grid Computing
Speaker: Prof. Daniel A. Reed
Chancellor’s Eminent Professor
Director, Renaissance Computing Institute
University of North Carolina at Chapel Hill
Abstract:
Scientific computing is moving rapidly from a world of “reliable,
secure parallel systems” to a world of distributed software, virtual
organizations and high-performance, though unreliable parallel and
distributed systems with few guarantees of availability and quality of
service. In addition, a tsunami of new experimental and computational
data poses equally vexing problems in analysis, transport, visualization
and collaboration. This transformation poses daunting scaling and
reliability challenges and necessitates new approaches to
collaboration, software development, performance measurement,
system reliability and coordination. This talk describes Renaissance
approaches to solving some of today’s most challenging scientific and
societal problems using Grids and parallel systems, supported by rich
tools for performance analysis, reliability assessment and workflow
management.
As we return to the fundamentals and stay away from hype and the technologies of the moment, we will advance the state of the art in distributed computing
Our HTC Community is Stronger than Ever
[Chart: Downloads per month]
[Chart: Fractions per month]
Language Weaver Executive Summary
• Incorporated in 2002
– USC/ISI startup that commercializes statistical machine translation software
• Continuously improved language-pair offering in terms of coverage and translation quality
– More than 50 language pairs
– Center of excellence in Statistical Machine Translation and Natural Language Processing
IT Needs
• The Language Weaver Machine Translation systems are trained automatically on large amounts of parallel data.
• Training/learning processes implement workflows with hundreds of steps, which use thousands of CPU hours and generate hundreds of gigabytes of data.
• Robust/fast workflows are essential for rapid experimentation cycles.
Solution: Condor
• Condor-specific workflows adequately manage thousands of atomic computational steps per day (a minimal sketch of such a workflow follows below).
• Advantages:
– Robustness – good recovery from failures
– Well-balanced utilization of existing IT infrastructure
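The slides do not show Language Weaver's actual workflow definitions, so the following is only a minimal, illustrative sketch of how a multi-step training pipeline can be expressed for Condor's DAGMan; all node names and file names here are hypothetical.

  # train.dag -- illustrative DAGMan workflow; every name below is hypothetical
  # Each JOB line pairs a node name with the Condor submit description that runs it.
  JOB align    align.sub
  JOB extract  extract.sub
  JOB tune     tune.sub
  JOB evaluate evaluate.sub
  # PARENT/CHILD lines impose the ordering of the pipeline steps.
  PARENT align   CHILD extract
  PARENT extract CHILD tune
  PARENT tune    CHILD evaluate
  # RETRY lets DAGMan re-run a failed node a bounded number of times,
  # one source of the "good recovery from failures" noted above.
  RETRY align 2
  RETRY extract 2

Such a DAG is submitted with condor_submit_dag train.dag; if a node still fails after its retries, DAGMan writes a rescue DAG so the workflow can be resumed from the point of failure rather than restarted from scratch.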
The Road Ahead
› Green Computing
› Computing in the Clouds
› “Launch and Leave” Computing
› Turn-on of the LHC
› Broader and larger community of contributors
› More and bigger campus grids
› Fetching work from “other” sources
› Multi-Core nodes
› Low latency and short jobs
› Staging data through Storage Elements
Thank you for building such
a wonderful community