CRA: No Challenge, decade outlook. Industry is on a fast

NSF Visit
Gordon Bell
www.research.microsoft.com/~gbell
Microsoft Research
4 October 2002
Topics
• How much have things changed since CISE was
formed in 1986, and how much remains the same?
• 10 year base case @CRA’s Grand Challenges?
http://www.google.com/search?sourceid=navclient&q=cra+grand+challenges
• GB MyLifeBits: storing one’s entire life for recall,
home media, etc.
• Clusters, Grids, and Centers…challenge is apps
• Supercomputing directions
Messages…
• The Grand Challenge for CISE is to work on applications in
science, engineering, and bio/medicine/health care (e.g.
NIH).
• Databases versus grepping. Revolution needed.
Performance from software >= Moore’s Law
• Big challenge moving forward will come from trying to
manage and exploit all the storage.
• Supercomputing: Cray. Gresham's Law
• Build on industry standards and efforts.
Grid and “web services” must co-operate.
• Whatever happened to the first Grand Challenges?
• Minimize grant overhead… site visits.
IBM Sets Up Biotech Research
Center
U.S.-based IBM recently set up a biotechnology research
and development center in Taiwan -- IBM Life Sciences
Center of Excellence -- the company's first in the Asia
Pacific region… the center will provide computation
solutions and services from an integrated bio-information
database linked to resources around the world. Local
research institutes working in cooperation with the center
include Academia Sinica, the Institute for Information
Industry and National Yang Ming University.
From HPCWire 30 September 2002
Retrospective: CISE formed in 1986
• CISE spent about $100 million on research in 1987
• Q: What areas of software research do you think will be the most
vital in the next decade?
• A: Methods to design and build large programs and data bases in a
distributed environment are central.
• Q: What software research areas are funded?
• A: We fund what the community considers to be important … object-oriented languages, data bases, & human interfaces; semantics;
formal methods of design and construction; connectionism; and data
and knowledge bases, including concurrency. We aren’t funding
applications.
Software Productivity c1986
• I believe the big gains in software will come about by eliminating the
old style of programming, by moving to a new paradigm, rather than
magic tools or techniques to make the programming process better.
Visicalc and Lotus 1-2-3 are good examples of a dramatic
improvement in programming productivity. In essence, programming is
eliminated and the work put in the hands of the users.
• These breakthroughs are unlikely to come from the software research
community, because they aren’t involved in real applications. Most
likely they will come from people trained in another discipline who
understand enough about software to be able to carry out the basic
work that ultimately is turned over to the software engineers to
maintain and evolve.
Software productivity c1986
• Q: The recent Software Engineering Conference featured a division
of opinion on mechanized programming. … developing a
programming system to write programs can automate much of the
mundane tasks…
• A: Mechanized programming is recreated and renamed every few
years. In the beginning, it meant a compiler. The last time it was
called automatic programming. A few years ago it was program
generators and the programmer’s work bench. The better it gets, the
more programming you do!
Parallelism c1986
• To show my commitment to parallel processing, for the
next 10 years I will offer two $1000 annual awards for
the best, operational scientific or engineering program
with the most speedup ...
• Q: What …do you expect from parallelism in the next
decade?
• A: Our goal is obtaining a factor of 100 … within the
decade and a factor of 10 within five years. 10 will be
easy because it is inherently in most applications right
now. The hardware will clearly be there if the software
can support it or the users can use it.
• Many researchers think this goal is aiming too low. They
think it should be a factor of 1 million within 15 years.
However, I am skeptical that anything more than our goal
will be
No challenge, next decade of systems.
Industry’s evolutionary path… ¿Que sera sera?
[Figure: Goodness (vertical axis) versus Time (horizontal axis, 2000 to 2012), with regions labeled “Grand Challengeland” and “Death and Doldrums”.]
Computing Research Association Grand Challenges
Gordon Bell
Microsoft Research
26 June 2002
In a decade, the evolution:
We can count on:
• Moore’s Law provides ≈50-100x performance, const. $
20% $ decrease/year => ½ per 5 years
• Terabyte personal stores => personal db managers
• Astronomical sized, by current standards, databases!
• Paper quality screens on watch, tablets… walls
• DSL wired, 3-4G/802.11j nets (>10 Mbps) access
• Network Services: Finally computers can use|access the
web. “It’s the Internet, Stupid.”
– Enabler of intra-, extra-, inter-net commerce
– Finally EDI/Exchanges/Markets
• Ubiquity rivaling the telephone.
– Challenge: An instrument to supplant the phone?
– Challenge: Affordability for everyone on planet <$1500/year
• Personal authentication to access anything of value
• Murphy’s Law continues with larger and more complex
systems, requiring better fundamental understanding. An
opportunity and need for “Autonomic Computing”
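The compounding behind the Moore's Law bullet above can be checked with a few lines of arithmetic. This is a minimal sketch, not the author's calculation: the function names and the 18- and 21-month doubling periods are assumptions chosen only because they bracket the slide's ≈50-100x range. Taken literally, a 20% yearly decrease compounds to roughly one third of the price after five years, so the "½ per 5 years" figure reads as a rounded rule of thumb.

```python
def compound(annual_change: float, years: float) -> float:
    """Multiplier after applying a fractional yearly change for `years` years."""
    return (1.0 + annual_change) ** years


def growth_from_doubling(doubling_months: float, years: float) -> float:
    """Performance multiplier when capability doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Price at constant performance, falling 20% per year (figure from the slide).
price_after_5 = compound(-0.20, 5)
price_after_10 = compound(-0.20, 10)

# ASSUMPTION: doubling periods of 18 and 21 months are illustrative only.
perf_18 = growth_from_doubling(18, 10)
perf_21 = growth_from_doubling(21, 10)

print(f"price after 5 years:  {price_after_5:.2f} of today's")
print(f"price after 10 years: {price_after_10:.2f} of today's")
print(f"10-year performance gain, 18-month doubling: {perf_18:.0f}x")
print(f"10-year performance gain, 21-month doubling: {perf_21:.0f}x")
```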
In a decade, the evolution:
We are likely to “have”
• 120M computers/yr. World population >1B.
– increasing with decreasing price. 2x / -50%
– X% are discarded. Result is 1 Billion.
• Smaller personals w/phones… video @PDA $
• Almost adequate speech communication for
commands, limited dictation, note taking,
segmenting/indexing video
• Vision capable of tracking each individual in a
relatively large crowd.
With identity, everybody’s location is known,
everywhere, anytime.
Inevitable wireless nets… body,
home, …x-area nets will create
new opportunities
• Need to construct these environments of platforms,
networking protocols, and programming environments
for each kind
• Each net has to research its own sensor/effector
structure as f(application) e.g. body, outdoor, building,
• Taxonomy includes these alternative dimensions:
– Network function
– master|slave vs. distributed… currently peripheral nets
– permanent|dynamic
– indoor|outdoor;
– size and spatial diameter;
– bandwidth and performance;
– sensor/effector types;
– security and noise immunity;
New environments can support a wide
range of new apps
• Continued evolution of personal monitoring and
assistance for health and personal care of all ages
• Personal platforms that provide “total recall” that
will assist (25% of population) solving problems
• Platforms for changing education will be available.
Limiters: Authoring tools & standards; content
• Transforming the scientific infrastructure is needed!
– petabyte databases, petaflops performance
– shared data notebooks across instruments and labs
– new ways of performing experiments and
– new ways of programming/visualizing and storing data.
• Serendipity: Something really new, like we get
every decade but didn’t predict, will occur.
R & D Challenges
• Engineering, evolutionary construction, and non-trivial
maintenance of billions of node, fractal nets ranging from
the space, continent, campus, local, … to in-body nets
• Increasing information flows & vast sea of data
– Large disks everywhere!
personal to large servers across all apps
– Akin to the vast tape libraries that are never read (bit rot)
• A modern, healthcare system that each of us would be
happy or unafraid of being admitted into.
Cf. islands (incompatible systems) of automation and
instruments floating on a sea of paper moved around by
people who maintain a bloated and inefficient “services”
industry/economy.
MyLifeBits, The Challenge of a
0.001-1 Petabyte lifetime PC
Cyberizing everything…
I’ve written, said, presented (incl. video),
photos of physical objects & a few things
I’ve read, heard, seen
and might “want to see” on TV
"The PC is going to be the place where you store the
information … really the center of control" Billg 1/7/2001
MyLifeBits is an “on-going” project following
CyberAll to “cyberize” all of personal bits!
►Memory recall of books, CDs,
communication, papers, photos, video
►Photos of physical object collections
►Elimination of all physical stores & objects
►Content source for home media: ambiance,
entertainment, communication, interaction
Freestyle for CDs, photos, TV content, videos
Goal: to understand the 1 TByte PC:
need, utility, cost, feasibility, challenge & tools.
Storing all we’ve read, heard, & seen
Human data-types          /hr      /day (/4yr)   /lifetime
read text, few pictures   200 K    2 -10 M/G     60-300 G
speech text @120wpm       43 K     0.5 M/G       15 G
speech @1KBps             3.6 M    40 M/G        1.2 T
stills w/voice @100KB     200 K    2 M/G         60 G
video-like 50Kb/s POTS    22 M     .25 G/T       25 T
video 200Kb/s VHS-lite    90 M     1 G/T         100 T
video 4.3Mb/s HDTV/DVD    1.8 G    20 G/T        1 P
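The per-hour column of the table above follows directly from the stated data rates. The sketch below reproduces that arithmetic under two labeled assumptions that are not in the source: the function and variable names are made up for illustration, and transcribed speech is taken to cost about 6 bytes per word. The results land on the table's 43 K, 3.6 M, 22 M, and 90 M entries; the 4.3 Mb/s row computes to about 1.9 GB per hour, slightly above the 1.8 G shown.

```python
SECONDS_PER_HOUR = 3600


def bytes_per_hour(bits_per_second: float) -> float:
    """Storage consumed by one hour of a constant-bit-rate stream."""
    return bits_per_second * SECONDS_PER_HOUR / 8


# Rates taken from the table; 1 KBps of speech is 8,000 bits per second.
STREAMS = {
    "speech @1KBps": 8_000,
    "video-like 50Kb/s POTS": 50_000,
    "video 200Kb/s VHS-lite": 200_000,
    "video 4.3Mb/s HDTV/DVD": 4_300_000,
}

# ASSUMPTION (not in the source): about 6 bytes per word of transcribed speech.
BYTES_PER_WORD = 6
speech_text_per_hour = 120 * 60 * BYTES_PER_WORD  # 120 words per minute

per_hour = {name: bytes_per_hour(rate) for name, rate in STREAMS.items()}

print(f"speech text @120wpm: {speech_text_per_hour / 1e3:.0f} KB/hr")
for name, size in per_hour.items():
    print(f"{name}: {size / 1e6:,.1f} MB/hr")
```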
© 2002
Scenes from Media Center
A “killer app” for
Terabyte, Lifetime, PC?
MyLifeBits demonstrates need for lifetime memory!
► MODI (Microsoft Office Document Imaging)!
The most significant Office™ addition since HTML.
► Technology to support the vision:
1. Guarantee that data will live forever!
2. A single index that includes mail, conversations,
web accesses, and books!
3. E-book…e-magazines reach critical mass!
4. Telephony and audio capture are needed
5. Photo & video “index serving”
6. More meta-information … Office, photos
7. Lots of GUIs to improve ease-of-use
The Clusters – GRID Era
GS
CC
C 2002
Lyon, France September 2002
Copyright Gordon Bell
Clusters & Grids
Same observations as 2000

• GRID was/is an exciting concept …
– They can/must work within a community, organization, or project. Apps need to drive.
– “Necessity is the mother of invention.”
• Taxonomy… interesting vs necessity (Web SVCs)
– Cycle scavenging and object evaluation (e.g. seti@home, QCD)
– File distribution/sharing for IP theft e.g. Napster
– Databases &/or programs for a community (astronomy, bioinformatics, CERN, NCAR)
Grid nj. An arbitrary distributed,
cluster platform
• A geographical and multi-organizational collection of diverse computers dynamically configured as cluster platforms responding to arbitrary, ill-defined jobs “thrown” at it.
• Costs are not necessarily favorable e.g. disks are less expensive than cost to transfer data.
• Latency and bandwidth are non-deterministic, thereby changing cluster characteristics
• Once a large body of data exists for a job, it is inherently bound to (set into) fixed resources.
• Large datasets & I/O bound programs need to be with their data or be database accesses…
• But are there resources there to share?
Bright spots… near term, user
focus, a lesson for Grid suppliers
• Tony Hey, head of UK scientific computing. apps-based funding versus tools-based funding. Web services based Grid & data orientation.
• David Abramson - Nimrod.
– Parameter scans… other low hanging fruit
– Encapsulate apps! “Excel”-- language/control mgmt.
– “Legacy apps are programs that users just want, and there’s no time or resources to modify code …independent of age, author, or language e.g. Java.”
• Andrew Grimshaw - Avaki
– Making Legion vision real. A reality check.
• Lip 4 pairs of “web services” based apps
• Gray et al Skyservice and Terraservice
• Goal: providing a web service must be as easy as publishing a web page…and will occur!!!
SkyServer: delivering a web service to the
astronomy community.
Prototype for other sciences?
Gray, Szalay, et al
First paper on the SkyServer
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc
Later, more detailed paper for database community
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.doc
What can be learned from Sky Server?
• It’s about data, not about harvesting flops
• 1-2 hr. query programs versus 1 wk programs based on grep
• 10 minute runs versus 3 day compute & searches
• Database viewpoint. 100x speed-ups
– Avoid costly re-computation and searches
– Use indices and PARALLEL I/O. Read / Write >>1.
– Parallelism is automatic, transparent, and just depends on the number of
Some science is hitting a wall
FTP and GREP are not adequate (Jim Gray)
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years.
• You can FTP 1 MB in 1 sec.
• You can FTP 1 GB / min.
• … 2 days and 1K$
• … 3 years and 1M$
• 1PB ~10,000 >> 1,000 disks
• At some point you need indices to limit search, parallel data search and analysis
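The GREP figures above rest on the fact that a sequential scan takes time proportional to the data size. A minimal sketch of that scaling follows; the function name and the decimal unit sizes are assumptions, and the only rate used is the slide's 1 GB per minute. Idealized linear scaling gives about 17 hours per terabyte and about 1.9 years per petabyte, somewhat below the slide's rounder "2 days" and "3 years" (which may allow for real-world overheads) but of the same order of magnitude.

```python
GB = 1e9
TB = 1e12
PB = 1e15

# Scan rate from the slide: 1 GB per minute, expressed in bytes per second.
SCAN_RATE = GB / 60


def scan_seconds(size_bytes: float, rate_bytes_per_second: float = SCAN_RATE) -> float:
    """Time for one sequential pass over the data at a constant rate."""
    return size_bytes / rate_bytes_per_second


tb_hours = scan_seconds(TB) / 3600
pb_years = scan_seconds(PB) / (365 * 24 * 3600)

print(f"1 GB: {scan_seconds(GB):.0f} seconds")
print(f"1 TB: {tb_hours:.1f} hours")
print(f"1 PB: {pb_years:.1f} years")
```

Every thousandfold increase in data multiplies the scan time by a thousand, which is the argument for indices and parallel search made in the last bullet.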
Goal using dbases. Make it easy to
– Publish: Record structured data
– Find data anywhere in the network. Get the subset you need!
– Explore datasets interactively
• Database becomes the file system!!!
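The publish, find, and explore steps above can be illustrated with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the `objects` table, its columns, and the synthetic rows are hypothetical and are not the SkyServer schema. The point is that once data is recorded in structured form with an index, "get the subset you need" becomes a declarative query rather than a scan over files.

```python
import sqlite3

# Publish: record structured data (hypothetical schema, synthetic rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, ra REAL, dec REAL, magnitude REAL)"
)
rows = [
    (i, (i * 0.37) % 360.0, ((i * 0.11) % 180.0) - 90.0, 10.0 + (i % 150) / 10.0)
    for i in range(10_000)
]
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)

# An index lets the engine limit the search instead of reading every row.
conn.execute("CREATE INDEX idx_objects_magnitude ON objects (magnitude)")

# Find: ask for just the subset needed.
query = "SELECT id, ra, dec FROM objects WHERE magnitude < ?"
bright = conn.execute(query, (10.5,)).fetchall()

# Explore: the query plan shows how the engine intends to answer the query;
# with the index in place it normally reports a search on that index.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (10.5,)).fetchall()

print(len(bright), "matching objects")
for step in plan:
    print(step[-1])
```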
Network concerns
• Very high cost
– $(1 + 1) / GByte to send on the net; Fedex and 160 GByte shipments are cheaper
– DSL at home is $0.15 - $0.30
– Disks cost less than $2/GByte to purchase
• Low availability of fast links (last mile problem)
– Labs & universities have DS3 links at most, and they are very expensive
– Traffic: Instant messaging, music stealing
• Performance at desktop is poor
– 1- 10 Mbps; very poor communication links
• Disks cost $1/GByte to purchase!!!
• Manage: trade-in fast links for cheap links!!
Gray’s $2.4 K, 1 TByte Sneakernet aka Disk Brick
• Cost to move a Terabyte
• Cost, time, and speed to move a Terabyte
• Cost of a “Sneaker-Net” TB
• We now ship NTFS/SQL disks.
• Not good format for Linux.
• Ship NFS/CIFS/ODBC servers (not disks).
• Plug “disk” into LAN.
• DHCP then file or DB serve…
• Web Service in long term
Courtesy of Jim Gray, Microsoft Bay Area Research
Cost to move a Terabyte
Context      Speed Mbps   Rent $/month   Raw $/Mbps   Raw $/TB sent   Time/TB
home phone   0.04         40             1,000        3,086           6 years
home DSL     0.6          70             117          360             5 months
T1           1.5          1,200          800          2,469           2 months
T3           43           28,000         651          2,010           2 days
OC3          155          49,000         316          976             14 hours
100 Mbps     100          -              -            -               1 day
Gbps         1000         -              -            -               2.2 hours
OC192        9600         1,920,000      200          617             14 minutes
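The $/TB and Time/TB columns above can be rederived from the speed and monthly rent alone. The sketch below assumes a decimal terabyte (8e12 bits), a 30-day month, and a fully utilized link; those conventions and the function names are assumptions, chosen because they reproduce the table's rounded figures.

```python
TB_BITS = 8e12                       # ASSUMPTION: 1 TB = 10**12 bytes
SECONDS_PER_MONTH = 30 * 24 * 3600   # ASSUMPTION: 30-day billing month

# (speed in Mbps, rent in $/month), copied from the table.
LINKS = {
    "home phone": (0.04, 40),
    "home DSL": (0.6, 70),
    "T1": (1.5, 1_200),
    "T3": (43, 28_000),
    "OC3": (155, 49_000),
    "OC192": (9_600, 1_920_000),
}


def seconds_per_tb(mbps: float) -> float:
    """Time to push one terabyte through a link running flat out."""
    return TB_BITS / (mbps * 1e6)


def dollars_per_tb(mbps: float, rent_per_month: float) -> float:
    """Share of the monthly rent consumed while sending one terabyte."""
    return rent_per_month * seconds_per_tb(mbps) / SECONDS_PER_MONTH


costs = {name: dollars_per_tb(*spec) for name, spec in LINKS.items()}

for name, (mbps, rent) in LINKS.items():
    days = seconds_per_tb(mbps) / 86_400
    print(f"{name:<11} {days:10.2f} days  ${costs[name]:,.0f} per TB")
```

Under these assumptions the cheapest links per megabit are not the cheapest per terabyte moved, since the slow links accrue rent for months while the transfer runs.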
Cost, time of Sneaker-net vs Alts
Media       #      Robot$   Media$   TB read + write time   ship time   TotalTime/TB   Mbps   Cost (10 TB)   $/TB shipped
CD          1500   2x800    240      60 hrs                 24 hrs      6 days         28     $2 K           $208
DVD         200    2x8K     400      60 hrs                 24 hrs      6 days         28     $20 K          $2,000
Tape        25     2x15K    1000     92 hrs                 24 hrs      5 days         18     $31 K          $3,100
DiskBrick   7      1K       1,400    19 hrs                 24 hrs      2 days         52     $2.6 K         $260
Courtesy of Jim Gray, Microsoft Bay Area Research
Grids: Real and “personal”
Two carrots, one downside. A bet.
• Bell will match any Gordon Bell Prize (parallelism, performance, or performance/cost) winner’s prize that is based on “Grid Platform Technology”.
• I will bet any individual or set of individuals of the Grid Research community up to $5,000 that a Grid application will not win the above by SC2005.
Technical computing:
Observations on an ever changing,
occasionally repetitious,
environment
Copyright Gordon Bell
LANL 5/17/2002
A brief, simplified history of HPC
1. Sequential & data parallelism using shared memory, Cray’s Fortran computers 60-02 (US:90)
2. 1978: VAXen threaten general purpose centers…
3. NSF response: form many centers 1988 - present
4. SCI: Search for parallelism to exploit micros 85-95
5. Scalability: “bet the farm” on clusters. Users “adapt” to clusters aka multi-computers with LCD program model, MPI. >95
6. Beowulf Clusters adopt standardized hardware and Linus’s software to create a standard! >1995
7. “Do-it-yourself” Beowulfs impede new structures and threaten g.p. centers >2000
8. 1997-2002: Let’s tell NEC they aren’t “in step”.
9. High speed networking enables peer2peer computing and the Grid. Will this really work?
What Is the System Architecture? (GB c1990)
[Figure: taxonomy tree, flattened here into an outline.]
MIMD
  Multiprocessors: Single Address Space, Shared Memory Computation
    Distributed Memory Multiprocessors (scalable)
      Dynamic Binding of addresses to processors: KSR
      Static binding, Ring multi: IEEE SCI proposal
      Static Binding, caching: Alliant, DASH
      Static Run-time Binding: research machines
    Central Memory Multiprocessors (not scalable)
      Cross-point or Multi-stage: Cray, Fujitsu, Hitachi, IBM, NEC, Tera
      Simple, ring multi ... bus multi replacement
      Bus multis: DEC, Encore, NCR, ... Sequent, SGI, Sun
  Multicomputers: Multiple Address Space, Message Passing Computation
    Distributed Multicomputers (scalable)
      Mesh Connected: Intel
      Butterfly/Fat Tree/Cubes: CM5, NCUBE
      Switch connected: IBM
      Fast LANs for High Availability and High Capacity Clusters: DEC, Tandem
      LANs for Distributed Processing: Workstations, PCs
      GRID
SIMD
Processor Architectures?
VECTORS OR VECTORS
CS View
• MISC >> CISC >> Language directed >> RISC >> Super-scalar >> Extra-Long Instruction Word
• Caches: mostly alleviate need for memory B/W
SC Designers View
• RISC >> VCISC (vectors) >> Massively parallel (SIMD) (multiple pipelines)
• Memory B/W = perf.
Results from DARPA’s SCI c1983
• Many research and construction efforts … virtually all new hardware efforts failed except Intel and Cray.
• DARPA directed purchases… screwed up the market, including the many VC funded efforts.
• No Software funding!
• Users responded to the massive power potential with LCD software.
• Clusters, clusters, clusters using MPI. Beowulf!
• It’s not scalar vs vector, it’s memory bandwidth!
– 6-10 scalar processors = 1 vector unit
– 16-64 scalars = a 2 – 6 processor SMP
Dead Supercomputer Society
• ACRI
• Alliant
• American Supercomputer
• Ametek
• Applied Dynamics
• Astronautics
• BBN
• CDC
• Convex
• Cray Computer
• Cray Research
• Culler-Harris
• Culler Scientific
• Cydrome
• Dana/Ardent/Stellar/Stardent
• Denelcor
• Elexsi
• ETA Systems
• Evans and Sutherland Computer
• Floating Point Systems
• Galaxy YH-1
• Goodyear Aerospace MPP
• Gould NPL
• Guiltech
• Intel Scientific Computers
• International Parallel Machines
• Kendall Square Research
• Key Computer Laboratories
• MasPar
• Meiko
• Multiflow
• Myrias
• Numerix
• Prisma
• Tera
• Thinking Machines
• Saxpy
• Scientific Computer Systems (SCS)
• Soviet Supercomputers
• Supertek
• Supercomputer Systems
• Suprenum
• Vitesse Electronics
What a difference 25 years AND spending >10x makes!
ESRDC: 40 Tflops. 640 nodes (8 - 8GFl P.vec/node)
LLNL 150 Mflops machine room c1978
Japanese Earth Simulator
• Spectacular results for $400M.
– Year to year gain of 10x. The greatest gain since the
first (1987) Gordon Bell Prize.
– Performance is 10x the nearest entrant
– Performance/cost is 3x the nearest entrant
– RAP (real application performance) >60% Peak
Other machines are typically 10% of peak.
– Programming was done in HPF (Fortran) that the US
research community abandoned.
• NCAR was right in wanting to purchase an NEC
super
Computer types
[Figure: computer types arranged by connectivity (WAN/LAN, SAN, DSM, SM). Labels: Networked Supers…; GRID & P2P; Legion, Condor; VPP uni; NEC super; NEC mP; Cray X…T (all mPv); Clusters; Old World; Mainframes; T3E; SGI DSM; SP2 (mP); clusters & Multis; Beowulf; NOW; SGI DSM; WSs, PCs; NT clusters.]
The Challenge leading to Beowulf
• NASA HPCC Program begun in 1992
• Comprised Computational Aero-Science and Earth and Space Science (ESS)
• Driven by need for post processing data manipulation and visualization of large data sets
• Conventional techniques imposed long user response time and shared resource contention
• Cost low enough for dedicated single-user platform
• Requirement:
– 1 Gflops peak, 10 Gbyte, < $50K
• Commercial systems: $1000/Mflops or $1M/Gflops
The Virtuous Economic Cycle
drives the PC industry… & Beowulf
[Figure: cycle diagram with the labels: Standards; Attracts suppliers; Greater availability @ lower cost; Attracts users; Creates apps, tools, training.]
Lessons from Beowulf
• An experiment in parallel computing systems
• Established vision- low cost high end computing
• Demonstrated effectiveness of PC clusters for some (not all) classes of applications
• Provided networking software
• Provided cluster management tools
• Conveyed findings to broad community
• Tutorials and the book
• Provided design standard to rally community!
• Standards beget: books, trained people, software … virtuous cycle that allowed apps to form
• Industry begins to form beyond a research project
Courtesy, Thomas Sterling, Caltech.
Clusters: Next Steps
Scalability…
• They can exist at all levels: personal, group, … centers
• Clusters challenge centers… given that smaller users get small clusters
Computing in small spaces @ LANL
(RLX cluster in building with NO A/C)
240 processors
@2/3 GFlops
Fill the 4
racks -- gives
a Teraflops
Internet II concerns given $0.5B cost
• Very high cost
– $(1 + 1) / GByte to send on the net; Fedex and 160 GByte shipments are cheaper
– DSL at home is $0.15 - $0.30
• Disks cost $1/GByte to purchase!
• Low availability of fast links (last mile problem)
– Labs & universities have DS3 links at most, and they are very expensive
– Traffic: Instant messaging, music stealing
• Performance at desktop is poor
– 1- 10 Mbps; very poor communication links
Scalable computing: the effects
• They come in all sizes; incremental growth 10 or 100 to 10,000 (100X for most users); debug vs run; problem growth
• Allows compatibility heretofore impossible. 1978: VAX chose Cray Fortran. 1987: The NSF centers went to UNIX
• Users chose sensible environment
– Acquisition and operational costs & environments
– Cost to use as measured by user’s time
• The role of gp centers e.g. NSF, statex is unclear. Necessity for support?
– Scientific Data for a given community…
– Community programs and data
– Manage GRID discipline
• Are clusters ≈ Gresham’s Law? Drive out alts.
The end