Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge Wang
Trophy presentation by Jim Gray
• Penny Sort history and Award
• The need for long-range research
• Some long-range systems research goals.
• What I have been doing.
[Figure: benchmark timeline, 1970 to 2000: Sort; IBM TP 1-7; CA and Tony Lukes; Debit Credit (Gray); Wisconsin (Bitton, Boral, DeWitt, Turbyfill); Datamation (Anon et al); TPC-A; TPC-B; MCC (Boral &...); Teradata (Bollinger &...); TPC-C; PennySort; MinuteSort; TPC-D; TPC-W ?]
• April Fools 1985: Datamation Sort
– Sort 1M 100-byte records
– An I/O benchmark: 15 minutes to 1 hour!
• 1993: {Minute | Penny} x {Daytona | Indy}
• 1998: TeraByte Sort
• Web site: http://research.Microsoft.com/barc/SortBenchmark/
• How much can you sort for a penny (in a minute).
– Hardware and Software cost
– Depreciated over 3 years
– 1M$ system gets about 1 second,
– 1K$ system gets about 1,000 seconds.
– Time (seconds) = 946,080 / SystemPrice ($), since 3 years = 94,608,000 seconds and a penny is $0.01
• Input and output are disk resident
• Input is
– 100-byte records (random data)
– key is first 10 bytes.
• Must create output file and fill with sorted version of input file.
• Daytona (product) and Indy (special) categories
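A minimal in-memory sketch of the input/output contract above, assuming random bytes from `os.urandom` stand in for the benchmark's own record generator and that keys compare as unsigned bytes. The helper names (`write_input`, `sort_file`, `is_sorted_file`) and file names are hypothetical; a real entry sorts far more data than fits in memory and must manage disk I/O explicitly.

```python
import os
import tempfile

RECORD_SIZE = 100  # each record is 100 bytes
KEY_SIZE = 10      # the sort key is the first 10 bytes of the record


def write_input(path: str, n: int) -> None:
    """Write n records of random bytes (an assumed stand-in for the official generator)."""
    with open(path, "wb") as f:
        f.write(os.urandom(n * RECORD_SIZE))


def sort_file(in_path: str, out_path: str) -> None:
    """Read every record, order by the 10-byte key, and write the output file."""
    with open(in_path, "rb") as f:
        data = f.read()
    records = [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]
    records.sort(key=lambda r: r[:KEY_SIZE])  # bytewise (unsigned) key comparison
    with open(out_path, "wb") as f:
        f.write(b"".join(records))


def is_sorted_file(path: str) -> bool:
    """Validate that the keys in the file are in non-decreasing order."""
    with open(path, "rb") as f:
        data = f.read()
    keys = [data[i:i + KEY_SIZE] for i in range(0, len(data), RECORD_SIZE)]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "input.dat"), os.path.join(d, "output.dat")
        write_input(src, 10_000)
        sort_file(src, dst)
        print(is_sorted_file(dst), os.path.getsize(dst))  # True 1000000
```

The output file holds the same records as the input, only reordered, which is what "fill with sorted version of input file" requires.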
• Hardware
– 266 MHz Intel PPro
– 64 MB SDRAM (10ns)
– Dual Fujitsu DMA 3.2GB EIDE disks
• Software
– NT workstation 4.3
– NT 5 sort
• Performance
– sort 15 M 100-byte records (~1.5 GB)
– Disk to disk
– elapsed time 820 sec
– cpu time = 404 sec
[Figure: PennySort Machine (1107$) cost breakdown: cpu 32%, Disk 25%, board 13%, Memory 8%, Other 22% (Network, Video, floppy 9%; Cabinet + Assembly 7%; Software 6%)]
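The budget formula can be checked against this machine. A sketch, assuming a 365-day year (which is what yields the 946,080 constant) and using the price, elapsed time and record count from the slide; `penny_time_budget` is a hypothetical helper name.

```python
SECONDS_IN_3_YEARS = 3 * 365 * 24 * 3600  # 94,608,000 s: the depreciation period
PENNY = 0.01  # dollars


def penny_time_budget(system_price: float) -> float:
    """Seconds of run time that cost one penny when the price is depreciated over 3 years."""
    return PENNY * SECONDS_IN_3_YEARS / system_price  # equals 946,080 / price


if __name__ == "__main__":
    price = 1107.0        # PennySort machine price ($), from the slide
    elapsed = 820.0       # measured elapsed time (s), from the slide
    records = 15_000_000  # 15 M 100-byte records (~1.5 GB), from the slide
    budget = penny_time_budget(price)
    print(f"budget {budget:.1f} s, elapsed {elapsed:.0f} s, within budget: {elapsed <= budget}")
    print(f"throughput {records / elapsed:,.0f} records/s")
    # Rules of thumb from the benchmark definition: 1M$ gets ~1 s, 1K$ gets ~1,000 s.
    print(f"1M$: {penny_time_budget(1e6):.2f} s, 1K$: {penny_time_budget(1e3):.0f} s")
```

For the 1107$ machine the budget is about 854.6 seconds, so the 820-second run fits inside one penny.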
• Daytona & Indy: 2.58 GB in 917 sec
• HMsort: Brad Helmkamp, Keith McCready, Stenograph LLC
• Intel 400 MHz, 2 IDE disks
• Chris Nyberg: Nsort, SGI 32x Origin2000, 151 Minutes
• Daytona: David Cossock, Sam Fineberg, Pankaj Mehra, John Peck
Tandem/Sandia TSort: 68 CPU ServerNet, 47 minutes
• Indy: IBM SPsort
408 nodes, 1952 cpu, 2168 disks
17.6 minutes = 1057 sec
(all for 1/3 of 94M$; slice price is 64k$ for 4 cpu, 2GB ram, 6 9GB disks + interconnect)
• 2 – 4 GBps!
[Figure: GPFS read, GPFS write, Local read and Local write throughput (GBps, 0 to 4.0) vs. elapsed time (seconds, 0 to 900)]
2002 Sort Records
• Penny
– Daytona: 9.8 GB (105 million records) in 1098 seconds on a $857 Linux/Intel system. THsort: Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge Wang; High Performance Institute, Dept. of Computer Science and Technology, Tsinghua University, Beijing 100084, China
– Indy: 11.6 GB (125 million records) in 1380 seconds on a $672 Linux/Intel system. DMsort: Aaron Darling, Alex Mohr, U. Wisconsin, Madison
• Minute
– Daytona: 12 GB in 60 seconds. Ordinal Nsort, SGI 32 cpu Origin IRIX
– Indy: 21.8 GB (218 million records) in 56.51 sec. NOW+HPVMsort, 64 nodes WinNT: Luis Rivera, Andrew Chien, UCSD
• TeraByte
– Daytona: 49 minutes. David Cossock, Sam Fineberg, Pankaj Mehra, John Peck: 68x2 Compaq & Sandia Labs
– Indy: 1057 seconds. SPsort, 1952 SP cluster, 2168 disks: Jim Wyllie
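A sketch that turns the two 2002 Penny entries into a depreciated cost per run and GB sorted per dollar, assuming decimal GB and the same 3-year, 365-day depreciation rule; `sort_cost_dollars` is a hypothetical helper. Both runs come in just under one penny, which places them near 1 TB per dollar.

```python
SECONDS_IN_3_YEARS = 94_608_000  # 3 x 365 x 24 x 3600, the depreciation period


def sort_cost_dollars(elapsed_s: float, price: float) -> float:
    """Depreciated cost of running a system of the given price for elapsed_s seconds."""
    return elapsed_s * price / SECONDS_IN_3_YEARS


# (name, GB sorted, elapsed seconds, system price in $) as listed for 2002
ENTRIES = [
    ("THsort", 9.8, 1098, 857),
    ("DMsort", 11.6, 1380, 672),
]

if __name__ == "__main__":
    for name, gb, elapsed, price in ENTRIES:
        cost = sort_cost_dollars(elapsed, price)
        print(f"{name}: {cost * 100:.3f} pennies, {gb / cost:,.0f} GB per dollar")
```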
(and friend)
• Partly hardware
• Partly software
• Partly economics
[Figure: Records Sorted per Second doubles every year; GB Sorted per Dollar doubles every year (1985 onward, log scale); THsort ~1TB/$]
• Speedup comes from Moore’s law 40%/year
• Processor/Disk/Network arrays: 60%/year (this is a software speedup).
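Assuming the two rates are independent and compound multiplicatively, they account for the "doubles every year" trend: 1.4 × 1.6 = 2.24x per year, a doubling time a little under one year. A short sketch of that arithmetic:

```python
import math

moore = 1.40     # 40%/year from Moore's law (per the slide)
parallel = 1.60  # 60%/year from processor/disk/network arrays (per the slide)

combined = moore * parallel                        # assumed to compound multiplicatively
doubling_years = math.log(2) / math.log(combined)  # years needed to grow 2x at this rate

print(f"combined growth {combined:.2f}x/year, doubling every {doubling_years:.2f} years")
```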
[Figure: Sort Records/second vs Time, 1985 to 2000, log scale (records sorted per second doubles every year): Bitton M68000, Tandem, Kitsuregawa Hardware Sorter, Cray YMP, Sequent, Intel HyperCube, IBM 3090, IBM RS6000, Alpha, Penny NT sort, NT/PennySort, Compaq/NT, NOW, Sandia/Compaq/NT, Ordinal+SGI, SPsort. GB Sorted per Dollar, 1985 to 2000 (doubles every year): THsort ~1TB/$]
• Sorts 1TB in 1 Minute
• 2 pass so 3TB of disk
• = 10 disks if 330GB/disk
• = 5 GBps (if each disk is 50 MBps)
• So, 600 seconds (3TB/5GBps)
• So, node costs 1.5k$
• Costs 100x that today
• maybe in 4 years?
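The arithmetic chain above can be replayed as a sketch, assuming decimal units (1 TB = 1,000 GB) and the 946,080 dollar-seconds penny budget from the PennySort rules. It also shows that 10 disks at 50 MBps deliver 0.5 GBps, so the 5 GBps figure implies roughly 500 MBps per disk (or about 100 such disks).

```python
PENNY_DOLLAR_SECONDS = 946_080  # penny budget: time (s) x system price ($)

data_tb = 1.0                # terabytes to sort
io_tb = 3 * data_tb          # "2 pass so 3TB": the I/O volume used on the slide
aggregate_gbps = 5.0         # aggregate disk bandwidth assumed on the slide (GB/s)

seconds = io_tb * 1000 / aggregate_gbps      # 3 TB / 5 GBps, decimal units
node_price = PENNY_DOLLAR_SECONDS / seconds  # system price at which this run costs a penny

disks, per_disk_mbps = 10, 50                   # disk count and per-disk rate from the slide
disk_array_gbps = disks * per_disk_mbps / 1000  # bandwidth those disks actually provide
needed_per_disk_mbps = aggregate_gbps * 1000 / disks

print(f"{seconds:.0f} s, node price ${node_price:,.0f}")
print(f"{disks} disks x {per_disk_mbps} MBps = {disk_array_gbps:.1f} GBps; "
      f"{aggregate_gbps:.0f} GBps needs {needed_per_disk_mbps:.0f} MBps per disk")
```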
• Penny Sort history and Award
• The need for long-range research
• Some long-range systems research goals.
• What I have been doing.
– So that you can see intermediate progress.
1. Devise an architecture that scales up:
Grow the system without limits*.
* This is impossible (without limits?), but...
This meant automatic parallelism, automatic management, distributed, fault tolerant, high performance; scaleup: 1,000,000 : 1
• Benefits:
– long term vision guides research problems
– simple to state, so attracts colleagues and support
– Can tell your friends & family what it is that you do
• Babbage: Computers
• Bush: Automatic Information storage & access
• Turing: Intelligent Machines
• Note:
– Previous Turing lectures described several “theory” problems.
– Problems here are “systems” problems.
– Some include an “and prove it” clause.
– They are enabling technologies, not applications.
– Newell’s Intelligent Universe (ubiquitous computing) is missing because I could not find “simple-to-state” problems.
(1791-1871)
• Babbage’s computing goals have been realized
– But we still need better algorithms & faster machines
• What happens when
– Computers are free and infinitely powerful?
– Bandwidth and storage are free and infinite?
• Remaining limits:
– Content: the core asset of cyberspace
– Software: Bugs, >100$ per line of code (!)
– Operations: > 1,000 $/node/year
[Figure: ops per second/$ (WordSize*ops/s/sysprice), 1880 to 2000, log scale; combination of Hans Moravec + Larry Roberts + Gordon Bell. 1890-1945, Mechanical and Relay: doubles every 7.5 years. 1945-1985, Tube, transistor,..: doubles every 2.3 years. 1985-2000, Microprocessor: doubles every 1.0 years.]
• Appliance just works. TV, PDA, desktop, ...
• State replicated in safe place (somewhere else)
• If hardware fails, or is lost or stolen, replacement arrives next day (plug&play).
• If software faults, software and state refresh from server.
• If you buy a new appliance, it plugs in and refreshes from the server (as though the old one failed)
• Most vendors are building towards this vision.
• Browsers come close to working this way.
• Manager
– Sets goals
– Sets policy
– Sets budget
– System does the rest.
• Everyone is a CIO (Chief Information Officer)
9. Build a system
– used by millions of people each day
– Administered and managed by a ½ time person.
• On hardware fault, order replacement part
• On overload, order additional equipment
• Upgrade hardware and software automatically.
• Build a system used by millions of people that
10. Only serves authorized users
• Service cannot be denied (can’t destroy data or power).
• Information cannot be stolen.
11. Is always available:
(out less than 1 second per 100 years = 9 9’s of availability)
• 1950s: 90% availability,
Today 99% uptime for web sites,
99.99% for well managed sites (50 minutes/year)
3 extra 9s in 45 years.
• Goal: 5 more 9s: 1 second per century.
– And prove it.
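A sketch for converting between availability and downtime, assuming a 365-day year; `downtime_per_year` and `nines` are hypothetical helper names. At 99.99% it gives about 52.6 minutes of downtime per year, matching the "50 minutes/year" figure, and it measures 1 second per century at roughly 9.5 nines.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 (ignores leap years)


def downtime_per_year(availability: float) -> float:
    """Seconds of outage per year at the given availability (a fraction between 0 and 1)."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def nines(outage_seconds: float, period_seconds: float) -> float:
    """Number of 9s: -log10 of the unavailable fraction of the period."""
    return -math.log10(outage_seconds / period_seconds)


if __name__ == "__main__":
    for a in (0.90, 0.99, 0.9999):
        print(f"{a:.4%} available -> {downtime_per_year(a) / 60:,.1f} minutes/year down")
    century = 100 * SECONDS_PER_YEAR
    print(f"1 second per century = {nines(1.0, century):.2f} nines")
```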
• 20 $ to design and write it.
• 30 $ to test and document it.
• 50 $ to maintain it.
100$ total
• The only thing in Cyber Space that is getting MORE expensive & LESS reliable
Solution so far:
• Write fewer lines
– High level languages
– Non Procedural
– 10x not 1,000x better
• Application generators: Web sites, Databases, ...
• Semi-custom apps: SAP, PeopleSoft, ...
• Scripting & Objects: JavaScript & DOM
– Very domain specific
Do What I Mean
(not 100$ per line of code, no programming bugs)
The holy grail of programming languages & systems
12. Devise a specification language or UI
1. That is easy for people to express designs (1,000x easier),
2. That computers can compile, and
3. That can describe all applications (is complete).
• System should “reason” about application
– Ask about exception cases.
– Ask about incomplete specification.
– But not be onerous.
• This already exists in domain-specific areas.
(i.e., 2 out of 3 already exist)
• An imitation game for a programming staff.
• Penny Sort history and Award
• The need for long-range research
• Some long-range systems research goals.
• What I have been doing.
• Traveling & Talking
• Helping Alex Build the SkyServer
• Loading data
• Helping build the Virtual Observatory
• Doing spatial geometry in SQL (no kidding)!
• Learning about web services
(and implementing some)