THsort PennySort Award Ceremony Beijing China

19 October 2002

Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge Wang

Trophy presentation by Jim Gray

Outline

• PennySort history and Award

• The need for long-range research

• Some long-range systems research goals

• What I have been doing

Benchmark History

[Figure: timeline of benchmarks, 1970 to 2000]

• IBM TP 1-7

• CA and Tony Lukes

• Debit Credit (Gray)

• Wisconsin (Bitton, Boral, DeWitt, Turbyfill)

• Datamation (Anon et al)

• Sort

• TPC-A, TPC-B

• MCC (Boral &...)

• Teradata (Bollinger &...)

• TPC-C

• PennySort, MinuteSort

• TPC-D

• TPC-W ?

A Short History of Sort

• April Fools 1985: Datamation Sort

– Sort 1M 100-byte records

– An IO benchmark: 15-min to 1 hr!

• 1993:

{Minute | Penny}x{Daytona | Indy}

• 1998:

TeraByte Sort

• Web site: http://research.Microsoft.com/barc/SortBenchmark/

Ground Rules

• How much can you sort for a penny (in a minute).

– Hardware and Software cost

– Depreciated over 3 years

– 1M$ system gets about 1 second,

– 1K$ system gets about 1,000 seconds.

– Time (seconds) = 946,080 / SystemPrice ($), since 3 years = 94,608,000 seconds

• Input and output are disk resident

• Input is

– 100-byte records (random data)

– key is first 10 bytes.

• Must create output file and fill with sorted version of input file.

• Daytona (product) and Indy (special) categories
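The ground rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `penny_budget_seconds` and `sort_records` are hypothetical, a year is taken as 365 days, and the sort runs in memory rather than disk to disk as the benchmark requires.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600      # 31,536,000 (assumes a 365-day year)
DEPRECIATION_YEARS = 3                  # from the ground rules
RECORD_SIZE = 100                       # bytes per record
KEY_SIZE = 10                           # the key is the first 10 bytes


def penny_budget_seconds(system_price_dollars: float) -> float:
    """Seconds of machine time that one penny buys on a system of this price."""
    lifetime_seconds = DEPRECIATION_YEARS * SECONDS_PER_YEAR   # 94,608,000
    price_in_pennies = system_price_dollars * 100
    return lifetime_seconds / price_in_pennies                 # = 946,080 / price


def sort_records(data: bytes) -> bytes:
    """Sort fixed-size records by their leading key bytes (in memory only)."""
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("input is not a whole number of records")
    records = [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]
    records.sort(key=lambda record: record[:KEY_SIZE])
    return b"".join(records)


if __name__ == "__main__":
    for price in (1_000_000, 1_000, 1107, 857, 672):
        print(f"${price:>9,}: {penny_budget_seconds(price):8.1f} s per penny")
```

Under these assumptions an $857 system gets roughly 1,104 seconds per penny and a $1107 system roughly 855 seconds, which is in line with the 1098-second and 820-second elapsed times quoted elsewhere in these slides.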

PennySort

• Hardware

– 266 MHz Intel PPro

– 64 MB SDRAM (10ns)

– Dual Fujitsu DMA 3.2GB EIDE disks

• Software

– NT workstation 4.3

– NT 5 sort

• Performance

– sort 15 M 100-byte records (~1.5 GB)

– Disk to disk

– elapsed time 820 sec

– cpu time = 404 sec

PennySort Machine (1107$) cost breakdown

– cpu 32%

– Disk 25%

– board 13%

– Memory 8%

– Other 22% (Cabinet + Assembly 7%; Network, Video, floppy 9%; Software 6%)

1999 PennySort

Daytona & Indy:

2.58 GB in 917 sec

• HMsort:

Brad Helmkamp,

Keith McCready,

Stenograph LLC

• Intel 400 MHz

2 IDE disks

1998 TB Sort

• Chris Nyberg

Nsort

SGI 32x Origin2000

151 Minutes

1999 Terabyte Sort

Daytona:

David Cossock, Sam Fineberg,

Pankaj Mehra, John Peck

Tandem/Sandia TSort:

68 CPU ServerNet

47 minutes

Indy:

IBM SPsort

408 nodes, 1952 cpu, 2168 disks

17.6 minutes = 1057 sec

(all for 1/3 of 94M$; slice price is 64k$ for 4 cpu, 2 GB ram, six 9 GB disks + interconnect)

SP sort

• 2 – 4 GBps!

[Figure: GPFS read, GPFS write, Local read, and Local write rates (axis 0.0 to 4.0) vs. elapsed time (seconds, 0 to 900)]

2002 Sort Records

Penny

• THsort: 9.8 GB in 1098 seconds, 105 million records, $857 Linux/Intel system

Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge Wang

High Performance Institute, Dept. of Computer Science and Technology, Tsinghua University, Beijing 100084, China

• Indy, DMsort: 11.6 GB in 1380 seconds, 125 million records on a $672 Linux/Intel system

Aaron Darling, Alex Mohr, U. Wisconsin, Madison

Minute

• Ordinal Nsort: 12 GB in 60 seconds, SGI 32 cpu Origin IRIX

• NOW+HPVMsort: 21.8 GB in 56.51 sec, 218 million records, 64 nodes WinNT

Luis Rivera, Andrew Chien, UCSD

TeraByte

• 49 minutes, 68x2 Compaq & Sandia Labs

David Cossock, Sam Fineberg, Pankaj Mehra, John Peck

• SPsort: 1057 seconds, 1952 SP cluster, 2168 disks

Jim Wyllie
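The Penny figures above translate directly into the "GB sorted per dollar" and "records sorted per second" metrics used on the progress charts. The sketch below is a rough check only; the dictionary and function names are hypothetical, and the inputs are the numbers quoted on this slide.

```python
# Penny results as quoted on the slide:
# name -> (GB sorted for one penny, elapsed seconds, millions of records)
PENNY_RESULTS = {
    "THsort": (9.8, 1098, 105),
    "DMsort": (11.6, 1380, 125),
}


def gb_per_dollar(gb_per_penny: float) -> float:
    """A dollar is 100 pennies, so scale the per-penny volume by 100."""
    return gb_per_penny * 100


def records_per_second(million_records: float, elapsed_seconds: float) -> float:
    """Average sort rate over the whole run."""
    return million_records * 1e6 / elapsed_seconds


if __name__ == "__main__":
    for name, (gb, seconds, million) in PENNY_RESULTS.items():
        print(f"{name}: {gb_per_dollar(gb):,.0f} GB/$, "
              f"{records_per_second(million, seconds):,.0f} records/s")
```

For THsort this gives about 980 GB per dollar, which matches the "THsort ~1TB/$" annotation on the progress chart.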

The THsort Team

(and friend)

Progress on Sorting

• 2x/year!

– Partly hardware

– Partly software

– Partly economics

• Speedup comes from Moore’s law 40%/year

• Processor/Disk/Network arrays: 60%/year (this is a software speedup).

[Figure: Sort Records/second vs Time. Records Sorted per Second Doubles Every Year; axis 1.E+02 to 1.E+08, 1985 to 2000. Points: Bitton M68000, Kitsuregawa Hardware Sorter, Tandem, Sequent, Intel HyperCube, Cray YMP, IBM 3090, IBM RS6000, Alpha, NOW, Ordinal+SGI, Sandia/Compaq/NT, Penny NT sort, NT/PennySort, Compaq/NT, SPsort/IB]

[Figure: GB Sorted per Dollar Doubles Every Year; axis 1.E-03 to 1.E+06, 1985 to 2000. Annotation: THsort ~1TB/$]
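The "2x/year" claim can be checked against the two quoted rates, 40%/year from Moore's law and 60%/year from processor/disk/network arrays. The sketch below assumes the two effects multiply, which is an interpretation rather than something the slide states; the function names are hypothetical.

```python
import math


def combined_growth(*annual_rates: float) -> float:
    """Yearly speedup factor when independent growth rates compound multiplicatively."""
    factor = 1.0
    for rate in annual_rates:
        factor *= 1.0 + rate
    return factor


def doubling_time_years(yearly_factor: float) -> float:
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(yearly_factor)


if __name__ == "__main__":
    factor = combined_growth(0.40, 0.60)
    print(f"combined yearly factor: {factor:.2f}x")
    print(f"doubling time: {doubling_time_years(factor):.2f} years")
```

Under that assumption the combined factor is 1.4 × 1.6 = 2.24 per year, a little better than doubling every year.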

Musings: PennySort=TBsort

• Sorts 1TB in 1Minute

• 2 pass so 3TB of disk

• = 10 disks if 330GB/disk

• = 5 GBps (if each disk is 50Mbps)

• So, 600 seconds (3TB/5GBps)

• So, node costs 1.5k$

• Costs 100x that today

• maybe in 4 years?
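The arithmetic behind these musings can be laid out as a short sketch. It takes the slide's figures as given (3 TB moved, a 5 GBps aggregate rate, ten 330 GB disks) rather than deriving the aggregate rate from the per-disk figure, and it reuses the 946,080 constant from the ground rules (one penny of a system depreciated over 3 years). The function names are hypothetical.

```python
TB = 1e12   # bytes
GB = 1e9    # bytes
PENNY_CONSTANT = 946_080   # seconds x dollars: 3 years of seconds divided by 100


def transfer_seconds(total_bytes: float, bytes_per_second: float) -> float:
    """Time to move the given volume at a constant aggregate rate."""
    return total_bytes / bytes_per_second


def max_price_for_penny(elapsed_seconds: float) -> float:
    """Highest system price ($) at which a run of this length still costs one penny."""
    return PENNY_CONSTANT / elapsed_seconds


if __name__ == "__main__":
    disk_capacity = 10 * 330 * GB                # ten disks of 330 GB each
    elapsed = transfer_seconds(3 * TB, 5 * GB)   # two-pass sort moves about 3 TB
    print(f"disk capacity: {disk_capacity / TB:.1f} TB")
    print(f"elapsed: {elapsed:.0f} s")
    print(f"price ceiling: ${max_price_for_penny(elapsed):,.0f}")
```

With these inputs the run takes 600 seconds, and a 600-second penny run allows a system price of about $1,577, which is where the roughly 1.5k$ node cost comes from.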

Outline

• PennySort history and Award

• The need for long-range research

• Some long-range systems research goals

• What I have been doing

Properties of a Research Goal

• Simple to state.

• Not obvious how to do it.

• Clear benefit.

• Can be broken into smaller steps

– So that you can see intermediate progress.

• Progress and solution are testable.

I was motivated by a simple goal

1. Devise an architecture that scales up: grow the system without limits*.

* This is impossible (without limits?), but...

This meant automatic parallelism, automatic management, distributed, scaleup 1,000,000 : 1, fault tolerant, high performance

• Benefits:

– long term vision guides research problems

– simple to state, so attracts colleagues and support

– Can tell your friends & family what it is that you do

Three Seminal Papers

• Babbage: Computers

• Bush: Automatic Information storage & access

• Turing: Intelligent Machines

• Note:

– Previous Turing lectures described several “theory” problems.

– Problems here are “systems” problems.

– Some include an “and prove it” clause.

– They are enabling technologies, not applications.

– Newell’s Intelligent Universe (ubiquitous computing) is missing because I could not find “simple-to-state” problems.

Charles Babbage

(1791-1871)

• Babbage’s computing goals have been realized

– But we still need better algorithms & faster machines

• What happens when

– Computers are free and infinitely powerful?

– Bandwidth and storage are free and infinite?

• Remaining limits:

– Content: the core asset of cyberspace

– Software: Bugs, >100$ per line of code (!)

– Operations: > 1,000 $/node/year

ops/s/$ Had Three Growth Curves 1890-1990

• 1890-1945: Mechanical, Relay; 7-year doubling

• 1945-1985: Tube, transistor,..; 2.3 year doubling

• 1985-2000: Microprocessor; 1.0 year doubling

Combination of Hans Moravec + Larry Roberts + Gordon Bell: WordSize*ops/s/sysprice

[Figure: ops per second/$ vs. year, 1880 to 2000, axis 1.E-06 to 1.E+09; doubles every 7.5 years, then every 2.3 years, then every 1.0 years]
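The doubling times on this slide can be turned into yearly growth rates and per-era improvement factors. The sketch below uses the era boundaries and doubling times quoted on the slide (7.5, 2.3, and 1.0 years); the function names are hypothetical, and the era totals are rough extrapolations rather than measured values.

```python
def annual_growth(doubling_years: float) -> float:
    """Fractional growth per year implied by a doubling time."""
    return 2 ** (1.0 / doubling_years) - 1.0


def era_improvement(start_year: int, end_year: int, doubling_years: float) -> float:
    """Total improvement factor over an era at a constant doubling time."""
    return 2 ** ((end_year - start_year) / doubling_years)


if __name__ == "__main__":
    eras = [
        ("Mechanical, Relay", 1890, 1945, 7.5),
        ("Tube, transistor", 1945, 1985, 2.3),
        ("Microprocessor", 1985, 2000, 1.0),
    ]
    for name, start, end, doubling in eras:
        print(f"{name}: {annual_growth(doubling):.1%} per year, "
              f"{era_improvement(start, end, doubling):,.0f}x over {start}-{end}")
```

A 7.5-year doubling is a little under 10% per year, a 2.3-year doubling is about 35% per year, and a 1.0-year doubling is 100% per year.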

Trouble-Free Appliances

• Appliance just works. TV, PDA, desktop, ...

• State replicated in safe place (somewhere else)

• If hardware fails, or is lost or stolen, replacement arrives next day (plug&play).

• If software faults, software and state refresh from server.

• If you buy a new appliance, it plugs in and refreshes from the server (as though the old one failed)

• Most vendors are building towards this vision.

• Browsers come close to working this way.

Trouble-Free Systems

Manager

– Sets goals

– Sets policy

– Sets budget

– System does the rest.

Everyone is a CIO (Chief Information Officer)

9. Build a system

– used by millions of people each day

– Administered and managed by a ½ time person.

• On hardware fault, order replacement part

• On overload, order additional equipment

• Upgrade hardware and software automatically.

Trustworthy Systems

Build a system used by millions of people that

10. Only services authorized users

• Service cannot be denied (can’t destroy data or power).

• Information cannot be stolen.

11. Is always available:

(out less than 1 second per 100 years ≈ 9 9’s of availability)

• 1950’s: 90% availability

• Today: 99% uptime for web sites, 99.99% for well managed sites (50 minutes/year)

• 3 extra 9s in 45 years.

• Goal: 5 more 9s: 1 second per century.

– And prove it.
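The availability figures can be checked with a little arithmetic. This is a sketch assuming a 365.25-day year; the function names `downtime_minutes_per_year` and `nines` are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed average year length


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of outage per year at the given availability (0..1)."""
    return (1.0 - availability) * SECONDS_PER_YEAR / 60.0


def nines(downtime_seconds: float, period_seconds: float) -> float:
    """Number of 9s of availability for a given outage within a period."""
    return -math.log10(downtime_seconds / period_seconds)


if __name__ == "__main__":
    for availability in (0.90, 0.99, 0.9999):
        print(f"{availability:.2%}: "
              f"{downtime_minutes_per_year(availability):,.1f} minutes/year down")
    century = 100 * SECONDS_PER_YEAR
    print(f"1 second per century: {nines(1.0, century):.1f} nines")
```

At 99.99% the outage is about 53 minutes per year, close to the 50 minutes quoted. One second per century is an unavailability of roughly 3 in 10 billion, which lands between nine and ten 9s.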

100 $ line of code? 1 bug per thousand lines?

• 20 $ to design and write it.

• 30 $ to test and document it.

• 50 $ to maintain it.

100$ total

• The only thing in Cyber Space that is getting MORE expensive & LESS reliable

Solution so far:

• Write fewer lines: High level languages

• Non Procedural

• 10x not 1,000x better

• Application generators: Web sites, Databases, ...

• Semi-custom apps: SAP, PeopleSoft,..

• Scripting & Objects: JavaScript & DOM

• Very domain specific

Automatic Programming

Do What I Mean

(not 100$ Line of code!, no programming bugs)

The holy grail of programming languages & systems

12. Devise a specification language or UI

1. That is easy for people to express designs (1,000x easier),

2. That computers can compile, and

3. That can describe all applications (is complete).

• System should “reason” about application

– Ask about exception cases.

– Ask about incomplete specification.

– But not be onerous.

• This already exists in domain-specific areas.

(i.e. 2 out of 3 already exist)

• An imitation game for a programming staff.

Outline

• PennySort history and Award

• The need for long-range research

• Some long-range systems research goals

• What I have been doing

What I Have Been Doing

• Traveling & Talking

• Helping Alex Build the SkyServer

• Loading data

• Helping build the Virtual Observatory

• Doing spatial geometry in SQL (no kidding)!

• Learning about web services

(and implementing some)
