Rules of Thumb in Data Engineering Jim Gray Microsoft Storage Lunch

Rules of Thumb in Data
Engineering
Jim Gray
Microsoft Storage Lunch
10 July 2001
Gray@Microsoft.com,
http://research.Microsoft.com/~Gray/Talks/
1
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
2
Meta-Message:
Technology Ratios Matter
Price and Performance change.
If everything changes in the same way,
then nothing really changes.
If some things get much cheaper/faster than others,
then that is real change.
Some things are not changing much:



Cost of people
Speed of light
…
And some things are changing a LOT
3
Trends: Moore’s Law
Performance/Price doubles every 18 months
100x per decade
Progress in next 18 months
= ALL previous progress


New storage = sum of all old storage (ever)
New processing = sum of all old processing.
E. coli doubles every 20 minutes!
15 years ago
4
Trends:
ops/s/$ Had Three Growth Phases
1890-1945
Mechanical
Relay
7-year doubling
1945-1985
Tube, transistor,..
2.3 year doubling
1985-2000
Microprocessor
1.0 year doubling
[Figure: ops per second/$ vs. year, 1880 to 2000, log scale from 1.E-06 to 1.E+09: doubles every 7.5 years, then every 2.3 years, then every 1.0 years.]
5
So: a problem
Suppose you have a ten-year compute job on the
world’s fastest supercomputer.
What should you do?
Commit 250M$ now?
Program for 9 years
Software speedup: 2^6 = 64x
Moore’s law speedup: 2^6 = 64x
so 4,000x speedup:
spend 1M$ (not 250M$) on hardware
runs in 2 weeks, not 10 years.
Homework problem:
What is the optimum strategy?
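One way to explore the homework question is a minimal sketch that models only the Moore's-law part of the trade-off: wait some number of years, then run the job on hardware that has doubled in speed every 18 months. The function names and the decision to ignore software speedup and cost are assumptions made for illustration, not the talk's answer.

```python
import math

DOUBLING_YEARS = 1.5  # Moore's law doubling period used in the slides

def total_years(wait_years, job_years=10.0):
    """Years until the result is available: wait, then run on faster hardware."""
    speedup = 2 ** (wait_years / DOUBLING_YEARS)
    return wait_years + job_years / speedup

def best_wait(job_years=10.0):
    """Wait that minimizes total_years, from setting the derivative to zero.

    d/dw [w + T * 2^(-w/D)] = 0  gives  2^(w/D) = T * ln(2) / D.
    """
    ratio = job_years * math.log(2) / DOUBLING_YEARS
    return max(0.0, DOUBLING_YEARS * math.log2(ratio))

w = best_wait()
print(f"wait {w:.1f} years, finish after {total_years(w):.1f} years")
```

Under these assumptions a ten-year job finishes sooner if it is started a few years late than if it is started today.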
6
Disk TB Shipped per Year
Storage capacity beating Moore’s law
3 k$/TB today (raw disk)
1k$/TB by end of 2002
[Figure: disk TB shipped per year, 1988 to 2000, log scale from 1E+3 to 1E+7 with ExaByte marked; disk TB growth: 112%/y, Moore's Law: 58.7%/y.]
Moore's law: 58.70% /year
Revenue: 7.47%
TB growth: 112.30% (since 1993)
Price decline: 50.70% (since 1993)
1998 Disk Trend (Jim Porter)
http://www.disktrend.com/pdf/portrpkg.pdf.
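The growth rates on this slide can be related to doubling times with a short sketch. The helper names are illustrative; the 18-month doubling and the 112.3%/year figure come from the slides.

```python
import math

def annual_growth(doubling_months):
    """Fractional growth per year for a quantity that doubles every doubling_months."""
    return 2 ** (12 / doubling_months) - 1

def doubling_months(annual_growth_rate):
    """Months needed to double at a given fractional growth per year."""
    return 12 * math.log(2) / math.log(1 + annual_growth_rate)

print(f"{annual_growth(18):.1%}")          # Moore's law as a yearly rate
print(f"{doubling_months(1.123):.1f}")     # months for disk TB shipped to double
```

Doubling every 18 months corresponds to the 58.7%/year quoted for Moore's law, while 112.3%/year means shipped disk capacity doubles in a little under a year.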
7
Consequence of Moore’s law:
Need an address bit every 18 months.
Moore’s law gives you 2x more in 18 months.
RAM


Today we have 10 MB to 100 GB machines (24-36 bits of addressing).
In 9 years we will need 6 more bits: 30-42 bit addressing (4TB ram).
Disks


Today we have 10 GB to 100 TB file systems/DBs (33-47 bit file addresses).
In 9 years, we will need 6 more bits: 40-53 bit file addresses (100 PB files).
11
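The address-bit arithmetic can be checked with a small sketch. The function names are illustrative, and sizes are taken in binary units (1 TB = 2^40 bytes), which is an assumption; the slide's bit ranges are rounded.

```python
def address_bits(size_bytes):
    """Smallest number of address bits that can name every byte of size_bytes."""
    return max(1, (size_bytes - 1).bit_length())

def extra_bits(years, doubling_years=1.5):
    """Address bits added by Moore's law: one bit per doubling period."""
    return years / doubling_years

TB = 2 ** 40
print(extra_bits(9))         # 6.0 more bits in 9 years
print(address_bits(4 * TB))  # 42 bits for 4 TB of RAM
```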
Architecture could change this
1-level store:



System/38, AS/400 have 1-level store.
Never re-uses an address.
Needs 96-bit addressing today.
NUMAs and Clusters


Willing to buy a 100 M$ computer?
Then add 6 more address bits.
Only 1-level store pushes us beyond 64-bits
Still, these are “logical” addresses,
64-bit physical will last many years
12
Trends: Gilder’s Law:
3x bandwidth/year for 25 more years
Today:



40 Gbps per channel (λ)
12 channels per fiber (wdm): 500 Gbps
32 fibers/bundle = 16 Tbps/bundle
In lab 3 Tbps/fiber (400 x WDM)
In theory 25 Tbps per fiber
1 Tbps = USA 1996 WAN bisection bandwidth
Aggregate bandwidth doubles every 8 months!
1 fiber = 25 Tbps
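The bandwidth figures on this slide multiply out as follows; this is a minimal sketch using only the slide's numbers, with constant and function names chosen for illustration.

```python
import math

CHANNEL_GBPS = 40        # per wavelength, from the slide
CHANNELS_PER_FIBER = 12  # WDM channels per fiber
FIBERS_PER_BUNDLE = 32

fiber_gbps = CHANNEL_GBPS * CHANNELS_PER_FIBER        # the slide rounds this to 500 Gbps
bundle_tbps = fiber_gbps * FIBERS_PER_BUNDLE / 1000   # the slide rounds this to 16 Tbps

def doubling_months(growth_per_year):
    """Months to double for a quantity that grows growth_per_year times each year."""
    return 12 * math.log(2) / math.log(growth_per_year)

print(fiber_gbps, bundle_tbps, round(doubling_months(3), 1))
```

Tripling every year works out to a doubling time of roughly 8 months, which matches the slide's aggregate-bandwidth claim.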
13
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
14
How much storage do we need?
Soon everything can be recorded and indexed
Most bytes will never be seen by humans.
Data summarization, trend detection, anomaly detection are key technologies
See Mike Lesk: How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html
See Lyman & Varian: How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/
[Figure: storage scale from Kilo, Mega, Giga, Tera, Peta, Exa, Zetta up to Yotta, with markers for A Book, A Photo, Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded.]
24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
15
Storage Latency:
How Far Away is the Data?
Relative access time, storage level, and the equivalent trip:
10^9  Tape/Optical Robot  Andromeda    2,000 Years
10^6  Disk                Pluto        2 Years
100   Memory              Springfield  1.5 hr
10    On Board Cache      This Campus  10 min
2     On Chip Cache       This Room
1     Registers           My Head      1 min
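The analogy appears to scale each relative access time so that one unit equals one minute of human time. A minimal sketch under that assumption (the one-minute scale is read off the Registers row; the function name is illustrative):

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365

def human_time(relative_access_time):
    """Convert a relative access time to human time, assuming 1 unit = 1 minute."""
    minutes = relative_access_time
    if minutes >= MINUTES_PER_YEAR:
        return f"{minutes / MINUTES_PER_YEAR:,.1f} years"
    if minutes >= MINUTES_PER_HOUR:
        return f"{minutes / MINUTES_PER_HOUR:.1f} hr"
    return f"{minutes} min"

for level, ticks in [("Registers", 1), ("On Board Cache", 10), ("Memory", 100),
                     ("Disk", 10**6), ("Tape/Optical Robot", 10**9)]:
    print(f"{level:20s} {human_time(ticks)}")
```

With this scale a disk access is about two years away and a tape robot about two thousand, in line with the slide.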
16
Storage Hierarchy:
Speed & Capacity vs Cost Tradeoffs
[Figure, two panels, each plotted against Access Time (seconds) from 10^-9 to 10^3. Size vs Speed: Typical System (bytes), 10^3 to 10^15. Price vs Speed: $/MB, 10^-6 to 10^2. Points in both panels: Cache, Main, Secondary (Disc), Online Tape, Nearline Tape, Offline Tape.]
17
Disks: Today
Disk is 8GB to 160 GB
10-50 MBps
5k-15k rpm (6ms-2ms rotational latency)
12ms-7ms seek
2K$/IDE-TB, 15k$/SCSI-TB
For shared disks most time spent waiting in queue for access to arm/controller
[Figure: disk access time broken into Seek, Rotate, and Transfer, with an added Wait for shared disks.]
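The rotational latencies quoted for 5k-15k rpm disks follow from half a revolution on average. A minimal sketch, with function names and the sample seek, transfer size, and bandwidth chosen for illustration:

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm

def access_time_ms(seek_ms, rpm, transfer_bytes, bandwidth_mbps):
    """Seek + rotate + transfer time for one unqueued request, in milliseconds."""
    transfer_ms = transfer_bytes / (bandwidth_mbps * 1e6) * 1000
    return seek_ms + rotational_latency_ms(rpm) + transfer_ms

print(rotational_latency_ms(5_000))    # 6.0 ms
print(rotational_latency_ms(15_000))   # 2.0 ms
print(access_time_ms(seek_ms=7, rpm=15_000, transfer_bytes=8192, bandwidth_mbps=50))
```

For a small request the transfer term is tiny next to seek and rotation, and queueing on a shared arm adds to both.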
18
Standard Storage Metrics
Capacity:
RAM: MB and $/MB: today at 512MB and 200$/GB
Disk: GB and $/GB: today at 80GB and 70k$/TB
Tape: TB and $/TB: today at 40GB and 10k$/TB (nearline)
Access time (latency)
RAM: 100 ns
Disk: 15 ms
Tape: 30 second pick, 30 second position
Transfer rate
RAM: 1-10 GB/s
Disk: 10-50 MB/s (Arrays can go to 10GB/s)
Tape: 5-15 MB/s (Arrays can go to 1GB/s)
19
New Storage Metrics:
Kaps, Maps, SCAN
Kaps: How many kilobyte objects served per second


The file server, transaction processing metric
This is the OLD metric.
Maps: How many megabyte objects served per sec

The Multi-Media metric
SCAN: How long to scan all the data

the data mining and utility metric
And

Kaps/$, Maps/$, TBscan/$
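The three metrics can be estimated from a device's positioning time, bandwidth, and capacity. The sketch below is a simple model (one request at a time, no queueing), and the sample disk of 80 GB, 20 MBps, and 8 ms positioning time is an assumed example, not a measurement.

```python
def objects_per_second(object_bytes, positioning_s, bandwidth_Bps):
    """Objects served per second when each access pays positioning plus transfer."""
    return 1.0 / (positioning_s + object_bytes / bandwidth_Bps)

def scan_seconds(capacity_bytes, bandwidth_Bps):
    """Time to read the whole device sequentially."""
    return capacity_bytes / bandwidth_Bps

# Assumed example disk: 80 GB, 20 MBps, 5 ms seek + 3 ms rotate.
capacity, bandwidth, positioning = 80e9, 20e6, 0.008

kaps = objects_per_second(1e3, positioning, bandwidth)   # kilobyte objects per second
maps = objects_per_second(1e6, positioning, bandwidth)   # megabyte objects per second
scan = scan_seconds(capacity, bandwidth)

print(f"Kaps={kaps:.0f}  Maps={maps:.1f}  SCAN={scan / 3600:.1f} hours")
```

Dividing each result by the device price gives the Kaps/$, Maps/$, and TBscan/$ variants.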
20
Storage Ratios Changed
10x better access time
10x more bandwidth
100x more capacity
Data 25x cooler (1Kaps/20MB vs 1Kaps/500MB)
4,000x lower media price
20x to 100x lower disk price
Scan takes 10x longer (3 min vs 45 min)
RAM/disk media price ratio changed
1970-1990: 100:1
1990-1995: 10:1
1995-1997: 50:1
today ~ 100:1 (6$/GB disk, 600$/GB ram)
[Figures: Disk Performance vs Time (Capacity (GB), seeks per second, bandwidth: MB/s); Disk accesses/second vs Time (Accesses per Second); Storage Price vs Time (Megabytes per kilo-dollar, MB/k$). All are plotted by Year, 1980 to 2000.]
24
Data on Disk Can Move to RAM in 10 years
[Figure: Storage Price vs Time, Megabytes per kilo-dollar (MB/k$) by Year, 1980 to 2000; the RAM and disk curves are 100:1, or 10 years, apart.]
25
More Kaps and Kaps/$ but….
Disk accesses got much less expensive
Better disks
Cheaper disks!
But: disk arms are expensive, the scarce resource
1 hour Scan vs 5 minutes in 1990
[Figure: Kaps over time, 1970 to 2000, showing Kaps/disk and Kaps/$ on log scales. Disk shown: 100 GB, 30 MB/s.]
26
The “Absurd” 10x (=4 year) Disk
1 TB, 100 MB/s, 200 Kaps
2.5 hr scan time (poor sequential access)
1 aps / 5 GB (VERY cold data)
It’s a tape!
27
Disk vs Tape
Disk

80 GB
20 MBps
5 ms seek time
3 ms rotate latency
3$/GB for drive
3$/GB for ctlrs/cabinet
15 TB/rack

1 hour scan





Tape

40 GB
10 MBps
10 sec pick time
30-120 second seek time
2$/GB for media
8$/GB for drive+library
10 TB/rack

1 week scan





Guestimates
Cern: 200 TB
3480 tapes
2 col = 50GB
Rack = 1 TB
= 8 drives
The price advantage of tape is narrowing, and
the performance advantage of disk is growing
At 10K$/TB, disk is competitive with nearline tape.
28
How to cool disk data:
Cache data in main memory

See 5 minute rule later in presentation
Fewer-larger transfers

Larger pages (512-> 8KB -> 256KB)
Sequential rather than random access


Random 8KB IO is 1.5 MBps
Sequential IO is 30 MBps (20:1 ratio is growing)
Raid1 (mirroring) rather than Raid5
(parity).
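Why fewer, larger transfers help can be seen from a simple throughput model: every IO pays a fixed positioning cost before it transfers any bytes. In this sketch the 30 MBps bandwidth is the slide's sequential rate, and the 5.2 ms positioning time is an assumption chosen so that random 8KB IO lands near the slide's 1.5 MBps.

```python
def throughput_Bps(transfer_bytes, positioning_s=0.0052, bandwidth_Bps=30e6):
    """Effective bytes/second when each transfer pays one positioning delay."""
    return transfer_bytes / (positioning_s + transfer_bytes / bandwidth_Bps)

for size_kb in (8, 64, 256):
    mbps = throughput_Bps(size_kb * 1024) / 1e6
    print(f"{size_kb:4d} KB transfers: {mbps:5.1f} MBps")
```

As the transfer size grows, the positioning cost is amortized over more bytes and throughput approaches the sequential rate.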
31
Auto Manage Storage
1980 rule of thumb:

A DataAdmin per 10GB, SysAdmin per mips
2000 rule of thumb


A DataAdmin per 5TB
SysAdmin per 100 clones (varies with app).
Problem:

5TB is 50k$ today, 5k$ in a few years.

Admin cost >> storage cost !!!!
Challenge:

Automate ALL storage admin tasks
35
Summarizing storage rules of thumb (1)
Moore’s law: 4x every 3 years
100x more per decade
Implies 2 bits of addressing every 3 years.
Storage capacities increase 100x/decade
Storage costs drop 100x per decade
Storage throughput increases 10x/decade
Data cools 10x/decade
Disk page sizes increase 5x per decade.
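The first rules in this summary are the same growth rate stated three ways, which a few lines of arithmetic confirm. The variable names are illustrative.

```python
import math

growth_per_3_years = 4
growth_per_decade = growth_per_3_years ** (10 / 3)   # compounding 4x over 10 years
bits_per_3_years = math.log2(growth_per_3_years)     # each doubling needs one more bit

print(round(growth_per_decade))   # about 100x per decade
print(bits_per_3_years)           # 2.0 address bits every 3 years
```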
36
Summarizing storage rules of thumb (2)
RAM:Disk and Disk:Tape cost ratios are 100:1 and 3:1
So, in 10 years, disk data can move to RAM
since prices decline 100x per decade.
A person can administer a million dollars of
disk storage: that is 1TB - 100TB today
Disks are replacing tapes as backup devices.
You can’t backup/restore a Petabyte quickly
so geoplex it.
Mirroring rather than Parity to save disk arms
37
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
38
Standard Architecture (today)
System Bus
PCI Bus 1
PCI Bus 2
39
Amdahl’s Balance Laws
parallelism law: If a computation has a serial
part S and a parallel component P,
then the maximum speedup is (S+P)/S.
balanced system law: A system needs
a bit of IO per second per instruction per second:
about 8 MIPS per MBps.
memory law: α = 1: the MB/MIPS ratio (called alpha (α)) in a balanced system is 1.
IO law:
Programs do one IO per 50,000 instructions.
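The parallelism law can be written out directly. In this sketch `max_speedup` is the slide's (S+P)/S bound, and `speedup` adds the finite-processor form of the same law; both function names are illustrative.

```python
def max_speedup(serial, parallel):
    """Upper bound on speedup with unlimited processors: (S + P) / S."""
    return (serial + parallel) / serial

def speedup(serial, parallel, processors):
    """Speedup when the parallel part is split evenly over a number of processors."""
    return (serial + parallel) / (serial + parallel / processors)

print(max_speedup(1, 9))   # 10.0: a 10% serial part caps speedup at 10x
print(speedup(1, 9, 9))    # 5.0 with nine processors
```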
40
Amdahl’s Laws Valid 35 Years Later?
Parallelism law is algebra: so SURE!
Balanced system laws?
Look at tpc results (tpcC, tpcH) at http://www.tpc.org/
Some imagination needed:

What’s an instruction (CPI varies from 1-3)?
RISC, CISC, VLIW, … clocks per instruction, …

What’s an I/O?
41
TPC systems
Normalize for CPI (clocks per instruction)


TPC-C has about 7 ins/byte of IO
TPC-H has 3 ins/byte of IO
TPC-H needs ½ as many disks, sequential vs random
Both use 9GB 10 krpm disks (need arms, not bytes)
                  MHz/cpu  CPI  mips  KB/IO  IO/s/disk  Disks  Disks/cpu  MB/s/cpu  Ins/IO Byte
Amdahl                  1    1     1      6                                                   8
TPC-C=random          550  2.1   262      8        100    397         50        40            7
TPC-H=sequential      550  1.2   458     64        100    176         22       141            3
42
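The derived columns of the TPC table can be recomputed from the measured ones. This is a sketch of that arithmetic; the function name and the use of decimal megabytes are assumptions.

```python
def derived_columns(mhz, cpi, kb_per_io, ios_per_disk, disks_per_cpu):
    """Return (mips, MB/s per cpu, instructions per byte of IO) for one system."""
    mips = mhz / cpi
    mbps_per_cpu = disks_per_cpu * ios_per_disk * kb_per_io / 1000
    ins_per_byte = mips / mbps_per_cpu
    return mips, mbps_per_cpu, ins_per_byte

tpc_c = derived_columns(mhz=550, cpi=2.1, kb_per_io=8, ios_per_disk=100, disks_per_cpu=50)
tpc_h = derived_columns(mhz=550, cpi=1.2, kb_per_io=64, ios_per_disk=100, disks_per_cpu=22)

print([round(x) for x in tpc_c])   # [262, 40, 7]
print([round(x) for x in tpc_h])   # [458, 141, 3]
```

Both workloads come out near Amdahl's 8 instructions per byte of IO, with the sequential workload noticeably lower.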
TPC systems: What’s alpha
(=MB/MIPS)
?
Hard to say:



Intel 32 bit addressing (= 4GB limit). Known CPI.
IBM, HP, Sun have 64 GB limit. Unknown CPI.
Look at both, guess CPI for IBM, HP, Sun
Alpha is between 1 and 6
             Mips                 Memory  Alpha
Amdahl       1                    1       1
tpcC Intel   8x262 = 2Gips        4GB     2
tpcH Intel   8x458 = 4Gips        4GB     1
tpcC IBM     24 cpus ?= 12 Gips   64GB    6
tpcH HP      32 cpus ?= 16 Gips   32 GB   2
43
Instructions per IO?
We know 8 mips per MBps of IO
So, 8KB page is 64 K instructions
And 64KB page is 512 K instructions.
But, sequential has fewer instructions/byte.
(3 vs 7 in tpcH vs tpcC).
So, 64KB page is 200 K instructions.
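The instructions-per-IO figures follow from instructions per byte times page size. A minimal sketch, assuming binary kilobytes and using an illustrative function name:

```python
def instructions_per_io(page_kb, ins_per_byte):
    """Instructions executed per IO for a given page size and instruction density."""
    return page_kb * 1024 * ins_per_byte

print(instructions_per_io(8, 8))    # 65536  (64 K) for a random 8KB page
print(instructions_per_io(64, 8))   # 524288 (512 K) at the same density
print(instructions_per_io(64, 3))   # 196608 (about 200 K) for sequential IO
```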
44
Amdahl’s Balance Laws Revised
Laws right, just need “interpretation”
Balanced System Law:
A system needs 8 MIPS/MBpsIO,
(imagination?)
but instruction rate must be measured on the workload.
Sequential workloads have low CPI (clocks per instruction),
random workloads tend to have higher CPI.
Alpha (the MB/MIPS ratio) is rising from 1 to 6.
This trend will likely continue.
One random IO per 50k instructions.
Sequential IOs are larger:
one sequential IO per 200k instructions
45
PAP vs RAP
Peak Advertised Performance vs
Real Application Performance
[Figure: data path from Disks through SCSI, PCI Bus 1 / PCI Bus 2, System Bus, and CPU up to File System and Application Data.]
             PAP                     RAP
CPU          550 x4 Mips = 2 Bips    1-3 cpi = 170-550 mips
System Bus   1600 MBps               500 MBps
PCI          133 MBps                90 MBps
SCSI         160 MBps                90 MBps
Disks        66 MBps                 25 MBps
46
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
47
Ubiquitous 10 GBps SANs
in 5 years
1 Gbps Ethernet is a reality now.
Also FiberChannel, MyriNet, GigaNet, ServerNet, ATM, …
10 Gbps x4 WDM deployed now (OC192)
3 Tbps WDM working in lab
In 5 years, expect 10x, wow!!
[Figure: bandwidth labels 5 MBps, 20 MBps, 40 MBps, 80 MBps, 120 MBps (1Gbps), 1 GBps.]
49
Networking
WANS are getting faster than LANS
G8 = OC192 = 8Gbps is “standard”
Link bandwidth improves 4x per 3 years
Speed of light (60 ms round trip in US)
Software stacks
have always been the problem.
Time = SenderCPU + ReceiverCPU + bytes/bandwidth
This has been
the problem
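The time formula shows why a faster wire alone does not help much when the software stack dominates. In the sketch below the 100 µs CPU costs per side are hypothetical placeholders, not figures from the talk; only the formula comes from the slide.

```python
def message_time_us(nbytes, bandwidth_bps, sender_cpu_us, receiver_cpu_us):
    """Time = SenderCPU + ReceiverCPU + bytes/bandwidth, in microseconds."""
    wire_us = nbytes * 8 / bandwidth_bps * 1e6
    return sender_cpu_us + receiver_cpu_us + wire_us

# Hypothetical CPU overheads of 100 µs on each side, sending 1 KB.
slow_wire = message_time_us(1024, 100e6, 100, 100)   # 100 Mbps link
fast_wire = message_time_us(1024, 1e9, 100, 100)     # 1 Gbps link

print(f"{slow_wire:.1f} µs vs {fast_wire:.1f} µs")
```

With these placeholder overheads a 10x faster wire shortens the message time by only about a quarter, because the two CPU terms are unchanged.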
50
The Promise of SAN/VIA: 10x in 2 years
http://www.ViArch.org/
Yesterday:
10 MBps (100 Mbps Ethernet)
~20 MBps tcp/ip saturates 2 cpus
round-trip latency ~250 µs
Now
Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, …
Fast user-level communication
tcp/ip ~ 100 MBps 10% cpu
round-trip latency is 15 us
1.6 Gbps demoed on a WAN
[Figure: Time µs to Send 1KB, 0 to 250 µs, for 100Mbps, Gbps, and SAN, split into sender cpu, receiver cpu, and Transmit.]
51
How much does wire-time cost?
$/Mbyte?
                    Cost     Time
Gbps Ethernet       .2µ$     10 ms
100 Mbps Ethernet   .3µ$     100 ms
OC12 (650 Mbps)     .003$    20 ms
DSL                 .0006$   25 sec
POTs                .002$    200 sec
Wireless:           .80$     500 sec

           Seat cost $/3y   Bandwidth B/s   $/MB     Time
GBpsE      2000             1.00E+08        2.E-07   0.010
100MbpsE   700              1.00E+07        7.E-07   0.100
OC12       12960000         5.00E+07        3.E-03   0.020
OC3        3132000          3.00E+06        1.E-02   0.333
T1         28800            1.00E+05        3.E-03   10.000
DSL        2300             4.00E+04        6.E-04   25.000
POTS       1180             5.00E+03        2.E-03   200.000
Wireless   ?                2.00E+03        8.E-01   500.000
seconds in 3 years: 94608000
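The $/MB and Time columns appear to be derived from the seat cost and bandwidth: spread the three-year cost over every megabyte the link could carry, and time one megabyte at full rate. A sketch of that reading (the formulas are inferred from the numbers, and the names are illustrative):

```python
SECONDS_IN_3_YEARS = 94_608_000

def dollars_per_mb(seat_cost_3y, bandwidth_Bps):
    """Three-year seat cost divided by the megabytes a saturated link carries."""
    megabytes = bandwidth_Bps * SECONDS_IN_3_YEARS / 1e6
    return seat_cost_3y / megabytes

def seconds_per_mb(bandwidth_Bps):
    """Time to move one megabyte at the link's full bandwidth."""
    return 1e6 / bandwidth_Bps

links = {"GBpsE": (2000, 1e8), "OC12": (12_960_000, 5e7), "POTS": (1180, 5e3)}
for name, (cost, bw) in links.items():
    print(f"{name:6s} {dollars_per_mb(cost, bw):.1e} $/MB  {seconds_per_mb(bw):g} s")
```

These are best-case costs, since they assume the link is kept busy for the full three years.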
53
Data delivery costs 1$/GB today
Rent for “big” customers: 300$/megabit per second per month
Improved 3x in last 6 years (!).
That translates to 1$/GB you send.
You can mail a 160 GB disk for 20$.
3x160 GB ~ ½ TB
That’s 16x cheaper
If overnight it’s 3 MBps.
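The step from 300$ per megabit per second per month to about 1$/GB can be reproduced as below. The 30-day month and a fully used link are assumptions, and the function name is illustrative.

```python
def rent_dollars_per_gb(dollars_per_mbps_month=300, days_per_month=30):
    """Cost per gigabyte sent over a rented link that is kept fully busy."""
    bytes_per_second = 1e6 / 8                        # one megabit per second
    seconds = days_per_month * 24 * 3600
    gigabytes = bytes_per_second * seconds / 1e9      # about 324 GB per month
    return dollars_per_mbps_month / gigabytes

print(f"{rent_dollars_per_gb():.2f} $/GB")
```

A link that sits idle part of the time costs proportionally more per gigabyte actually sent.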
54
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
55
The Five Minute Rule
Trade DRAM for Disk Accesses
Cost of an access (Drive_Cost / Access_per_second)
Cost of a DRAM page ( $/MB/ pages_per_MB)
Break even has two terms:
Technology term and an Economic term
$$\text{BreakEvenReferenceInterval} = \frac{\text{PagesPerMBofDRAM}}{\text{AccessPerSecondPerDisk}} \times \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofDRAM}}$$
Grew page size to compensate for changing ratios.
Now at 5 minutes for random, 10 seconds sequential
56
The 5 Minute Rule Derived
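T = TimeBetweenReferences to Page
Breakeven:
$$\frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}} = \frac{\text{DiskPrice}}{T \times \text{AccessesPerSecond}}$$
$$T = \frac{\text{DiskPrice} \times \text{PagesPerMB}}{\text{RAM\_\$\_Per\_MB} \times \text{AccessesPerSecond}}$$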
57
Plugging in the Numbers
$$\text{BreakEvenReferenceInterval} = \frac{\text{PagesPerMBofDRAM}}{\text{AccessPerSecondPerDisk}} \times \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofDRAM}}$$
            PPM/aps        disk$/Ram$      Break Even
Random      128/120 ~ 1    1000/3 ~ 300    5 minutes
Sequential  1/30 ~ .03     ~ 300           10 seconds
Trend is longer times because
disk$ not changing much,
RAM$ declining 100x/decade
5 Minutes & 10 second rule
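The break-even interval can be evaluated directly from the slide's inputs: 128 pages per MB and 120 accesses per second for random IO, 1 page per MB and 30 accesses per second for sequential IO, a 1000$ disk, and 3$/MB RAM. The function name is illustrative.

```python
def break_even_seconds(pages_per_mb, accesses_per_second, disk_price, ram_price_per_mb):
    """Reference interval at which caching a page in RAM costs the same as re-reading it."""
    technology_term = pages_per_mb / accesses_per_second
    economic_term = disk_price / ram_price_per_mb
    return technology_term * economic_term

random_io = break_even_seconds(128, 120, 1000, 3)
sequential_io = break_even_seconds(1, 30, 1000, 3)

print(f"random: {random_io / 60:.1f} minutes, sequential: {sequential_io:.0f} seconds")
```

Unrounded, the inputs give roughly 6 minutes and 11 seconds; the slide rounds both terms, which yields the 5 minute and 10 second rules.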
58
The 10 Instruction Rule
Spend 10 instructions /second to save 1 byte
Cost of instruction:
I = ProcessorCost/(MIPS x LifeTime)
Cost of byte:
B = RAM_$_Per_B/LifeTime
Breakeven:
N x I = B
N = B/I = (RAM_$_B x MIPS)/ProcessorCost
~ (3E-6x5E8)/500 = 3 ins/B for Intel
~ (3E-6x3E8)/10 = 10 ins/B for ARM
59
Trading Storage for Computation
You can spend 10 bytes of RAM to save 1
instruction/second.
Rent for Disk: 1$/GB (forever)
Processor costs 10$ to 1,000$/mips
10$ - 1,000$ for 100 Tera Ops.
So 1$/TeraOp
1 GB ~ 1 Top
1 MB ~ 1 Gop
1 KB ~ 1 Mop
Save a 1KB object on disk
if it costs more than 10 ms to compute.
60
When to Cache Web Pages.
Caching saves user time
Caching saves wire time
Caching costs storage
Caching only works sometimes:
New pages are a miss
Stale pages are a miss
61
Web Page Caching Saves People Time
Assume people cost 20$/hour (or .2 $/hr ???)
Assume 20% hit in browser, 40% in proxy
Assume 3 second server time
Caching saves people time
28$/year to 150$/year of people time
or .28 cents to 1.5$/year.
connection  cache    R_remote (seconds)  R_local (seconds)  H (hit rate)  People Savings (¢/page)
LAN         proxy    3                   0.3                0.4           0.6
LAN         browser  3                   0.1                0.2           0.3
Modem       proxy    5                   2                  0.4           0.7
Modem       browser  5                   0.1                0.2           0.5
Mobile      proxy    13                  10                 0.4           0.7
Mobile      browser  13                  0.1                0.2           1.4
62
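The savings column is consistent with hit rate times seconds saved times the cost of a person's time. The sketch below uses that inferred formula with the 20$/hour figure; the formula is a reading of the numbers rather than something the slide states, and the names are illustrative.

```python
CENTS_PER_SECOND = 20 * 100 / 3600   # 20$/hour of people time

def savings_cents_per_page(r_remote, r_local, hit_rate):
    """Expected people cost saved per page: H x (R_remote - R_local) x cost per second."""
    return hit_rate * (r_remote - r_local) * CENTS_PER_SECOND

rows = [("LAN", "proxy", 3, 0.3, 0.4), ("LAN", "browser", 3, 0.1, 0.2),
        ("Modem", "proxy", 5, 2, 0.4), ("Modem", "browser", 5, 0.1, 0.2),
        ("Mobile", "proxy", 13, 10, 0.4), ("Mobile", "browser", 13, 0.1, 0.2)]

savings = [round(savings_cents_per_page(rr, rl, h), 1) for _, _, rr, rl, h in rows]
print(savings)   # [0.6, 0.3, 0.7, 0.5, 0.7, 1.4]
```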
Web Page Caching Saves Resources
Wire cost is penny (wireless) to 100µ$ LAN
Storage is 8 µ$/mo
Breakeven: wire cost = storage rent
4 to 7 months
Add people cost: breakeven is ~ 4 years.
“cheap people” (.2$/hr) → 6 to 8 months.
A = $/10 KB download (network), B = $/10 KB storage/mo, C = People Cost of download ($)
               A        B        Time = A/B (cache storage time)   C      Time = (A+C)/B (Break Even)
Internet/LAN   1.E-04   8.E-06   18 months                          0.02   15 years
Modem          2.E-04   8.E-06   36 months                          0.03   21 years
Wireless       1.E-02   2.E-04   300 years                          0.07   >99963 years
Caching
Disk caching


5 minute rule for random IO
11 second rule for sequential IO
Web page caching:

If page will be re-referenced in
18 months: with free users
15 years: with valuable users
then cache the page in the client/proxy.
Challenge:
guessing which pages will be re-referenced
detecting stale pages (page velocity)
64
Meta-Message:
Technology Ratios Matter
Price and Performance change.
If everything changes in the same way,
then nothing really changes.
If some things get much cheaper/faster than others,
then that is real change.
Some things are not changing much:



Cost of people
Speed of light
…
And some things are changing a LOT
65
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
66