What Happens When Processing, Storage, and Bandwidth Are Free and Infinite?
Jim Gray
Microsoft Research
Outline

Clusters of Hardware CyberBricks
– all nodes are very intelligent
– Processing migrates to where the power is
• Disk, network, and display controllers each have a full-blown OS
• Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
• Computer is a federated distributed system.

Software CyberBricks
– standard way to interconnect intelligent nodes
– needs execution model
– needs parallelism
When Computers & Communication Are Free


The traditional computer industry is a $0B/year business
All the costs are in
– Content (good)
– System management (bad)
• A vendor claims it costs $8/MB/year to manage disk storage.
– => A WebTV (1 GB drive) costs $8,000/year to manage!
– => A 10 PB database costs $80 billion/year to manage!
• Automatic management is ESSENTIAL
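The vendor's figure scales linearly with capacity, so both examples above are one multiplication. A minimal Python sketch, assuming the quoted $8/MB/year rate and decimal units (1 GB = 1,000 MB, 1 PB = 10^9 MB); the helper name is hypothetical:

```python
# Sketch: yearly storage-management cost at a flat per-megabyte rate.
# Assumptions: the $8/MB/year vendor figure quoted above, and decimal units.
MB_PER_GB = 10**3
MB_PER_PB = 10**9
DOLLARS_PER_MB_YEAR = 8


def yearly_management_cost(megabytes: int, rate: int = DOLLARS_PER_MB_YEAR) -> int:
    """Dollars per year to manage `megabytes` of disk at `rate` $/MB/year."""
    return megabytes * rate


webtv = yearly_management_cost(1 * MB_PER_GB)     # one 1 GB WebTV drive
big_db = yearly_management_cost(10 * MB_PER_PB)   # one 10 PB database
print(f"WebTV:    ${webtv:,}/year")
print(f"10 PB DB: ${big_db:,}/year")
```

It prints $8,000/year for the WebTV drive and $80,000,000,000/year for the 10 PB database, matching the slide.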

In the meantime…
1980 Rule of Thumb


You need a systems programmer per MIPS
You need a data administrator per 10 GB
One Person per MegaBuck





1 breadbox ~ 5x a 1987 machine room
48 GB is hand-held
One person does all the work: hardware, OS, net, DB, and app expert
Cost/tps is 1,000x less
25 micro-dollars per transaction
A megabuck buys 40 of these!!!
– 4 x 200 MHz CPUs
– 1/2 GB DRAM
– 12 x 4 GB disks
– 3 x 7 x 4 GB disk arrays
All God’s Children Have Clusters!
Buying Computing By the Slice

People are buying computers by the dozens
– Computers cost only $1K per slice!

Clustering them together
A cluster is a cluster is a cluster


It’s so natural, even mainframes cluster!
Looking closer at usage patterns, a few models emerge
Looking closer at sites, you see
– hierarchies
– bunches
– functional specialization
“Commercial” NT Clusters

16-node Tandem Cluster
– 64 cpus
– 2 TB of disk
– Decision support

45-node Compaq Cluster
– 140 cpus
– 14 GB DRAM
– 4 TB RAID disk
– OLTP (Debit Credit)
• 1 B tpd (14 k tps)
Tandem Oracle/NT




– 27,383 tpmC
– $71.50/tpmC
– 4 x 6 cpus
– 384 disks = 2.7 TB
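TPC-C reports price/performance as the price of the priced system divided by its tpmC rating, so the two figures above imply the system's total price. A small Python check using only the slide's numbers, assuming decimal terabytes for the per-disk figure:

```python
TPMC = 27_383             # throughput, TPC-C transactions per minute
DOLLARS_PER_TPMC = 71.50  # reported price/performance
DISKS = 384
TOTAL_BYTES = 2.7e12      # 2.7 TB, decimal units assumed

# $/tpmC = total system price / tpmC, so multiply back to recover the price.
total_price = TPMC * DOLLARS_PER_TPMC
bytes_per_disk = TOTAL_BYTES / DISKS

print(f"implied system price: ${total_price:,.2f}")
print(f"capacity per disk: {bytes_per_disk / 1e9:.2f} GB")
```

The implied price is about $1.96M, and 2.7 TB spread over 384 disks is about 7 GB per disk.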
Microsoft.com: ~150x4 nodes
[Figure: The Microsoft.com site, diagram dated 12/15/97. Server groups in Building 11 (staging, log processing, internal WWW, FTP, live SQL, SQL consolidators, and SQL reporting), at MOSWest, and in the Japan and European data centers serve www, home, register, premium, support, search, activex, and cdm.microsoft.com, register.msn.com, msid.msn.com, and FTP and HTTP downloads. A typical node is 4xP5 or 4xP6 with 256 MB to 1 GB RAM and 12 to 180 GB of disk, costing $24K to $128K. The groups are linked by FDDI rings (MIS1 to MIS4), primary and secondary Gigaswitches, switched Ethernet (100 Mb/s), and DS3 (45 Mb/s) and OC3 Internet lines. All servers in Building 11 are accessible from corpnet.]
HotMail: ~400 Computers
Inktomi (HotBot), WebTV: > 200 nodes

Inktomi: ~250 UltraSparcs
– web crawl
– index crawled web and save index
– return search results on demand
– track ads and click-thrus
– ACID vs BASE (basic Availability, Serialized Eventually)
WebTV
– ~200 UltraSparcs
• Render pages, provide email
– ~4 Network Appliance NFS file servers
– A large Oracle app tracking customers
Loki: Pentium Clusters for Science
http://loki-www.lanl.gov/
16 Pentium Pro processors
x 5 Fast Ethernet interfaces
+ 2 GB RAM
+ 50 GB disk
+ 2 Fast Ethernet switches
+ Linux
= 1.2 real Gflops for $63,000
(but that is the 1996 price)
Beowulf project is similar
http://cesdis.gsfc.nasa.gov/pub/people/becker/beowulf.html

Scientists want cheap mips.
Your Tax Dollars At Work
ASCI for Stockpile Stewardship




Intel/Sandia: 9000 x 1 node PPro
LLNL/IBM: 512 x 8 PowerPC (SP2)
LANL/Cray: ?
Maui Supercomputer Center
– 512 x 1 SP2
Berkeley NOW (network of workstations) Project
http://now.cs.berkeley.edu/

105 nodes
– Sun UltraSparc 170, 128 MB, 2 x 2 GB disk
– Myrinet interconnect (2 x 160 MBps per node)
– SBus (30 MBps) limited





GLUNIX layer above Solaris
Inktomi (HotBot search)
NAS Parallel Benchmarks
Crypto cracker
Sort 9 GB per minute
Wisconsin COW



40 UltraSparcs
64 MB + 2 x 2 GB disk + Myrinet
SunOS
Used as a compute engine
Andrew Chien’s JBOB
http://www-csag.cs.uiuc.edu/individual/achien.html






48 nodes
– 36 HP Kayak boxes: 2 PIIs, 128 MB, 1 disk
– 10 Compaq Workstation 6000s: 2 PIIs, 128 MB, 1 disk
32 Myrinet- & 16 ServerNet-connected
Operational
All running NT
NCSA Cluster

The National Center for Supercomputing Applications, University of Illinois at Urbana
500 Pentium cpus, 2K disks, SAN
Compaq + HP + Myricom
A supercomputer for $3M
Classic Fortran/MPI programming
NT + DCOM programming model




The Bricks of Cyberspace
4B PCs (1 Bips, 0.1 GB DRAM, 10 GB disk, 1 Gbps net; B = G)
Cost $1,000
Come with
– NT
– DBMS
– High-speed net
– System management
– GUI / OOUI
– Tools
Compatible with everyone else
=> CyberBricks

Super Server: 4T Machine

Array of 1,000 4B machines
– 1 Bips processors
– 1 BB DRAM
– 10 BB disks
– 1 Bbps comm lines
– 1 TB tape robot
A few megabucks
[Figure: a CyberBrick, a 4B machine: CPU, 5 GB RAM, 50 GB disc.]
Challenge:
– Manageability
– Programmability
– Security
– Availability
– Scalability
– Affordability
– As easy as a single system
Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work
Cluster Vision
Buying Computers by the Slice

Rack & Stack
– Mail-order components
– Plug them into the cluster

Modular growth without limits
– Grow by adding small modules

Fault tolerance:
– Spare modules mask failures

Parallel execution & data search
– Use multiple processors and disks

Clients and servers made from the same stuff
– Inexpensive: built with commodity CyberBricks
Nostalgia: Behemoth in the Basement
Today’s PC is yesterday’s supercomputer
Can use LOTS of them
Main apps changed:
– scientific -> commercial -> web
– Web & transaction servers
– Data mining, web farming
SMP -> nUMA: BIG FAT SERVERS


Directory-based caching lets you build large SMPs
Every vendor is building a HUGE SMP
– 256 way
– 3x slower remote memory
– 8-level memory hierarchy
• L1, L2 cache
• DRAM
• remote DRAM (3, 6, 9,…)
• Disk cache
• Disk
• Tape cache
• Tape
Needs
– 64-bit addressing
– nUMA-sensitive OS
• (not clear who will do it)
Or hypervisor
– like IBM LSF,
– Stanford Disco
www-flash.stanford.edu/Hive/papers.html
You get an expensive cluster-in-a-box with a very fast network
Great Debate: Shared What?
Shared Memory (SMP): easy to program, difficult to build, difficult to scale
– SGI, Sun, Sequent
Shared Disk
– VMScluster, Sysplex
Shared Nothing (network): hard to program, easy to build, easy to scale
– Tandem, Teradata, SP2, NT
NUMA blurs the distinction, but has its own problems
Thesis
Many little beat few big
[Figure: price classes from mainframe ($1 million) through mini ($100K) and micro ($10K) to nano and pico processors; disc form factors shrinking from 14" through 9", 5.25", 3.5", and 2.5" to 1.8"; and a storage hierarchy of 10 picosecond RAM (1 MB), 10 nanosecond RAM (100 MB), 10 microsecond RAM (10 GB), 10 millisecond disc (1 TB), and 10 second tape archive (100 TB).]
Smoking, hairy golf ball (the pico processor)
– 1 M SPECmarks, 1 TFLOP
– 10^6 clocks to bulk RAM
– Event horizon on chip
– VM reincarnated
– Multi-program cache, on-chip SMP
How to connect the many little parts?
How to program the many little parts?
Fault tolerance?
A Hypothetical Question
Taking things to the limit

Moore’s law 100x per decade:
– Exa-instructions per second in 30 years
– Exa-bit memory chips
– Exa-byte disks

Gilder’s Law of the Telecosm:
3x/year more bandwidth
60,000x per decade!
– 40 Gbps per fiber today
Gilder’s Telecosm Law:
3x bandwidth/year for 25 more years

Today:
– 10 Gbps per channel
– 4 channels per fiber: 40 Gbps
– 32 fibers/bundle = 1.2 Tbps/bundle



In lab 3 Tbps/fiber (400 x WDM)
In theory 25 Tbps per fiber
1 Tbps = USA 1996 WAN bisection bandwidth
1 fiber = 25 Tbps
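The per-decade and per-bundle figures above follow from compounding and multiplication. A quick Python check using only the numbers on the slide:

```python
def compound(factor: float, years: int) -> float:
    """Growth multiple after sustaining `factor` per year for `years` years."""
    return factor ** years


per_decade = compound(3, 10)            # Gilder: 3x/year for a decade
print(f"3x/year for 10 years: {per_decade:,}x")

# Today's fiber, from the per-channel figures on the slide.
gbps_per_channel = 10
channels_per_fiber = 4
fibers_per_bundle = 32
gbps_per_fiber = gbps_per_channel * channels_per_fiber        # 40 Gbps
tbps_per_bundle = gbps_per_fiber * fibers_per_bundle / 1000   # 1.28 Tbps
print(f"{gbps_per_fiber} Gbps/fiber, {tbps_per_bundle} Tbps/bundle")
```

3^10 = 59,049, which the slide rounds to 60,000x; 32 fibers at 40 Gbps each is 1.28 Tbps, which the slide gives as 1.2 Tbps.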
Networking
BIG!! Changes coming!

Technology
– 10 GBps bus “now”
– 1 Gbps links “now”
– 1 Tbps links in 10 years
– Fast & cheap switches

CHALLENGE
– reduce the software tax on messages
– Today: 30 K instructions + 10 instructions/byte
– Goal: 1 K instructions + 0.01 instructions/byte
Standard interconnects
– processor-processor
– processor-device (=processor)
Deregulation WILL work someday
=> Best bet:
– SAN/VIA
– Smart NICs
– Special protocol
– User-level net IO (like disk)
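The CHALLENGE bullets describe a cost model: a fixed number of host instructions per message plus a per-byte charge. A minimal Python sketch of that model; the message sizes and the 40 MB/s stream rate (the TCP/IP rate this deck quotes) are illustrative choices:

```python
# Sketch: host instructions to send one message, modeled as the slide's
# fixed per-message cost plus a per-byte cost.
def message_cost(nbytes: int, per_message: float, per_byte: float) -> float:
    return per_message + per_byte * nbytes


TODAY = (30_000, 10)    # 30 K instructions + 10 instructions/byte
GOAL = (1_000, 0.01)    # 1 K instructions + 0.01 instructions/byte

for size in (100, 1_000, 8_192):
    today = message_cost(size, *TODAY)
    goal = message_cost(size, *GOAL)
    print(f"{size:>5} B message: today {today:>8,.0f} ins, goal {goal:>6,.0f} ins")

# The per-byte term alone for an assumed 40 MB/s stream, expressed as the
# millions of instructions per second (MIPS) the host spends on messaging.
stream_mips_today = TODAY[1] * 40e6 / 1e6
stream_mips_goal = GOAL[1] * 40e6 / 1e6
print(stream_mips_today, stream_mips_goal)
```

Small messages are dominated by the fixed cost, so today's path is about 30x the goal at 100 bytes; for a 40 MB/s stream the per-byte term alone costs 400 MIPS today versus 0.4 MIPS at the goal.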
What If Networking Was as Cheap as Disk IO?

TCP/IP
– Unix/NT: 100% cpu @ 40 MBps
Disk
– Unix/NT: 8% cpu @ 40 MBps
Why the difference?
Host does TCP/IP packetizing, checksum, …, flow control, small buffers
Host bus adapter does SCSI packetizing, checksum, …, flow control, DMA
The Promise of SAN/VIA
10x better in 2 years

Today:
– wires are 10 MBps (100 Mbps Ethernet)
– ~20 MBps tcp/ip saturates 2 cpus
– round-trip latency is ~300 µs
In two years:
– wires are 100 MBps (1 Gbps Ethernet, ServerNet, …)
– tcp/ip at ~100 MBps uses 10% of each processor
– round-trip latency is 20 µs
[Chart: bandwidth, latency, and overhead, now vs. soon.]
Works in the lab today
Assumes the app uses the zero-copy Winsock2 API.
See http://www.viarch.org/
Functionally Specialized Cards

Storage, network, and display cards each have an ASIC, a P mips processor, and M MB of DRAM
– Today: P = 20 mips, M = 2 MB
– In a few years: P = 200 mips, M = 64 MB
It’s Already True of Printers
Peripheral = CyberBrick


You buy a printer
You get
– several network interfaces
– a PostScript engine
• cpu,
• memory,
• software,
• a spooler (soon)
– and… a print engine.
System On A Chip

Integrate processing with memory on one chip
– chip is 75% memory now
– 1 MB cache >> 1960 supercomputers
– 256 Mb memory chip is 32 MB!
– IRAM, CRAM, PIM, … projects abound
Integrate networking with processing on one chip
– system bus is a kind of network
– ATM, Fibre Channel, Ethernet, … logic on chip
– Direct IO (no intermediate bus)
Functionally specialized cards shrink to a chip.
All Device Controllers Will Be Cray-1s

TODAY
– Disk controller is a 10 mips RISC engine with 2 MB DRAM
– NIC is similar power
SOON
– Will become 100 mips systems with 100 MB DRAM.
They are nodes in a federation
(can run Oracle on NT in the disk controller).
Advantages
– Uniform programming model
– Great tools
– Security
– Economics (CyberBricks)
– Move computation to data (minimize traffic)
[Figure: the central processor & memory and the device controllers share a terabyte backplane.]
With Terabyte Interconnect and Supercomputer Adapters
Processing is incidental to
– Networking
– Storage
– UI
Disk controller/NIC is
– faster than device
– close to device
– can borrow device package & power
So use idle capacity for computation.
Run app in device.
Implications
Conventional
– Offload device handling to NIC/HBA
– higher-level protocols: I2O, NASD, VIA, …
– SMP and cluster parallelism is important.
Radical
– Move app to NIC/device controller
– higher-higher-level protocols: CORBA / DCOM.
– Cluster parallelism is VERY important.
How Do They Talk to Each Other?




Each node has an OS
Each node has local resources: a federation.
Each node does not completely trust the others.
Nodes use RPC to talk to each other
– CORBA? DCOM? IIOP? RMI? HTTP?
– One or all of the above.
Huge leverage in high-level interfaces.
Same old distributed system story.
[Figure: two nodes, each running applications over RPC, streams, and datagrams on VIAL/VIPL, joined by wire(s).]
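A minimal sketch of one node exporting a high-level operation over RPC, using XML-RPC over HTTP from Python's standard library as a stand-in for the CORBA/DCOM/RMI/HTTP options listed above. The node, its records, and `count_matching` are hypothetical; the point is that the request travels to the data and only the answer comes back:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical device node: it owns its data and exports a high-level call.
RECORDS = ["disk", "display", "network", "dram", "cpu"]


def count_matching(prefix: str) -> int:
    """Filter next to the data; return only the count."""
    return sum(1 for record in RECORDS if record.startswith(prefix))


def start_node() -> SimpleXMLRPCServer:
    # Port 0 lets the OS pick any free port on the loopback interface.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(count_matching, "count_matching")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


node = start_node()
host, port = node.server_address[:2]
with ServerProxy(f"http://{host}:{port}/") as remote:
    answer = remote.count_matching("d")   # an RPC over HTTP to the node
node.shutdown()
node.server_close()
print(answer)
```

The client receives the count 3 rather than the records themselves.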
Restatement
The huge clusters we saw are prototypes for this:
– a federation of functionally specialized nodes
– each node shrinks to a “point” device with embedded processing
– each node / device is autonomous
– each talks a high-level protocol
Outline

Clusters of Hardware CyberBricks
– all nodes are very intelligent
– Processing migrates to where the power is
• Disk, network, and display controllers each have a full-blown OS
• Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
• Computer is a federated distributed system.

Software CyberBricks
– standard way to interconnect intelligent nodes
– needs execution model
– needs parallelism
Software CyberBricks: Objects!


It’s a zoo
Objects and 3-tier computing (transactions)
– give natural distribution & parallelism
– give remote management!
– TP & Web: dispatch RPCs to a pool of object servers
– Components are a $1B business today!
Need a parallel & distributed computing model
The COMponent Promise

Objects are Software CyberBricks
– productivity breakthrough (plug-ins)
– manageability breakthrough (modules)
Microsoft: DCOM + ActiveX
IBM/Sun/Oracle/Netscape: CORBA + JavaBeans
Both promise
– parallel distributed execution
– centralized management of distributed systems
Both camps share key goals:
– Encapsulation: hide implementation
– Polymorphism: generic ops, key to GUI and reuse
– Uniform naming
– Discovery: finding a service
– Fault handling: transactions
– Versioning: allow upgrades
– Transparency: local/remote
– Security: who has authority
– Shrink-wrap: minimal inheritance
– Automation: easy
History and Alphabet Soup
[Figure: timeline from 1985 to 1995 showing the Open Software Foundation (OSF), X/Open, UNIX International, the Open Group, OSF DCE, the Object Management Group (OMG) with CORBA and Solaris, and NT with COM.]
Microsoft DCOM is based on OSF-DCE technology
DCOM and ActiveX extend it
The OLE-COM Experience


Macintosh had Publish & Subscribe
PowerPoint needed graphs:
– plugged MS Graph in as a component.
Office adopted OLE
– one graph program for all of Office
Internet arrived
– URLs are object references,
– Office is web-enabled right away!
Office97 is smaller than Office95 because of shared components
It works!!
Linking And Embedding
Objects are data modules; transactions are execution modules
Link: pointer to an object somewhere else
– Think URL in Internet
Embed: bytes are here
Objects may be active; they can call back to subscribers
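The link/embed distinction and the subscriber callback can be sketched as a tiny data structure. This is a hypothetical Python model, not OLE's actual interfaces; a dictionary stands in for the network that a link's URL would resolve against:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class ActiveObject:
    """An object that calls back its subscribers when it changes."""
    data: bytes
    subscribers: List[Callable[[bytes], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[bytes], None]) -> None:
        self.subscribers.append(callback)

    def update(self, data: bytes) -> None:
        self.data = data
        for callback in self.subscribers:
            callback(data)


@dataclass
class Embedded:
    data: bytes          # embed: the bytes are here, inside the document


@dataclass
class Link:
    url: str             # link: a pointer to an object somewhere else


def resolve(part: Union[Embedded, Link], store: Dict[str, ActiveObject]) -> bytes:
    """Embedded parts carry their own bytes; links are followed at read time."""
    if isinstance(part, Embedded):
        return part.data
    return store[part.url].data


GRAPH_URL = "http://example.com/graph"            # hypothetical object location
store = {GRAPH_URL: ActiveObject(b"graph v1")}    # stands in for the network
doc = [Embedded(b"title"), Link(GRAPH_URL)]

seen: List[bytes] = []
store[GRAPH_URL].subscribe(seen.append)
store[GRAPH_URL].update(b"graph v2")              # the active object calls back
rendered = [resolve(part, store) for part in doc]
print(rendered, seen)
```

The embedded part still returns its own bytes, the linked part reflects the update, and the subscriber was called back with the new value.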
The BIG Picture
Components and transactions




Software modules are objects
Object Request Broker (a.k.a. Transaction Processing Monitor) connects objects (clients to servers)
Standard interfaces allow software plug-ins
Transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated
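The all-or-nothing property can be seen with Python's built-in `sqlite3` module, whose connection context manager commits when the block succeeds and rolls back when it raises. This sketches atomicity only; an in-memory database is not durable, and the `account` table and `transfer` function are hypothetical:

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Debit src and credit dst as one atomic unit (all-or-nothing)."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)        # succeeds: both updates are kept
try:
    transfer(conn, "alice", "bob", 500)   # fails midway: the debit is undone
except ValueError:
    pass
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

After the failed transfer the balances are still alice 70, bob 30: the debit that ran before the error was rolled back with the rest of the job.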
Object Request Broker (ORB)







Orchestrates RPC
Registers servers
Manages pools of servers
Connects clients to servers
Does naming, request-level authorization
Provides transaction coordination
Direct and queued invocation
Old names:
– Transaction Processing Monitor,
– Web server,
– NetWare
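A toy sketch of several broker duties listed above (registering servers by name, keeping a pool of servers per service, connecting clients to servers, queued invocation), using Python threads and queues. `MiniBroker` and the `debit_credit` service are hypothetical and do not mirror any real ORB or TP monitor API; naming is a plain dictionary, and authorization and transaction coordination are left out:

```python
import queue
import threading
from typing import Any, Callable, Dict, List, Tuple


class MiniBroker:
    """Toy request broker: named services, a worker pool each, queued requests."""

    def __init__(self) -> None:
        self._services: Dict[str, Tuple[queue.Queue, int]] = {}
        self._threads: List[threading.Thread] = []

    def register(self, name: str, handler: Callable[..., Any], pool_size: int = 2) -> None:
        requests: queue.Queue = queue.Queue()
        self._services[name] = (requests, pool_size)
        for _ in range(pool_size):
            worker = threading.Thread(target=self._serve, args=(handler, requests), daemon=True)
            worker.start()
            self._threads.append(worker)

    @staticmethod
    def _serve(handler: Callable[..., Any], requests: queue.Queue) -> None:
        while True:
            item = requests.get()
            if item is None:              # shutdown marker: this worker exits
                return
            args, reply = item
            try:
                reply.put((True, handler(*args)))
            except Exception as exc:      # hand the failure back to the caller
                reply.put((False, exc))

    def call(self, name: str, *args: Any) -> Any:
        """Find the service by name, queue the request, block for the reply."""
        requests, _ = self._services[name]
        reply: queue.Queue = queue.Queue(maxsize=1)
        requests.put((args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def shutdown(self) -> None:
        for requests, pool_size in self._services.values():
            for _ in range(pool_size):
                requests.put(None)
        for worker in self._threads:
            worker.join()


broker = MiniBroker()
broker.register("debit_credit", lambda account, amount: f"{account}:{amount:+d}", pool_size=3)
results = [broker.call("debit_credit", f"acct{i}", i - 2) for i in range(5)]
broker.shutdown()
print(results)
```

Each call gets a private reply queue, so many client threads can share the same three-worker pool without their replies mixing.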
The OO Points So Far

Objects are software Cyber Bricks
Object interconnect standards are emerging
Cyber Bricks become Federated Systems.

Next points:


– put processing close to data
– do parallel processing.
Three Tier Computing

Clients do presentation, gather input
Clients do some workflow (Xscript)
Clients send high-level requests to ORB
ORB dispatches workflows and business objects: proxies for the client that orchestrate flows & queues
Server-side workflow scripts call on distributed business objects to execute the task
[Figure: three tiers: presentation & workflow, application objects, database.]
Transaction Processing
Evolution to Three Tier




Intelligence migrated to clients
[Figure: stages from mainframe batch processing (centralized, cards), to dumb terminals & remote job entry (green-screen 3270, TP monitor), to intelligent terminals with database backends, to workflow systems, object request brokers, and application generators (active clients, server ORB).]
Web Evolution to Three Tier
Intelligence migrated to clients (like TP)
– Character-mode clients, smart servers: WAIS, archie, Gopher, green screen
– GUI browsers, web file servers: Mosaic
– GUI plugins, web dispatchers, CGI
– Smart clients, web dispatcher (ORB), pools of app servers (ISAPI, Viper), workflow scripts at client & server: NS & IE, active clients
PC Evolution to Three Tier
Intelligence migrated to server
– Stand-alone PC (centralized)
– PC + file & print server: message per I/O (IO request, reply, disk I/O)
– PC + database server: message per SQL statement
– PC + app server: message per transaction (ActiveX client, ORB, ActiveX server, Xscript)
Why Did Everyone Go to Three Tier?

Manageability
– Business rules must be with data
– Middleware operations tools
Performance (scalability)
– Server resources are precious
– ORB dispatches requests to server pools
Technology & physics
– Put UI processing near user
– Put shared data processing near shared data
– Minimizes data moves
– Encapsulate / modularity
[Figure: three tiers: presentation & workflow, application objects, database.]
The OO Points So Far





Objects are software Cyber Bricks
Object interconnect standards are emerging
Cyber Bricks become Federated Systems.
Put processing close to data
Next point:
– do parallel processing.
Kinds of Parallel Execution
Pipeline: the output stream of any sequential program feeds the next sequential program
Partition: outputs split N ways, inputs merge M ways, and any sequential program runs on each partition
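A small Python sketch of the two forms, assuming threads as stand-ins for processors (in CPython, threads show the structure but do not speed up CPU-bound work). The stage functions are hypothetical; each is an ordinary sequential program over a stream:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


# Pipeline: each stage is an ordinary sequential program over a stream,
# and the stages are chained so records flow from one to the next.
def parse(lines: Iterable[str]) -> Iterator[int]:
    for line in lines:
        yield int(line)


def square(nums: Iterable[int]) -> Iterator[int]:
    for n in nums:
        yield n * n


def pipeline(lines: Iterable[str]) -> List[int]:
    return list(square(parse(lines)))


# Partition: split the input N ways, run the same sequential program on
# every piece concurrently, then merge the outputs into one stream.
def split(items: List[str], n: int) -> List[List[str]]:
    return [items[i::n] for i in range(n)]


def partitioned(items: List[str], n: int,
                program: Callable[[List[str]], List[int]]) -> List[int]:
    with ThreadPoolExecutor(max_workers=n) as pool:
        merged = [x for piece in pool.map(program, split(items, n)) for x in piece]
    return sorted(merged)


data = [str(i) for i in range(10)]
print(pipeline(data))
print(partitioned(data, 3, pipeline))
```

`partitioned(data, 3, pipeline)` runs the whole pipeline on each of three partitions and merges the results; sorting the merged output only makes it deterministic.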
Object Oriented Programming
Parallelism From Many Little Jobs





Gives location transparency
ORB/web/TP monitor multiplexes clients to servers
Enables distribution
Exploits embarrassingly parallel apps (transactions)
HTTP and RPC (DCOM, CORBA, RMI, IIOP, …) are the basis
Why Parallel Access To Data?
At 10 MB/s it takes 1.2 days to scan 1 terabyte
At 1,000 x parallel: a 100-second SCAN
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
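The scan times above are bandwidth arithmetic. A Python check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly even split across disks with no skew or merge cost:

```python
SECONDS_PER_DAY = 86_400
TB = 1e12   # decimal terabyte, in bytes
MB = 1e6    # decimal megabyte, in bytes


def scan_seconds(total_bytes: float, bytes_per_second: float, degree: int = 1) -> float:
    """Time to scan `total_bytes` split evenly over `degree` disks, each
    streaming at `bytes_per_second` (no skew, no merge cost)."""
    return total_bytes / (bytes_per_second * degree)


serial = scan_seconds(1 * TB, 10 * MB)
parallel = scan_seconds(1 * TB, 10 * MB, degree=1_000)
print(f"serial: {serial:,.0f} s = {serial / SECONDS_PER_DAY:.2f} days")
print(f"1,000-way parallel: {parallel:,.0f} s")
```

A serial scan takes 100,000 seconds, about 1.16 days (the slide's 1.2 days), and 1,000 disks in parallel bring that down to 100 seconds.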
Why are Relational Operators
Successful for Parallelism?
Relational data model
– uniform operators on a uniform data stream
– closed under composition
Each operator consumes 1 or 2 input streams
Each stream is a uniform collection of data
Sequential data in and out: pure dataflow
Partitioning some operators (e.g. aggregates, non-equi-join, sort, …) requires innovation
=> AUTOMATIC PARALLELISM
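Closure under composition means every operator takes streams and returns a stream, so operators plug into each other like pipes. A minimal Python sketch with generators; the tables and operator names are illustrative, and `hash_join` is one choice of two-input operator:

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def scan(table: List[Row]) -> Iterator[Row]:
    yield from table


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    return (row for row in rows if pred(row))


def project(rows: Iterable[Row], cols: List[str]) -> Iterator[Row]:
    return ({col: row[col] for col in cols} for row in rows)


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Two input streams, one output stream: build a table on right, probe with left."""
    build: Dict[object, List[Row]] = {}
    for right_row in right:
        build.setdefault(right_row[key], []).append(right_row)
    for left_row in left:
        for right_row in build.get(left_row[key], []):
            yield {**left_row, **right_row}


emp = [{"name": "ann", "dept": 1}, {"name": "bob", "dept": 2}, {"name": "cy", "dept": 1}]
dept = [{"dept": 1, "dname": "db"}, {"dept": 2, "dname": "os"}]

# Operators nest freely because each consumes and produces the same kind of stream.
plan = project(select(hash_join(scan(emp), scan(dept), "dept"),
                      lambda row: row["dname"] == "db"),
               ["name"])
result = list(plan)
print(result)
```

The plan yields the names of the employees in the db department: ann and cy.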
Database Systems “Hide” Parallelism

Automate system management via tools
– data placement
– data organization (indexing)
– periodic tasks (dump / recover / reorganize)

Automatic fault tolerance
– duplex & failover
– transactions

Automatic parallelism
– among transactions (locking)
– within a transaction (parallel execution)
SQL: a Non-Procedural Programming Language


SQL is a functional programming language: it describes the answer set.
The optimizer picks the best execution plan
– picks the data flow web (pipeline),
– degree of parallelism (partitioning)
– other execution parameters (process placement, memory, ...)
[Figure: GUI, schema, optimizer, plan, executors, rivers, and the execution planning monitor.]
Automatic Parallel Object Relational DB
SELECT image
FROM landsat
WHERE date BETWEEN 1970 AND 1990
AND overlaps(location, :Rockies)
AND snow_cover(image) > .7;
[Figure: the landsat table, with date, loc, and image columns holding temporal, spatial, and image data; rows run from 1/2/72 at 33N 120W to 4/8/95 at 34N 120W.]
Assign one process per processor/disk:
– find images with the right date & location
– analyze the image; if it is 70% snow, return it
Each process applies the date, location, & image tests and returns its answer images.
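The per-processor step above can be sketched in Python with one worker per partition. Threads stand in for processes, a precomputed `snow` fraction stands in for the `snow_cover(image)` function, and the `ROCKIES` bounding box is a hypothetical value for `:Rockies`:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # lat_min, lat_max, lon_min, lon_max (deg W)


@dataclass
class Scene:
    year: int
    lat: float      # degrees north
    lon: float      # degrees west
    snow: float     # precomputed fraction, standing in for snow_cover(image)


ROCKIES: Box = (35.0, 49.0, 104.0, 116.0)  # hypothetical box for :Rockies


def overlaps(scene: Scene, box: Box) -> bool:
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= scene.lat <= lat_max and lon_min <= scene.lon <= lon_max


def scan_partition(partition: List[Scene]) -> List[Scene]:
    """Work done next to one disk: cheap date and location tests first,
    so the costly image test runs only on the survivors."""
    return [s for s in partition
            if 1970 <= s.year <= 1990 and overlaps(s, ROCKIES) and s.snow > 0.7]


def parallel_query(partitions: List[List[Scene]]) -> List[Scene]:
    """One worker per partition; concatenate the answers."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return [s for answer in pool.map(scan_partition, partitions) for s in answer]


partitions = [
    [Scene(1972, 40.0, 106.0, 0.9), Scene(1972, 33.0, 120.0, 0.9)],
    [Scene(1995, 40.0, 106.0, 0.9), Scene(1985, 45.0, 110.0, 0.5)],
    [Scene(1980, 39.5, 105.5, 0.8)],
]
answer = parallel_query(partitions)
print([(s.year, s.lat, s.lon) for s in answer])
```

Only the 1972 and 1980 scenes pass all three tests. Putting the cheap date and location tests before the image test keeps the expensive analysis off most rows.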
Data Rivers: Split + Merge Streams
N x M data streams: N producers, M consumers
Producers add records to the river; consumers consume records from the river
Purely sequential programming.
The river does flow control and buffering,
and does partition and merge of data records
River = Split/Merge in Gamma = Exchange operator in Volcano.
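A minimal Python sketch of a river, assuming threads for producers and consumers and bounded `queue.Queue` objects for buffering and flow control. Records are split across the M consumers by a hash of a key, and each consumer merges the streams from all N producers, stopping after one end-of-stream marker per producer, then runs ordinary sequential code on its share. `run_river` is a hypothetical helper, not Gamma's or Volcano's interface:

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List

END = object()  # end-of-stream marker; each producer sends one to every consumer


def run_river(producers: List[Iterable], m: int,
              key: Callable[[object], object],
              consume: Callable[[int, List], object]) -> List[object]:
    """N producer threads split records M ways by hash(key); each of M consumer
    threads merges what arrives from all N producers, then runs `consume`."""
    queues = [queue.Queue(maxsize=64) for _ in range(m)]  # bounded: flow control
    results: List[object] = [None] * m

    def produce(records: Iterable) -> None:
        for record in records:
            queues[hash(key(record)) % m].put(record)   # split (partition)
        for q in queues:
            q.put(END)

    def consumer(i: int) -> None:
        received: List = []
        finished = 0
        while finished < len(producers):                 # merge N input streams
            record = queues[i].get()
            if record is END:
                finished += 1
            else:
                received.append(record)
        results[i] = consume(i, received)                # plain sequential code

    threads = [threading.Thread(target=consumer, args=(i,)) for i in range(m)]
    threads += [threading.Thread(target=produce, args=(p,)) for p in producers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def total_by_key(_i: int, records: List) -> Dict[str, int]:
    totals: Dict[str, int] = {}
    for k, v in records:
        totals[k] = totals.get(k, 0) + v
    return totals


producers = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
results = run_river(producers, 2, key=lambda rec: rec[0], consume=total_by_key)
merged: Dict[str, int] = {}
for part in results:
    merged.update(part)
print(merged)
```

Because every record with the same key reaches the same consumer, each consumer's totals are final and the partial results can simply be combined.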
Partitioned Execution
Spreads computation and IO among processors
[Figure: a table range-partitioned into A...E, F...J, K...N, O...S, and T...Z; a Count runs on each partition and one more Count combines their results.]
Partitioned data gives NATURAL parallelism
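A sketch of the partitioned count in Python: names are range-partitioned by first letter into the five partitions in the figure, each partition is counted independently (threads stand in for processors), and a final step sums the partial counts. The helper names are hypothetical:

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Range partitioning by first letter into the five partitions of the figure.
BOUNDARIES = ["F", "K", "O", "T"]                      # first letters of partitions 2..5
LABELS = ["A...E", "F...J", "K...N", "O...S", "T...Z"]


def partition_of(name: str) -> int:
    return bisect_right(BOUNDARIES, name[0].upper())


def range_partition(names: List[str]) -> List[List[str]]:
    parts: List[List[str]] = [[] for _ in LABELS]
    for name in names:
        parts[partition_of(name)].append(name)
    return parts


def parallel_count(parts: List[List[str]]) -> Dict[str, int]:
    """One Count per partition, run concurrently; a final Count sums them."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        local = list(pool.map(len, parts))
    counts = dict(zip(LABELS, local))
    counts["total"] = sum(local)
    return counts


names = ["Adams", "Baker", "Fox", "Gray", "Kim", "Nash", "Owen", "Smith", "Turner", "Zhu"]
counts = parallel_count(range_partition(names))
print(counts)
```

Each partition reports 2, and the combining step reports a total of 10.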
N x M way Parallelism
[Figure: partitioned data (A...E, F...J, K...N, O...S, T...Z) flows through five Join and five Sort operators into three Merge operators: partitioned and pipelined data flows.]
N inputs, M outputs, no bottlenecks.
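The sort half of that data flow can be sketched in Python: each of N inputs sorts locally and splits its output by key range into M runs, and each of the M outputs merges its N runs with `heapq.merge`. The split points are illustrative, the join stage is omitted, and the N sorts and M merges, written here as loops, are the steps that would run in parallel:

```python
import heapq
from bisect import bisect_right
from typing import List


def nxm_sort(inputs: List[List[str]], boundaries: List[str]) -> List[List[str]]:
    """N inputs -> M outputs. Each input sorts its records and splits them by key
    range into M sorted runs; output j merges run j from every input."""
    m = len(boundaries) + 1
    runs: List[List[List[str]]] = []          # runs[i][j]: input i's run for output j
    for records in inputs:                    # N independent sorts
        split: List[List[str]] = [[] for _ in range(m)]
        for record in sorted(records):
            split[bisect_right(boundaries, record)].append(record)
        runs.append(split)
    # M independent merges, each reading one sorted run from each input.
    return [list(heapq.merge(*(run[j] for run in runs))) for j in range(m)]


inputs = [["pear", "apple", "kiwi"], ["fig", "zucchini", "banana"], ["mango", "cherry"]]
outputs = nxm_sort(inputs, boundaries=["h", "p"])
print(outputs)
```

Concatenating the M outputs gives a globally sorted result, and no single operator ever sees all the data.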
Main Message

Technology trends give
– many processors and storage units
– inexpensively

To analyze large quantities of data
– sequential (regular) access patterns are 100x faster
– parallelism is 1000x faster (trades time for money)
– Relational systems show many parallel algorithms.
Summary

Clusters of Hardware CyberBricks
– all nodes are very intelligent
– Processing migrates to where the power is
• Disk, network, and display controllers each have a full-blown OS
• Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
• Computer is a federated distributed system.

Software CyberBricks
– standard way to interconnect intelligent nodes
– needs execution model
– needs parallelism