EOSDIS Alternate Architecture Study

Jim Gray
McKay Fellow, UC Berkeley
1 May 1995
gray@crl.com
1. Background - problem and proposed solution
2. What California proposed
Co-workers:
Mike Stonebraker: Producer / Director / Script Writer / Propeller Head
Bill Farrell: Ramrod and Computer-literate DirtBag
Jeff Dozier: Godfather
Special effects:
Earth Science: Frank Davis, C. Roberto Mechoso, Jim Frew
Computer Science: Reagan Moore, Jim Gray, Joe Pasquale
Administration: Claire Mosher
Writing: Stephanie Sides
Prototypes: many, many people
What’s The Problem?
Antarctica is melting -- 77% of fresh water liberated
=> sea level rises 70 meters
=> Chico & Memphis are beach front property
=> New York, Washington, SF, LA, London, Paris
Let’s study it! Mission to Planet Earth
EOS: Earth Observing System (17B$ => 10B$)
50 instruments on 10 satellites 1997-2001
Plus Landsat (added later)
EOS DIS: Data Information System:
3-5 MB/s raw, 30-50 MB/s processed.
4 TB/day, 15 PB by year 2007
Issues
How to store it?
How to serve it to users?
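A quick sanity check on those rates (a back-of-the-envelope sketch; the 10-year accumulation window and the mid-range 40 MB/s figure are my assumptions, not numbers from the study):

```python
# Back-of-the-envelope check of the data rates quoted above.
# Assumptions (mine, not the study's): 40 MB/s mid-range processed rate,
# ~10 years of accumulation, decimal units (1 PB = 10**15 bytes).
MB, TB, PB = 10**6, 10**12, 10**15

per_day = 40 * MB * 86_400                    # seconds per day
print(f"{per_day / TB:.1f} TB/day")           # ~3.5 TB/day, close to "4 TB/day"

ten_years = 4 * TB * 365 * 10                 # using the slide's 4 TB/day
print(f"{ten_years / PB:.1f} PB in 10 years") # ~14.6 PB, close to "15 PB by 2007"
```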
What Happened?
1986: Mission to Planet Earth
1989: Bids from Hughes & TRW
1993: Contract granted; public review:
customers do not want it (tape/mainframe centric)
1994: Alternate Architecture
Three “outside teams”:
Wyoming: Internet 20,000,000
Maryland: Software Engineering
California: DB centric
One “home team”: CORBA & Z 39.50 & UNIX
1995: Drifting in the Sequoia direction
The Hughes Plan
8 DAACs (Distributed Active Archive Centers) = Bytes
(one per congressional district?)
N SCFs (Scientific Computation Facilities) = MIPS
(typically instrument or science teams)
Thin wires among them
90% of DAAC processing is PUSH:
building standard data products
fixed pipeline: calibrate, grid, derive
Typical subscriber gets tapes or CDroms
(standard data products)
One “chauffeur” per 10 customers (high ops costs)
Build everything (operations, HSM, DBMS,...) from scratch
CORBA and Z 39.50 are the glue.
Criticism: not evolvable, not open, not online, not useful.
What California Proposed
0. Design for success: expect that millions will use the system (online)
1. DBMS centric design automates discovery, access, management
2. Object relational databases enable
Automate access to data so that the NASA 500,
Global Change 10,000 and Internet 20,000,000 can use the system.
Cache popular results, not all results (saves 3x or more)
Compute on demand (saves lots of storage and cpu).
Emphasize pull processing rather than push processing.
Use parallelism to get scaleup.
Do Batch as a data pump
3. Be Smart Shoppers:
Use COTS hardware/software (saves 400M$)
Just-in-time acquisition (saves 400M$)
Use workstation not mainframe technology (gives 10x more stuff)
Depreciate over 3 years (ends in 2007 with "fresh" equipment)
4. 2 + N node architecture
2 Super DAACs for fault tolerance and for growth.
Unify the 2 "big" data storage centers with 2 big data analysis centers.
Allow many “little” peer-DAACs at science/user groups
Meta-Model for Sequoia Proposal
Be technological optimists:
couldn’t build it today, count on progress.
ride technology wave (= not water cooled)
Buy or Seed, do not build.
Use COTS where possible
Fund 2 or more COTS vendors if a needed product is missing:
OR DBMS
HSM
Operations
Replace people with technology (= OR DBMS):
automate data discovery, access, visualization
DBMS Centric view.
DBMS Centric View
This is a database problem (no kidding)!
This is not
a file system problem (file is the wrong abstraction)
an RPC problem (CORBA is the wrong abstraction)
a Z 39.50 problem (Z 39.50 is a FAP).
This is an operations problem:
Hierarchical storage management
Network management
Source code control
client-server tools.
You can BUY all this stuff. Fund COTS.
BUILD AS LITTLE AS POSSIBLE
What California Proposed
0. Design for success: expect that millions will use the system (online)
1. DBMS centric design automates discovery, access, management
2. Object relational databases enable
Automate access to data so that the NASA 500,
Global Change 10,000 and Internet 20,000,000 can use the system.
Cache popular results, not all results (saves 3x or more)
Compute on demand (saves lots of storage and cpu).
Emphasize pull processing rather than push processing.
Use parallelism to get scaleup.
Do Batch as a data pump
3. Be Smart Shoppers:
Use COTS hardware/software (saves 400M$)
Just-in-time acquisition (saves 400M$)
Use workstation not mainframe technology (gives 10x more stuff)
Depreciate over 3 years (ends in 2007 with "fresh" equipment)
4. 2 + N node architecture
2 Super DAACs for fault tolerance and for growth.
Unify the 2 "big" data storage centers with 2 big data analysis centers.
Allow many “little” peer-DAACs at science/user groups
Design for Success: Expect Lots of Users
Expect that millions will use the system (online)
Three user categories:
NASA 500 -- funded by NASA to do science
Global Change 10 k - other dirt bags
Internet 20 m - everyone else
Grain speculators
Environmental Impact Reports
New applications
=> discovery & access must be automatic
Allow anyone to set up a peer-DAAC & SCF
Design for Ad Hoc queries, Not Standard Data Products
If push is 90%, then only 10% of the data is read (on average).
=> A failure: no one uses the data; in DSS, push is 1% or less.
=> Computation demand is 100x the Hughes estimate
(pull is 10x to 100x greater than push)
The Process Flow
Data arrives and is pre-processed.
instrument data is calibrated, gridded, and averaged;
geophysical data is derived.
Users ask for stored data OR to analyze and combine data.
Can make the pull-push split dynamically
[Diagram: push processing of arriving data, pull processing of user requests, plus other data sources.]
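A minimal sketch of this flow in Python. The function names (calibrate, grid, average, derive) mirror the steps above but are trivial placeholders, not EOSDIS product-generation code; the on-demand pull path shows where the push/pull split could be made dynamically.

```python
# Toy version of the process flow: push builds standard products as data
# arrives; pull builds (or fetches) a product only when a user asks for it.
def calibrate(x): return {**x, "calibrated": True}
def grid(x):      return {**x, "gridded": True}
def average(x):   return {**x, "averaged": True}
def derive(x):    return {**x, "product": "geophysical"}

def standard_product(raw: dict) -> dict:
    """Fixed pipeline: calibrate, grid, average, derive."""
    return derive(average(grid(calibrate(raw))))

def pull(store: dict, key: str, raw: dict) -> dict:
    """Pull side: compute only when someone asks, then cache the result."""
    if key not in store:
        store[key] = standard_product(raw)
    return store[key]

store = {}
print(pull(store, "snow-cover/1995-05-01", {"instrument": "AVHRR"}))
```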
The Software Model: Global View
SQL* is the FAP and API.
Applications use it to access data.
It includes:
stored procedures (so RPC)
GC class libraries
Computation is data driven
Gateways for other interfaces
HTTP, Z 39.50, CORBA & COM
TP or TP-lite manages workflow
[Diagram: client programs speak SQL* directly, or another protocol through a gateway; COTS middleware carries SQL* to the COTS local DBMS on each peer-DAAC or super-DAAC, and gateways connect to non-SQL DBMSs on peer-DAACs, super-DAACs, or non-EOSDIS systems.]
Automate access to data
Invest in:
Design global change schema.
cooperate with standards groups.
OR DBMS class libraries for GC datatypes
Develop browser to do resource discovery
Community will develop access & vis tools
OR DBMS will do
PUSH processing: triggers and workflow
PULL processing: query optimization.
(some assembly required).
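A sketch of the push side in plain Python: a trigger-like hook runs subscribed workflows when a new granule is inserted. In the proposal this would be expressed as OR-DBMS triggers and workflow; the names and the subscription mechanism here are illustrative only.

```python
# Toy "INSERT trigger": inserting a granule runs every subscribed workflow.
from collections import defaultdict

subscriptions = defaultdict(list)            # datatype -> workflow callbacks

def subscribe(datatype: str, workflow) -> None:
    subscriptions[datatype].append(workflow)

def insert_granule(datatype: str, granule: dict) -> None:
    """Plays the role of a trigger firing on INSERT."""
    for workflow in subscriptions[datatype]:
        workflow(granule)

subscribe("AVHRR", lambda g: print("build snow-cover product from", g["id"]))
insert_granule("AVHRR", {"id": "1995-121/orbit-17"})
```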
How Well Did SQL Work?
Bill Farrell and others did 30 user scenarios
schema, application, SQL, performance
Snow cover, CO2, GCM,...
Avg ad hoc scenario generated about 30% of
EOSDIS baseline processing
=> validated PULL over PUSH demand
SQL was indeed a power tool:
Many scenarios became a few simple SQL queries:
Need a spatial & temporal SQL.
Personal view:
It’s great! Much better than Farrell or I expected.
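For flavor, a hypothetical query of the spatial-and-temporal kind the scenarios called for, wrapped in Python. The table, column, and function names (snow_cover, footprint, overlaps, box, avg_snow_fraction) are invented for illustration; they are not from Farrell's scenarios.

```python
# Hypothetical spatio-temporal query of the kind the scenarios reduced to.
# Every identifier in the SQL below is illustrative, not a real schema.
query = """
SELECT week, avg_snow_fraction(image)
FROM   snow_cover
WHERE  overlaps(footprint, box(-124.4, 32.5, -114.1, 42.0))  -- roughly California
  AND  obs_time BETWEEN DATE '1973-01-01' AND DATE '1995-01-01'
GROUP BY week
ORDER BY week;
"""
print(query)   # in the proposal this would go to a COTS OR-DBMS through SQL*
```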
Compute on demand
90% of data is NEVER used (according to Hughes).
Some data is used only once.
Data is often re-calculated:
to repair hardware/software bugs,
to apply new & better algorithms.
Optimization: store only popular data.
Compute this based on past use
(of this data and related data)
Balance two costs:
1. Re_Compute_Cost / Re_Use_Interval
2. Storage_Cost x Re_Use_Interval
Recompute is often cheaper (saves 3x we think).
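One way to read that balance, as a per-reuse comparison (a sketch; the dollar figures are made-up assumptions):

```python
# Store a product only if keeping it until its next use costs less than
# recomputing it at that time. Numbers below are illustrative assumptions.
def keep_stored(recompute_cost: float,
                storage_cost_per_day: float,
                reuse_interval_days: float) -> bool:
    return storage_cost_per_day * reuse_interval_days < recompute_cost

print(keep_stored(5.00, 0.02, reuse_interval_days=30))    # True: store it
print(keep_stored(5.00, 0.02, reuse_interval_days=3650))  # False: recompute
```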
Use parallelism to get scaleup.
Many queries look at 100s or 1,000s of data tiles.
e.g. Berkeley weekly Landsat images since 1972.
= 1,000 tape accesses = 4,000 tape minutes = 6 days.
Done 1,000-way parallel: 4 minutes.
Disk & tape demands are huge: multi-GOX
Computation demands are huge: tera-ops.
Only solution:
Use parallel execution
Use parallel data access
SQL* does this for you automatically.
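A toy version of that scaleup: 1,000 tile reads farmed out to a worker pool. Eight local processes stand in for a 1,000-node farm, and the per-tile work is a placeholder rather than real Landsat processing.

```python
# 4,000 tape minutes spread over 1,000 parallel accesses is ~4 minutes.
from multiprocessing import Pool

TILES = 1_000
MINUTES_PER_TILE = 4_000 / TILES             # 4 tape minutes per tile

def process_tile(tile_id: int) -> float:
    return float(tile_id % 7)                # stand-in for read + summarize

if __name__ == "__main__":
    print(f"serial: {TILES * MINUTES_PER_TILE:.0f} tape minutes")
    print(f"{TILES}-way parallel: about {MINUTES_PER_TILE:.0f} minutes")
    with Pool(processes=8) as pool:          # 8 workers stand in for 1,000 nodes
        total = sum(pool.map(process_tile, range(TILES)))
    print("toy result:", total)
```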
Data Pump
Compute small jobs on demand:
less than 1,000 tape mounts
less than 100 M disk accesses
less than 100 TeraOps.
(less than 30 minute response time)
For BIG JOBS, scan the entire 15 PB database once a day/week.
Any BIG JOB can piggyback on this data scan.
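A sketch of the piggyback idea: big jobs register themselves, then one sequential pass over the archive feeds all of them at once. The archive and the job callbacks are illustrative stand-ins.

```python
# One daily/weekly scan of the archive; every registered big job rides along.
big_jobs = []                                 # callbacks that want every object

def register(job) -> None:
    big_jobs.append(job)

def pump(archive) -> None:
    """Single sequential pass; all queued big jobs piggyback on it."""
    for obj in archive:
        for job in big_jobs:
            job(obj)

register(lambda obj: None)                    # e.g. a global-trend job (placeholder)
register(lambda obj: None)                    # e.g. a recalibration job (placeholder)
pump({"id": i} for i in range(10_000))        # toy 10,000-object archive
```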
DAAC in 2007:
500 nodes
10 TB RAM
1 PB of disk (10,000 drives)
15 PB of tape robots (1,000 robots)
What California Proposed
0. Design for success: expect that millions will use the system (online)
1. DBMS centric design automates discovery, access, management
2. Object relational databases enable
Automate access to data so that the NASA 500,
Global Change 10,000 and Internet 20,000,000 can use the system.
Cache popular results, not all results (saves 3x or more)
Compute on demand (saves lots of storage and cpu).
Emphasize pull processing rather than push processing.
Use parallelism to get scaleup.
Do Batch as a data pump
3. Be Smart Shoppers:
Use COTS hardware/software (saves 400M$)
Just-in-time acquisition (saves 400M$)
Use workstation not mainframe technology (gives 10x more stuff)
Depreciate over 3 years (ends in 2007 with "fresh" equipment)
4. 2 + N node architecture
2 Super DAACs for fault tolerance and for growth.
Unify the 2 "big" data storage centers with 2 big data analysis centers.
Allow many “little” peer-DAACs at science/user groups
Use COTS hardware/software (saves 400M$)
Defense contractors want to build (and maintain) stuff.
(they do it for the money)
Fund SQL* (SQL-2007): Object-Relational (extensible)
supports Global Change data types
Automates access
Reliable storage
Tertiary storage
Parallel data search (automatic)
Workflow (job control)
Reliable
Fund Operations software companies (Tivoli...)
Use workstation technology (NOW)
Use workstation hardware technology,
not Super Computers
0.5$/MB of disk vs 30$/MB of disk
100$/MIPS vs 18,000$/MIPS
3k$/tape drive vs 50k$/tape drive
Processor, Disk, Tape ARRAYS connected by ATM = a NOW
Gives 10x (?100x) more stuff for same dollars
Allows ad hoc query load
Allows a scaleable design
Allows same hardware: SuperDAACs = PeerDAACs
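The price ratios from this slide, worked out (only the slide's own numbers are used; the system-level gain is the smaller 10x claimed above):

```python
# Commodity vs. mainframe/supercomputer price ratios quoted on this slide.
ratios = {
    "disk ($/MB)":    30.0 / 0.5,       # 60x
    "cpu ($/MIPS)":   18_000 / 100,     # 180x
    "tape drive ($)": 50_000 / 3_000,   # ~17x
}
for item, ratio in ratios.items():
    print(f"{item}: {ratio:.0f}x price ratio")
```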
Use workstation technology (NOW)
Study used RS/6000 and DEC 7000 as workstation
(they are 100k$/slice).
Should have used Compaq.
Price for 20GFlop, 24 TB disk, 2PB tape TODAY
                             WS/NTP    Vector/NTP    Compaq/DLT
Max days to read archive       27.7        27.7           1
Number of data servers           18           2         100
Number of compute servers       106           4         200
Number of tape robots             8           8        1000
Data server cost ($M)          29.5        31.0           4
Disk cache cost ($M)           31.5        31.5          12
Archive tape cost ($M)         12.8        12.8           6
Compute platform cost ($M)     55.7       128.0           8
Total hardware cost ($M)      129.5       203.3          30
Compaq/DLT prices computed by Gray.
10% Peer DAAC costs 3M$ today, 1% Micro DAAC (200TB) costs 300K$
Just-in-time acquisition (saves 400M$)
Hardware prices decline 20%-40%/year
So buy at last moment
Buy best product that day: commodity
Depreciate over 3 years so that facility is fresh.
(after 3 years, cost is 23% of original). 60% decline peaks at 10M$
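The depreciation arithmetic (a sketch, using the 40% per-year decline assumed in the chart below; it lands near the ~23% residual quoted above):

```python
# With prices falling ~40%/year, equipment bought today costs only about a
# fifth of its price after three years -- so buy late, depreciate over 3 years.
decline = 0.40                       # per-year price decline assumed in the chart
residual = (1 - decline) ** 3
print(f"price after 3 years: {residual:.0%} of today's")   # ~22%
```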
EOS DIS Disk Storage Size and Cost
(assume 40% price decline/year and 10% on disk)
[Chart: data need (TB) and storage cost (M$) on log scales, 1994-2008.]
What California Proposed
0. Design for success: expect that millions will use the system (online)
1. DBMS centric design automates discovery, access, management
2. Object relational databases enable
Automate access to data so that the NASA 500,
Global Change 10,000 and Internet 20,000,000 can use the system.
Cache popular results, not all results (saves 3x or more)
Compute on demand (saves lots of storage and cpu).
Emphasize pull processing rather than push processing.
Use parallelism to get scaleup.
Do Batch as a data pump
3. Be Smart Shoppers:
Use COTS hardware/software (saves 400M$)
Just-in-time acquisition (saves 400M$)
Use workstation not mainframe technology (gives 10x more stuff)
Depreciate over 3 years (ends in 2007 with "fresh" equipment)
4. 2 + N node architecture
2 Super DAACs for fault tolerance and for growth.
Unify the 2 "big" data storage centers with 2 big data analysis centers.
Allow many “little” peer-DAACs at science/user groups
2+N DAAC architecture
2 Super-DAACs: two BIG sites which
Each stores ALL the data (they back each other up)
no other way to archive these 15 PB databases
Each services 1/2 the queries and runs a data pump
Each produces 1/2 the standard data products
Each has a BIG MIP farm next to the Byte farm
(an SCF, science computation facility).
N Peer-DAACs
Each stores part of the data (got from a super DAAC)
Can be NASA sponsored or private.
Same software and hardware as Super-DAACs
Super-DAACs are “banks” (careful); Peer-DAACs are “pubs” (anything goes).
Minimize Operations Costs
Reduced sites (DAACs) have reduced costs
Use Mosaic, Email, Telephone user support model
Count on vendors to provide:
Network management (NetView & SNMP)
Data replication
Application software version control
Workflow control
Help desk software
More reliable hardware/software
Unify data storage centers with data analysis
Data analysis (Science Computation Facilities)
need quick & high bandwidth access to DB.
WAN technology is good but not that good.
WAN technology is not free.
=> Co-Locate DAACs and SCFs.
=> two super SCFs, many peer SCFs.
Instrument teams often find a bug or new algorithm
=> reprocess all the base data to make new data set.
=> ripple effect to data consumers
=> must track data lineage.
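A minimal lineage-tracking sketch: record each product's inputs and algorithm version so that when an instrument team reprocesses base data, every downstream product that must also be redone can be found. All names here are illustrative.

```python
# Product id -> its inputs and the algorithm version that produced it.
lineage = {}

def record(product: str, inputs: list, algorithm: str) -> None:
    lineage[product] = {"inputs": list(inputs), "algorithm": algorithm}

def downstream(changed: str) -> set:
    """Everything derived, directly or indirectly, from a changed input."""
    hit, frontier = set(), {changed}
    while frontier:
        frontier = {p for p, meta in lineage.items()
                    if p not in hit and frontier & set(meta["inputs"])}
        hit |= frontier
    return hit

record("L1B/AVHRR/1995-121", ["L0/AVHRR/1995-121"], "calibrate-v2")
record("snow-cover/1995-121", ["L1B/AVHRR/1995-121"], "snowmap-v7")
print(downstream("L0/AVHRR/1995-121"))   # both derived products need redoing
```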
Budget
We had a VERY difficult time discovering a budget.
So we did our own.
It was less.
Big savings in operations and development
Hardware savings could give bigger DAACs
10-year costs (M$):
                Hughes   Sequoia
  COTS             108        52
  Development      207        50
  Operations       260       120
  Other            193       100
  Total            766
  SCFs             312
  DAACs            158
  SuperTotal      1236
[Bar chart: 10-year costs (M$, 0-800 scale) for Hughes vs Sequoia, broken into COTS, Development, Operations, and Other.]
What California Proposed
0. Design for success: expect that millions will use the system (online)
1. DBMS centric design automates discovery, access, management
2. Object relational databases enable
Automate access to data so that the NASA 500,
Global Change 10,000 and Internet 20,000,000 can use the system.
Cache popular results, not all results (saves 3x or more)
Compute on demand (saves lots of storage and cpu).
Emphasize pull processing rather than push processing.
Use parallelism to get scaleup.
Do Batch as a data pump
3. Be Smart Shoppers:
Use COTS hardware/software (saves 400M$)
Just-in-time acquisition (saves 400M$)
Use workstation not mainframe technology (gives 10x more stuff)
Depreciate over 3 years (ends in 2007 with "fresh" equipment)
4. 2 + N node architecture
2 Super DAACs for fault tolerance and for growth.
Unify the 2 "big" data storage centers with 2 big data analysis centers.
Allow many “little” peer-DAACs at science/user groups
Challenging Problems
Design the Global Change Schema
Understand data lineage
Build discovery, analysis, visualization tools
Build an OR DBMS
Including: distributed, parallel, workflow,
lazy-eager evaluation, tertiary storage, SQL
Build a decent & reliable HSM
Build a way to operate a 1,000 node NOW.