SciDB

Jacek Becla
SLAC National Accelerator Laboratory
Outline
• LSST (challenges, baseline)
• SciDB
• Summary
LSST Database Challenges
• Scale
– ~50 billion astronomical objects
– ~1,000 observations per object
– At least 1 release per year
– 30-50 PB catalog
• Query complexity
– Spatial and temporal correlations
Database Challenges (II)
• Near-live updates: ~1/3 of the Object Catalog every 24 h
• Provenance tracking, virtual data
• Would be nice…
– Better APIs
– Tighter integration with images
– Better support of data uncertainty
Baseline Architecture
• Shared nothing MPP database
– On top of open source DBMS (such as MySQL)
• Large tables partitioned
– With overlaps!
• Shared scans for high volume queries
• Heavily indexed for low-latency low-volume
queries
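A shared scan can be sketched as follows: several high-volume queries ride along on one sequential pass over a partition instead of each triggering its own scan. This is a minimal illustration only; the `shared_scan` helper, the row layout and the predicates are assumptions made for the example, not part of the LSST baseline code.

```python
def shared_scan(rows, queries):
    """Evaluate every query predicate during a single pass over `rows`.

    `queries` maps a (hypothetical) query name to a predicate on one row.
    """
    results = {name: [] for name in queries}
    for row in rows:                      # the table is read exactly once
        for name, predicate in queries.items():
            if predicate(row):
                results[name].append(row)
    return results


# Toy partition: (object_id, ra, magnitude)
partition = [(1, 10.0, 21.5), (2, 11.0, 18.0), (3, 12.5, 24.0)]
queries = {
    "bright": lambda r: r[2] < 20.0,
    "ra_window": lambda r: 10.5 <= r[1] <= 13.0,
}
results = shared_scan(partition, queries)
print(results["bright"])     # [(2, 11.0, 18.0)]
print(results["ra_window"])  # [(2, 11.0, 18.0), (3, 12.5, 24.0)]
```

The I/O cost is paid once per pass regardless of how many queries are attached, which is the point of the technique for full-table scans.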
Qserv Architecture
Deploying for wide use by LSST science collaborations on a ~10 TB data set this year
Qserv Highlights
• Distributed processing (map) and result combine
(reduce) use off-the-shelf DBMS
• Built for analyses on immutable data sets
• Overlapping partitioning, fixed chunks,
1st level materialized, 2nd on the fly
• Have working prototype
– Alpha state
– Large scale tests (100+ nodes) underway
– Shared scans planned for Q1 next year
– Beta planned mid next year
– All open source
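The map / result-combine split can be sketched as follows. This is not Qserv code: the chunk layout, the `make_chunk`, `run_on_chunk` and `average_flux` helpers, and the use of in-memory SQLite as the off-the-shelf DBMS are all assumptions made for the example.

```python
import sqlite3


def make_chunk(rows):
    """Create an in-memory SQLite database holding one chunk of an Object table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Object (id INTEGER, ra REAL, flux REAL)")
    db.executemany("INSERT INTO Object VALUES (?, ?, ?)", rows)
    return db


def run_on_chunk(db, ra_min, ra_max):
    """Map step: a worker computes a partial (count, sum) with plain SQL."""
    return db.execute(
        "SELECT COUNT(*), COALESCE(SUM(flux), 0.0) FROM Object "
        "WHERE ra BETWEEN ? AND ?",
        (ra_min, ra_max),
    ).fetchone()


def average_flux(chunks, ra_min, ra_max):
    """Reduce step: combine the partial results into one global average."""
    partials = [run_on_chunk(db, ra_min, ra_max) for db in chunks]
    count = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return total / count if count else None


chunks = [
    make_chunk([(1, 10.0, 2.0), (2, 11.0, 4.0)]),
    make_chunk([(3, 12.0, 6.0), (4, 50.0, 100.0)]),
]
print(average_flux(chunks, 9.0, 13.0))  # 4.0
```

An average cannot be combined from per-chunk averages directly, so each worker returns a count and a sum; the combine step does the division.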
Open source DBMS for scientific research
Special thanks to Paul Brown and
Marilyn Matz (SciDB/Zetics) for help
with contents of the SciDB slides
Arrays Are the Basic Data Structure
CREATE ARRAY Reads
( Read::CHAR(1), Confidence::FLOAT )
[ SEQ_ID=0:*, POSITION=0:50 ]
Array has
• name - 'Reads' in this case
• element structure - a tuple of named, typed attributes
• dimensional specification - named whole number index
ranges
• Arrays can be sparse (missing values)
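For readers more used to general-purpose languages, the 'Reads' array can be mimicked in Python as below. This is only a sketch of the data model (named array, typed attribute tuple, bounded integer dimensions, sparse cells); the `ReadsArray` and `ReadCell` names are invented for the example and say nothing about how SciDB stores arrays.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ReadCell(NamedTuple):
    read: str          # Read::CHAR(1)
    confidence: float  # Confidence::FLOAT


class ReadsArray:
    # (low, high) per dimension; None stands for the unbounded '*'
    DIMS = {"SEQ_ID": (0, None), "POSITION": (0, 50)}

    def __init__(self) -> None:
        # Only populated cells are stored, so the array is sparse.
        self._cells: Dict[Tuple[int, int], ReadCell] = {}

    def _check(self, coords: Tuple[int, int]) -> None:
        for value, (name, (low, high)) in zip(coords, self.DIMS.items()):
            if value < low or (high is not None and value > high):
                raise IndexError(f"{name}={value} out of range")

    def put(self, seq_id: int, position: int, cell: ReadCell) -> None:
        self._check((seq_id, position))
        self._cells[(seq_id, position)] = cell

    def get(self, seq_id: int, position: int) -> Optional[ReadCell]:
        self._check((seq_id, position))
        return self._cells.get((seq_id, position))  # None for a missing cell


reads = ReadsArray()
reads.put(7, 0, ReadCell("A", 0.98))
print(reads.get(7, 0))  # ReadCell(read='A', confidence=0.98)
print(reads.get(7, 1))  # None
```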
Data Model
[Figure: two-dimensional array with axes Dimension 1 and Dimension 2, corner cells (0,0) and (5,2); one cell shown with attributes a1 = 3, a2 = 6.52, a3]
• Nested multi-dimensional arrays
– Cells can be tuples or other arrays
• Time is an automatically supported extra dimension
– No overwrites
• Extensible type system
• Ragged arrays allow each row/column
to have a different dimensionality
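The "no overwrites" idea, with time acting as an extra dimension, can be illustrated with a small append-only store. This is a sketch under assumptions: the `VersionedArray` class and its integer version counter are invented for the example and are not SciDB's mechanism.

```python
class VersionedArray:
    """Append-only cell store: a write adds a new version, nothing is replaced."""

    def __init__(self):
        self._history = {}  # coords -> list of (version, value), oldest first
        self._version = 0

    def write(self, coords, value):
        self._version += 1
        self._history.setdefault(coords, []).append((self._version, value))
        return self._version

    def read(self, coords, as_of=None):
        """Return the newest value, or the value visible at version `as_of`."""
        for version, value in reversed(self._history.get(coords, [])):
            if as_of is None or version <= as_of:
                return value
        return None


arr = VersionedArray()
v1 = arr.write((0, 0), 3)
v2 = arr.write((0, 0), 7)
print(arr.read((0, 0)))            # 7
print(arr.read((0, 0), as_of=v1))  # 3
```

Because old values are never destroyed, earlier states of the array stay queryable, which is what makes versioning and provenance tracking straightforward to layer on top.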
Data Model: More Good Stuff
• Support for multiple flavors of ‘null’
– Array cells can be ‘EMPTY’
– User-definable, context-sensitive treatment
• Native support for uncertain or probabilistic data
– Error bars, range data, normal probability distribution
functions
• Support for provenance and named versions
– Records lineage of derived data
Architecture
• Shared nothing cluster
– 10s–1000s of nodes
– Commodity hardware
– TCP/IP between nodes
– Linear scale-up
• Each node has a processor and storage
• Queries refer to arrays as if not
distributed
• Query planner optimizes queries for
efficient data access & processing
• Query plan runs on a node’s local
executor & storage manager
• Runtime supervisor coordinates execution
Architecture diagram (layers):
• Application Layer: Java, C++, … (language-specific UI); doesn’t require JDBC, ODBC
• Server Layer: Runtime Supervisor, Query Interface and Parser, Plan Generator
– AQL, an extension of SQL; also supports UDFs
• Storage Layer: Node 1, Node 2, Node 3, each with a Local Executor and Storage Manager
Array-oriented Storage Manager
• Optimized for both dense and
sparse array data
– Different data storage, compression,
and access
• Arrays are “chunked”
(in multiple dimensions)
• Chunks are partitioned across a collection of nodes
• Chunks have ‘overlap’ to support neighborhood operations
• No overwrite
• Replication provides efficiency and back-up
• Shared scans provide efficient I/O
• In-situ data, including netCDF and HDF5
Internals - 3 steps
1. Vertical Partitioning into multiple single-attribute arrays.
Compressed.
Internals
2. Divide the single-value array into overlapping
partitions or chunks
Internals
3. Assign chunks to physical nodes in a
massively parallel architecture
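The three steps can be walked through on a toy one-dimensional array. This is a sketch under assumptions: compression is left out, the helper names are invented, and round-robin placement is just one possible assignment policy, not necessarily the one SciDB uses.

```python
def vertical_partition(cells, attributes):
    """Step 1: split a list of attribute tuples into one array per attribute."""
    return {name: [cell[i] for cell in cells] for i, name in enumerate(attributes)}


def chunk_with_overlap(values, chunk_size, overlap):
    """Step 2: cut an array into chunks that also carry `overlap` neighbours per side."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        lo = max(0, start - overlap)
        hi = min(len(values), start + chunk_size + overlap)
        chunks.append((start, values[lo:hi]))
    return chunks


def assign_to_nodes(chunks, nodes):
    """Step 3: place chunks on nodes round-robin (an assumed policy)."""
    placement = {node: [] for node in nodes}
    for i, chunk in enumerate(chunks):
        placement[nodes[i % len(nodes)]].append(chunk)
    return placement


cells = [(i, i * 0.5) for i in range(10)]            # tuples of (A, B)
columns = vertical_partition(cells, ["A", "B"])
chunks = chunk_with_overlap(columns["A"], chunk_size=4, overlap=1)
placement = assign_to_nodes(chunks, ["Node1", "Node2"])
print(chunks)
# [(0, [0, 1, 2, 3, 4]), (4, [3, 4, 5, 6, 7, 8]), (8, [7, 8, 9])]
```

Each chunk repeats one neighbouring value from the adjacent chunks, so a window operation of radius 1 can run on a node without fetching data from another node.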
Query Language & Operators
• Array Query Language -- AQL
– Declarative SQL-like language extended for working with array data
– Not currently available: coming in next release
• Array Internal Language -- AIL
– Functional language for internal implementation of the operators
– Available now
• Array Specific Operators
– Aggregate, Apply, Filter, Join, Regrid, Subsample, et al
• Linear Algebra, Statistical, and other applied math operators
– Matrix operations, clustering, covariance, linear and logistic regression, et al
• Extensible with Postgres-style user-defined functions
– e.g., to ‘cook’ images, perform customized search, FFT, feature-detect
• Interfaces to other open source packages
– MATLAB, R, SAS
Array Query Language (AQL)
CREATE ARRAY Test_Array
< A: integer NULLS,
B: double,
C: USER_DEFINED_TYPE >
[I=0:99999,1000, 10, J=0:99999,1000, 10 ]
PARTITION OVER ( Node1, Node2, Node3 )
USING block_cyclic();
• Attribute names: A, B, C
• Index names: I, J
• Chunk size: 1000
• Overlap: 10
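What chunk size 1000 and overlap 10 mean for a single cell of Test_Array can be sketched as below. The semantics are assumptions made for illustration: each chunk is taken to be extended by 10 cells on every side, and `block_cyclic()` is modelled as plain round-robin over chunks in row-major order, which may differ from the actual SciDB policy.

```python
CHUNK = 1000
OVERLAP = 10
MAX_INDEX = 99999
NODES = ["Node1", "Node2", "Node3"]
CHUNKS_PER_DIM = (MAX_INDEX + 1) // CHUNK  # 100 chunks along I and along J


def chunk_of(i, j):
    """Chunk coordinates of the chunk that owns cell (i, j)."""
    return (i // CHUNK, j // CHUNK)


def node_of(chunk):
    """Assumed block-cyclic placement: round-robin in row-major chunk order."""
    ci, cj = chunk
    return NODES[(ci * CHUNKS_PER_DIM + cj) % len(NODES)]


def chunks_holding(i, j):
    """All chunks that store cell (i, j), including overlap copies."""
    found = set()
    for di in (-OVERLAP, 0, OVERLAP):
        for dj in (-OVERLAP, 0, OVERLAP):
            ni, nj = i + di, j + dj
            if 0 <= ni <= MAX_INDEX and 0 <= nj <= MAX_INDEX:
                found.add(chunk_of(ni, nj))
    return found


print(chunk_of(1005, 500))                # (1, 0)
print(sorted(chunks_holding(1005, 500)))  # [(0, 0), (1, 0)]
print(node_of((1, 0)))                    # Node2
```

Cell (1005, 500) sits within 10 cells of the boundary between chunks (0, 0) and (1, 0), so under these assumptions it is stored in both.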
Array Query Language (AQL)
SELECT Geo-Mean ( T.B )
FROM Test_Array T
WHERE
T.I BETWEEN :C1 AND :C2
AND T.J BETWEEN :C3 AND :C4
AND T.A = 10
GROUP BY T.I;
• Geo-Mean ( T.B ): user-defined aggregate on an attribute B in array T
• T.I BETWEEN … AND T.J BETWEEN …: subsample
• T.A = 10: filter
• GROUP BY T.I: group-by
So far as SELECT / FROM / WHERE / GROUP BY queries are
concerned, there is little logical difference between AQL and SQL
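The same subsample / filter / group-by pipeline can be written out in Python to make the logical steps explicit. This is a sketch only: the dictionary-based array, the `query` function and the `geo_mean` helper (a geometric mean, assumed to be what Geo-Mean computes) are invented for the example.

```python
import math
from collections import defaultdict


def geo_mean(values):
    """Geometric mean of positive numbers (assumed meaning of Geo-Mean)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def query(cells, c1, c2, c3, c4):
    """cells maps (I, J) -> (A, B); returns {I: geometric mean of B}."""
    groups = defaultdict(list)
    for (i, j), (a, b) in cells.items():
        in_window = c1 <= i <= c2 and c3 <= j <= c4   # subsample
        if in_window and a == 10:                     # filter
            groups[i].append(b)                       # group by I
    return {i: geo_mean(bs) for i, bs in sorted(groups.items())}


cells = {
    (0, 0): (10, 2.0),
    (0, 1): (10, 8.0),
    (0, 2): (7, 100.0),   # dropped by the filter, A != 10
    (1, 0): (10, 3.0),
    (5, 0): (10, 9.0),    # outside the I window
}
result = query(cells, 0, 1, 0, 2)
print(sorted(result))  # [0, 1]
```

For I = 0 the surviving B values are 2.0 and 8.0, whose geometric mean is 4 (up to floating-point rounding).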
Matrix Multiply
CREATE ARRAY TS_Data < A1:int32, B1:double >
[ I=0:99999,1000,0, J=0:3999,100,0 ]
10K x 4K
multiply (project (TS_Data, B1),
transpose(project (TS_Data, B1)))
Project on one attribute
• Smaller of the two arrays is replicated at all nodes
– Scatter-gather
• Each node does its “core” of the bigger array with
the replicated smaller one
• Produces a distributed answer
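The replicate-and-multiply strategy in the bullets above can be sketched in plain Python. The row-block split, the `distributed_multiply` name and the sequential loop standing in for parallel nodes are assumptions made for the example, not SciDB internals.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def distributed_multiply(big, small, n_nodes):
    """Each node multiplies its row block of `big` by a full copy of `small`."""
    rows_per_node = -(-len(big) // n_nodes)  # ceiling division
    result_blocks = []
    for node in range(n_nodes):              # would run in parallel on a cluster
        block = big[node * rows_per_node:(node + 1) * rows_per_node]
        result_blocks.append(matmul(block, small))  # `small` is the replica
    return result_blocks                     # the answer stays distributed


b1 = [[1, 2], [3, 4], [5, 6]]
blocks = distributed_multiply(b1, transpose(b1), n_nodes=2)
print(blocks)
# [[[5, 11, 17], [11, 25, 39]], [[17, 39, 61]]]
```

No node needs rows of the bigger array that it does not already hold, so the only network traffic is the one-time replication of the smaller operand.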
User-defined functions (UDFs)
• Will run in parallel
• Enables
– Operations not yet natively
supported
– Image processing
– Domain-specific custom operations
Traditional RDBMS vs Arrays
• Data model
– In many places, n-d arrays better than tables
• Simulating arrays on top of tables costs ~100x
– Locality, adjacency is natural in n-dimensional space
– Tracking uncertainty, versioning, updates, units, etc.
becomes just another dimension
• Operations
– Need array operators and parallel user-defined-functions
more than SQL
– Think regrid, smooth, transpose, matrix multiply,
curve fitting, covariance, not join
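As an example of an operator that is natural on arrays and awkward in SQL, a regrid can be sketched as block averaging. The `regrid` function below is an illustration written for this purpose, assuming "regrid" means aggregating each block of adjacent cells into one output cell; it is not the SciDB operator.

```python
def regrid(grid, block_rows, block_cols):
    """Downsample a 2-D grid by averaging each block_rows x block_cols block."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(0, n_rows, block_rows):
        out_row = []
        for c in range(0, n_cols, block_cols):
            block = [
                grid[i][j]
                for i in range(r, min(r + block_rows, n_rows))
                for j in range(c, min(c + block_cols, n_cols))
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(regrid(grid, 2, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```

Every output cell depends only on adjacent input cells, so with chunked storage the work stays local to a chunk; expressing the same thing over a table of (i, j, value) rows needs computed grouping keys and gives the engine no locality to exploit.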
Analytics in SciDB
• Advanced
– Matlab, SAS, R, S+ like functionality
• Scalable: disk based, massively parallel
– Does not fall off performance cliff
when main memory exceeded
– Add resources to do more work
• Integrated with data mgmt
– Data not cycled between db
and analytics engine
SciDB Team
• ~25 people
– including world-class database pioneers
• Mostly volunteers from academic,
science and industrial communities
SciDB POCs
• LSST (demo’ed @VLDB)
• Quantitative finance
• Genomic sequencing
• Tests with LHC Atlas tag data
Just For Fun
• Given densely populated 10K x 10K array:
CREATE ARRAY Big < A1:int32, B1:double, C1:char >
[ I=0:9999,1000,0, J=0:9999,1000,0 ]
• "What is the probability the value of "B1" increases from J-1 to J, over a small range of [ I -> I, J->J ], when the C1 is the same for the next and previous?"
apply (
  join (
    count (
      filter (
        project (
          filter (
            apply (
              join (
                subsample ( Big, 5000, 5000, 5500, 5500 ) AS Ar1,
                subsample ( Big, 5000, 4999, 5500, 5499 ) AS Ar2
              ),
              'Diff',
              Ar1.B1 - Ar2.B1
            ),
            Ar1.C1 = Ar2.C1
          ),
          'Diff'
        ),
        Diff > 0
      )
    ) AS N,
    count (
      project (
        filter (
          apply (
            join (
              subsample ( Big, 5000, 5000, 5500, 5500 ) AS Ar1,
              subsample ( Big, 5000, 4999, 5500, 5499 ) AS Ar2
            ),
            'Diff',
            Ar1.B1 - Ar2.B1
          ),
          Ar1.C1 = Ar2.C1
        ),
        'Diff'
      )
    ) AS D
  ),
  'P',
  abs(N.count) / abs(D.count)
)
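The intent of the nested query can be spelled out as a plain loop. This is a sketch for reading along, not a translation that SciDB would execute: the dictionary-based array and the `prob_increase` function are invented, and the subsample bounds are assumed to be inclusive.

```python
def prob_increase(big, i_lo, j_lo, i_hi, j_hi):
    """P(B1 at (I, J) > B1 at (I, J-1)), counting only pairs with equal C1.

    `big` maps (I, J) -> {"B1": float, "C1": str}.
    """
    n = d = 0
    for i in range(i_lo, i_hi + 1):
        for j in range(j_lo, j_hi + 1):
            cur, prev = big.get((i, j)), big.get((i, j - 1))  # the join on shifted J
            if cur is None or prev is None:
                continue
            if cur["C1"] != prev["C1"]:                       # filter on C1
                continue
            d += 1                                            # denominator D
            if cur["B1"] - prev["B1"] > 0:                    # Diff > 0
                n += 1                                        # numerator N
    return n / d if d else None


big = {
    (0, 0): {"B1": 1.0, "C1": "x"},
    (0, 1): {"B1": 2.0, "C1": "x"},   # increase, same C1
    (0, 2): {"B1": 1.5, "C1": "x"},   # decrease, same C1
    (0, 3): {"B1": 9.0, "C1": "y"},   # C1 differs, pair ignored
}
print(prob_increase(big, 0, 1, 0, 3))  # 0.5
```

The two `subsample` calls in the query play the role of `cur` and `prev` here: the second window is the first shifted by one along J, so the join lines up each cell with its left neighbour.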
Roadmap
• R0.5 available: very early stage “work-in-progress”
– Basic functionality
• iquery interpreter
• Internal query language - AIL
• Small set of operators - create, aggregate, join, filter, subsample
– Minimal ‘wiki-style’ documentation
– Breathes, crawls, hiccups
• R0.75 targeted for end-of-year
– AQL - the SQL-like array query language
– Error handling
– Scalable math operations
– Better documentation & stability
– Linux only
• R1.0 beta in April 2011
– More functionally complete (UDFs, uncertainty, provenance et al)
– Robust, high performance
SciDB and LSST
• SciDB inspired by LSST needs
– Promises to address all key challenges
– And more…
• Detailed evaluation planned (R0.75 or R1.0)
• Off-the-shelf, open source, well supported
much more attractive than maintaining a
custom-built solution over the long term
Summary
• Baseline LSST – shared nothing architecture
on top of open source DBMS
• SciDB a new promising option
– Shared nothing, scalable
– Focuses on complex analytics,
spatial and temporal correlations
– Strong team behind it
• …and rapidly growing interest!
– More at http://scidb.org