VO meeting
Peter Buneman
University of Edinburgh
1 July, 2003
The textbook account is that rows are stored contiguously:

Name    Age   Shoesize
Joe      11      8
Fred     56     11
Jane     35      8
Sally    25      8
Roddy    44     11
Lori     42      7

[Figure: the tuples spread over disk pages: one page holding (Lori 42 7), (Jane 35 8), (Joe 11 8), (Sally 25 8), another holding (Roddy 44 11), (Fred 56 11).]
An alternative is to store the columns contiguously:

Name    Age   Shoesize
Joe      11      8
Fred     56     11
Jane     35      8
Sally    25      8
Roddy    44     11
Lori     42      7

is stored as three column files:

Name:     Joe, Fred, Jane, Sally, Roddy, Lori
Age:      11, 56, 35, 25, 44, 42
Shoesize: 8, 11, 8, 8, 11, 7
Batory ’79; Copeland and Khoshafian ’85; Boncz, Wilschut and Kersten ’98;
Ailamaki, DeWitt, Hill and Skounakis ’01.
• Projections come for "free". E.g.
  SELECT SingleField
  FROM FiftyColumnTable
  requires 2% of the I/O.
• The received wisdom on joins is that they take 3-5 scans through the table. We need only scan the join and "output" fields.
• The storage per column is smaller. Conventional techniques use "sparse" placement on pages: roughly 3 × the storage used for a textual dump.
• Much better chance of getting entire columns into main memory.
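The projection argument can be made concrete with a toy sketch (Python stands in for the storage engine; the data is the example table above):

```python
# Toy sketch: the same table stored row-wise and column-wise.
rows = [("Joe", 11, 8), ("Fred", 56, 11), ("Jane", 35, 8),
        ("Sally", 25, 8), ("Roddy", 44, 11), ("Lori", 42, 7)]

# Row store: whole tuples stored contiguously.
row_store = list(rows)

# Column store: one contiguous "file" per attribute.
columns = {"Name":     [r[0] for r in rows],
           "Age":      [r[1] for r in rows],
           "Shoesize": [r[2] for r in rows]}

# SELECT Shoesize FROM T: the column store reads one short vector;
# the row store must touch every tuple (hence every page).
proj_col = columns["Shoesize"]         # one sequential read
proj_row = [r[2] for r in row_store]   # full scan
assert proj_col == proj_row == [8, 11, 8, 8, 11, 7]
```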
• Bad on highly selective queries:
  SELECT *
  FROM FiftyColumnTable
  WHERE KeyValue = 123456
  requires 50 page accesses rather than one (in addition to the indexing).
• Deletions are a disaster!!
• Other transactions may be more expensive, e.g. tuple locking.
• Vertical partitioning (decomposing into "narrower" tables) is commonplace.
• Vectorized implementations have been developed:
  – Sybase IQ (?)
  – Three companies on Wall Street. Two are "shipping" products.
• Good for combining with numerical/array processing. (Order of rows may be significant.)
• Easy to add/drop columns (schema evolution)
• Keys/foreign keys are “compiled” into indexes.
• Wall Street systems provide both SQL and APL interfaces to data.
– Excellent research topic: combine array processing optimization and database query optimization.
Note that XML does not support updates and transactions (at least, we don't know how to define them).
Many scientific databases are also low-transaction and write-mostly.
At first sight XML is irregular, and it is not clear how to "vectorize" it. One possibility is to use the DTD to find table-like fragments.
Another is to use the Liefke-Suciu decomposition, originally designed for compressing XML...
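A much-simplified sketch of the Liefke-Suciu idea, separating the structure from per-path text containers (illustrative only; the actual decomposition also numbers tags, handles mixed content, and compresses each container separately):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def ls_decompose(xml_text):
    """Split a document into a text-free skeleton and one text
    container per tag path (a much-simplified LS decomposition)."""
    containers = defaultdict(list)   # tag path -> text values

    def walk(elem, path):
        path = path + "/" + elem.tag
        if elem.text and elem.text.strip():
            containers[path].append(elem.text.strip())
        return (elem.tag, [walk(child, path) for child in elem])

    skeleton = walk(ET.fromstring(xml_text), "")
    return skeleton, dict(containers)

skel, data = ls_decompose(
    "<recipe><name>Salsa</name>"
    "<ingredient><name>tomatoes</name><qty>1kg</qty></ingredient>"
    "<ingredient><name>onions</name><qty>200g</qty></ingredient>"
    "</recipe>")
# All <qty> text lands in one container, compressible on its own:
assert data["/recipe/ingredient/qty"] == ["1kg", "200g"]
```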
<recipe-book>
  <recipe>
    <contributor>Annie</contributor>
    <name>Salsa</name>
    <comment>Peter <i>loves</i> this stuff</comment>
    <ingredient><name>tomatoes</name><qty>1kg</qty></ingredient>
    <ingredient><name>onions</name><qty>200g</qty></ingredient>
  </recipe>
</recipe-book>

[Figure: the same document drawn as a tree rooted at recipe-book.]
(a) The tag map:

recipe-book → 1, recipe → 2, name → 3, contributor → 4,
comment → 5, i → 6, ingredient → 7, qty → 8

(b) The data files, one per tag path:

1/2/3 (name):              Salsa
1/2/4 (contributor):       Annie
1/2/5 (comment):           Peter, this stuff
1/2/5/6 (i):               loves
1/2/7/3 (ingredient name): tomatoes, onions
1/2/7/8 (ingredient qty):  1kg, 200g

(c) The skeleton: the tree of tag numbers with each text node replaced by a stub #:

1 ( 2 ( 3 #, 4 #, 5 ( #, 6 #, # ), 7 ( 3 #, 8 # ), 7 ( 3 #, 8 # ) ) )
The LS decomposition was designed for compression. It showed that better compression is usually obtained by compressing the "columns" individually.
Claim: it is also useful for querying.
Prior claim: it efficiently supports the SAX API.
• The data is parsed and the checks for well-formedness have already been performed. The "identity" program runs 5 times faster on LS.
• One can build a lazy SAX parser: files are not read until needed.
Simple "select/project" queries on downwards paths can be implemented with lazy SAX:
FOR $X IN DB/P
WHERE $X/A = "blahblah"
RETURN $X/B
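Such a select/project query can be mimicked with a streaming parser in a single pass; a sketch over a made-up DB/P/A/B document shape (this shows only the access pattern, not the actual LS lazy-SAX implementation):

```python
import io
import xml.etree.ElementTree as ET

def select_project(stream, match):
    """FOR $X IN DB/P WHERE $X/A = match RETURN $X/B,
    evaluated in one streaming pass: each <P> element is
    inspected as soon as it ends and then discarded."""
    results = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "P":
            if elem.findtext("A") == match:
                results.append(elem.findtext("B"))
            elem.clear()   # keep memory bounded: drop the subtree
    return results

doc = io.StringIO(
    "<DB><P><A>blahblah</A><B>keep</B></P>"
    "<P><A>other</A><B>drop</B></P></DB>")
assert select_project(doc, "blahblah") == ["keep"]
```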
All sizes in KB.

Uncompressed:
                Source   Lazy   Skeleton
Baseball           672    106         85
Shakespeare       7646    561        539

Compressed:
                Source   Lazy   Skeleton
Baseball            66     11        1.3
Shakespeare       2139     40         31
Baseball query: names of all players with ERA > 0.3 (or something)
Shakespeare query: titles of all plays in which Falstaff appears.
NB. Compression of baseball skeleton shows much more regularity than indicated by DTD.
(with Martin Grohe and Christoph Koch)
• The skeleton can be quite large.
• We’d like it to fit into main memory.
• We’d like to compress it in a “query-friendly” fashion
Idea: recognize common “subexpressions”. Essentially the same idea as is used in symbolic model checking for ordered binary decision diagrams.
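The common-subexpression idea can be sketched by bottom-up hash-consing, the same trick OBDD packages use (illustrative code; the bib/book/paper shapes are hypothetical):

```python
def minimize(tree):
    """Bottom-up hash-consing: every distinct (label, children)
    shape gets one node id, turning the tree into a minimal DAG."""
    nodes = {}   # (label, tuple of child ids) -> node id

    def intern(t):
        label, children = t
        key = (label, tuple(intern(c) for c in children))
        return nodes.setdefault(key, len(nodes))

    return intern(tree), nodes

leaf = lambda tag: (tag, [])
book = ("book", [leaf("title"), leaf("author"), leaf("author")])
paper = ("paper", [leaf("title"), leaf("author")])
bib = ("bib", [book, paper, paper])   # 11 tree nodes

root, nodes = minimize(bib)
# title, author, book, paper and bib each become one DAG node:
assert len(nodes) == 5
```

The two identical papers (and all the author leaves) collapse to single shared nodes; edge cardinalities can then record how often a shared edge repeats.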
[Figure: a bib tree containing a book (title, authors) and two identical papers (title, author), compressed in stages (a)-(c): identical subtrees are shared, and repeated edges are replaced by single edges with cardinalities such as (2) and (3).]
• In this example we have left out the “text here” stubs. In most DB examples text is always present or always absent, so the compression is no worse if they are added.
• The maximum compression (with edge cardinalities) is log(log n )
• In practice, relational databases under the vanilla XML encoding compress to a small constant size (independent of the number of tuples).
We represent a tree as a set of vertices V and a function γ : V → V* that gives the ordered sequence of children.
We require the graph of γ to be acyclic and to have a single root.
Write v →_i w if w is the i-th child of v.
A schema is a set σ of unary relation names.
An instance I of σ consists of a tree (V, γ) together with a function that associates a subset S_n of V with each name n ∈ σ.
[Figure: two σ-instances with names S_bib, S_book, S_title, S_paper, S_author: (a) on vertices v1, ..., v5 and (b) on vertices w1, ..., w7. Edge numbers indicate child order.]
A bisimilarity relation on a σ-instance I is an equivalence relation ∼ on V such that for all v, w ∈ V with v ∼ w we have:
• for all i, if v →_i v′ then there exists w′ ∈ V such that w →_i w′ and v′ ∼ w′, and
• for all n ∈ σ: v ∈ S_n ⟺ w ∈ S_n.
For any σ-instance I there is a unique (up to node relabelling) minimal instance, and it can be computed in linear time.
A σ-instance and a τ-instance are compatible if they agree when restricted to the common names σ ∩ τ.
A common extension of compatible σ- and τ-instances can be computed in quadratic time (product automaton construction). The algorithm is linear in the size of the output.
[Figure: a small tree over labels a and b ("Original"), and the node sets selected by the XPath queries *, //a, a/a, */a, //a/b, a/a/b and */a/following::*.]
Let Q be a Core XPath query and I a compressed instance. Then Q can be evaluated on I in time O(2^|Q| · |I|).
It is possible to find pathological examples for which the expansion is exponential.
However, the time is linear in the output, which is bounded by the original tree.
Compression of the skeleton (tags ignored: "−"; all tags included: "+"):

Document (size)         |V_T|        |V_M(T)|   |E_M(T)|    |E_M(T)|/|E_T|
SwissProt (457.4 MB)    10,903,569   83,427     792,620      7.3 %  −
                                     85,712     1,100,648   10.1 %  +
DBLP (103.6 MB)         2,611,932    n/a        171,820      6.6 %  −
                                     n/a        222,755      8.5 %  +
TreeBank (55.8 MB)      2,447,728    323,256    853,242     34.9 %  −
                                     475,366    1,301,690   53.2 %  +
OMIM (28.3 MB)          206,454      962        11,921       5.8 %  −
                                     975        14,416       7.0 %  +
XMark (9.6 MB)          190,488      3,642      11,837       6.2 %  −
                                     6,692      27,438      14.4 %  +
Shakespeare (7.9 MB)    179,691      1,121      29,006      16.1 %  −
                                     1,534      31,910      17.8 %  +
Baseball (671.9 KB)     28,307       26         76           0.3 %  −
                                     83         727          2.6 %  +
TPC-D (287.9 KB)        11,765       15         161          1.4 %  −
                                     53         261          2.2 %  +
The skeletons of arrays and tables are trivial!

[Figure: a catalogue of 10^9 Objects, each with 300 fields (Id, RA, DEC, ..., descriptive stuff), and its HTML-style skeleton: TABLE with a single edge of cardinality 10^9 to TR, which has a single edge of cardinality 300 to TD.]
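One way to see this: with edge cardinalities the skeleton of a 10^9-row table is a constant-size term, regardless of the number of rows (hypothetical (tag, multiplicity, children) representation):

```python
# A compressed skeleton as (tag, multiplicity, children):
# one TABLE of 10^9 identical TRs, each with 300 TDs.
# The skeleton has three nodes however many rows there are.
skeleton = ("TABLE", 1, [("TR", 10**9, [("TD", 300, [])])])

def tree_size(node):
    """Number of nodes in the tree the skeleton stands for."""
    tag, mult, children = node
    return mult * (1 + sum(tree_size(c) for c in children))

assert tree_size(skeleton) == 1 + 10**9 * 301
```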
[Table: Core XPath evaluation on compressed instances of SwissProt (457.4 MB), DBLP (103.6 MB), TreeBank (55.8 MB) and OMIM (28.3 MB), five queries Q1-Q5 per document, reporting (1) parse time, (2)-(3) DAG nodes and edges before the query, (4) query time, (5)-(6) DAG nodes and edges after the query, and (7)-(8) the number of selected nodes counted in the DAG and in the tree. Parse times range from 1.4 s (OMIM) to 79.3 s (SwissProt); query times from 0.008 s (OMIM) to 9.9 s (TreeBank). Evaluating a query enlarges the DAG only slightly: SwissProt stays within 84,071-84,999 nodes and 796,059-815,281 edges, and selections of up to 249,978 tree nodes are represented by at most a few hundred DAG nodes.]
[Table continued: XMark (9.6 MB), Shakespeare (7.9 MB) and Baseball (671.9 KB), same columns. Parse times range from 0.082 s (Baseball) to 1.457 s (Shakespeare); query times from 0.001 s (Baseball) to 0.439 s (XMark). Again the DAG barely grows (Shakespeare: 31,048 edges before, at most 31,364 after), and selections of up to 106,882 tree nodes are represented by at most a few hundred DAG nodes.]
Preceding techniques work for languages (e.g. XPath) that select nodes from the XML tree.
What about languages (e.g. XQuery) that construct new nodes?
What about highly selective queries, e.g.
FOR $X IN DB/P
WHERE random() < 0.001
RETURN $X
What about joins?
On simple selections, the "edge cardinality compression" goes wrong.

[Figure: a skeleton in which one edge carries cardinality 10^8; selecting nodes 1, 2, 3, ... individually forces the shared edge to split.]

Rather than modify the skeleton, generate a new skeleton and new data files.
Does this avoid exponential blow-up? Not always.
Path //1/*/*/*

[Figure: (a) a 0/1-labelled tree and (b) the instance obtained by evaluating //1/*/*/* on its compressed form; the selection breaks the sharing and the skeleton blows up.]

But what is the story in practice?
Byron Choi and Rob Hutchison
We want to evaluate highly selective queries
FOR $X IN DB/P
WHERE $X/keyfield=123456
RETURN $X/field1, $X/field2, ...
without scanning the data files for field1, field2, ... from the beginning.
How do we make the equivalent of a B-tree index work in a vectorized representation?
The skeleton is represented by a set S of tree addresses, i.e. sequences of integers.
Find a marking function M : [0, n] → S with the property that
• it is monotone w.r.t. lexicographic order, and
• it is "evenly distributed".
Example. For the chain skeleton with an edge of cardinality 10^8 (as above), an excellent choice is M(i) = 1.ki.1, where kn = 10^8.
Now place markers at the corresponding positions in the data files.
We can use an index to "fast forward" to the appropriate block of the data file.
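A sketch of the marker idea using binary search (hypothetical in-memory layout; in the real system the markers and their offsets live alongside the data files):

```python
import bisect

# Data file: one value per tree address, in document order.
values = [f"rec{i}" for i in range(1_000_000)]

# Markers at every k-th position; each remembers its tree address
# (here simply the child index) and its offset in the data file.
k = 1000
marker_addrs = list(range(0, len(values), k))  # monotone in doc order
marker_offsets = marker_addrs                  # offset == index in this toy

def fast_forward(target_addr):
    """Binary-search the markers for the block containing
    target_addr, then scan at most k records instead of the
    whole file."""
    j = bisect.bisect_right(marker_addrs, target_addr) - 1
    start = marker_offsets[j]
    for pos in range(start, min(start + k, len(values))):
        if pos == target_addr:    # stand-in for a key comparison
            return values[pos]

assert fast_forward(123_456) == "rec123456"
```

The even distribution of M keeps every block roughly the same size, so the scan after the "fast forward" is bounded.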
[Figure: three ways of laying out the catalogue of 10^9 Objects, each with 300 fields (Id, RA, DEC, ...): the plain table; each Object extended with an Annotation; and the Annotations stored separately from the array-like Object fields.]
Vectorized/conventional efficiency on queries on the "array-like" subset?
None yet.
Recent work by Byron Choi on XQuery implementations of astronomers' queries over VOTable (astronomical XML data), not yet using the indexing techniques discussed here.
Lots more to look at. Particularly interesting is the interaction between DB (query) optimization and scientific (array-processing) optimization.
Any ideas?