Efficient Top-k Searching According to User Preferences Based on Fuzzy Functions with Usage of Tree-Oriented Data Structures
Matúš Ondreička
Supervised by Prof. Jaroslav Pokorný
Faculty of Mathematics and Physics
Department of Software Engineering
Charles University in Prague
Czech Republic

Research - outline
introduction
  top-k problem, user preferences, fuzzy functions
  related work
technical solutions
  Tree-Oriented Data Structures
    set of B+-trees
    multidimensional B+-tree
    multidimensional B+-tree with lists
  MD-algorithm, MXT-algorithm
experiments, current results
motivation of future research
VLDB 2011 PhD Workshop

Top-k problem
top-k searching
  the (few) best k objects with more attributes
  k objects with the highest rating
    according to user preferences
    based on fuzzy functions
efficient top-k searching
  without accessing all the objects
  allows full support of the model of user preferences
    local preferences
    global preferences

Model of user preferences
local preferences
  objects are preferred according to one attribute
  modeled with a fuzzy function fU(x): xA → [0, 1]
  an attribute's domain is continuous
    [Figure: fuzzy function fU(x) plotted over xA from 0€ to 1000€, with ratings from 0 (0%) to 1 (100%)]
  an attribute's domain is discrete
    a rating is assigned to each value
    ACER := 0.6, APPLE := 1.0, DELL := 0.9, SONY := 0.8
global preferences
  objects are preferred according to more attributes
  modeled with an aggregation function @U(x): ( f1U(x), ..., fmU(x) ) → [0, 1]
  e.g. weighted average
    $@^U(x) = \frac{w_1 \cdot f_1^U(x) + \dots + w_m \cdot f_m^U(x)}{w_1 + \dots + w_m}$
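The two kinds of preferences can be illustrated with a minimal Python sketch. The brand ratings are the ones from the slide; the linear shape of `price_preference`, the function names and the equal default weights are assumptions made only for illustration, since the slide does not specify the user's actual fuzzy function.

```python
# Hypothetical continuous local preference: cheaper is better.
# 0 EUR maps to 1.0 and 1000 EUR (or more) maps to 0.0; the linear shape is assumed.
def price_preference(price_eur):
    return min(1.0, max(0.0, 1.0 - price_eur / 1000.0))

# Discrete local preference: one rating per domain value (values from the slide).
BRAND_PREFERENCE = {"ACER": 0.6, "APPLE": 1.0, "DELL": 0.9, "SONY": 0.8}

# Global preference: weighted average of the local ratings.
def weighted_average(ratings, weights):
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)

# Overall rating of one object described by a price and a brand.
def rate(obj, weights=(1.0, 1.0)):
    ratings = [price_preference(obj["price"]), BRAND_PREFERENCE[obj["brand"]]]
    return weighted_average(ratings, weights)
```

With equal weights, a 500 EUR APPLE item gets local ratings 0.5 and 1.0 and therefore an overall rating of 0.75.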

Motivation and related work
XML, multimedia, the Web, etc.
relational databases
  Ilyas, Beskales, Soliman: A survey of top-k query processing techniques in relational database systems. 2008.
  ranking functions
  query optimization
Fagin's algorithms
  Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66, 2003.
  support only monotone ranking functions
  based on sorted lists
  no support for local user preferences
    BASIC MOTIVATION FOR OUR RESEARCH

Usage of B+-tree
local user preference
  by fuzzy function
  on monotone interval
moving in leaf level
  "ways" in leaf level
  continuously on all "ways"
    comparing objects on different "ways"
    choosing the biggest on all the "ways"
obtaining objects
  during the computation of algorithm
    with ratings
    in descending order by fuzzy function fU
[Figure: B+-tree whose leaf level holds the keys 0.0 to 1.0 with objects C, Q, U, D, G, H, R, Y, S, E, T, K, M, N, F; five "ways" w1 to w5 run through the leaf level; below, a plot of the fuzzy function fU over the domain 0 to 1 with ratings from 0 to 1.]
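The idea of moving along several "ways" at once can be sketched in Python as follows. This is a simplified illustration, not the implementation from the talk: a sorted list stands in for the linked leaf level of a B+-tree, the "ways" are derived from the keys themselves, and the names `monotone_runs` and `descending_by_fuzzy` are hypothetical.

```python
import heapq

def monotone_runs(keys, f):
    """Split sorted keys into maximal index runs (lo, hi, direction) on which f is monotone."""
    runs, start, direction = [], 0, 0
    for i in range(1, len(keys)):
        diff = f(keys[i]) - f(keys[i - 1])
        step = (diff > 0) - (diff < 0)          # +1 rising, -1 falling, 0 flat
        if step and direction and step != direction:
            runs.append((start, i - 1, direction))
            start, direction = i, 0             # a new "way" starts here
        elif step and not direction:
            direction = step
    if keys:
        runs.append((start, len(keys) - 1, direction))
    return runs

def descending_by_fuzzy(keys, f):
    """Yield (key, f(key)) in descending order of f, one cursor per monotone run."""
    heap = []
    for lo, hi, direction in monotone_runs(keys, f):
        # Start each cursor at the best end of its run and walk toward the other end.
        pos, step = (hi, -1) if direction > 0 else (lo, 1)
        heapq.heappush(heap, (-f(keys[pos]), pos, step, lo, hi))
    while heap:
        neg_rating, pos, step, lo, hi = heapq.heappop(heap)  # biggest over all "ways"
        yield keys[pos], -neg_rating
        nxt = pos + step
        if lo <= nxt <= hi:
            heapq.heappush(heap, (-f(keys[nxt]), nxt, step, lo, hi))
```

Because every run is traversed from its best-rated end, merging the cursors with a heap yields the keys in globally descending rating order without sorting the whole leaf level for each user.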

Fagin's algorithms
TA (threshold algorithm) and NRA (no random access)
  search for the best k objects
  according to monotone aggregate function @
    without accessing all objects
preconditions
  a set of objects X with values of m attributes A1, ..., Am
  objects from the set X are stored in m lists L1, ..., Lm
  lists contain pairs (x, ax)
    lists are sorted in descending order
    monotone aggregation function @

  L1          L2          L3
  (x1, 1.0)   (x3, 1.0)   (x1, 1.0)
  (x2, 0.8)   (x4, 0.8)   (x4, 0.8)
  (x3, 0.6)   (x6, 0.6)   (x3, 0.6)
  (x4, 0.4)   (x1, 0.4)   (x5, 0.4)
  (x5, 0.2)   (x2, 0.2)   (x2, 0.2)
  (x6, 0.0)   (x5, 0.0)   (x6, 0.0)

multi-user solution
  lists are based on B+-tree
  algorithm can get pairs (x, fU(x)) from B+-tree sequentially
    in descending order according to user's fuzzy function fU(x)
[Figure: one B+-tree per attribute A1, A2, A3; leaf order for A1: x1 x2 x3 x4 x5 x6; for A2: x5 x2 x1 x6 x4 x3; for A3: x6 x2 x5 x3 x4 x1.]
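The threshold algorithm over the three example lists can be sketched as follows. This is a textbook-style simplification written for illustration: plain Python lists and dictionaries stand in for the B+-tree based lists, every object is assumed to appear in every list, and the names `LISTS` and `threshold_algorithm` are hypothetical.

```python
import heapq

# The three sorted lists from the slide, as (object, rating) pairs in descending order.
LISTS = [
    [("x1", 1.0), ("x2", 0.8), ("x3", 0.6), ("x4", 0.4), ("x5", 0.2), ("x6", 0.0)],
    [("x3", 1.0), ("x4", 0.8), ("x6", 0.6), ("x1", 0.4), ("x2", 0.2), ("x5", 0.0)],
    [("x1", 1.0), ("x4", 0.8), ("x3", 0.6), ("x5", 0.4), ("x2", 0.2), ("x6", 0.0)],
]

def threshold_algorithm(lists, aggregate, k):
    """Return the k best (object, rating) pairs for a monotone aggregate function."""
    random_access = [dict(lst) for lst in lists]   # lookup of any object's value per list
    scored = {}                                    # overall rating of every object seen so far
    best = []
    for row in zip(*lists):                        # one sorted access to each list per round
        for obj, _ in row:
            if obj not in scored:
                scored[obj] = aggregate([ra[obj] for ra in random_access])
        # No unseen object can be rated higher than the aggregate of the last seen values.
        threshold = aggregate([rating for _, rating in row])
        best = heapq.nlargest(k, scored.items(), key=lambda item: item[1])
        if len(best) == k and best[-1][1] >= threshold:
            break
    return best
```

Calling `threshold_algorithm(LISTS, sum, 2)` stops before the lists are exhausted and returns x1 and x3, whose sums are the two largest.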

Multidimensional B-tree
MDB-tree
  allows indexing a set of objects by m > 1 attributes in one data structure
    m levels, values of one attribute are stored in each level
    nodes are B+-trees, whose leaf nodes are linked in two directions

  object  A1   A2   A3
  A       0.0  0.4  0.3
  B       0.0  0.4  0.5
  C       0.0  1.0  0.5
  D       0.0  1.0  0.5
  E       0.5  0.9  1.0
  F       1.0  0.0  0.0
  G       1.0  0.0  0.0
  H       1.0  0.0  0.7
  I       1.0  0.4  0.7
  J       1.0  0.7  0.4
  K       1.0  0.7  0.6

[Figure: the corresponding MDB-tree. The A1 level holds the keys 0.0, 0.5, 1.0; the A2 level holds 0.4, 1.0 under 0.0, 0.9 under 0.5, and 0.0, 0.4, 0.7 under 1.0; the A3 level holds the keys that point to the objects A to K.]
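The level structure of the MDB-tree can be approximated with nested dictionaries, as in the following sketch. Each dictionary stands in for one B+-tree of the real structure, so the ordering and leaf linking of a B+-tree are not modeled; `OBJECTS` repeats the table above and `build_mdb_tree` is a hypothetical helper name.

```python
# Objects from the slide: name -> (A1, A2, A3).
OBJECTS = {
    "A": (0.0, 0.4, 0.3), "B": (0.0, 0.4, 0.5), "C": (0.0, 1.0, 0.5),
    "D": (0.0, 1.0, 0.5), "E": (0.5, 0.9, 1.0), "F": (1.0, 0.0, 0.0),
    "G": (1.0, 0.0, 0.0), "H": (1.0, 0.0, 0.7), "I": (1.0, 0.4, 0.7),
    "J": (1.0, 0.7, 0.4), "K": (1.0, 0.7, 0.6),
}

def build_mdb_tree(objects):
    """Build an m-level index: level i is keyed by the value of attribute A(i+1)."""
    root = {}
    for name, values in objects.items():
        node = root
        for value in values[:-1]:
            node = node.setdefault(value, {})        # descend, creating inner levels
        node.setdefault(values[-1], []).append(name)  # last level points to objects
    return root
```

For the table above, `build_mdb_tree(OBJECTS)[1.0][0.0]` contains the keys 0.0 and 0.7, which point to the objects F, G and H respectively.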

MD-algorithm
searches for the best k objects in a multidimensional B-tree (MDB-tree) without getting all the objects
principle of MD-algorithm
  MD-algorithm searches MDB-tree with a recursive procedure
    it uses the temporary list TK of the k currently best objects
  it uses the best rating B(S) of B+-tree S
    analogously to Fagin's TA-algorithm
  monotone aggregate function @
definition
  B(S) of B+-tree S in i-th level of MDB-tree
  $B(S) = @(k_1, \dots, k_{i-1}, 1, \dots, 1)$
example: @(xA1, xA2) = xA1 + xA2
[Figure: example MDB-tree with two levels and leaf objects A to H; B+-tree keys shown are 0.4 0.7 0.8, 0.3 0.6, 1.0 and 0.3 0.5 0.8; annotated ratings are B(S) = 1 + 1 = 2.0, B(S) = 0.8 + 1 = 1.8 and B(S) = 0.8 + 0.7 = 1.5.]
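The definition of B(S) amounts to filling every attribute that is not yet fixed with the best possible rating 1. A minimal sketch, assuming the ratings k1, ..., ki-1 on the path to S are given as a list and using the hypothetical name `best_rating`:

```python
def best_rating(path_ratings, m, aggregate):
    """Upper bound B(S): ratings fixed on the path to S, padded with 1.0 up to m attributes."""
    padding = [1.0] * (m - len(path_ratings))
    return aggregate(list(path_ratings) + padding)
```

For the two-attribute example with @ being the sum, the root gives `best_rating([], 2, sum)` = 2.0 and a second-level tree reached through rating 0.8 gives `best_rating([0.8], 2, sum)` = 1.8, matching the annotations in the figure. Since @ is monotone, no object stored below S can be rated higher than B(S).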

Searching the best 3 objects
[Figure: run of the MD-algorithm on a three-level MDB-tree made of the B+-trees S1 to S10, with plots of the local preferences f1U(x), f2U(x), f3U(x) on the 0 to 1 scale. Annotated ratings include B(S2) = 1.0 + 1 + 1 = 3.0, B(S3) = 1.0 + 0.6 + 1 = 2.6, B(S) = 1.0 + 0.6 + 0.5 = 2.1, B(S) = 1.0 + 0.6 + 0.6 = 2.2 and B(S) = 1.0 + 0.6 + 0.2 = 1.8. A table TK lists the 1st, 2nd and 3rd object with their ratings.]
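The recursive search with pruning by B(S) can be sketched as follows. This is a simplified reading of the principle described on the MD-algorithm slide, not the author's implementation: nested dictionaries stand in for the B+-trees, keys are sorted on the fly instead of being read along "ways", and `TREE` reuses the data of the earlier MDB-tree example. The names `md_top_k` and `TREE` are hypothetical.

```python
import heapq

def md_top_k(tree, fuzzy, aggregate, k):
    """Return the k best (rating, object) pairs; fuzzy holds one local preference per level."""
    if k <= 0:
        return []
    m = len(fuzzy)
    top = []  # min-heap playing the role of the temporary list TK

    def search(node, prefix):
        level = len(prefix)
        # Visit keys from the best local rating to the worst.
        for key in sorted(node, key=fuzzy[level], reverse=True):
            ratings = prefix + [fuzzy[level](key)]
            bound = aggregate(ratings + [1.0] * (m - len(ratings)))  # B(S) of the subtree
            if len(top) == k and bound <= top[0][0]:
                break  # monotone @: the remaining keys cannot beat TK either
            if level < m - 1:
                search(node[key], ratings)
                continue
            for name in node[key]:  # last level: bound is the exact rating
                if len(top) < k:
                    heapq.heappush(top, (bound, name))
                elif bound > top[0][0]:
                    heapq.heapreplace(top, (bound, name))

    search(tree, [])
    return sorted(top, reverse=True)

TREE = {
    0.0: {0.4: {0.3: ["A"], 0.5: ["B"]}, 1.0: {0.5: ["C", "D"]}},
    0.5: {0.9: {1.0: ["E"]}},
    1.0: {0.0: {0.0: ["F", "G"], 0.7: ["H"]}, 0.4: {0.7: ["I"]},
          0.7: {0.4: ["J"], 0.6: ["K"]}},
}
```

With identity preferences on all three levels and @ being the sum, the two best objects are E and K, and the whole subtree under A1 = 0.0 is skipped because its bound of 2.0 cannot beat the ratings already held in TK.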

MXT-algorithm
based on integration of MD-algorithm and TA-algorithm
uses new data structure: multidimensional B+-tree with lists
  first n attributes (nominal)
    stored and searched in the same way as in MD-algorithm
  last m - n attributes (ordinal)
    stored as groups of m - n Fagin's sorted lists
    searched by instances of Fagin's TA-algorithm
[Figure: multidimensional B+-tree with lists for m = 4 and n = 2. The levels A1 and A2 are B+-trees; each key of the A2 level points to a group of two sorted lists A3 and A4.]
Example group for A1 = 1.0, A2 = 0.7:

  object  A1   A2   A3   A4
  x1      1.0  0.7  1.0  0.3
  x2      1.0  0.7  0.8  0.0
  x3      1.0  0.7  0.5  1.0
  x4      1.0  0.7  0.4  0.7
  x5      1.0  0.7  0.2  0.1
  x6      1.0  0.7  0.0  0.6

  A3          A4
  {x1, 1.0}   {x3, 1.0}
  {x2, 0.8}   {x4, 0.7}
  {x3, 0.5}   {x6, 0.6}
  {x4, 0.4}   {x1, 0.3}
  {x5, 0.2}   {x5, 0.1}
  {x6, 0.0}   {x2, 0.0}
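How the sorted lists hang below the first n levels can be sketched as follows. This is an illustrative simplification: a flat dictionary keyed by the first n attribute values stands in for the upper B+-tree levels, and the names `ROWS` and `build_mxt_index` are hypothetical. Each group could then be searched by one instance of a TA-style algorithm.

```python
from collections import defaultdict

# Example group from the slide: name -> (A1, A2, A3, A4).
ROWS = {
    "x1": (1.0, 0.7, 1.0, 0.3), "x2": (1.0, 0.7, 0.8, 0.0),
    "x3": (1.0, 0.7, 0.5, 1.0), "x4": (1.0, 0.7, 0.4, 0.7),
    "x5": (1.0, 0.7, 0.2, 0.1), "x6": (1.0, 0.7, 0.0, 0.6),
}

def build_mxt_index(objects, n):
    """Group objects by their first n values; keep one descending list per remaining attribute."""
    groups = defaultdict(dict)
    for name, values in objects.items():
        groups[tuple(values[:n])][name] = values[n:]
    index = {}
    for prefix, members in groups.items():
        width = len(next(iter(members.values())))   # number of ordinal attributes, m - n
        index[prefix] = [
            sorted(((name, vals[j]) for name, vals in members.items()),
                   key=lambda pair: pair[1], reverse=True)
            for j in range(width)
        ]
    return index
```

For the example data, `build_mxt_index(ROWS, 2)[(1.0, 0.7)]` holds two lists that reproduce the A3 and A4 columns shown above.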

An example of results
implemented top-k algorithms
  TA-algorithm, MD-algorithm, MXT-algorithm
  using lists based on B+-trees
  implementation in Java
  data structures have been tested in memory (not on disk)
test results
  the number of obtained objects
real data
  8 822 flats for rent in Prague
  ||dom(District)|| = 10
  ||dom(Type)|| = 10
  ||dom(Area)|| = 229
  ||dom(Price)|| = 411
real user's preferences
  user prefers flats of some types in specific districts, smaller prices and bigger areas

Motivation, future research
improvements of performance of algorithms
heuristics
improvement of data structures
attribute dependencies between more attributes
similarity measures
finding the k objects most similar to a given object can be a user preference
user feedback
after running a first top-k query, the user tunes his/her preferences and executes the next top-k query
different data models
very large data sets
tree-oriented data structures allow the environment to be made dynamic while solving a top-k problem
data streams
in the MXT-algorithm construction, instances of the TA-algorithm would be computed concurrently
different models of user preferences
automatic arrangement of levels in the MDB-tree with lists, managing empty values
parallel computing
monitoring the distribution of the key values in nodes
tree-oriented data structure as a sliding window
approximations, uncertain data, heterogeneous data
web environment
more information resources distributed on the web

An application TreeTopK
Thank you for your attention!