Towards Efficient Dominant Relationship Exploration Zhenglu Yang

advertisement
Towards Efficient Dominant Relationship Exploration
of the Product Items on the Web
Zhenglu Yang and Lin Li and Botao Wang and Masaru Kitsuregawa
Dept. of Info. and Comm. Engineering, University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo, Japan 153-8505
{yangzl, lilin, botaow, kitsure}@tkl.iis.u-tokyo.ac.jp
Abstract
In recent years, there has been a prevalence of search
engines being employed to find useful information in
the Web as they efficiently explore hyperlinks between
web pages which define a natural graph structure that
yields a good ranking. Unfortunately, current search
engines cannot effectively rank those relational data,
which exists on dynamic websites supported by online
databases. In this study, to rank such structured data
(i.e., find the “best” items), we propose an integrated
online system consisting of compressed data structure
to encode the dominant relationship of the relational
data. Efficient querying strategies and updating scheme
are devised to facilitate the ranking process. Extensive
experiments illustrate the effectiveness and efficiency of
our methods. As such, we believe the work in this paper
can be complementary to traditional search engines.
Figure 1: Digital camera example
maximal vector problem in database context.
Efficient skyline querying methodologies have been studied
extensively (i.e., (Kossmann, Ramsak, & Rost 2002;
Papadias et al. 2003)). All of these works, however,
concerned only the pure binary relationship, i.e., a product
item p is whether or not worse than (dominated by) others.
Interestingly, Li et al. (Li et al. 2006) proposed to analyze
a more general dominant relationship in a business model,
that users preferred the details of the dominant relationship
more, i.e., an item p dominates (and vice versa, is dominated
by) how many other items. Here we show an example.
Introduction
Example 1: Consider you are a manager of Sony corporation. You want to know the business position of a digital
camera b (Cyber-shot DSC-H5/B) in the current market with
regard to your preference, i.e., price and weight, by checking how many other items are better/worse than b and what
they are. For the sample cameras shown in Fig. 1, you can
deduct the conclusion that item b is better than items d and e
but worse than items a and c with regard to your preference1 .
Suppose you are buying a digital camera from a website (i.e.,
www.bestbuy.com) and you are looking for one that is cheap
and with high sensor resolution to take beautiful pictures.
Unfortunately, these two goals are complementary to one
another as cameras with more megapixels tend to be more
expensive. Moreover, you would appreciate the search if
only a few “best” goods are recommended by the website’s
system with regard to your preference, instead of checking
all the items manually. Intuitively, these “best” goods are
all cameras that are not worse than any other camera in both
attributes, i.e., price and sensor resolution. For instance, Fig.
1 shows some sample cameras and among them, only the
items c (PowerShot A630) and d (EOS Digital Rebel XTi)
should be the candidates recommended to the user.
This problem of searching for such “best” items, can be
traced back to the 1960s in the theory field. The set of these
“best” items is called the Pareto set and the objects are called
maximal vectors (Bentley et al. 1978). However, these main
memory algorithms are inefficient for online query processing, due to the frequently updated data.
Recently, the skyline operator (Borzsonyi, Kossmann, & Stocker 2001) was proposed to tackle the
The dominant relationship analysis in Example 1 plays an
important role in business decisions. Unfortunately, the existing work cannot tackle the dominant relationship analysis
in a dynamic environment (i.e., the Web). Relationships are
recomputed naively each time upon update, resulting in extreme inefficiency with the increase of online databases. In
this paper we aim to efficiently construct an integrated system to solve the issue in frequently updated environments.
The keyword-based search engines (i.e., Google) cannot
be employed here because they lose effect when tackling
relational data. For the purposes of this paper, we assume
the attribute sets of products are available in structured format (i.e., Fig. 1). This assumption is reasonable due to
1
Note that the analysis here can be further used to determine the
price of a new product, which should be competitive in the current
market while reserving the most profit.
c 2007, Association for the Advancement of Artificial
Copyright Intelligence (www.aaai.org). All rights reserved.
1483
Related work
Deep Web. The framework of this paper can be seen
as complementary to the information retrieval systems
for Deep W eb (Bergman 2001), whose information is
buried far down on dynamically generated websites, and
cannot been found by standard search engines (i.e, Google).
Specifically, our work is a postprocess of data extraction
(Kushmerick, Weld, & Doorenbos 1997), which uses
wrapper softwares to convert the unstructured web data into
structured format data.
Figure 2: Product attributes in 2-dimensional space and the
corresponding partial order
the existence of the techniques that convert unstructured text
into structured tables (i.e., wrapper programs (Kushmerick,
Weld, & Doorenbos 1997)).
We found the interrelated connection between the dominant relationship and the partial order (Yang, Wang, & Kitsuregawa 2007). To illustrate the idea, here we show a simple example. Fig. 2 (a) represents the product items in two
attributes (dimensions) space, i.e., price and weight. Fig.
2 (b) is the corresponding partial order (encoded as DAG
format). We can know that item b dominates items d and
e and is dominated by items a and c, by checking the outlink and in-link of b, respectively. However, (Yang, Wang,
& Kitsuregawa 2007) does not deal with dominant relationship analysis in dynamic environments. Furthermore, how
to construct partial order representation of spatial datasets
is not tackled. In this paper we aim to explore how to efficiently find, update and query such succinct representative
partial orders with formal justification.
Different users may have different preferences. To cope
with this problem, we propose a compressed data cube to
encode all the possibilities in a concise format and devise
efficient strategies to extract the information.
There is another methodology, faceted search (Yee et al.
2003), that can be employed to support shopping by exploring different attributes of the products. Faceted search combines navigational and direct search to leverage the best of
both approaches. However, it does not utilize the property
of the numerical attributes, i.e., partial order, to refine the
results. Our contributions in this paper are as follows:
Maximum vector and Skyline. The maximum vector problem (Kung, Luccio, & Preparata 1975) is a special case of
dominant relationship analysis. There are some other related
issues, i.e., convex hull (Preparata & Shamos 1985). The
skyline query (Borzsonyi, Kossmann, & Stocker 2001) was
proposed to tackle the maximal vector problem in database
context. However, most existing works concerned only the
pure dominant relationship and, output those items which
are not “dominated” by others. In contrast, Li et al. (Li et al.
2006) proposed to analyze a general dominant relationship
from a microeconomic aspect in static environments.
In real dynamic environments (i.e., the Web), the attribute
values are always updates, which makes it very difficult
to analyze the dominant relationship. Moreover, users
are always interested in not only “how many” items are
dominating/dominated by a specific item, but also “whom”
they are. These issues cannot be efficiently tackled by using
existing methodologies.
Partial order mining. Partial order has appeared in many
computational models (i.e., concurrent models (Lamport
1978)). In this paper, we mainly consider the problem of
converting the spatial dataset into partial order representation, which are then queried to get the dominant relationship
efficiently. To our best knowledge, there has been no work
on this problem. An interesting study investigated the problem of mining a small set of partial orders globally fitting
data best (Mannila & Meek 2000). More recently, CasasGarriga et al. (Casas-Garriga 2005) was intended for discovering several small partial orders from a set of sequences
instead of only one that describes all or most of the set. Yet
different from this paper, they did not further explore the partial orders for a specific purpose (i.e., dominant relationship
extraction).
• We formally justify the interrelated connection between
the dominant relationship and the partial order analysis.
We propose effective methods to construct a data cube,
P arCube, which concisely represents the dominant relationship as partial order representation (DAGs).
• Based on P arCube, we efficiently construct an online
query system, which employs an effective update scheme
to maintain P arCube in a dynamic environment.
• We introduce efficient query processing strategies on top
of P arCube to answer the general dominant relationship
queries in a dynamic environment.
• We conduct comprehensive experiments to confirm the efficiency of our strategies by using synthetic and real data.
Preliminaries
Given a d-dimension (attribute) space S={s1 , s2 , . . . , sd },
a set of product items D={p1 , p2 , . . . , pn } is said to be a
dataset on S if every pi ∈ D is a d-dimensional item on S.
We use pi .sj to denote the j th dimension value of item pi .
For each dimension si , we assume that there exists a total
order relationship. For simplicity and without loss of generality, we assume smaller values are preferred. Moreover,
we assume that the distinct value condition holds (Yuan et
al. 2005; Pei et al. 2005).
A partial order on D is a binary relation on D such
that, for all x,y,z ∈ D, (i) x x (reflexivity), (ii) x y
The remainder of this paper is organized as follows. After
discussing the related work, we introduce the preliminaries
and then propose several strategies to efficiently construct,
query and maintain an integrated system for dominant relationship analysis. The performance analysis is reported in
Section 5. We conclude the paper in Section 6.
1484
Table 2: Constructing P arCube algorithm
Table 1: Sample Product Items Dataset
D1
D2
D3
a
2
1
3
b
3
4
1
c
1
3
6
d
5
5
2
e
6
6
5
f
4
2
4
INPUT:
OUTPUT:
1.
2.
3.
Spatial dataset D
Data cube P arCube
Convert D to the corresponding sequence dataset D // process 1
Apply PrefixSpan (Pei et al. 2001) to get the sequential patterns from D , merge them to the local maximal sequential
sequences as SeqCube
// process 2
Apply the algorithm proposed in (Casas-Garriga 2005) to get the
partial order representation (DAGs) as P arCube
// process 3
this system: 1) System construction (symbolized as blue
color); 2) Query processing (symbolized as red color); and
3) System updating (symbolized as yellow color).
Figure 3: The proposed system based on the partial order
data cube (ParCube) (this figure is best viewed in color)
System Construction
We propose to apply strategies from another research context, sequential pattern mining (Agrawal & Srikant 1995),
to get the partial order representation cube (P arCube) from
a spatial dataset. There are three processes for P arCube
construction.
The first process is to convert the spatial dataset to the sequence dataset. With a k-dimensional dataset, we simply get
a k-customer sequence dataset, by sorting the objects in each
customer (dimension) according to their value in ascending
order. For example, Fig. 4 (b) shows the converted sequence
dataset of the spatial dataset in Fig. 4 (a). Note the specific
values of these points on k-dimension are resident on the
disk. Efficient management methods (i.e., R-tree (Guttman
1984)) are employed, as will be explained in this paper.
and y x imply x=y (antisymmetry), (iii) x y and y z
imply x z (transitivity). We use (D,) to denote the
partial order set (or poset) of D. We denote by ≺ the strict
partial order on D, i.e., x ≺ y if x y and x = y. Given
x,y ∈ D, x and y are said to be comparable if either x ≺ y
or y ≺ x; otherwise, they are said to be incomparable.
Definition 1 (dominant in ordering context) A product p
is said to dominate another product q on S if and only if
∀sk ∈ S, p.sk q.sk and ∃st ∈ S, p.st ≺ q.st .
The partial order (D,) can be represented by a DAG
G = (D, E), where (υ, ω) ∈ E if ω υ and there does
not exist another value x ∈ D such that ω x υ. For
simplicity and without loss of generality, we assume that G
is a single connected component.
Theorem 1 The converted sequence dataset records all the
dominant relationships of the points in the spatial dataset.
Proof [Sketch of Proof] Trivial because the small-large
pair (dominant) relationship in the spatial dataset is equivalent to the early-late pair (dominant) relationship in the converted sequence dataset.
The second and the third processes aim to determine a
partial order that describes the point set in the subspace S of data space S in D . In this paper, we apply the approach
in (Casas-Garriga 2005) with a minor modification, that we
mine general sequential patterns (Agrawal & Srikant 1995).
In the second process, we discover the sequential patterns
from the transformed sequence dataset by applying PrefixSpan algorithm (Pei et al. 2001)3 , which is the state-of-the-art
method. To save space, we merge these sequential patterns
as local maximal sequential sequences, which are not
the subsequence of other sequential sequences. For example, in subspace {D1 , D2 }, although af d, af e, ade,
and f de are length-3 patterns, we merge them as length4 local maximal sequential sequences, as af de. The
resultant data cube (SeqCube) from the second process is
shown in Fig. 4 (c).
Definition 2 (dominant set, DGS(p, D, S’)) Given
a
product p, we use DGS(p, D, S’) to denote the set of
products from D which are dominated by p in the subspace
S’ of S.
Definition 3 (dominated set, DDS(p, D, S’)) Given
a
product p, we use DDS(p, D, S’) to denote the set of
products from D which dominate p in the subspace S’ of S.
The problem that we want to solve is as follows:
Problem 1 (General Point Query(GPQ)) Given
a
dataset D, dimension space S’, and a product p, find DGS(p,
D, S’) and DDS(p, D, S’).
Note that GPQ is the generalized model of Subspace
Analysis Queries (SAQ)(Li et al. 2006).
Example 1 Consider the 3-dimensional dataset D = {a, b,
c, d, e, f} in Table 12 . Given a query point b, dimension
space S ={D1 , D2 }, the dominating set DGS(b, D, S ) =
{d, e} and the dominated set DDS(b, D, S ) = {a, c}. We
will use this dataset as a running example hereafter.
Integrated system building,
maintaining and querying
Theorem 2 SeqCube records all the dominant relationship
of the points in the sequence dataset D.
Proof [Sketch of Proof] (Proof by Contradiction.) For
simplicity, we only prove for a specific subspace of
In this section, we introduce our online query system, which
is illustrated in Fig. 3. There are three parts involved with
Di denotes the ith attribute. For simplicity, we use small integer to simulate items’ values on the attributes for convenience of
description in this paper.
2
3
We skip the details of PrefixSpan here. Interested users can
refer to (Pei et al. 2001).
1485
Figure 4: The result representation of each process for the example spatial dataset
Figure 5: All virtual nodes representation in 2-dimensional space {D1 , D2 } (this figure is best viewed in color)
SeqCube. Assume to the contrary there is a dominant relationship between two points, a dominates b in a subspace
S , that is not represented in the cuboid S of SeqCube.
This means the sequential pattern ab is not listed in S of
SeqCube, which contradicts our assumption the sequential
pattern mining can find all the sequential patterns.
In the third process, the combinations of the local
maximal sequential sequences are enumerated to generate partial orders with DAG representation, by applying
the method proposed in (Casas-Garriga 2005). The resultant
data cube (P arCube) for the example dataset is shown in
Fig. 4 (d).
the counting, the numbers of points dominating/dominated
by current nodes are inserted into each node.
/D
• Pquery ∈
We proposed to solve this problem by traversing the
partial order representation (DAGs) with semantic cutting
(Yang, Wang, & Kitsuregawa 2007). To illustrate this, we
parse the example DAG in Fig. 2 (b) by adding some
V irtual N odes which act as real nodes that dominate different sets of points in the original dataset. Fig. 5 (a) shows
all the virtual nodes (denoted as Vi ). We can see that for
the nodes which dominate three points, there are two virtual
nodes (i.e., V6 and V7 ). Fig. 5 (b) illustrates the virtual
nodes effect in grid model.
To store the information of these virtual nodes, we use a
more compressive method that, with the help of the DAG of
the local maximal sequential sequence shown in Fig. 2 (b),
we only need to save the out-link node and the occupied
region (represented by its upper bound/lower bound) of each
virtual node. For example, Fig. 5 (c) shows the outlinks
of the virtual nodes V6 and V7 and their occupied regions.
Suppose we know that a query point Pquery is in the region
occupied by V6 , then the point b in Fig. 2 (b) will be first
visited according to the out-link of V6 . The remainder of
the process will be the same as that in the first case, when
Pquery ∈ D.
Theorem 3 ParCube records all the dominant relationships of the points in the spatial dataset D.
Proof [Sketch of Proof] Proof can be deduced based on
Theorem 1 and Theorem 2 in this paper, and in (CasasGarriga 2005).
Complexity analysis: The cost of the P arCube construction is mainly dependent on the (maximal) sequential pattern
mining process, which is #P-complete (Yang 2004).
The pseudo code of constructing P arCube is shown in
Table 2, where the three lines describe the three processes
and can be justified based on Theorem 1, 2 and 3, respectively.
Query Processing
Complexity analysis: For the scenario when Pquery ∈ D,
/ D, the
the time complexity is O(1), and when Pquery ∈
time complexity is O(d · log n) in the worst case, where d
is the dimension of the dataset, and n is the number of the
virtual nodes. The space complexity is O(n).
Proof [Sketch of Proof] When Pquery ∈ D, the number
of dominated nodes can be got by following the out-links of
Pquery in DAGs. Hence, the time complexity is O(1). When
We differentiate the two cases based on whether the query
point Pquery is in the dataset D.
• Pquery ∈ D
If Pquery is in D, all the general dominant relationship
related to Pquery can be easily discovered by traversing the
DAG in a specific subspace. As an example, Fig. 2 (b) shows
the DAG representation in subspace {D1 , D2 }. To facilitate
1486
Figure 6: Inserting a node (this figure is best viewed in color)
Pquery ∈
/ D, the time complexity is mainly dependent on
the time used to determine the virtual node corresponding to
Pquery , which is indeed search in R-tree. The time complex
is O(d · log n) and the space complex is O(n).
Figure 7: Visualization of our system
System Maintenance
Performance Analysis
The complete P arCube is recomputed naively upon each
update. Such “blind” updating method is extremely inefficient. We propose our effective update scheme for
P arCube. We differentiate three cases when update happens: node deleting, inserting and modifying.
We performed the experiments using an Intel Core 2
Duo 2.33GHz PC with 2G memory. All the algorithms
were written in C++. We conducted experiments on both
real and synthetic datasets by comparing two algorithms,
P arOrder W eb and N aive4 . Detailed of the datasets used
to test is described as follows:
• Deleting a node θ.
To tackle this case is straightforward: if we want to delete
θ from a DAG, we only need to link the parents of θ and
the children of θ, respectively. Moreover, we need another
traverse of the updated DAG to remove those collision edges
and update the tuples of each node (i.e., the number of points
dominating/dominated current node).
• Inserting a node θ.
Inserting a node θ into a DAG can be naively treated as
the query processing when θ ∈
/ D. However, we can only
get those points which are dominated by θ (i.e., downward
of θ). For instance, if we know a new node is located in the
region of the virtual node V6 in Fig. 5 (b), we still cannot
determine its exact position in the corresponding DAG in
Fig. 2 (b) (i.e., between a and b, or between c and b, etc).
Therefore, we need to further explore the topology of the
DAG to locate the exact position of θ.
Fig. 6 shows all the four possibilities when a new node
θ exists in the region of the virtual node V6 . The nodes,
θ1 ∼ θ4 , can be deemed as the internal virtual nodes of
V6 , whose positions are illustrated in Fig. 6 (a). Given a
new node θ, we use the same strategies as those used to find
the corresponding internal virtual node of θ, by range search
based on lower/upper bound in Fig. 6 (c). For example, if
a new node θ is tested to be existing in the region {2,2,
3,3}, then we have θ=θ1 and its parent (in-link) should be
a and child (out-link) should be b.
1. Real Data. The real dataset was extracted from the web
sites of several stores5 . The retrieved products include
Note Computer (1419 items, 16 attributes), Digital Camera (683 items, 14 attributes), TV (710 items, 10 attributes), DVD Player (432 items, 12 attributes).
2. Synthetic Data. We employ the three most popular
synthetic benchmark datasets in skyline query research
context, independent, correlated and anti-correlated
(Borzsonyi, Kossmann, & Stocker 2001), with dimensionality d in the range [3, 6] and cardinality as 100K.
Effectiveness of Our System
To convenient users to find the dominant relationship between items with regard to their preferences, we built a
system with a graphical user interface based on P ref use
toolkit6 . Fig. 7 illustrates the snapshot for one user query
(Product = “Digital Camera” ∧ skyline attributes: Price ∨
Sensor Resolution ∨ Weight). The dominant relationship
tree shown in the middle of the page clearly illustrates the
business status of each product item. The node on the left
part of the tree dominates (is better than) the node on the
right. Furthermore, it is easy to check each item’s information by clicking on its represented node and meanwhile,
browse its dominating nodes.
Query Performance
• Modifying a node θ.
In this section, we evaluated the query answering performance of P arOrder W eb. There are two cases while
querying, the query point p is in the original spatial dataset
D or not. Note that the difference of the two methods can be
Modifying a node can simply be tackled as a sequel
execution of node deleting and node inserting.
Complexity analysis: The worst case updating time complexity is O(d · niv · log niv ), where d is the number of dimensions involved, and niv is the number of node intervals
in R-tree. The space complexity is O(nin ), where nin is the
number of internal virtual nodes.
4
ParOrder Web was implemented as described in this paper and
Naive was tested with the extension of DADA (Li et al. 2006)
5
www.kakaku.com, www.rakuten.co.jp and www.biccamera.
com
6
http://www.prefuse.org
1487
Conclusions
We have proposed effective methods to construct a data
cube, P arCube, which concisely represents the dominant
relationship as partial orders. We constructed an online
query system, which employs an effective update scheme to
maintain P arCube. We introduced efficient query processing strategies in a dynamic environment. The performance
study confirmed the efficiency of our strategies.
Figure 8: Query processing (anti-correlated) of GPQ queries
References
Agrawal, R., and Srikant, R. 1995. Mining sequential patterns. In ICDE, 3–14.
Bentley, J. L.; Kung, H.; Schkolnick, M.; and Thompson,
C. 1978. On the average number of maxima in a set of
vectors and applications. Journal of ACM 25(4):536–543.
Bergman, M. K. 2001. The deep web: Surfacing hidden value (white paper). Journal of Electronic Publishing
7(1):536–543.
Borzsonyi, S.; Kossmann, D.; and Stocker, K. 2001. The
skyline operator. In ICDE, 421–430.
Casas-Garriga, G. 2005. Summarizing sequential data with
closed partial orders. In SDM, 380–391.
Guttman, A. 1984. R-trees: A dynamic index structure for
spatial searching. In SIGMOD, 47–57.
Kossmann, D.; Ramsak, F.; and Rost, S. 2002. Shooting
stars in the sky: An online algorithm for skyline queries. In
VLDB, 275–286.
Kung, H.; Luccio, F.; and Preparata, F. 1975. On finding
the maxima of a set of vectors. JACM 22(4).
Kushmerick, N.; Weld, D. S.; and Doorenbos, R. B. 1997.
Wrapper induction for information extraction. In IJCAI,
729–737.
Lamport, L. 1978. Time, clocks, and the ordering of
events in a distributed system. Communications of the
ACM 21(7):558–564.
Li, C.; Ooi, B. C.; Tung, A. K.; and Wang, S. 2006. Dada:
A data cube for dominant relationship analysis. In SIGMOD.
Mannila, H., and Meek, C. 2000. Global partial orders
from sequential data. In KDD, 161–168.
Papadias, D.; Tao, Y.; Fu, G.; and Seeger, B. 2003. An
optimal and progressive algorithm for skyline queries. In
SIGMOD, 467–478.
Pei, J.; Han, J.; Mortazavi-Asl, B.; and Pinto, H. 2001.
Prefixspan:mining sequential patterns efficiently by prefixprojected pattern growth. In ICDE, 215–224.
Pei, J.; Jin, W.; Ester, M.; and Tao, Y. 2005. Catching
the best views of skyline: A semantic approach based on
decisive subspaces. In VLDB, 253–264.
Preparata, F. P., and Shamos, M. I. 1985. Computational
Geometry: An Introduction. Springer-Verlag.
Yang, Z.; Wang, B.; and Kitsuregawa, M. 2007. General
dominant relationship analysis based on partial order models. In SAC.
Yang, G. 2004. The complexity of mining maximal frequent itemsets and maximal frequent patterns. In KDD,
344–353.
Yee, K.-P.; Swearingen, K.; Li, K.; and Hearst, M. 2003.
Faceted metadata for image search and browsing. In CHI,
401–408.
Yuan, Y.; Lin, X.; Liu, Q.; Wang, W.; Yu, J. X.; and Zhang,
Q. 2005. Efficient computation of the skyline cube. In
VLDB, 241–252.
Figure 9: System maintenance on the anti-correlated dataset
finely illustrated on large datasets and hence, we show the
result on large synthetic data.
We randomly selected 10,000 different points from D for
the first case and generated 10,000 different points for the
second case. For each point p, we queried its dominating
items in all subspaces. Due to limited space, we present
only the result on the anti-correlated dataset.
Fig. 8(a) and Fig. 8 (b) show the query time against
dimensionality when the query point p ∈ D and p ∈
/ D,
respectively. We can see that the P arOrder W eb algorithm outperforms the N aive in both cases. This is because the compact representation of partial orders can lead
to faster routing. From Fig. 8, we can also know that the
performance difference between the N aive method and the
P arOrder W eb in the first case is much larger than that
in the seconde case. The reason is that we can traverse the
DAGs in the first case, instead of judging the virtual nodes
in the second case.
Efficiency of System Maintenance
In this experiment, we evaluated the efficiency of our
scheme to system maintenance. We randomly selected
10,000 different points from D to simulate the deletion case,
and generated 10,000 different points to simulate the insertion case. From Fig. 9, we can see that the P arOrder W eb
algorithm outperforms the N aive in both cases. The reason
is due to the compact representation of partial orders, which
can lead to less modification. Due to the curse of dimensionality, both of these two methods become inefficient when the
dimensionality increases. Another issue we observed is that
the update cost is relative high due to the precise exploration
of dominant relationship for both algorithms.
We have also explored the compression benefits of
P arOrder W eb compared with N aive. Because we used
a more concise format, i.e., partial order, to record the dominant relationship among the items, our methodology showed
superior performance compared the naive one (Yang, Wang,
& Kitsuregawa 2007).
1488
Download