CXquery (Chamois Xquery)
and its Applications
Hwan-Seung Yong (용 환승, 龍 煥昇)
Dept. of Computer Science and Engineering
Ewha Womans University (梨花女子大學校)
Seoul, Republic of Korea
2005-09-26
H.S.Yong, EWU.
Contents






Motivations of CXquery: Structure Agnostic Query
Query Processing Issues
Developing CXquery System Experience
CXquery to Xquery Conversion
CXquery to XML Stream Query Processing
Final Remarks
What is CXquery


CXquery: Chamois Xquery
Chamois


Project name of the Component-Based Knowledge
Engineering System Framework at Ewha, led by Dr.
Won Kim [IEEE 2002]
The chamois is a goat-antelope that lives in the Alps

It needs only short steps to leap high
CXquery is the same as XQuery except for one thing

Conditions are composed without XPath expressions.
Only element/attribute names are used
Background

In relational databases (RDB)

Queries are written against the schema (relation names, attribute
names), with constant values for condition checking
The schema has a relatively simple structure
Queries are easy to learn and can be written by end users
In OO and ORDB

Schemas have a complex structure
So composing and designing queries is a very hard task

Only professionals do it
What happens in XML?
XML and Query issues


Queries are as hard to compose as in the OO/OR case
Efforts so far have tried to design an SQL-like XML query language

XML even allows data with no schema (DTD
unknown)

XQuery, XPath: W3C standards
Are they SQL-like?
How do we compose a query?
Natural language queries for RDB

For ease of use
How about natural language queries for XML?
Or how about semi-natural language queries?
Some aspects on XML


XML is a metalanguage for encoding domain
information (books, movies, music, products, companies,
mathematics, chemistry, etc.)
Each domain needs a worldwide XML standard
(MathML, CML, BioXML, etc.)

But there are not yet enough of them
This means data from the same domain can be
encoded using different DTDs

There can be many kinds of movie DTDs and music DTDs.
An XQuery has to follow the DTD, so the same query is
expressed by a different XQuery for each DTD
DTD Design Choices for the Same Data




Element representation
Attribute representation
Nested representation
Combinations of the above
Element representation: 1st DTD Type

XQuery depends on the DTD structure

for $m in doc()//movie
where $m/genre[text() = "action"] and
      $m/year[text() = "1994"] and
      $m/actor[text() = "Jean Reno"]
return <title>{$m/title/text()}</title>

① element representation

<movie>
  <year>1994</year>
  <country>America</country>
  <country>France</country>
  <genre>drama</genre>
  <genre>action</genre>
  <title>Leon</title>
  <director>Luc Besson</director>
  <actor>Jean Reno</actor>
  <actor>Natalie Portman</actor>
</movie>
Attribute representation: 2nd DTD Type
for $m in doc()//movie
where $m/*[contains(@genre, "action")] and
      $m/*[@year = "1994"] and
      $m/*[@actor = "Jean Reno"]
return <title>{data($m/*/@title)}</title>

② attribute representation

<movie>
  <production year="1994"
              country="America France">
  </production>
  <detail_info genre="drama action"
               title="Leon">
  </detail_info>
  <people actor="Jean Reno"
          director="Luc Besson">
  </people>
</movie>
Nested representation: Third DTD Type
for $y in doc()//year
where $y[text() = "1994"]//genre[text() = "action"]//actor[text() = "Jean Reno"]
return <title>{$y//title/text()}</title>

③ nested representation

<year>1994
  <country>America
    <genre>action
      <movie>
        <title>Leon</title>
        <director>Luc Besson</director>
        <actor>Jean Reno</actor>
      </movie>
      …
    </genre>
  </country>
  …
</year>
Combinations: Fourth DTD Type
for $g in doc()//genre
where $g[@* = "action"]//year[* = "1994"]//*[@actor = "Jean Reno"]
return <title>{$g//title/text()}</title>

④ nested + attribute + element representation

<genre type="action">
  <country name="America">
    <year><yyyy>1994</yyyy>
      <movie>
        <title>Leon</title>
        <people director="Luc Besson" actor="Jean Reno">
        </people>
      </movie>
      …
    </year>
  </country>
</genre>
Independence and DBMS

But XML lives in a heterogeneous(?) distributed
environment

Each XQuery depends heavily on its DTD
Unless we define a single DTD and convert the XML data to it,
we have to write a different XQuery for each DTD
Real SQL like XML query

Rather than use XPath expressions

  /continent/country/state/city/name = 'Kyoto'
  //city/name = 'Kyoto'
  //city//* = 'Kyoto'
  //city/@name = 'Kyoto'

Just use

  name = 'Kyoto'

Just use the element or attribute name instead of XPath

“Find information about the city named Kyoto”

A natural language query requires heavy semantic processing
CXquery Approach

Assumption

The user has to know the exact tag names (element/attribute) and values
The user does not know the structure (DTD) of the XML
Query Example

Search for movie titles whose genre is ‘action’, release year is ‘1994’, and
whose stars include ‘Jean Reno’
genre = "action" and year = "1994" and actor = "Jean Reno"
Apply to XQuery

for $t in doc()//title
where genre = "action" and
      year = "1994" and
      actor = "Jean Reno"
return $t
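The idea of a condition that names only a tag and a value can be sketched in Python. This is a minimal illustration under stated assumptions, not the CXquery implementation: the helper names (values_of, holds, titles), the token rule for multi-valued strings, and the restriction to documents whose conditions sit under a <movie> element (the element and attribute representations shown earlier) are all hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET

# Element representation (abridged from the first DTD type).
ELEMENT_DOC = """<movie>
  <year>1994</year><genre>drama</genre><genre>action</genre>
  <title>Leon</title><actor>Jean Reno</actor>
</movie>"""

# Attribute representation (second DTD type).
ATTRIBUTE_DOC = """<movie>
  <production year="1994" country="America France"/>
  <detail_info genre="drama action" title="Leon"/>
  <people actor="Jean Reno" director="Luc Besson"/>
</movie>"""

QUERY = [("genre", "action"), ("year", "1994"), ("actor", "Jean Reno")]

def values_of(root, name):
    """All values carried by `name`, whether it is an element or an attribute."""
    found = []
    for node in root.iter():
        if node.tag == name and (node.text or "").strip():
            found.append(node.text.strip())
        if name in node.attrib:
            found.append(node.attrib[name])
    return found

def holds(root, name, value):
    # A value may pack several tokens ("drama action"), so whitespace-separated
    # tokens are compared as well as the whole string (an assumption).
    return any(v == value or value in v.split() for v in values_of(root, name))

def titles(xml_text, conditions):
    """Titles of every <movie> that satisfies all name = value conditions."""
    doc = ET.fromstring(xml_text)
    return [values_of(movie, "title")[0]
            for movie in doc.iter("movie")
            if all(holds(movie, n, v) for n, v in conditions)]
```

Calling titles(ELEMENT_DOC, QUERY) and titles(ATTRIBUTE_DOC, QUERY) uses the same condition list for both documents; only the data representation differs.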
Contents






Motivations of CXquery: Structure Agnostic Query
Query Processing Issues
Developing CXquery System Experience
CXquery to Xquery Conversion
CXquery to XML Stream Query Processing
Final Remarks
Four Query Processing Issues

First, ‘similarity matching’ is required

In an environment where the schema or DTD of the XML documents is
not precisely known, or where a “fuzzy” (approximate) search is done, even
the precise names of the elements and attributes may not be
known.
Use thesaurus-based matching
E.g., for the element names “actor”, “genre”, and “year”, the query
processor may also need to search for names such as “performer”,
“category”, and “date”, respectively.
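Thesaurus-based name matching can be sketched as a lookup that expands each query name before comparing it with a tag. The synonym pairs below come from the slide; the dictionary layout and the function names expand and name_matches are assumptions made for illustration.

```python
# Synonyms taken from the example above; a real thesaurus would be far larger.
THESAURUS = {
    "actor": {"performer"},
    "genre": {"category"},
    "year": {"date"},
}

def expand(name, thesaurus=THESAURUS):
    """Return the query name together with its thesaurus synonyms."""
    return {name} | thesaurus.get(name, set())

def name_matches(tag, query_name):
    """True if a document tag equals the query name or one of its synonyms."""
    return tag in expand(query_name)
```

With this expansion, a condition on genre would also be tried against nodes named category.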
Query Processing Issues

Second, heterogeneous representation of the same content in
XML

An element and/or attribute may intervene between an element name
and its corresponding value.
Figure (a): one example DTD, the element representation
Figure (b): type intervenes between genre and “action”, name
intervenes between actor and “Jean Reno”, and yyyy intervenes
between year and “1994”.
Figure (c): genre, year, and actor are represented as attributes
Figure (d): genre, year, and actor are represented as elements, but
their values are represented as attribute values.
This introduces significant implementation difficulties

The processor should consider all possible representations.
[Figure: panels (a), (b), (c), and (d) showing the four example representations]
Query Processing Issues


Third, an intervening element (<family>) and/or attribute between
an element and its corresponding value
→ leads to “semantic uncertainty” in the association between the
element and the value.
Ex) <actor>
      <family>
        <name>Jean Reno</name>
        …
      </family>
    </actor>
“Jean Reno” is the value associated with the element or attribute
“family” of “actor”, not directly with “actor”.

Blind binding of actor to “Jean Reno” is possible, which declares that the search
predicate actor = “Jean Reno” is true
Semantic correctness may be in question!
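One way to make this uncertainty visible is to report which tags intervene between the queried name and the matching value. The sketch below is illustrative only: the function name bindings and the decision to return the intervening path are assumptions, not the technique used by the CXquery processor.

```python
import xml.etree.ElementTree as ET

NESTED = "<actor><family><name>Jean Reno</name></family></actor>"
DIRECT = "<movie><actor>Jean Reno</actor></movie>"

def bindings(root, name, value):
    """For each element called `name`, list the tags lying between it and
    a descendant (or itself) whose text equals `value`."""
    found = []

    def walk(node, path):
        if (node.text or "").strip() == value:
            found.append(path)
        for child in node:
            walk(child, path + [child.tag])

    for start in root.iter(name):
        walk(start, [])
    return found
```

An empty path means the value sits directly in the named element. A non-empty path such as ["family", "name"] marks a blind binding whose semantic correctness may be in question.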
Query Processing Issues

Fourth, identification of the nearest common ancestor (NCA)
of all element and attribute names that appear in the search
predicates is needed

For query-processing optimization
For preventing erroneous results

<movies>
  <movie>                                  ○
    <general_info>
      <year>1994</year>
      <genre>action</genre>
    </general_info>
    <detail_info>
      <actors>
        <actor>Jean Reno</actor>
      </actors>
    </detail_info>
  </movie>
  <movie>                                  ×
    <general_info>
      <year>1994</year>
      <genre>action</genre> ...
  </movie>
</movies>
Query Processing Issues

However, the problem is difficult

since the structure of the XML hierarchy is not specified in
CXquery
E.g., the NCA of year, genre, and actor
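The role of the NCA can be sketched with a small Python example. It assumes the matching nodes have already been located; the function name nearest_common_ancestor is hypothetical, and the second <movie> in the sample data is completed without an actor purely for illustration.

```python
import xml.etree.ElementTree as ET

MOVIES = """<movies>
  <movie>
    <general_info><year>1994</year><genre>action</genre></general_info>
    <detail_info><actors><actor>Jean Reno</actor></actors></detail_info>
  </movie>
  <movie>
    <general_info><year>1994</year><genre>action</genre></general_info>
  </movie>
</movies>"""

def nearest_common_ancestor(root, nodes):
    """Return the deepest element that contains every node in `nodes`."""
    parent = {child: p for p in root.iter() for child in p}

    def chain(node):
        # The node itself, then its parent, and so on up to the root.
        out = [node]
        while out[-1] in parent:
            out.append(parent[out[-1]])
        return out

    common = chain(nodes[0])
    for node in nodes[1:]:
        others = set(chain(node))
        common = [a for a in common if a in others]
    return common[0]

doc = ET.fromstring(MOVIES)
first, second = doc.findall("movie")
same_movie = nearest_common_ancestor(
    doc, [first.find(".//year"), first.find(".//genre"), first.find(".//actor")])
cross_movie = nearest_common_ancestor(
    doc, [second.find(".//year"), second.find(".//genre"), first.find(".//actor")])
```

Here same_movie is the first <movie>, while cross_movie is the <movies> root: a combination whose NCA is the document root mixes values from different movies and would be an erroneous result.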
Contents






Motivations of CXquery: Structure Agnostic Query
Query Processing Issues
Developing CXquery System Experience
CXquery to Xquery Conversion
CXquery to XML Stream Query Processing
Final Remarks
One Approach to support CXquery

Implementation
Condition clause:
genre = "action" AND year = "1993" AND actor = "Tommy Lee Jones"
Structure??
Data names: element/attribute names
Data values
Result clause: title
One Approach to support CXquery


Implementation of an XML server based on CXquery
Special indexing is used

Node index: all element and attribute names
Value index: all constant values in the XML
All nodes and values are numbered to find their structural
relationships
Indices are stored in an RDB
Performance evaluation shows promising results.
[ISMIS 2005]
One Approach

The query processor should derive all paths among the names and
values.

Identification of name and value relationships
Identification of relationships between names
A classification of all possible paths an XML document can have is investigated
genre = "action" AND year = "1993" AND actor = "Tommy Lee Jones"
All possible paths
[Figure: classification of all possible path types. Panels: Path m-FE, Path m-HE, Path m-HEA, Path m-FEA, Path d-uHE-FE, Path d-FEA-FEA, Path d-lHE-FE, Path d-FE-FEA, Path d-FA-FE, Path d-FE-FA, Path d-HE-HA, Path d-ulHE-HE, Path d-HEA-HE, Path d-lHE-HE, Path d-HA-HE, Path d-HE-HEA, Path d-HEA-FE, Path d-ulHE-FE, Path d-uHE-HE, Path d-HEA-HEA. Node labels in the diagrams: C1E … CnE, C1A … CnA, iE, iA, V1 … Vn.]
One Approach

To search all possible paths, a node numbering scheme is used for each
node in the XML

<year yyyy="1994">
  <country name="America">
    <movie>
      <title>Leon</title>
      <genre type="drama" type="action"></genre>
      <people>
        <director>Luc Besson</director>
        <actor>
          <name>Jean Reno</name>
          <name>Natalie Portman</name>
        </actor>
      </people>
    </movie>
    ….
  <country name="France">
    ….
</year>

[Figure: tree view of the document above. year has children yyyy (1994) and two country nodes; the first country has name (America) and movie; movie has title (Leon), genre with type (drama) and type (action), and people with director (Luc Besson) and actor, whose name children are Jean Reno and Natalie Portman; the second country has name (France).]
Node numbering to identify relationship
[Figure: the document tree with a (start, end) region number on each node, as read from the slide: year (10,1000); yyyy (20,25) with value 1994 (20,25); country (30,490); name (40,45) with value America (40,45); movie (50,170); title (70,75) with value Leon (70,75); genre (80,110); type (90,95) with value drama (90,95); type (100,105) with value action (100,105); people (120,180); director (130,135) with value Luc Besson (130,135); actor (140,170); name (150,155) with value Jean Reno (150,155); name (160,165) with value Natalie Portman (160,165); second country (500,990); name (510,515) with value France (510,515).]

Doc-ID   StartRegion   End-Region   name
1        10            220          movie
1        20            30           year
1        40            110          Basicinfo
1        120           210          people
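Region numbering and its storage in a relational table can be sketched as follows. This is an assumption-laden illustration rather than the system's actual scheme: the table columns follow the Doc-ID / StartRegion / End-Region / name layout shown above, but the step of 10, the function names, the sample document, and the use of SQLite are choices made here, and only elements (not values) are numbered.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = """<movie>
  <year>1994</year>
  <Basicinfo><genre>action</genre></Basicinfo>
  <people><actor>Jean Reno</actor></people>
</movie>"""

def number_nodes(root, doc_id=1, step=10):
    """Assign (start, end) regions so a descendant's region nests inside
    the region of each of its ancestors."""
    rows, counter = [], [0]

    def visit(node):
        counter[0] += step
        start = counter[0]
        for child in node:
            visit(child)
        counter[0] += step
        rows.append((doc_id, start, counter[0], node.tag))

    visit(root)
    return rows

def build_index(rows):
    """Store the numbered nodes in an in-memory relational table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (doc_id INTEGER, start_region INTEGER, "
               "end_region INTEGER, name TEXT)")
    db.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)
    return db

def ancestors_of(db, name):
    """Names of all nodes whose region strictly contains a node called `name`."""
    sql = """SELECT a.name FROM node a JOIN node d
               ON a.doc_id = d.doc_id
              AND a.start_region < d.start_region
              AND d.end_region < a.end_region
             WHERE d.name = ?
             ORDER BY a.start_region"""
    return [row[0] for row in db.execute(sql, (name,))]

rows = number_nodes(ET.fromstring(SAMPLE))
db = build_index(rows)
```

Because containment of regions mirrors the ancestor relationship, a structural question such as "which nodes are above actor" becomes a plain range comparison in SQL, with no path expression involved.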
Processing flow diagram overview
• An XML server was implemented to evaluate the performance of the
query expression

[Figure: processing flow diagram. Inputs and outputs: XML documents, queries, query results. Indexing side: Node & Value Analyzer, Identifier Creator (node names & identifiers, values & identifiers), Index Constructor, Node Table, Value Table, Index Table, Data Table. Query side: Parser (components of a condition clause and components of a result clause), Query Processor with Path Type Classifier, SQL Translator (SQL statements), SQL Processor, and Region Processor, and Result Creator.]
Contents






Motivations of CXquery: Structure Agnostic Query
Query Processing Issues
Developing CXquery System Experience
CXquery to Xquery Conversion
CXquery to XML Stream Query Processing
Final Remarks
CXquery to Xquery Conversion

System Diagram Overview
[Figure: system diagram. Components: User, CXQuery, CXQuery to XQuery Converter, DTD/Result, XQuery, Result, XML Server with XML DB (eXist 1.0), and XML documents under DTD 1 and DTD 2.]
CXquery to Xquery Converter

A set of XQueries must be generated for one
CXquery, one per distinct DTD

CXQuery

for $c in doc()
where genre = "action" and actor = "Jean Reno"
return title

XQuery

for $c in /movies
where $c/genre = "action"
  and $c/actor = "Jean Reno"
return $c/title
XQuery for each DTD type

DTD Type 1
for $c in /movies
where $c/movie/@genre = "action"
  and $c/movie/actor = "Jean Reno"
return $c/movie/title

DTD Type 2
for $c in /movies
where $c/movie/genre = "action"
  and $c/movie/genre/actor = "Jean Reno"
return $c/title

DTD Type 3
for $c in /movie
where $c/movie/@genre = "action"
  and $c/movie/title/@actor = "Jean Reno"
return $c/movie/title

DTD Type 4
for $c in /movies
where $c/movie/genre = "action"
  and $c/movie/actor = "Jean Reno"
return $c/movie/title
Contents






Motivations of CXquery: Structure Agnostic Query
Query Processing Issues
Developing CXquery System Experience
CXquery to Xquery Conversion
CXquery to XML Stream Query Processing
Final Remarks
System Flow diagram
[Figure: system flow diagram. Input: CXQuery file (CXQueries), DTD file, XML stream. Processing: Path Generator (DTD path set), CXQuery Converter (XPath queries), YFilter XML stream engine. Output: XML documents.]
CXquery to Xquery Conversion
(a) DTD Path Set
path_mondial-cities.txt
/cities/city/name
/cities/city/latitude
/cities/city/population
/cities/city/located_at
/cities/city[@is_country_cap]
/cities/city[@is_state_cap]
/cities/city/population[@year]
/cities/city/located_at[@watertype]

(b) CXQuery
CXQ1:is_country_cap="yes" or latitude
CXQ2:car_code="MK" and area="25333"
CXQ3:name="Caspian Sea" or area="17000"
CXQ4:latitude
CXQ5:ethnicgroups
CXQ6:name
CXQ7:country="Korea"
…

(c) XPath set from (a) and (b)
/cities/city/name
/cities/city/latitude
/cities/city[@is_country_cap]

(d) XQuery Generation
/cities/city/latitude
/cities/city[@is_country_cap="yes"]
/cities/city[name="Caspian Sea"]
/cities/city/name
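The matching of CXquery terms against a DTD path set, steps (a) to (d) above, can be sketched in Python. This is a simplified illustration, not the converter from the talk: the function names parse_terms and to_xpaths are hypothetical, the and/or connective is dropped (each term yields its own XPath), and values containing the words "and" or "or" are not handled.

```python
import re

# Path set from path_mondial-cities.txt as listed above.
PATHS = [
    "/cities/city/name",
    "/cities/city/latitude",
    "/cities/city/population",
    "/cities/city/located_at",
    "/cities/city[@is_country_cap]",
    "/cities/city[@is_state_cap]",
    "/cities/city/population[@year]",
    "/cities/city/located_at[@watertype]",
]

def parse_terms(cxquery):
    """Split 'a="x" or b' into [(name, value or None), ...]."""
    terms = []
    for part in re.split(r"\s+(?:and|or)\s+", cxquery):
        name, _, value = part.partition("=")
        terms.append((name.strip(), value.strip().strip('"') or None))
    return terms

def to_xpaths(cxquery, paths):
    """Turn each CXquery term into XPaths for every path that ends in its name."""
    out = []
    for name, value in parse_terms(cxquery):
        for path in paths:
            attr_suffix = "[@" + name + "]"
            elem_suffix = "/" + name
            if path.endswith(attr_suffix):      # name is an attribute here
                base, test = path[: -len(attr_suffix)], "@" + name
            elif path.endswith(elem_suffix):    # name is an element here
                base, test = path[: -len(elem_suffix)], name
            else:
                continue
            if value is None:
                out.append(path)                # name only: select the node
            else:
                out.append(f'{base}[{test}="{value}"]')
    return out
```

Terms whose name does not occur in the path set, such as area for the cities DTD, simply produce no XPath for that DTD.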
XQuery Conversion for Each DTD
path_mondial-countries.txt
...
/countries/country/name
/countries/country/provinces
/countries/country/encompasses
/countries/country/neighbor
/countries/country[@car_code]
/countries/country[@area]
/countries/country/population[@year]
/countries/country/ethnicgroups[@percentage]
/countries/country/religions[@percentage]
/countries/country/ethnicgroups
path_mondial-cities.txt
...
/cities/city/name
/cities/city/latitude
/cities/city/population
/cities/city/located_at
/cities/city[@is_country_cap]
/cities/city[@is_state_cap]
/cities/city/population[@year]
...
qry6.txt
CXQ1:is_country_cap="yes" or latitude
CXQ2:name="Caspian Sea" or area="17000"
CXQ3:latitude
CXQ4:ethnicgroups
CXQ5:name
xpath_qry6.txt
CXQ5:/cities/city/name
CXQ1:/cities/city/latitude
CXQ1:/cities/city[@is_country_cap="yes"]
CXQ5:/continents/continent/name
CXQ5:/countries/country/name
CXQ4:/countries/country/ethnicgroups
CXQ2:/countries/country[@area="17000"]
CXQ4:/countries/country/ethnicgroups[@percentage]
CXQ5:/lakes/lake/name
path_mondial-continents.txt
/continents/continent
/continents/continent/name
path_mondial-lakes.txt
/lakes/lake/name
Implementation Result

[Screenshots: CXquery example; matching process of CXquery with the path set; XQuery conversion results; example of 6 CXqueries; the 13 converted XQueries]
CXquery for distributed XML servers

In a heterogeneous DBMS environment

A single standard schema is required at the central server
Query translation is required

A query on the standard schema is translated into each site’s schema
Distributed CXquery environment

We don’t need a standard XML schema; a collection of
each site’s DTD is enough
Users compose queries using only CXquery
CXquery is DTD-neutral
The central site then converts the CXquery into each site’s XQuery and
collects the results.
Heterogeneous XML stream query
processing

Stream data is increasing

RSS streams, news streams, stock trading, sensor
streams, multimedia streams, etc.
A stream processing engine is needed

It must handle a large number of heterogeneous XML streams
concurrently
How do we use a single stream query on these multiple
heterogeneous streams?

Translate the query for each stream and process each differently?
Apply a single CXquery to multiple heterogeneous
streams.
Final Remarks



CXquery, a query language with no path expressions, was introduced
This research area needs more work from now on
Technical issues for future research

Element/attribute-value association is required to
solve the semantic ambiguity problem

name = “Kyoto” vs name = “Tanaka” vs name = “Winter Sonata”
Final Remarks

Possible approaches

Define DTD tag names more specifically

Cityname = “Kyoto” vs person-name = “Tanaka” vs Movie-name
= “Winter Sonata”
The system resolves domain conflicts exactly through
data mining etc.

“Kyoto” represents a city name, “Tanaka” represents a person
name, etc.
The user specifies the exact domain name for all constants

name = “Kyoto[City]”, name = “Tanaka[Person]”, name = “Winter
Sonata[Movie]”
An XML extension is required



Thank you for your attention
Thank you very much for listening
Questions?
References





[ISMIS 2005] Wol Young Lee, Hwan Seung Yong, "A Query
Expression and Processing Technique for an XML Search Engine,"
ISMIS 2005: 15th International Symposium on Methodologies for
Intelligent Systems, Saratoga Springs, NY, USA, May 2005, pp. 266-275.
[JOT 2004b] Won Kim, Wol Young Lee, Hwan Seung Yong, "On
Query-Processing Issues for Non-Navigational Queries for XML,"
Journal of Object Technology, Vol. 3, No. 10, November-December
2004, pp. 19-26.
[JOT 2004a] Won Kim, Wol Young Lee, Hwan Seung Yong, "On
Supporting Structure-Agnostic Queries for XML," Journal of Object
Technology, Vol. 3, No. 7, July-August 2004, pp. 27-35.
[JOT 2002] Won Kim et al., "The Chamois Reconfigurable Data-Mining
Architecture," Journal of Object Technology, Vol. 1, No. 2,
July-August 2002, pp. 2-10.
[IEEE 2002] Won Kim et al., "Chamois: A Component-Based
Knowledge Engineering Framework," IEEE Computer, Vol. 35, No.
5, May 2002, pp. 46-54.