QURSED: Querying and Reporting Semistructured Data
Yannis Papakonstantinou
Michalis Petropoulos
UNIVERSITY OF CALIFORNIA, SAN DIEGO
Vasilis Vassalos
NEW YORK UNIVERSITY
June 2002
Overview
• Query Forms and Reports
– Challenges of Semistructured Data
• The QURSED system
– Architecture
• Technical foundation
– Tree Query Language (TQL)
– Query Set Specification (QSS)
• QURSED Editor
Exporting DBMSs on the Web
[Figure: End-User with BROWSER / GUI exchanging XML and XQuery with a DATABASE SERVER. Labels inside the server: XML View (twice), XML Schema, Source (twice).]
Example XQuery:
FOR $s IN document("sensors.xml"),
    $p IN $s/manufacturer/product
WHERE $p/specs/sensing_distance > 4
  AND $p/specs/body_type/cylindrical
RETURN
  <big_cylindrical_sensors>
    {$p}
  </big_cylindrical_sensors>
• XML views and schemas
• XQuery behind the scenes
• Need for web-based interfaces
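The XQuery on this slide can be mimicked in a few lines of Python, which may help readers who do not have an XQuery engine at hand. This is only a sketch: the sample document, the function name `big_cylindrical_sensors`, and the `min_distance` parameter are assumptions made for illustration, and the element names are taken from the paths in the query.

```python
import xml.etree.ElementTree as ET

# Sample data assumed for illustration; element names follow the slide's paths.
SENSORS_XML = """
<sensors>
  <manufacturer>
    <name>Turck</name>
    <product>
      <part_number>A123</part_number>
      <specs>
        <sensing_distance>11</sensing_distance>
        <body_type>
          <cylindrical><diameter>17</diameter><barrel_style>Smooth</barrel_style></cylindrical>
        </body_type>
      </specs>
    </product>
    <product>
      <part_number>B123</part_number>
      <specs>
        <sensing_distance>25</sensing_distance>
        <body_type>
          <rectangular><height>10</height><width>30</width></rectangular>
        </body_type>
      </specs>
    </product>
  </manufacturer>
</sensors>
"""


def big_cylindrical_sensors(xml_text, min_distance=4.0):
    """Return one <big_cylindrical_sensors> wrapper per matching product."""
    root = ET.fromstring(xml_text)  # root is the <sensors> element
    results = []
    for product in root.findall("manufacturer/product"):
        distance = product.findtext("specs/sensing_distance")
        # The WHERE clause's second conjunct is an existence test.
        is_cylindrical = product.find("specs/body_type/cylindrical") is not None
        if distance is not None and float(distance) > min_distance and is_cylindrical:
            wrapper = ET.Element("big_cylindrical_sensors")
            wrapper.append(product)
            results.append(wrapper)
    return results


matches = big_cylindrical_sensors(SENSORS_XML)
print([m.findtext("product/part_number") for m in matches])
```

Only the cylindrical product passes both conditions; the rectangular one is dropped by the existence test even though its sensing distance is larger.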
Web and Databases Effort
• Data intensive Web site generators
– Strudel
– Forms as functions on edges/links
– Araneus
– Autoweb
• Declarative
• Separation of content, structure and presentation
[Figure: site generation pipeline. Labels: SEMISTRUCTURED DATA, SCHEMA, (Mediated) View, Semistructured Queries, SITE GRAPH (Web Site Content and Structure), Presentation Templates, Site Generator, Web Site, End-User.]
Query Forms and Reports
Requirements
• Handle semistructureness
– Powerful query forms and reports
• Be declarative
– Separate logic from presentation
• Encode compactly a large number of queries
– Compared to a set of query templates
• Visual interface for the developer
– Programming should NOT be a requirement
Query Forms and Reports
QURSED Approach
• XML Schema-driven
• Declarative!
– Separation of content & presentation
• Editor
– Visual actions to declarative specifications
– Automatic construction of report pages
• Query Set Specification (QSS)
– Large set of parameterized queries
– Compact representation
• Engine
– Automatic query formulation
– Direct result construction
[Figure: QURSED overview. Labels: XML Schema, QURSED Editor, QSS, HTML Query Form Page, End-User, QURSED Engine, XQuery, XQuery Engine, XHTML Report Pages.]
Query Forms and Reports
QURSED Editor
XML Schema
Query Forms and Reports
Developing Query Forms from the XML Schema
Developing Reports from the XML Schema
QURSED System Architecture
[Figure: QURSED system architecture. Labels recoverable from the diagram:]
• Web Designer
– WYSIWYG XHTML Editor
– XHTML Query Form Page
– (Optional) XHTML Template Report Page
• Developer
– QURSED Editor
– XML Schema
– Query Form Page
– (Optional) Custom Predicates
– Query Set Specification
– Query/Visual Association
• Deployment
– QURSED Compiler
– Dynamic Server Pages
• APP SERVER
– QURSED Run Time Engine
– XQuery Expressions
– XQuery Engine
– XML/XHTML
• End-User
– BROWSER
– Query Form Page
– Report Pages
Tree Query Language (TQL)
• Condition tree
• Schema-driven generation
• Result tree
[Figure: XML Data Tree, Condition tree, Bindings, Result tree, XML Result Tree]
Condition tree:
AND   [$PROT1 = "NEMA3" OR $PROT1 = "NEMA1"]
  sensors $S
    manufacturer
      name $NAME
      product $PROD
        specs
          body_type
            OR
              AND   [$DIA <= 20 AND $DIA <= 40]
                cylindrical $CYL
                  diameter $DIA
                  barrel_style $BAR
              AND   [$HEI <= 20 AND $WID <= 40]
                rectangular $REC
                  height $HEI
                  width $WID
          protection_ratings
            protection_rating $PROT1
Result tree:
html
  body
    table   GROUPBY ($PROD, $NAME)
      tr   SORTBY ($NAME DESC)
        td
          $NAME
        td
          table   GROUPBY ($CYL)
            tr
              td
                "Cylindrical"
            tr
              td
                $DIA   GROUPBY ($DIA)
              td
                $BAR   GROUPBY ($BAR)
          table   GROUPBY ($REC)
            tr
              td
                "Rectangular"
            tr
              td
                $HEI   GROUPBY ($HEI)
              td
                $WID   GROUPBY ($WID)
Tree Query Language (TQL)
Condition tree:
AND   [$PROT1 = "NEMA3" OR $PROT1 = "NEMA1"]
  sensors $S
    manufacturer
      name $NAME
      product $PROD
        specs
          protection_ratings
            protection_rating $PROT1
XML data:
sensors
  manufacturer
    name "Turck"
    product
      part_number "B123"
      specs
        sensing_distance "25"
        body_type
          rectangular
            height "10"
            width "30"
        protection_ratings
          protection_rating "NEMA3"
          protection_rating "NEMA4"
  manufacturer
    name "Turck"
    product
      part_number "A123"
      specs
        sensing_distance "11"
        body_type
          cylindrical
            diameter "17"
            barrel_style "Smooth"
        protection_ratings
          protection_rating "NEMA1"
          protection_rating "NEMA3"
Bindings ($PROD is bound to the whole product element; $PART abbreviates its part_number):
$NAME   $PROD                            $PART   $PROT1
Turck   product (part_number "A123" ...)  A123    NEMA1
Turck   product (part_number "A123" ...)  A123    NEMA3
Turck   product (part_number "B123" ...)  B123    NEMA3
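To make the notion of bindings concrete, the following sketch enumerates them for the protection-rating condition on this slide. It is not the QURSED evaluator: the function name `protection_bindings`, the hand-written nested loops, and the trimmed sample document (which assumes `protection_ratings` sits under `specs`) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample data, assumed for illustration (only the elements the
# condition tree touches are kept).
SENSORS_XML = """
<sensors>
  <manufacturer><name>Turck</name>
    <product><part_number>B123</part_number>
      <specs><protection_ratings>
        <protection_rating>NEMA3</protection_rating>
        <protection_rating>NEMA4</protection_rating>
      </protection_ratings></specs>
    </product>
  </manufacturer>
  <manufacturer><name>Turck</name>
    <product><part_number>A123</part_number>
      <specs><protection_ratings>
        <protection_rating>NEMA1</protection_rating>
        <protection_rating>NEMA3</protection_rating>
      </protection_ratings></specs>
    </product>
  </manufacturer>
</sensors>
"""


def protection_bindings(xml_text, allowed=("NEMA3", "NEMA1")):
    """One binding per (manufacturer, product, protection_rating) match
    whose rating satisfies $PROT1 = "NEMA3" OR $PROT1 = "NEMA1"."""
    root = ET.fromstring(xml_text)
    bindings = []
    for manufacturer in root.findall("manufacturer"):
        name = manufacturer.findtext("name")
        for product in manufacturer.findall("product"):
            part = product.findtext("part_number")
            path = "specs/protection_ratings/protection_rating"
            for rating in product.findall(path):
                if rating.text in allowed:
                    bindings.append(
                        {"$NAME": name, "$PART": part, "$PROT1": rating.text}
                    )
    return bindings


for binding in protection_bindings(SENSORS_XML):
    print(binding)
```

Product A123 contributes two bindings because two of its ratings satisfy the condition, which is why the slide lists it twice; NEMA4 on B123 produces no binding.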
Tree Query Language (TQL)
Condition tree:
AND
  sensors $S
    manufacturer
      name $NAME
      product $PROD
        specs
          body_type $BODY
            OR
              AND   [$DIA <= 20 AND $DIA <= 40]
                cylindrical $CYL
                  diameter $DIA
                  barrel_style $BAR
              AND   [$HEI <= 20 AND $WID <= 40]
                rectangular $REC
                  height $HEI
                  width $WID
          protection_ratings
            protection_rating
XML data: the same sensors document as on the previous TQL slide (Turck A123, cylindrical, diameter "17", barrel_style "Smooth"; Turck B123, rectangular, height "10", width "30").
Bindings ($PROD, $CYL and $REC are bound to whole elements; $PART abbreviates the part_number of $PROD):
$NAME   $PROD                            $PART   $BODY
Turck   product (part_number "A123" ...)  A123    cylindrical
Turck   product (part_number "B123" ...)  B123    rectangular
For A123:   $CYL = cylindrical (diameter "17" ...)   $DIA = 17   $BAR = Smooth
For B123:   $REC = rectangular (height "10" ...)     $HEI = 10   $WID = 30
TQL Semantics
Condition Tree
• Conjunctive Condition Trees
– OR-Removal Algorithm
– Transformation Rules
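[Figure: four transformation rules, each replacing the pattern on the left with the pattern on the right]
  AND(A, OR(B, C), D)   is replaced with   OR(AND(A, B, D), AND(A, C, D))
  OR(A, OR(B, C))   is replaced with   OR(A, B, C)
  element node(A, OR(B, C), D)   is replaced with   element node(OR(AND(A, B, D), AND(A, C, D)))
  element node(OR(AND, AND))   is replaced with   OR(element node(AND), element node(AND))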
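The effect of the transformation rules can be sketched as a recursive distribution of OR over AND and element nodes. The code below is not the OR-Removal Algorithm from the talk; it is a compact stand-in that, under the assumed tuple encoding of nodes (`("AND", [...])`, `("OR", [...])`, `("ELEM", name, [...])`, and plain strings for leaves), returns the list of OR-free trees whose disjunction is equivalent to the input.

```python
from itertools import product


def or_free_alternatives(node):
    """Return the OR-free trees whose disjunction is equivalent to `node`.

    Node encoding (an assumption of this sketch):
      "A"                      leaf
      ("AND", [children])      AND node
      ("OR", [children])       OR node
      ("ELEM", name, [kids])   element node
    """
    if isinstance(node, str):
        return [node]
    kind = node[0]
    if kind == "OR":
        # Nested ORs are flattened: OR(A, OR(B, C)) yields A, B, C.
        alternatives = []
        for child in node[1]:
            alternatives.extend(or_free_alternatives(child))
        return alternatives
    if kind == "AND":
        # Pick one alternative per child: AND(A, OR(B, C), D)
        # yields AND(A, B, D) and AND(A, C, D).
        choices = [or_free_alternatives(child) for child in node[1]]
        return [("AND", list(combo)) for combo in product(*choices)]
    if kind == "ELEM":
        # ORs below an element node are pushed above it in the same way.
        _, name, children = node
        choices = [or_free_alternatives(child) for child in children]
        return [("ELEM", name, list(combo)) for combo in product(*choices)]
    raise ValueError(f"unknown node kind: {kind!r}")


tree = ("AND", ["A", ("OR", ["B", "C"]), "D"])
for alternative in or_free_alternatives(tree):
    print(alternative)
```

Each returned tree is a conjunctive condition tree, so the input is equivalent to an OR over the returned list. For element nodes the sketch applies the last two rules in one step instead of introducing the intermediate AND nodes.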
TQL Semantics
Result Tree
Result tree:
html
  body
    table   GROUPBY ($PROD, $NAME)
      tr   SORTBY ($NAME DESC)
        td
          $NAME
        td
          table   GROUPBY ($CYL)
            tr
              td
                "Cylindrical"
            tr
              td
                $DIA   GROUPBY ($DIA)
              td
                $BAR   GROUPBY ($BAR)
          table   GROUPBY ($REC)
            tr
              td
                "Rectangular"
            tr
              td
                $HEI   GROUPBY ($HEI)
              td
                $WID   GROUPBY ($WID)
Result (one tr per product):
html
  body
    table
      tr
        td   Turck
        td   A123
        td   11
        td
          table
            tr
              td   "Cylindrical"
            tr
              td   17
              td   Smooth
      tr
        td   Turck
        td   B123
        td   25
        td
          table
            tr
              td   "Rectangular"
            tr
              td   10
              td   30
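The role of GROUPBY and SORTBY in a result tree can be illustrated by grouping a flat list of bindings and emitting one table row per group. The sketch below is an assumption-laden approximation rather than the QURSED result construction: the binding dictionaries, the use of `$PART` as a stand-in for the identity of `$PROD`, and the helper names `body_table` and `render_report` are all hypothetical.

```python
from html import escape

# Hypothetical bindings, one dict per binding of the condition tree variables.
BINDINGS = [
    {"$NAME": "Turck", "$PART": "A123", "$DIA": "17", "$BAR": "Smooth"},
    # Same product matched a second time (for example via another rating).
    {"$NAME": "Turck", "$PART": "A123", "$DIA": "17", "$BAR": "Smooth"},
    {"$NAME": "Turck", "$PART": "B123", "$HEI": "10", "$WID": "30"},
]


def body_table(members, label, first, second):
    """Nested table for one body type; distinct value pairs become rows."""
    pairs = []
    for member in members:
        if first in member and (member[first], member[second]) not in pairs:
            pairs.append((member[first], member[second]))
    if not pairs:
        return ""  # the group has no binding for this body type
    rows = "".join(
        f"<tr><td>{escape(a)}</td><td>{escape(b)}</td></tr>" for a, b in pairs
    )
    return f"<table><tr><td>{label}</td></tr>{rows}</table>"


def render_report(bindings):
    # GROUPBY ($PROD, $NAME): one outer row per distinct (product, name).
    groups = {}
    for binding in bindings:
        groups.setdefault((binding["$PART"], binding["$NAME"]), []).append(binding)
    # SORTBY ($NAME DESC)
    ordered = sorted(groups.items(), key=lambda item: item[0][1], reverse=True)
    rows = []
    for (_part, name), members in ordered:
        nested = body_table(members, "Cylindrical", "$DIA", "$BAR")
        nested += body_table(members, "Rectangular", "$HEI", "$WID")
        rows.append(f"<tr><td>{escape(name)}</td><td>{nested}</td></tr>")
    return "<html><body><table>" + "".join(rows) + "</table></body></html>"


print(render_report(BINDINGS))
```

Grouping is what collapses the two A123 bindings into a single row, and a product only gets the nested table for the body type it actually has.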
Tree Query Language (TQL)
• Translated to XQuery
– By QURSED Run-Time Engine
– TQL2XQuery Algorithm
– Syntax directed translation
– Tree patterns in TQL to nested FOR-WHERE-RETURN expressions in XQuery
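A syntax-directed translation of this kind can be sketched as a recursive walk that emits one FOR clause per node of the tree pattern. The code below is not the TQL2XQuery algorithm; it is a minimal illustration under stated assumptions: the `Node` class, the fresh `$v1`, `$v2`, ... variables for unnamed nodes, the `source` default, and the shape of the emitted query are all hypothetical, and result-tree grouping is ignored.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str                 # element name in the tree pattern
    var: str = ""             # e.g. "$PROD"; empty if the node has no variable
    children: list = field(default_factory=list)


def tql_to_xquery(root, conditions, return_expr, source='document("sensors.xml")'):
    """Emit nested FOR ... RETURN expressions, one FOR per pattern node."""
    fresh = count(1)
    clauses = []

    def visit(node, parent_expr):
        # Unnamed nodes get a fresh variable so that siblings stay
        # correlated through the same parent binding.
        var = node.var or f"$v{next(fresh)}"
        clauses.append(f"FOR {var} IN {parent_expr}/{node.name}")
        for child in node.children:
            visit(child, var)

    visit(root, source)
    lines = []
    for depth, clause in enumerate(clauses):
        pad = "  " * depth
        lines.append(pad + clause)
        if depth < len(clauses) - 1:
            lines.append(pad + "RETURN")
    pad = "  " * (len(clauses) - 1)
    if conditions:
        lines.append(pad + "WHERE " + " AND ".join(f"({c})" for c in conditions))
    lines.append(pad + "RETURN " + return_expr)
    return "\n".join(lines)


pattern = Node("sensors", "$S", [
    Node("manufacturer", "", [
        Node("name", "$NAME"),
        Node("product", "$PROD", [
            Node("specs", "", [
                Node("protection_ratings", "", [
                    Node("protection_rating", "$PROT1"),
                ]),
            ]),
        ]),
    ]),
])
query = tql_to_xquery(
    pattern,
    ['$PROT1 = "NEMA3" OR $PROT1 = "NEMA1"'],
    "<row>{$NAME}{$PROT1}</row>",
)
print(query)
```

The output uses the uppercase keyword style of the query shown earlier in the talk. A real translation would also have to fold the result tree's GROUPBY and SORTBY annotations into the nesting, which this sketch leaves out.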
Query Set Specification (QSS)
Condition Tree Generator:
AND   [$DIST <= $#DIST]   [$PROT1 = $#PROT1]
  sensors $S
    manufacturer
      name $NAME
      product $PROD
        specs
          sensing_distance $DIST
          body_type
            OR
              AND   [$DIA <= $#DIMX AND $DIA <= $#DIMY]
                cylindrical $CYL
                  diameter $DIA
                  barrel_style $BAR
              AND   [$HEI <= $#DIMX AND $WID <= $#DIMY]
                rectangular $REC
                  height $HEI
                  width $WID
          protection_ratings
            protection_rating $PROT1
Condition fragments marked on the diagram: f1, f2, f3
• Parameterized boolean expressions
• Multiple boolean expressions per AND node
• Condition fragments
Dependencies
Condition Tree Generator:
AND   [$BODY = $#BODY]   [$DIA <= $#DIA AND $BAR <= $#BAR]   [$HEI <= $#HEI AND $WID <= $#WID]
  sensors
    body_type $BODY
      cylindrical
        diameter $DIA
        barrel_style $BAR
      rectangular
        height $HEI
        width $WID
Condition fragments marked on the diagram: f1, f2, f3
Dependencies:
<f2, $#BODY = "Cylindrical", {f1}>
<f3, $#BODY = "Rectangular", {f1}>
Dependencies
[Figure: dependencies graph over the condition fragments f1, f2, f3]
<f2, $#BODY = "cylindrical", {f1}>
<f3, $#BODY = "rectangular", {f1}>
• Dependencies Graph
• Resolution algorithm based on topological sort
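A resolution pass based on topological sort can be sketched as follows. The semantics assumed here are a guess made for illustration and are not confirmed by the slides: a dependency `<f, B, {f1, ...}>` is read as "f may be active only if B holds for the supplied parameters and every listed fragment is active". The function name `resolve` and the data layout are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def resolve(candidates, dependencies, params):
    """Return the fragments that stay active after dependency resolution.

    candidates:   fragment ids that are active before dependencies are checked
    dependencies: {fragment: (predicate(params) -> bool, required_fragments)}
    params:       parameter values supplied through the query form
    """
    # Map each fragment to the fragments it depends on (its predecessors).
    graph = {frag: set(required) for frag, (_, required) in dependencies.items()}
    for frag in candidates:
        graph.setdefault(frag, set())

    active = set()
    # static_order() yields a fragment only after everything it depends on,
    # so one pass is enough to decide each fragment.
    for frag in TopologicalSorter(graph).static_order():
        if frag not in candidates:
            continue
        predicate, required = dependencies.get(frag, (lambda p: True, set()))
        if predicate(params) and set(required) <= active:
            active.add(frag)
    return active


DEPENDENCIES = {
    "f2": (lambda p: p.get("BODY") == "cylindrical", {"f1"}),
    "f3": (lambda p: p.get("BODY") == "rectangular", {"f1"}),
}

print(resolve({"f1", "f2", "f3"}, DEPENDENCIES, {"BODY": "cylindrical"}))
```

With `$#BODY` set to "cylindrical", f3 is dropped because its condition fails; if f1 were not a candidate, both f2 and f3 would be dropped regardless of the parameter value.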
Run-time: QSS to TQL Queries
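Example: the end-user supplies 6 for $#DIST and "Turck" for $#NAME.
AND   [$DIST <= $#DIST, with $#DIST = 6]   [$NAME = $#NAME, with $#NAME = "Turck"]
  sensors
    manufacturer
      name $NAME
      product $PROD
        part_number $PART
        specs
          sensing_distance $DIST
Condition fragments marked on the diagram: f1, f2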
Run-time: QSS to TQL Queries
[Figure: run-time data flow]
• Input: Constants from Query Form Page
• QURSED Run Time Engine
– Instantiate parameters in CTG with constants
– Find Active Condition Fragments
– Resolve Dependencies
– Union of Active Condition Fragments to TQL Query
– TQL Query to XQuery Expression
• XQuery Expressions go to the XQuery Engine
• XML/XHTML comes back from the XQuery Engine
• Output: XHTML Report Page
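The first steps of this run-time flow can be sketched with plain string handling. The sketch assumes that a condition fragment is active when every parameter it mentions has been filled in on the query form; that reading, the fragment-to-expression assignment, and the names `FRAGMENTS`, `literal`, and `instantiate` are assumptions made for illustration. Dependency resolution and the translation to XQuery are left out.

```python
import re

# Hypothetical fragments; which expression belongs to f1 or f2 is assumed.
FRAGMENTS = {
    "f1": ["$NAME = $#NAME"],
    "f2": ["$DIST <= $#DIST"],
}

PARAM = re.compile(r"\$#(\w+)")


def literal(value):
    """Render a form value as a query constant."""
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value) + '"'


def instantiate(fragments, form_values):
    """Return the instantiated conditions of all active fragments."""
    conditions = []
    for expressions in fragments.values():
        needed = {p for expr in expressions for p in PARAM.findall(expr)}
        supplied = all(form_values.get(p) not in (None, "") for p in needed)
        if not supplied:
            continue  # inactive fragment: contributes nothing to the query
        for expr in expressions:
            conditions.append(
                PARAM.sub(lambda m: literal(form_values[m.group(1)]), expr)
            )
    return conditions


print(instantiate(FRAGMENTS, {"NAME": "Turck", "DIST": 6}))
print(instantiate(FRAGMENTS, {"NAME": "Turck", "DIST": ""}))
```

Leaving the distance field empty simply removes that condition from the generated query, which is how one specification can stand for many different queries.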
QURSED Editor
Building Query/Visual Association
Condition Fragment List
Data Path
Form Control
Predicate
QURSED Editor
From visual actions to QSS
Query/visual association: sensors/manufacturer/* = man_name_select
Condition Tree Generator (fragment f1):
AND   [$NAME = $#NAME]
  sensors
    manufacturer
      name $NAME
      product
        part_number
        specs
          sensing_distance
Choice of schema element e means
• Addition of e to the CTG
• Addition of the e path to the CTG
• Creation of a name variable for e
QURSED Editor
Disjunction
($DIA <= $#DIMX AND $DIA <= $#DIMY)
OR
($HEI <= $#DIMX AND $WID <= $#DIMY)
QURSED Editor
Disjunction
Before:
AND   [($DIA <= $#DIMX AND $DIA <= $#DIMY) OR ($HEI <= $#DIMX AND $WID <= $#DIMY)]
  sensors
    body_type
      cylindrical $CYL
        diameter $DIA
        barrel_style $BAR
      rectangular $REC
        height $HEI
        width $WID
After:
AND
  sensors
    body_type
      OR
        AND   [$DIA <= $#DIMX AND $DIA <= $#DIMY]
          cylindrical $CYL
            diameter $DIA
            barrel_style $BAR
        AND   [$HEI <= $#DIMX AND $WID <= $#DIMY]
          rectangular $REC
            height $HEI
            width $WID
• Creation of disjunctive condition triggers transformation of the Condition Tree Generator
– ORNodes Algorithm
QURSED Editor
Building Reports
Group By Mapping
Element Mapping
Elements
to Appear
on Report
QURSED Editor
Result Tree
Condition tree (fragment fR):
AND
  sensors
    manufacturer
      name $NAME
      product $PROD
        part_number $PART
        specs
          sensing_distance $DIST
Result tree (schema element mapped to each table shown in parentheses):
html
  body
    table   GROUPBY ($MAN)   (manufacturer)
      tr
        td
          $NAME   GROUPBY ($NAME)
        td
          table   GROUPBY ($PROD)   (product)
            tr
              td
                table   GROUPBY ($CYL)   (cylindrical)
                  tr
                    td
                      $DIA   GROUPBY ($DIA)
                    td
                      $BAR   GROUPBY ($BAR)
                table   GROUPBY ($REC)   (rectangular)
                  tr
                    td
                      $HEI   GROUPBY ($HEI)
                    td
                      $WID   GROUPBY ($WID)
More Features
• Expandable schema
– Multiple copies/variables for repeatable elements
• Optional elements
• Sort-by options
• Template-driven construction of report pages
– Element mappings
– Group-by mappings
• Detailed list of visual actions of QURSED Editor
QURSED Contributions
• The first web-based generator of powerful query
forms and reports for semistructured XML data
• Declarative
– Separates querying functionality and presentation
• Handles semistructureness
– Disjunction
• Technical foundation
– XML Schema, QSS, TQL, XQuery
• QURSED Editor
– Visual actions “translated” to QSS and query/visual
association
– Automates report construction for heterogeneous data
Related work
• Web-based Form and Report Generators
– Macromedia UltraDev, ColdFusion, Microsoft Visual InterDev
– Excellent for flat uniform relational tables
– Visual query formulation paradigm allows the specification of
projections, sort-bys, simple conditions
– However, the development of form and report pages for
semistructured data requires substantial programming effort
• Visual Querying Interfaces
– EquiX, BBQ, VQBD, Lorel’s DataGuide-driven GUI, PESTO
– Excellent visual paradigm for the formulation of fairly
complex queries
– The goal is the development of a query or a query template
– User needs to be familiar with database models and schemas
Questions and Answers
?