QURSED: Querying and Reporting Semistructured Data Yannis Papakonstantinou Michalis Petropoulos UNIVERSITY OF CALIFORNIA, SAN DIEGO Vasilis Vassalos NEW YORK UNIVERSITY June 2002 Overview • Query Forms and Reports – Challenges of Semistructured Data • The QURSED system – Architecture • Technical foundation – Tree Query Language (TQL) – Query Set Specification (QSS) • QURSED Editor Exporting DBMSs on the Web BROWSER / GUI End-User XML XML XQuery XML View XML View Source Source XML Schema FOR $s IN document(“sensors.xml"), $p IN $s/manufacturer/product WHERE $p/specs/sensing_distance > 4 AND $p/specs/body_type/cylindrical RETURN <big_cylindrical_sensors> {$p} </big_cylindrical_sensors> DATABASE SERVER • XML views and schemas • XQuery behind the scenes • Need for web-based interfaces Web and Databases Effort • Data intensive Web site generators End-User Web Site – Strudel – Forms as functions on edges/links Site Generator Presentation Templates SITE GRAPH Web Site Content and Structure – Araneus – Autoweb Semistructured Queries (Mediated) View SCHEMA SEMISTRUCTURED DATA • Declarative • Separation of content, structure and presentation Query Forms and Reports Requirements • Handle semistructureness – Powerful query forms and reports • Be declarative – Separate logic from presentation • Encode compactly a large number of queries – Compared to a set of query templates • Visual interface for the developer – Programming should NOT be a requirement Query Forms and Reports Query Forms and Reports QURSED Approach End-User Query Form Page HTML QURSED Editor QSS XML Schema – Separation of content & presentation Report Pages QURSED Engine XQuery XHTML XQuery Engine • XML Schema-driven • Declarative! • Editor – Visual actions to declarative specifications – Automatic construction of report pages • Query Set Specification (QSS) – Large set of parameterized queries – Compact representation • Engine – Automatic query formulation – Direct result construction Query Forms and Reports QURSED Editor XML Schema Query Forms and Reports Developing Query Forms from the XML Schema Developing Reports from the XML Schema QURSED System Architecture Web Designer End-User WYSIWYG XHTML Editor XHTML Query Form Page BROWSER Query Form Page Developer Report Pages Query Set Specification APP SERVER (Optional) XHTML Template Report Page (Optional) Custom Predicates QURSED Editor Query Form Page Query/Visual Association XML Schema QURSED Compiler Deployment Dynamic Server Pages QURSED Run Time Engine XQuery Expressions XML/XHTML XQuery Engine Tree Query Language (TQL) XML Data Tree PROT1 = “NEMA3” OR AND html PROT1 = “NEMA1” body sensors $S table GROUPBY ($PROD, $NAME) manufacturer tr SORTBY ($NAME DESC) name $NAME td product $PROD $NAME specs td body_type table GROUPBY ($CYL) OR tr $DIA <= 20 AND AND td $DIA <= 40 “Cylindrical” cylindrical $CYL Bindings tr diameter $DIA td barrel_style $BAR $DIA GROUPBY ($DIA) $HEI <= 20 AND td AND $WID <= 40 $BAR GROUPBY ($BAR) rectangular $REC table GROUPBY ($REC) height $HEI tr width $WID td protection_ratings “Rectangular” protection_rating $PROT1 tr td $HEI GROUPBY ($HEI) td $WID GROUPBY ($WID) • Condition tree • Schema-driven generation • Result tree XML Result Tree Tree Query Language (TQL) PROT1 = “NEMA3” OR PROT1 = “NEMA1” sensors $S manufacturer name $NAME product $PROD specs sensors AND protection_ratings protection_rating manufacturer manufacturer name name “Turck” “Turck” product product part_number part_number “B123” “A123” specs specs sensing_distance sensing_distance “25” “11” body_type body_type rectangular $PROT1 cylindrical height diameter “10” “17” width barrel_style “30” “Smooth” protection_ratings protection_ratings protection_rating protection_rating “NEMA3” “NEMA1” protection_rating protection_rating “NEMA4” “NEMA3” $NAME $PROD Turck product . part_number . . “A123” $NAME Turck $NAME Turck $PROD product . part_number . . “A123” $PROD product . part_number . . “B123” $PART $PROT1 A123 NEMA1 $PART $PROT1 A123 NEMA3 $PART $PROT1 B123 NEMA3 Tree Query Language (TQL) AND sensors $S manufacturer name $NAME product $PROD specs body_type $BODY OR $DIA <= 20 AND AND $DIA <= 40 $CYL cylindrical diameter $DIA barrel_style $BAR $HEI <= 20 AND AND $WID <= 40 rectangular $REC height $HEI width $WID protection_ratings protection_rating $NAME $PROD Turck product . part_number . “A123” . $NAME Turck $PROD product . part_number . “B123” . sensors $PART $BODY A123 cylindrical $PART $BODY B123 rectangular manufacturer manufacturer name name “Turck” “Turck” product product part_number part_number “B123” “A123” specs specs sensing_distance sensing_distance “25” “11” body_type body_type rectangular cylindrical height diameter “10” “17” width barrel_style “30” “Smooth” protection_ratings protection_ratings protection_rating protection_rating “NEMA3” “NEMA1” protection_rating protection_rating “NEMA4” “NEMA3” $CYL cylindrical . diameter . “17” . $DIA $BAR 17 Smooth $REC rectangular . height . “10” . $HEI $WID 10 30 TQL Semantics Condition Tree • Conjunctive Condition Trees – OR-Removal Algorithm – Transformation Rules AND A OR with replace B C D OR AND A B D AND A C D OR A OR with replace B C OR A B C element node A OR replace B with C D element node OR AND A B D AND A C D element node OR replace AND with AND OR element node AND element node AND TQL Semantics Result Tree html body table GROUPBY ($PROD, $NAME) tr SORTBY ($NAME DESC) td $NAME td table GROUPBY ($CYL) tr td “Cylindrical” tr td $DIA GROUPBY ($DIA) td $BAR GROUPBY ($BAR) table GROUPBY ($REC) tr td “Rectangular” tr td $HEI GROUPBY ($HEI) td $WID GROUPBY ($WID) html body table tr tr td td td Turck Turck td td A123 B123 td td 11 25 td td table table tr tr td td “Rectangular” “Cylindrical” tr tr td td 17 10 td td Smooth 30 Tree Query Language (TQL) • Translated to XQuery – – – – By QURSED Run-Time Engine TQL2XQuery Algorithm Syntax directed translation Tree patterns in TQL to nested FOR-WHERE-RETURN expressions in XQuery Query Set Specification (QSS) AND $DIST <= $#DIST $PROT1 = $#PROT1 sensors $S manufacturer name $NAME product $PROD specs sensing_distance $DIST body_type OR AND $DIA <= $#DIMX AND $DIA <= $#DIMY $CYL cylindrical $DIA diameter barrel_style $BAR $HEI <= $#DIMX AND $WID <= $#DIMY rectangular $REC $HEI height $WID width AND f1 f2 f3 protection_ratings protection_rating $PROT1 Condition Tree Generator • Parameterized boolean expressions • Multiple boolean expressions per AND node • Condition fragments Dependencies AND $BODY = $#BODY $DIA <= $#DIA AND $BAR <= $#BAR $HEI <= $#HEI AND $WID <= $#WID sensors body_type $BODY <f2, $#BODY = “Cylindrical”, {f1}> cylindrical diameter $DIA barrel_style $BAR rectangular $HEI height width f1 f2 f3 $WID <f3, $#BODY = “Rectangular”, {f1}> Dependencies f1 f2 f3 <f2, $#BODY = “cylindrical”, {f1}> <f3, $#BODY = “rectangular”, {f1}> • Dependencies Graph • Resolution algorithm based on topological sort Run-time: QSS to TQL Queries $DIST <= $#DIST 6 AND $NAME = $#NAME “Turck” sensors manufacturer name f1 f2 $NAME $PROD product part_number $PART specs sensing_distance $DIST Run-time: QSS to TQL Queries Constants from Query Form Page XHTML Report Page QURSED Run Time Engine Instantiate parameters in CTG with constants Find Active Condition Fragments Resolve Dependencies Union of Active Condition Fragments to TQL Query TQL Query to XQuery Expression XQuery Expressions XML/XHTML XQuery Engine QURSED Editor Building Query/Visual Association Condition Fragment List Data Path Form Control Predicate QURSED Editor From visual actions to QSS sensors/manufacturer/* = man_name_select AND $NAME = $#NAME sensors manufacturer name f1 $NAME product part_number specs sensing_distance Choice of schema element e means • Addition of e to the CTG • Addition of the e path to the CTG • Creation of a name variable for e QURSED Editor Disjunction ($DIA <= $#DIMX AND $DIA <= $#DIMY) OR ($HEI <= $#DIMX AND $WID <= $#DIMY) QURSED Editor Disjunction $DIA <= $#DIMX AND $DIA <= $#DIMY AND OR $HEI <= $#DIMX AND $WID <= $#DIMY sensors body_type cylindrical $CYL diameter $DIA barrel_style $BAR rectangular $REC $HEI height $WID width AND sensors body_type OR AND $DIA <= $#DIMX AND $DIA <= $#DIMY cylindrical $CYL diameter $DIA barrel_style $BAR AND $HEI <= $#DIMX AND $WID <= $#DIMY rectangular $REC $HEI height $WID width • Creation of disjunctive condition triggers transformation of the Condition Tree Generator – ORNodes Algorithm QURSED Editor Building Reports Group By Mapping Element Mapping Elements to Appear on Report QURSED Editor Result Tree html body table manufacturer GROUPBY ($MAN) tr td AND $NAME GROUPBY ($NAME) sensors td product table manufacturer GROUPBY ($PROD) tr $NAME td name table cylindrical GROUPBY ($CYL) product $PROD tr $PART td part_number $DIA GROUPBY ($DIA) td specs $BAR GROUPBY ($BAR) $DIST sensing_distance fR rectangular table GROUPBY ($REC) tr td $HEI GROUPBY ($HEI) td $WID GROUPBY ($WID) More Features • Expandable schema – Multiple copies/variables for repeatable elements • Optional elements • Sort-by options • Template-driven construction of report pages – Element mappings – Group-by mappings • Detailed list of visual actions of QURSED Editor QURSED Contributions • The first web-based generator of powerful query forms and reports for semistructured XML data • Declarative – Separates querying functionality and presentation • Handles semistructureness – Disjunction • Technical foundation – XML Schema, QSS, TQL, XQuery • QURSED Editor – Visual actions “translated” to QSS and query/visual association – Automates report construction for heterogeneous data Related work • Web-based Form and Report Generators – Macromedia Ultradev, Coldfusion, Microsoft Visual InterDev – Excellent for flat uniform relational tables – Visual query formulation paradigm allows the specification of projections, sort-bys, simple conditions – However, the development of form and report pages for semistructured data requires substantial programming effort • Visual Querying Interfaces – EquiX, BBQ, VQBD, Lorel’s DataGuide-driven GUI, PESTO – Excellent visual paradigm for the formulation of fairly complex queries – The goal is the development of a query or a query template – User needs to be familiar with database models and schemas Questions and Answers ?