Done by: LAKHLIFI Houda
The growth of the world wide web has created a new kind of data management problem which is building and maintaining web sites. One of the tools that are known to be used in the construction of web sites is Strudel, a web site management system that applies familiar concepts from database management systems, to the process of building websites. The main motivation for developing Strudel is the observation that with current technologies, creating and managing large sites is tedious because a site designer must perform simultaneously three tasks: 1) Selecting and managing the data available at the site, 2) Organizing the structure of data in individual pages and between multiple pages, and 3) Designing the visual presentation of pages. Indeed, existing site management tools often unify these tasks, which prevents the site builder from performing each task separately and more importantly, prohibits the generation of multiple sites from the same data. Moreover, many sites derive their content from multiple and varied data sources such as relational databases structured files and existing web sites, and such thing requires the site builder to have detailed understanding of all data sources. Thus all these problems constitute the starting basis that launched the project of Strudel.
Strudel presents the following features:
It separates the 3 web site creation tasks
It integrates content from multiple sources
It manages semi structured data
It uses a high level declarative language for managing site’s structure (StruQL)
Its advantages over the other web sites management tools are the following:
It derives multiple sites from the same data
It supports easy restructuring and modification
It provides platform for:
-Enforcing integrity constraints
-Designing policies for efficient run-time management of sites
1.
Content Management Layer:
Using Strudel, the site builder first defines the data that will be available at the site. The Web site's raw data resides either in tuple-stream sources (e.g., relational databases, flat files) or in graph-structured sources (e.g., XML documents). Tuple-stream sources can be accessed directly using Strudel's simple API for querying sources, or tuple-stream sources can be mapped into Strudel's data model. Source-specific wrappers translate an external source into Strudel's graph model. The integrated view of the data is called the data graph .
A data graph contains objects and collections . Objects are connected by directed edges labeled with string-valued attributes .
Objects are either internal nodes, identified by a unique object identifier (OID), or are atomic values, such as integers, strings, and files. Strudel supports several atomic types that commonly appear in Web pages e.g., URLs, and PostScript, text, image, and HTML files. Here's an example of a graph database of bibliographic data:
Objects are grouped into named collections . Objects may belong to multiple collections, and objects in the same collection may have different representations. For example, the graph above has two collections: Bibentry , which contains the objects bib1 and bib2 , and People , which contains pers1 and pers2 . Note that although bib1 and bib2 are in the same collection, they do not have the same representation. bib1 has a month attribute, but bib2 does not.
Strudel has a specific ASCII format for representing the logical view of its data model. Those graph files have a .ddl format (Data Definition Language) and are the first ASCII format that was used by Strudel. Today, new versions of Strudel support
XML representation of data graphs. Those XML documents can conform to Strudel’s
DTD but it is not a requirement.
Example using a data definition language (.ddl) format:
Graph people
Collection Person { }
Object norman in Person { lastname “Ramsey” firstname “Norman”
}
Object mary in Person { lastname “Fernandez” firstname “Mary” homepage is URL http://www.research.att.com
}
Example using an XML format:
<?xml version="1.0"?>
<STRUDEL>
<collections>
<collection name="Bibentry">
<member IDREF="Bibinfo.bib1"/>
<member IDREF="Bibinfo.bib2"/>
</collection>
<collection name="People">
<member IDREF="Bibinfo.pers1"/>
<member IDREF="Bibinfo.pers2"/>
</collection>
</collections>
<objects>
<object ID="Bibinfo.bib1">
<year type="int">1995</year>
<month>Jun</month>
<title>Simple and Effective ...</title>
<author ID="Bibinfo.pers1">
<firstname>Mary</firstname>
<lastname>Fernandez</lastname>
<homepage type="url">http://www.research.att.com/~mff</homepage>
</author>
<bibtexkey>bib1</bibtexkey>
</object>
<object ID="Bibinfo.bib2">
<bibtexkey>bib2</bibtexkey>
<booktitle>ICDE '98</booktitle>
<category>Semistructured Data</category>
<title>Optimizing ...</title>
<author IDREF="Bibinfo.pers1"/>
<author ID="Bibinfo.pers2">
<firstname>Dan</firstname>
<lastname>Suciu</lastname>
<homepage type="url">http://www.research.att.com/~suciu</homepage>
</author>
<year>1998</year>
</object>
</objects>
</STRUDEL>
The document type descriptor (DTD) for a Strudel site graph is:
<?xml encoding="US-ASCII"?>
<!ELEMENT STRUDEL (collections,objects)>
<!ELEMENT collections (collection*)>
<!ELEMENT collection (member*)>
<!ATTLIST collection name ID #REQUIRED>
<!ELEMENT member EMPTY >
<!ATTLIST member IDREF IDREF #REQUIRED>
<!ELEMENT objects (object*)>
<!ELEMENT object ANY >
<!ATTLIST object ID ID #REQUIRED>
<!ATTLIST object type CDATA #IMPLICIT>
Second, the site builder declaratively specifies the Web site's structure using a site-definition query in StruQL, Strudel's query language. The result of evaluating the site-definition query on the data graph is a site graph , which models both the site's content and structure. A site graph can be rendered as a browsable Web site by Strudel's
HTML generator. Even though we give them a different name, site graphs are just data graphs and they can be provided as input to other StruQL queries.
2.1
StruQL query:
A query in Strudel's query language allows a site builder:
To extract the data that will be available in the site from tuple-stream and/or graph-structured sources.
To create a site graph that specifies both the content and structure of the site.
StruQL query is divided into two parts: a query part and a graph-construction part .
Each part corresponds to one of the above steps.
StruQL queries are declarative , the expressions in a StruQL query specify what and where data will appear in resulting site graph. It does not specify how to compute this information. Because StruQL queries are declarative, their bodies can be evaluated in any order.
StruQL queries are also compositional , a StruQL query accepts a graph as input and produces a graph as output. This means the results of a query can be provided as input to another StruQL query.
2.2
Examples of StruQL queries:
In each of the following queries, the where clause selects objects and values of interest; the collect clause creates a new collection and adds the selected objects to the new collection. The new collection is defined in the output (or site) graph; the input graph is immutable and remains unchanged.
Selection on attributes
This query selects all objects b in the Bibentry collection that have a booktitle or journal attribute; it puts all such objects in the new collection RefereedPub . Because the query refers to objects in the input data graph, the result of this query contains the entire input graph and the new collection RefereedPub . where Bibentry{b}, b -> l -> x, l = "booktitle" or l = "journal" collect RefereedPub(b)
Selection on attribute values
This query selects objects b in the Bibentry collection that have a booktitle attribute whose value is "SIGMOD" and puts all such objects in the new collection
InSIGMOD . As above, the result is a graph that contains the entire input graph and the new collection InSIGMOD . where Bibentry{b}, b -> "booktitle" -> "SIGMOD" collect InSIGMOD(b)
Traversing paths with regular path expressions
Regular path expressions support traversal of arbitrary paths in the input graph.
For example, this query selects all Bibentry objects that have an author attribute that refers to an object that has a lastname attribute. where Bibentry{b},
b -> "author" -> x,
x -> "lastname" -> "Fernandez" collect ByMe(b)
The above query doesn't select objects that might have an author attribute that is a string; we could rewrite this query to handle both representations as follows: where Bibentry{b},
b -> "author" -> x,
x -> "lastname" -> "Fernandez" or x = "Fernandez" collect ByMe(b)
Third, the builder specifies the graphical presentation of pages in Strudel's HTMLtemplate language . Strudel's HTML generator produces HTML text for every node in the site graph from a corresponding HTML template; the result is the browsable Web site.
To sum up, Strudel can be a very efficient tool for creating large web sites. This is mainly due to the following features:
Multiple views of the Web site can be defined with minimal effort
Personalized Web sites can be offered
The three Web creation tasks are separated
Maintenance is easier. It is easier to understand and modify a declarative program like StruQL than a CGI-BIN script
Using a declarative query language allows to express more complex queries
References: http://citeseer.nj.nec.com
http://www.research.att.com/~mff/strudel/