Semi-structured data

• Data is not just text, but is not as well structured as data in databases
• Occurs often in web databanks
• Occurs often in integration of databanks
Semi-structured data - properties
• irregular structure
• implicit structure
• partial structure
• a posteriori 'data guide' versus a priori schema
• large data guides
• It should be possible to ignore the data guide upon querying
• Data guide changes fast
• object can change type/class
• difference between data guide and data is blurred
Semi-structured data - model
• network of nodes
• object model (oid)
• query: path search in the network
OEM (Object Exchange Model)
• Graph
• Nodes: objects
- oid
- atomic or complex
- atoms: integer, string, gif, html, …
- value of a complex object is a set of object references (label, oid)
• Edges have labels
• OEM is used by a number of systems (e.g. Lorel)
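The OEM description above can be sketched as a small Python data structure. This is only an illustration under assumptions: the names `OEMObject`, `children`, and `db`, and the oids 1 to 5, are hypothetical and are not taken from any OEM implementation or from the example figure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple, Union

Atom = Union[int, str, bytes]  # atomic values: integer, string, gif, html, ...


@dataclass
class OEMObject:
    oid: int
    # Atomic object: `value` holds the atom and `refs` stays empty.
    # Complex object: `value` is None and `refs` is a set of (label, oid).
    value: Optional[Atom] = None
    refs: Set[Tuple[str, int]] = field(default_factory=set)

    @property
    def is_atomic(self) -> bool:
        return self.value is not None


def children(db: Dict[int, OEMObject], oid: int, label: str) -> Set[int]:
    """Follow every edge named `label` that leaves object `oid`."""
    return {target for (lab, target) in db[oid].refs if lab == label}


# Hypothetical fragment of a restaurant guide (oids chosen for illustration).
db = {
    1: OEMObject(1, refs={("restaurant", 2)}),
    2: OEMObject(2, refs={("name", 3), ("address", 4)}),
    3: OEMObject(3, value="Chef Chu"),
    4: OEMObject(4, refs={("city", 5)}),
    5: OEMObject(5, value="Palo Alto"),
}

# Path search in the network: guide -> restaurant -> name.
names = {
    db[n].value
    for r in children(db, 1, "restaurant")
    for n in children(db, r, "name")
}
```

Labels live on the references rather than on the objects, which is why the same object can be reached under different labels from different parents.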
OEM example
[Figure: OEM graph of a restaurant guide. The root object Guide (Restaurant Guide) has outgoing edges labeled restaurant, cafe, and nearby. Further edge labels: category, name, address, street, city, zipcode, price. Atomic values shown: gourmet, Chef Chu, El Camino Real, Palo Alto, 92310, Vietnamese, Saigon, Mountain View, Menlo Park, cheap, fast food, Sandra.]
Lorel query language
1. Find all places to eat Vietnamese food
select P
from RestaurantGuide.% P
where P.category grep "ietnamese"
2. Find the names and streets of all restaurants in Palo Alto
select R.name, A.street
from RestaurantGuide.restaurant{R}.address A
where A.city = "Palo Alto"
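To make the path-search reading of these two queries concrete, here is a minimal Python sketch that evaluates them over a toy graph. It is not Lorel and not how a Lorel engine works internally. The graph layout, the string oids, and the helpers `step` and `values` are assumptions for illustration; `%` is treated as "any single label" and `grep` as a plain substring test.

```python
# Toy OEM graph: oid -> list of (label, child oid). Layout is hypothetical.
edges = {
    "guide": [("restaurant", "r1"), ("restaurant", "r2"), ("cafe", "c1")],
    "r1": [("name", "n1"), ("category", "k1"), ("address", "a1")],
    "r2": [("name", "n2"), ("category", "k2"), ("address", "a2")],
    "c1": [("name", "n3"), ("category", "k3")],
    "a1": [("street", "s1"), ("city", "t1")],
    "a2": [("city", "t2")],
}
# Atomic objects: oid -> value.
atoms = {
    "n1": "Chef Chu", "k1": "gourmet",
    "s1": "El Camino Real", "t1": "Palo Alto",
    "n2": "Saigon", "k2": "Vietnamese", "t2": "Mountain View",
    "n3": "Sandra", "k3": "fast food",
}


def step(oids, label):
    """Follow one edge from each oid; the label "%" matches any label."""
    return [child for o in oids for (lab, child) in edges.get(o, [])
            if label == "%" or lab == label]


def values(oid, label):
    """Atomic values reachable from `oid` over one edge named `label`."""
    return [atoms[c] for c in step([oid], label) if c in atoms]


# Query 1: select P from RestaurantGuide.% P where P.category grep "ietnamese"
q1 = [p for p in step(["guide"], "%")
      if any("ietnamese" in v for v in values(p, "category"))]

# Query 2: select R.name, A.street
#          from RestaurantGuide.restaurant{R}.address A
#          where A.city = "Palo Alto"
q2 = [(name, street)
      for r in step(["guide"], "restaurant")
      for a in step([r], "address")
      if "Palo Alto" in values(a, "city")
      for name in values(r, "name")
      for street in values(a, "street")]
```

On this toy graph `q1` contains only the object with category Vietnamese, and `q2` contains the single pair for the Palo Alto restaurant. Objects that lack a `street` edge simply contribute no rows, which mirrors how missing structure is tolerated in semi-structured queries.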
3. Find all restaurants with zipcode 92310
select RestaurantGuide.restaurant
where RestaurantGuide.restaurant(.address)?.zipcode = 92310
Wildcards and variables
? - 0 or 1 path
+ - 1 or more paths
* - 0 or more paths
# - any path
% - 0 or more chars
- object variables
select P from Guide.% P
select A from #.address{A}
- path variables
select Guide.#@P.name
Data Guides
• A structural summary over a databank that is used as a dynamic schema
• Is used in query formulation and optimization
• Is often created a posteriori
• Properties:
– concise
– accurate
– convenient
Data Guides - definitions
• Label path: sequence of labels
L1.L2. … .Ln
• Data path: alternating sequence of labels and oids
L1.o1.L2.o2. … .Ln.on
• Data path d is an instance of label path l if the sequences of labels are identical in l and d.
• A data guide for object s is an object d such that every label path of s has exactly one data path instance in d, and each label path in d is a label path of s.
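The definitions above can be checked mechanically on small graphs. The following Python sketch is an illustration under assumptions: the graph encoding (oid -> list of (label, child oid)), the function names, and the two example graphs are hypothetical, and paths are enumerated only up to `max_len` so that cyclic graphs terminate, which makes the check a bounded approximation.

```python
def data_paths(edges, root, max_len):
    """Enumerate data paths (L1, o1, ..., Ln, on) with n <= max_len from root."""
    paths = []
    frontier = [((), root)]
    for _ in range(max_len):
        next_frontier = []
        for path, oid in frontier:
            for label, child in edges.get(oid, []):
                new_path = path + (label, child)
                paths.append(new_path)
                next_frontier.append((new_path, child))
        frontier = next_frontier
    return paths


def label_path_of(data_path):
    """Drop the oids: (L1, o1, L2, o2) -> (L1, L2)."""
    return data_path[0::2]


def is_instance(data_path, label_path):
    """A data path is an instance of a label path if the labels are identical."""
    return label_path_of(data_path) == tuple(label_path)


def is_data_guide(s_edges, s_root, d_edges, d_root, max_len):
    """Bounded check of the data guide definition for source s and candidate d."""
    s_labels = {label_path_of(p) for p in data_paths(s_edges, s_root, max_len)}
    d_labels = [label_path_of(p) for p in data_paths(d_edges, d_root, max_len)]
    no_extra_paths = set(d_labels) == s_labels            # same label paths
    one_instance_each = len(d_labels) == len(set(d_labels))  # exactly one each
    return no_extra_paths and one_instance_each


# Hypothetical source: label path B reaches two objects (3 and 4).
source = {1: [("A", 2), ("B", 3), ("B", 4)],
          2: [("C", 5)], 3: [("C", 5)], 4: [("C", 6)]}
# Candidate data guide: one node per label, shared node for A.C and B.C.
guide = {10: [("A", 11), ("B", 12)], 11: [("C", 13)], 12: [("C", 13)]}

ok = is_data_guide(source, 1, guide, 10, max_len=3)
not_ok = is_data_guide(source, 1, source, 1, max_len=3)
```

The source is not a data guide for itself here, because label path B has two data path instances in it.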
Data Guides
• A databank can have several data guides
• Minimal data guides: the smallest data guides
Data Guides - example
[Figure: (a) data model with objects 1 to 10 and edge labels A, B, C, D; (c) its minimal Data Guide with objects 18 to 21.]
Minimal Data Guides
• Concise
• May be hard to maintain
Example: child node for 10 with label E
Strong Data Guides
Intuitively:
"label paths that reach the same set of objects in the data model = label paths that reach the same objects in the data guide"
Strong Data Guides - definitions
An object o can be reached from s via l if there is a data path of s that is an instance of l and that has o as last oid
(L1.o1.L2.o2. … .Ln.o)
The target set for label path l in object s is the set of objects that can be reached from s via l. Notation: T(s,l)
L(s,l): set of label paths of s that have the same target set in s as l.
Definition:
d is a strong data guide for s if for all label paths l of s it holds that
L(s,l) = L(d,l)
There is a 1-1 mapping between target sets in the data model and nodes in a strong data guide.
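As a worked illustration of T(s,l), L(s,l), and the strong data guide condition, here is a small Python sketch. The graph encoding, function names, and example graphs are assumptions made for this example, and label paths are only explored up to `max_len`, so the check is bounded rather than exact on cyclic data.

```python
def target_set(edges, root, label_path):
    """T(s, l): objects reachable from root via the label path."""
    current = {root}
    for label in label_path:
        current = {child for o in current
                   for (lab, child) in edges.get(o, []) if lab == label}
    return frozenset(current)


def label_paths(edges, root, max_len):
    """All label paths of length 1..max_len that exist from root."""
    found, frontier = set(), {()}
    for _ in range(max_len):
        nxt = set()
        for lp in frontier:
            for o in target_set(edges, root, lp):
                for lab, _child in edges.get(o, []):
                    nxt.add(lp + (lab,))
        found |= nxt
        frontier = nxt
    return found


def same_target_paths(edges, root, label_path, max_len):
    """L(s, l): label paths with the same target set as label_path."""
    t = target_set(edges, root, label_path)
    return {lp for lp in label_paths(edges, root, max_len)
            if target_set(edges, root, lp) == t}


def is_strong(s_edges, s_root, d_edges, d_root, max_len):
    """Bounded check of L(s, l) == L(d, l) for every label path l of s."""
    return all(
        same_target_paths(s_edges, s_root, l, max_len)
        == same_target_paths(d_edges, d_root, l, max_len)
        for l in label_paths(s_edges, s_root, max_len)
    )


# Hypothetical sources. In s1, A.C reaches {5} but B.C reaches {5, 6}.
s1 = {1: [("A", 2), ("B", 3), ("B", 4)],
      2: [("C", 5)], 3: [("C", 5)], 4: [("C", 6)]}
# In s2, A.C and B.C reach the same object 4.
s2 = {1: [("A", 2), ("B", 3)], 2: [("C", 4)], 3: [("C", 4)]}
# Candidate guide that shares one node for A.C and B.C.
d = {10: [("A", 11), ("B", 12)], 11: [("C", 13)], 12: [("C", 13)]}

strong_for_s1 = is_strong(s1, 1, d, 10, max_len=3)
strong_for_s2 = is_strong(s2, 1, d, 10, max_len=3)
```

The guide `d` merges A.C and B.C into one node. That is only legitimate in the strong sense when both label paths have the same target set in the source, which holds for `s2` but not for `s1`.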
Data Guides - example
[Figure: (a) data model with objects 1 to 10 and edge labels A, B, C, D; (b) its strong Data Guide with objects 11 to 17; (c) its minimal Data Guide with objects 18 to 21.]
Strong Data Guides - algorithm
Implementation:
- Traverse data model depth-first.
- Each time you find a new target set for label path l, create a new object in the data guide.
- If the target set is already represented in the data guide, do not create a new object, but link to the existing object.
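The three implementation steps above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the reference implementation: the graph encoding (oid -> list of (label, child oid)), the function name `build_strong_data_guide`, the guide oids numbered from 0, and the example graph are all hypothetical.

```python
def build_strong_data_guide(edges, root):
    """Depth-first construction: one guide object per distinct target set."""
    guide_edges = {}  # guide oid -> list of (label, guide oid)
    node_for = {}     # target set (frozenset of source oids) -> guide oid

    def visit(targets):
        # New target set: create a new object in the data guide.
        guide_oid = len(node_for)
        node_for[targets] = guide_oid
        guide_edges[guide_oid] = []
        # Group the children of all objects in the target set by label.
        by_label = {}
        for o in targets:
            for label, child in edges.get(o, []):
                by_label.setdefault(label, set()).add(child)
        for label in sorted(by_label):
            child_targets = frozenset(by_label[label])
            if child_targets not in node_for:
                visit(child_targets)  # go deeper before the next label
            # Known target set: link to the existing guide object.
            guide_edges[guide_oid].append((label, node_for[child_targets]))

    visit(frozenset([root]))
    return guide_edges, node_for


# Hypothetical source: A.C and B.C both reach object 4.
source = {1: [("A", 2), ("B", 3)], 2: [("C", 4)], 3: [("C", 4)]}
guide_edges, node_for = build_strong_data_guide(source, 1)
```

Registering a target set in `node_for` before descending is what keeps the traversal finite on cyclic data. The same table maps each guide object back to its target set, which is one way to read the use of a strong data guide as a path index.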
Strong Data Guides - use
– Easier to maintain
– Used as path index for query optimization
Semi-structured data exercises
Exercise 1
• Represent the relations below using the OEM data model.

Restaurants
r_id  name
r1    Hamlet
r2    Normandie
r3    McDonald's

Cities
c_id  name
c1    Linkoping
c2    Norkoping

Restaurants&Cities
r_id  c_id  street
r1    c1    Storgatan
r2    c1    St.Larsgatan
r3    c2    Kungsgatan

Exercise 2
• Using the data model from the previous question, formulate the following queries using Lorel:
– find all the restaurants that are located in Linkoping
– list the restaurants by city (equivalent of GROUP BY)
– find the address (city and street) of the "Hamlet" restaurant
Exercise 3
• Draw the strong Data Guide for the restaurant guide data model below.
[Figure: OEM graph RestaurantGuide with root object Guide (oid 1) and objects numbered up to 29. Edge labels: restaurant, cafe, nearby, category, name, address, contact, street, city, zipcode, manager, reservation, phone. Atomic values shown: gourmet, Chef Chu, Saigon, El Camino Real, Palo Alto, 92310, Menlo Park, fast food, Sandra, Rydsvagen, Linkoping, 58435, and the phone numbers 71-72-73, 11-12-13, 31-32-33, 34-35-36.]
Exercise 4
• Write 4 simple queries in Lorel that illustrate the use of coercion in the following types of comparison:
– string type against integer type
– value against atomic object
– value against complex object
– value against set of objects
Explain how coercion works in each case.