Change Management: Versions and Versioning + XML and Data on the Internet


2003

Change Management

Versions and Versioning

+

XML and Data on the Internet

XML

Semi-Structured Data

Querying XML

Structured Data

Data Interchange

Based upon slides by Earl and Denise Ecklund for INF312, IfI/UiO, høsten 2002

INF 3180/4180, IfI/UiO 1

Overview

Change Management

– Overview

– Notions

– Schema Versioning

– Extension Versioning

– Implementation Concepts

XML and Data on the Internet

– XML

– Semi-Structured Data

– Structured Data

– Querying XML Data

– Data Interchange


Application Domains

• Design applications

- electronic design (eCAD)

- mechanical design (mCAD)

- manufacturing (CAM)

- software engineering (CASE)

- technical documentation

- ...

• Temporal applications

- business & commerce

- science

- statistics

- ...


Versions in Engineering

• Versioned Objects

• Workspaces

• Collaborative Engineering

• Parts Libraries

• Configurations

[Figure: Architecture for design and process management. Components shown: Design Applications, User Interface/Browser, Process Management System, History Manager, Workflow, Event Log, Workspace Manager, Object Manager, Physical Database, Design Management System.]


Workspaces and Developers

• Versioned Objects

– Updates are non-destructive

• Check-out/check-in transactions

• “update” creates a new version (successor)

• Currency (current versions)

– Structural objects and versions

• Workspaces

– A place to check out a group of objects

– Private, team, group, development, released, project, …

– Hierarchical checkout, nearest version visibility

– Active workspace

• Collaborative Development

– Sharing objects within a workspace

– Multiple concurrent writers (?!!)


Libraries and Configurations

• Parts Libraries

– Library parts are versioned objects

• Functionally equivalent variants

• Context-dependent version selection rules

– Path search

– Working context rules

» Work group

» Work space

• Configurations

– Compound objects

• Aggregation of components (part-of)

• Part specifies version selection rules (make)

– Bound Configurations

• A specific version is identified for each component

– Configurations may themselves be versioned


System Requirements for Versioning

• Versioning is applied to objects (seldom to classes).

• All persistent objects can have versions and there should (in theory) be no predefined limit on the number of versions an object can have.

• Applications should be able to access either the current version (which should be the default) or any specified version. Only the current version of an object can be updated. Access to older versions should normally be read-only.
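The requirements above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class name `VersionedObject`, its methods, and the ALU attributes are hypothetical, versions form a simple linear list, and the current version is taken to be the newest one.

```python
class VersionedObject:
    """Hypothetical versioned object: updates append a version, nothing is overwritten."""

    def __init__(self, initial_state):
        self._versions = [dict(initial_state)]  # version 0

    @property
    def current_number(self):
        # Simplification: the current version is the most recent one.
        return len(self._versions) - 1

    def read(self, version=None):
        """Return a copy of a version; the current version is the default."""
        number = self.current_number if version is None else version
        return dict(self._versions[number])

    def update(self, changes, based_on=None):
        """Create a successor of the current version; older versions are read-only."""
        if based_on is not None and based_on != self.current_number:
            raise PermissionError("only the current version can be updated")
        self._versions.append({**self._versions[-1], **changes})
        return self.current_number


alu = VersionedObject({"name": "ALU", "transistors": 1200})
alu.update({"transistors": 1350})   # non-destructive: version 0 is still readable
```

Reading without a version number returns the current state, while `alu.read(0)` still returns the original one.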


Notions

• Transaction time

• Schema evolution & schema versioning

• Extension versioning

• Version:

– revision

– alternative

– variant

– representation

– configuration


Transaction Time

• 3 kinds of time:

– User-defined time:

• semantics of time only known by the user.

– Valid time:

• the time a fact was true in reality.

– Transaction time:

• the time the fact was stored in the database.

• Valid time and transaction time are supported by the DBMS.

• Considering transaction time: an important distinction

– Extension versioning

– Schema versioning
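The difference between valid time and transaction time can be made concrete with a small sketch. The `Fact` record, the salary data, and the function name are assumptions made for illustration; a temporal DBMS would maintain both timestamps itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    employee: str
    salary: int
    valid_from: date    # valid time: when the fact became true in reality
    recorded_on: date   # transaction time: when the fact was stored


facts = [
    Fact("Ann", 12000, valid_from=date(2003, 1, 1), recorded_on=date(2003, 1, 5)),
    # A raise that took effect on 1 March but was only entered on 10 April.
    Fact("Ann", 13000, valid_from=date(2003, 3, 1), recorded_on=date(2003, 4, 10)),
]


def salary_as_known_on(facts, employee, valid_on, known_on):
    """Salary valid on `valid_on`, using only facts stored by `known_on`."""
    candidates = [
        f for f in facts
        if f.employee == employee
        and f.valid_from <= valid_on
        and f.recorded_on <= known_on
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.valid_from).salary
```

Asking for the salary valid on 15 March gives different answers depending on the transaction-time cutoff, because the raise was recorded late.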


Schema Evolution

• Schema can change over time in response to the varying needs of the application.

• 2 ways to implement schema evolution:

– eager: all instances are changed immediately according to the schema changes made.

– lazy: when schema is modified, no disk-resident instances need to be updated. Instead, when an instance is referenced by an application program and fetched into main memory, it is transformed into an instance conforming to the schema currently in effect.
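The eager and lazy strategies can be contrasted in a short sketch. Everything here is hypothetical: the in-memory `disk` dictionary stands in for disk-resident instances, and the single converter adds an `email` attribute introduced in an assumed schema version 2.

```python
SCHEMA_VERSION = 2  # schema currently in effect (assumed)


def _v1_to_v2(instance):
    """Hypothetical converter: schema version 2 added an 'email' attribute."""
    upgraded = dict(instance)
    upgraded["email"] = None
    upgraded["schema_version"] = 2
    return upgraded


CONVERTERS = {1: _v1_to_v2}

# Stand-in for disk-resident instances.
disk = {
    "s1": {"schema_version": 1, "name": "Ann Beech"},
    "s2": {"schema_version": 2, "name": "John White", "email": "jw@example.org"},
}


def fetch(oid):
    """Lazy evolution: transform an instance only when it is fetched."""
    instance = disk[oid]
    while instance["schema_version"] < SCHEMA_VERSION:
        instance = CONVERTERS[instance["schema_version"]](instance)
    return instance


def evolve_eagerly():
    """Eager evolution: transform every stored instance immediately."""
    for oid in disk:
        disk[oid] = fetch(oid)
```

With the lazy strategy the stored copy of `s1` stays at version 1 until `evolve_eagerly()` (or an explicit write-back, not shown) replaces it.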


Schema Changes

• Change definition of class/type:

– add/change/drop an attribute/relationship

– add/change/drop a method

– inherit different attribute/method definition through changes in the structure of the class/type hierarchy

• Change structure of the class/type hierarchy:

– add/change/drop class/type

– make a class/type S a new superclass/type of a class/type C

– remove a class/type S as a superclass/type of a class/type C

– change inheritance ordering (only for multiple inheritance)


Schema Versioning

• Multiple schemata logically in effect.

• 3 general approaches:

– Entire schema is timestamped with a transaction time.

– Specifying the system catalog as transaction time relations.

– Viewing the entire schema as a versioned object with status information and a version-derivation hierarchy.


Extension Versioning

• 3 general approaches:

– Use model directly (no changes to data model and query language).

– General extensions to data model and query language to support time-varying information.

– Data model and query language are modified to explicitly support transaction time.


Component Hierarchies

(PART_OF) - Aggregation

• Associated with each design object are primitives of its type, such as layout geometries, logic schematics, or functional descriptions.

• A particular representation of a complete design is a component hierarchy of representation objects rooted at the object that describes the top level of the design.

[Figure: component hierarchy rooted at Datapath.Layout, with the components ALU.Layout, Shifter.Layout, and RFile.Layout.]

Version Histories (IS_A, DERIVED_FROM)

• Incorporating the time dimension into the representation of design objects by adding a version number to the name and type associated with each object.

• Version histories explicitly record the ancestor/descendent interrelationships among versions through distinguished DERIVED_FROM relationships.

[Figure: version history of ALU.Layout with the versions ALU[0].Layout through ALU[5].Layout; the edges are labeled "Derivation" and "Alternative".]

Versions

• Explicit representation of the temporal/causal history of objects

Example: low-polluting car (three versions)

– Body: B1, Drive: 4WD, Catalyst: 9, Consumption: 101

– Body: B1, Drive: 4WD, Catalyst: —, Consumption: 71

– Body: B1, Drive: 4WD, Catalyst: 9, Consumption: 81

Versions - 2

• “Generic” information common to all versions + specific information for each version

Generic information: Body: B1, Drive: 4WD

Version-specific information:

– Catalyst: 9, Consumption: 101

– Catalyst: —, Consumption: 71

– Catalyst: 9, Consumption: 81

Kinds of Versions

• Revisions

– Explicit representation of different object states resulting from subsequent modifications

– Example: (Catalyst: 9, Consumption: 101) followed by (Catalyst: 9, Consumption: 81)

• Alternatives

– Different approaches/ways for the development of an object

– Example: (Catalyst: 9, Consumption: 101) versus (Catalyst: —, Consumption: 71)

Kinds of Versions - 2

• Variants

– Versions that are equal under a specific abstraction, distinguishable through certain parameters

– Example: (Catalyst: —, Consumption: 71, Fuel: unleaded) and (Catalyst: —, Consumption: 71, Fuel: normal)

• Representations (equivalences)

– Different views of the same object

– Example: ALU with the representations ALU[4].Layout, ALU[4].Transistor, and ALU[2].Function

[Figure: the slide also shows a box with Catalyst: —, Consumption: 101, Fuel: normal.]

Kinds of Versions - 3

• Configurations

– Describe the composition and structure of compound objects

– The object is composed of selected versions of its components

[Figure: two example configurations. Datapath[3].Layout is composed of RFile[4].Layout and ALU[2].Layout; Auto V1 is composed of Engine V1 and Body V2.]

Currency

• It is frequently the case that the default or current version is not the most recently created version.

– It is useful to separate current and last and to make currency update explicit.

– Operations are needed to position currency explicitly anywhere within the version history.

• Including forking new update branches from the current version when it is not a leaf node in the version tree!

[Figure: version tree of ALU[0] through ALU[5], with one version marked "Current Version" and one path marked "Main Derivation Branch".]
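A minimal sketch of explicit currency, assuming a version tree stored as a parent map. The class `VersionHistory` and the particular tree built below are hypothetical; they only illustrate keeping "current" and "last" apart and forking from a non-leaf version.

```python
class VersionHistory:
    """Hypothetical version tree with an explicit 'current' pointer."""

    def __init__(self, name):
        self.name = name
        self.derived_from = {0: None}  # version number -> parent (DERIVED_FROM)
        self.current = 0               # default version handed to applications
        self.last = 0                  # most recently created version

    def derive(self, parent=None):
        """Create a new version derived from `parent` (default: the current version)."""
        parent = self.current if parent is None else parent
        if parent not in self.derived_from:
            raise KeyError(f"unknown version {parent}")
        self.last += 1
        self.derived_from[self.last] = parent
        return self.last

    def set_current(self, version):
        """Position currency explicitly anywhere within the version history."""
        if version not in self.derived_from:
            raise KeyError(f"unknown version {version}")
        self.current = version

    def is_leaf(self, version):
        return version not in self.derived_from.values()


alu = VersionHistory("ALU")
v1 = alu.derive()
alu.set_current(v1)
v2 = alu.derive()            # derived from v1
v3 = alu.derive(parent=v2)   # continues the branch
alu.set_current(v2)          # current is no longer the last version
v4 = alu.derive()            # forks a new branch from the non-leaf v2
```

Creating a version never moves currency in this sketch; `set_current` is the only operation that does, which makes the currency update explicit.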


Dynamic Configurations

• It is useful to be able to create dynamic configurations: configurations in which references to components are not resolved until the PART_OF relationships are actually traversed.

• Variety of implementation approaches

[Figure: a dynamic configuration whose component references may resolve to any of ALU[0], ALU[1], ALU[2] and RFile[0], RFile[1], RFile[2].]
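One of many possible implementation approaches is to store a component reference without a version number and resolve it against a currency table when the PART_OF relationship is traversed. The table `CURRENT`, the tuple encoding of references, and the function names below are assumptions for this sketch.

```python
# Hypothetical currency table: component name -> current version number.
CURRENT = {"ALU": 2, "RFile": 1}


def resolve(ref, current=CURRENT):
    """Resolve a (name, version) reference; version None means 'bind at traversal time'."""
    name, version = ref
    return (name, current[name] if version is None else version)


def traverse(configuration, current=CURRENT):
    """Follow the PART_OF references of a configuration."""
    return [resolve(ref, current) for ref in configuration]


bound_datapath = [("ALU", 1), ("RFile", 0)]          # bound: versions fixed
dynamic_datapath = [("ALU", None), ("RFile", None)]  # dynamic: resolved on traversal
```

If the currency table changes between two traversals, the dynamic configuration follows it while the bound configuration keeps returning the same versions.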


Workspaces

• Workspaces are named repositories for design objects.

• 3 kinds of workspaces:

– Archive

– Group

– Private

[Figure: the three workspace kinds ARCHIVE, GROUP, and PRIVATE. Version V[4] resides in the archive; check-out and check-in move version V[5], which is-a-descendent-of V[4], between the workspaces.]

Other Currency Labels

• Which version is “current” may depend on the context from which it is referenced or how it is to be used

– The current version in:

• product manufacturing

• new product development

• research and development

• Labeled derivation paths: mfg, dev, r&d, …

• A particular path may be used in a particular workspace

[Figure: version tree of ALU[0] through ALU[5] with a "Current Version" marker, a "Main Derivation Branch", and derivation paths labeled mfg, dev, and r&d.]

Logical Versus Physical Representations

[Figure: Implementation of version model relationships. The logical view shows the versions A[0], A[1], and A[2] together with the objects B, C, D, and E. The physical view shows the nodes Ver A, Con A[0], Con A[1], Con A[2], Con B, Con C, Con D, Con E, and Eq. Edge types: is-a-descendent-of, is-a-component-of, is-equivalent.]

Change/Constraint Propagation

• Change propagation: process that automatically incorporates new versions into configurations.

• 2 key issues:

– How to limit the scope of propagation?

– How to disambiguate the path of changes?

• Constraint propagation: enforcement of equivalence constraints by procedurally regenerating new versions.

[Figure: three panels (I), (II), and (III) showing versions A[0], A[1], B[0], B[1], C[0], C[1], D[0], and D[1] connected by is-a-descendent-of edges, with the change path marked.]

Group Check-In / Group Check-Out

[Figure: Group check-in versus individual check-in. Check-in(B[1], C[1]) is compared with Check-in(B[1]); Check-in(C[1]) for a configuration A with components B and C.]

[Figure: Individual check-out versus group check-out. Check-out(C[1]); Check-out(E[1]) is compared with Check-out(C[1], E[1]) for configurations A and D with components B, C, and E.]

Change/Constraint Propagation - 2

[Figure: logical and physical views of a propagation example over the objects A, B, C, D, and E, built from Con and Eq nodes. Panels: (a) Initial configuration; (b) Check-out B[0] to create B[1]; (c) Propagate change to A[1]; (d) Spawn E[1]; (e) Propagate change to D[1]; (f) Final logical view.]

Inheritance

• Mechanism to model versions: allows defining default operations and values for new versions

[Figure: Problems with type-instance inheritance. Two panels organized into the levels SUPERTYPES, TYPES, and INSTANCES, showing Datapath Objects, the generic type ALU.Layout, and instances such as ALU[1].Layout, ALU[4].Layout, ALU[5].Layout, ALU[3].Transistor, and ALU[5].Transistor.]


Realization of Versions

• Database systems support the management of version graphs

– as part of the data model

• similar to subobjects: the existence of versions depends on the existence of the generic object

• realized through references or relationships: explicit relationships “version_of” and “has_version”, with operations for navigation in version graphs

– no pre-defined semantics of “version” (e.g., “alternative” or “revision”)

– by means of class hierarchies

[Figure: user-defined classes IS_A class “Version”.]

• Further functionality

– merging versions

– change notification

– version percolation


Approaches for Versioning in OODBS

• As part of the data model (different approaches)

object type CELL
    attributes
        cell_id: string(20),
        cell_name: string(20)
    versions treelike
        (attributes cdata: …
         structure is PATH)
end CELL;
object type PATH
    …
end PATH;

[Figure: versions of CELL3: (v1) with the paths P1, P2; (v2) with P1, P2, P3; (v3) with P1, P4, P5; a further row shows P6, P7, P8.]

Approaches for Versioning in OODBS - 2

• With help of the class/type hierarchy

-> versions as subobjects


Approaches for Versioning in OODBS - 3

• By predefined classes (using inheritance)

– Mechanism to model versions: Version.<operation> and de-referencing

– Also allows defining default operations and default values for new versions

[Figure: user-defined classes IS_A class “Version”.]

Approaches for Versioning in OODBS - 4

• “Version percolation”

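[Figure: version percolation. Before: CELL3/V1 with the components PATH1/V1 and PATH2/V1. The step "create a new version of PATH1" produces PATH1/V2 and, by percolation, CELL3/V2 next to CELL3/V1; PATH2/V1 is unchanged.]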
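Version percolation can be sketched as follows: creating a new version of a component also creates a new version of every latest-version composite that referenced the old one. The `Repository` class and its method names are hypothetical, and the sketch assumes the part-of structure has no cycles.

```python
class Repository:
    """Hypothetical store of versioned objects, keyed by (name, version)."""

    def __init__(self):
        self.versions = {}  # (name, version) -> tuple of component references
        self.latest = {}    # name -> latest version number

    def create(self, name, components=()):
        number = self.latest.get(name, 0) + 1
        self.latest[name] = number
        self.versions[(name, number)] = tuple(components)
        return (name, number)

    def new_version(self, name, components=None):
        """Create a new version of `name` and percolate the change upward."""
        old = (name, self.latest[name])
        if components is None:
            components = self.versions[old]
        new = self.create(name, components)
        # Latest versions of composites that still reference the old version.
        parents = [
            key for key, parts in self.versions.items()
            if old in parts and key[1] == self.latest[key[0]]
        ]
        for parent_name, parent_number in parents:
            parts = self.versions[(parent_name, parent_number)]
            rebound = [new if part == old else part for part in parts]
            self.new_version(parent_name, rebound)
        return new


repo = Repository()
path1 = repo.create("PATH1")
path2 = repo.create("PATH2")
cell3 = repo.create("CELL3", [path1, path2])
repo.new_version("PATH1")   # also creates CELL3 version 2
```

After the last call, CELL3 version 1 still refers to PATH1 version 1, and the percolated CELL3 version 2 refers to PATH1 version 2 and the unchanged PATH2 version 1.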


Approaches for Versioning in OODBS - 5

• Intension

interface versionable: current(), current4(label), version(#), …

interface version: derivedFrom(), version#(), …

[Figure: the classes “structural layout”, “layout”, and “version of layout” are connected to these interfaces by “implements” relationships.]

Approaches for Versioning in OODBS - 6

• Extension

[Figure: extension for ALU with the versions ALU[0] through ALU[5]. One version is marked "Current Version"; others are marked "Current Labeled Version" for the labels mfg, dev, and r&d.]

Change Management Summary

• semantics of time domain important for many applications

• significant amount of research has been expended on temporal data models and versioning concepts

• various versioning concepts, temporal data models and query languages have been proposed

– ”If we have three versions of a chip, how many instances of type CHIP are we talking about? Reasonable answers include one, three, and four.” -- W. Kent

• nearly all OODBSs provide versioning concepts


Versioning Literature

Katz, R.H., Toward a Unified Framework for Version Modeling in Engineering Databases, ACM Computing Surveys, Vol. 22, No. 4, December 1990, pp. 375-409.

Kent, W., An Overview of the Versioning Problem, Proc. SIGMOD’89, ACM, Portland, Oregon, 1989, pp. 5-7.

Kim, W., ed., Modern Database Systems: The Object Model, Interoperability, and Beyond, Addison-Wesley, 1995, chapters 19 & 20.

LeBlang, D. and Chase, R., Parallel Software Configuration Management in a Network Environment, IEEE Software, Vol. 4, No. 6, December 1987, pp. 28-35.


XML

Extensible Markup Language, World Wide Web Consortium (W3C)

• Standard of the W3C in 1998

• Documents (content)

• XML Schema or DTD (structure)

• Namespaces

• XSL, XSLT (stylesheets, transformations)

• XPath, XPointer, XLink

• XQuery

• Tagged elements with attributes

– Extensible, self-describing

• Tag names must be unique (use namespaces)


Namespaces

• All tags must be defined in a namespace

– Unique global names using URI references

• W3C provides certain standard namespaces (xsd, xsi, html)

• Organizations can provide domain specific namespaces

• Each document specifies the namespaces used in its schema

– A default namespace - a standard or domain namespace

– Its own target namespace - for elements defined in document

– Others …

• Referencing a name (syntax)

– Qualified names (namespace:name), e.g., xsd:element

– name (default namespace implied), e.g., car

Elements

• Element declarations (see XML Schema)

<xsd:element name="numStudents" type="xsd:positiveInteger"/>

• Element definitions

– start tag <tagName> must be paired with an end tag </tagName>

– <emptyElement/>

<numStudents>31</numStudents>

• Sub-elements may be repeated

– Subject to minOccurs and maxOccurs constraints (XML Schema)

• Both default to 1 if not specified

• References

<xsd:element name="foo" type="xsd:string"/>

<xsd:element ref="foo"/>


Attributes, Entities

• Attributes (of an element)

– Name-value pairs placed in the start tags

• name = “value”

• Each name may only appear once (single-valued)

<elevation units="feet">5440</elevation>

• Entity

– &mdash; is an entity representing the dash character

– starts with & and ends with ;

– Used to distinguish reserved characters from content

&lt; represents the character ‘<’ (also &gt; &amp; &apos; &quot;)

– <!ENTITY myPic SYSTEM "me.jpg" NDATA JPEGFORMAT>
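As an illustration of the predefined entities on the Attributes, Entities slide, the following sketch escapes reserved characters with Python's standard library and lets an XML parser turn them back into content. The sample string and the element name `cond` are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = 'if a < b && name == "x"'

# escape() always handles &, < and >; the quote mapping is passed explicitly.
escaped = escape(raw, {'"': "&quot;"})

# An XML parser resolves the predefined entities back to plain characters.
element = ET.fromstring(f"<cond>{escaped}</cond>")

# unescape() is the inverse operation outside a parser.
restored = unescape(escaped, {"&quot;": '"'})
```

Here `escaped` is `if a &lt; b &amp;&amp; name == &quot;x&quot;`, and both `element.text` and `restored` equal the original string.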

XML Document

<?xml version="1.0"?>
<racingTeams xmlns="http://www.cart.org"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.cart.org team.xsd">
  <team>
    <name>Shell</name>
    <car number="3"><make>Helix Ford</make>
      <engine id="47456"/></car>
    <car number="4"><make>Helix Ford</make>…</car>
    <driver><name>Steve Johnson</name>
      <points>2217</points></driver>
    <driver><name>Paul Radisich</name></driver>
    <spares><engine id="102555"/>
      <engine id="102556"/></spares>
  </team>
  . . .
</racingTeams>

<!-- Each document has one root element (e.g., racingTeams) whose name matches a schema element or <!DOCTYPE name> -->
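To show how an application might read the racing-teams document, here is a small sketch using Python's `xml.etree.ElementTree`. The embedded text is a shortened, well-formed copy of the slide's document (straight quotes, elided parts left out), and the prefix `c` is an arbitrary local alias for the document's default namespace.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<racingTeams xmlns="http://www.cart.org">
  <team>
    <name>Shell</name>
    <car number="3"><make>Helix Ford</make><engine id="47456"/></car>
    <driver><name>Steve Johnson</name><points>2217</points></driver>
    <driver><name>Paul Radisich</name></driver>
    <spares><engine id="102555"/><engine id="102556"/></spares>
  </team>
</racingTeams>"""

# Elements in a default namespace must be addressed with that namespace.
NS = {"c": "http://www.cart.org"}

root = ET.fromstring(DOC)
team = root.find("c:team", NS)

drivers = [d.findtext("c:name", namespaces=NS) for d in team.findall("c:driver", NS)]
points = [d.findtext("c:points", namespaces=NS) for d in team.findall("c:driver", NS)]
spare_ids = [e.get("id") for e in team.findall("c:spares/c:engine", NS)]
car_numbers = [c.get("number") for c in team.findall("c:car", NS)]
```

The second driver has no `points` sub-element, so the corresponding entry in `points` is `None`, which reflects the optional structure of the document.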


XML in Action

1. Export: XML document written to file or stream

2. Delivery: Passed to import software

3. Import: Parse XML document, validate using DTD or Schema (optional)

[Figure: Application1 exports a Document, which is imported by Application2; the import step validates the document against a Schema or DTD.]


Semi-Structured Data

• Semi-structured data is data that may be irregular or incomplete and have a structure that may change rapidly or unpredictably.

– It generally has some structure, but does not conform to a fixed schema

– Self-describing (e.g., in terms of XML element tags)

• Characteristics

– Heterogeneous

– Irregular structure

– Large evolving schema

– Hyperdata nodes and references

• Desirable to treat web sources like a database

– Often XML documents are like semi-structured data


Semi-Structured Example

DreamHome (&1)
    Branch (&2)
        street (&7) “22 Deer Rd”
        Manager &3
    Staff (&3)
        name (&8)
            fName (&17) “John”
            lName (&18) “White”
        ManagerOf &2
    Staff (&4)
        name (&9) “Ann Beech”
        salary (&10) 12000
        Oversees &5
        Oversees &6
    PropertyForRent (&5)
        street (&11) “2 Manor Rd”
        type (&12) “Flat”
        monthlyRent (&13) 375
        OverseenBy &4
    PropertyForRent (&6)
        street (&14) “18 Dale Rd”
        type (&15) 1
        annualRent (&16) 7200
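The irregularity in this example (a structured versus a plain name, a string versus a numeric type, monthly versus annual rent) is what an application has to cope with. The sketch below encodes the objects as Python dictionaries keyed by object identifier; treating &4 as a Staff object is a reading of the slide, and the helper functions are hypothetical.

```python
# Object identifiers map to attribute dictionaries; "&n" strings are references.
objects = {
    "&2": {"street": "22 Deer Rd", "Manager": "&3"},
    "&3": {"name": {"fName": "John", "lName": "White"}, "ManagerOf": "&2"},
    "&4": {"name": "Ann Beech", "salary": 12000, "Oversees": ["&5", "&6"]},
    "&5": {"street": "2 Manor Rd", "type": "Flat", "monthlyRent": 375, "OverseenBy": "&4"},
    "&6": {"street": "18 Dale Rd", "type": 1, "annualRent": 7200},
}


def display_name(staff):
    """Names are stored either as one string or as fName/lName parts."""
    name = staff["name"]
    if isinstance(name, dict):
        return f'{name["fName"]} {name["lName"]}'
    return name


def yearly_rent(prop):
    """Rent is stored either per month or per year, or not at all."""
    if "annualRent" in prop:
        return prop["annualRent"]
    if "monthlyRent" in prop:
        return 12 * prop["monthlyRent"]
    return None


overseen_streets = [objects[oid]["street"] for oid in objects["&4"]["Oversees"]]
```

No fixed schema is enforced here; each accessor inspects the data itself, which is what "self-describing" amounts to in practice.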


XML Schema

• Purpose of XML Schema: to declare

– The organization of instance documents

– The datatype of each element/attribute

(New element types go in targetNamespace)

• Features

– Same syntax as XML

– Integrated with namespace mechanism

• xsd is the conventional prefix for the XML Schema namespace

• Included schema must have same targetNamespace

– Rich set of built-in types (44+ data types)

– Elements have nested scope (scoped naming)

– Keys definable in terms of elements

– Referential integrity constraints enforced with keys

– Options for ordered (sequences) or unordered (sets)

– Validation of documents against a schema


XML Schema Grammar

• Namespace

• Compositors

<sequence>, <choice>, <all>, <union>, <list>

• complexType, simpleType

• complexContent, simpleContent

• restriction, extension, enumeration

• ID, IDREF [List(IDREF) replaces IDREFS ]

• unique: values, whenever present, must be unique

• key: like a candidate key (always present and unique: minOccurs > 0, nillable = “false”)

– selector: collection to which the key applies (like Table)

– field: path expressions relative to the selector collection

• keyref: specifies a foreign key constraint

• Empty elements


XML Schema Example

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.binary.org"
            xmlns="http://www.binary.org">
  <xsd:element name="life">
    <xsd:complexType>
      <xsd:sequence minOccurs="0" maxOccurs="unbounded">
        <xsd:sequence minOccurs="0" maxOccurs="unbounded">
          <xsd:element name="work" type="xsd:string"/>
          <xsd:element name="eat" type="xsd:string"/>
        </xsd:sequence>
        <xsd:choice>
          <xsd:element name="work" type="xsd:string"/>
          <xsd:element name="play" type="xsd:string"/>
        </xsd:choice>
        <xsd:element name="sleep" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

life ::= ( (work, eat)*, [ work | play ], sleep )*
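The content model of `life` behaves like a regular expression over child element names. The sketch below checks a sequence of child names against that grammar with Python's `re` module; the one-letter encoding and the function name are assumptions, and this is not how a schema validator is actually implemented. The choice is read as "exactly one of work or play", since `xsd:choice` occurs once by default.

```python
import re

# One letter per element name: w = work, e = eat, p = play, s = sleep.
CODES = {"work": "w", "eat": "e", "play": "p", "sleep": "s"}

# life ::= ( (work, eat)*, (work | play), sleep )*
LIFE = re.compile(r"(?:(?:we)*[wp]s)*")


def is_valid_life(children):
    """Check a list of child element names against the content model."""
    encoded = "".join(CODES.get(name, "?") for name in children)
    return LIFE.fullmatch(encoded) is not None
```

For example, `["work", "eat", "play", "sleep"]` is accepted, while `["work", "eat", "sleep"]` is rejected because the required work-or-play element is missing before `sleep`.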


XML Schema Example 2

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.cart.org"
            xmlns="http://www.cart.org">
  <xsd:complexType name="engine">
    <xsd:attribute name="id" type="xsd:string" use="required"/>
  </xsd:complexType>
  <xsd:element name="team">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="car" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="make" type="xsd:string"/>
              <xsd:element name="engine" type="engine" minOccurs="0" maxOccurs="1"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="driver" minOccurs="0" maxOccurs="unbounded"> …
        <xsd:element name="spares" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="engine" type="engine" minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>


XPath

• View XML document as a “tree” of nodes and references (edges)

• XPath: sequence of steps that yields a set of nodes

/ step / step / step …     step = basis + [predicates]

(Like a unix path, but each step can be a query expression)

.            current node
/            root node
..           parent node
@attr        access attribute “attr”
[n], last()  access the nth (or last) occurrence of a node at this level
//           descendent or self
text()       text content of node
functions    sum(), count(), …
[ . . . ]    selection condition (predicate)

• Example:

//student[status="ok" and starts-with(.//last, "E") and not(.//last = .//first)]
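Python's `xml.etree.ElementTree` implements only a subset of XPath (no `and`, no function calls such as `starts-with`), so the sketch below pushes the simple predicate into the path and evaluates the rest of the example's condition in Python. The sample document and all names in it are invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<class>
  <student><status>ok</status><name><first>Kari</first><last>Eriksen</last></name></student>
  <student><status>ok</status><name><first>Eide</first><last>Eide</last></name></student>
  <student><status>late</status><name><first>Ann</first><last>Eng</last></name></student>
  <student><status>ok</status><name><first>Ann</first><last>Beech</last></name></student>
</class>"""

root = ET.fromstring(DOC)


def matching_students(root):
    """Students with status 'ok', last name starting with 'E', last != first."""
    result = []
    # The [status='ok'] predicate is within ElementTree's XPath subset.
    for student in root.findall(".//student[status='ok']"):
        first = student.findtext(".//first")
        last = student.findtext(".//last")
        # The remaining conditions are evaluated in Python.
        if last.startswith("E") and last != first:
            result.append(f"{first} {last}")
    return result


second_last_name = root.find("student[2]/name/last").text    # positional step [n]
final_status = root.find("student[last()]/status").text      # last() step
```

A full XPath processor would evaluate the whole expression from the slide in one step; the split used here is purely a workaround for the limited standard-library support.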


XPointer

• XPointer: an XPath expression within a URI + points (position within a document), or ranges (start-point, end-point)

– xpointer(id('list37')/item)

– xpointer(id('list37')/item[1]/range-to(following-sibling::item[2]))

• http://someURL#xpointer(XPath expression) (entities, escapes)

– doc.xml#xpointer(string-range(//P,&quot;a little hat ^^&quot;))

• Nested elements from different namespaces (same local name)

– xmlns(x=http://example.com/foo) xmlns(y=http://example.org/bar) xpointer(//x:a/y:a)


XQuery

• "The goal of the XML Query WG is to produce a data model for XML documents, a set of query operators on that data model, and a query language based on these query operators."

• Strongly influenced by OQL (+ SQL, XML-QL and XPath)

• XQuery is a functional language in which a query is represented as an expression

• XQuery expressions can be nested with full generality

• Filters can strip out fields, like relational project

• Grouping


XQuery Expressions

• Path expressions

– XPath syntax + dereference operation (->) + range predicate

– Path expression returns an ordered list of nodes

– Start at document(string), /, or //

• Element constructors

• FLWR expressions

– FOR

– LET

– WHERE

– RETURN

• Expressions involving operators and functions

• Conditional expressions

• Quantified expressions

• List constructors

• Expressions that test or modify datatypes


FLWR

• FOR iterates through a sequence of individual nodes out of the selected collection, in order, one at a time

• LET binds a variable to the set of nodes in the selected collection

• FOR or LET clauses are considered nested; later clauses may reference variables bound in previous clauses


FLWR (2)

• WHERE clause is optional

– Predicates using bound variables

• RETURN generates the output (constructor expression for output)

– node | [ordered] forest of nodes | primitiveValue

– The constructor is executed once for each tuple of bindings from FOR or LET clauses that satisfies the WHERE clause.

– It preserves ordering where ordering exists

– The RETURN expression often contains element constructors

• With references to bound variables and nested subelements

– A list may be constructed by enclosing zero or more expressions in square brackets, separated by commas


XQuery Examples

Path expression example:

document("zoo.xml")/chapter[title = "Frogs"]//figref/@refid->fig/caption

FLWR expression example:

FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")/book[publisher = $p]/price)
RETURN
<publisher>
<name> $p/text() </name>
<avgprice> $a </avgprice>
</publisher>

Notes:

– result(//) = result(.) + result(/); probably here / and // give the same result.

– The “publisher” name in the path expression is in the local scope of document("bib.xml"), from the context of the path expression.

– The “publisher” name in the element constructor must match the name of an element in the default namespace.
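For readers without an XQuery processor at hand, the FLWR example can be mimicked in Python. This is a rough equivalent under stated assumptions, not XQuery itself: the `bib.xml` content is invented, and the result is wrapped in a hypothetical `result` element so that it forms a single tree.

```python
import xml.etree.ElementTree as ET
from statistics import mean

BIB = """<bib>
  <book><publisher>Addison-Wesley</publisher><price>60.00</price></book>
  <book><publisher>Addison-Wesley</publisher><price>40.00</price></book>
  <book><publisher>Morgan Kaufmann</publisher><price>30.00</price></book>
</bib>"""


def average_price_per_publisher(xml_text):
    bib = ET.fromstring(xml_text)
    result = ET.Element("result")
    # FOR $p IN distinct(...//publisher): distinct names, in document order.
    publishers = list(dict.fromkeys(p.text for p in bib.iter("publisher")))
    for name in publishers:
        # LET $a := avg(.../book[publisher = $p]/price)
        prices = [
            float(book.findtext("price"))
            for book in bib.findall("book")
            if book.findtext("publisher") == name
        ]
        # RETURN <publisher><name/><avgprice/></publisher>
        publisher = ET.SubElement(result, "publisher")
        ET.SubElement(publisher, "name").text = name
        ET.SubElement(publisher, "avgprice").text = f"{mean(prices):.2f}"
    return result


output = average_price_per_publisher(BIB)
```

Each iteration of the loop corresponds to one binding of `$p`, and the element constructor runs once per binding, as described for RETURN.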


XML Metadata Interchange

• XMI is an XML language to interchange the objects of software systems:

– UML/MOF [1] designs

– Computer software classes and interfaces

– Databases and database schemas

– DTDs and XML schemas

• Added to the list of OMG adopted technologies in February 1999 as XMI 1.0

• Most recent minor revision is XMI 1.1 (November 2000)

[1] MOF = Meta-Object Facility, describing data about data, i.e., the classes and associations structure of the data


XMI: Sharing Objects

• XML - Sharing Data

• XMI - Sharing Objects

– Creates custom XML Schemas (DTDs) and XML documents from class definitions.

[Figure: Application1 and Application2 each have the layers Objects, Data, and Text. Key: XMI, XML, Unicode.]

XMI Object Mapping

MOF Concept -> XML Concept

• Class -> XML element, odd levels

• Object typed attribute -> XML element, even levels

• Containment -> XML element, even levels

• Reference -> XML attribute IDREF

• Basic attribute (int, String) -> XML attribute CDATA

XMI Mapping Example

Class Definition

Class Car {                      // Class
    String color;                // Field, String
    Person customer;             // Field, Reference
    Engine standardEngine;       // Field, Object
    Engine optionalEngine;       // Field, Object
}

XMI Instance

<Car color="Blue" customer="Joe">
    <Car.standardEngine>
        <Engine ..... />
    </Car.standardEngine>
    <Car.optionalEngine>
        <Engine ..... />
    </Car.optionalEngine>
</Car>

Legend: XML element, odd levels (Car, Engine); XML element, even levels (Car.standardEngine, Car.optionalEngine); XML attribute IDREF (customer); XML attribute CDATA (color)
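The mapping rules can be tried out with a small serializer. This sketch is not an XMI implementation: the dataclasses mirror the slide's `Car` example, the `cylinders` attribute of `Engine` is invented because the slide elides the engine's content, and the customer is represented only by an identifier string.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Engine:
    cylinders: int  # hypothetical basic attribute


@dataclass
class Car:
    color: str                               # basic attribute -> XML attribute (CDATA)
    customer: str                            # reference -> XML attribute (IDREF)
    standardEngine: Engine                   # object-typed attribute -> nested element
    optionalEngine: Optional[Engine] = None  # omitted from the output when absent


def car_to_xml(car):
    """Serialize a Car: classes on odd levels, object-typed fields on even levels."""
    root = ET.Element("Car", {"color": car.color, "customer": car.customer})
    for field in ("standardEngine", "optionalEngine"):
        engine = getattr(car, field)
        if engine is None:
            continue
        wrapper = ET.SubElement(root, f"Car.{field}")  # even level: field element
        ET.SubElement(wrapper, "Engine", {"cylinders": str(engine.cylinders)})
    return root


car_xml = car_to_xml(Car("Blue", "Joe", Engine(4)))
```

Calling `ET.tostring(car_xml, encoding="unicode")` prints the nested structure, with `Car` and `Engine` on odd levels and `Car.standardEngine` on the even level between them.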


Validation in XMI

[Figure: validation in XMI at the model and instance levels. Labels: Model, MOF model (XMI Document), Generate, Generated XML Schema, Semantic Validation, Optional, Syntactic Validation, Objects (XMI Document), Instance, Well-formed XML Parser.]

XML Schema: Specifies the Properties for a Class of Resources


<xsd:complexType name="Book">

<xsd:sequence>

<xsd:element name="Title" type="xsd:string"/>

<xsd:element name="Author" type="xsd:string" maxOccurs="unbounded"/>

<xsd:element name="Date" type="xsd:year"/>

<xsd:element name="ISBN" type="xsd:string"/>

<xsd:element name="Publisher" type="xsd:string"/>

</xsd:sequence>

</xsd:complexType>

"For the class of Book resources, we identify five properties - Title, Author, Date, ISBN, and Publisher"


XML Instance Document: Specifies Values for the Properties

<Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>July, 1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
</Book>

"For a specific instance of a Book resource, here are the values for the properties.

Use schemaLocation to identify the companion document (i.e., the schema) which defines the Book class of resources."


Metadata Interchange using XML Schema

• XML Schema Strategy - two documents are used to provide metadata:

– a schema document specifies the properties (metadata) for a class of resources (objects);

– an instance document provides specific values for the properties of each object instance being transferred.

• Describing a schema for an object model using XML Schema requires another XML Schema that describes the object model’s metamodel!

• XMI uses an XML Schema for the UML metamodel


XML Literature

Chapter 29 “Semi-Structured Data and XML” in T. Connolly and C. Begg, Database Systems: A Practical Approach to Design, Implementation, and Management, 3rd ed., Pearson Education Ltd (Addison-Wesley), Essex, UK, 2002.

W3C Extensible Markup Language Website (XML page and links to nearby pages), http://www.w3.org/XML/

David Maier, “Database Desiderata for an XML Query Language”, http://www.w3.org/TandS/QL/QL98/pp/maier.html

OMG XML Metadata Interchange (XMI) Specification Version 1.1, OMG document 2000-11-02, November 2000, http://www.omg.org/technology/documents/formal/xmi.htm
