16/01/2013 Semantics/Formal Semantics for Versioning in OPUS
Provenance systems are designed to store the history (lineage) of digital entities such as files, sockets and pipes. Entities are typically modelled as objects and versioned. Versioning is defined as the creation of new copies of an object at semantically relevant epochs. Typically, versions are chained: each version links back to its previous representation.
The goal of versioning is to relate object state to distinct, specific epochs in the object's timeline. It also provides a weak ordering of events. Versioning assists reasoning about object state and is useful for optimising object queries. Moreover, formalising versioning rules enables reasoning independently of any implementation.
Previous provenance systems have relied on two well-established versioning models.
In the first model, a new version is created on every object mutation (a write or similar operation).
Advantages:
Simple to express.
Versioning rules are natural and obvious.
Disadvantages:
Can cause a version explosion for mutation-intensive (e.g. write-intensive) processes.
Expensive to query, as traversals through the resulting version graph can be very long.
Efficiently modelling non-mutating operations (e.g. reads) can add significant complexity and overhead to the system.
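The version explosion can be illustrated with a minimal Python sketch, assuming a simple in-memory chain of versions; the class and method names are hypothetical and not taken from any provenance system. Each write appends a version linked to its predecessor, so walking back to the first version costs one step per write.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Version:
    """One version of an object; `prev` chains back to the previous version."""
    number: int
    prev: Optional["Version"] = None

@dataclass
class VersionOnMutationObject:
    """Hypothetical object that is re-versioned on every mutating operation."""
    name: str
    current: Version = field(default_factory=lambda: Version(0))

    def write(self, data: bytes) -> None:
        # Every mutation creates a new version linked to its predecessor.
        self.current = Version(self.current.number + 1, prev=self.current)

    def chain_length(self) -> int:
        # Walking back to the first version costs one step per mutation,
        # which is why long write-heavy histories are expensive to query.
        n, v = 0, self.current
        while v.prev is not None:
            n, v = n + 1, v.prev
        return n

obj = VersionOnMutationObject("log.txt")
for _ in range(1000):
    obj.write(b"x")
print(obj.current.number, obj.chain_length())  # 1000 1000
```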
In the second model, open-to-close versioning, a new version is created when a close operation is performed.
Advantages:
Provides clear start and end points for periods of manipulation of objects.
Provides a concise graph representation.
Disadvantages:
Under open-to-close versioning, operations that do not explicitly act on local handles must be adapted to open-to-close semantics. This can cause extra versioning of the provenance object. For example, in Unix the stat operation takes a file path, so it would be sandwiched between an open and a close, thereby versioning twice for a single stat operation.
Versioning on non-close operations is impossible.
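A contrasting sketch of open-to-close versioning, under the same hypothetical in-memory assumptions, follows the definition above literally: only close creates a version. Many writes between one open and close collapse into a single version, while a path-based operation such as stat must be wrapped in a synthetic open/close pair and therefore creates a version even though it mutates nothing.

```python
from dataclasses import dataclass

@dataclass
class OpenToCloseObject:
    """Hypothetical object versioned only when a handle is closed."""
    name: str
    version: int = 0
    open_handles: int = 0

    def open(self) -> None:
        self.open_handles += 1

    def write(self, data: bytes) -> None:
        # Mutations between open and close do not create versions.
        assert self.open_handles > 0, "write requires an open handle"

    def close(self) -> None:
        self.open_handles -= 1
        self.version += 1  # the only point at which a version is created

def stat(obj: OpenToCloseObject) -> None:
    # A path-based operation (like Unix stat) has no handle of its own, so it
    # must be wrapped in a synthetic open/close pair to fit the model.
    obj.open()
    obj.close()

f = OpenToCloseObject("data.bin")
f.open()
for _ in range(1000):
    f.write(b"x")
f.close()
print(f.version)  # 1: a thousand writes, one version
stat(f)
print(f.version)  # 2: a non-mutating stat still produced a version
```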
It is clear that previous provenance object versioning models have:
1. Implicitly based provenance versioning semantics on the underlying system on which they are implemented
2. Been defined with the focus of recording (rather than logically reasoning about) changes to objects
3. Been defined based on the constraint that versioning will only be required and carried out on object mutation or dereference
In order to facilitate more expressive and flexible versioning semantics that can be defined independently of, and orthogonally to, the versioning semantics of the underlying system, a model for object versioning is required. In the remainder of this document we define this model, and in the companion document [1] we use it to define the provenance versioning semantics of OPUS for the POSIX (IEEE 1003) interface.
This model can be extended to encompass other paradigms, for example a distributed storage system, as long as a mapping can be found between the entities and operations of the paradigm and the objects and operations of our model.
By modelling system operations in an abstract semantics, we can reason about properties of the semantics, such as its completeness, that will then hold for any implementation of it.
PVM is a simple model defined to enable the validation of, and reasoning about, update semantics in provenance systems. In essence, PVM is used to:
1. Formalise the concept of tracking and recording changes in system entities (i.e. versioning)
2. Formalise the versioning side effects of concurrent updates to system entities.
The calculus assumes a system model as follows:
The basic unit of modelling is an object. Objects are abstractions that group associated data and metadata in a logically distinct and addressable unit. Objects correspond to actual system entities such as files, sockets, etc. Objects are uniquely addressable and accessible by name.
Objects are accessed and modified by processes. Processes are abstractions for entities that access and modify objects as a side effect of execution. For example, in a UNIX system, processes would correspond to program threads.
Processes access and modify objects by carrying out operations on them. An operation is either mutating, in which case it changes the underlying data or metadata of the object (e.g. a write), or non-mutating, in which case it does not (e.g. a read). It is assumed that mutating and non-mutating operations can occur concurrently in different processes.
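The mutating/non-mutating distinction can be sketched as a simple classification table. The selection of calls below is a hypothetical illustration using familiar POSIX-style names, not the mapping defined in the companion document; a real mapping would need to cover the full interface of the modelled system.

```python
from enum import Enum

class Kind(Enum):
    MUTATING = "mutating"          # changes data or metadata (e.g. write)
    NON_MUTATING = "non-mutating"  # leaves the object unchanged (e.g. read)

# Hypothetical classification of a few POSIX-style calls.
OPERATION_KIND = {
    "read": Kind.NON_MUTATING,
    "stat": Kind.NON_MUTATING,
    "write": Kind.MUTATING,
    "truncate": Kind.MUTATING,
    "chmod": Kind.MUTATING,  # a metadata-only change still counts as mutating
}

def is_mutating(op: str) -> bool:
    return OPERATION_KIND[op] is Kind.MUTATING

print([op for op in OPERATION_KIND if is_mutating(op)])  # ['write', 'truncate', 'chmod']
```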
It is also assumed that an operation carried out by a process can affect the behaviour or semantics of other processes concurrently interacting with the object (e.g. in POSIX, a process unlinking a file that other processes are using orphans the file in those processes), or of future processes wishing to interact with the object (e.g. a process unlinking a file would cause a process that later opens the file to either fail or recreate it). The model provides support for propagating the effects of a process-local operation on an object to the other processes interacting with that object.
As the model is designed independently of platform- and implementation-specific details, it is expected to be able to express the provenance versioning semantics of POSIX (IEEE 1003), the Windows I/O model and distributed file system models.
The following entities are defined:
Global objects are system-scope, uniquely identifiable logical representations of entities (e.g. files, sockets) in the modelled file system or namespace. There is expected to be a corresponding global object for every entity in the modelled system. Global objects are versioned in response to system events.
Local objects are process-scope, uniquely identifiable logical representations of process references to global objects. For example, a local object would exist for every file descriptor in a POSIX system and every process handle in a Windows system. Local objects are versioned in response to version changes in global objects.
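The two entity kinds can be sketched as plain data types. This is a minimal illustration under an assumption the text does not spell out: a local object records the global version it last observed and catches up when the global object is versioned. All names (`GlobalObject`, `LocalObject`, `sync`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GlobalObject:
    """System-scope representation of an entity (e.g. a file)."""
    name: str
    version: int = 0

    def new_version(self) -> None:
        self.version += 1

@dataclass
class LocalObject:
    """Process-scope reference to a global object (e.g. a file descriptor)."""
    pid: int
    handle: int
    target: GlobalObject
    # Assumption: the local object tracks the global version it last observed.
    seen_version: int = field(init=False)

    def __post_init__(self) -> None:
        self.seen_version = self.target.version

    def sync(self) -> bool:
        """Catch up with the global object; return True if a new version was seen."""
        changed = self.seen_version != self.target.version
        self.seen_version = self.target.version
        return changed

g = GlobalObject("/tmp/a")
l1 = LocalObject(pid=100, handle=3, target=g)
l2 = LocalObject(pid=200, handle=5, target=g)
g.new_version()                # e.g. triggered by a write through l1
print(l1.sync(), l2.sync())    # True True: both local objects follow g
```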
Aliases are a common property of modern namespace management systems. Depending on their properties and behaviour, aliases can be considered hard or soft. Hard aliases are names that are purely monikers for existing names (i.e. they resolve to an identical entity), while soft aliases are entities that point to other entities in the system. Soft aliases are distinguishable from hard aliases by the property that they maintain metadata independently of the object to which they point.
In order to unify the provenance of an underlying global object, it is necessary to model the links between aliases and the objects they point to. In PVM we treat hard and soft aliases as separate cases due to their differing properties.
Hard aliases are a set of (one or more) names in the namespace which point to the same underlying physical entity. For example, in POSIX, when two file names are hard-linked they refer to the same inode structure. In PVM hard aliases are represented by a set of global objects, with a separate global object for every alias in the set of names for the entity.
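On a POSIX system the shared inode can be observed directly: both names report the same device and inode number, and the link count rises to two. A small Python check, assuming a POSIX file system that supports hard links (the file names are arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    x = os.path.join(d, "x")
    y = os.path.join(d, "y")
    with open(x, "w") as f:
        f.write("hello")
    os.link(x, y)  # y becomes a hard alias for x
    sx, sy = os.stat(x), os.stat(y)
    same_inode = (sx.st_dev, sx.st_ino) == (sy.st_dev, sy.st_ino)
    link_count = sx.st_nlink

print(same_inode, link_count)  # True 2
```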
Two global objects $x$ and $y$ are hard aliases if they are aliases for the same entity. This is denoted by $x \rightleftharpoons y$. The relation $\rightleftharpoons$ is reflexive, symmetric and transitive.
Global Object Equivalence Set: Every global object has a corresponding global object equivalence set. A set $S$ of global objects is considered an equivalence set iff $\forall x, y.\ (x \in S \land y \in S) \rightarrow x \rightleftharpoons y$.
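Example: if $a \rightleftharpoons b$, the set $\{a, b\}$ forms an equivalence set $S_2$; likewise, if $x \rightleftharpoons y$, the set $\{x, y\}$ forms an equivalence set $S_1$.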
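Because $\rightleftharpoons$ is an equivalence relation, the equivalence sets of a system are exactly the connected components of its alias pairs. A sketch of computing them with union-find, assuming the alias pairs are given as a list (the function name and input format are hypothetical):

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

def equivalence_sets(objects: Iterable[Hashable],
                     aliases: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Partition global objects into equivalence sets under x ⇌ y.

    Reflexivity, symmetry and transitivity make the sets the connected
    components of the alias pairs, which union-find computes.
    """
    parent: Dict[Hashable, Hashable] = {o: o for o in objects}  # each object starts alone

    def find(o: Hashable) -> Hashable:
        while parent[o] != o:
            parent[o] = parent[parent[o]]  # path halving
            o = parent[o]
        return o

    for x, y in aliases:
        parent[find(x)] = find(y)  # merge the two components

    groups: Dict[Hashable, Set[Hashable]] = {}
    for o in parent:
        groups.setdefault(find(o), set()).add(o)
    return list(groups.values())

sets = equivalence_sets(["a", "b", "c", "x", "y"], [("a", "b"), ("b", "c"), ("x", "y")])
print(sorted(sorted(s) for s in sets))  # [['a', 'b', 'c'], ['x', 'y']]
```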
Global Object Equivalence Set Functions: We define the following functions for manipulating and testing Global Object Equivalence Sets:
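Manipulation:
$\mathit{eadd}(S, a): \forall g.\ g \in S \rightarrow a \rightleftharpoons g$ (adds $a$ to $S$)
$\mathit{erem}(S, a): \forall g.\ g \in S \rightarrow \neg(a \rightleftharpoons g)$ (removes $a$ from $S$)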
Equivalence testing:
The function eqv returns all the global objects in the global object equivalence set of $x$: $\mathit{eqv}(x) = \{r : r \in E,\ x \in r\}$, where $E$ denotes the set of all global object equivalence sets.
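The three functions can be sketched over a mutable collection $E$ of disjoint sets. This is an illustration under stated assumptions, not PVM's definition: eadd and erem are read as "add a to S" and "remove a from S" with the postconditions above, and a removed object is assumed to survive alone in its own singleton set (since $a \rightleftharpoons a$). `eqv` returns a list because mutable Python sets are unhashable.

```python
from typing import List, Set

# Hypothetical representation: E holds the disjoint equivalence sets of the system.
E: List[Set[str]] = [{"x", "y"}, {"a"}]

def _prune() -> None:
    # Drop sets emptied by a move, mutating E in place.
    E[:] = [r for r in E if r]

def eadd(S: Set[str], a: str) -> None:
    """Add a to S (S must be a member of E); afterwards every g in S satisfies a ⇌ g."""
    for other in E:
        if other is not S:
            other.discard(a)  # a may belong to only one equivalence set
    S.add(a)
    _prune()

def erem(S: Set[str], a: str) -> None:
    """Remove a from S; afterwards no g in S satisfies a ⇌ g."""
    if a in S:
        S.discard(a)
        E.append({a})  # assumption: a remains, alone, in its own set
    _prune()

def eqv(x: str) -> List[Set[str]]:
    """eqv(x) = {r : r ∈ E, x ∈ r}."""
    return [r for r in E if x in r]

xy = eqv("x")[0]
eadd(xy, "a")
print(sorted(eqv("a")[0]))            # ['a', 'x', 'y']
erem(xy, "y")
print(sorted(sorted(r) for r in E))   # [['a', 'x'], ['y']]
```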
Versioning: Operations that cause versioning act on an entire global object equivalence set.
By modelling hard aliased files as global objects belonging to the same global object equivalence set, we ensure that even in certain special cases, provenance data is captured accurately to reflect the current state of the underlying system. The example given below describes such a scenario.
Example:
Suppose $X$ and $Y$ are hard aliases ($X \rightleftharpoons Y$) and process $P$ binds Local Object $L$ to $X$. Because $X$ and $Y$ belong to the same global object equivalence set, the system relationship set $R$ (Section 4.4 [2]) would be as follows: $R = \{(P, L, \{X, Y\})\}$
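The example can be replayed in a short sketch, assuming hypothetical `bind` and `version` helpers and a single hard-coded equivalence set. Binding $L$ to $X$ records the whole set $\{X, Y\}$ in $R$, and a versioning operation on $X$ versions every member of the set.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical state: per-object version counters and the equivalence set {X, Y}.
versions: Dict[str, int] = {"X": 0, "Y": 0}
equiv: FrozenSet[str] = frozenset({"X", "Y"})

# System relationship set R as (process, local object, equivalence set) tuples.
R: Set[Tuple[str, str, FrozenSet[str]]] = set()

def group_of(target: str) -> FrozenSet[str]:
    return equiv if target in equiv else frozenset({target})

def bind(process: str, local: str, target: str) -> None:
    # Binding to X actually binds to X's whole equivalence set.
    R.add((process, local, group_of(target)))

def version(target: str) -> None:
    # A versioning operation on any alias versions every member of its set.
    for g in group_of(target):
        versions[g] += 1

bind("P", "L", "X")
version("X")  # some versioning operation applied via X
print(R == {("P", "L", frozenset({"X", "Y"}))}, versions)  # True {'X': 1, 'Y': 1}
```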
As outlined earlier, soft aliases differ from hard aliases in that they are pointers to other global objects in the system and maintain metadata independently of the objects to which they point. For example, when modelling POSIX 1003 semantics, soft alias global objects can be used to represent symbolic (soft) links.
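On POSIX the independent metadata of a symbolic link can be seen by comparing `os.stat`, which follows the link, with `os.lstat`, which does not. A short Python check, assuming a file system that supports symbolic links:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target")
    link = os.path.join(d, "link")
    with open(target, "w") as f:
        f.write("data")
    os.symlink(target, link)    # link is a soft alias for target
    via_link = os.stat(link)    # follows the link: metadata of the target
    of_link = os.lstat(link)    # does not follow: the link's own metadata
    of_target = os.stat(target)
    follows = via_link.st_ino == of_target.st_ino
    independent = of_link.st_ino != of_target.st_ino

print(follows, independent)  # True True
```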
As the separation between versioning soft alias objects and versioning the global objects they point to depends on system-specific semantics, we introduce the concept of canonicalisation to model this property. The canonicalisation function is defined on a per-system basis and is expected to be used to eliminate soft aliases in all situations other than when the provenance being generated relates directly to the alias itself. For example, in POSIX 1003, when soft aliases are created, destroyed or renamed, the provenance generated is applied directly to the global object for the soft alias. However, operations that act on the canonicalised file are linked to that file's own global object.
Canonicalisation
The canonicalisation function is represented by $C(a, g)$, where $a$ is an operation and $g$ is a global object. It is expected to mimic the canonicalisation behaviour of the underlying system and is used to resolve operation arguments to the appropriate global object on which versioning is to be carried out.
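For example, in POSIX IEEE 1003 it may be defined as follows, where canonicalise(g) is the POSIX file path canonicalisation function:
$$C(a, g) = \begin{cases} g & \text{if } a \in \{\mathit{rename}, \mathit{unlink}, \mathit{softlink}\} \\ \mathit{canonicalise}(g) & \text{otherwise} \end{cases}$$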
It would be applied as follows: fopen(C(fopen, g), "w"), which would resolve to fopen(canonicalise(g), "w").
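A sketch of the POSIX definition of $C$ in Python, assuming `os.path.realpath` as a stand-in for canonicalise(g); that substitution is an assumption, since the text does not name a concrete function. Operations are passed as plain strings, and the canonicaliser is a parameter so other systems can supply their own.

```python
import os
import tempfile
from typing import Callable

# Operations whose provenance applies to the soft alias itself.
ALIAS_OPERATIONS = {"rename", "unlink", "softlink"}

def C(a: str, g: str, canonicalise: Callable[[str], str] = os.path.realpath) -> str:
    """Return the path of the global object that operation `a` should version.

    Assumption: os.path.realpath stands in for canonicalise(g); it resolves
    symbolic links (and '.'/'..') in a path.
    """
    return g if a in ALIAS_OPERATIONS else canonicalise(g)

with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "real.txt")
    alias = os.path.join(d, "alias.txt")
    open(real, "w").close()
    os.symlink(real, alias)
    opened = C("fopen", alias)     # fopen versions the link's target
    unlinked = C("unlink", alias)  # unlink versions the link itself
    resolved = os.path.realpath(real)

print(opened == resolved, unlinked == alias)  # True True
```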
Recall that Global Objects (Section 4.2.1.1 [3]) are system-scope representations of entities, while Local Objects (Section 4.2.1.2 [4]) are process-scope representations of process references to Global Objects.
PVM formalises the relationship between Local Objects and Global Objects as follows. Local Objects are used to track and record process interaction with Global Objects. To achieve this, a Local Object is first associated with a Global Object; operations are then carried out on the Local Object to manipulate the Global Object; finally, the Local Object is disassociated from the Global Object.
In effect, the Local Object acts as a process-scope unique identifier for the physical entity that the corresponding Global Object represents. This identifier (and the Global Object it is associated with) is considered valid regardless of the actual state of the entity in the system. For example, on a POSIX system a Local Object associated with a file continues to refer validly to the file until disassociation, even if the file is deleted by another process. Enforcing this constraint simplifies the model and enables us to reason about entities in the system.
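The POSIX behaviour this constraint mirrors can be reproduced in a few lines of Python, assuming a POSIX system (on Windows the unlink would fail while the file is open). The descriptor plays the role of a Local Object; this mapping onto PVM terms is illustrative only.

```python
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "doomed.txt")
with open(path, "w") as f:
    f.write("still here")

fd = os.open(path, os.O_RDONLY)  # plays the role of the Local Object
os.unlink(path)                  # the name is deleted while fd is open
exists = os.path.exists(path)
data = os.read(fd, 100)          # the descriptor still refers to the file
os.close(fd)                     # disassociation
os.rmdir(d)

print(exists, data)  # False b'still here'
```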
Associating a Local Object with a Global Object is defined as binding the Local Object to the Global Object, while disassociation is referred to as unbinding. Strictly, binding a Local Object to a Global Object results in an association between the Local Object and the Global Object Equivalence Set (Section 4.2.2.1 [5]) for the Global Object. For example, in POSIX systems, when a process associates a Local Object with a Hard Alias Global Object (Section 4.2.2.1 [5]), it in fact associates it with all the Hard Alias Global Objects in that object's Global Object Equivalence Set.
Unbinding a Local Object from a Global Object disassociates the Local Object from the Global
Object.
The system relationship set is an abstraction designed to represent the set of all Local-to-Global relationships between local and global objects in the system.
Every element of the set is a 3-tuple of the form $(p, l, \{g_1, g_2, \ldots, g_n\})$, where $p$ is a process, $l$ is a Local Object it contains, and the set $\{g_1, g_2, \ldots, g_n\}$ is the Global Object Equivalence Set to which the Local Object is bound.
System Relationship Set Lookup Functions
We define two convenience functions to promote succinctness of expression:
Local-to-Global Object Mapping: globals(l)
Global-to-Local Object Mapping: locals(g)
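The text names the two lookup functions without spelling out their bodies. A sketch under the assumption that globals(l) returns the global objects bound to l and locals(g) returns the local objects bound to g; they are renamed `globals_of` and `locals_of` here because `globals` and `locals` are Python builtins. Local object identifiers are assumed unique across $R$ (in practice they would be qualified by the process).

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical encoding: each element of R is (process, local object, equivalence set).
Relation = Tuple[str, str, FrozenSet[str]]

R: Set[Relation] = {
    ("P", "L", frozenset({"X", "Y"})),
    ("Q", "M", frozenset({"X", "Y"})),
    ("Q", "N", frozenset({"Z"})),
}

def globals_of(l: str) -> Set[str]:
    """globals(l): assumed to return the global objects l is bound to."""
    return {g for (_, local, eq) in R if local == l for g in eq}

def locals_of(g: str) -> Set[str]:
    """locals(g): assumed to return the local objects bound to g."""
    return {local for (_, local, eq) in R if g in eq}

print(sorted(globals_of("L")), sorted(locals_of("Y")))  # ['X', 'Y'] ['L', 'M']
```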
The following operations are defined to express our versioning calculus. All I/O operations of the underlying system are mapped onto these operations.
The following naming conventions are used to aid succinctness.
l A Local Object.
g A Global Object.
get(l) Obtains a given Local Object.
drop(l) Drops a given Local Object.
tie(l,g) Binds a given local and global object together.
untie(l,g) Unbinds a given local and global object.
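These functions are defined to bind and unbind to all global objects in the Global Object Equivalence Set for a given Global Object:
$\mathit{get}(l, g): \forall r.\ r \rightleftharpoons g.\ \mathit{get}(r).\ \mathit{tie}(l, r)$
$\mathit{drop}(l, g): \forall r.\ r \rightleftharpoons g.\ \mathit{untie}(l, r).\ \mathit{drop}(r)$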
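A sketch of the two composite operations, following the formulas literally: for every $r$ in the equivalence set of $g$, get then tie, or untie then drop. The document does not state what the primitives get(r) and drop(r) do to a global object, so here they are only recorded in a trace; the `EQUIV` table, `ties` set and `trace` list are hypothetical scaffolding.

```python
from typing import List, Set, Tuple

# Hypothetical state: the equivalence set of each global object, current ties, and a trace.
EQUIV = {"X": {"X", "Y"}, "Y": {"X", "Y"}, "Z": {"Z"}}
ties: Set[Tuple[str, str]] = set()
trace: List[str] = []

def tie(l: str, r: str) -> None:
    ties.add((l, r))
    trace.append(f"tie({l},{r})")

def untie(l: str, r: str) -> None:
    ties.discard((l, r))
    trace.append(f"untie({l},{r})")

def get_lg(l: str, g: str) -> None:
    """get(l, g): for every r with r ⇌ g, get(r) then tie(l, r)."""
    for r in sorted(EQUIV[g]):  # sorted only to make the trace deterministic
        trace.append(f"get({r})")
        tie(l, r)

def drop_lg(l: str, g: str) -> None:
    """drop(l, g): for every r with r ⇌ g, untie(l, r) then drop(r)."""
    for r in sorted(EQUIV[g]):
        untie(l, r)
        trace.append(f"drop({r})")

get_lg("L", "Y")
print(sorted(ties))  # [('L', 'X'), ('L', 'Y')]
drop_lg("L", "X")
print(ties)          # set()
```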
As outlined earlier, functions are also available for:
1. Global Object Equivalence Set manipulation and testing
2. Soft Alias Canonicalisation
3. Local-to-Global and Global-to-Local Object Lookup Mapping
1. http://wiki.dtg.cl.cam.ac.uk/fresco/Semantics/APOSIX1003PVMMapping
2. http://wiki.dtg.cl.cam.ac.uk/fresco/Semantics/FormalSemanticsforVersioninginOPUS#4.3.1SystemRelationshipSet
3. http://wiki.dtg.cl.cam.ac.uk/fresco/Semantics/FormalSemanticsforVersioninginOPUS#4.2.1.1GlobalObjects
4. http://wiki.dtg.cl.cam.ac.uk/fresco/Semantics/FormalSemanticsforVersioninginOPUS#4.2.1.2LocalObjects
5. http://wiki.dtg.cl.cam.ac.uk/fresco/Semantics/FormalSemanticsforVersioninginOPUS#4.2.2.1HardAliasGlobalObjects
Last edited by Thomas Bytheway, 2013-01-16 16:57:58