Investigations on coupled resources

Introduction
The following sections summarize the different approaches for expressing the relationship between
services and their resources.
In addition to the latest ISO 19139, CSW AP ISO and INSPIRE TG specifications, one further specification
needs to be taken into consideration: ISO 19118.
ISO 19118 provides semantic definitions for object references and is the conceptual basis for ISO
19139 gco:ObjectReference. ISO 19139 uses “uuidref” and “xlink:href” as defined by ISO 19118. The
INSPIRE TG shall take this into account.
Definitions by CSW AP ISO
The (informative) Annex F of the CSW AP ISO specification recommends:
To link a service metadata instance with a dataset metadata instance, the value of
MD_DataIdentification.citation.CI_Citation.identifier.MD_Identifier.code
should be equal to either
SV_ServiceIdentification.operatesOn@uuidref (by reference)
or
SV_ServiceIdentification.operatesOn.MD_DataIdentification.citation.CI_Citation.identifier.MD_Identifier.code (by instance)
By providing the appropriate values either by reference or by instance, the relationship between a
service and a dataset is modelled sufficiently.
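The Annex F rule can be sketched in Python with the standard library's ElementTree. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the paths assume plain ISO 19139 encodings where identifier codes are gco:CharacterString values (no gmx:Anchor). A coupling is reported when an operatesOn value, given either by reference (@uuidref) or by instance, equals one of the dataset's MD_Identifier codes.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "srv": "http://www.isotc211.org/2005/srv",
}
# Path from an MD_DataIdentification element down to its identifier codes
CODE_PATH = ("gmd:citation/gmd:CI_Citation/gmd:identifier/"
             "gmd:MD_Identifier/gmd:code/gco:CharacterString")


def dataset_codes(dataset_xml: str) -> set:
    """MD_DataIdentification.citation.CI_Citation.identifier.MD_Identifier.code values."""
    root = ET.fromstring(dataset_xml)
    return {el.text.strip()
            for el in root.iterfind(".//gmd:MD_DataIdentification/" + CODE_PATH, NS)
            if el.text}


def operates_on_codes(service_xml: str) -> set:
    """operatesOn targets, by reference (@uuidref) or by instance (nested identifier code)."""
    root = ET.fromstring(service_xml)
    codes = set()
    for op in root.iterfind(".//srv:SV_ServiceIdentification/srv:operatesOn", NS):
        if op.get("uuidref"):  # by reference
            codes.add(op.get("uuidref"))
        for el in op.iterfind("gmd:MD_DataIdentification/" + CODE_PATH, NS):  # by instance
            if el.text:
                codes.add(el.text.strip())
    return codes


def is_coupled(service_xml: str, dataset_xml: str) -> bool:
    """Annex F rule: some operatesOn value equals one of the dataset's identifier codes."""
    return bool(operates_on_codes(service_xml) & dataset_codes(dataset_xml))
```

The sketch only looks at MD_Identifier; records that carry an RS_Identifier are not matched at all, which illustrates the ambiguity discussed in point 2 below.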
A number of inconsistencies arise from this approach:
1. Semantically, it is not correct to use the @uuidref attribute in this way. As per ISO 19118,
A.5.5.2, “[…] The “uuidref” attribute shall be used to refer to an object within the universe of
an application domain. […]”. For example, an element defined as:
<first id="i05" uuid="dce:F6A120B3"> … </first>
can be referenced elsewhere by a new element like this:
<second uuidref="dce:F6A120B3"/>
Hence, the approach given above links a “uuidref” attribute with the element value of an
identifier class, which is not correct.
2. It is not clear how to deal with instances of MD_DataIdentification that only provide an
RS_Identifier (which is possible in theory). How should the codeSpace value be taken into
account for the coupling? The CSW specification is not clear on that point.
3. The connections between the metadata documents are not self-contained. This means that
one will always need a CSW service to find the datasets coupled with a service and vice versa. For
that reason, the CSW AP ISO spec defines additional queryables:
a. OperatesOn
b. OperatesOnIdentifier
c. OperatesOnName
4. From an infrastructural perspective, one crucial point is that subsequent queries must always
be executed within the same search scope.
Example: a user finds a specific service metadata document using a distributed query. The
service document indicates some coupled datasets, identified by the corresponding “operatesOn”
attribute values. To search for these datasets, the user must execute any subsequent query as
a distributed query as well. If the query is processed only locally, the coupled datasets will
likely not be found, since they are hosted elsewhere.
Definitions by INSPIRE TG/IR
The INSPIRE Metadata TG recommends implementing the requirement defined in the INSPIRE
metadata IR as follows:
<srv:operatesOn xlink:href="http://vapxgeodev.jrc.ec.europa.eu/geonetwork/srv/eng/csw?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecordById&ID=f9ee6623-cf4c-11e19100017085a97ab&OUTPUTSCHEMA=http://www.isotc211.org/2005/gmd&ELEMENTSETNAME=full#lakes"/>
The implementation is “by reference”, using the xlink:href attribute and a GetRecordById request
against a CSW instance hosting the dataset record(s).
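To see what such a link actually encodes, it can be taken apart with Python's urllib.parse. The sketch is illustrative only; the keys of the returned dictionary are our own naming.

```python
from urllib.parse import parse_qs, urlsplit


def parse_operates_on_href(href: str) -> dict:
    """Split an operatesOn xlink:href of the GetRecordById form into its components."""
    parts = urlsplit(href)
    # KVP parameter names are case-insensitive; keep the first value of each
    query = {k.upper(): v[0] for k, v in parse_qs(parts.query).items()}
    return {
        "endpoint": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "request": query.get("REQUEST"),
        "record_id": query.get("ID"),  # identifies a metadata record, not the dataset
        "fragment": parts.fragment or None,
    }


href = ("http://vapxgeodev.jrc.ec.europa.eu/geonetwork/srv/eng/csw?SERVICE=CSW&VERSION=2.0.2"
        "&REQUEST=GetRecordById&ID=f9ee6623-cf4c-11e19100017085a97ab"
        "&OUTPUTSCHEMA=http://www.isotc211.org/2005/gmd&ELEMENTSETNAME=full#lakes")
print(parse_operates_on_href(href))
```

The result shows that the link combines the catalogue endpoint (and thus its host) with the ID of a metadata record; neither part is a unique resource identifier of the dataset.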
As an alternative, a list of unique resource identifiers can be given, but no example is provided
for that option.
I see the following implications of this approach:
1. The idea of providing a “by reference” connection is to link an XML element with another
(external) XML element.
In our case this means: link an SV_ServiceIdentification instance with (one or more)
MD_DataIdentification instances.
However, the given approach links an SV_ServiceIdentification instance with a
GetRecordByIdResponse instance, which is the response of the CSW request.
2. Using the above encoding, the SV_ServiceIdentification instance is tightly coupled with the
host providing the CSW service. What happens if the host name changes? However, this
problem has already been addressed by the linked data community and might be solved by
convention (see: http://www.w3.org/TR/ld-bp/#HTTP-URIS and
http://www.w3.org/TR/webarch/#URI-persistence).
3. Strictly interpreted, the approach is not compliant with the INSPIRE IR.
From a conceptual point of view, the INSPIRE metadata IR states that
“If the resource is a spatial data service, this metadata element identifies, where
relevant, the target spatial data set(s) of the service through their unique resource
identifiers (URI).
The value domain of this metadata element is a mandatory character string code,
generally assigned by the data owner, and a character string namespace uniquely
identifying the context of the identifier code (for example, the data owner).”
The IR requirement is to identify “the target spatial data set(s) of the service through their
unique resource identifiers”.
The approach given above uses the fileIdentifier of the MD_Metadata instance containing
the MD_DataIdentification instance, not the unique resource identifiers of the datasets.
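Implications 1 and 3 become visible when processing what the link returns. The following sketch, assuming the CSW 2.0.2 response namespace and an invented sample record, first checks that the document is a csw:GetRecordByIdResponse wrapper and then reports, for each contained record, the fileIdentifier (what the link's ID parameter matched) next to the citation identifier codes (the unique resource identifiers the IR refers to).

```python
import xml.etree.ElementTree as ET

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def inspect_get_record_by_id_response(response_xml: str) -> list:
    """For each record in a GetRecordById response, return (fileIdentifier, resource codes)."""
    root = ET.fromstring(response_xml)
    # The href resolves to this wrapper, not to an MD_DataIdentification instance
    if root.tag != "{%s}GetRecordByIdResponse" % NS["csw"]:
        raise ValueError(f"unexpected root element: {root.tag}")
    results = []
    for md in root.iterfind("gmd:MD_Metadata", NS):
        # The ID in the link matches this record identifier ...
        file_id = md.findtext("gmd:fileIdentifier/gco:CharacterString", namespaces=NS)
        # ... whereas the IR asks for these dataset identifiers ('*' = MD_ or RS_Identifier)
        codes = [el.text for el in md.iterfind(
            "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
            "gmd:CI_Citation/gmd:identifier/*/gmd:code/gco:CharacterString", NS)]
        results.append((file_id, codes))
    return results


SAMPLE = """<csw:GetRecordByIdResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:MD_Metadata>
    <gmd:fileIdentifier><gco:CharacterString>record-0001</gco:CharacterString></gmd:fileIdentifier>
    <gmd:identificationInfo><gmd:MD_DataIdentification><gmd:citation><gmd:CI_Citation>
      <gmd:identifier><gmd:RS_Identifier>
        <gmd:code><gco:CharacterString>lakes</gco:CharacterString></gmd:code>
        <gmd:codeSpace><gco:CharacterString>urn:example:agency</gco:CharacterString></gmd:codeSpace>
      </gmd:RS_Identifier></gmd:identifier>
    </gmd:CI_Citation></gmd:citation></gmd:MD_DataIdentification></gmd:identificationInfo>
  </gmd:MD_Metadata>
</csw:GetRecordByIdResponse>"""

print(inspect_get_record_by_id_response(SAMPLE))  # [('record-0001', ['lakes'])]
```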
Conclusions
According to the IR text, all that is required to comply with the requirements is to provide a
list of unique resource identifiers, defined as code/namespace tuples, along with the spatial data
service metadata.
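As a minimal model of this requirement, assuming identifiers are compared as whole code/namespace tuples (the type name and sample values are hypothetical), coupling reduces to a set intersection. Treating the namespace as part of the key is also one possible answer to the codeSpace question raised for RS_Identifier above: equal codes in different namespaces do not couple.

```python
from typing import NamedTuple, Optional


class UniqueResourceIdentifier(NamedTuple):
    """Code/namespace tuple, as the INSPIRE metadata IR describes unique resource identifiers."""
    code: str
    namespace: Optional[str] = None  # e.g. an MD_Identifier carries no codeSpace


def coupled(service_refs, dataset_ids) -> set:
    """Identifiers listed by the service that a dataset record also carries.

    Tuples compare on both fields, so equal codes in different namespaces do not match.
    """
    return set(service_refs) & set(dataset_ids)


a = UniqueResourceIdentifier("lakes", "urn:example:agency-a")
b = UniqueResourceIdentifier("lakes", "urn:example:agency-b")
print(coupled([a], [b]))            # set()
print(coupled([a], [a, b]) == {a})  # True
```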
Providing such a list can be done using plain CSW AP ISO techniques. But what further alternatives
do we have (bearing in mind all the shortcomings listed above)?
Option one: use plain CSW approach
Each dataset metadata document has to provide a unique resource identifier, which can be
referenced by a service metadata document.
Example:
MD_Identifier.code = urn:inspire:dataset:…:abcdefg:4.1.2
SV_ServiceIdentification.operatesOn@uuidref = urn:inspire:dataset:…:abcdefg:4.1.2
CSW queryables are used to resolve this relationship and help to search for connected documents.
Pros:
- In line with the OGC CSW AP ISO spec
Cons:
- Not self-contained: a CSW is needed to find related records
- Not fully compliant with ISO 19118 semantics, since the uuidref attribute is not supposed to
reference an element value, but rather the uuid attribute of the target element
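The ISO 19118 reading can be made concrete with a small ElementTree sketch that resolves each @uuidref against the @uuid attributes of the same document. Element names and the urn:example value are invented for illustration. A reference in the Option one style stays unresolved, because the matching value is element content rather than a uuid attribute.

```python
import xml.etree.ElementTree as ET


def resolve_uuidrefs(root: ET.Element) -> list:
    """Pair every element carrying @uuidref with the element whose @uuid matches (or None)."""
    # Referenceable objects are identified by their uuid attribute, not by element content
    targets = {el.get("uuid"): el for el in root.iter() if el.get("uuid")}
    return [
        (el.tag, el.get("uuidref"), targets.get(el.get("uuidref")))
        for el in root.iter()
        if el.get("uuidref") is not None
    ]


doc = ET.fromstring(
    "<root>"
    '<first id="i05" uuid="dce:F6A120B3"/>'
    '<second uuidref="dce:F6A120B3"/>'
    "<code>urn:example:dataset:lakes</code>"  # identifier given as element content
    '<third uuidref="urn:example:dataset:lakes"/>'
    "</root>"
)
for tag, ref, target in resolve_uuidrefs(doc):
    print(tag, ref, "->", target.tag if target is not None else "unresolved")
# second dce:F6A120B3 -> first
# third urn:example:dataset:lakes -> unresolved
```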
Option two: use xlink:href as is
The INSPIRE metadata TG will recommend using the “xlink:href” approach with the GetRecordById
request, as proposed by the latest version. The alternative, using a list of unique resource identifiers,
will be deleted.
Pros:
- Best practice at the moment
Cons:
- Not strictly compliant with the INSPIRE IR
- The dataset is referenced directly, but the reference resolves to a GetRecordById response rather
than to the dataset description
- Bound to conventions regarding stable URLs
Option three: use xlink:href with GetRecords
To take into account unique resource identifiers as required by the INSPIRE metadata IR, we could
use a GetRecords request with the ResourceIdentifier queryable as defined in “7.2.4 Additional
search properties” of the CSW AP ISO spec.
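A sketch of how such an operatesOn link might be generated, assuming a hypothetical catalogue endpoint and the apiso:ResourceIdentifier queryable name in a CQL_TEXT constraint (the exact qualification of queryable names varies between implementations). quoteattr from the standard library escapes the & separators as &amp;, which a URL inside an XML attribute requires.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

GMD = "http://www.isotc211.org/2005/gmd"
APISO = "http://www.opengis.net/cat/csw/apiso/1.0"


def operates_on_by_resource_identifier(csw_endpoint: str, resource_identifier: str) -> str:
    """srv:operatesOn element whose xlink:href is a GetRecords query on ResourceIdentifier."""
    literal = resource_identifier.replace("'", "''")  # CQL string-literal escaping
    params = {
        "SERVICE": "CSW",
        "VERSION": "2.0.2",
        "REQUEST": "GetRecords",
        "TYPENAMES": "gmd:MD_Metadata",
        "NAMESPACE": f"xmlns(gmd={GMD}),xmlns(apiso={APISO})",
        "RESULTTYPE": "results",
        "OUTPUTSCHEMA": GMD,
        "ELEMENTSETNAME": "full",
        "CONSTRAINTLANGUAGE": "CQL_TEXT",
        "CONSTRAINT_LANGUAGE_VERSION": "1.1.0",
        "CONSTRAINT": f"apiso:ResourceIdentifier = '{literal}'",
    }
    href = f"{csw_endpoint}?{urlencode(params)}"
    # quoteattr adds the surrounding quotes and escapes '&' as '&amp;' for the XML attribute
    return f"<srv:operatesOn xlink:href={quoteattr(href)}/>"


print(operates_on_by_resource_identifier("https://csw.example.org/csw", "urn:example:dataset:lakes"))
```

Unlike the GetRecordById link, this href carries the dataset's resource identifier rather than a metadata record ID, although it still depends on the catalogue host.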
Pros:
- More in line with the INSPIRE IR, since the unique resource identifiers are referenced and used
Cons:
- Support for GetRecords via HTTP GET is only optional. This means that the INSPIRE Discovery
Service TG must be extended to make the GET binding mandatory for INSPIRE.