pptx file - The CIDOC CRM

advertisement
Linked Data Patterns using the
CRM
Richard Light
CIDOC CRM SIG meeting, Heraklion
22 October 2013
CIDOC Doc. Standards WG goals
• Linked Data patterns
– Following Dodds and Davis:
http://patterns.dataincubator.org/book/linkeddata-patterns.pdf
– Concentrating on Cultural Heritage issues
– List of challenges prepared
– One worked up to show possible results
Linked Data Patterns
•
•
•
•
•
•
Topic/problem statement
Context
Solution
Examples
Discussion
Related
Example
Literal Keys
How do we publish non-global identifiers in RDF?
Context
The Natural Keys pattern encourages the creation of URIs from existing non-global identifiers. While this
provides a way to begin identifying a resource so that we can describe it in RDF, it does not address the
issue of how to publish these existing identifiers. Nor does it address situations where natural keys change
over time, e.g. the move from ISBN-10 to ISBN-13 in the publishing world.
Solution
Create a custom property, as a sub-class of the dc:identifier property for relating the existing literal key
value with the resource.
Example(s)
The nasa dataset in dataincubator uses Patterned URIs based on the NSSDC international designator, but
includes these as literal values associated with each spacecraft using a custom property.
Discussion
While hackable URIs are a useful short-cut they don't address all common circumstances. For example
different departments within an organization may have different non-global identifiers for a resource; or
the process and format for those identifiers may change over time. The ability to algorithmically derive a
URI is useful but limiting in a global sense as knowledge of the algorithm has to be published separately to the data
Areas of challenge
• Museum objects and their identity
• Dates
• Places
• Conservation
• Text
• Other
http://network.icom.museum/cidoc/workinggroups/documentation-standards/docstandardslddp/ has full list
Potential challenges (1)
• Museum objects and their identity
– Who should generate identities for museum
objects?
– What relationship should hold between the
number physically marked on a museum object
and its Linked Data identity?
– To what degree should museum object Linked
Data identifiers contain “meaning”?
– How should the institution responsible for an
identifier be identified?
Potential challenges (2)
• Dates
– How should different calendar systems be
represented in a LD context?
– How should date ranges be expressed?
– How should uncertainty about a single date be
expressed?
– How should named periods be recorded?
– How should the varying geographical extent over
time of a named period be expressed?
Tackling one challenge
• Dates in a cultural heritage context
• Checked sources:
– ‘web search’ advice
– ISO dates in XML Schema
– CIDOC CRM guidance
– British Museum guidance document
• Aiming for ‘end to end’ advice: source data to
Linked Data RDF
Results (1)
Linked Data design pattern: date recording
How should dates relevant to cultural history be recorded?
Context
Historical dates are rarely exact, unlike the date/time values (often
computer-generated and accurate to the millisecond) which are
routinely used in contemporary applications. There are two aspects to
this inexactness:
–
–
The event described by the date may have a significant duration
The precise value of the date may not be known for certain. It is
common practice to record just a year, or a range of years. Entries
such as “c.1850” are often encountered
Dates may be a long way in the past, so it must be possible to record
e.g. “25000 BCE”.
Dates may be recorded using the Gregorian, Julian or some other
calendar.
Results (2)
Solution
Convert all values to the proleptic Gregorian calendar, and allow the year 0000 (see[ 1]
for details).
Express each date value using W3C date syntax [2]. Use the appropriate datatype, e.g.
gYear if only the year has been recorded. Do not introduce spurious accuracy. The
datetime datatype should only be used where an exact time has been recorded.
These dates should be expressed as an instance of the CRM class E2_Temporal_Entity.
This can be expressed directly within the RDF, or referred to via a Linked Data URL
which denotes the date. (In the latter case, the structure described here will be found
when the URL is dereferenced.)
Where a single date is provided, use the property P82_at_some_time_within to
specify the time span.
Where a range of dates is provided, use the properties P82a_begin_of_the_begin and
P82b_end_of_the_end to specify the start and end points.
Use CIDOC CRM property P4_has_time-span (or P4i_is_time-space_of) to relate this
time span to an E2_Temporal_Entity.
Results (3)
Examples
Single year-only date:
crm:E2_Temporal_Entity crm:P82_at_some_time_within
“1634”^^xsd:gYear .
Discussion
In both cases (single dates and date ranges), the default
assumption is that the dates specified will represent the outer
bounds of the period within which the event actually took
place.
Note that P82a/P82b are not in the official CRM release (nor
in Erlangen).
Results (4)
Further reading
[1] http://www.hackcraft.net/web/datetime/
[2] http://www.w3.org/TR/xmlschema2/#isoformats
[3] http://www.cidoccrm.org/docs/How_to%20implement%20CRM_
Time_in%20RDF.pdf
CRM Linked Data support
• Advice document on date recording only
found by chance via a Google search
• Proposed new properties P81a, P82b, P82a
and P82b not in official release
• Absolute requirement for a stable URL set
representing complete CRM
Proposed way forward
• Design patterns fall usefully between CRM
publications and (e.g.) BM document: bitesized, LD-specific, generic, implementable
• (Re-)form a group to work on this
• Welcome advice and input from CRM SIG
Download