Linked Data Patterns using the CRM Richard Light CIDOC CRM SIG meeting, Heraklion 22 October 2013 CIDOC Doc. Standards WG goals • Linked Data patterns – Following Dodds and Davis: http://patterns.dataincubator.org/book/linkeddata-patterns.pdf – Concentrating on Cultural Heritage issues – List of challenges prepared – One worked up to show possible results Linked Data Patterns • • • • • • Topic/problem statement Context Solution Examples Discussion Related Example Literal Keys How do we publish non-global identifiers in RDF? Context The Natural Keys pattern encourages the creation of URIs from existing non-global identifiers. While this provides a way to begin identifying a resource so that we can describe it in RDF, it does not address the issue of how to publish these existing identifiers. Nor does it address situations where natural keys change over time, e.g. the move from ISBN-10 to ISBN-13 in the publishing world. Solution Create a custom property, as a sub-class of the dc:identifier property for relating the existing literal key value with the resource. Example(s) The nasa dataset in dataincubator uses Patterned URIs based on the NSSDC international designator, but includes these as literal values associated with each spacecraft using a custom property. Discussion While hackable URIs are a useful short-cut they don't address all common circumstances. For example different departments within an organization may have different non-global identifiers for a resource; or the process and format for those identifiers may change over time. The ability to algorithmically derive a URI is useful but limiting in a global sense as knowledge of the algorithm has to be published separately to the data Areas of challenge • Museum objects and their identity • Dates • Places • Conservation • Text • Other http://network.icom.museum/cidoc/workinggroups/documentation-standards/docstandardslddp/ has full list Potential challenges (1) • Museum objects and their identity – Who should generate identities for museum objects? – What relationship should hold between the number physically marked on a museum object and its Linked Data identity? – To what degree should museum object Linked Data identifiers contain “meaning”? – How should the institution responsible for an identifier be identified? Potential challenges (2) • Dates – How should different calendar systems be represented in a LD context? – How should date ranges be expressed? – How should uncertainty about a single date be expressed? – How should named periods be recorded? – How should the varying geographical extent over time of a named period be expressed? Tackling one challenge • Dates in a cultural heritage context • Checked sources: – ‘web search’ advice – ISO dates in XML Schema – CIDOC CRM guidance – British Museum guidance document • Aiming for ‘end to end’ advice: source data to Linked Data RDF Results (1) Linked Data design pattern: date recording How should dates relevant to cultural history be recorded? Context Historical dates are rarely exact, unlike the date/time values (often computer-generated and accurate to the millisecond) which are routinely used in contemporary applications. There are two aspects to this inexactness: – – The event described by the date may have a significant duration The precise value of the date may not be known for certain. It is common practice to record just a year, or a range of years. Entries such as “c.1850” are often encountered Dates may be a long way in the past, so it must be possible to record e.g. “25000 BCE”. Dates may be recorded using the Gregorian, Julian or some other calendar. Results (2) Solution Convert all values to the proleptic Gregorian calendar, and allow the year 0000 (see[ 1] for details). Express each date value using W3C date syntax [2]. Use the appropriate datatype, e.g. gYear if only the year has been recorded. Do not introduce spurious accuracy. The datetime datatype should only be used where an exact time has been recorded. These dates should be expressed as an instance of the CRM class E2_Temporal_Entity. This can be expressed directly within the RDF, or referred to via a Linked Data URL which denotes the date. (In the latter case, the structure described here will be found when the URL is dereferenced.) Where a single date is provided, use the property P82_at_some_time_within to specify the time span. Where a range of dates is provided, use the properties P82a_begin_of_the_begin and P82b_end_of_the_end to specify the start and end points. Use CIDOC CRM property P4_has_time-span (or P4i_is_time-space_of) to relate this time span to an E2_Temporal_Entity. Results (3) Examples Single year-only date: crm:E2_Temporal_Entity crm:P82_at_some_time_within “1634”^^xsd:gYear . Discussion In both cases (single dates and date ranges), the default assumption is that the dates specified will represent the outer bounds of the period within which the event actually took place. Note that P82a/P82b are not in the official CRM release (nor in Erlangen). Results (4) Further reading [1] http://www.hackcraft.net/web/datetime/ [2] http://www.w3.org/TR/xmlschema2/#isoformats [3] http://www.cidoccrm.org/docs/How_to%20implement%20CRM_ Time_in%20RDF.pdf CRM Linked Data support • Advice document on date recording only found by chance via a Google search • Proposed new properties P81a, P82b, P82a and P82b not in official release • Absolute requirement for a stable URL set representing complete CRM Proposed way forward • Design patterns fall usefully between CRM publications and (e.g.) BM document: bitesized, LD-specific, generic, implementable • (Re-)form a group to work on this • Welcome advice and input from CRM SIG