LSIDs and RDF in TDWG
Roger Hyam, TDWG, RBGE
Donald Hobern, GBIF
June 7-9, 2006 - Edinburgh, UK
Paradigm
• Starting assumption is that standards are
about sharing data.
• Sharing data also implies sharing data
through time.
Archive
What is Shared?
• Sharing raw literals isn’t much use.
• They need to be gathered together into
‘semantic’ units or objects.
[Diagram: the raw literals ‘Bellis’, ‘perennis’ and ‘1234’ gathered into one object, TaxonName:1234 ‘Bellis perennis’]
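As a minimal sketch of this idea (in Python, with hypothetical class and field names that are not part of any TDWG standard), the three literals only become useful once they are bundled into one typed, identifiable unit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """Hypothetical 'semantic unit' that gathers raw literals into one object."""
    local_id: str   # e.g. "1234"
    genus: str      # e.g. "Bellis"
    epithet: str    # e.g. "perennis"

    @property
    def identifier(self) -> str:
        # Type name plus local id, mirroring the "TaxonName:1234" label on the slide.
        return f"TaxonName:{self.local_id}"

    @property
    def full_name(self) -> str:
        return f"{self.genus} {self.epithet}"


name = TaxonName(local_id="1234", genus="Bellis", epithet="perennis")
print(name.identifier, "->", name.full_name)
```

On their own, "1234", "Bellis" and "perennis" say nothing about how they relate; the object makes the relationship explicit and gives it something to be referred to by.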
Semantics of Objects
• Objects need to be based on some shared
semantics.
• There needs to be somewhere to look up what
they mean – an ontology.
[Diagram: the object TaxonName ‘Bellis perennis’ linked to the Ontology that defines what it means]
Identity of Objects
• How do I refer to this object?
• Who should I credit?
• Who should I send corrections to?
• Is it the same record as I already have or is it a new one?
• What is the official version of this data? Has someone altered it before I received it?
TDWG TAG-1 Meeting
• There was consensus on:
– Architecture is concerned with shared data
– Biodiversity data will be modeled as a graph
of identifiable objects
– The semantics of these objects will be
encoded in a series of shared ontologies
– Ontologies will be related to each other on the
basis of shared Base and Core ontologies
as a minimum
• Discussion continues on how this is done
Implications
• We need an ontology to define and relate
the objects we exchange.
• Ontology governance/management is
paramount.
• We need a system of GUIDs to identify the
objects.
• We need a roadmap for the protocols to
exchange these objects.
Structure of the Ontology
• Base Ontology: BaseThing, BaseActor
• Core Ontology: CoreTaxonName, CoreInstitution
• Domain Ontology: TaxonName, Herbarium, NomenclaturalType, NomenclaturalNote
• Application Ontologies: ABCD, DarwinCore, ???
Ontology Governance
• Allow people to create Domain sub-ontologies easily, to prevent alienation.
• Each ontology construct (concept) has a
status.
• Status is increased by passing through
explicit gates defined by actual usage.
[Diagram: status progression Experimental → Shared → Recommended]
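One way to picture status gates driven by actual usage is the sketch below. It is only an illustration: the three status names come from the slide, but the idea of counting independent users and the threshold numbers are assumptions made up for the example, not TDWG policy.

```python
from enum import IntEnum


class Status(IntEnum):
    EXPERIMENTAL = 1
    SHARED = 2
    RECOMMENDED = 3


# Hypothetical gates: minimum number of independent users a concept needs
# before it may hold each status. The numbers are invented for illustration.
GATES = {Status.SHARED: 2, Status.RECOMMENDED: 5}


def status_for(usage_count: int) -> Status:
    """Return the highest status whose usage gate has been passed."""
    status = Status.EXPERIMENTAL
    for level in (Status.SHARED, Status.RECOMMENDED):
        if usage_count >= GATES[level]:
            status = level
    return status


for users in (0, 2, 7):
    print(users, status_for(users).name)
```

The point of the design is that promotion depends on measured use rather than on a committee decision, so anyone can add an experimental concept without lowering the quality of the recommended set.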
What about RDF?
• The need to share identifiable objects has been
established without reference to a technology.
• We are interested in objects not triples.
• Typical use case involves a client consuming
semantically heterogeneous data from multiple
sources.
• Semantic Web technologies would be ideal –
but aren’t part of the TDWG culture and there
are ‘unbelievers’.
Current ‘Standards’
• DarwinCore & DiGIR
– Based on Z39.50
– HTTP based XML message / response
– Simple ‘flat’ application schemas (RDF-like)
• ABCD & BioCASe
– Based on DarwinCore & DiGIR
– Complex document structure.
• TAPIR
– Unification of BioCASe and DiGIR
• No RDF, Objects or GUIDs here yet!
Combining Data
• GBIF data portal is the only ‘application’
that does data integration between these
formats.
• No standard way to include XML
fragments from other XSDs other than
xs:any.
• There is overlap between the different
schemas and no easy way to merge
them.
What about LSIDs?
• GUID-1 meeting considered several
GUID technologies, including LSID, DOI
and Handle.
• Life Science Identifiers are being
assessed.
– I3C & OMG URNs
– urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434
– getData()
– getMetadata()
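An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional trailing revision. The following Python sketch splits the example above into those parts. The function and class names are hypothetical, and the sketch does not attempt resolution, so it says nothing about how getData() or getMetadata() are actually served.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components (no resolution is attempted)."""
    parts = text.split(":")
    # Expect "urn", "lsid", authority, namespace, object and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```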
LSID Permanence
• LSIDs should not be recycled, i.e. used
for more than one object.
• LSIDs should always resolve, but it is OK
for them to resolve to a 404 (Not Found) or 410 (Gone) error.
• No central authority to control these
things.
• Even DOIs go away if there isn’t
institutional backing!
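The two permanence rules can be illustrated with a small in-memory registry. This is a sketch under stated assumptions: the class, its method names and the "ok" / "gone" / "unknown" status strings are all hypothetical, and a real authority would persist this state rather than hold it in memory.

```python
class LsidRegistry:
    """Hypothetical registry: identifiers are never reused, withdrawn ones report 'gone'."""

    def __init__(self):
        self._objects = {}       # lsid -> data for live objects
        self._withdrawn = set()  # lsids that were issued and later withdrawn

    def assign(self, lsid, data):
        # An identifier that has ever been issued may not be handed out again.
        if lsid in self._objects or lsid in self._withdrawn:
            raise ValueError(f"{lsid} was already issued and cannot be recycled")
        self._objects[lsid] = data

    def withdraw(self, lsid):
        # Remove the data but remember that the identifier existed.
        del self._objects[lsid]
        self._withdrawn.add(lsid)

    def resolve(self, lsid):
        if lsid in self._objects:
            return ("ok", self._objects[lsid])
        if lsid in self._withdrawn:
            return ("gone", None)
        return ("unknown", None)


registry = LsidRegistry()
registry.assign("urn:lsid:example.com:names:1234", "Bellis perennis")
registry.withdraw("urn:lsid:example.com:names:1234")
print(registry.resolve("urn:lsid:example.com:names:1234"))
```

Keeping a record of withdrawn identifiers is what lets the registry distinguish "this object used to exist" from "this was never issued", and is also what prevents accidental reuse.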
LSIDs for Everything?
• Are there some things for which LSIDs
are inappropriate?
– <logo rdf:resource="urn:lsid:example.com:branding:logo.gif" />
– xsi:schemaLocation="urn:lsid:example.com:xsd:taxon.xsd"
– xmlns:tn="urn:lsid:example.com:ontology:taxon/"
• Definitely places where we will use
something else.
• Other people will use their own identifiers
e.g. DOI, Handle etc.
So what’s cooking?
• XSD Based Conceptual Schemas
• Recognised Need For GUIDs
• Different GUID Technologies
• A TDWG Ontology
• XML Based Exchange Protocols
• Emergent Semantic Web
• OGC Standards (GML)
• Other!
• 200+ Data Providers
• 50+ Million Anonymous ‘Records’
• BioMOBY
• Clients?
Possible Roadmap
• Build the ontology as a focus for
semantics.
• Resolution and Harvest protocols
should be relatively easy to plug into or
wrap round existing service providers so
approach these first.
• Search/Query is more problematic:
BioCASe, DiGIR, TAPIR, SPARQL,
other?
Thank You
• Gordon and Betty Moore Foundation
• Global Biodiversity Information Facility
• NESC
• TDWG Members