Triple Store Experiences

Triplestore Experiences
Nathan Wilhelmi
11/27/2012
NCAR - CISL/TDD/VETS
Our Experiences…
• Disclaimers:
▫ Did not have an ontologist
▫ Codebase passed through multiple developers
• Timelines (changing landscape)
▫ Started work 2006
▫ Stopped active development ~2011
▫ Sesame version 2.3.0
Why a Triplestore?
• Search functionality
▫ Faceted
▫ Free text
• Model metadata
▫ Metadata storage
▫ Display
• Semantic web
Initial Architecture
• Authoritative metadata source was RDBMS
• Metadata harvested into the triplestore at periodic intervals
• Triplestore only contained metadata to drive search
• Sesame used as a standalone service
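A minimal sketch of the periodic harvest described above, under stated assumptions: the `dataset` table, its `id` and `name` columns, and the in-memory set standing in for the triplestore are all hypothetical and are not taken from the original system. Each run rebuilds the full set of triples from the authoritative database.

```python
import sqlite3

ESG = "http://www.earthsystemgrid.org/esg.owl#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def harvest(conn):
    """Build a fresh set of (subject, predicate, object) triples from the RDBMS.

    The table and column names are hypothetical. Identifiers are assumed to be
    URI-safe already; real identifiers would need encoding first.
    """
    triples = set()
    for dataset_id, name in conn.execute("SELECT id, name FROM dataset ORDER BY id"):
        subject = ESG + dataset_id
        triples.add((subject, RDFS_LABEL, name))
    return triples


# Stand-in for the authoritative metadata database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO dataset VALUES ('ds1', 'Example dataset')")

# Each harvest replaces the previous search copy wholesale.
store = harvest(conn)
```

Because the store is rebuilt rather than patched, it can always be thrown away and regenerated from the database.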
Sesame Triplestore
• Standalone Sesame server
▫ Stability problems
▫ No security, triplestore could be updated by anyone
• Changed to in-memory store
▫ Stable
▫ Picked up performance improvements
• Embedded triplestore only referenced resources internally
▫ RDF didn’t work outside of the application
▫ Distilled to key-value store
Internal Referencing
<rdf:RDF ...>
<rdf:Description
rdf:about="http://www.earthsystemgrid.org/esg.owl#esgncar__ucar_cgd_ccsm_b30_072b">
....
<esg:hasUnconfiguredModelComponent
rdf:resource="http://www.earthsystemgrid.org/esg.owl#modelcomponent_ccsm_run_b30.072b" />
....
</rdf:Description>
</rdf:RDF>
Performance
• For our query patterns, we were not seeing the performance we needed
• Inferencing was removed and performance improved to acceptable levels for <5K datasets
▫ Target volume 50K datasets
• SPARQL missing key operators: ordering, limits
Tooling Support
• Managing the triplestore
▫ Protégé round trips didn’t work well
▫ Dump full triple store to XML and grep by hand
• Deleting and updating triples
▫ Deletes were difficult and left dangling triples
▫ Rebuild from authoritative sources
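The dangling-triple problem can be illustrated with a small sketch. It assumes triples are held as plain Python tuples and that any object starting with the `esg.owl#` namespace is a resource reference; the instance names `dataset_a` and `component_b` are hypothetical. Deleting only the triples where a resource is the subject leaves other resources still pointing at it.

```python
ESG = "http://www.earthsystemgrid.org/esg.owl#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
HAS_COMPONENT = ESG + "hasUnconfiguredModelComponent"

DATASET = ESG + "dataset_a"      # hypothetical instance
COMPONENT = ESG + "component_b"  # hypothetical instance

store = {
    (DATASET, RDFS_LABEL, "dataset"),
    (DATASET, HAS_COMPONENT, COMPONENT),
    (COMPONENT, RDFS_LABEL, "component"),
}


def delete_subject_only(triples, uri):
    """Naive delete: drop only the triples whose subject is uri."""
    return {t for t in triples if t[0] != uri}


def delete_resource(triples, uri):
    """Drop every triple that mentions uri as subject or as object."""
    return {t for t in triples if uri not in (t[0], t[2])}


def dangling_references(triples):
    """Return resource URIs used as objects that never appear as a subject."""
    subjects = {s for s, _, _ in triples}
    return {o for _, _, o in triples if o.startswith(ESG) and o not in subjects}


naive = delete_subject_only(store, COMPONENT)
clean = delete_resource(store, COMPONENT)
```

Here `dangling_references(naive)` still reports the deleted component, while `clean` has no dangling references. A check like this is cheap on a small store, but it does not replace rebuilding from the authoritative source.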
Implementation Issues
• Schema-less design was perceived as faster
▫ Rapid ontology changes during development
▫ Still needed data migration tools
• Modeling the problem domain
▫ Modeled a triplestore, not the domain
▫ Very tightly coupled code was difficult to maintain and replace
• Steep learning curve for new developers
URIs Are Foundational
• Properly encoding URIs
▫ Created unencoded URIs within the triplestore
▫ Queries were created with string concatenation
▫ Led to broken queries and data
• Generated instance URIs through a lossy algorithm to get around encoding
▫ Could only relate from source -> triplestore
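A sketch of the difference between a lossy and a reversible URI scheme, using only the Python standard library. The `lossy_uri` function is a hypothetical stand-in and not the algorithm the project actually used; the sample identifiers are also made up. Percent-encoding keeps the mapping reversible, so a triplestore URI can be traced back to its source identifier.

```python
import re
from urllib.parse import quote, unquote

ESG = "http://www.earthsystemgrid.org/esg.owl#"


def lossy_uri(identifier):
    """Hypothetical lossy scheme: replace every non-alphanumeric character with '_'."""
    return ESG + re.sub(r"[^A-Za-z0-9]", "_", identifier)


def instance_uri(identifier):
    """Reversible scheme: percent-encode the identifier before appending it.

    safe="" also encodes reserved characters such as '/' and '#'.
    """
    return ESG + quote(identifier, safe="")


def source_identifier(uri):
    """Recover the source identifier from a URI built by instance_uri."""
    if not uri.startswith(ESG):
        raise ValueError("not an instance URI in the expected namespace")
    return unquote(uri[len(ESG):])


# Two distinct source identifiers collide under the lossy scheme.
collision = lossy_uri("ccsm.b30") == lossy_uri("ccsm_b30")

# The encoded form survives a round trip, even with spaces and slashes.
original = "ccsm run/b30 #1"
round_trip = source_identifier(instance_uri(original))
```

With the lossy scheme the relation only works from source to triplestore; with percent-encoding it works in both directions.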
Our Current Path Forward
• Using Solr Search
▫ Fantastic search tool!
• Metadata in RDBMS
▫ Working well
▫ Effective tools, including schema migration
▫ Scales very well for our metadata
• Still needed to expose RDF metadata…
RDF with RDBMS
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:sesame="http://www.openrdf.org/schema/sesame#"
xmlns:esg="http://www.earthsystemgrid.org/esg.owl#">
<rdf:Description
rdf:about="http://www.earthsystemgrid.org/esg.owl#${rdfIdFactory.getDatasetId(dataset)}">
<rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#Resource" />
<rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#Dataset" />
<rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#GeophysicalDataset" />
<rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#ModelDataset" />
<esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
resource://${gateway.name?upper_case}#${dataset.persistentIdentifier}
</esg:hasUri>
<rdfs:label
rdf:datatype="http://www.w3.org/2001/XMLSchema#string">${dataset.name}</rdfs:label>
</rdf:Description>
</rdf:RDF>
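The same idea, generating RDF/XML on demand from relational metadata, can be sketched with Python's standard `xml.etree.ElementTree` instead of a text template. The function name and its parameters are hypothetical, and only two of the `rdf:type` statements from the template are reproduced. One design difference from a plain text template: the XML serializer escapes characters such as `&` and `<` in names by itself.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ESG = "http://www.earthsystemgrid.org/esg.owl#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Use readable prefixes in the serialized output.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("esg", ESG)


def dataset_to_rdf(rdf_id, name, gateway, persistent_id):
    """Serialize one dataset record (hypothetical fields) as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ESG + rdf_id}
    )
    for cls in ("Resource", "Dataset"):
        ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ESG + cls})
    has_uri = ET.SubElement(
        desc, f"{{{ESG}}}hasUri", {f"{{{RDF}}}datatype": XSD_STRING}
    )
    has_uri.text = f"resource://{gateway.upper()}#{persistent_id}"
    label = ET.SubElement(
        desc, f"{{{RDFS}}}label", {f"{{{RDF}}}datatype": XSD_STRING}
    )
    label.text = name
    return ET.tostring(root, encoding="unicode")


xml_text = dataset_to_rdf("ds1", "Temp & <Precip>", "ncar", "pid-1")
```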
Looking Forward
• Storing metadata
▫ Content management systems?
▫ NoSQL storage options?
• Modeling complicated relationships
▫ Neo4j looks promising…
Questions / Discussion
• Nathan Wilhelmi
▫ wilhelmi@ucar.edu