Triplestore Experiences
Nathan Wilhelmi
11/27/2012
NCAR - CISL/TDD/VETS

Our Experiences…
• Disclaimers:
  ▫ Did not have an ontologist
  ▫ Codebase passed through multiple developers
• Timelines (changing landscape)
  ▫ Started work 2006
  ▫ Stopped active development ~2011
  ▫ Sesame version 2.3.0

Why a Triplestore?
• Search functionality
  ▫ Faceted
  ▫ Free text
• Model metadata
  ▫ Metadata storage
  ▫ Display
• Semantic web

Initial Architecture
• Authoritative metadata source was RDBMS
• Metadata harvested into the triplestore at periodic intervals
• Triplestore only contained metadata to drive search
• Sesame used as a standalone service

Sesame Triplestore
• Standalone Sesame server
  ▫ Stability problems
  ▫ No security, triplestore could be updated by anyone
• Changed to in-memory store
  ▫ Stable
  ▫ Picked up performance improvements
• Embedded triplestore was only internally referencing
  ▫ RDF didn’t work outside of the application
  ▫ In effect, distilled to a key-value store

Internal Referencing
<rdf:RDF ...>
  <rdf:Description rdf:about="http://www.earthsystemgrid.org/esg.owl#esgncar__ucar_cgd_ccsm_b30_072b">
    ....
    <esg:hasUnconfiguredModelComponent rdf:resource="http://www.earthsystemgrid.org/esg.owl#modelcomponent_ccsm_run_b30.072b" />
    ....
  </rdf:Description>
</rdf:RDF>

Performance
• For our query patterns we were not seeing the needed performance
• Inferencing was removed and performance improved to acceptable levels for <5K datasets
  ▫ Target volume 50K datasets
• SPARQL missing key operators: ordering, limits

Tooling Support
• Managing the triplestore
  ▫ Protégé round trips didn’t work well
  ▫ Dump full triplestore to XML and grep by hand
• Deleting and updating triples
  ▫ Deletes were difficult, left dangling triples
  ▫ Rebuild from authoritative sources

Implementation Issues
• Schema-less design was perceived as faster
  ▫ Rapid ontology changes during development
  ▫ Still needed data migration tools
• Modeling the problem domain
  ▫ Modeled a triplestore, not the domain
  ▫ Very tightly coupled code was difficult to maintain and replace
• Steep learning curve for new developers

URIs Are Foundational
• Properly encoding URIs
  ▫ Created unencoded URIs within the triplestore
  ▫ Queries were created with string concatenation
  ▫ Led to broken queries and data
• Generated instance URIs through a lossy algorithm to get around encoding
  ▫ Could only relate from source -> triplestore

Our Current Path Forward
• Using SOLR Search
  ▫ Fantastic search tool!
• Metadata in RDBMS
  ▫ Working well
  ▫ Effective tools, including schema migration
  ▫ Scales very well for our metadata
• Still needed to expose RDF metadata…

RDF with RDBMS
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:sesame="http://www.openrdf.org/schema/sesame#"
         xmlns:esg="http://www.earthsystemgrid.org/esg.owl#">
  <rdf:Description rdf:about="http://www.earthsystemgrid.org/esg.owl#${rdfIdFactory.getDatasetId(dataset)}">
    <rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#Resource" />
    <rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#Dataset" />
    <rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#GeophysicalDataset" />
    <rdf:type rdf:resource="http://www.earthsystemgrid.org/esg.owl#ModelDataset" />
    <esg:hasUri rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      resource://${gateway.name?upper_case}#${dataset.persistentIdentifier}
    </esg:hasUri>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">${dataset.name}</rdfs:label>
  </rdf:Description>
</rdf:RDF>

Looking Forward
• Storing metadata
  ▫ Content management systems?
  ▫ NoSQL storage options?
• Modeling complicated relationships
  ▫ Neo4j looks promising…

Questions / Discussion
• Nathan Wilhelmi
  ▫ wilhelmi@ucar.edu
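
Backup: URI Encoding Sketch
A minimal sketch of the lesson from the "URIs Are Foundational" slide: percent-encode the local identifier when minting an instance URI, so the mapping is reversible (triplestore -> source as well as source -> triplestore) and the URI is safe to place inside a query. This is illustrative Python, not the project's code. The helper names (dataset_uri, local_id_from_uri, label_query) and the sample identifier are hypothetical; only the esg.owl namespace comes from the slides.

```python
from urllib.parse import quote, unquote

# Namespace as shown in the slides' RDF snippets.
ESG_NS = "http://www.earthsystemgrid.org/esg.owl#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def dataset_uri(local_id: str) -> str:
    """Mint an instance URI (hypothetical helper).

    safe="" percent-encodes every character except letters, digits
    and "_.-~", so spaces, "<", ">", "#" and quotes cannot leak into
    the URI. The encoding is lossless, unlike a substitution scheme.
    """
    return ESG_NS + quote(local_id, safe="")


def local_id_from_uri(uri: str) -> str:
    """Recover the source identifier from an instance URI."""
    if not uri.startswith(ESG_NS):
        raise ValueError("not an ESG instance URI: " + uri)
    return unquote(uri[len(ESG_NS):])


def label_query(local_id: str) -> str:
    """Build a label lookup query around an already-encoded URI."""
    return "SELECT ?label WHERE { <%s> <%s> ?label }" % (
        dataset_uri(local_id),
        RDFS_LABEL,
    )


if __name__ == "__main__":
    sample = "ccsm run b30.072b <draft> #1"  # made-up identifier
    uri = dataset_uri(sample)
    print(uri)
    print(local_id_from_uri(uri) == sample)
    print(label_query(sample))
```

The query text is still assembled as a string here, but only after the identifier has been encoded. Where the store's client API offers parameter binding, that is generally the safer choice.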