June 2006
Hui Dong, Bob Morris
UMass Boston
Image LSID resolvers
• Toy application at http://tamarin.cs.umb.edu:8081/jspexamples/MyJsp.jsp
– Projects
• UMB: from a local image store of 13,000 field photographs. (Morris, Haber, Dong)
• Morphbank (U. Florida): from a project to document morphological characters. (Rohnquist, Riccardi)
• U.T. Austin X-Ray CT facility: from scans of paleontological specimens and very small vertebrates. (Humphrey, Mirenkar)
– Huge variation in the social and technical aspects of image and metadata acquisition
• Known
– UMB: 13,000 unedited images from a skilled naturalist
• Metadata: EXIF, taxonomy, habitat, location, voucher number for type specimen of identified taxon, part imaged. 13,000 images in folders. Mine file and folder names, correlate them with checklist(s), then load the metadata into MySQL with a generated LSID.
• cf. ENBI report on Imaging Type Specimens
• Data == ???
– Morphbank (U. Florida) from project to document morphological characters
• Metadata: Darwin Core plus local attributes. Automated by the contribution process.
– U.T. Austin X-Ray CT facility to document XRCT
• Metadata: automated by scan configuration
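The UMB workflow (mine file and folder names, correlate to a checklist, store with a generated LSID) might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the folder layout `Family/Genus_species/file`, the authority `example.org`, the namespace `images`, and the helper names. The slides do not describe the real layout or identifiers, and the final load into MySQL is left out.

```python
import re
from pathlib import PurePosixPath

# Hypothetical authority and namespace; the slides do not state the real ones.
AUTHORITY = "example.org"
NAMESPACE = "images"


def mine_path(path):
    """Extract candidate metadata from a path, assuming the layout
    'Family/Genus_species/IMG_0042.jpg' (an illustrative convention only)."""
    p = PurePosixPath(path)
    parts = p.parts
    family = parts[-3] if len(parts) >= 3 else None
    taxon = parts[-2].replace("_", " ") if len(parts) >= 2 else None
    match = re.search(r"(\d+)", p.stem)  # first run of digits in the file name
    serial = match.group(1) if match else None
    return {"family": family, "taxon": taxon, "serial": serial, "file": p.name}


def correlate(record, checklist):
    """Flag whether the mined taxon name appears in a checklist (a set of names)."""
    record = dict(record)
    record["in_checklist"] = record["taxon"] in checklist
    return record


def make_lsid(object_id):
    """Build an LSID of the form urn:lsid:<authority>:<namespace>:<object>."""
    return f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{object_id}"


checklist = {"Solidago sempervirens"}
rec = correlate(mine_path("Asteraceae/Solidago_sempervirens/IMG_0042.jpg"), checklist)
rec["lsid"] = make_lsid(rec["serial"])
```

Records whose taxon fails the checklist lookup would presumably need manual review before being stored.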
• Resolution is easy.
• Acquiring metadata is hard.
• Services implemented on sourceforge.lsid.net Java suite:
– Authority, Data, and Metadata interfaces exposed as separate web services
– Omitted the security service and the assignment service (we use ad hoc assignment, which is not exposed; we would consider making assignment part of the image deposit service).
• Generating triples on the fly was too slow, so we cache them in MySQL. We could use a native triple store, but have not yet encountered a use case that needs one, given a shadow SQL metadata store and a warehouse model.
• Most integrated applications, though, might be easier to build against something that looks like a triple store from the outside.
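A minimal sketch of caching generated triples in a relational table, so that the slow generation step runs once per LSID. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it is self-contained; the table layout, the function names, and the sample predicates are assumptions, not the schema actually used.

```python
import sqlite3


def open_cache():
    """Create an in-memory cache table (sqlite3 stands in for MySQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        "lsid TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_lsid ON triples (lsid)")
    return conn


def build_triples(lsid):
    """Placeholder for the slow on-the-fly generation step (hypothetical values)."""
    return [(lsid, "dc:format", "image/jpeg"), (lsid, "dwc:Family", "Asteraceae")]


def get_metadata(conn, lsid):
    """Return (predicate, object) pairs, generating and caching them on a miss."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples WHERE lsid = ? ORDER BY predicate",
        (lsid,),
    ).fetchall()
    if rows:
        return rows  # cache hit
    triples = build_triples(lsid)
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return sorted((p, o) for _, p, o in triples)


conn = open_cache()
first = get_metadata(conn, "urn:lsid:example.org:images:42")   # miss: generates
second = get_metadata(conn, "urn:lsid:example.org:images:42")  # hit: reads cache
```

Because the table has the shape (subject, predicate, object), it can be presented to outside applications as if it were a triple store while remaining ordinary SQL underneath.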
• Jena RDF serialization generated huge numbers of triples irrelevant to us (e.g. graph support). The resulting performance was intolerable, so we serialized with the hibernate.org relational persistence framework instead. (Message from Mirenkar forcefully, and from us weakly: there are no standards for serializing naturally occurring RDBs to RDF.)
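To illustrate what serializing a relational row directly to RDF can mean, the sketch below emits one N-Triples line per non-null column and nothing else. It is not the Hibernate-based approach described above; the predicate base URI `http://example.org/terms/` and the column-to-predicate mapping are assumptions, which is exactly the part for which no standard exists.

```python
def escape_literal(text):
    """Escape the characters that N-Triples requires escaping in a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(lsid, row, base="http://example.org/terms/"):
    """Serialize one row (a dict of column -> value) as N-Triples lines.

    Each non-null column becomes one triple: the LSID is the subject, the
    column name appended to a hypothetical base URI is the predicate, and
    the value is a plain literal.
    """
    lines = []
    for column, value in sorted(row.items()):
        if value is None:
            continue  # emit nothing for missing values
        literal = escape_literal(str(value))
        lines.append(f'<{lsid}> <{base}{column}> "{literal}" .')
    return "\n".join(lines)


row = {"habitat": "salt marsh", "note": 'say "hi"', "voucher": None}
out = row_to_ntriples("urn:lsid:example.org:images:42", row)
```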
• The current resolution discovery scheme does not support multiple resolution services for a given LSID, so metadata cannot presently be distributed. Example: distributed annotation. Bill may not have authority to add annotations to Susan’s metadata store, but may still have valuable annotations that should be keyed by the LSID.
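The distributed annotation case can be sketched as a client that asks several independent stores about the same LSID and merges the answers, keeping track of who said what. This is a hypothetical illustration of the missing capability, not part of the existing discovery scheme; `MetadataStore` and `resolve_all` are invented names.

```python
from collections import defaultdict


class MetadataStore:
    """A toy metadata store owned by one party (hypothetical interface)."""

    def __init__(self, owner):
        self.owner = owner
        self._triples = defaultdict(list)

    def add(self, lsid, predicate, value):
        self._triples[lsid].append((predicate, value))

    def get_metadata(self, lsid):
        return list(self._triples.get(lsid, []))


def resolve_all(lsid, stores):
    """Merge metadata for one LSID from every store, tagged with its source."""
    merged = []
    for store in stores:
        for predicate, value in store.get_metadata(lsid):
            merged.append((store.owner, predicate, value))
    return merged


lsid = "urn:lsid:example.org:images:42"
susan = MetadataStore("susan")  # authoritative store for the image
bill = MetadataStore("bill")    # third-party annotations, same key
susan.add(lsid, "taxon", "Solidago sempervirens")
bill.add(lsid, "annotation", "Possibly misidentified; compare leaf shape.")
merged = resolve_all(lsid, [susan, bill])
```

The open problem in the slide is how a client would discover that Bill's store exists at all; here the list of stores is simply handed in.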
• Given some metadata value, how do we find all the LSIDs that have that value? We need the entire metadata RDF store someplace (for each resolution service!) in order to make the query
SELECT lsid WHERE metadataAttributeA(lsid) = value_b
• Reasonable image RDF is 50-100 attributes.
Reasonable personal image store is 10^5 images.
• This is not specific to RDF, but there is no history of supporting this kind of query at large scale.
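As a rough illustration, the reverse lookup above can be written against a single (lsid, predicate, object) table, and the sizes quoted on this slide give an estimate of how many rows one store would hold. The sketch uses `sqlite3` for self-containment; the table layout, sample data, and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (lsid TEXT, predicate TEXT, object TEXT)")
# The reverse lookup filters on (predicate, object), so index those columns.
conn.execute("CREATE INDEX idx_pred_obj ON triples (predicate, object)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("urn:lsid:example.org:images:1", "habitat", "salt marsh"),
        ("urn:lsid:example.org:images:2", "habitat", "bog"),
        ("urn:lsid:example.org:images:3", "habitat", "salt marsh"),
    ],
)


def find_lsids(conn, predicate, value):
    """All LSIDs whose metadata has the given attribute equal to the value."""
    rows = conn.execute(
        "SELECT DISTINCT lsid FROM triples "
        "WHERE predicate = ? AND object = ? ORDER BY lsid",
        (predicate, value),
    )
    return [row[0] for row in rows]


def estimated_rows(images, attributes_per_image):
    """One row per attribute per image."""
    return images * attributes_per_image


matches = find_lsids(conn, "habitat", "salt marsh")
low = estimated_rows(10**5, 50)    # 5,000,000 rows
high = estimated_rows(10**5, 100)  # 10,000,000 rows
```

So a single personal store implies on the order of 5 to 10 million triples, and the query has to run against every such store.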
• Typical utility in applications will(?) arise from metadata containing other LSIDs, but there are no standards for querying this or for recursive resolution. That is, the embedded LSID is a proxy for more metadata plus implied ontological relations.
How do we make resolvers accept ontological data, reason over it, and decide what recursive resolution should take place?
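One naive policy for recursive resolution is to follow every embedded LSID breadth-first up to a fixed depth, skipping anything already resolved so that cycles terminate. The sketch below does only that; it performs no ontological reasoning, and the metadata table, the depth limit, and the simplified LSID pattern are all assumptions.

```python
import re
from collections import deque

# Simplified LSID shape: urn:lsid:authority:namespace:object[:revision]
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?$", re.IGNORECASE
)

# Hypothetical metadata, standing in for calls to real resolution services.
METADATA = {
    "urn:lsid:example.org:images:42": {
        "depicts": "urn:lsid:example.org:specimens:7",
        "format": "image/jpeg",
    },
    "urn:lsid:example.org:specimens:7": {
        "taxon": "urn:lsid:example.org:names:3",
        "depictedBy": "urn:lsid:example.org:images:42",  # cycle back to the image
    },
    "urn:lsid:example.org:names:3": {"name": "Solidago sempervirens"},
}


def resolve_recursive(root, lookup, max_depth=2):
    """Resolve root, then any LSIDs embedded in its metadata, breadth-first.

    Returns a dict mapping each resolved LSID to its metadata.
    """
    resolved = {}
    queue = deque([(root, 0)])
    while queue:
        lsid, depth = queue.popleft()
        if lsid in resolved:
            continue  # already seen; also breaks cycles
        metadata = lookup(lsid)
        if metadata is None:
            continue  # unresolvable LSID
        resolved[lsid] = metadata
        if depth < max_depth:
            for value in metadata.values():
                if isinstance(value, str) and LSID_PATTERN.match(value):
                    queue.append((value, depth + 1))
    return resolved


full = resolve_recursive("urn:lsid:example.org:images:42", METADATA.get)
shallow = resolve_recursive("urn:lsid:example.org:images:42", METADATA.get, max_depth=1)
```

A resolver that reasons over ontological data would replace the blanket "follow everything" rule with a decision about which predicates are worth following.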
• LSID Launchpad doesn’t allow showing namespaces in the attribute-value pairs.
• The sourceforge.lsid.net framework does not support DDNS or some other magical multi-resolver discovery.
• Jena RDF serialization doesn’t seem to scale.