Image LSID Resolution Prototypes Hui Dong, Bob Morris UMass Boston


June 2006


Image LSID resolvers

Application

• Toy application at http://tamarin.cs.umb.edu:8081/jspexamples/MyJsp.jsp

Image LSID Resolution Servers

– Projects

• UMB: from a local image store of 13,000 field photographs (Morris, Haber, Dong)

• Morphbank (U. Florida): from a project to document morphological characters (Rohnquist, Riccardi)

• U.T. Austin X-Ray CT facility: from scans of paleo and very small vertebrates (Humphrey, Mirenkar)

– Huge variation in the social and technical aspects of image and metadata acquisition

Image LSID Resolution Servers

• Known

– UMB: 13,000 unedited images from a skilled naturalist

• Metadata: EXIF, taxonomy, habitat, location, voucher number for type specimen of identified taxon, part imaged. 13,000 images in folders. Mine file and folder names, correlate them to checklist(s), then load the metadata into MySQL with a generated LSID.

• cf. ENBI report on Imaging Type Specimens

• Data == ???

– Morphbank (U. Florida) from project to document morphological characters

• Metadata: Darwin Core plus local attributes. Automated by the contribution process

– U.T. Austin X-Ray CT facility to document XRCT

• Metadata: automated by scan configuration
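The UMB step of mining file and folder names, correlating them to a checklist, and attaching a generated LSID can be sketched roughly as follows. This is only an illustration: the authority `example.org`, the namespace `images`, the `Genus_species/IMG_nnnn.jpg` folder layout, and the function names are all assumptions, not details from the slides.

```python
import re
from pathlib import PurePosixPath

# Hypothetical authority and namespace; the slides do not give the real ones.
AUTHORITY = "example.org"
NAMESPACE = "images"


def make_lsid(object_id: str) -> str:
    """Build an LSID of the form urn:lsid:<authority>:<namespace>:<object>."""
    return f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{object_id}"


def mine_path(path: str, checklist: dict) -> dict:
    """Derive a metadata record from a folder name and a file name.

    Assumes a layout like 'Genus_species/IMG_0042.jpg' (hypothetical).
    """
    p = PurePosixPath(path)
    taxon = p.parent.name.replace("_", " ")
    # Use the first run of digits in the file name as the object id.
    match = re.search(r"(\d+)", p.stem)
    object_id = match.group(1) if match else p.stem
    return {
        "lsid": make_lsid(object_id),
        "file": p.name,
        "taxon": taxon,
        # Correlate against the checklist; None marks an unmatched name.
        "family": checklist.get(taxon),
    }


checklist = {"Quercus rubra": "Fagaceae"}
record = mine_path("photos/Quercus_rubra/IMG_0042.jpg", checklist)
```

Records with `family` set to `None` would be the ones needing manual attention before being loaded into the database.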


Summary

• Resolution is easy.

• Acquiring metadata is hard.


UMB details

• Services implemented on the sourceforge.lsid.net Java suite:

– Authority, Data, and Metadata interfaces exposed as separate web services

– Omitted the security service and the assignment service (we use ad hoc assignment, which is not exposed; we would consider making assignment part of the image deposit service).
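The split into authority, data, and metadata services can be pictured with a toy in-memory model. This is a sketch in Python rather than the Java suite the slides refer to; the class names, the dictionary-backed stores, and the `resolve` helper are assumptions made for illustration, and no real web service is involved.

```python
from dataclasses import dataclass, field


@dataclass
class DataService:
    """Returns the bytes an LSID names (here, from an in-memory dict)."""
    store: dict = field(default_factory=dict)

    def get_data(self, lsid: str) -> bytes:
        return self.store[lsid]


@dataclass
class MetadataService:
    """Returns the metadata attached to an LSID."""
    store: dict = field(default_factory=dict)

    def get_metadata(self, lsid: str) -> dict:
        return self.store[lsid]


@dataclass
class AuthorityService:
    """Tells a client which services can resolve a given LSID."""
    data: DataService
    metadata: MetadataService

    def get_available_services(self, lsid: str) -> dict:
        return {"data": self.data, "metadata": self.metadata}


def resolve(authority: AuthorityService, lsid: str):
    """Ask the authority first, then call the services it points to."""
    services = authority.get_available_services(lsid)
    return (services["data"].get_data(lsid),
            services["metadata"].get_metadata(lsid))


lsid = "urn:lsid:example.org:images:42"  # hypothetical LSID
authority = AuthorityService(
    DataService({lsid: b"...jpeg bytes..."}),
    MetadataService({lsid: {"taxon": "Quercus rubra"}}),
)
data, metadata = resolve(authority, lsid)
```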


Implementation issues

• Generating triples on the fly was too slow, so we cache them in MySQL. We could use a native triple store but haven’t yet encountered any use case that needs one, given a shadow SQL metadata store and a warehouse model.

• Most integrated apps might be easier to build, though, on something that looks like a triple store from the outside.
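A minimal sketch of caching generated triples in a relational table, so the slow generation runs once per LSID. SQLite stands in for MySQL here so the example is self-contained; `build_triples`, the table layout, and the predicate names are hypothetical.

```python
import sqlite3


def build_triples(lsid):
    """Stand-in for the slow on-the-fly triple generation (hypothetical)."""
    return [
        (lsid, "dc:format", "image/jpeg"),
        (lsid, "dwc:ScientificName", "Quercus rubra"),
    ]


class TripleCache:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
        )
        self.db.execute("CREATE INDEX idx_subject ON triples (subject)")
        self.misses = 0  # counts how often triples had to be generated

    def get(self, lsid):
        rows = self.db.execute(
            "SELECT subject, predicate, object FROM triples WHERE subject = ?",
            (lsid,),
        ).fetchall()
        if rows:
            return rows  # cache hit: no generation needed
        self.misses += 1
        triples = build_triples(lsid)
        self.db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
        self.db.commit()
        return triples
```

Because the cache is an ordinary three-column table, it can also be queried by predicate or object, which is one way a SQL store can look like a triple store from the outside.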


Implementation issues

• Jena RDF serialization generated huge numbers of triples irrelevant to us (e.g. graph support). The result was intolerable performance, so we serialized with the hibernate.org relational persistence framework. (Message from Mirenkar, forcefully, and from us, weakly: there are no standards for serializing naturally occurring RDBs to RDF.)
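Since there is no standard mapping, any serialization of a relational row to RDF is an ad hoc choice. The sketch below shows one such choice, emitting one N-Triples line per non-null column with the LSID as subject. It is not the Hibernate-based approach the slide mentions; the predicate namespace `http://example.org/terms/` and the function names are assumptions.

```python
BASE = "http://example.org/terms/"  # hypothetical predicate namespace


def escape_literal(value: str) -> str:
    """Escape the characters N-Triples requires in a string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_ntriples(lsid: str, row: dict) -> list:
    """Map one database row to N-Triples lines, skipping NULL columns."""
    lines = []
    for column, value in row.items():
        if value is None:
            continue
        literal = escape_literal(str(value))
        lines.append(f'<{lsid}> <{BASE}{column}> "{literal}" .')
    return lines


lines = row_to_ntriples(
    "urn:lsid:example.org:images:42",
    {"taxon": "Quercus rubra", "habitat": None},
)
```

Emitting only the triples that carry column values avoids the bookkeeping triples the slide complains about, at the cost of everything being a plain string literal.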


Warehousing vs distributed metadata stores

• The current resolution discovery scheme does not support multiple resolution services for a given LSID. Hence metadata cannot presently be distributed. Example: distributed annotation. Bill may not have authority to add an annotation to Susan’s metadata store but might still have a valuable annotation, which should be keyed by the LSID.
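To make the distributed-annotation example concrete, here is a sketch of what a client could do if discovery were able to return several metadata stores for one LSID. That capability is exactly what the slide says is missing, so the `stores` mapping and `gather_metadata` are purely hypothetical.

```python
from collections import defaultdict


def gather_metadata(lsid, stores):
    """Merge metadata about one LSID from several independent stores.

    Each value keeps the name of the store it came from, so Bill's
    annotation stays distinguishable from Susan's original record.
    """
    merged = defaultdict(list)
    for owner, store in stores.items():
        for attribute, value in store.get(lsid, {}).items():
            merged[attribute].append((owner, value))
    return dict(merged)


lsid = "urn:lsid:example.org:images:42"  # hypothetical LSID
stores = {
    "susan": {lsid: {"taxon": "Quercus rubra"}},
    "bill": {lsid: {"annotation": "leaf shows gall damage"}},
}
merged = gather_metadata(lsid, stores)
```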


Warehousing vs distributed metadata

• Given some metadata values, how do we find all the LSIDs that have that metadata value? We need the entire metadata RDF store someplace (for each resolution service!) in order to make the query

SELECT lsid WHERE metadataAttributeA(lsid) = value_b

• Reasonable image RDF is 50-100 attributes. Reasonable personal image store is 10^5 images.

• This is not specific to RDF, but there is no history of supporting this kind of query at large scale.
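The reverse lookup above can be written as real SQL once the metadata sits in one attribute-value table. The sketch below uses SQLite and a hypothetical `metadata(lsid, attribute, value)` table; it also works out the store size implied by the figures on the slide.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical warehouse table: one row per (LSID, attribute, value).
db.execute("CREATE TABLE metadata (lsid TEXT, attribute TEXT, value TEXT)")
# The reverse lookup needs an index on (attribute, value), not on lsid.
db.execute("CREATE INDEX idx_attr_value ON metadata (attribute, value)")
db.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("urn:lsid:example.org:images:1", "taxon", "Quercus rubra"),
        ("urn:lsid:example.org:images:2", "taxon", "Acer rubrum"),
        ("urn:lsid:example.org:images:3", "taxon", "Quercus rubra"),
    ],
)


def find_lsids(attribute, value):
    """Return every LSID whose metadata has attribute == value."""
    rows = db.execute(
        "SELECT DISTINCT lsid FROM metadata "
        "WHERE attribute = ? AND value = ? ORDER BY lsid",
        (attribute, value),
    )
    return [row[0] for row in rows]


matches = find_lsids("taxon", "Quercus rubra")

# Size estimate from the slide: 10^5 images with 50-100 attributes each.
images = 10**5
low, high = 50 * images, 100 * images  # 5 to 10 million rows per store
```

At 5 to 10 million rows for a single personal store, the query is only practical where all of that metadata is held in one place, which is the warehousing pressure the slide describes.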


Interesting research problem

• Typical utility in applications will(?) arise from metadata containing other LSIDs. But there are no standards for querying this or for recursive resolution. That is, the embedded LSID is a proxy for more metadata plus implied ontological relations. How can we make resolvers accept ontological data, reason over it, and decide what recursive resolution should take place?
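One naive policy for recursive resolution is to follow every embedded LSID up to a fixed number of hops, stopping at cycles. The sketch below does only that; it involves no ontological reasoning, and the dictionary-backed store, the attribute names, and `resolve_recursive` are assumptions made for illustration.

```python
def is_lsid(value) -> bool:
    return isinstance(value, str) and value.lower().startswith("urn:lsid:")


def resolve_recursive(lsid, store, max_depth=2, seen=frozenset()):
    """Replace embedded LSIDs with their metadata, depth-limited.

    max_depth is the number of hops through embedded LSIDs to follow.
    Unknown LSIDs, LSIDs past the depth limit, and LSIDs already on the
    current path (cycles) are left as plain strings.
    """
    record = store.get(lsid)
    if record is None or max_depth < 0 or lsid in seen:
        return lsid
    seen = seen | {lsid}
    resolved = {}
    for attribute, value in record.items():
        if is_lsid(value):
            resolved[attribute] = resolve_recursive(
                value, store, max_depth - 1, seen
            )
        else:
            resolved[attribute] = value
    return resolved


image = "urn:lsid:example.org:images:1"        # hypothetical LSIDs
specimen = "urn:lsid:example.org:specimens:7"
store = {
    image: {"depicts": specimen, "part": "leaf"},
    specimen: {"taxon": "Quercus rubra", "imagedBy": image},
}
expanded = resolve_recursive(image, store)
```

A resolver that reasoned over ontological data would replace the fixed `max_depth` with a decision per attribute about whether following that relation is worthwhile.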


Grumble

• LSID Launchpad doesn’t allow showing namespaces in the attribute-value pairs

• sourceforge.lsid.net framework does not support DDNS or some other magical multi-resolver discovery

• Jena RDF serialization doesn’t seem to be scalable.
