What are GUIDs and Why Do We Need Them?

Steve Baskauf
Vanderbilt Dept. of Biological Sciences
http://bioimages.vanderbilt.edu/
What is a GUID?
A globally unique identifier (GUID) should be:
1. globally unique
2. actionable
3. persistent
1. How do you make an identifier globally unique? (part 1)
• Make it locally unique within your institution
• A common strategy:
– identifier (catalog number) unique within a
collection, e.g. 66920
– namespace (collection code) unique within the
institution, e.g. ind-baskauf
• Unique local identifier: ind-baskauf/66920,
ind-baskauf:66920, ind-baskauf_66920, etc.
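A minimal sketch of this strategy in JavaScript, assuming the collection code and catalog number are already known. The function `makeLocalId` is hypothetical (not part of Bioimages or any library). It also refuses a collection code that contains the separator, which keeps the combined string unambiguous.

```javascript
// Hypothetical helper: combine a collection code (namespace, unique within
// the institution) with a catalog number (unique within the collection).
function makeLocalId(collectionCode, catalogNumber, separator = "/") {
  if (!collectionCode || catalogNumber === undefined || catalogNumber === "") {
    throw new Error("both a collection code and a catalog number are required");
  }
  // If the separator could occur inside the collection code, two different
  // (code, number) pairs could produce the same string, e.g. "a/b" + "c"
  // and "a" + "b/c". Forbidding it makes the first separator a safe split.
  if (collectionCode.includes(separator)) {
    throw new Error(`collection code must not contain "${separator}"`);
  }
  return `${collectionCode}${separator}${catalogNumber}`;
}

console.log(makeLocalId("ind-baskauf", 66920));      // ind-baskauf/66920
console.log(makeLocalId("ind-baskauf", 66920, ":")); // ind-baskauf:66920
console.log(makeLocalId("ind-baskauf", 66920, "_")); // ind-baskauf_66920
```

The separator is a matter of local taste; what matters is that the same rule is applied to every specimen in the institution.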
How do you make an identifier globally unique? (part 2)
• Make your local identifier globally unique
• Use your institution code? TENN, BOON,
bioimages?
• No! How do you know that it is globally unique?
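• Consensus: use a domain (or subdomain) name that your institution controls, e.g. www.biology.appstate.edu, tenn.bio.utk.edu, or bioimages.vanderbilt.edu. Each domain name is registered to only one party, so nobody else will mint the same prefix.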
Some identifiers that are globally unique
• bioimages.vanderbilt.edu_ind-baskauf_66920
• urn:lsid:bioimages.vanderbilt.edu:baskauf:66920
• http://bioimages.vanderbilt.edu/ind-baskauf/66920
• Do these qualify as GUIDs???
– globally unique
– actionable????
• What happens if you put them in a web
browser?
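The same idea in code: prefix the local identifier with a domain name you control. `globalIdForms` is a hypothetical helper. For simplicity the sketch uses one namespace (`ind-baskauf`) for all three forms, whereas the LSID on the slide uses `baskauf`.

```javascript
// Hypothetical helper: build three globally unique forms of one specimen's
// identifier by prefixing the locally unique part with a domain name.
function globalIdForms(domain, collectionCode, catalogNumber) {
  return {
    // Plain string: unique, but no software knows how to look it up.
    underscore: `${domain}_${collectionCode}_${catalogNumber}`,
    // LSID syntax: urn:lsid:<authority>:<namespace>:<object>
    lsid: `urn:lsid:${domain}:${collectionCode}:${catalogNumber}`,
    // HTTP URI: the domain is also where a browser will send the request.
    httpUri: `http://${domain}/${collectionCode}/${catalogNumber}`,
  };
}

const forms = globalIdForms("bioimages.vanderbilt.edu", "ind-baskauf", 66920);
console.log(forms.underscore); // bioimages.vanderbilt.edu_ind-baskauf_66920
console.log(forms.lsid);       // urn:lsid:bioimages.vanderbilt.edu:ind-baskauf:66920
console.log(forms.httpUri);    // http://bioimages.vanderbilt.edu/ind-baskauf/66920
```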
2. How do you make an identifier actionable?
• Something has to happen when the identifier
is put in a web browser.
• LSIDs
– need a special browser plugin that nobody has.
– need a special system for their resolvers to talk to each other
• HTTP URIs
– work in any web browser
– DNS nameservers already talk to each other
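One rough way to ask “what happens in a web browser?” in code is to parse the identifier and look at its scheme. This sketch uses the standard `URL` class built into Node.js and browsers. `isBrowserActionable` is a hypothetical name, and treating only `http:` and `https:` as actionable is a simplification: it says nothing about whether the server actually answers.

```javascript
// Hypothetical check: could an ordinary web browser dereference this
// identifier with no plugin? Only http: and https: URIs qualify; their host
// names are resolved through DNS like any web address.
function isBrowserActionable(identifier) {
  let url;
  try {
    url = new URL(identifier); // throws if there is no scheme at all
  } catch {
    return false; // not an absolute URI, so nothing for a browser to fetch
  }
  return url.protocol === "http:" || url.protocol === "https:";
}

console.log(isBrowserActionable("bioimages.vanderbilt.edu_ind-baskauf_66920"));        // false
console.log(isBrowserActionable("urn:lsid:bioimages.vanderbilt.edu:baskauf:66920"));   // false
console.log(isBrowserActionable("http://bioimages.vanderbilt.edu/ind-baskauf/66920")); // true
```

The LSID parses as a valid URI with the scheme `urn:`, but a browser has no built-in way to resolve it, which is the point of the slide.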
Can a material or conceptual object have an HTTP URI?
• We know a web page can have a URI that the web browser uses to find the HTML document…
• But physical objects (specimens, living plants)
and conceptual entities (species) can also
have HTTP URIs!
CAN I HAVE A URI???
• Yes! Here it is:
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
How is my URI actionable???
If I put that HTTP URI in a web browser, does it
deliver me to the user, like a web page?
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
Darn, no transporter technology!
• What should I use for my HTTP URI?
steve.baskauf@vanderbilt.edu
https://medschool.mc.vanderbilt.edu/biosci/bio_fac.php?id3=13257
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
• The browser never sends the fragment identifier (#me) to the web server, but the fragment makes my URI different from the URI of the RDF metadata file. URIs for objects must be different from the URIs of the documents that represent them.
• A URI is a Uniform Resource Identifier, not a URL (Uniform
Resource Locator). It identifies me, but doesn’t deliver me.
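A small sketch of the fragment behavior, using the standard `URL` class: clearing `hash` gives the address a browser actually requests from the server. `requestedDocument` is a hypothetical helper name.

```javascript
// Hypothetical helper: the document a browser fetches for a given URI.
// The fragment stays on the client side and is not part of the request.
function requestedDocument(uri) {
  const url = new URL(uri);
  url.hash = ""; // drop the fragment, as a browser does before sending the request
  return url.href;
}

const personUri = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me";
const documentUri = requestedDocument(personUri);

console.log(documentUri);               // http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf
console.log(personUri === documentUri); // false: one names me, the other names the file
```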
Back to the tree…
http://bioimages.vanderbilt.edu/ind-baskauf/66920.htm
= a URI and URL for a web page about the tree
http://bioimages.vanderbilt.edu/ind-baskauf/66920.rdf
= a URI and URL for an RDF metadata file about the tree
http://bioimages.vanderbilt.edu/baskauf/66921.jpg
= a URI and URL for an image of the tree
http://bioimages.vanderbilt.edu/ind-baskauf/66920
= a URI for the tree itself
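A sketch of the naming pattern on this slide, assuming (as the examples suggest) that documents about a thing add `.htm` or `.rdf` to the thing’s URI. `documentUris` is hypothetical. The image has its own number (66921), so its URI cannot be derived this way.

```javascript
// Hypothetical helper following this site's apparent convention: the tree's
// URI has no file extension; documents about the tree add one.
function documentUris(thingUri) {
  return {
    html: `${thingUri}.htm`, // web page about the tree (for people)
    rdf: `${thingUri}.rdf`,  // RDF metadata about the tree (for programs)
  };
}

const tree = "http://bioimages.vanderbilt.edu/ind-baskauf/66920";
const docs = documentUris(tree);
console.log(docs.html); // http://bioimages.vanderbilt.edu/ind-baskauf/66920.htm
console.log(docs.rdf);  // http://bioimages.vanderbilt.edu/ind-baskauf/66920.rdf

// Three distinct URIs: one for the tree, one for each document about it.
console.log(new Set([tree, docs.html, docs.rdf]).size); // 3
```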
How did the web server know what to do with the HTTP URI?
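• Content negotiation = rules about which representation of a resource a web server should send, based on the kind of content the client asks for. When a non-information URI (like the tree’s) is sent to it, the server cannot send the thing itself, so it redirects the client to a document about it.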
• Apache web servers can do it if set up
properly.
• Web browsers ask for HTML content (in the HTTP Accept header)
• Computers (“semantic web user-agents”) ask for RDF/XML content
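A simplified sketch of the decision the server makes, written as a pure JavaScript function so it can be tested without running a server. It assumes a 303 See Other redirect to the `.htm` or `.rdf` document (the naming pattern of the previous slide) and ignores the q-values that real servers such as Apache weigh. `negotiate` is a hypothetical name, not an Apache or Node.js API.

```javascript
// Hypothetical sketch: choose where to send a client that requested a
// non-information URI such as /ind-baskauf/66920. The tree cannot be sent
// over HTTP, so the answer is always a "303 See Other" redirect.
function negotiate(path, acceptHeader = "") {
  // "text/html,application/xhtml+xml,*/*;q=0.8" -> ["text/html", ...]
  const accepted = acceptHeader
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase());
  const wantsRdf = accepted.includes("application/rdf+xml");
  const wantsHtml = accepted.includes("text/html");
  // Simplification: RDF only for clients that ask for RDF and not HTML
  // (semantic web user-agents); browsers, which send text/html, get the page.
  const extension = wantsRdf && !wantsHtml ? ".rdf" : ".htm";
  return { status: 303, location: `${path}${extension}` };
}

const browser = negotiate("/ind-baskauf/66920", "text/html,application/xhtml+xml,*/*;q=0.8");
const agent = negotiate("/ind-baskauf/66920", "application/rdf+xml");
console.log(browser.location); // /ind-baskauf/66920.htm
console.log(agent.location);   // /ind-baskauf/66920.rdf
```

To serve this from Node’s built-in `http` module, a request handler would call `res.writeHead(result.status, { Location: result.location })` and then `res.end()`. On an Apache server the same job is done by configuration rather than code.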
What the heck’s the Semantic Web?
• not the same thing as “Web 2.0” (it is sometimes called “Web 3.0”)
• an idea pushed by Tim Berners-Lee (inventor of the Web)
• a way for programs like web crawlers (e.g. Googlebot) to know, rather than guess, what a resource is and how it relates to other resources.
• Disco=an RDF browser
• http://www4.wiwiss.fu-berlin.de/rdf_browser/
• http://bioimages.vanderbilt.edu/ind-baskauf/66920
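To make “know rather than guess” concrete, here are two RDF statements in N-Triples syntax linking URIs from the “Back to the tree” slide. `foaf:depiction` and `foaf:primaryTopic` are real FOAF terms, but choosing them here is an illustrative assumption, not a description of what Bioimages actually publishes.

```javascript
// Sketch: RDF statements as N-Triples lines (<subject> <predicate> <object> .).
// The choice of FOAF properties is an assumption for illustration.
const FOAF = "http://xmlns.com/foaf/0.1/";

function triple(subject, predicate, object) {
  return `<${subject}> <${predicate}> <${object}> .`;
}

const tree = "http://bioimages.vanderbilt.edu/ind-baskauf/66920";
const image = "http://bioimages.vanderbilt.edu/baskauf/66921.jpg";
const rdfDoc = "http://bioimages.vanderbilt.edu/ind-baskauf/66920.rdf";

const ntriples = [
  triple(tree, FOAF + "depiction", image),     // the tree is shown in the image
  triple(rdfDoc, FOAF + "primaryTopic", tree), // the RDF file is about the tree
].join("\n");

console.log(ntriples);
```

A program reading these lines does not have to infer from file names that the image shows the tree; the relationship is stated explicitly with URIs that anyone can look up.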
3. What is a persistent HTTP URI?
One of my favorite websites:
http://tenn.bio.utk.edu/vascular/vascular.html
Oops. It’s now:
http://tenn.bio.utk.edu/vascular/vascular.shtml
Unchanging local file names
http://bioimages.vanderbilt.edu/baskauf/66921.htm
vs.
http://bioimages.vanderbilt.edu/metadata.htm?baskauf/66921/metadata/img/3456/2304
What’s in the HTML of the first URI?
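<script type="text/javascript">
// Send the browser on to the page's current address. location.replace() swaps out
// the current history entry, so the Back button skips this redirect page.
window.location.replace("../metadata.htm?baskauf/66921/metadata/img/3456/2304");
</script>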
The first URI is also a “cool” URI (easy to
remember).
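The JavaScript redirect only works for clients that run scripts, and semantic web user-agents generally do not. A server-side HTTP redirect is an alternative technique (not what the slide shows). This hypothetical sketch keeps a table from stable “cool” paths to current locations; the table and `redirectFor` are illustrative names only.

```javascript
// Hypothetical table: stable "cool" paths -> wherever the content lives now.
// When the site is reorganized, only this table changes, never the cool URIs.
const currentLocation = new Map([
  ["/baskauf/66921.htm", "/metadata.htm?baskauf/66921/metadata/img/3456/2304"],
]);

function redirectFor(path) {
  const target = currentLocation.get(path);
  // 302 Found rather than 301 Moved Permanently, so clients keep citing the
  // stable URI instead of the current internal address.
  // Unknown paths return null and are left to the normal file handler.
  return target ? { status: 302, location: target } : null;
}

console.log(redirectFor("/baskauf/66921.htm").location);
// /metadata.htm?baskauf/66921/metadata/img/3456/2304
```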
Unchanging domain names
http://www.bioblitznashville.org/
vs.
http://bioimages.vanderbilt.edu/
If I die, get fired, or lose interest in Bioimages, the
HTTP URIs could still continue to be resolved for a
long time.
How long is “persistent”?
• Forever is a pretty long time.
• The Internet is only 40 years old and the Web
only 20.
• I say if you can foresee your institution and
domain name lasting 10 years, go for it!
• Alternative? tdwg.org subdomain (but GUID
review is 188 days old!)
Why do we need GUIDs?
• They provide a convenient way to cite
ANYTHING and allow a reader to obtain
further information with only a Web browser.
• They allow metadata about a resource to
unambiguously refer to other resources at
other institutions (e.g. duplicate specimens,
live plant images and specimens)
• They make it possible to have a system that
can update itself automatically.
STOP WAITING and go for it!
• There is nothing that would stop most of us
from starting to use HTTP URI GUIDs within a
month. Forget about LSIDs.
• If you are afraid of RDF, ignore it and worry
about it later. Rules were made to be broken.
• See http://bioimages.vanderbilt.edu/ for more information about everything here, for examples, and for a link to the Apache documentation on content negotiation.