What are GUIDs and Why Do We Need Them???

Steve Baskauf
Vanderbilt Dept. of Biological Sciences
http://bioimages.vanderbilt.edu/

What is a GUID?
A globally unique identifier (GUID) should be:
1. globally unique
2. actionable
3. persistent

1. How do you make an identifier globally unique? (part 1)
• Make it locally unique within your institution
• A common strategy:
– identifier (catalog number) unique within a collection, e.g. 66920
– namespace (collection code) unique within the institution, e.g. ind-baskauf
• Unique local identifier: ind-baskauf/66920, ind-baskauf:66920, ind-baskauf_66920, etc.

How do you make an identifier globally unique? (part 2)
• Make your local identifier globally unique
• Use your institution code? TENN, BOON, bioimages?
• No! How do you know that is globally unique?
• Consensus: use a domain (or subdomain) name, e.g. www.biology.appstate.edu, tenn.bio.utk.edu, or bioimages.vanderbilt.edu

Some identifiers that are globally unique
• bioimages.vanderbilt.edu_ind-baskauf_66920
• urn:lsid:bioimages.vanderbilt.edu:baskauf:66920
• http://bioimages.vanderbilt.edu/ind-baskauf/66920
• Do these qualify as GUIDs???
– globally unique
– actionable????
• What happens if you put them in a web browser?

2. How do you make an identifier actionable?
• Something has to happen when the identifier is put in a web browser.
• LSIDs
– need a special browser plugin that nobody has
– need a special system for their resolvers to talk to each other
• HTTP URIs
– work in any web browser
– DNS nameservers already talk to each other

Can a material or conceptual object have an HTTP URI?
• We know a web page can have a URI that the web browser uses to find the HTML document…
• But physical objects (specimens, living plants) and conceptual entities (species) can also have HTTP URIs!

CAN I HAVE A URI???
• Yes! Here it is: http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me

How is my URI actionable???
If I put that HTTP URI in a web browser, does it deliver me to the user, like a web page?
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
Darn, no transporter technology!
• What should I use for my HTTP URI?
– steve.baskauf@vanderbilt.edu
– https://medschool.mc.vanderbilt.edu/biosci/bio_fac.php?id3=13257
– http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf
– http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
• The web server doesn’t do anything with the fragment identifier (#me); the browser does not even send it. But the fragment makes my URI different from the URI of the RDF metadata file. URIs for objects must be different from the URIs of the things that represent them.
• A URI is a Uniform Resource Identifier, not necessarily a URL (Uniform Resource Locator). It identifies me, but doesn’t deliver me.

Back to the tree…
http://bioimages.vanderbilt.edu/ind-baskauf/66920.htm = a URI and URL for a web page about the tree
http://bioimages.vanderbilt.edu/ind-baskauf/66920.rdf = a URI and URL for an RDF metadata file about the tree
http://bioimages.vanderbilt.edu/baskauf/66921.jpg = a URI and URL for an image of the tree
http://bioimages.vanderbilt.edu/ind-baskauf/66920 = a URI for the tree itself

How did the web server know what to do with the HTTP URI?
• Content negotiation = rules about which representation of a resource a web server should send when the URI of a non-information resource is sent to it.
• Apache web servers can do it if set up properly.
• Web browsers ask for HTML content
• Computers (“semantic web user-agents”) ask for RDF/XML content

What the heck’s the Semantic Web?
• sometimes called “Web 3.0” (not the same thing as “Web 2.0”)
• an idea pushed by Tim Berners-Lee (inventor of the Web)
• a way for programs like web crawlers (e.g. GoogleBot) to know, rather than guess, what a resource is
• Disco = an RDF browser
• http://www4.wiwiss.fu-berlin.de/rdf_browser/
• http://bioimages.vanderbilt.edu/ind-baskauf/66920

3. What is a persistent HTTP URI?
One of my favorite websites: http://tenn.bio.utk.edu/vascular/vascular.html
Oops.
It’s now: http://tenn.bio.utk.edu/vascular/vascular.shtml

Unchanging local file names
http://bioimages.vanderbilt.edu/baskauf/66921.htm
vs.
http://bioimages.vanderbilt.edu/metadata.htm?baskauf/66921/metadata/img/3456/2304
What’s in the HTML of the first URI?
<script type="text/javascript">
// Send the browser from the short, stable URI to the current location of the page.
window.location.replace("../metadata.htm?baskauf/66921/metadata/img/3456/2304");
</script>
The first URI is also a “cool” URI (easy to remember).

Unchanging domain names
http://www.bioblitznashville.org/
vs.
http://bioimages.vanderbilt.edu/
If I die, get fired, or lose interest in Bioimages, the HTTP URIs could still continue to be resolved for a long time.

How long is “persistent”?
• Forever is a pretty long time.
• The Internet is only 40 years old and the Web only 20.
• I say if you can foresee your institution and domain name lasting 10 years, go for it!
• Alternative? tdwg.org subdomain (but GUID review is 188 days old!)

Why do we need GUIDs?
• They provide a convenient way to cite ANYTHING and allow a reader to obtain further information with only a Web browser.
• They allow metadata about a resource to unambiguously refer to other resources at other institutions (e.g. duplicate specimens, live plant images and specimens)
• They make it possible to have a system that can update itself automatically.

STOP WAITING and go for it!
• There is nothing that would stop most of us from starting to use HTTP URI GUIDs within a month. Forget about LSIDs.
• If you are afraid of RDF, ignore it and worry about it later. Rules were made to be broken.
• See http://bioimages.vanderbilt.edu/ for more information about everything here and examples. There is also a link to the Apache page on content negotiation.
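
A rough sketch of two ideas from the slides: building an HTTP URI GUID from a domain name, a namespace, and a local identifier, and then choosing a representation by content negotiation. This is only an illustration in Python, not the Apache setup that Bioimages actually uses. The function names (mint_uri, parse_accept, negotiate) and the media-type table are assumptions made up for the example; the .htm and .rdf suffixes follow the tree URIs shown earlier.

```python
# Hypothetical table: which document suffix stands for which media type.
# The suffixes mirror the 66920.htm / 66920.rdf examples in the slides.
REPRESENTATIONS = {
    "text/html": ".htm",
    "application/rdf+xml": ".rdf",
}


def mint_uri(domain, namespace, local_id):
    """Build an HTTP URI from a domain name, a namespace, and a local identifier."""
    return f"http://{domain}/{namespace}/{local_id}"


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    choices = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Sort key: higher q first, then original order in the header.
        choices.append((-q, position, media_type))
    # Entries with q=0 are explicitly not acceptable, so drop them.
    return [m for neg_q, _, m in sorted(choices) if neg_q < 0]


def negotiate(object_uri, accept_header):
    """Pick the document URI to serve for an object URI, or None if nothing fits."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return object_uri + REPRESENTATIONS[media_type]
        if media_type in ("*/*", "text/*"):
            # Assumption: fall back to the web page when anything is acceptable.
            return object_uri + REPRESENTATIONS["text/html"]
    return None


tree = mint_uri("bioimages.vanderbilt.edu", "ind-baskauf", "66920")
print(negotiate(tree, "text/html,application/xhtml+xml,*/*;q=0.8"))  # a browser
print(negotiate(tree, "application/rdf+xml"))  # a semantic web user-agent
```

With a typical browser Accept header, the sketch picks the .htm page about the tree; a client that asks only for application/rdf+xml gets the .rdf metadata file. The URI of the tree itself never changes, which is the point of giving the object its own identifier.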