NSLA OPEN BORDERS PROJECT

USER AUTHENTICATION FOR E-RESOURCES WHICH WILL BE ACCESSED VIA TROVE: A DRAFT MODEL

Working Draft: 2 December 2009

Introduction

The Open Borders Project is one of the “Reimagining Libraries” Projects sponsored by National & State Libraries Australasia (NSLA). The underlying objective of this Project is to allow Australian library users to have improved access to e-resources, especially those e-resources subscribed to by NSLA member libraries and Australian public libraries. The Project aims to ensure that these e-resources are used to the maximum extent possible. It will support a user-centric approach to the discovery and access of these e-resources, whereby users can link to those articles which they are entitled to access by virtue of their library memberships and their libraries’ licences. This access and linking framework will be based on the National Library of Australia’s new discovery service, known as Trove.

To support this user-centric approach, the National Library will develop a set of partnerships with e-resource vendors, whereby:

• article-level metadata including the vendor’s article URL, and in some cases full-text articles for indexing, will be provided by the vendor to the NLA for inclusion in Trove;
• the vendor will also supply data about which articles are in which products and which products are licensed by which libraries;
• users of Trove will be encouraged to register with Trove and to provide information to Trove about which libraries they are affiliated with;
• Trove will index the article-level metadata (and full text where available) and would use the subscription and affiliation data to give “Available online” status to those articles which the user is entitled to click through to and read;
• Trove will facilitate the process of authenticating the user; and
• Trove will refer the user to the vendor’s site, where any remaining authentication and access to the full text would be managed.
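The entitlement check implied by the vendor-supplied data (articles in products, products licensed by libraries) and the user-supplied affiliations can be sketched as follows. This is only an illustration under stated assumptions: the class name, field names and identifier formats are hypothetical and are not part of the draft model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SubscriptionDirectory:
    """Hypothetical in-memory view of the vendor-supplied subscription data."""

    # product code -> article ids contained in that product
    product_articles: Dict[str, Set[str]] = field(default_factory=dict)
    # library code -> product codes licensed by that library
    library_products: Dict[str, Set[str]] = field(default_factory=dict)

    def libraries_with_access(self, article_id: str, affiliations: Set[str]) -> Set[str]:
        """Return the affiliated libraries that license a product containing the article."""
        entitled = set()
        for library in affiliations:
            for product in self.library_products.get(library, set()):
                if article_id in self.product_articles.get(product, set()):
                    entitled.add(library)
                    break  # one licensed product is enough for this library
        return entitled

    def is_available_online(self, article_id: str, affiliations: Set[str]) -> bool:
        """True when at least one affiliated library entitles the user to the article."""
        return bool(self.libraries_with_access(article_id, affiliations))
```

A user with no recorded affiliations never receives “Available online” status in this sketch, which corresponds to Case 1 of the model.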
To date two vendors (Cengage Gale and RMIT Publishing) have agreed to work with the National Library to expose their e-resource content in Trove in accordance with the scenario above.

The remainder of this paper presents a draft model for how the linking and authentication process could operate. This model is offered in order to gain the feedback of the Open Borders Project Group. Comments and suggestions for improving the model are welcomed.

In this model, five use cases have been identified: these are described below.

Please note that the National Library has planned to undertake these extensions to Trove during the second half of 2010. However, at this stage the Library is not able to make a firm commitment to implementing all five of the cases below by the end of 2010.

CASE 1.
Trove has no information about what library, if any, the user is affiliated with.

Linking process

• An article of interest is discovered by the Trove user after applying the facet “Online – Access conditions”.
• The user clicks on the article details in the result set.
• Trove informs the user that if they registered with Trove and established a profile identifying their affiliated libraries, Trove may be able to offer them free access courtesy of those libraries. [Trove may do this via a mouse-over text note, or a “help” icon next to the link which pops up a box explaining this, or an intermediate screen which explains the situation and a “continue” button which leads to the vendor’s pay-per-view page.]
• If the user supplies such details: see Cases 2-5 below.
• If not, Trove refers the user to the vendor site’s “pay per view” page, passing the URL of the article as a parameter. Access to the PDF of the article will be provided after the user supplies valid credit card details.

Caveats

• It is assumed that the e-resource vendor has a “pay per view” option (as does RMIT Publishing).

CASE 2.
Trove has information about the user’s library affiliations, but none of the affiliated libraries subscribe to a product containing the article.

The linking process for this case is identical to that for Case 1, except that Trove will not inform the user about the benefits of providing the affiliation information.

Caveats

• In some cases Trove’s knowledge of the vendor’s subscribing libraries will be out of date, or its knowledge of IP address ranges within libraries will be out of date. In some cases, therefore, the user may get free access instead of pay-per-view access.

CASE 3.
Trove has information about the user’s library affiliations, at least one of these libraries subscribes to a product containing the article, and the user can be IP authenticated as having onsite access privileges.

Linking process

• An article of interest is discovered by the Trove user after applying the facet “Online – Freely available”.
• The user clicks on the article details in the result set.
• Trove provides an intermediate page informing the user of which of their affiliated libraries can provide access to this article, and asks the user to select a library.
• Trove refers the user to the vendor site, where the IP address of the library will be verified, and the user will be given access to the PDF of the article without further authentication.

Caveats

• Some users who are onsite may not be affiliated with the library (they may be “walk-in” users). In some cases these users may be entitled to access the article.
• Some universities require users to log on with student/staff-id and password in order to access e-resources, whether they are onsite or not. Such cases will be handled as per Case 4 below.
• In some cases it will not be possible to determine with certainty that the user is onsite:
   o To infer onsite status, Trove will need to keep track of IP address ranges for Australian libraries and individual library branches and campuses.
     Some of this information may be incorrect or out of date.
   o For some IP address ranges, Trove can do name lookups to convert the IP address into a name and extract the domain name. For example, any IP address which resolves to a domain name ending “.anu.edu.au” may be automatically (regardless of user preferences) associated with the Australian National University library.
   o For some IP addresses which do not resolve to names, Trove can see who owns the network containing them, and could similarly infer the associated library.
   o For some libraries, especially public libraries, the ISP arrangement may be such as to make it impossible to determine a domain name that reflects the library.
   o For libraries with an OPAC, the library server will usually have a “proper” and permanent name, but the “in library” public-use network may be unrelated to this.

CASE 4.
Trove has information about the user’s library affiliations, at least one of these libraries subscribes to a product containing the article, the user is not onsite at that library, but the library has an EZproxy server.

Linking process

• An article of interest is discovered by the Trove user after applying the facet “Online – Freely available”.
• The user clicks on the article details in the result set.
• Trove checks its directory databases and finds that at least one of the user’s affiliated libraries subscribes to a product containing the article, and has an EZproxy server.
• Trove provides an intermediate page informing the user of these affiliated libraries, and asks the user to select a library.
• Trove creates a link to the relevant EZproxy server, passing the article URL as a parameter.
• The user enters their credentials and the EZproxy server authenticates the user.
• The EZproxy server redirects the user to the article URL.
• The vendor site trusts the referrer, given that it can verify the address of the EZproxy server, and the user will be given access to the PDF of the article.
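The step in which Trove creates a link to the EZproxy server might look like the sketch below. It assumes the widely used EZproxy starting-point form, in which the target URL follows a “login?url=” prefix; the host names are placeholders, the function name is hypothetical, and whether a given server expects the target to be percent-encoded is treated here as part of the per-library configuration that Trove would have to record.

```python
from urllib.parse import quote


def build_ezproxy_link(login_prefix: str, article_url: str, encode: bool = False) -> str:
    """Return a link that sends the user to the library's EZproxy login,
    carrying the vendor's article URL as the target.

    login_prefix: per-library value held in Trove's directory (assumed),
                  e.g. "https://ezproxy.example.edu.au/login?url="
    encode:       percent-encode the target when the server's configuration
                  requires it (assumed to be recorded per library)
    """
    target = quote(article_url, safe="") if encode else article_url
    return login_prefix + target


# Placeholder values for illustration only.
link = build_ezproxy_link(
    "https://ezproxy.example.edu.au/login?url=",
    "https://vendor.example.com/article?id=123",
)
```

Keeping the prefix as data rather than code reflects the point that each library's server may need its own configuration details.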

Caveats

• Some libraries, rather than using EZproxy, use a “simple authentication page” which redirects the user with a referrer. This case implies that the vendor is willing to trust the referrer header as an indicator that the user really does come from the customer-library. This alternative within Case 4 is not recommended because it is not secure (ie, the referrer header can be “spoofed”).

CASE 5.
Trove has information about the user’s library affiliations, at least one of these libraries subscribes to a product containing the article, the user is not onsite at that library, and the library does not have an EZproxy server.

Linking process

• An article of interest is discovered by the Trove user after applying the facet “Online – Freely available”.
• The user clicks on the article details in the result set.
• Trove checks its directory databases and finds that at least one of the user’s affiliated libraries subscribes to a product containing the article, that none of these libraries has an EZproxy server, but that some of these libraries are listed in the directory of library login pages.
• Trove provides an intermediate page informing the user of which of their affiliated libraries can provide access to this article, and asks the user to select a library.
• The user is presented with a Trove login screen which requests local login information for that library.
  This screen will include:
   o the library name and perhaps its logo (to help trigger context)
   o a sample picture of the borrower card this library issues
   o instructions for completing the credentials, which may vary, eg:
      o user name, password
      o borrower-id, pin
      o barcode, surname
• The user enters their credentials.
• Trove attempts to validate these details by “pretending to be a human being” and entering the login credentials at the real library login page.
• If this login is successful (ie the library website isn’t down and the credentials are accepted), one of the following three actions occurs, depending on the arrangements agreed with the vendor:
   o Behind the scenes (ie hidden from the user) Trove connects to the vendor’s site, using the URL representing the article, and providing the user’s library information as the vendor’s customer library code, which Trove has derived from its own library code. (There will be a new session for every request. The vendor will have previously agreed to permit and trust sessions that originate from Trove, and will expect such sessions to be started in such a way as to tell the publisher which customer the request is being made on behalf of.) The PDF of the article is obtained in this “behind the scenes” process, and it is then returned to the user by Trove.
   o Trove analyses the page that is obtained from the vendor’s site to find the link to the PDF, and then issues another HTTP request to obtain the PDF and return it to the user.
   o The user is referred to the vendor web site, starting a session for the user’s library-customer. (In this case, the vendor must trust Trove as a referrer, and the referring URL must contain information which determines which library-customer this session is to be run on behalf of.)

Caveats

• The user’s access to the article will be dependent on the existence of a database of Australian library login web addresses and screens.
  Each entry in this database will give a “category” to the web page which appears following the login; this category usually relates to the type of library ILMS.
• The vendor site may experience timeouts on its sessions, which may leave the user “stranded”.
• Case 4 will always be preferred to Case 5. Case 4 gives control to the library and means that Trove does not have to handle userids and passwords for other organisations. However, Case 5 may be all that is available for most public libraries.
• There are two security issues with this process: (a) the National Library is handling third party authentication credentials, and (b) in the case of the third alternative above, where the NLA referrer URL is trusted, it can be easily spoofed or forged.

Summary of data development tasks

The analysis above has revealed that, contrary to some earlier expectations, a significant effort will be required by the National Library to assist with user authentication. In particular, the Library will need to:

• Create a database of all Australian library EZproxy server addresses with sufficient configuration information to enable Trove to format article URLs so they will be correctly handled by the EZproxy server, and also record local library IP allocations to assist Trove in determining if a user is onsite in a library
• Create a database of “short library names”, to help Trove users recognize and select their library by name
• Obtain lists from RMIT Publishing and Gale of all of their Australian library customers and their customer codes
• For all libraries without EZproxy servers, create a database of Australian library login web addresses and associated information
• Make mappings from Trove library codes to the vendor library codes
• Work with the Open Borders Project Group to identify mechanisms to allow the above information, if possible, to be collected and maintained in an efficient and timely manner for state and public libraries.
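
As an illustration of how the directories listed above would feed the five cases, the sketch below selects a case from the recorded data. It is a simplification under stated assumptions: all function names, parameter names and library codes are hypothetical, IP allocations are assumed to be stored as CIDR ranges, walk-in users and universities that require a login even onsite are not handled, and the situation in which an entitled library has neither an EZproxy server nor a listed login page is not covered by the draft model, so the sketch returns None for it.

```python
import ipaddress
from typing import Dict, List, Optional, Set


def onsite_library(user_ip: str, ip_ranges: Dict[str, List[str]]) -> Optional[str]:
    """Return the code of the library whose recorded ranges contain user_ip, if any."""
    address = ipaddress.ip_address(user_ip)
    for library, ranges in ip_ranges.items():
        if any(address in ipaddress.ip_network(cidr) for cidr in ranges):
            return library
    return None


def select_case(
    affiliations: Set[str],    # libraries the user has registered with Trove
    subscribers: Set[str],     # libraries licensing a product containing the article
    onsite: Optional[str],     # result of onsite_library() for the user's IP address
    ezproxy: Set[str],         # libraries recorded as having an EZproxy server
    login_pages: Set[str],     # libraries listed in the login-page directory
) -> Optional[int]:
    """Return the case number (1-5) that applies, or None if the model has no route."""
    if not affiliations:
        return 1               # no affiliation data: prompt to register, else pay per view
    entitled = affiliations & subscribers
    if not entitled:
        return 2               # affiliated, but no subscribing library
    if onsite in entitled:
        return 3               # onsite at an entitled library: IP authentication
    if entitled & ezproxy:
        return 4               # offsite, authenticate through the library's EZproxy
    if entitled & login_pages:
        return 5               # offsite, Trove validates against the library login page
    return None                # not described in the draft model
```

The order of the checks follows the stated preference for Case 4 over Case 5.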