TECHNICAL ASPECTS OF MANAGING E-RESOURCES
FACILITATOR'S HANDBOOK

Authors:

Table of Contents

E-RESOURCES AUTHENTICATION
  Authentication tools and controls
    Hardware tokens
    Software tokens
    Digital certificates on smart cards and USB tokens
    Challenge response
    Biometric authentication
    Out-of-band authentication
    IP authentication
    Proxy servers and EZproxy
  Examples of authentication tools
    OpenVPN technologies
    JASIG
    SimpleSAMLphp
    Shibboleth
  Concept
    How it works
FEDERATED SEARCH TOOLS
  Introduction
  What is federated search?
  Meta search
  Approaches to federated search
    Index-time merging
    Hybrid federated search
  Which approach works best?
  Practical application of federated search tools
  Federated search tools options
    VuFind
    Blacklight
    SubjectsPlus
    Google CSE
  The benefits of federated search
    Efficiency, time savings
    Quality of results
    Most current content
    Marketing opportunities
  Challenges in federated searches
    Nomenclature confusion
    Access issues with federated search
    Interface issues with federated search
    Removing duplicates
    Maintenance

E-RESOURCES AUTHENTICATION

In many organizations, the function of verifying a user's identity, known as authentication, is important in establishing trust in critical business processes. In its simplest form, authentication is the act of verifying a person's claimed identity, and it is usually implemented through a username and password combination when logging into an IT system or application. As this definition suggests, part of the authentication process consists of correctly identifying a user, application, or group. There are multiple ways in which users can provide their identity, such as typing a username and password.
In fact, the basis of authentication lies in the principle that, without a proper form of identification, a system cannot correlate an authentication factor with a specific subject.

Authentication tools and controls

As mentioned earlier, organizations can use authentication tools other than usernames and passwords. The following are the main authentication tools or controls that auditors can recommend:

Hardware tokens

These devices display generated random numbers that change every 60 seconds and are synchronized with the authenticating system. Users simply type the number displayed on the token whenever they need to log in.

Software tokens

These software programs generate a unique string of characters that is recognized by the authenticating system. They reside on the computer's hard drive or on another device or medium, such as storage media, a personal digital assistant, or a compact disc.

Digital certificates on smart cards and USB tokens

These unique certificates are issued by a third-party certifying authority or by the operating system to ensure users are communicating with the right person or device. Digital certificates contain specific identifying information and are governed by an international standard, X.509.

Challenge response

This activity consists of either a question-and-answer dialog in which the user responds to a set of pre-recorded questions, such as the mother's maiden name, or a token device that generates passwords or responses based on a predetermined algorithm. When a token device is used, the authentication system displays a challenge in the form of a code or a pass phrase. The user enters the challenge into the token device, which returns a response containing the code or pass phrase that the user must then enter into the system for authentication.
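
The token-based challenge-response exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the handbook does not say which algorithm a token uses, so HMAC-SHA256 stands in for the "predetermined algorithm", and the function names and the shared secret are hypothetical.

```python
import hashlib
import hmac
import secrets


def issue_challenge() -> str:
    """Server side: generate a fresh, unpredictable challenge code."""
    return secrets.token_hex(8)


def token_response(shared_secret: bytes, challenge: str) -> str:
    """Token side: derive the response from the challenge and the shared secret."""
    digest = hmac.new(shared_secret, challenge.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:8]  # shortened so that a person can type it


def verify_response(shared_secret: bytes, challenge: str, response: str) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = token_response(shared_secret, challenge)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    secret = b"demo-shared-secret"  # hypothetical; provisioned when the token is issued
    challenge = issue_challenge()   # shown to the user by the authentication system
    response = token_response(secret, challenge)  # computed by the token device
    print(verify_response(secret, challenge, response))
```

Because the challenge changes on every login, a response that is overheard once cannot simply be replayed later, which is the main advantage over a fixed password.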

Biometric authentication

This is the use of technologies that measure and analyze a person's physical and behavioral characteristics (e.g., fingerprints, eye retinas and irises, facial patterns, and hand measurements) to authenticate the individual to a system.

Out-of-band authentication

Under this method, the authentication device accepts the person's credentials and sends a secret password to the user through an out-of-band medium, such as an email, a short message service (SMS) text, or a phone call. The password is valid for one use only.

IP authentication

Publishers use an organization's outward-facing IP addresses to identify users coming from a subscribing institution and, in turn, to authenticate access to the subscribed resources.

Proxy servers and EZproxy

Many libraries use proxy servers to help authenticate offsite users who cannot be authenticated by the institution's IP address.

Examples of authentication tools

OpenVPN technologies

OpenVPN Technologies is a privately held company based in Pleasanton, California, that integrates a suite of networking and software technologies. It has designed and deployed virtual network software that provides secure, reliable, and scalable communication services, fulfilling the requirements of the traditional virtual private network (VPN) market. With OpenVPN you can:

- Tunnel any IP subnetwork or virtual Ethernet adapter over a single UDP or TCP port
- Configure a scalable, load-balanced VPN server farm, using one or more machines, that can handle thousands of dynamic connections from incoming VPN clients
- Use all of the encryption, authentication, and certification features of the OpenSSL library to protect your private network traffic as it transits the internet
- Create secure Ethernet bridges using virtual TAP devices
- Control OpenVPN using a GUI on Windows or Mac OS X

OpenVPN is a gateway that allows remote access to e-resources.
Libraries need to embrace and implement it so that their repositories, digital libraries, journals, e-books, and other electronic materials are available to users away from the library, which in turn brings information closer to the users.

JASIG

This is the Central Authentication Service project, more commonly referred to as CAS. CAS is an authentication system created by Yale University to provide a trusted way for an application to authenticate a user. It became a JASIG project in December 2004. CAS offers:

- An open and well-documented protocol
- An open-source Java server component
- A library of clients for Java, .NET, PHP, Perl, Apache, uPortal, and others
- Integration with uPortal, Bluesocket, TikiWiki, Mule, Liferay, Moodle, and others
- Community documentation and implementation support
- An extensive community of adopters and users

The JASIG community is committed to increasing openness in higher education and beyond, so all versions of the software are free and openly accessible, as is the entire community surrounding the software.

SimpleSAMLphp

SimpleSAMLphp is an application written in native PHP that deals with authentication. The project is led by UNINETT and has a large user base, a helpful user community, and a large set of external contributors. It includes the following authentication modules:

- Simple LDAP
- Multiple LDAP
- CAS remote authentication, which lets you connect authentication to your existing CAS service and subsequently retrieve attributes from LDAP
- RADIUS authentication, which lets you check credentials against a RADIUS server
- SQL authentication
- OpenID

Shibboleth

Shibboleth (http://shibboleth.internet2.edu/about.html) is an open source software package that allows an individual to use a single username/login to access multiple online resources to which their institution subscribes.
Shibboleth is a widely deployed federated identity solution, connecting users to applications both within and between organizations. It is an open-source project that provides single sign-on capabilities and allows sites to make informed authorization decisions for individual access to protected online resources in a privacy-preserving manner.

Concept

Shibboleth works in the same way as every other web-based single sign-on (SSO) system. What distinguishes it is its adherence to standards, its ability to provide SSO support to services outside a user's organization, and its protection of user privacy.

How it works

Step 1: user accesses the resource

The user starts by attempting to access the protected resource. The resource monitor determines whether the user has an active session and, discovering that they do not, directs them to the service provider in order to start the SSO process.

Step 2: service provider issues authentication request

The user arrives at the service provider, which prepares an authentication request and sends it, along with the user, to the identity provider. The service provider software is generally installed on the same server as the resource.

Step 3: user authenticated at identity provider

When the user arrives at the identity provider, it checks whether the user has an existing session. If they do, they proceed to the next step. If not, the identity provider authenticates them (e.g., by prompting for, and checking, a username and password), and the user proceeds to the next step.

Step 4: identity provider issues authentication response

After identifying the user, the identity provider prepares an authentication response and sends it, along with the user, back to the service provider.

Step 5: service provider checks authentication response

When the user arrives with the response from the identity provider, the service provider validates the response, creates a session for the user, and makes some information retrieved from the response (e.g., the user's identifier) available to the protected resource.
After this, the user is sent to the resource.

Step 6: resource returns content

As in step 1, the user is now trying again to access the protected resource, but this time the user has a session and the resource knows who they are. With this information, the resource services the user's request and sends back the requested data.

FEDERATED SEARCH TOOLS

Introduction

Does information in your organization reside in "silos"? Do you and your users have to remember multiple passwords? Do you send your patrons to the OPAC terminal to find audio-visual materials, texts, and journals held in-house? Do you then forward them to yet another computer to find online e-journals, and perhaps to a third link, on the same computer or a different one, for internet access? Are your end users confused about when to use one information source rather than another? Do results from a web search and from a fee-based premium information source look totally different? When researching a subject, can you imagine being able to do so in a single search that includes subscription databases, intranet search engines, and electronic publications, instead of doing multiple searches across different sources and deleting duplicates?

What is federated search?

Federated search technology enables users to search multiple information resources simultaneously through one search query. Users can then view search results in a single integrated list. In other words, users no longer need to consult each information resource individually. Instead, they can search multiple library catalogs (OPACs), web sites, and subscription and citation databases all at once.

Federated search technology is an integral component of an information portal, which provides the interface to diverse information resources. Once the user enters a search query in the search box of the information portal, the system uses federated search technology to send the search string to each resource that is incorporated into the portal.
The individual information resources then send the information portal a list of results from the search query. Users can view the number of documents retrieved in each resource and link directly to each search result. Federated search also helps researchers avoid outdated articles and spam, allowing them to explore only the most pertinent information. In addition, federated search makes it possible to search private or other collections that cannot be indexed (this is more common than you might imagine).

In the library space, federated search evolved from the "broadcast search", which involved the Z39.50 protocol. As libraries moved beyond virtual online catalogues, federated search gave them the ability to include subscription databases, the internet, and virtually anything in the electronic arena, via authentication. Meta search, federated search, cross-searching of databases, parallel search, single search, and broadcast search are all terms that describe the current trend of offering simultaneous searching of multiple e-resources.

Meta search

Meta search offers simple and advanced search options. When a user submits a query to a meta search system, the system broadcasts it to heterogeneous information resources simultaneously. Even where Z39.50 compatibility exists, the meta search system must adapt the query so that each database's search engine can produce appropriate answers to it. The algorithm displays the best results first. The process has two stages. It delivers the query and obtains the number of hits, along with a reference to the result list.

Dialog, LexisNexis, and Ovid are some of the database providers that have offered cross-database searching within their own collections for some time now. The Z39.50 protocol, or standard, was established in 1988 in order to offer a similar solution across library catalogues.
Because Z39.50 predates the web, the industry finds it difficult to use, and it does not adapt well to the web protocols now available. Not all resources can be set up for meta searching. Some use the Z39.50 protocol, others HTTP, and some XML; yet others leave it to the meta search vendor to determine the methodology. Meta searching can be slow because of IP validation and filtering through a proxy server across resources hosted both in-house and on external servers.

Approaches to federated search

There are two distinct approaches to federated search, which can be labeled index-time merging and query-time merging.

Query-time (search-time) merging

In most circumstances, this is the faster and easier solution to implement. A query federator intercepts the query and passes it to multiple search engines. The federator then waits for replies from the search engines and, when they are received, merges or concatenates the results into a results list. This model relies on the data repositories to provide a search function.

Pros:

The primary advantage of this approach is ease of implementation, because no additional indexing of content is necessary. The query federation system simply taps into existing systems and extracts results, which are then merged. In some cases, query-based federation is the only viable option. For example:

- Federating to large-scale web content via a major search engine such as Google
- Federating to a private data set that is held behind a paywall and therefore not available to be indexed locally

Cons:

- Performance issues can occur if the federator waits for the slowest remote search engine to respond.
- Merging search results into a sensible hit list is difficult if it is based on relevancy, because each search engine called will score relevancy in a different way.
  Often, it is better not to attempt to merge on relevancy but instead to present separate results lists (behind tabs, for example), to merge on a more deterministic data item such as date, location, or price, or to present results from different sources in blocks.
- Search engines provide varying levels of query sophistication. Federation at query time usually implies a "dumbing down" to suit the least capable search engine; however, this need not always be the case. For example, sophisticated query parsers can be used to ensure that search clues are optimized for each search engine involved.
- Document-level security is a potential cause of performance issues, but this depends on the complexity of the security environment.

Index-time merging

This approach requires content to be acquired into a central index, and it is typical of traditional enterprise search systems.

Pros:

Most search engines default to ranking by relevancy, which is what most users expect. Once all data has been acquired into a central index, sophisticated query enhancement and relevancy algorithms can be applied, providing the user with excellent search results.

Cons:

The effort needed to acquire the content from the various repositories can be substantial. Acquisition is done via read-only processes. The content of remote repositories is not moved or changed, but the indexing process must read each item and re-read it every time a change occurs. In some cases, for example where private content behind a paywall is involved, this is not possible.

Hybrid federated search

Sometimes the optimum solution is a hybrid approach. Where practical, content is indexed centrally. Repositories for which central indexing is not cost-effective (or simply not possible) are federated at query time. If this approach is used, careful thought must be given to results presentation, to make sure that users understand how the system is set up and how to navigate and interpret results efficiently.

Which approach works best?

The approach that works best depends on your data environment and your user needs. Start by looking at the data environment, user requirements, and business drivers, so that informed decisions can be taken. In our engagements, this process usually begins with a search assessment.

Practical application of federated search tools

Some federated search applications include:

- mednar.com - searches medical information sources.
- biznar.com - searches business-related sources.
- worldwidescience.org - searches science content from all over the world, from government agencies as well as other quality research and academic organizations.
- http://search.smartlib-bibliogen.ca/zengine?vdxaction=zsearchsimple - searches Capital Smart Library, a consortium of libraries.
- http://osulibrary.oregonstate.edu/metafind/about.html - searches Oregon State University's library.
- http://scienceroll.polymeta.com/search/ui7/searchfr.jsp?un=scienceroll - searches a medical student's journey inside genetics and medicine through web 2.0.
- http://lifesearch.indexdata.dk/# - searches the library of the University of Copenhagen's Faculty of Life Sciences.
- scitopia.org - searches the digital libraries of leading science and technology societies.
- http://www.techxtra.ac.uk - searches 31 different collections relevant to engineering, mathematics, and computing, including content from over 50 publishers and providers.

Federated search tools options

VuFind

VuFind is a library resource portal designed and developed for libraries by libraries. The goal of VuFind is to enable your users to search and browse through all of your library's resources by replacing the traditional OPAC with a portal that includes:

a) Catalog records
b) Locally cached journals
c) Digital library items
d) Institutional repository
e) Institutional bibliography
f) Other library collections and resources

VuFind is completely modular, so you can implement just the basic system or all of the components.
And since it is open source, you can modify the modules to best fit your needs, or you can add new modules to extend your resource offerings.

VuFind runs on Solr. Apache Solr, an open source search engine, offers the performance and scalability that allow VuFind to respond to search queries in milliseconds. It can be distributed if you need to spread the load of the catalog over many servers or across a server farm.

VuFind is offered for free through the GPL open source license. This means that you can use the software for free. You can also modify the software and share your successes with the community.

Features of VuFind

- Search with faceted results: The user can search from a basic search box and then narrow down the results by clicking on the various facets of the results.
- Browse for resources: The user can browse the catalog and explore what the library has, rather than only seeing a very narrow spectrum of results.
- Author biographies: The user can learn more about an author through contextual information and see all of the author's books that the library holds.
- Persistent URLs: The user can bookmark queries or records for permanent access to a page visited earlier.
- Zotero compatible: Your users can save and tag any record with Zotero or any other COinS-based application, so that they can store their records in one place.
- Resource suggestions: When viewing a record, the user is offered suggestions of resources that are similar to the current one.

Blacklight

Blacklight is an open source Ruby on Rails gem that provides a discovery interface for any Solr index. Blacklight provides a default user interface that is customizable via the standard Rails mechanisms. Blacklight accommodates heterogeneous data, allowing different information displays for different types of objects.
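
Both VuFind and Blacklight sit on top of a Solr index, and the faceted narrowing described above maps onto Solr's standard request parameters (`q`, `fq`, `facet`, `facet.field`). The following Python sketch only builds the query URL, to show what such a request looks like. The function name, the Solr address, the core name `catalog`, and the field names are hypothetical and would differ in a real installation.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, selected=None):
    """Return a Solr select URL that requests facet counts and applies chosen facets."""
    params = [
        ("q", text),        # the user's search terms
        ("wt", "json"),     # ask Solr for a JSON response
        ("facet", "true"),  # switch faceting on
    ]
    for field in facet_fields:
        # One facet.field parameter per field whose value counts should be returned.
        params.append(("facet.field", field))
    for field, value in (selected or {}).items():
        # Each facet the user has clicked becomes a filter query (fq).
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    url = build_facet_query(
        "http://localhost:8983/solr/catalog",  # hypothetical Solr core
        "climate change",
        ["format", "language"],                # hypothetical field names
        selected={"format": "Book"},
    )
    print(url)
```

Clicking a facet in the interface amounts to adding one more `fq` filter and re-running the same query, which is why narrowing results feels immediate to the user. The sketch does not escape quotation marks inside facet values, which a production client would need to do.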
Blacklight uses Apache Solr, an enterprise-scale index, as its search engine.

Features of Blacklight:

- Faceted browsing
- Relevance-based searching (with the ability to locally control the relevancy algorithms)
- Bookmarkable items
- Permanent URLs for every item
- User tagging of items

Blacklight is released under the Apache License 2.0, so it is open source, free to use, and can be customized to your needs. Given that it is open source and offers faceted browsing, relevance-based searching, and bookmarkable items, Blacklight is of great benefit to libraries and a major tool for access to e-resources.

SubjectsPlus

SubjectsPlus was originally developed at the Joyner Library at East Carolina University. It became abandonware and, with permission, an expanded version of the original software was released as open source, which it remains. Development was then undertaken at the Ithaca College Library and is now carried on at the University of Miami Libraries.

Features:

- Create guides: Create unlimited research guides via a drag-and-drop interface.
- Staff list: Sorted A-Z, by department, or by librarian.
- Database list: A-Z, by format, or by subject.
- Responsive design: Looks better on tablets and mobile devices.
- Suggestion box: An easy way to display and respond to patron comments; now multi-site.
- Video management: Ingest video metadata from YouTube or Vimeo, then organize and display it in one place on your site.
- Customizable: You have complete control; add your own headers and footers, tweak the layout and CSS, and add data via the API.
- Multilingual: French, Spanish, and Russian versions included; new translations coming.

SubjectsPlus is offered for free through the GPL open source license. This means that you can use the software for free. You can also modify the software and share your successes with the community.
It integrates easily with library information systems, which makes it a valuable federated tool for better access to e-resources in libraries.

Google CSE

Google Custom Search is a platform provided by Google that allows web developers to feature specialized information in web searches, refine and categorize queries, and create customized search engines based on Google Search. A Google custom search engine (CSE) lets its creator select which websites will be searched, which helps to eliminate unwanted websites or information. Google CSE users can also attach their custom search engine to any blog or web page.

How it works:

1. Create a Gmail account or use an existing one.
2. Log in to Google CSE and add the name of the CSE.
3. Add the sites you want to search, e.g., the URL of the online OPAC, e-resources, etc. Add the links carefully, i.e., use the original or home link.
4. Once the links are added, click the "Get code" option.
5. Copy the code that is provided and add it to your website, where users can then search.

Google CSE is free to use. It is the easiest of these tools to install, yet it is very effective and reliable, and it can be customized to your own liking. It can be a big step forward for libraries, because users can search for e-materials from one central position across many databases, repositories, and other sites. Libraries should adopt and implement it.

Practical: how to implement Google CSE

The benefits of federated search

The essential benefits of federated search to its users include efficiency, quality of search results, and current, relevant content.

Efficiency, time savings

Using a federated search engine can be a huge time saver for researchers. Instead of needing to search many sources, one at a time, the federated search engine performs the many searches on the user's behalf.
While federated search engines specialize in finding content that requires form submissions to retrieve, that is not the only criterion for being a federated search engine. A federated search engine also associates content from different sources. Federated search uses just one search form to cover numerous sources and combines the results into a single results page.

Quality of results

Federated search engines show their value best in environments in which the quality of results matters, such as libraries, corporate research environments, and governments. A major difference between a federated search engine and a standard search engine like Google is that the client who contracts for the federated search service selects the sources to search. In almost every case, the sources will be authoritative. Google, on the other hand, has very minimal criteria for source selection. If a web page does not look like outright junk, Google will present it among the search results. Thus, the federated search engine acts as a helpful librarian does, directing users to high-quality sources.

Most current content

In addition to filling out forms and combining documents from multiple sources, another important benefit of federated search engines is that they search content in real time. Real-time data is crucial for researchers who are searching for up-to-the-minute content or for content that changes frequently. As soon as the content owner updates their source, the information is available to the searcher on the very next query. By contrast, with standard search engines such as Google, the results are only as current as the last time the engine crawled the sites whose content matches your search words. Content you find via Google might be days or weeks old, which can be fine depending on your situation but can be problematic if you want the most current information.
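
The benefits above follow from the way a federated search engine sends one query to several sources at the moment the user searches. A minimal Python sketch of that fan-out is shown below. It makes several assumptions: the two source functions are hypothetical stand-ins for real connectors (an OPAC and a subscription database), and the time limit reflects the earlier point that the federator should not wait indefinitely for the slowest source. It requires Python 3.9 or later.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def search_catalogue(query):
    # Hypothetical stand-in for a live call to a library catalogue (OPAC).
    return [{"title": f"{query}: an introduction", "source": "catalogue"}]


def search_journals(query):
    # Hypothetical stand-in for a live call to a subscription database.
    return [{"title": f"Advances in {query}", "source": "journals"}]


def federated_search(query, sources, timeout=5.0):
    """Send one query to every source in parallel and group the results by source."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    try:
        futures = {pool.submit(fn, query): name for name, fn in sources.items()}
        # Collect whatever has finished within the time limit; slow sources are skipped.
        done, _not_done = wait(futures, timeout=timeout)
        for future in done:
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []  # a failing source should not break the whole search
    finally:
        pool.shutdown(wait=False, cancel_futures=True)
    return results


if __name__ == "__main__":
    sources = {"catalogue": search_catalogue, "journals": search_journals}
    for name, hits in federated_search("solar energy", sources).items():
        print(name, [hit["title"] for hit in hits])
```

Results are kept in one list per source rather than merged by relevance, since each source scores relevance differently. Because every call goes to the live source, the user sees whatever the content owner has most recently published.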

Marketing opportunities

If resistance is low and libraries embrace federated search technology, this could put the marketing of library services in a whole new light. Because these systems can be accessed remotely, yet are simple and dynamic, this is an opportunity to expand the library's reach and service, making it the "digital one-stop service to users." With database acquisition decisions already being made by the library staff behind the scenes, users have few decisions to make on their end. For the average end user, the less decision making, the better. Google, for the general public, sets the gold standard for returning relevant results. Federated search offers another opportunity for libraries to out-Google Google, this time by returning relevant results that Google misses. When the appropriate databases are chosen in advance for the end user, there is a higher likelihood of relevant results.

Challenges in federated searches

Nomenclature confusion

The use of multiple names to describe the same thing plagues the information industry. Federated search is no exception. NISO, the U.S. National Information Standards Organization, and many libraries refer to federated searching as meta-searching. However, vendors in this space prefer not to be known as meta-search engines, as this conjures up thoughts of searching only previously crawled databases such as Google.

Access issues with federated search

Verification, authentication, and certification can be difficult for the federated search vendor. Since federated search engines do not hold the data locally, meaning that the engines perform the search and send the results back, the federated search engine must be able to access multiple password-protected databases behind the scenes, all at one time, and show users their results in one easy-to-read interface. The challenge for federated search vendors is to ensure that only licensed users can access databases in an appropriate manner, as specified by their license.
This may require a library or a corporation to set up multiple areas where only certain licensed users can access a federated search.

Interface issues with federated search

For several years now, libraries and corporate information centers have faced the "Google phenomenon." Many patrons believe that doing a Google search covers all the bases. Libraries now have an excellent opportunity to provide a simple yet powerful interface that out-Googles Google. They can set up their interface based on subject and sources, or customize it to specific user needs. Libraries and corporations need to take note of Google's simple interface: users expect an interface as streamlined as Google's. Uncomplicated and intuitive interfaces without a steep learning curve will see expanded usage. Most federated search vendors allow clients to create their own "look and feel" for the search interface and results pages. However, if you do not have the staff resources, they will often offer a more static look that requires little decision making on your part.

Removing duplicates

De-duplication of results seems to be controversial in the federated search space. The gist of it is that most federated search engines will de-duplicate the results you have on your current results screen. Some federated search engines will even de-duplicate all results when requested. However, this opens up a Pandora's box about how the results are returned. Anyone familiar with search engine optimization understands that audiences will usually only view the first 10 hits. How do the vendors and interface designers ensure that the highest-quality hits are returned first? Would their algorithms rank proprietary databases higher in the relevancy results?
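
To make the de-duplication question concrete, here is a minimal Python sketch of one possible approach. It is an assumption rather than any vendor's method: records are treated as duplicates when they share a DOI or, failing that, a normalized title. The field names and sample records, including the DOI, are hypothetical.

```python
import re


def record_key(record):
    """Prefer a DOI as the identity of a record; fall back to a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Lower-case the title and collapse punctuation and spacing differences.
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title)


def deduplicate(records):
    """Keep the first occurrence of each record, preserving the incoming order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    hits = [
        {"title": "Open Access: A Primer", "doi": "10.1000/XYZ", "source": "A"},
        {"title": "Open access - a primer", "doi": "10.1000/xyz", "source": "B"},
        {"title": "Federated Search!", "source": "A"},
        {"title": "federated search", "source": "C"},
    ]
    for hit in deduplicate(hits):
        print(hit["source"], hit["title"])
```

Because the first copy of each record wins, the order in which sources are listed decides whose copy the user sees. That is exactly where the ranking question raised above comes in. A record that carries a DOI in one source but not in another would also slip through this simple scheme.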

Maintenance

Finally, after spending thousands of dollars of your library's budget on this magic tool, you, the librarian, will still have to set most of it up yourself, which can be complicated and time-consuming. Additionally, when entering the databases you subscribe to into the federated search engine (FSE), you may find that your FSE is not compatible with all of the databases to which your library subscribes.

Questions/Feedback