TECHNICAL ASPECTS OF
MANAGING E-RESOURCES
FACILITATOR’S HANDBOOK
Authors:
Table of Contents
E-RESOURCES AUTHENTICATION
Authentication tools and controls
Hardware tokens
Software tokens
Digital certificates on smart cards and USB tokens
Challenge response
Biometric authentication
Out-of-band authentication
IP authentication
Proxy servers and EZproxy
Examples of authentication tools
OpenVPN Technologies
JASIG
SimpleSAMLphp
Shibboleth
Concept
How it works
FEDERATED SEARCH TOOLS
Introduction
What is federated search?
Meta search
Approaches to federated search
Search-time merging
Index-time merging
Hybrid federated search
Which approach works best?
Practical application of federated search tools
Federated search tools options
VuFind
Blacklight
SubjectsPlus
Google CSE
The benefits of federated search
Efficiency, time savings
Quality of results
Most current content
Marketing opportunities
Challenges in federated searches
Nomenclature confusion
Access issues with federated search
Interface issues with federated search
Removing duplicates
Maintenance
E-RESOURCES AUTHENTICATION
In many organizations, the function of verifying a user's identity, known as
authentication, is important in establishing trust in critical business processes. In its
simplest form, authentication is the act of verifying a person's claim to an identity, and
it is usually implemented through a username and password combination when
logging into an IT system or application. As this definition suggests, part of the
authentication process consists of correctly identifying a user, application, or group.
Users can assert their identity in several ways, such as typing a username and
password. In fact, authentication rests on the principle that without a proper form of
identification, a system cannot correlate an authentication factor with a specific
subject.
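To make the username-and-password case concrete, here is a minimal Python sketch of how a system might first identify the user by name and then verify the password factor without ever storing the password itself. It assumes salted PBKDF2-SHA256 hashing from the standard library; the in-memory `USERS` store and the iteration count are illustrative assumptions, not a recommendation for any particular product.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple

ITERATIONS = 100_000  # assumed PBKDF2 work factor; tune for your own hardware

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); only these are stored, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

# Hypothetical user store: username -> (salt, derived_key)
USERS: Dict[str, Tuple[bytes, bytes]] = {"alice": hash_password("correct horse")}

def authenticate(username: str, password: str) -> bool:
    """Identify the user by name, then verify the password factor."""
    record = USERS.get(username)
    if record is None:
        return False  # unknown identity: nothing to correlate the factor with
    salt, stored_key = record
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_key)  # constant-time comparison
```

The early return for an unknown username mirrors the point above: without an identity to look up, there is no stored factor to check the password against.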
Authentication tools and controls
As mentioned earlier, organizations can use authentication tools other than
usernames and passwords. The main authentication tools or controls that auditors can
recommend are the following:
Hardware tokens
These devices display a pseudo-random number that changes at a fixed interval (for
example, every 60 seconds) and is synchronized with the authenticating system. Users
simply type the number currently displayed on the token whenever they need to log in.
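Many tokens of this kind follow the time-based one-time password (TOTP) scheme of RFC 6238, in which the token and the server each derive the code from a shared secret and the current time, so they stay in step without communicating. The sketch below assumes TOTP with HMAC-SHA1 and a 60-second step; proprietary tokens may use other algorithms. The `window` parameter, which tolerates a little clock drift between token and server, is an illustrative assumption.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for one counter value."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # "dynamic truncation" picks 4 bytes of the digest
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: float, step: int = 60, digits: int = 6) -> str:
    """Time-based code (RFC 6238): the counter is the number of elapsed `step`-second intervals."""
    return hotp(secret, int(unix_time) // step, digits)

def server_accepts(secret: bytes, submitted: str, now: Optional[float] = None,
                   step: int = 60, digits: int = 6, window: int = 1) -> bool:
    """Accept the code for the current interval, or up to `window` intervals either side."""
    now = time.time() if now is None else now
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step, digits), submitted)
        for k in range(-window, window + 1)
    )
```

The token runs only `totp`; the authenticating system runs `server_accepts` with its own copy of the secret, which is what "synchronized with the authenticating system" amounts to in this scheme.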
Software tokens
These are software programs, installed on the computer's hard drive or on another
device such as removable storage media, a personal digital assistant, or a compact disc,
that generate a unique string of characters which the authenticating system recognizes.
Digital certificates on smart cards and USB tokens
These unique certificates are issued by a third-party certificate authority (CA) or by the
operating system, and they let users confirm that they are communicating with the right
person or device. Digital certificates contain specific identifying information and follow
an international standard, X.509.
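On the server side, a certificate held on a smart card or USB token is typically presented during a TLS handshake in which the server requires a client certificate (often called mutual TLS). The Python sketch below only configures such a server context with the standard `ssl` module; the CA file path is a hypothetical placeholder, and wiring the context to a real socket (and, on the client, to the card's middleware) is left out.

```python
import ssl
from typing import Optional

def client_cert_server_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS server context that refuses clients without a valid X.509 certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the client must present a certificate
    if ca_file is not None:
        # Only certificates chaining to this (hypothetical) institutional CA are accepted.
        ctx.load_verify_locations(cafile=ca_file)
    # The server's own certificate would be loaded with ctx.load_cert_chain(certfile, keyfile).
    return ctx
```

Requiring a certificate signed by a trusted CA is what lets the server treat possession of the card or token as proof of identity.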
Challenge response
This method consists of a question-and-answer dialog in which the user responds to a
set of pre-recorded questions, such as their mother's maiden name, or uses a token
device that generates passwords or responses based on a predetermined algorithm.
When a token device is used, the authentication system displays a challenge in the
form of a code or passphrase. The user enters the challenge into the token device,
which produces a response code or passphrase that the user then enters into the
system for authentication.
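The token-device variant can be sketched as follows, assuming the token and the server share a secret key and that the "predetermined algorithm" is HMAC-SHA256 truncated to eight hexadecimal characters; both choices are illustrative assumptions rather than how any specific product works.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> str:
    """Server side: issue a fresh random challenge for this login attempt."""
    return secrets.token_hex(4)  # 8 hex characters displayed to the user

def token_response(shared_key: bytes, challenge: str) -> str:
    """Token side: derive the response the user types back into the system."""
    mac = hmac.new(shared_key, challenge.encode("ascii"), hashlib.sha256).hexdigest()
    return mac[:8]

def verify_response(shared_key: bytes, challenge: str, response: str) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    return hmac.compare_digest(token_response(shared_key, challenge), response)
```

Because the server issues a new random challenge for every attempt, a response captured earlier is very unlikely to be valid again.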
Biometric authentication
This is the use of technologies that measure and analyze a person's physical and
behavioral characteristics (e.g., fingerprints, eye retinas and irises, facial patterns, and
hand measurements) to authenticate the individual to a system.
Out-of-band authentication
Under this method, the authentication system accepts the person's credentials and then
sends a secret password to the user through a separate (out-of-band) channel, such as
email, SMS (short message service), or a phone call. The password is valid for a single
use only.
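The issue-and-redeem cycle of an out-of-band password can be sketched in Python. The `send` callback stands in for a hypothetical SMS or email gateway, and the five-minute validity period is an assumption; the point illustrated is that each code is random, stored only as a hash, expires, and can be tried only once.

```python
import hashlib
import hmac
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

CODE_TTL_SECONDS = 300  # assumed validity period for a code

def _digest(code: str) -> str:
    return hashlib.sha256(code.encode("utf-8")).hexdigest()

class OutOfBandVerifier:
    """Sends a single-use code over a separate channel and checks it once."""

    def __init__(self, send: Callable[[str, str], None]):
        self._send = send  # hypothetical delivery hook, e.g. an SMS or email gateway
        self._pending: Dict[str, Tuple[str, float]] = {}

    def start(self, user: str, now: Optional[float] = None) -> None:
        """Call after the primary credentials have been accepted."""
        now = time.time() if now is None else now
        code = f"{secrets.randbelow(10 ** 6):06d}"
        self._pending[user] = (_digest(code), now + CODE_TTL_SECONDS)  # store a hash only
        self._send(user, code)

    def finish(self, user: str, code: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._pending.pop(user, None)  # removed on the first attempt: one-time use
        if entry is None:
            return False
        digest, expires = entry
        return now <= expires and hmac.compare_digest(digest, _digest(code))
```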
IP authentication
Publishers use an organization's outward-facing IP addresses to identify users coming
from a subscribing institution and, in turn, to grant access to the subscribed resources.
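A publisher-side check of this kind can be sketched with Python's standard `ipaddress` module. The institution names are hypothetical and the address ranges are reserved documentation ranges, used only for illustration.

```python
import ipaddress
from typing import Optional

# Hypothetical registry: subscribing institution -> its outward-facing IP ranges.
SUBSCRIBER_RANGES = {
    "Example University": [ipaddress.ip_network("192.0.2.0/24")],
    "Example College": [ipaddress.ip_network("198.51.100.16/28")],
}

def institution_for(client_ip: str) -> Optional[str]:
    """Return the subscribing institution whose ranges contain client_ip, if any."""
    addr = ipaddress.ip_address(client_ip)
    for institution, networks in SUBSCRIBER_RANGES.items():
        if any(addr in net for net in networks):
            return institution
    return None  # not a recognized subscriber address: deny or ask for another login
```

Because the check depends only on the source address, a user working from a home or mobile network is not recognized, even if they belong to a subscribing institution.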
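Proxy servers and EZproxy
Many libraries use proxy servers as a tool to help authenticate off-site users who
cannot be authenticated by the institution's IP address. After the user logs in to the
proxy, it retrieves the resource on the user's behalf from an address inside the
institution's range, so the publisher sees a recognized IP address.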
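Libraries commonly expose proxied resources through "starting point" links that prefix the resource URL with the proxy's login address, in the form `https://<proxy host>/login?url=<resource URL>`. The sketch below builds such links; the proxy host name is a hypothetical placeholder, and the exact prefix should be taken from your own proxy's configuration.

```python
from urllib.parse import urlparse

PROXY_LOGIN_PREFIX = "https://ezproxy.example.edu/login?url="  # hypothetical proxy host

def proxied_link(resource_url: str) -> str:
    """Route an off-site user to a subscribed resource through the library proxy."""
    parsed = urlparse(resource_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute web URL: {resource_url!r}")
    # The target URL is appended as-is after "url=", the usual starting-point form.
    return PROXY_LOGIN_PREFIX + resource_url
```

A library would typically place such links in its catalogue or A-Z database list, so that off-campus users pass through the proxy automatically.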
Examples of authentication tools
OpenVPN Technologies
• OpenVPN Technologies is a privately held company based in Pleasanton, California,
that integrates a suite of leading-edge networking and software technologies.
• OpenVPN Technologies has designed and deployed virtual network software that
provides secure, reliable, and scalable communication services, fulfilling the
requirements of the traditional virtual private network (VPN) market. With OpenVPN,
you can:
  - tunnel any IP subnetwork or virtual Ethernet adapter over a single UDP or TCP port;
  - configure a scalable, load-balanced VPN server farm, using one or more machines,
    that can handle thousands of dynamic connections from incoming VPN clients;
  - use all of the encryption, authentication, and certification features of the
    OpenSSL library to protect your private network traffic as it transits the internet;
  - create secure Ethernet bridges using virtual TAP devices; and
  - control OpenVPN using a GUI on Windows or Mac OS X.
• OpenVPN can serve as a gateway for remote access to e-resources. By embracing and
implementing it, libraries can make their repositories, digital libraries, journals,
e-books, and other electronic materials available to users away from the library,
bringing information closer to those users.
JASIG
• This is the Central Authentication Service project, more commonly referred to as
CAS. CAS is an authentication system created by Yale University to provide a
trusted way for an application to authenticate a user. It became a JASIG project
in December 2004.
• CAS provides:
  - an open and well-documented protocol;
  - an open-source Java server component;
  - a library of clients for Java, .NET, PHP, Perl, Apache, uPortal, and others;
  - integration with uPortal, Bluesocket, TikiWiki, Mule, Liferay, Moodle, and others;
  - community documentation and implementation support; and
  - an extensive community of adopters and users.
• The JASIG community is committed to increasing openness in higher education
and beyond: all versions of the software are freely and openly available, and users
have access to the entire community surrounding the software.
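To show what the CAS protocol involves for an application, here is a minimal sketch of its two key steps: redirecting the user to the CAS login page with a `service` parameter, and, once CAS sends the user back with a service ticket, validating that ticket at `/serviceValidate` and reading the username from the XML reply. The server URL is a hypothetical placeholder, and the HTTP call itself is omitted so that only URL building and response parsing are shown; a production application would normally use one of the CAS client libraries listed above.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

CAS_BASE = "https://cas.example.edu/cas"  # hypothetical CAS server
CAS_NS = {"cas": "http://www.yale.edu/tp/cas"}

def login_url(service: str) -> str:
    """Where the application redirects a user who has no session yet."""
    return f"{CAS_BASE}/login?{urlencode({'service': service})}"

def validate_url(service: str, ticket: str) -> str:
    """Back-channel URL the application calls to check the service ticket."""
    return f"{CAS_BASE}/serviceValidate?{urlencode({'service': service, 'ticket': ticket})}"

def user_from_response(xml_text: str) -> Optional[str]:
    """Return the authenticated username, or None if CAS reported a failure."""
    root = ET.fromstring(xml_text)
    user = root.find("cas:authenticationSuccess/cas:user", CAS_NS)
    return user.text.strip() if user is not None and user.text else None
```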
SimpleSAMLphp
SimpleSAMLphp is an application written in native PHP that deals with authentication.
The project is led by UNINETT and has a large user base, a helpful user community,
and a large set of external contributors.
It includes the following authentication modules:
• Simple LDAP
• Multiple LDAP
• CAS remote authentication, which connects authentication to your existing CAS
service and then retrieves attributes from LDAP
• RADIUS authentication, which checks credentials against a RADIUS server
• SQL authentication
• OpenID
Shibboleth
Shibboleth (http://shibboleth.internet2.edu/about.html) is an open-source software
package that allows an individual to use a single username and password to access
multiple online resources to which their institution subscribes.
• Shibboleth is a widely deployed federated identity solution that connects users to
applications both within and between organizations.
• It is an open-source project that provides single sign-on capabilities and allows
sites to make informed authorization decisions for individual access to protected
online resources in a privacy-preserving manner.
Concept
• It works in the same way as other web-based single sign-on (SSO) systems.
• What distinguishes it is its adherence to standards, its ability to provide SSO support
to services outside a user's organization, and its protection of user privacy.
How it works
• Step 1: user accesses the resource
The user starts by attempting to access the protected resource. The resource
monitor determines whether the user has an active session and, discovering that
they do not, directs them to the service provider to start the SSO process.
• Step 2: service provider issues authentication request
The user arrives at the service provider, which prepares an authentication request
and sends it, along with the user, to the identity provider. The service provider
software is generally installed on the same server as the resource.
• Step 3: user authenticated at identity provider
When the user arrives at the identity provider, it checks whether the user has an
existing session. If they do, they proceed to the next step. If not, the identity
provider authenticates them (e.g., by prompting for, and checking, a username
and password), and the user then proceeds to the next step.
• Step 4: identity provider issues authentication response
After identifying the user, the identity provider prepares an authentication
response and sends it, along with the user, back to the service provider.
• Step 5: service provider checks authentication response
When the user arrives with the response from the identity provider, the service
provider validates the response, creates a session for the user, and makes some
information retrieved from the response (e.g., the user's identifier) available to the
protected resource. The user is then sent to the resource.
• Step 6: resource returns content
As in step 1, the user again tries to access the protected resource, but this time
the user has a session and the resource knows who they are. With this
information, the resource services the user's request and sends back the
requested data.
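The six steps can be traced in a toy Python simulation. It is a sketch under heavy simplifying assumptions: real Shibboleth deployments exchange signed SAML XML messages through browser redirects, whereas here the identity provider "signs" a small JSON assertion with an HMAC key that the service provider trusts, and sessions are plain dictionaries keyed by a browser identifier. All class, function, and credential names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional

SHARED_KEY = b"demo-idp-sp-key"  # stand-in for the IdP signing key the SP trusts

def _sign(body: str) -> str:
    return hmac.new(SHARED_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()

class IdentityProvider:
    def __init__(self, passwords: Dict[str, str]):
        self.passwords = passwords
        self.sessions: Dict[str, str] = {}  # browser id -> user

    def authenticate(self, browser: str, user: str, password: str) -> Optional[str]:
        # Step 3: reuse this browser's IdP session, otherwise check the credentials.
        if browser in self.sessions:
            user = self.sessions[browser]
        elif password and self.passwords.get(user) == password:
            self.sessions[browser] = user
        else:
            return None
        # Step 4: issue an authentication response (JSON body plus signature).
        body = json.dumps({"user": user})
        return body + "|" + _sign(body)

class ServiceProvider:
    def __init__(self):
        self.sessions: Dict[str, str] = {}  # browser id -> user identifier

    def consume(self, browser: str, response: str) -> bool:
        # Step 5: validate the response, then create a local session.
        body, _, signature = response.rpartition("|")
        if not hmac.compare_digest(signature, _sign(body)):
            return False
        self.sessions[browser] = json.loads(body)["user"]
        return True

def access_resource(sp: ServiceProvider, idp: IdentityProvider,
                    browser: str, user: str = "", password: str = "") -> str:
    # Step 1: the resource finds no session for this browser, so SSO starts.
    if browser not in sp.sessions:
        # Step 2: the SP sends the user, with an authentication request, to the IdP.
        response = idp.authenticate(browser, user, password)
        if response is None or not sp.consume(browser, response):
            return "access denied"
    # Step 6: the resource knows who the user is and returns content.
    return f"content for {sp.sessions[browser]}"
```

Once the user has logged in for one service provider, a second service provider gets a valid response from the identity provider without asking for the password again, which is the single sign-on behaviour described above.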
FEDERATED SEARCH TOOLS
Introduction
• Does information in your organization reside in 'silos'?
• Do you and your users have to remember multiple passwords?
• Do you send your patrons to the OPAC terminal to find audio-visual materials, texts,
and journals held in-house, then forward them to yet another computer to find online
e-journals, and perhaps to a third link, on the same computer or a different one, for
internet access?
• Are your end users unsure how one information source compares with another?
• Do results from a web search and a fee-based premium information source look
totally different?
• When researching a subject, can you imagine doing so in a single search, covering
subscription databases, intranet search engines, and electronic publications, instead
of running multiple searches across different sources and deleting duplicates?
What is federated search?
Federated search technology enables users to search multiple information resources
simultaneously through one search query. Users can then view the search results in a
single integrated list. In other words, users no longer need to consult each
information resource individually; instead, they can search multiple library catalogs
(OPACs), web sites, and subscription and citation databases all at once.
Federated search technology is an integral component of an information portal, which
provides the interface to diverse information resources. Once the user enters a search
query in the portal's search box, the system uses federated search technology to send
the search string to each resource incorporated into the portal. Each resource then
returns a list of results for the query to the portal. Users can see the number of
documents retrieved from each resource and link directly to each search result.
Because the sources are selected in advance, federated search helps researchers
avoid outdated articles and spam and focus on the most pertinent information.
Federated search also allows private collections, and others that cannot be indexed,
to be searched (this is more common than you might imagine).
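The broadcast-and-collect step described above can be sketched in Python with a thread pool: the same query goes to every connected resource at once, and the portal reports how many results each resource returned. The two source functions and their records are hypothetical stand-ins for real connectors (Z39.50, HTTP APIs, and so on).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Source = Callable[[str], List[str]]  # a connector: query -> list of result titles

def catalog_source(query: str) -> List[str]:
    # Hypothetical stand-in for a library catalogue (OPAC) connector.
    records = ["Soil science basics", "Soil fertility handbook", "Crop science"]
    return [r for r in records if query.lower() in r.lower()]

def journal_source(query: str) -> List[str]:
    # Hypothetical stand-in for a subscription journal database connector.
    records = ["Journal of Soil Research, vol. 12", "Plant Physiology Today"]
    return [r for r in records if query.lower() in r.lower()]

def federated_search(query: str, sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Send one query to every source concurrently; keep results grouped by source."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        return {name: f.result() for name, f in futures.items()}

results = federated_search("soil", {"Catalogue": catalog_source, "Journals": journal_source})
for name, hits in results.items():
    print(f"{name}: {len(hits)} result(s)")
```

Keeping results grouped by source makes the per-resource counts trivial to show; merging them into one list is a separate design decision, discussed under the approaches to federated search.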
In the library space, federated search evolved from 'broadcast search', which used the
Z39.50 protocol. As libraries moved beyond online catalogues, it became possible to
include subscription databases, the internet, and virtually anything in the electronic
arena via authentication.
Meta search, federated search, cross-searching of databases, parallel search, single
search and broadcast search are terms that describe the current trend of offering
simultaneous searching of multiple e-resources.
Meta search
Meta search offers simple and advanced search options. When a user submits a query
to a meta search system, the system broadcasts it to heterogeneous information
resources simultaneously. Even where the Z39.50 protocol is supported, the meta
search system must adapt the query for each database's search engine so that it
produces appropriate answers. The system's algorithm displays the best results first.
The process has two stages: the system delivers the query, and it then obtains the
number of hits along with a reference to the result list. Dialog, LexisNexis, and Ovid are
among the database providers that have offered cross-database searching within their
own collections for some time. The Z39.50 protocol (standard) was established in 1988
to offer a similar solution across library catalogues. Because Z39.50 predates the web,
it is difficult for the industry to use and is not flexible enough for the web protocols now
available. Not all resources can be set up for meta searching: some use Z39.50, others
the web's HTTP protocol, some XML, and yet others leave it to the meta search vendor
to determine the methodology. Meta searching can be slow because of IP validation
and filtering through a proxy server, across both in-house resources and external
servers.
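The per-database adjustment mentioned above can be illustrated with a small query translator that rewrites one user query into each target's own syntax. The three target syntaxes below are hypothetical examples; real connectors must follow each database's documented query language.

```python
from typing import Callable, Dict, List

def tokens(query: str) -> List[str]:
    return query.split()

def to_fielded(query: str) -> str:
    # Hypothetical target A: explicit field names and upper-case boolean operators.
    return " AND ".join(f"title:{t}" for t in tokens(query))

def to_plus_syntax(query: str) -> str:
    # Hypothetical target B: every required term is prefixed with "+".
    return " ".join(f"+{t}" for t in tokens(query))

def to_keyword_only(query: str) -> str:
    # Hypothetical target C: accepts only a plain lower-case keyword string.
    return " ".join(t.lower() for t in tokens(query))

TRANSLATORS: Dict[str, Callable[[str], str]] = {
    "A": to_fielded, "B": to_plus_syntax, "C": to_keyword_only,
}

def translate_for_all(query: str) -> Dict[str, str]:
    """Produce the query string each target expects from one user query."""
    return {name: fn(query) for name, fn in TRANSLATORS.items()}
```

For example, `translate_for_all("Soil Erosion")` yields a fielded boolean query for target A, a "+" query for B, and a plain keyword string for C, all from the same user input.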
Approaches to federated search
There are two distinct approaches to federated search, which can be labeled index-time
merging and query-time (search-time) merging.
Search-time merging
In most circumstances, this is the faster and easier solution to implement.
• A query federator intercepts the query and passes it to multiple search engines.
• The federator then waits for replies from the search engines and, when they are
received, merges or concatenates the results into a single results list.
This model relies on the data repositories to provide a search function.
Pros:
The primary advantage of this approach is ease of implementation, because no
additional indexing of content is necessary. The query federation system simply taps
into existing systems and extracts results, which are then merged. In some cases,
query-based federation is the only viable option, for example:
• federating to large-scale web content via a major search engine such as Google;
• federating to a private data set held behind a paywall and therefore not available to
be indexed locally.
Cons:
• Performance issues can occur if the federator waits for the slowest remote search
engine to respond.
• Merging search results into a sensible hit list is difficult if it is based on relevancy,
because each search engine scores relevancy in a different way. It is often better not
to attempt a relevancy merge and instead to present separate results lists (behind
tabs, for example), to merge on a more deterministic data item such as date, location,
or price, or to present results from different sources in blocks.
• Search engines provide varying levels of query sophistication. Federation at query
time usually implies "dumbing down" the query to suit the least capable search engine,
but this need not always be the case: sophisticated query parsers can be used to
ensure that the query is optimized for each search engine involved.
• Document-level security is a potential cause of performance issues, but this depends
on the complexity of the security environment.
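Two of the mitigations listed above, not waiting indefinitely for the slowest engine and merging on a deterministic field such as date instead of relevancy, can be combined in a short sketch. It assumes each connector returns records carrying a publication year; the connectors, the timeout value, and the record fields are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Connector = Callable[[str], List[Record]]

def fast_source(query: str) -> List[Record]:
    # Hypothetical connector that answers immediately.
    return [{"title": f"{query}: a review", "year": 2019},
            {"title": f"{query}: a primer", "year": 2012}]

def slow_source(query: str) -> List[Record]:
    # Hypothetical connector for an unresponsive remote engine.
    time.sleep(1.0)
    return [{"title": f"{query}: a survey", "year": 2021}]

def federate_by_date(query: str, sources: Dict[str, Connector],
                     timeout: float = 0.5) -> Tuple[List[Record], List[str]]:
    """Merge the results that arrive within `timeout`, newest first.

    Returns the merged list and the names of sources that did not answer in time.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, pending = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # return now; stragglers finish in the background
    merged: List[Record] = []
    for future in done:
        if future.exception() is None:  # a failing source is simply skipped
            merged.extend({**rec, "source": futures[future]} for rec in future.result())
    merged.sort(key=lambda rec: rec["year"], reverse=True)  # deterministic merge key
    return merged, sorted(futures[f] for f in pending)
```

Returning the names of timed-out sources lets the interface tell users that some results are missing, rather than silently dropping them.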
Index-time merging
This approach requires content to be acquired into a central index, and it is typical of
traditional enterprise search systems.
Pros:
• Most search engines default to ranking by relevancy, which is what most users
expect. By acquiring all data into a central index, sophisticated query enhancement
and relevancy algorithms can be applied, providing the user with excellent search
results.
Cons:
• The effort needed to acquire the content from the various repositories can be
substantial. Acquisition is done via read-only processes: the content of remote
repositories is not moved or changed, but the indexing process must read each item,
and re-read it every time a change occurs. In some cases, for example where private
content behind a paywall is involved, this is not possible.
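The read-and-re-read cost can be seen in a toy central index. The sketch below assumes each repository can list its items as identifier and text pairs; it builds a simple inverted index and uses a content hash to re-index only items that changed since the last crawl. Repository names and items are hypothetical, deleted items are not handled, and a real deployment would use a search engine such as Apache Solr rather than Python dictionaries.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Set, Tuple

ItemKey = Tuple[str, str]  # (repository, item id)

class CentralIndex:
    def __init__(self):
        self.postings: Dict[str, Set[ItemKey]] = defaultdict(set)  # term -> items
        self.fingerprints: Dict[ItemKey, str] = {}  # item -> hash of indexed text
        self.terms_of: Dict[ItemKey, Set[str]] = {}  # item -> terms currently indexed

    def ingest(self, repo: str, items: Dict[str, str]) -> int:
        """Read every item (read-only); re-index new or changed ones. Returns how many."""
        reindexed = 0
        for item_id, text in items.items():
            key = (repo, item_id)
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.fingerprints.get(key) == digest:
                continue  # unchanged since the last crawl
            for term in self.terms_of.get(key, set()):
                self.postings[term].discard(key)  # drop stale postings
            terms = set(text.lower().split())
            for term in terms:
                self.postings[term].add(key)
            self.terms_of[key] = terms
            self.fingerprints[key] = digest
            reindexed += 1
        return reindexed

    def search(self, term: str) -> Set[ItemKey]:
        return set(self.postings.get(term.lower(), set()))
```

Every crawl still has to read every item to compute its hash, which is the substantial effort noted above; the hash only saves the cost of re-indexing unchanged items.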
Hybrid federated search
Sometimes the optimum solution is a hybrid approach. Where practical, content is
indexed centrally; repositories for which that is not cost-effective (or simply not
possible) are federated at query time. If this approach is used, careful thought is
needed about results presentation, to make sure that users understand how the system
is set up and how to navigate and interpret results efficiently.
Which approach works best?
The approach that works best depends on your data environment and your users'
needs. Start by looking at the data environment, user requirements, and business
drivers; informed decisions can then be made. In our engagements, this process
usually begins with a search assessment.
Practical application of federated search tools
Some federated search applications include:
• mednar.com - searches medical information sources.
• biznar.com - searches business-related sources.
• worldwidescience.org - searches science content from all over the world, from
government agencies as well as other quality research and academic organizations.
• http://search.smartlib-bibliogen.ca/zengine?vdxaction=zsearchsimple - searches
Capital Smart Library, a consortium of libraries.
• http://osulibrary.oregonstate.edu/metafind/about.html - searches Oregon State
University's library.
• http://scienceroll.polymeta.com/search/ui7/searchfr.jsp?un=scienceroll - searches
Scienceroll, "a medical student's journey inside genetics and medicine through Web 2.0".
• http://lifesearch.indexdata.dk/# - searches the library of the Faculty of Life Sciences,
University of Copenhagen.
• scitopia.org - searches the digital libraries of leading science and technology
societies.
• http://www.techxtra.ac.uk - searches 31 different collections relevant to engineering,
mathematics, and computing, including content from over 50 publishers and providers.
Federated search tools options
VuFind
VuFind is a library resource portal designed and developed for libraries by libraries. Its
goal is to enable your users to search and browse through all of your library's
resources by replacing the traditional OPAC with a portal that includes catalog records
as well as:
a) Locally cached journals
b) Digital library items
c) Institutional repository
d) Institutional bibliography
e) Other library collections and resources
• VuFind is completely modular, so you can implement just the basic system or all of
the components. And since it is open source, you can modify the modules to best fit
your needs, or add new modules to extend your resource offerings.
• VuFind runs on Apache Solr, an open-source search engine that offers high
performance and scalability and allows VuFind to respond to search queries in
milliseconds. Solr can be distributed if you need to spread the load of the catalog over
many servers or a server farm.
• VuFind is offered free of charge under the GPL open-source license. This means that
you can use the software for free, modify it, and share your successes with the
community.
Features of VuFind
• Search with faceted results: users search from a basic search box and then narrow
down the results by clicking on the various facets of the results.
• Browse for resources: users can browse the catalog, exploring what the library holds
rather than seeing only a very narrow spectrum of results.
• Author biographies: users can learn more about an author through contextual
information and see all of that author's books held by the library.
• Persistent URLs: users can bookmark their queries or records for permanent access
to a page they have visited.
• Zotero compatible: users can save and tag any record with Zotero or any other
COinS-based application, so they can store their records in one place.
• Resource suggestions: when viewing a record, users are offered suggestions of
resources that are similar to the current one.
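Faceted search of the kind VuFind and Blacklight offer rests on Solr's standard query parameters. The sketch below builds a Solr `select` request that asks for facet counts and applies each facet value the user has clicked as a filter query (`fq`). The Solr URL, core name, and field names (`format`, `language`) are hypothetical; VuFind and Blacklight each define their own schema, and the request is only constructed here, not sent.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/biblio/select"  # hypothetical core

def faceted_query_url(q: str, facet_fields: List[str],
                      selected: Optional[Dict[str, str]] = None, rows: int = 20) -> str:
    """Build a Solr select URL that returns results plus facet counts."""
    params = [("q", q), ("rows", str(rows)), ("wt", "json"),
              ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", f) for f in facet_fields]  # one facet list per field
    # Each clicked facet value narrows the results through a filter query.
    params += [("fq", f'{field}:"{value}"') for field, value in (selected or {}).items()]
    return f"{SOLR_SELECT}?{urlencode(params)}"
```

A first search would pass no `selected` values; each click on a facet adds one `fq` entry, which is how the "narrow down by clicking facets" behaviour is usually realized.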
Blacklight
Blacklight is an open-source Ruby on Rails gem that provides a discovery interface for
any Solr index. Blacklight provides a default user interface that is customizable via
standard Rails mechanisms. It accommodates heterogeneous data, allowing different
information displays for different types of objects. Blacklight uses Apache Solr, an
enterprise-scale index, as its search engine.
Features of Blacklight:
• Faceted browsing
• Relevance-based searching, with the ability to control the relevancy algorithms locally
• Bookmarkable items, permanent URLs for every item, and user tagging of items
• Blacklight is licensed under a Creative Commons Attribution-Share Alike 3.0
United States license; it is therefore open source and can be used free of charge and
customized to your liking.
• Given these features (faceted browsing, locally controllable relevance-based
searching, and bookmarkable items) and its open-source license, Blacklight can be of
great benefit to libraries as a tool for access to e-resources.
SubjectsPlus
• SubjectsPlus was developed by the Joyner Library at East Carolina University.
• When it became abandonware, an expanded version of the original software was,
with permission, released as open source, and it remains open source.
• Its development was taken up by the Ithaca College Library and is now carried on by
the University of Miami Libraries.
Features:
• Create guides: create unlimited research guides via a drag-and-drop interface
• Staff list: sorted A-Z, by department, or by librarian
• Database list: A-Z, by format, or by subject
• Responsive design: looks better on tablets and mobile devices
• Suggestion box: an easy way to display and respond to patron comments; now
multi-site
• Video management: ingest video metadata from YouTube or Vimeo, then organize
and display it in one place on your site
• Customizable: you have complete control; add your own headers and footers,
tweak the layout and CSS, and add data via the API
• Multilingual: French, Spanish, and Russian versions included; new translations
coming
• SubjectsPlus is offered free of charge under the GPL open-source license. This
means that you can use the software for free, modify it, and share your successes
with the community. It integrates easily with library information systems, which
makes it a valuable federated tool for improving access to e-resources in libraries.
Google CSE
• Google Custom Search is a platform provided by Google that allows web developers
to feature specialized information in web searches, refine and categorize queries, and
create customized search engines based on Google Search.
• A Google custom search engine lets its creator select which websites will be
searched, which helps to eliminate unwanted websites or information.
• Google CSE users can also attach their custom search engine to any blog or web
page.
How it works:
• The user must first create a Gmail account or use an existing one.
• Log in to Google CSE and enter a name for the CSE.
• Add the sites to be searched, e.g., the URL of the online OPAC, e-resources, etc.
• Add the links carefully, i.e., use the original or home link.
• Once the links are added, click the Get code option.
• Copy the code provided and add it to the page of your website from which users will
search.
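Besides the embeddable code snippet, Google also offers a Custom Search JSON API for querying a custom search engine programmatically, which a library could use to display results inside its own pages. The sketch below only builds the request URL and parses a response; it assumes you have an API key and the engine's ID (the `cx` value), shown here as placeholders, and Google's current documentation should be checked for quotas and terms before relying on it.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def cse_request_url(query: str, api_key: str, engine_id: str, start: int = 1) -> str:
    """Build a request for one page of results from a custom search engine."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"

def titles_and_links(response_text: str) -> List[Tuple[str, str]]:
    """Extract (title, link) pairs; 'items' may be absent when there are no results."""
    data = json.loads(response_text)
    return [(item["title"], item["link"]) for item in data.get("items", [])]
```

A library page would fetch `cse_request_url(...)` with any HTTP client and render the pairs returned by `titles_and_links` in its own layout.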
Google CSE is free to use. It is the easiest of these tools to set up, yet effective and
reliable, and because it is customizable, it can be tailored to a library's own needs. It
can be a big step for libraries: users can search for e-materials from one central point,
across many databases, repositories, and other sites. Libraries should consider
adopting and implementing it.
Practical: how to implement Google CSE
The benefits of federated search
The essential benefits of federated search to its users include efficiency, quality of
search results, and current, relevant content.
Efficiency, time savings
Using a federated search engine can save researchers a great deal of time. Instead of
searching many sources one at a time, the researcher lets the federated search engine
perform the many searches on his or her behalf. Although federated search engines
specialize in finding content that can only be retrieved by submitting search forms, that
is not the only criterion for being a federated search engine: a federated search engine
also associates content from different sources. Federated search uses just one search
form to cover numerous sources and combines the results into a single results page.
Quality of results
Federated search engines show their value best in environments in which the quality of
results matters, such as libraries, corporate research environments, and governments.
A major difference between a federated search engine and a standard search engine
like Google is that the client who contracts for the federated search service selects the
sources to search. In almost every case, the sources will be authoritative. Google, on
the other hand, applies minimal criteria for source selection: almost any page that is not
outright junk can appear among its results. Thus, the federated search engine acts as a
helpful librarian does, directing users to high-quality sources.
Most current content
In addition to submitting forms and combining documents from multiple sources,
federated search engines offer another important benefit: they search content in real
time. Real-time data is crucial for researchers who need up-to-the-minute content or
content that changes frequently. As soon as the content owner updates their source,
the information is available to the searcher on the very next query. By contrast, with
standard search engines such as Google, the results are only as current as the last
time Google crawled the sites whose content matches your search words. Content you
find via Google might be days or weeks old, which may be fine depending on your
situation, but can be problematic if you want the most current information.
Marketing opportunities
If resistance is low and libraries embrace federated search technology, it could put the
marketing of library services in a whole new light. Because these systems can be
accessed remotely, yet are simple and dynamic, they offer an opportunity to expand the
library's reach and service, making it the "digital one-stop service" for users. With
database acquisition decisions already made by library staff behind the scenes, users
have few decisions to make on their end, and for the average end user, the less
decision-making, the better. For the general public, Google sets the gold standard for
returning relevant results. Federated search offers libraries another opportunity to
out-Google Google, this time by returning relevant results that Google misses. When
the appropriate databases are chosen in advance for the end user, relevant results are
more likely.
Challenges in federated searches
Nomenclature confusion
The use of multiple names to describe the same thing plagues the information industry,
and federated search is no exception. NISO, the U.S. National Information Standards
Organization, and many libraries refer to federated searching as meta-searching.
However, vendors in this space prefer not to be known as meta search engines,
because the term conjures up thoughts of searching only previously crawled content,
as Google does.
Access issues with federated search
Verification, authentication, and certification can be difficult for the federated search
vendor. Because federated search engines do not hold the data locally (they perform
the search and send back the results), the federated search engine must be able to
access multiple password-protected databases behind the scenes, all at one time, and
show users their results in one easy-to-read interface. The challenge for federated
search vendors is to ensure that only licensed users can access databases, and only in
the manner their license specifies. This may require a library or corporation to set up
multiple areas where only certain licensed users can access a federated search.
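One way a federated search layer can honor license terms is to decide, per user group, which databases a query may be sent to before any searching happens. The sketch below assumes a hypothetical license table mapping each database to the user groups its license covers; the database and group names are invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical license table: database -> user groups the license covers.
LICENSES: Dict[str, Set[str]] = {
    "OpenCatalogue": {"public", "student", "staff"},
    "MedJournalsDB": {"student", "staff"},
    "LawReportsDB": {"staff"},
}

def permitted_sources(user_groups: Set[str], requested: List[str]) -> List[str]:
    """Keep only databases whose license covers at least one of the user's groups.

    Databases missing from the table are excluded (default deny).
    """
    return [db for db in requested if LICENSES.get(db, set()) & user_groups]
```

Filtering before the query is broadcast means unlicensed databases are never contacted on the user's behalf, and each group in effect gets its own "area" of the federated search.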
Interface issues with federated search
For several years now, libraries and corporate information centers have faced the
"Google phenomenon": many patrons believe that doing a Google search covers all the
bases. Libraries now have an excellent opportunity to provide a simple yet powerful
interface that out-Googles Google. They can organize their interface by subject and
source, or customize it to specific user needs. Libraries and corporations need to take
note of Google's simple interface, because users expect an interface as streamlined as
Google's. Uncomplicated, intuitive interfaces without a steep learning curve will see
expanded usage. Most federated search vendors allow clients to create their own
"look and feel" for the search interface and results pages. However, if you do not have
the staff resources, they will often provide a more static look that requires little
decision-making on your part.
Removing duplicates
De-duplication of results seems to be controversial in the federated search space. In
short, most federated search engines de-duplicate the results on your current results
screen, and some will even de-duplicate all results when requested. However, this
opens up a Pandora's box about how the results are returned. Anyone familiar with
search engine optimization understands that audiences usually view only the first 10
hits. How do vendors and interface designers ensure that the highest-quality hits are
returned first? Would their algorithms rank proprietary databases higher in the
relevancy results?
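To make the de-duplication step concrete, here is a sketch of one common heuristic: treat two records as duplicates if they share a DOI or, when there is no DOI, the same normalized title and year. The matching rule and record fields are assumptions for illustration; real systems use more elaborate matching (for instance, this sketch does not match a record that has a DOI against one that lacks it).

```python
import re
from typing import Dict, List, Set, Tuple

def dedup_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Prefer the DOI; otherwise fall back to a normalized title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title, record.get("year", ""))

def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each record, in the order given."""
    seen: Set[Tuple[str, ...]] = set()
    unique = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Because the first occurrence wins, the order in which results reach this step decides which source's copy the user sees, which is exactly where the ranking questions raised above come in.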
Maintenance
Finally, after spending thousands of dollars of your library's budget on this magic tool,
you, the librarian, will still have to set most of it up yourself, which can be complicated
and time-consuming. Additionally, when entering the databases you subscribe to into
the federated search engine (FSE), you may find that your FSE is not compatible with all
of the databases to which your library subscribes.
Questions/Feedback