Faculty/AP Un-Retreat, January 7, 2014
Session Topic: Discovery and Access: Primo or What’s Next
Session Leaders: Bill Mischo and Tim Cole

Introduction

One of the University Library’s guiding values is “improving access to library content and collections,” and Goal 1 of the Strategic Initiatives is to “Promote Access to, and Discovery of, Library Content and Collections.” The University Library has pursued access to information resources through a variety of initiatives over the past decade and continues to seek improvements in this area. The two Un-Retreat sessions on Discovery and Access generated wide-ranging, in-depth discussions of access services and elicited a great deal of useful information and opinion on Library discovery technologies and philosophy. Although the session leaders framed the group discussions around the efficacy of Primo and Easy Search, participants discussed a wide range of information access and delivery mechanisms.

We are at a critical juncture in the Library's discovery and delivery strategy. The implementation of Primo has allowed us to examine key issues in search and discovery, including: the role of a web-scale discovery system (WSDS) in the Library's Gateway; the relationship between a web-scale aggregated central index and the specialty disciplinary abstracting and indexing (A&I) services the Library licenses; the effectiveness of vendor databases such as the EBSCO databases, ISI, and Scopus when integrated into Primo; the use and effectiveness of blended result displays; instruction issues connected with WSDS; the relationship between a web-scale system and a federated search/recommender system such as Easy Search; the efficacy of full-text search compared with metadata-based searching; and user search behavior within web-scale discovery systems.
Session participants were presented with the following discussion questions:

What principles should be the foundation for the Library’s “discovery and delivery strategy” (e.g., fully develop and implement fewer tools)?
What does it mean to be “in the flow” of our users’ work?
How can we best engage our user communities in order to understand their information search and retrieval needs?
What practices could we adopt in the University Library to achieve a more coherent and efficient search, discovery, and delivery experience for our users?
What are best practices in the nimble implementation and retirement of systems?

Participants in neither session saw the proposed principle of “fully develop and implement fewer tools” as necessarily a desirable tactic or goal. Many of the other questions were addressed indirectly.

It is important to note that the Discovery and Delivery Study Team (DDST) appointed by CAPT has discussed strategies for meeting all the goals and tasks detailed below. The DDST has held several open sessions with Library staff, APs, and faculty and will address all the goals and tasks identified in this Un-Retreat report. The DDST charge is provided as an appendix to this report.

Challenges:

The session leaders provided background and historical information on federated search technologies, Easy Search, Primo, and other web-scale discovery systems (WSDS). Participants were told that the Primo WSDS implementation was initially viewed as a natural progression from Easy Search. The Primo implementation team planned to use Primo’s Google-like display capabilities and publisher-based collections in conjunction with the search suggestions and links in the Primo custom tile, which was designed to incorporate many of the tactical search tips and suggestive prompts originally developed for Easy Search.
After early concerns arose about the comprehensiveness of the Primo Central Index, the implementation team decided that loading additional A&I services into Primo would close some of the coverage gaps. Several WSDS now offer loading of records from major A&I services such as Scopus and Web of Knowledge, but participants commented that loading this content into a WSDS does not replace the native A&I services, which typically offer richer metadata, controlled vocabularies, and more customizable interfaces.

Each session participant was asked to describe their use of Library discovery tools and services. Participants used Primo very infrequently, and, in their opinion, it often failed to resolve their own (or a patron’s) information need. The problems center on Primo’s sole reliance on full-text search (every search is an AND search across the full text of a document; there is no “metadata only” search and no OR search across selected fields), coupled with problems in relevancy ranking and blended-format result displays. For these reasons, a search for a specific known item that Primo does not contain can return a large number of irrelevant matches, and, when there is a match, the desired item may not appear in the first several pages of results, although this has recently improved markedly for some (but not all) searches. It is somewhat ironic that approximately 50% of the searches we see at the Gateway are known-item searches, yet Primo relies on full-text search to retrieve these known items. At the same time, topical searches may return, as the highest-ranked results, documents whose query words appear on separate pages of the full text, rather than documents that match in the title, subject vocabulary, and abstract.
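To make the full-text issue concrete, the following minimal Python sketch contrasts an implicit-AND match over all of a record's text with the same match restricted to metadata fields. The records, field names, and the `and_search` helper are hypothetical, and the sketch does not model Primo's actual indexing or relevancy ranking; it only shows why a full-text AND search can return a record whose query words are scattered across unrelated pages.

```python
import re

# Hypothetical toy records; field names and content are illustrative only.
RECORDS = [
    {"id": "A",
     "title": "Climate change and crop yields in the Midwest",
     "subjects": "Crops -- Climatic factors",
     "full_text": "This study models how climate change affects crop yields."},
    {"id": "B",
     "title": "A history of Illinois agriculture",
     "subjects": "Agriculture -- Illinois -- History",
     # The query words are scattered across unrelated chapters.
     "full_text": "Chapter 2 discusses crop rotation and yields. "
                  "Chapter 9 notes that policy change followed a cold climate spell."},
    {"id": "C",
     "title": "Soil chemistry handbook",
     "subjects": "Soil chemistry",
     "full_text": "Reference tables for soil pH and nutrient levels."},
]


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def and_search(query, records, fields):
    """Return ids of records whose chosen fields, taken together, contain every query term."""
    terms = tokens(query)
    hits = []
    for rec in records:
        searchable = tokens(" ".join(rec[f] for f in fields))
        if terms <= searchable:  # implicit AND: every term must appear somewhere
            hits.append(rec["id"])
    return hits


QUERY = "climate change and crop yields"  # a known-item (title) search

print(and_search(QUERY, RECORDS, ["title", "subjects", "full_text"]))  # ['A', 'B']
print(and_search(QUERY, RECORDS, ["title", "subjects"]))               # ['A']
```

In this toy model, restricting the same AND match to title and subject fields drops record B, whose only connection to the query is a handful of words spread across separate chapters.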
In addition, Primo may match on data that cannot be displayed to users because of contractual agreements between Primo/Ex Libris and database producers. For these reasons, instructional and subject librarians are generally not promoting Primo, particularly since testing has shown that other tools produce more precise or relevant result sets. While these problems exist in all WSDS, Primo in particular is a poor resource for topical undergraduate research, and its introduction as a tool in library instruction for undergraduate students is questionable. Nor does it offer the comprehensiveness and custom relevancy rankings that subject specialists are accustomed to in their disciplinary A&I services.

All Un-Retreat session participants noted that they typically used a combination of Easy Search and one (or both) of the Voyager and VuFind catalogs. Many participants also used disciplinary A&I services, citing fuller metadata records and richer, more flexible search features. Participants noted that there is little compelling reason or need to use Primo: online catalog information and access mechanisms are covered by the Voyager and VuFind OPACs. While Primo has added the Scopus and ISI Web of Knowledge A&I services, these are indexed in the same way as all other content in the Primo Central Index and are subject to Primo’s relevancy ranking problems. They are less useful in Primo than in their native form, especially since Primo represents their records with fewer data elements and reduced metadata.

There was general agreement that Easy Search is easier to teach, that it is easier to customize at the departmental library/subject area level, and that its grouping of search result targets by category mirrors the “bento box” approach employed by several other ARL libraries.
Easy Search helps guide novice users to the best database for their needs, but admittedly it also suffers from a display format that can be overwhelming and from information overload. Statistics from the Primo custom tile logs show that approximately 75 searches a day are performed from the native Primo front-end interface, which is a small number. All other Primo searches come from Easy Search, which averages over 6,500 searches per day. In addition, analysis of the Easy Search logs shows that users do not favor the Primo results over the other targets in the display listings.

Some other clearly defined challenges emerged in our conversations:

Challenge #1: We need to better define the information needs of our users, address the identified issues and problems with our information resources, and determine how the Library's collections and resources are best placed into the pathway of the user.
Challenge #2: We need to better define our priorities, and this should come before we institute a WSDS.

Goal 1: Get more complete data about how our users search and the types of searches they are performing.

Strategies for Goal 1:

The custom transaction logs we have gathered from the Gateway and the departmental library single-entry search boxes have provided insights into user search behavior. They have been used to design and deploy the search assistance mechanisms and tactical tips used in both Easy Search and the Primo custom tile. Several detailed log analyses (the most recent in 2011) have identified the types of searches being performed, and several studies have examined the coverage and retrieval effectiveness of Primo, Scopus, and EBSCO databases. The large-scale transaction log data has been supplemented by individual user interviews, focus group interviews, and focused log analysis projects.
Task 1: Identify a representative sample set of user searches (a test suite) and use it to examine the performance of different WSDS, including Primo, EBSCO Discovery Service, Summon, Google Scholar, and WorldCat Local, in terms of coverage, ease of retrieval, and delivery effectiveness. This is part of the widely discussed “bakeoff.”
Task 2: Look at the target clickthrough patterns identified in the logs, particularly with regard to search success.
Task 3: Examine the session tracking available in Easy Search to observe user navigation patterns after a search has been initiated.
Task 4: Examine search reformulation patterns and how search support systems (such as the suggestions in Easy Search and the custom tile in Primo) can incorporate these findings.
Task 5: Compare what we know about searches at the Gateway with searches being done in disciplinary A&I services.

Goal 2: Provide Discovery Approaches that Address our User Needs

Participants felt that the Library's overriding concern should be providing access services that serve our heterogeneous user population. We have a broad continuum of users, with differing needs, interests, and discovery behaviors, ranging from undergraduate students with few information literacy and scholarly communication skills to senior researchers in very specific subject domains. It will be difficult, if not impossible, to provide a one-stop shopping environment for all of these users. Some users do not use or need Easy Search or Primo. A large number of users and reference staff perform predominantly known-item searches and know what they are looking for; for these users, efficient delivery of content is most important. These known-item searches cover a variety of material formats. The Library needs to highlight and place in front of users the tools that are most useful to them. Library tools and services need to be placed in the flow of our users’ work.
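The known-item emphasis above, together with the test-suite “bakeoff” in Task 1 of Goal 1, suggests scoring each candidate system on how highly it ranks the item a user was actually looking for. The Python sketch below assumes that the test suite pairs each logged query with a single target record, and it uses two standard information retrieval measures, success at k and mean reciprocal rank. The report does not specify metrics, so these choices, the class and function names, and the placeholder data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KnownItemQuery:
    query: str       # the search string as a user typed it
    target_id: str   # hypothetical identifier of the item the user wanted


def rank_of(results, target_id):
    """1-based position of target_id in a ranked result list, or None if it is missing."""
    try:
        return results.index(target_id) + 1
    except ValueError:
        return None


def evaluate(suite, run, k=10):
    """Score one system over a known-item test suite.

    run maps each query string to the ranked list of record ids that system returned.
    Returns (success_at_k, mean_reciprocal_rank); a missing target contributes 0 to both.
    """
    ranks = [rank_of(run.get(q.query, []), q.target_id) for q in suite]
    success_at_k = sum(1 for r in ranks if r is not None and r <= k) / len(suite)
    mrr = sum(1.0 / r for r in ranks if r is not None) / len(suite)
    return success_at_k, mrr


# Placeholder data, not real results from any system.
SUITE = [
    KnownItemQuery("moby dick melville", "rec1"),
    KnownItemQuery("journal of chemical education", "rec2"),
    KnownItemQuery("smith 2012 soil nitrogen", "rec3"),
]
RUN = {
    "moby dick melville": ["rec1", "rec9"],
    "journal of chemical education": ["rec7", "rec8", "rec2"],
    "smith 2012 soil nitrogen": [],  # target not retrieved
}

print(evaluate(SUITE, RUN))       # success@10 = 2/3, MRR = (1 + 1/3) / 3
print(evaluate(SUITE, RUN, k=2))  # success@2 = 1/3
```

Running `evaluate` on each system's results for the same suite gives directly comparable numbers. Topical searches, which have many relevant items rather than a single target, would need a different measure.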
It is important to put people in a position where they can readily access the most useful resources. Given the importance of the A&I services, we need to determine how best to direct users to the most relevant database in their subject area. We cannot design a perfect system, but we need to design an evolving, improving one. There may not be an ideal system: we currently have multiple search and discovery systems and multiple user paths. We seem to struggle to reach consensus, and that in itself is an answer. We know that our users demand online full text. The use of e-books has grown, and there is almost universal reliance on e-journals.

Strategies for Goal 2:

Task 1: Work on a better wayfinding function. Develop mechanisms for matching the most appropriate tools and techniques with each specific user community.
Task 2: Design a system that combines the Easy Search recommender approach with the WSDS model. This could possibly be done by extending the custom tile functionality.
Task 3: Consider other display approaches that may be easier for users to comprehend and use. Easy Search is a bento box approach, similar to Google’s grouping of results by type or format. Note that Rochkind’s, NCSU’s, and other bento approaches require users to go into the target displays to reach the full-text links, just as users must in Easy Search. In many ways, Easy Search incorporates many of the best practices currently emerging in WSDS, but it lacks a quick display of sample results.
Task 4: Look at a more “classical” bento box using Easy Search and a WSDS.
Task 5: Given that full-text delivery is paramount, examine the relationships among search techniques, discoverability mechanisms, retrieval effectiveness, and delivery technologies.

Participants strongly felt that we need to clean up the data and the tools we have. Fixing the problems with our existing services may be more important than deploying a WSDS.
There are record accuracy problems, and SFX sometimes returns incorrect results. These errors are only magnified in a WSDS and must be addressed regardless of the specific tools we adopt. There is great frustration with items we own that do not show up in discovery tools, including materials we have digitized that do not appear in search results. The Primo online catalog scope has, for the first time, allowed us to combine the Library's MARC bibliographic records for the physical collections with the non-MARC metadata records for our digitized collections. However, we have learned that mixing these local collections into a blended search that also includes millions of article citations may not be the optimal search solution.

Goal 3: Investigate the Overarching Role of a WSDS

WSDS seem to be more useful for bachelor's-degree-granting institutions than for Research I institutions. The Primo Central Index (PCI) has deep coverage of many sources but is still not comprehensive and cannot take the place of the myriad subject A&I services and publisher repository search systems. In addition, the WSDS interface is neither as feature-rich nor as customized to the subject discipline (e.g., there is no Chemical Abstracts Registry Index search in Primo). Primo is intended as a Google for academics. It may be good for survey searches, but it has problems with scope of coverage and with relevancy ranking for known-item searches. Primo also tends to obscure the characteristics that help undergraduate students determine what kind of item something is and whether it is considered scholarly within the discipline of the course they are taking.

There is overlap but also a clear difference between a WSDS and disciplinary A&I services. The A&I services will always offer search features not available in a lowest-common-denominator WSDS. The specialty databases and publisher portals are still needed.
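The difference between a blended result list and the bento box approach discussed under Goal 2 (Tasks 3 and 4), along with the concern about mixing local collections with millions of article citations, can be illustrated with a toy comparison. The Python sketch below assumes each target returns its own ranked hits with a relevance score; the targets, titles, scores, and function names are hypothetical, and neither function describes how Easy Search or Primo actually merges or displays results.

```python
# Hypothetical per-target results: (title, relevance score) pairs, ranked within each target.
TARGETS = {
    "Catalog": [("Book on soil nitrogen", 0.90), ("Older monograph", 0.40)],
    "Articles": [("Article 1", 0.95), ("Article 2", 0.93), ("Article 3", 0.91)],
    "Digital collections": [("Digitized field map", 0.30)],
}


def blended(target_results):
    """Merge every target's hits into one list ordered by score (assumes scores are comparable)."""
    hits = [(score, target, title)
            for target, ranked in target_results.items()
            for title, score in ranked]
    hits.sort(key=lambda h: h[0], reverse=True)
    return [(target, title) for _, target, title in hits]


def bento(target_results, per_box=2):
    """Give each target its own box holding that target's top per_box titles."""
    return {target: [title for title, _ in ranked[:per_box]]
            for target, ranked in target_results.items()}


print(blended(TARGETS)[:3])  # the three articles outrank everything from the local collections
print(bento(TARGETS))        # every target keeps a visible box of its own
```

In this toy data the blended list pushes the catalog and digital collection hits below the articles, while the bento layout guarantees each target a place on the screen; the trade-off, as participants noted for Easy Search, is a busier display.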
Strategies for Goal 3:

Task 1: Look at additional custom tile functions that can mitigate some of the issues with WSDS. This will aid future development for whatever service we use.
Task 2: We know that some locally developed services (e.g., Journal and Article Locator, Archon) provide important access and delivery services. Look at integrating them into a WSDS and further integrating them through the Gateway and subject library websites.
Task 3: The literature is unclear about the intended audience of a WSDS. Several surveys suggest that institutions view the primary role of a WSDS as serving undergraduates; however, our local analysis has determined that undergraduate needs are better met by EBSCO Academic Search. Is there a way to balance undergraduate needs with the needs of users who access the Library through the single-entry search box on the Gateway?
Task 4: Because users have difficulty with WSDS blended result displays and with the relevancy rankings in the Primo Central Index, it was suggested that we generate a version of Primo that includes only monographs and local digital content (we have put together a Primo View for this).

Some participants felt that a useful WSDS may be an impossible dream or a solution in search of a problem. However, it may be that we picked the wrong WSDS. We should stand up fully loaded, functional EBSCO Discovery Service and Summon systems, along with WorldCat Local (which we already have), to compare with Primo.

Participants felt that we need to find a way to support discovery of non-bibliographic information, such as special collections and data. Primo is not a good solution for finding datasets, and datasets will be of increasing importance. Participants also felt that we need to make sure staff know more about the tools we have, what they cover, and what they should be used for.
They also felt that more people than CAPT need to be involved in WSDS selection and implementation. We also need more networking with other institutions. Given all we have learned about user search over the past several years, we should take a leadership role in helping determine the best approaches to search and discovery. We could sponsor a conference on WSDS here or use a CIC conference for discussions with other member libraries; we are all facing similar situations. We may also need a broader discussion of what should be in a WSDS.

Note: Lisa Hinchliffe and Susan Avery will be presenting at the CIC Spring Information Literacy Conference on their discovery analysis work.

Resources needed for addressing the Goals and Tasks identified in this report

The DDST wants to use some of the remaining funds in the recurring Student Library/IT Fee allocation for Next-Generation System Support to fund full-scale UIUC EBSCO Discovery Service and Summon WSDS implementations. These systems will be used in the comparative retrieval testing. There is a recurring allocation of $250K in this budget line. While many of the comparative transaction log studies can be fully or partially automated, hourly graduate assistants (GAs) will be needed to perform the detailed analysis and to assist with the statistical analysis. It is important to identify the representative sample set of user searches so that we can use it to test any available system, and funding for this study should be put in place as soon as possible.

Appendix: Discovery and Delivery Study Team Charge

The Discovery and Delivery Study Team is charged to develop a recommended “discovery and delivery strategy” for the University Library.
Developing this strategy entails a comprehensive review of how the Library currently facilitates discovery of, and provides access to, content; of the marketplace of current and emerging search, retrieval, and access technologies; and of approaches for coordinating methods and techniques throughout the Library’s decentralized service structure, as well as articulation of the principles and assumptions that should guide the Library’s work in this area. The Discovery and Delivery Study Team will review the Priorities (2003) from the Taskforce on Access and evaluate the Library’s implementation work over the past decade, as well as user perspectives gathered through user surveys, usability studies, and search log analysis. Through this review, as well as forums for library employees and users to discuss current challenges and opportunities, the Team will identify 4-6 topics for small groups to investigate. The Study Team will recommend the topics and small group membership to CAPT, which will determine both the topics and the makeup of the small groups. The small groups will investigate these topics in detail and develop recommendations. The Study Team will then articulate these recommendations, together with principles and assumptions, as an integrated discovery and delivery strategy, which will be submitted to CAPT.

Membership: Lisa Hinchliffe (co-lead), Bill Mischo (co-lead), Kirstin Dougan, Sarah Williams, Michael Norman, Susan Avery.