Timothy W. Miller Fall 2013 SLIS 289

Core Competency E: design, query and evaluate information retrieval systems.

Statement of Competency:

Information retrieval is a complex process that must be handled according to the way the data is structured. A database such as Dialog is structured very differently from the basket of children’s board books at the public library. Understanding how databases are designed is a prerequisite for knowing how to query them and find information. Evaluating search results, to ensure that the queries are precise and aren’t inadvertently filtering out the information you want, also requires a basic understanding of how databases are designed.

I have studied various types of databases, created simple databases, and built simple search interfaces to gain a better understanding of how best to conduct my searches. I have worked with the advanced features of general web search engines such as Google and Bing and have investigated other methods of retrieving information from the Internet. I have found that whether I am searching Yahoo!, Factiva, Ebsco or a library OPAC, or querying a MySQL database, I can’t expect to find everything I’m looking for without knowing how the search works. I have therefore studied and experimented with a variety of these databases to learn all that I can about the best ways to plan my searches.

Evidence:

My first piece of evidence is a basic evaluation of a growing trend in library OPACs: the evolution of NextGen OPACs with Web 2.0 features (evalNextGenOPACs.docx). This study examines the new features that OPAC designers are incorporating to bridge the gap between the traditional OPAC and the web search engine. My findings did not indicate that there is a magic recipe for building the perfect OPAC. One conclusion is that libraries should design their search interfaces to be as user-friendly as web search engines.
Studies show that most users will simply abandon searches that aren’t simple and intuitive, that is, searches that don’t reveal obvious results early in the process. However, it has also been shown that users are often dissatisfied when such searches bring up relevant titles but not the accompanying full-text articles. There is a middle ground between overly complex searches (with elaborate filter options and controlled vocabularies) and overly simple searches (which bring up too many irrelevant or inaccurate results). Web 2.0 features such as user-generated content (tags, reviews), assistive search (spellchecking, recommending similar queries), federated searching (searching more than one database at a time), and faceted navigation (computer-generated links to narrow overly broad result sets) have been recommended as ways to bridge these two extremes.

However, the two most important considerations are ensuring that the interface is user-friendly and that the results are relevant and valid. I argue that when the two must be reconciled, the former (usability) should be the primary focus. If the interface is unusable to the average library patron, it simply will not be their first choice. If, on the other hand, we strive to make the OPAC as user-friendly as possible and release ‘beta’ versions as we make improvements, users will be attracted to the interface, and functions that are not yet perfect can be improved over time. Furthermore, as users become more accustomed to this type of ‘beta’ development (as seen in web browsers and general search engines such as Google), they will be more forgiving of such shortcomings and more responsive when asked to learn a new feature that meets their needs.

The next piece of evidence is a basic interface that I designed as an experiment in creating a PHP/MySQL database (assignment_8.php).
This database is a mock-up of a library OPAC but is built on principles that apply to any type of online database, such as Ebsco or ProQuest. It provides a way to create user accounts as well as bibliographic records. Since it is built on a relational database management system (MySQL), the data within the database can be linked and used to make the interface interactive. A user can enter a search term to find a specific item record using keyword or subject searches. Each deliberately simple bibliographic record contains subject, ISBN, publication date and title fields, all of which can be searched with a simple Google-like search (one search box). A user can create an account with a username and password (secured against malicious attack).

The basic structure of this database can easily be extended to add features such as user-generated content (once logged in, a user could fill out a form to add tags to a new column in a table that links those tags to the ‘booklist’ table) or librarian-curated filtering options (a restricted form that allows staff to create links that filter by criteria such as subject type).

However, since this was created as a basic experiment and is therefore an over-simplification of an OPAC, there are a few limitations that I have not addressed. Search results are sorted only by title, not by relevance (there is no dedicated search engine); there is no access control to restrict users from viewing sensitive information (such as other users’ passwords); and there is no feature for consulting a thesaurus in order to choose precise terms. As databases such as these grow in size and become more feature-rich, there is also a considerable increase in demands on computing resources.
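The one-box search over the ‘booklist’ table, together with a linked tag table of the kind proposed above, might be sketched as follows. This is only an illustration under stated assumptions, not the code in assignment_8.php: it uses Python’s built-in sqlite3 module rather than PHP/MySQL so that it runs without a server, and the column names, the `tags` table, the function names and the sample records are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the 'booklist' table plus a tag table.

    All column names and sample rows are made up for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE booklist (
            id       INTEGER PRIMARY KEY,
            title    TEXT NOT NULL,
            subject  TEXT,
            isbn     TEXT,
            pub_date TEXT
        );
        -- One row per (book, tag) pair links user tags back to booklist.
        CREATE TABLE tags (
            book_id INTEGER NOT NULL REFERENCES booklist(id),
            tag     TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO booklist (id, title, subject, isbn, pub_date) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Database Basics", "Computer science", "000-0-00-000001-0", "2010"),
            (2, "Audiobooks for Libraries", "Library science", "000-0-00-000002-0", "2012"),
            (3, "Cataloging Primer", "Library science", "000-0-00-000003-0", "2008"),
        ],
    )
    conn.executemany(
        "INSERT INTO tags (book_id, tag) VALUES (?, ?)",
        [(1, "sql"), (3, "opac")],
    )
    return conn


def search(conn, term):
    """Match one search term against every searchable field and the tags.

    The term is passed as a bound parameter, never pasted into the SQL
    string, which guards against SQL injection. Results are sorted by
    title only, mirroring the limitation described in the text.
    (Wildcard characters typed by the user, % and _, are not escaped here.)
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        """
        SELECT DISTINCT b.title
        FROM booklist AS b
        LEFT JOIN tags AS t ON t.book_id = b.id
        WHERE b.title LIKE ? OR b.subject LIKE ? OR b.isbn LIKE ?
           OR b.pub_date LIKE ? OR t.tag LIKE ?
        ORDER BY b.title
        """,
        (pattern,) * 5,
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(search(demo, "library"))
```

Because every field is compared with a `LIKE` pattern, a search of this kind scans the whole table and cannot rank results by relevance, which is one reason a larger catalog would need indexing or a dedicated search engine.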
A MySQL database that includes these and other features (as well as the SQL queries themselves) would need to be designed so that searches run quickly and efficiently.

The third piece of evidence is a log of some searches I performed using Dialog (miller_exercise1.docx). The searches demonstrate some essential database search techniques, including: constructing a precise search string; using the thesaurus to choose controlled vocabulary; referencing the manual (the ‘bluesheets’) to determine which features can be used with specific databases; using a variety of commands to find all variations of an author’s name and all relevant controlled vocabulary terms for a specific concept (e.g. audiobook(s) = playaway(s) = recorded book(s) = talking books); using truncation and wildcards to expand search terms; and using the bluesheet information and search commands to make cost-effective queries (e.g. avoiding costly results until the search has been adequately narrowed and a level of precision has been established).

The searches also demonstrate some essential search strategies, including: pearl-growing (finding a relevant result and using information within that item’s record, such as subject terms, to continue the search); onion-peeling or successive fractions (starting with a broad search and filtering to narrow down the results); and combining relevant terms with Boolean operators to narrow or widen a search. The searches show that I use a variety of techniques and strategies to find each piece of information; a single strategy or technique will rarely suffice. I also use these techniques to ensure that I have found only relevant resources (precision) and that my search has uncovered all of the relevant records (recall).

Conclusion:

Evaluating database design and search strategies is highly interesting to me.
From learning how a database is constructed and what features it includes, to experimenting to find out which strategies are most efficient or cost-effective, I enjoy the detective work. Because I am also interested in learning how systems function, I have experimented with building my own databases for specific purposes. In my work I construct SQL queries to run reports that enable me to keep track of item circulation. I also write programs so that I can enter data into MySQL databases rather than spreadsheets, which lets me run more powerful queries and gives me more control over data formats and exporting.

To keep my database design and search skills current, I read the product literature (such as the Dialog bluesheets or Ebsco help sheets). I test new features and products, such as Ebsco’s mobile apps, to see what techniques can be used and how they can be accessed. To learn about data storage techniques, I build test cases as examples: assignment_8.php (PHP/MySQL), mobileCatalog.html (jQuery/JSON), henry_v.html (XML/XSLT). This area is constantly changing as designers find ways to make searching more intuitive and user-friendly and as technologies advance to allow for bigger databases and faster searches. I truly enjoy staying informed about it, and I gladly accept the challenge.