ICOLC Boston
Google Scholar, Google Print
Monday, April 11th
Presented by Adam Smith, Business Product Manager

**Main message: It is not Google's intention to start a global free library for the universe and put libraries out of business.**

Adam provided an overview of Google Print and Google Scholar; these are entirely separate products.

The Google Mission: Organize all the world's information and make it universally accessible and useful.

What is the Google culture, philosophy, practice?

* Google runs in a graduate-school kind of way
* Each engineer can work on whatever interests him or her 20% of the time
* This allows creativity to flourish
* The Google search engine drives more search referrals than all others combined
* Google makes money from advertising
* Deeply committed to "separation of church and state", i.e., ads on the right, search results on the left
* There is lots of online content, but also lots of good offline content not in Google, which cries out to be digitized
* "We don't really want to be in the digitizing business but it needs to be done"

I. Partnerships to digitize content (Google Print):

* Publisher partnerships were announced at the Frankfurt Book Fair
* Library partnerships, announced in December, also contribute to that goal
* U of M, Harvard, NYPL, Oxford, Stanford: these are all partnerships of slightly different sorts with different libraries, driven by those libraries

What does the user see?

* For public domain books (published in the US prior to 1923), Google allows unlimited browsing; the user can see the entire book; full text is indexed
* A reader can find specific library holdings via OCLC WorldCat
* The in-copyright experience: Google shows a piece of the book (3 short snippets) and tells the user how many times the word is used, but the reader can see only 3 results
Future of Google Print:

* Continues to add new books
* Also committed to building an international, multilingual product; look for more soon
* Intended to be a catalyst for more digitization efforts
* Will work to include books from other digitization efforts
* Creating products that all libraries can leverage, e.g., OPAC integration, restricted library search

Adam reviewed his own "Google Print FAQ," which went something like this:

Q. Why are you doing GP?
A. It's part of our commitment to improving the search experience. It's not a standalone effort; we are building user loyalty.

Q. How will it impact digital libraries created by libraries?
A. We want to build an inclusive product and find ways to work with those other efforts.

Q. How did you choose which books to do first?
A. This was done in partnership with libraries. Google is not deciding this. The partners are deciding what fits best for them.

Q. How do you determine what's in the public domain?
A. Pre-1923 publications in the US. Overseas is more difficult; we are more conservative; our analysis affects what we will scan and what the reader can see.

Q. What technology is being used in scanning and processing books?
A. Publisher books: this is a destructive scanning process, with pages run through a standard processing system. On the library side: we are limited in what we can say; this is a proprietary technology.

Q. Is this the beginning of the end for libraries?
A. No, Google Print will enable and encourage additional use of libraries and library resources; it will send readers back to their libraries. We intend to build lots of integration plans with libraries and library vendors.

Comment: GP is still in beta; it is early days; our goal is to improve discovery and access.

Q. How can my library participate?
A. Write to print-support@google.com. Send the size of your collection and any other pertinent information.

II. Google Scholar

Our Goal: To create the best possible scholarly search experience, and to provide a single, easy-to-use place to find material.

What and How?

* Millions of items are being indexed
* Peer review is a criterion
* Content includes books, articles, pamphlets, abstracts, tech reports
* We link to things you can't get to (citations without full text), i.e., we tell you about the link even if the content hasn't been crawled or there is no content
* The publisher's version comes first
* Google does not take referral fees for traffic sent to publishers
* All search results are based on relevance, not accessibility or payment

Some Google Search "Big Ideas"

* Index all forms of articles
  * Prefer full text
  * Unfortunately, only a small fraction of full text is online
  * Index whatever form is available, even if only an abstract or citation
* Be inclusive
  * With good ranking, selection is less important
  * Google values comprehensiveness
  * Selection is important but much less so
  * Picking on the margins is really hard
* Do this all automatically (citation extraction)
  * Automation is essential
  * There is much variance in citation styles on the Web
  * (This introduces inconsistencies and errors, of course)
* Try to group all forms and versions
  * (A single work can have many forms or versions; each may be cited independently)
  * Grouping facilitates ranking and presentation

Google Scholar coverage: exactly what is it?

* We cover many publishers and societies
* We have agreements for full text from most major publishers except Elsevier and ACS
* We include free resources such as PubMed, online repositories, open access journals
* A number of aggregators

Description of current library access pilot:

* We are looking to connect products to libraries
* The pilot began about a month ago
* Allows entitled users to take advantage of their library's licensed resources
* Part of the goal of enabling access and directing readers to individual libraries
* Users can tell GS that they are associated with an institution (via an IP range reported to Google, or by setting preferences)

Challenges for GS:

* Frequent updates (a fresher index)
* How to rank recent articles
* Author name disambiguation
* More citation normalization and extraction

How you can participate:

* We are working with most link resolvers
* If yours is not included, contact that vendor and ask them to work with us
* Make it easy to find your licensed resources:
  * NOTE: 70% of undergrads, faculty, and grads use search engines daily
  * We can help you
  * You can work with your publishers and vendors by telling them to be open to Google

Wrapup:

* Google shares many values with libraries
* We improve access
* Google will drive greater library usage
* We want to develop products that are great for users, libraries, publishers
* We listen, so send suggestions and comments

Web sites:
print.google.com
scholar.google.com

Q&A:

Q. What publishers are included in GS? How can I find a list? If you include a vendor, do you have full content?
A. Most major publishers are in, yes. We cannot provide a list.

Q. Relevance ranking: how do you do that without full text?
A. We use citations where we don't have full text. There are other relevance signals, but I can't go into them and don't know them all.

Q. Partnership with Open WorldCat: why the delay in indexing it?
A. It's an ongoing thing; 55M records take a lot of time.

Q. Will ranking of Open WorldCat become higher than it currently is?
A. Yes, we are doing our best to get these into the indexes.

Q. If a library subscribes to EBSCO proprietary databases and uses SFX, does Google Scholar crawl all those EBSCO journals?
A. I don't know the answer, i.e., whether EBSCO is in or not. In theory, yes.

Q. I'm interested in the migration of the publisher's print content when the book goes out of print. Will you do this?
A. We haven't faced that problem yet. When we do, it will be a discussion between publisher, author, and ourselves. We hope the books could stay in.

Q. What does it mean that you are producing an international, multilingual product?
A. We have initiated a publisher program overseas and are working on European publishers. On the library side it's more difficult to do this, given copyright restrictions, so we are starting with publishers. There is a lot of foreign content in the currently participating libraries, even though they are US and UK based.

Q. Currency: How current is GS?
A. We are working on a new index. It's current up to a little while ago.

Q. Any talk at Google about buying content?
A. The amount of time a user spends on our site is limited. This is not in our current planning.

Q. Can you process our per-article payments to publishers?
A. Do you want that? I'll make a note of it.

Q. What about sustainability: will your business approach move away from providing a free service?
A. We've built tremendous user loyalty based on our current approach. Our users are our core asset, so we don't want to tamper with that. Internally, people feel very strongly about this point.

Q. How do you handle aggregations such as EBSCO? Do you crawl aggregations as well as the full-text source files?
A. We may be crawling the aggregation separately as well, yes, and after that it's a matter of library protocols for their own linking choices.

Q. Where does Google Search fit into the marketplace of federated search engines that are out there, which have faced serious limitations?
A. There are major infrastructural differences. We don't query each item; we build an index ahead of time so we know where things exist, and we query our own index on our own searches. All the material that's on Google Scholar is also in Google itself.

Q. Google vs. Scopus: we heard that Google Search covers less than 50% of what Scopus does.
A. I've never used it. We're building something that's free. We hope to make it as useful as possible for as many people as possible, for free. So it's a very different concept.

Q. The Atlanta Journal-Constitution wrote recently about the huge building purchased by Google on the east side of Atlanta. What's that for?
A. No comment.

Q. Will vendors give you the data so you don't have to crawl it?
A. It's easier for us to crawl it; that's what we do.

Q. What is Elsevier's and ACS's official position on why they are not participating?
A. No comment.

Q. Are you doing this forever? Is this possible? Is there an end in sight?
A. Yes, the dream is to do it forever, or at least as a very long-term commitment. Note that all of our deals are nonexclusive. We are not building the one and only site.

Q. Right now search results are ranked on relevance; that doesn't always work. Are you thinking about adding date of publication as part of the ranking?
A. Your interest is noted.

adamsmith@google.com