Google and Google Scholar

Google Scholar - pros and cons
Roger Mills and Sue Bird
February 2009
Today
• Google Scholar offers a very convenient method of
retrieving article citations and often the accompanying full
text, and is growing in popularity. This session offers tips on
using it effectively and extending your search to other
sources should Scholar's coverage prove inadequate for
your purposes.
• What Google does and doesn't do
• Coverage of Google Scholar (GS)
• Setting up GS to retrieve local full-text
• Other customisation of GS
• Comparison of GS, Web of Science, Scopus etc.
Welcome to the Web
The world’s biggest haystack
What can you do in a haystack?
• Romp about
• Get hay fever
• Have unexpected encounters
• Sleep
• Not do research
• So why would you start there?
Finding needles
• Google helps you find needles in haystacks
But:
• Google is an index of web pages
• A journal article is not [necessarily] a web page
• So Google is not good at finding journal articles
However:
• An image of a journal article may be placed on a web page
• So Google may find it
• If it’s free and not behind a firewall
• How do you know?
Google is fast
• Very fast
• Proudly fast
• Tells you how fast
• Found OUCS home page in 0.08 secs
• Also found 428,000 other ‘relevant’ pages
• But put home page first
• Brilliant - How does it do it?
• Not telling….
Did I need 428,000 references?
• Nobody looks at all the references Google retrieves
• So why display them?
• Algorithm takes into account links made by other pages
• And click-throughs
• So the top result for a given search is determined over time
by the people who make that search
• Is that the same as the ‘best’ result?
• It means Google can work out appropriate advertising to
display
OK, how would you do it?
To index a document, I’d read it first.
• Google can’t read
• We don’t read the web – we view it
• We remember references visually – that red book on the
third shelf down…
• If Google can list all the red books on all the third shelves
down in all the world I’m bound to find it, right?
• Actually I remember I saw it in Oxford, so I just need to list all
the red books in Oxford – doddle
That’s not really how Google works – is it?
Based on memory, rather than problem analysis?
So you read the article, and then…?
Give it some index terms
• Not ones I’ve just made up, but ones from a standard list.
• That way, everyone will know what the article’s about, and
every article on the same topic can be found.
• Provided everyone agrees what the article’s about.
Then I’d list the authors in a standard form: so everything by
Roger Mills, Roger Anthony Mills, Roger A Mills, R Anthony
Mills, Anthony Mills, R A Mills can be found in one go.
• That’s a controlled vocabulary.
• Works for journal titles too.
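The idea of a standard author form can be sketched in a few lines of Python. This is only an illustration: the name variants are the ones listed above, but the heading format "Mills, Roger A." and the helper names `normalise` and `standard_heading` are assumptions made for the example, not part of any real library catalogue.

```python
from typing import Optional

# Hypothetical standard heading for all the variants listed on the slide.
HEADING = "Mills, Roger A."

VARIANTS = [
    "Roger Mills",
    "Roger Anthony Mills",
    "Roger A Mills",
    "R Anthony Mills",
    "Anthony Mills",
    "R A Mills",
]

def normalise(name: str) -> str:
    """Lower-case, turn full stops into spaces and collapse whitespace."""
    return " ".join(name.replace(".", " ").lower().split())

# Toy authority file: every known variant points to the one standard heading.
AUTHORITY = {normalise(v): HEADING for v in VARIANTS}

def standard_heading(name: str) -> Optional[str]:
    """Return the standard heading for a name, or None if it is not listed."""
    return AUTHORITY.get(normalise(name))

print(standard_heading("R. A. Mills"))
print(standard_heading("Sue Bird"))
```

A search on the standard heading then finds everything filed under any of the variants; a name that is not in the list simply has no heading, which is the gap a human indexer would fill.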
Google doesn’t do that
• No controlled terms
• So you must think of synonyms, different forms of name,
title abbreviations etc
• You must also define the context – that matters….
Knitting according to Google
OK, we get it. So let’s invent…
• Let’s team up with publishers so they let us search behind
their firewalls
• Let’s modify our algorithm so it excludes non-scholarly
material (how do we define that?)
• Let’s look at citations so when one article we index cites
another one we index, we can move it higher up the
relevance ranking
• Let’s link together different versions of the same article
• Let’s include library locations for full-text access
• Let’s see how it goes
But let’s not allow:
• creation of sets
• Or controlled vocabularies
• Or combining of searches
• Or hit rate figures for individual search terms
• Or proximity searching
• Or saving and e-mailing results
• Or creation of alerts
• Or standardisation of journal names/abbreviations
• Or info on what is included and what is not
• Or info on how the system decides what is scholarly
• Or an indication of update frequency – seems slower than
normal Google
Which of these statements is true?
• Google is comprehensive
• Google is all I need
• Google is up-to-date
• Google is not evil
• Google is commercial
• Google is independent
• Google is secretive
• Google wants to rule the world
• Google wants to beat Microsoft
• Google loves me
• I love Google
Google is a family
• A range of products under a common brand
• Some add value to the basic search engine; others are
nothing to do with searching
• Google Scholar is a variant of the standard search engine
• It uses a different algorithm, but we don’t know how it
differs
What’s in Google Scholar?
“Google Scholar provides a simple way to broadly
search for scholarly literature. From one place, you
can search across many disciplines and sources:
peer-reviewed papers, theses, books, abstracts and
articles, from academic publishers, professional
societies, preprint repositories, universities and other
scholarly organizations. Google Scholar helps you
identify the most relevant research across the world of
scholarly research.”
NB: only in Beta
• Launched 18 Nov 2004 but still beta - features change
• Developing in tandem with Google Books, which includes
digitised texts from Oxford collections and others
• In competition with WoK, SCOPUS etc
Content
• Algorithm to identify scholarly materials crawled by Google
from the open web
• Access to materials locked behind subscription barriers
• Must include abstract
• Full-text access requires institutional subscriptions or
individual payment, unless open-access
• Includes peer-reviewed papers, theses, books, preprints,
abstracts, full-text, citations, etc.
• Mostly post-1993?
• Updated every 2-3 months?
Library links
• Includes OpenURL links to local library holdings
• In Oxford displays as ‘Oxford Full Text’ beside title
• May need to set this up in ‘Scholar preferences’
Includes citation data
• Uses ‘citation extraction’ to build connections between
papers
• ‘Cited by’ link lists items (known to Google Scholar) that
cite the original paper
• Cited items not available online are listed with prefix
[citation]
• ‘Citation analysis’ puts the most-cited papers at the top of
the results list
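A rough sketch of how 'Cited by' counts could be derived and used for ordering. Google does not publish its method, so this is an assumption-laden toy, not the real algorithm: the mini-index `cites`, the paper ids and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical mini-index: paper id -> ids of the papers it cites.
cites = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B", "X"],  # "X" is cited but is not itself in the index
}

def cited_by_counts(cites):
    """Count, for each paper, how many indexed papers cite it."""
    counts = Counter()
    for cited_list in cites.values():
        for cited in set(cited_list):  # each citing paper counts once
            counts[cited] += 1
    return counts

def ranked_results(cites):
    """List (label, cited-by count), most-cited first, ties in id order."""
    counts = cited_by_counts(cites)
    papers = set(cites) | set(counts)
    ordered = sorted(papers, key=lambda p: (-counts[p], p))
    # Items known only from reference lists get the [citation] prefix.
    return [(p if p in cites else "[citation] " + p, counts[p]) for p in ordered]

for label, n in ranked_results(cites):
    print(label, "- cited by", n)
```

Note that the counts can only reflect papers the index knows about, which is why the same paper shows different 'Cited by' numbers in different services.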
Citation analysis
• ‘Cited by’ numbers will differ between GS, WoS and Scopus because
they are based on different literature sets
For a recent comparison see:
• Harzing, Anne-Wil K. and Ron van der Wal (2008).
Google Scholar as a new source for citation analysis.
Ethics in Science and Environmental Politics, Vol. 8: 61–73.
http://www.int-res.com/articles/esep2008/8/e008p061.pdf
From that article:
• as a general rule of thumb, we would suggest that using
GS might be most beneficial for 3 of the GS categories: (1)
business, administration, finance & economics; (2)
engineering, computer science & mathematics; (3) social
sciences, arts & humanities.
• Although broad comparative searches can be done for
other disciplines, we would not encourage heavy reliance
on GS for individual academics working in other areas
without verifying results with either Scopus or WoS.
and
Meho & Yang (2007) [found] that GS missed 40.4% of the citations found
by the union of WoS and Scopus, suggesting that GS does miss some
important refereed citations. It must also be said though that the union
of WoS and Scopus misses 61.04% of the citations in GS. Further,
Meho & Yang (2007) found that most of the citations uniquely found by
GS are from refereed sources.
The social sciences, arts and humanities, and engineering in particular
seem to benefit from GS’s better coverage of (citations in) books,
conference proceedings and a wider range of journals. The natural and
health sciences are generally well covered in ISI and hence GS might
not provide higher citation counts. In addition, user feedback … seems
to indicate that for some disciplines in the natural and health sciences
GS’s journal coverage is very patchy.
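It can seem odd that GS misses about 40% of what WoS and Scopus find while WoS and Scopus together miss about 61% of what GS finds. The toy calculation below shows how both can be true at once when two sources overlap only partly. The citation sets are invented for illustration and are not the Meho & Yang data; `miss_rate` is a hypothetical helper.

```python
def miss_rate(source, reference):
    """Fraction of the citations in `reference` that `source` does not contain."""
    return len(reference - source) / len(reference)

# Invented citation ids, chosen only to show a partial overlap.
wos_scopus = set(range(1, 11))  # citations 1..10 found by WoS or Scopus
gs = set(range(5, 21))          # citations 5..20 found by Google Scholar

print("GS misses", miss_rate(gs, wos_scopus), "of the WoS/Scopus citations")
print("WoS/Scopus miss", miss_rate(wos_scopus, gs), "of the GS citations")
```

Each percentage is measured against a different total, so the two figures need not add up to anything in particular. Neither source is a subset of the other, which is the practical argument for searching both.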
Searching
• AND implied between words as in normal Google
• + to include common words, letters or numbers that
Google’s search technology generally ignores
• “quote marks” to search for a phrase
• minus sign (-) to exclude a term from a search
• OR for either search term
• author: for author search
• intitle: to search document title
• restrict by date and publication
Advanced search screen available
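The operators above can be combined into a single search string. The sketch below assembles one in Python; `build_query` and `scholar_url` are hypothetical helpers written for this example, and the URL pattern is an assumption that should be checked against the live site before relying on it.

```python
from urllib.parse import urlencode

def build_query(words=(), phrase=None, any_of=(), exclude=(), author=None, intitle=None):
    """Assemble a search string from the operators listed on the slide."""
    parts = list(words)                     # AND is implied between plain words
    if phrase:
        parts.append(f'"{phrase}"')         # quote marks search for a phrase
    if any_of:
        parts.append(" OR ".join(any_of))   # OR matches either term
    parts.extend(f"-{w}" for w in exclude)  # minus sign excludes a term
    if author:
        parts.append(f"author:{author}")    # author search
    if intitle:
        parts.append(f"intitle:{intitle}")  # search the document title
    return " ".join(parts)

def scholar_url(query):
    # Assumed URL pattern for a Scholar search; not taken from the slides.
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})

query = build_query(words=["rhinoceros", "tusks"], any_of=["ecology", "law"], exclude=["art"])
print(query)
print(scholar_url(query))
```

Because there is no controlled vocabulary, the context terms (ecology, law, art) have to be supplied by the searcher, as in the example.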
Help screen - original version
For example
Always worth searching Google too
Let’s try
rhinoceros tusks
Context might be
• Ecology
• Law
• Medicine
• Art
etc
Alternatives to Google
• Google it!
• Try www.altsearchengines.com for specialised alternatives
• Use Intute www.intute.ac.uk for reputable human-selected
sites, chosen for a UK academic audience
• Check OxLIP+ www.oxlip-plus.ouls.ox.ac.uk for complete
listing and subject guide to university-subscribed
databases. Most list the sources they cover and use
controlled vocabularies for indexing
An example of Google’s strengths
- and weaknesses in finding a specific article:
a search done in 2005 and repeated in Nov 2006:
Conclusion
• Maintain a balanced diet!
• Five a day…
• WoK, Scopus, Intute, subject-specific database,
Google Scholar…
More help
Contact the presenters
roger.mills@ouls.ox.ac.uk
sue.bird@ouls.ox.ac.uk
Or your subject librarian (listed on
http://www.ouls.ox.ac.uk/libraries/subjects/librarians)
at any time
Happy Googling!