An evaluation of citation databases as sources of data in research
assessment exercises: A report commissioned by the Libraries of the
Australian Technology Network (LATN)
Gaby Haddow, Curtin University Library
Introduction
This report examines the coverage and functionality of Web of Science and
Scopus, the two main databases with citation information, across a range of
discipline areas. Freely accessible sources of citations, such as Publish or
Perish (which draws citation data from Google Scholar) and alternative
quantitative measures of research impact are also considered. The report
includes a literature review, feedback from a trial of Scopus, and case studies
of subject areas for which the main citation databases are inadequate in
determining the impact of research output.
A literature review of studies comparing Web of Science, Scopus and Google
Scholar presents the findings of 14 papers which compared two or more of
these sources. These findings are discussed in terms of coverage and
functionality of the databases, and the review includes general comments
made in relation to them.
Feedback from staff at LATN libraries where Scopus had been trialled was
supplemented with an analysis of citations located in Scopus and Web of
Science to publications (between 2004 and 2006) by
Curtin University researchers. This analysis provided additional data with
which to compare the databases. All subject areas of the University were
represented, to varying degrees, and the publications were then mapped
against the new Fields of Research (FoR) codes published as part of the
revised Australian and New Zealand Standard Research Classification
(ANZSRC).
Data from the mapping exercise, together with the conclusions of the
literature review, provided an indication of subject area strengths and
weaknesses in the databases and determined the selection of subject areas for
examination in the case studies component of the report.
Literature Review Comparing Sources of Citation Data
The papers included in this literature review were identified through a search
of the databases Web of Science, LISA, ProQuest, and Informit Online and
limited to publications from 2004 onwards. Papers were also located by
August 2008
examining reference lists of the search results and through personal
recommendations. In total, 14 papers were reviewed, of which ten reported
research studies1-10 and four were evaluations of citation sources.11-14 One of
the papers was an evidence summary of a research study.7 Most of the
papers (nine) were published in 2007 and 2008; the remaining five were
published in 2005 and 2006. Nine papers compared Web of Science with
Scopus and Google Scholar,1-4, 6-9, 12 three compared Web of Science with
Scopus,5, 11, 13 one paper compared Web of Science with Google Scholar,10
and one paper compared Scopus with Google Scholar.14 Publish or Perish, a
new tool which draws citation data from Google Scholar, was not discussed in
any of the papers.
The types of samples used to study the databases ranged widely, affecting
the extent to which direct comparisons can be made. Three papers examined
citations to papers in the library and information science (LIS) field2, 8, 10 and
two studies looked at title overlap.3, 5 Other samples included papers by
researchers from a single country;1 oncology and condensed matter physics
titles;7 a rare health term search;4 a single article and title;6 and papers
submitted to the 2001 Research Assessment Exercise (RAE).9
This review is divided into two parts. The first is loosely described as
coverage and includes discussion about citation numbers, subject areas,
formats of indexed materials, titles and abstracts, and time frame for each of
the citation sources. In the second part, functionality is discussed in relation to
searching, results, and updating. Finally, concluding remarks drawn from the
papers are presented to demonstrate the similarities, differences and general
attitudes to the citation sources.
Coverage
Depending upon subject area and year of publication, the citations found in
Web of Science and Scopus differ widely. More citations were found in
Scopus than in Web of Science for the health term search, 2003 oncology
papers, and the LIS papers between 1996 and 2005.4, 7, 8 Conversely, for
citations to a 1985 LIS title, the single article search, and 1993 papers from
oncology and condensed matter physics titles, Web of Science retrieved more
citations than Scopus.2, 6, 7 Three studies (examining citations to a 2000 LIS
title, title overlap, and a single title) reported no difference between the
citations found in at least one aspect of their data analysis.2, 3, 6
When Google Scholar was included in the sources for comparison, three
studies found more citations were retrieved from this source than from Web of
Science or Scopus.7, 8, 10 In the study of LIS papers these were 35% higher
than Web of Science and Scopus combined.8 However, many of the studies
reported the duplication of citations listed in Google Scholar and the fact that
it includes citations to non-scholarly materials, such as pre-prints and
technical reports.1, 2, 4, 7, 8, 10 Google Scholar retrieved fewer citations than
Web of Science and Scopus in two studies.4, 6
The influence of subject area, format and age of publication on citation
numbers is clear in several of the studies. Scopus appears to be strong in
scientific fields, but weaker in the arts and humanities than Web of Science.3,
5, 6 Also, Web of Science was found to have a higher proportion of social
science titles.6 These generalisations should be tempered with the findings of
individual subject searches, such as unique citations to condensed matter
physics papers, which were higher in Web of Science than in Scopus.7 The
findings are also affected by the format of materials indexed by the
databases. Scopus indexes more conference proceedings than Web of
Science,1, 6, 13 particularly in computer science, engineering, physics and
chemistry and was found to have the most citations to conference papers by
LIS authors.8 Google Scholar is considered excellent for its coverage of
conference papers and non-English language journals.8 Scopus indexes
books in some science fields and Web of Science indexes monographic
series.6, 8 Both Web of Science and Scopus are weak in their coverage of
journals from non-English language countries,9 and no early online papers
were located in a health term search of the databases.4
In terms of citation numbers, Web of Science performed better than Scopus
for older materials,2, 9 while Google Scholar outperformed both Web of
Science and Scopus for newer materials.2 To some extent these findings
reflect the history of the databases. The coverage of Web of Science (through
the Science Citation Index) extends back to 1945, with the Social Sciences
Citation Index and the Arts & Humanities Citation Index following in the next
three decades.5 Scopus indexes articles back to 1966, but this coverage is
incomplete.5 Citation indexing in Scopus starts in 1996.1, 4, 13
Overall, Scopus indexes the most titles and includes more records than Web
of Science.1, 3, 5, 6, 11, 13 Google Scholar cannot be compared to these
databases because it is not selective in terms of indexing and accesses many
more, but unknown, sources than Scopus and Web of Science.10, 14 One
paper noted that Scopus is strong in biomedical sciences and geosciences3
and another that Scopus is stronger in life sciences than Web of Science,
borne out in the findings of the two studies that looked at health-related topics.4,
7 Scopus is weak in sociology, physics and astronomy,3 while Web of Science
is stronger in chemistry and physics, possibly reflected in the findings for the
condensed matter physics titles.7 Interviews with academic and student users
of the databases found that those in the sciences preferred Scopus, while arts
and humanities users preferred Google Scholar.3
Functionality
The table below presents the key differences between Web of Science and
Scopus in relation to functionality.

Author identification
  Web of Science: Easy to browse author names12
  Scopus: Unique author identifiers3
Truncation
  Web of Science: Truncation requires a symbol13
  Scopus: Automatic truncation13
Searching within references
  Web of Science: ‘Cited Reference Search’ function9
  Scopus: Available, but not as effective as Web of Science9
Refining searches
  Web of Science: Graphics capabilities excellent4
  Scopus: Good refine bar3
Analysis tools
  Web of Science: More options for citation analysis by institution3
  Scopus: ‘Citation Tracker’ function3
Google Scholar retrieved results faster than the other databases3, 10 and did
not limit keywords or language used in a search,4 but Boolean operators were
not as effective.1, 6
When results retrieved by the citation sources were examined, Scopus was
criticised for limiting the number of results retrieved14 and for omitting a
number of relevant articles from a title keyword search.4 The ability to sort
results in Web of Science by country, institution, name and language was
noted.13 However, Scopus was reported to give better results for a simple
search using a single health term.11 Two papers noted the inaccuracy and
duplication of results in Google Scholar,1, 12 and Google Scholar results were
rated less highly than Web of Science and Scopus by the academics and
students interviewed.3
Web of Science is updated daily, Scopus once or twice each week, and
Google Scholar monthly.4 However, one paper noted that no updates of
Google Scholar occurred for six months in one year.6
Other comments
The Scopus interface was preferred to Web of Science by interviewees,
followed by Google Scholar,3 and Scopus was judged the best database in
one paper.9 However, neither Scopus nor Web of Science is the best source
for all citation needs and their performance and usefulness differed
depending upon subject area and publication years.7 Two studies concluded
that Scopus and Web of Science complement each other and libraries would
want both, funding permitting.8, 11 Although Scopus has ‘made impressive
progress’ in coverage and functionality since its inception,5 Web of Science is
closest to providing pan-discipline coverage.6 A number of papers speculated
about the usefulness of Google Scholar, for the most part because of its
inaccuracy and unknown coverage,4 questioning whether it should be used by
clinicians or to measure scholarly activity.9, 14 Concerns
relating to ranking and assessment of journals and academics were raised,
with some papers suggesting that subsequent analyses, such as impact
factor and h-index calculations, will vary with the database used to source
citations.1, 8
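The concern that derived metrics inherit the quirks of their citation source can be made concrete with a short sketch. The citation counts below are invented purely for illustration; the h-index logic itself (the largest h such that h papers each have at least h citations) is the standard definition.

```python
# h-index: largest h such that h papers each have at least h citations.
def h_index(citation_counts):
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for the same ten papers as reported by
# two different databases (invented figures, for illustration only).
wos_counts = [12, 9, 7, 5, 4, 3, 2, 1, 0, 0]
scopus_counts = [15, 11, 8, 6, 6, 4, 2, 2, 1, 0]

print(h_index(wos_counts))     # 4
print(h_index(scopus_counts))  # 5
```

The same publication list yields a different h-index depending solely on which database supplied the counts, which is the concern raised in the papers cited above.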
Currently, there is no single source of citation data that is comprehensive
across all subject areas, formats and time periods. In the author’s opinion,
this is unlikely to change in the future. Google Scholar may improve its
functionality, but a great deal remains unknown about its coverage. For
reliable and wide coverage of citation data, libraries of large academic
institutions with a broad range of research areas will have to consider
subscribing to both Web of Science and Scopus. For smaller, more
specialised research institutions, this review will assist in making an informed
decision about the most useful database.
Library Staff Experience of Scopus and Web of Science
This part of the report is based on information sought from LATN staff about
their experience with Scopus and Web of Science, either through trials or
ongoing subscription. Feedback was received from five libraries, including a
library that subscribes to both databases and a library that conducted a recent
trial of Scopus. It should be noted that much of the feedback received was
based on conducting a ‘Cited Reference Search’ in Web of Science. The
feedback has been grouped into the following areas: coverage, functionality, and
ease of use. It lists the positive and negative aspects of the databases, and
similarities and differences between them where feedback was provided. In
addition, a comparison of the citations found in the databases as a result of
an exercise conducted at Curtin University Library is presented against the
new FoR codes associated with Excellence in Research for Australia (ERA)
groups.
Coverage

Web of Science
  +ve: Citations to books, chapters, and conference papers
       Coverage of titles over a longer period, dependent on subscription
       (1900+ available)
  -ve: Lack of conference papers indexed
       Lack of Australian journals indexed and US bias

Scopus
  +ve: Coverage of conference papers
       50% of titles from Europe, the Middle East, and Africa
       Post-1996 coverage greater than WoS (20-45%)
  -ve: Citations listed for indexed resources only
       Coverage pre-1996 is uneven (5-15% less than WoS)

Similarities: Missing issues and articles
Differences:  Titles indexed - WoS 10,000; Scopus 15,000
              Documents - WoS ~35 million; Scopus ~28 million
Functionality

Web of Science
  +ve: Does not list items with zero citations
       Alphabetical order of results by journal title
       Citations to non-journal sources are listed
  -ve: Few options to assist in common name searching
       Does not allow secondary sorting of results
       Journal abbreviations difficult to guess

Scopus
  +ve: Affiliation searching useful
       Secondary sorting, refining and limiting of results
       Keyword and author search to limit results
       Variations of author names co-located
  -ve: Lists items with zero citations
       Different results when different searches for the same author were
       conducted

Similarities: Publication year limit search
Ease of use

Web of Science
  +ve: Expanded titles for additional information
  -ve: More difficult to search
       Lack of bibliographic details for some entries leads to uncertainty
       about the citing source

Scopus
  +ve: Clean layout and easy-to-read bibliographic details
General comments
In addition, library staff reported:
 Almost 60% of citations are in both Web of Science and Scopus, with the
remaining 40% unique to one source or the other;
 Scopus has more searching options;
 Scopus is easier to use;
 The Scopus interface is preferred;
 Staff are familiar with Web of Science; and
 Scopus would be good to have as a complementary product.
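The 60/40 split reported by staff is a set-overlap calculation; a minimal sketch of it, using invented citation-record identifiers, might look like this:

```python
# Citing records found for the same set of publications in each database
# (identifiers are invented placeholders for illustration).
wos_cites = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}
scopus_cites = {"c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10"}

all_cites = wos_cites | scopus_cites   # union across both sources
shared = wos_cites & scopus_cites      # citations found in both

overlap_pct = 100 * len(shared) / len(all_cites)
unique_pct = 100 - overlap_pct

print(f"in both databases: {overlap_pct:.0f}%")    # 60%
print(f"unique to one source: {unique_pct:.0f}%")  # 40%
```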
Subscription cost
The cost of subscribing to the databases differs according to criteria such as
years of coverage (for Web of Science) and discounting by the vendors. An
exact price for an annual Scopus subscription cannot be published due to
commercial confidentiality.
Citations by FoR codes
The data below are a sample of the results found when Curtin Library staff
undertook a citation search in Scopus and Web of Science for academics’
publications between 2004 and 2006. A wider analysis of these results will be
reported in the ‘Case Studies’ component of this report.
ERA Cluster & FoR                                              WoS   Scopus
Cluster 1 Physical, Chemical & Earth Sciences
  0402                                                         719      733
Cluster 2 Humanities & Creative Arts
  2002                                                           4        1
Cluster 3 Engineering & Environmental Sciences
  0599                                                           7        9
  0907                                                           9       20
  0903                                                           9       15
Cluster 4 Social, Behavioural & Economic Sciences
  1503                                                           6       22
  1506                                                          29        3
  1605                                                          10        4
  1301/2                                                         4       10
  1701                                                          75       92
  1303                                                          14       50
  1399                                                          17       27
  1504                                                          26       15
Cluster 5 Mathematics, Information & Communication Sciences
  0806                                                          91      158
  0906                                                          10       36
Cluster 6 Biological Sciences & Biotechnology
  0607                                                         407      411
  0602                                                          49       66
  0704                                                           9        7
  0703                                                           8       14
Cluster 7 Biomedical & Clinical Research
  1101                                                          39       53
  1115                                                          12        3
  0605                                                         106      138
Cluster 8 Public & Allied Health & Health Sciences
  1110                                                           7       42
  1117                                                         217      301
In a number of FoR areas the citations found in each database were the
same. For most FoRs, citations found in Scopus were higher than those
identified in Web of Science. It should be noted that the data were collected
by 10 staff members over a two-week period and, due to variations in
searching, some inconsistencies may have occurred and consequently
affected the results.
Feedback from librarians
In terms of coverage of the two databases, Scopus has the advantage of
indexing conference proceedings, which may be reflected in the data for
Cluster 5 where conference proceedings are a common form of research
output. Scopus also indexes more titles overall, but over a shorter time
period. See the ‘Literature Review’ section of this report for a more detailed
discussion of coverage.
Scopus appears to offer more functionality in relation to searching and
general ease of use. However, the ‘Cited Reference Search’ in Web of
Science does enable searching for citations to non-journal sources – a
function not available in Scopus.
From the general comments made and the subscription costs of the
databases, it might be concluded that library staff have become accustomed
to the Web of Science product and that, despite a preference for the Scopus
interface and functionality, there is insufficient support for a switch to Scopus
and insufficient funds to subscribe to both products.
Case Studies in Humanities and Social Sciences
Based on the findings of the literature review and library staff feedback,
subject areas in arts and humanities were selected for the case studies.
Three subject areas comprised the sample, which was drawn from a list of
publications by Curtin researchers between 2004 and 2006. The subject
areas with their Field of Research code(s) are:
 Art - FoR 1901/1905
 Cultural Studies - FoR 2002
 Literature (Australian and English) - FoR 2005
To identify alternative measures of research impact in these subject areas,
the following sources were examined:
 Publish or Perish
 Amazon
 Library holdings
In total, 130 research outputs in the form of books, book chapters, journal
articles, conference papers, exhibition catalogues, and newspaper articles
were included in the analysis. Citations to only seven of these items (5.4%)
were found in the major citation databases, Web of Science and Scopus – six
in Web of Science and one in Scopus. The subject areas represented in
these citations were cultural studies (four publications) and literature (three
publications). These results support the literature review findings that Web of
Science provides better coverage of arts and humanities than Scopus.
When the items were checked for citations in Publish or Perish (which draws
its data from Google Scholar), the results were only marginally better, with
eight items (6.1%) found to have citations against them. Two items, a
journal article (literature) and a book (cultural studies), had seven and nine
citations respectively in Publish or Perish, whereas in Web of Science and
Scopus they received none. The issues related to relying on Publish or Perish
data were documented in the literature review section of this report, and the
findings of this examination suggest it is not a viable alternative for arts and
humanities research output.
Publications with citations in Web of Science, Scopus and Publish or Perish

                    WoS          Scopus       PoP
                    n     %*     n     %*     n     %*
Literature          3     2.3    0     0      1     0.7
Cultural Studies    3     2.3    1     0.7    7     5.4
Total               6     4.6    1     0.7    8     6.1
* Percentage calculated from total of 130 items
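The percentages in the table above are simple proportions of the 130-item sample; the calculation can be sketched as follows (item counts are those reported in this section; note that 1/130 is about 0.77% and 8/130 about 6.15%, which the table records as 0.7 and 6.1, apparently truncated rather than rounded):

```python
# Share of the 130 sampled research outputs with at least one citation
# in each source (counts taken from the case-study findings above).
TOTAL_ITEMS = 130

items_with_citations = {
    "Web of Science": 6,
    "Scopus": 1,
    "Publish or Perish": 8,
}

for source, n in items_with_citations.items():
    pct = 100 * n / TOTAL_ITEMS
    print(f"{source}: {n} of {TOTAL_ITEMS} items ({pct:.2f}%)")
```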
A search of the books subset of the online bookstore Amazon can identify
in-text and bibliographic citations for some authors. The difficulty of this search
is in the exact identification of cited items. Amazon restricts access to much of
the full-text and provides very short excerpts of the context of a citation in the
results list. Unless the searcher is familiar with the work of a cited author,
these excerpts are not sufficient to identify a work in many instances. In a
number of cases, excerpts against books clearly identified the cited work, but
some did not. The Amazon book search found citations to 12 publications
from the sample of 130 (9.2%); four from literature and eight from cultural
studies, presented below.
Publications with citations in Amazon book search

                    n     %*
Literature          4     3
Cultural studies    8     6
Total               12    9
* percentage rounded and calculated from total of 130 items
An alternative to citations as a measure of research impact is the number of
libraries that hold a publication. While this measure may be dependent on
publisher activities (for example promotion strategies) and library acquisitions
activities (such as approval plans with certain publishers) it could also be a
reflection of library client demand. There is no real way of knowing exactly
what library holdings demonstrate, but they do offer a numeric indicator.
Library holdings data are not available for journal and newspaper articles. In
the sample of 130 publications, 50 items (38%) were in these formats, leaving
80 items (62%) that were books, book chapters, conference papers or
exhibition catalogues. WorldCat and Libraries Australia were the two sources
used to identify holdings data. Libraries Australia listed holdings for 35 of the
80 publications (44%) and WorldCat listed 30 (37%). The table below
presents the findings for subject areas of the items with holdings data.
Publications with library holdings

                    Libraries Australia    WorldCat
                    n     %*               n     %*
Art                 6     8                3     4
Literature          10    12               10    12
Cultural studies    19    24               17    21
Total               35    44               30    37
* Percentage rounded and calculated from total of 80 items
The analysis of publications in the subject areas art, literature and cultural
studies indicates that the main citation databases and Publish or Perish
(Google Scholar data) do not provide sufficient data with which to measure
impact. A slightly higher number of publications were found to have citations
in the Amazon book search, but the functionality of an Amazon search means
results should be treated with caution. Citations to publications in the art
subject area were not found in any of these sources. Library holdings provide
the only data available for art publications and, overall, library holdings data
capture more publications in all these subject areas.
In previous discussions about how to measure the impact of research outputs
in arts and humanities, other measures have been touted, such as ranked
tiers of publishers and art exhibition venues, and numbers of exhibition
visitors and book reviews (see, for example, a 1998 report by Dennis Strand
and a Quality Metrics report for the RQF in 2006). These alternatives are not
examined in this report and research on these measures has not been
located. They all present distinct difficulties in terms of reliability of the
measure, but perhaps not greatly different to reliance on citations data from
the major citation databases. That is, at some stage in the process a
judgement of quality is made. In the case of the citation databases it is made
in the selection of source materials for indexing; for ranked tiers of publishers
and exhibition venues it is made at the point that the scholarly community
assigns rankings. For book
reviews, impact may be a product of a publisher’s promotion strategy rather
than the quality of the book.
As this case study report indicates, there are few measures that adequately
capture the impact of research outputs in arts and humanities. Art is
particularly poorly served in comparison to literature and cultural studies, with
the latter having a higher profile across all sources of data examined in this
exercise.
Conclusion
In conclusion, the findings of the three components of this report provide
librarians with information with which to offer advice to research offices and
make in-house subscription decisions. However, these findings have not
advanced our knowledge of the area to the extent that the discussion can be
closed. Given the nature of the subject matter, citations and measures of
research impact, and the time and words already expended on attempts at
clarification, this is perhaps not surprising. Reliable and acceptable measures
of research impact continue to challenge governments and researchers in the
field.
What the report has achieved is a co-location of a range of concerns, such as
the preferred source of citation data and what alternatives exist for poorly
indexed subject areas. The literature review and feedback components
indicate most science areas are relatively well indexed by Scopus or Web of
Science, or both. According to the literature reviewed, and supported by the
findings of the citation analysis exercise conducted at Curtin University
Library, much of the arts and humanities are not as well served by these
major databases, although Web of Science is superior to Scopus in this
regard. The findings have indicated that while Scopus is generally viewed as
the preferred interface, Web of Science has distinct advantages, and
therefore the databases are complementary. Most libraries with available
funding would want to subscribe to both databases. In terms of alternative
quantitative measures of research impact, the report has shown that there are
few available to subject areas in arts and humanities. Library holdings,
although untested on a broader scale, appear to offer the best alternative for
non-serial publications in these discipline areas.
Papers included in Literature Review
1 Bar-Ilan J. Which h-index? - a comparison of WoS, Scopus and Google
Scholar. Scientometrics. 2008; 74(2):257-271.
2 Bauer K, Bakkalbasi N. An examination of citation counts in a new
scholarly communication environment. D-Lib Magazine. 2005; 11(9):[np].
3 Bosman J, van Mourik I, Rasch M, Sieverts E, Verhoeff H. Scopus
Reviewed and Compared: The Coverage and Functionality of the Citation
Database Scopus, Including Comparisons with Web of Science and
Google Scholar. Utrecht: Utrecht University Library, 2006. [Online at:
http://igitur-archive.library.uu.nl/DARLIN/2006-1220-200432/UUindex.html]
4 Falagas ME, Pitsouni EI, Malietzis GA, Pappas G. Comparison of
PubMed, Scopus, Web of Science, and Google Scholar: strengths and
weaknesses. Faseb Journal. 2008; 22(2):338-342.
5 Gavel Y, Iselid L. Web of Science and Scopus: a journal title overlap
study. Online Information Review. 2008; 32(1):8-21.
6 Jacso P. As we may search - Comparison of major features of the Web of
Science, Scopus, and Google Scholar citation-based and citation-enhanced
databases. Current Science. 2005; 89(9):1537-1547.
7 Kloda LA. Use Google Scholar, Scopus and Web of Science for
comprehensive citation tracking. Evidence Based Library and Information
Practice. 2007; 2(3):87-90.
8 Meho LI, Yang K. Impact of data sources on citation counts and rankings
of LIS faculty: Web of Science versus Scopus and Google Scholar.
Journal of the American Society for Information Science and Technology.
2007; 58(13):2105-2125.
9 Norris M, Oppenheim C. Comparing alternatives to the Web of Science for
coverage of the social sciences' literature. Journal of Informetrics. 2007;
1(2):161-169.
10 Vaughan L, Shaw D. A new look at evidence of scholarly citation in
citation indexes and from web sources. Scientometrics. 2008; 74(2):317-330.
11 Burnham JF. Scopus database: A review. Biomedical Digital Libraries.
2006; 3(1). [Online at: http://www.bio-diglib.com/content/3/1/1]
12 Jacso P. The plausibility of computing the h-index of scholarly productivity
and impact using reference-enhanced databases. Online Information
Review. 2008; 32(2):266-283.
13 Libmann F. Web of Science, Scopus, and classical online: philosophies of
searching. Online. 2007; 31(3):36-40.
14 Smith AG. Benchmarking Google Scholar with the New Zealand PBRF
research assessment exercise. Scientometrics. 2008; 74(2):309-316.
Other references
(2006) Research Quality Framework: Assessing the Quality and Impact of
Research in Australia - Quality Metrics. [Online at:
http://www.dest.gov.au/NR/rdonlyres/EC11695D-B59D-4879-A84D87004AA22FD2/14099/rqf_quality_metrics.pdf]
Strand, D. (1998) Research in the Creative Arts. Canberra: Dept. of Employment,
Education, Training and Youth Affairs. [Online at:
http://www.dest.gov.au/archive/highered/eippubs/eip98-6/eip98-6.pdf]