An evaluation of citation databases as sources of data in research assessment exercises

A report commissioned by the Libraries of the Australian Technology Network (LATN)

Gaby Haddow, Curtin University Library

Introduction

This report examines the coverage and functionality of Web of Science and Scopus, the two main databases with citation information, across a range of discipline areas. Freely accessible sources of citations, such as Publish or Perish (which draws citation data from Google Scholar), and alternative quantitative measures of research impact are also considered. The report includes a literature review, feedback from a trial of Scopus, and case studies of subject areas for which the main citation databases are inadequate in determining the impact of research output.

A literature review of studies comparing Web of Science, Scopus and Google Scholar presents the findings of 14 papers which compared two or more of these sources. These findings are discussed in terms of coverage and functionality of the databases, and the review includes general comments made in relation to them. Feedback gained from LATN library staff at libraries where Scopus had been trialled was supplemented with an analysis of citations located in Scopus and Web of Science to publications (between 2004 and 2006) by Curtin University researchers. This analysis provided additional data with which to compare the databases. All subject areas of the University were represented, to varying degrees, and the publications were mapped against the new Fields of Research (FoR) codes published as part of the revised Australian and New Zealand Standard Research Classification (ANZSRC). Data from the mapping exercise and the conclusions of the literature review provided an indication of subject area strengths and weaknesses in the databases, and determined the selection of subject areas examined in the case studies component of the report.
Literature Review Comparing Sources of Citation Data

The papers included in this literature review were identified through a search of the databases Web of Science, LISA, ProQuest, and Informit Online, limited to publications from 2004 onwards. Papers were also located by examining reference lists of the search results and through personal recommendations. In total, 14 papers were reviewed, of which ten reported research studies1-10 and four were evaluations of citation sources.11-14 One of the papers was an evidence summary of a research study.7 Most of the papers (nine) were published in 2007 and 2008; the remaining five were published in 2005 and 2006. Nine papers compared Web of Science with Scopus and Google Scholar,1-4, 6-9, 12 three compared Web of Science with Scopus,5, 11, 13 one paper compared Web of Science with Google Scholar,10 and one paper compared Scopus with Google Scholar.14 Publish or Perish, a new tool which draws citation data from Google Scholar, was not discussed in any of the papers.

The types of samples used to study the databases ranged widely, affecting the extent to which direct comparisons can be made. Three papers examined citations to papers in the library and information science (LIS) field2, 8, 10 and two studies looked at title overlap.3, 5 Other samples included: papers by researchers from a single country;1 oncology and condensed matter physics titles;7 a rare health term search;4 a single article and title;6 and papers submitted to the 2001 Research Assessment Exercise (RAE).9

This review is divided into two parts. The first is loosely described as coverage and includes discussion of citation numbers, subject areas, formats of indexed materials, titles and abstracts, and the time frame covered by each of the citation sources. In the second part, functionality is discussed in relation to searching, results, and updating.
Finally, concluding remarks drawn from the papers are presented to demonstrate the similarities, differences and general attitudes to the citation sources.

Coverage

Depending upon subject area and year of publication, the citations found in Web of Science and Scopus differ widely. More citations were found in Scopus than in Web of Science for the health term search, 2003 oncology papers, and the LIS papers published between 1996 and 2005.4, 7, 8 Conversely, Web of Science retrieved more citations than Scopus for a 1985 LIS title, the single article search, and 1993 papers from oncology and condensed matter physics titles.2, 6, 7 Three studies (examining citations to a 2000 LIS title, title overlap, and a single title) reported no difference between the citations found in at least one aspect of their data analysis.2, 3, 6 When Google Scholar was included in the sources for comparison, three studies found more citations were retrieved from this source than from Web of Science or Scopus.7, 8, 10 In the study of LIS papers these were 35% higher than Web of Science and Scopus combined.8 However, many of the studies reported duplication of citations listed in Google Scholar and the fact that it includes citations to non-scholarly materials, such as pre-prints and technical reports.1, 2, 4, 7, 8, 10 Google Scholar retrieved fewer citations than Web of Science and Scopus in two studies.4, 6

The influence of subject area, format and age of publication on citation numbers is clear in several of the studies. Scopus appears to be strong in scientific fields, but weaker in the arts and humanities than Web of Science.3, 5, 6 Also, Web of Science was found to have a higher proportion of social science titles.6 These generalisations should be tempered with the findings of individual subject searches, such as unique cites to condensed matter physics papers, which were higher in Web of Science than in Scopus.7

The findings are also affected by the format of materials indexed by the databases. Scopus indexes more conference proceedings than Web of Science,1, 6, 13 particularly in computer science, engineering, physics and chemistry, and was found to have the most citations to conference papers by LIS authors.8 Google Scholar is considered excellent for its coverage of conference papers and non-English language journals.8 Scopus indexes books in some science fields and Web of Science indexes monographic series.6, 8 Both Web of Science and Scopus are weak in their coverage of journals from non-English language countries,9 and no early online papers were located in a health term search of the databases.4 In terms of citation numbers, Web of Science performed better than Scopus for older materials,2, 9 while Google Scholar outperformed both Web of Science and Scopus for newer materials.2

To some extent these findings reflect the history of the databases. Web of Science (as the Science Citation Index) was established in 1945, with the Social Science Citation Index and the Arts & Humanities Citation Index following in the next three decades.5 Scopus indexes articles back to 1966, but this coverage is incomplete.5 The citation indexing in Scopus starts in 1996.1, 4, 13 Overall, Scopus indexes the most titles and includes more records than Web of Science.1, 3, 5, 6, 11, 13 Google Scholar cannot be compared to these databases because it is not selective in terms of indexing and accesses many more, but unknown, sources than Scopus and Web of Science.10, 14 One paper noted that Scopus is strong in biomedical sciences and geosciences3 and another that Scopus is stronger in life sciences than Web of Science, borne out in the findings of the two studies that looked at health-related topics.4, 7 Scopus is weak in sociology, physics and astronomy,3 while Web of Science is stronger in chemistry and physics, possibly reflected in the findings for the condensed matter physics titles.7 Interviews with academic and student users of the databases found that those in the sciences preferred Scopus, while arts and humanities users preferred Google Scholar.3

Functionality

The table below presents the key differences and comparisons between Web of Science and Scopus in relation to functionality.
Author identification: Web of Science allows easy browsing of author names;12 Scopus provides unique author identifiers.3
Truncation: Web of Science requires a truncation symbol, whereas truncation in Scopus is automatic.13
Searching within references: Web of Science offers the 'Cited Reference Search' function; this is available in Scopus, but is not as effective.9
Refining searches: Web of Science has excellent graphics capabilities;4 Scopus has a good refine bar.3
Analysis tools: Web of Science offers more options for citation analysis by institution;3 Scopus offers the 'Citation Tracker' function.3

Google Scholar retrieved results faster than the other databases3, 10 and did not limit keywords or language used in a search,4 but Boolean operators were not as effective.1, 6 When results retrieved by the citation sources were examined, Scopus was criticised for limiting the number of results retrieved,14 and a title keyword search of Scopus omitted a number of relevant articles.4 The ability to sort results in Web of Science by country, institution, name and language was noted.13 However, Scopus was reported to give better results for a simple search using a single health term.11 Two papers noted the inaccuracy and duplication of results in Google Scholar,1, 12 and Google Scholar results were rated less highly than those of Web of Science and Scopus by the academics and students interviewed.3

Web of Science is updated daily, Scopus once or twice each week, and Google Scholar monthly.4 However, one paper noted that no updates of Google Scholar occurred for six months in one year.6

Other comments

The Scopus interface was preferred to Web of Science by interviewees, followed by Google Scholar,3 and Scopus was judged the best database in one paper.9 However, neither Scopus nor Web of Science is the best source for all citation needs, and their performance and usefulness differed depending upon subject area and publication years.7 Two studies concluded that Scopus and Web of Science complement each other and libraries would want both, funding permitting.8, 11 Although Scopus has 'made impressive progress' in coverage and functionality since
its inception,5 Web of Science is closest to providing pan-discipline coverage.6 A number of papers speculated about the usefulness of Google Scholar, for the most part because of its inaccuracy and unknown coverage,4 questioning whether Google Scholar should be used by clinicians or to measure scholarly activity.9, 14 Concerns relating to the ranking and assessment of journals and academics were raised, with some papers suggesting that subsequent analyses, such as impact factor and h-index calculations, will vary with the database used to source citations.1, 8

Currently, there is no one source of citations data which is comprehensive across all subject areas, formats and time periods. In the author's opinion, this is unlikely to change in the future. It is possible that Google Scholar will improve its functionality, but a great deal remains unknown about its coverage. For reliable and wider coverage of citations data, libraries of large academic institutions with a broad range of research areas will have to consider subscribing to both Web of Science and Scopus. For smaller, more specialised research institutions, this review will assist in making an informed decision about the most useful database.

Library Staff Experience of Scopus and Web of Science

This part of the report is based on information sought from LATN staff about their experience with Scopus and Web of Science, either through trials or ongoing subscription. Feedback was received from five libraries, including a library that subscribes to both databases and a library that conducted a recent trial of Scopus. It should be noted that much of the feedback received was based on conducting a 'Cited Reference Search' in Web of Science. The feedback has been grouped into the following areas: coverage, functionality, and ease of use. It lists the positive and negative aspects of the databases, and similarities and differences between them where feedback was provided.
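The observation above that h-index calculations vary with the database used to source citations can be made concrete. The sketch below computes an h-index from a list of per-paper citation counts; the two lists are hypothetical, standing in for the differing counts a researcher's papers might attract in two citation databases.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank  # this paper still supports an h-index of `rank`
        else:
            break
    return h

# Hypothetical citation counts for the same six papers, as they might be
# reported by two different citation databases.
counts_db_a = [12, 9, 7, 5, 3, 1]
counts_db_b = [14, 6, 3, 2, 1, 1]

print(h_index(counts_db_a))  # 4
print(h_index(counts_db_b))  # 3
```

Even with identical papers, the differing coverage of the two hypothetical databases yields different h-index values, which is the concern several of the reviewed papers raise.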
In addition, a comparison of the citations found in the databases as a result of an exercise conducted at Curtin University Library is presented against the new FoR codes associated with Excellence in Research for Australia (ERA) groups.

Coverage

Web of Science
+ve: Citations to books, chapters, and conference papers; coverage of titles over a longer period, dependent on subscription (1900+ available)
-ve: Lack of conference papers indexed; lack of Australian journals indexed, and US bias

Scopus
+ve: Coverage of conference papers; 50% of titles from Europe, the Middle East, and Africa; post-1996 coverage greater than WoS (20-45%)
-ve: Citations listed for indexed resources only; pre-1996 coverage is uneven (5-15% less than WoS)

Similarities: Missing issues and articles
Differences: Titles indexed = 10,000 (WoS) versus 15,000 (Scopus); documents = ~35 million (WoS) versus ~28 million (Scopus)

Functionality

Web of Science
+ve: Does not list items with zero citations; alphabetical order of results by journal title; citations to non-journal sources are listed
-ve: Few options to assist in common name searching; does not allow secondary sorting of results; journal abbreviations difficult to guess

Scopus
+ve: Affiliation searching useful; secondary sorting, refining and limiting of results; keyword and author search to limit results; variations of author names co-located; lists items with zero citations
-ve: Different results when different searches for the same author were conducted

Similarities: Publication year limit search

Ease of use

Web of Science
+ve: Expanded titles for additional information
-ve: More difficult to search; lack of bibliographic details for some entries leads to uncertainty about the citing source

Scopus
+ve: Clean layout and easy-to-read bibliographic details

General comments

In addition, library staff reported: almost 60% of citations are in both Web of Science and Scopus, with the remaining 40% unique to one source or the other; Scopus has more searching options; Scopus is easier to use; the Scopus interface is preferred; staff are familiar
with Web of Science; and Scopus would be good to have as a complementary product.

Subscription cost

The cost of subscribing to the databases differs according to criteria such as years of coverage (for Web of Science) and discounting by the vendors. An exact price for an annual Scopus subscription cannot be published due to commercial confidentiality.

Citations by FoR codes

The data below are a sample of the results found when Curtin Library staff undertook a citation search for academics' publications between 2004 and 2006 in Scopus and Web of Science. A wider analysis of these results will be reported in the 'Case Studies' component of this report.

ERA cluster and FoR code                                      WoS     Scopus
Cluster 1: Physical, Chemical & Earth Sciences
  0402                                                        719     733
Cluster 2: Humanities & Creative Arts
  2002                                                        4       1
Cluster 3: Engineering & Environmental Sciences
  0599                                                        7       9
  0907                                                        9       9
  0903                                                        20      15
Cluster 4: Social, Behavioural & Economic Sciences
  1503                                                        6       29
  1506                                                        10      4
  1605                                                        75      14
  1301/2                                                      17      26
  1701                                                        22      3
  1303                                                        4       10
  1399                                                        92      50
  1504                                                        27      15
Cluster 5: Mathematics, Information & Communication Sciences
  0806                                                        91      10
  0906                                                        158     36
Cluster 6: Biological Sciences & Biotechnology
  0607                                                        407     49
  0602                                                        9       8
  0704                                                        411     66
  0703                                                        7       14
Cluster 7: Biomedical & Clinical Research
  1101                                                        39      12
  1115                                                        106     53
  0605                                                        3       138
Cluster 8: Public & Allied Health & Health Sciences
  1110                                                        7       217
  1117                                                        42      301

In a number of FoR areas the citations found in each database were the same. Citations found in Scopus were, for most FoRs, higher than those identified in Web of Science. It should be noted that the data were collected by 10 staff members over a two-week period, and due to variations in searching some inconsistencies may have occurred and consequently affected the results.

Feedback from librarians

In terms of coverage of the two databases, Scopus has the advantage of indexing conference proceedings, which may be reflected in the data for Cluster 5, where conference proceedings are a common form of research output.
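The library staff estimate noted earlier, that almost 60% of citations appear in both Web of Science and Scopus with the remaining 40% unique to one source or the other, is a straightforward set calculation once citing records from the two databases have been matched. The sketch below uses hypothetical record identifiers to show how such overlap percentages can be derived.

```python
def citation_overlap(wos_records, scopus_records):
    """Percentage of all distinct citing records found in both databases,
    and the percentage unique to one database or the other."""
    wos, scopus = set(wos_records), set(scopus_records)
    union = wos | scopus    # every distinct citing record found anywhere
    shared = wos & scopus   # records found in both databases
    return {
        "shared_pct": 100 * len(shared) / len(union),
        "unique_pct": 100 * len(union - shared) / len(union),
    }

# Hypothetical matched record identifiers from each database.
wos_ids = ["r01", "r02", "r03", "r04", "r05", "r06", "r07", "r08"]
scopus_ids = ["r03", "r04", "r05", "r06", "r07", "r08", "r09", "r10"]

print(citation_overlap(wos_ids, scopus_ids))
# {'shared_pct': 60.0, 'unique_pct': 40.0}
```

In practice the difficult step is the matching itself (variant titles, author name forms and page ranges between databases), not the arithmetic.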
Scopus also indexes more titles overall, but over a shorter time period. See the 'Literature Review' section of this report for a more detailed discussion of coverage. Scopus appears to offer more functionality in relation to searching and general ease of use. However, the 'Cited Reference Search' in Web of Science does enable searching for citations to non-journal sources – a function not available in Scopus. From the general comments made and the subscription costs of the databases, it might be concluded that library staff have become accustomed to the Web of Science product and, despite a preference for the Scopus interface and functionality, there is insufficient support for a switch to Scopus and insufficient funds to support subscriptions to both products.

Case Studies in Humanities and Social Sciences

Based on the findings of the literature review and library staff feedback, subject areas in arts and humanities were selected for the case studies. Three subject areas comprised the sample, which was drawn from a list of publications, between 2004 and 2006, by Curtin researchers. The subject areas, with their Field of Research code(s), are:

Art - FoR 1901/1905
Cultural Studies - FoR 2002
Literature (Australian and English) - FoR 2005

To identify alternative measures of research impact in these subject areas, the following sources were examined:

Publish or Perish
Amazon
Library holdings

In total, 130 research outputs in the form of books, book chapters, journal articles, conference papers, exhibition catalogues, and newspaper articles were included in the analysis. Citations to only seven of these items (5.4%) were found in the major citation databases, Web of Science and Scopus – six in Web of Science and one in Scopus. The subject areas represented in these citations were cultural studies (four publications) and literature (three publications).
These results support the literature review findings that Web of Science provides better coverage of arts and humanities than Scopus. When the items were checked for citations in Publish or Perish (which draws its data from Google Scholar) the results were only marginally better, with eight items (6.1%) found listed with citations against them. Two items, a journal article (literature) and a book (cultural studies), had seven and nine citations respectively in Publish or Perish, whereas in Web of Science and Scopus they received none. The issues related to relying on Publish or Perish data were documented in the literature review section of this report, and the findings of this examination suggest it is not a viable alternative for arts and humanities research output.

Publications with citations in Web of Science, Scopus and Publish or Perish

                    WoS          Scopus       PoP
                    n     %*     n     %*     n     %*
Literature          3     2.3    0     0      1     0.7
Cultural studies    3     2.3    1     0.7    7     5.4
Total               6     4.6    1     0.7    8     6.1
* Percentage calculated from total of 130 items

A search of the books subset of the online bookstore Amazon can identify in-text and bibliographic citations for some authors. The difficulty of this search lies in the exact identification of cited items. Amazon restricts access to much of the full text and provides very short excerpts of the context of a citation in the results list. Unless the searcher is familiar with the work of a cited author, these excerpts are not sufficient to identify a work in many instances. In a number of cases, excerpts against books clearly identified the cited work, but some did not. The Amazon book search found citations to 12 publications from the sample of 130 (9.2%); four from literature and eight from cultural studies, presented below.
Publications with citations in Amazon book search

                    n     %*
Literature          4     3
Cultural studies    8     6
Total               12    9
* Percentage rounded and calculated from total of 130 items

An alternative to citations as a measure of research impact is the number of libraries that hold a publication. While this measure may be dependent on publisher activities (for example, promotion strategies) and library acquisitions activities (such as approval plans with certain publishers), it could also be a reflection of library client demand. There is no real way of knowing exactly what library holdings demonstrate, but they do offer a numeric indicator.

Library holdings data are not available for journal and newspaper articles. In the sample of 130 publications, 50 items (38%) were in these formats, leaving 80 items (62%) that were books, book chapters, conference papers or exhibition catalogues. WorldCat and Libraries Australia were the two sources used to identify holdings data. Libraries Australia listed holdings for 35 of the 80 publications (44%) and WorldCat listed 30 (37%). The table below presents the findings for subject areas of the items with holdings data.

Publications with library holdings

                    Libraries Australia      WorldCat
                    n     %*                 n     %*
Art                 6     8                  3     4
Literature          10    12                 10    12
Cultural studies    19    24                 17    21
Total               35    44                 30    37
* Percentage rounded and calculated from total of 80 items

The analysis of publications in the subject areas art, literature and cultural studies indicates that the main citation databases and Publish or Perish (Google Scholar data) do not provide sufficient data with which to measure impact. A slightly higher number of publications were found to have citations in the Amazon book search, but the limited functionality of an Amazon search means results should be treated with caution. Citations to publications in the art subject area were not found in any of these sources.
Library holdings provide the only data available for art publications and, overall, library holdings data capture more publications in all these subject areas.

In previous discussion about how to measure the impact of research outputs in arts and humanities, other measures such as ranked tiers of publishers and art exhibition venues, and numbers of exhibition visitors and book reviews, have been touted (see, for example, a 1998 report by Dennis Strand and a Quality Metrics report for the RQF in 2006). These alternatives are not examined in this report, and research on these measures has not been located. They all present distinct difficulties in terms of the reliability of the measure, but perhaps not greatly different to reliance on citations data from the major citation databases. That is, at some stage in the process a judgement of quality is made. In the case of the citation databases it is in the selection of source materials indexed; for ranked tiers of publishers and exhibition venues it is at the point that the scholarly community makes ranking judgements. For book reviews, impact may be a product of a publisher's promotion strategy rather than the quality of the book. As this case study component indicates, there are few measures that adequately capture the impact of research outputs in arts and humanities. Art is particularly poorly served in comparison to literature and cultural studies, with the latter having a higher profile across all sources of data examined in this exercise.

Conclusion

In conclusion, the findings of the three components of this report provide librarians with information with which to offer advice to research offices and make in-house subscription decisions. However, these findings have not advanced our knowledge of the area to the extent that the discussion can be closed.
Given the nature of the subject matter, citations and measures of research impact, and the time and words already expended on attempts at clarification, this is perhaps not surprising. Reliable and acceptable measures of research impact continue to challenge governments and researchers in the field. What the report has achieved is a co-location of a range of concerns, such as the preferred source of citation data and what alternatives exist for poorly indexed subject areas. The literature review and feedback components indicate most science areas are relatively well indexed by Scopus or Web of Science, or both. According to the literature reviewed, and supported by the findings of the citation analysis exercise conducted at Curtin University Library, much of the arts and humanities is not as well served by these major databases, although Web of Science is superior to Scopus in this regard. The findings have indicated that while Scopus is generally viewed as having the preferred interface, Web of Science has distinct advantages, and therefore the databases are complementary. Most libraries with available funding would want to subscribe to both databases. In terms of alternative quantitative measures of research impact, the report has shown that there are few available to subject areas in arts and humanities. Library holdings, although untested on a broader scale, appear to offer the best alternative for non-serial publications in these discipline areas.

Papers included in Literature Review

1 Bar-Ilan J. Which h-index? - a comparison of WoS, Scopus and Google Scholar. Scientometrics. 2008; 74(2):257-271.
2 Bauer K, Bakkalbasi N. An examination of citation counts in a new scholarly communication environment. D-Lib Magazine. 2005; 11(9):[np].
3 Bosman J, van Mourik I, Rasch M, Sieverts E, Verhoeff H. Scopus Reviewed and Compared: The Coverage and Functionality of the Citation Database Scopus, Including Comparisons with Web of Science and Google Scholar.
Utrecht: Utrecht University Library, 2006. [Online at: http://igitur-archive.library.uu.nl/DARLIN/2006-1220-200432/UUindex.html]
4 Falagas ME, Pitsouni EI, Malietzis GA, Pappas G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB Journal. 2008; 22(2):338-342.
5 Gavel Y, Iselid L. Web of Science and Scopus: a journal title overlap study. Online Information Review. 2008; 32(1):8-21.
6 Jacso P. As we may search - comparison of major features of the Web of Science, Scopus, and Google Scholar citation-based and citation-enhanced databases. Current Science. 2005; 89(9):1537-1547.
7 Kloda LA. Use Google Scholar, Scopus and Web of Science for comprehensive citation tracking. Evidence Based Library and Information Practice. 2007; 2(3):87-90.
8 Meho LI, Yang K. Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar. Journal of the American Society for Information Science and Technology. 2007; 58(13):2105-2125.
9 Norris M, Oppenheim C. Comparing alternatives to the Web of Science for coverage of the social sciences' literature. Journal of Informetrics. 2007; 1(2):161-169.
10 Vaughan L, Shaw D. A new look at evidence of scholarly citation in citation indexes and from web sources. Scientometrics. 2008; 74(2):317-330.
11 Burnham JF. Scopus database: a review. Biomedical Digital Libraries. 2006; 3(1). [Online at: http://www.bio-diglib.com/content/3/1/1]
12 Jacso P. The plausibility of computing the h-index of scholarly productivity and impact using reference-enhanced databases. Online Information Review. 2008; 32(2):266-283.
13 Libmann F. Web of Science, Scopus, and classical online: philosophies of searching. Online. 2007; 31(3):36-40.
14 Smith AG. Benchmarking Google Scholar with the New Zealand PBRF research assessment exercise. Scientometrics. 2008; 74(2):309-316.
Other references

Research Quality Framework: Assessing the Quality and Impact of Research in Australia. Quality Metrics. 2006. [Online at: http://www.dest.gov.au/NR/rdonlyres/EC11695D-B59D-4879-A84D-87004AA22FD2/14099/rqf_quality_metrics.pdf]

Strand, D. (1998) Research in the Creative Arts. Canberra: Dept. of Employment, Education, Training and Youth Affairs. [Online at: http://www.dest.gov.au/archive/highered/eippubs/eip98-6/eip98-6.pdf]