 The bibliometric database at the Swedish Research
Council – contents, methods and indicators
Ulf Kronman, Magnus Gunnarsson and Staffan Karlsson
2010-05-11. Version 1.0

Contents
1 Introduction
2 Data source and database
3 Data properties
    3.1 Journals
    3.2 Document types
    3.3 Subject fields
    3.4 Addresses
    3.5 Author names
    3.6 References
4 Data preparation
    4.1 Re-classification of publication type Letter
    4.2 Subject field classification and fractionalisation
    4.3 Address deduplication
    4.4 Address matching
    4.5 Address counting and fractionalisation
    4.6 Reference adjustments
    4.7 Citation windows
    4.8 Self citations
    4.9 Citation reference values for Thomson subject fields (μf)
    4.10 Citation reference values for journals (μj)
    4.11 Citation percentile threshold values for fields (τf)
5 Methods for analyses
    5.1 Publication address/field fractions
6 Indicators
    6.1 Publication counts (P)
    6.2 Share of un-cited publications (puc)
    6.3 Share of self-citations (csc)
    6.4 Field normalised citation rate (cf)
    6.5 Share of publications above the 90th, 95th and 99th citation percentile
    6.6 Journal normalised citation rate (cj)
    6.7 Journal to field normalised citation rate (jf)
7 References
8 Appendices
    8.1 SRC denotation of parameters and indicators
    8.2 Thomson document types
    8.3 Thomson subject fields

1 Introduction
The Department of Research Policy Analysis at the Swedish Research Council (SRC) maintains and develops a database for bibliometric analyses. This document describes in detail the properties of the SRC bibliometric database, the data preparation, the analysis tools and methods, and the resulting bibliometric indicators used at the SRC.

2 Data source and database
The bibliometric analyses made at the SRC are based on scientific publication data records licensed from the US-based company Thomson Reuters. The database corresponds approximately to the data that can be retrieved in the Thomson Reuters web service Web of Science. The licensed products are the Thomson Reuters indices Science Citation Index Expanded, Social Science Citation Index and Arts and Humanities Citation Index¹. The SRC database contains publication records of all serial titles and publication types covered by Thomson Reuters², together with their reference lists. The SRC licenses records for articles published from 1982 onwards: approximately 30 million records and 560 million references (citations).

The database is updated once a year, in March-April. Data is delivered from Thomson Reuters in the form of tagged text files containing about 1.3 million new publication records per year. As Thomson also complements and corrects old data, the delivery of new records is accompanied by batches of “gaps” and “corrections” that are intended to fill gaps and replace amended old records.

The SRC receives updates, gaps and corrections by the end of January each year, in a shipment that includes all changes made to the licensed products during the previous calendar year. Since there is a certain time lag between a scientific document’s publication and its entry into the Thomson indexes, the January shipment does not include all publications that were published during the preceding year. To make the SRC database complete for the preceding year, the January shipment is supplemented each April with the updates, gaps and corrections made to the Thomson indexes during the first quarter of the year. With this addition the publication numbers for the previous year are close to complete.
The publication data is parsed from the delivered text files and loaded into a relational database by means of a loader program developed in-house. Citation counts and reference values for field and journal normalisation of citation counts are then calculated and stored in the database.

3 Data properties
This section describes the general properties of the bibliometric data as it is delivered from Thomson Reuters.

3.1 Journals
The Thomson data is organised around journal issues, rather than publications or journal titles. This has implications for the later classification of publications into subject fields, since it is the journal issues that are subject classified by Thomson, not journals or individual publications.

¹ Certain data included herein are derived from the Science Citation Index Expanded, prepared by Thomson Reuters®, Philadelphia, Pennsylvania, USA. © Copyright Thomson Scientific® 2006. All rights reserved.
² Around 9 500 active serial titles in 2009.

For each issue Thomson registers a full serial title, ISSN, publisher address, etc. As both the serial title and the ISSN may change over the years for what is generally considered one and the same journal, there is no consistent way of identifying journals in the Thomson data. Thomson makes its own judgement of what is to be considered a journal and assigns each journal a unique identifier called a sequence number. For each issue the sequence number, the full title of the journal and three different normalised shorter title variants are given: title_11, title_20 and title_29, where the number indicates the maximum number of characters in the title variant. There is also an ISO title and an ISSN for each issue, but about 100 000 issues lack an ISO title and around 6 000 issues lack an ISSN.

Identifier                         1982-2008   2003-2008     2008
Unique sequence numbers               23 954      15 307   10 812
Unique values in title_11             15 953      11 515   10 608
Unique values in title_20             16 020      11 623   10 619
Unique values in title_29             16 007      11 545   10 609
Unique values in full title           23 366      14 946   10 815

Table 1. The number of unique values for various potential journal identifiers in the SRC Thomson database.

A list of all journal titles indexed in the Thomson database can be found at the Thomson website¹. In January 2009, the list covered 15 717 titles. The lack of a consistent identifier for what on a daily basis is called a “journal” calls for some caution when doing journal-based analyses. The tests that the SRC has performed indicate that title_11 is the most stable and reliable identifier, and the SRC presently uses this field for identifying journals.

3.2 Document types
Thomson classifies each publication as belonging to one of around 30 document types, the most important being articles, reviews, letters and meeting abstracts. See appendix 8.2 for a list of document types, together with the number of records and the share for each document type in the SRC database. Since the products that the SRC licenses from Thomson do not include the ISI Conference Proceedings index, the SRC database and analyses do not presently² include the Thomson document type Proceedings Papers.

Bibliometric analyses that include citation level normalisations (i.e. comparisons with other publications of the same type) take the document type into account, since different document types tend to have different citation characteristics.

Of the document types listed in appendix 8.2, Chronology and Note are no longer used separately but are included in the type Article. The document type Discussion is included in the type Editorial Material as from 1996. See also section 4.1 for details on how the SRC treats document types.

3.3 Subject fields
The subject classification in the SRC database is primarily based on the Thomson classification system, which assigns scientific field categories to journal issues. The number of fields used in the Thomson database has varied over the years; presently there are 255 different subject fields in the database, of which 247 have been used after year 2000. The Thomson subject fields are listed in appendix 8.3, together with the number of publications in each category in the SRC database.

¹ http://scientific.thomson.com/mjl/
² September 2009

Each journal issue is classified by Thomson as belonging to one or several (at most 7) of these 255 scientific fields. For “regular” journals the fields used for the issues tend to stay the same over time, but for monograph series or report series the classification may vary from issue to issue, so that a single serial title can end up classified as belonging to more than 30 different fields.

3.4 Addresses
The SRC bibliometric database contains about 56 million author or reprint addresses. Of these, 10 million can be considered duplicates, generated by Thomson's double registration of author and reprint addresses, a practice that has varied in methodology over the years; see the description of reprint addresses below.

The 660 000 Swedish addresses¹ in the SRC database have been refined using a set of address-to-organisation matching rules. The procedure for the address matching is described in section 4.4 Address matching under Data preparation.

The addresses for the publications in the Thomson database are of two types: author addresses and reprint addresses. A reprint address contains name, affiliation and address details for the corresponding author of the publication. Before 1998 the address of the corresponding author was only registered in the database as a reprint address, but beginning in 1998 the address of the corresponding author is registered both as an author address and as a reprint address, resulting in a number of duplicate addresses in the database. See section 4.3 for details on how the SRC handles these duplicates.

3.5 Author names
The SRC bibliometric database contains 88 million author name entries. Author names are not normalised (merged) in the database, which means that each occurrence of an author name is recorded as a separate entry, even if the name is exactly the same as a previous entry. The reason for not merging identical author names is that there is no identifier of persons in the input data, and therefore no way to tell whether two identical author names denote the same person or two different persons (homonyms).

The author names are registered in the Thomson database in the form Lastname, Initial(s). No diacritics are registered in the database, so for instance the Swedish characters å, ä, ö will end up as a, a, o (on a few occasions older transliteration rules are used, turning for instance ä into ae and ö into oe). Varying transliteration, inconsistent use of initials and mix-ups of middle names and given names often cause problems when trying to locate the publications of a specific researcher, in addition to the homonym problem mentioned above.

3.5.1 Reprint/corresponding author
Each author record is given a number of attributes, of which reprint author is one. The reprint author name is also stored as a part of the reprint address. The reprint/corresponding author of a publication may therefore be retrieved either by fetching the author record marked as reprint author or by extracting the author name from the reprint address information.

3.5.2 Corporate authors

The Thomson data files also contain an author type marked as corporate author, denoting the name of a group of authors belonging to a large researcher community or an organisational body. In the SRC database corporate author names are stored and used in the same way as ordinary author names.

¹ Address records containing SWEDEN in the Thomson field country.

3.5.3 Authors and affiliations
In the source files delivered by Thomson Reuters each publication record contains a list of author names and a list of author addresses. However, for records entered in the Thomson system before 2008 there is no indication of which addresses relate to which authors, except for the corresponding author. As author names and author addresses are processed in sequence according to the information from the journal issue, a relation between the first author name and the first author address can be presumed, but Thomson gives no guarantees of that. As from the second half of 2008 the delivered records contain an indication of which author names and addresses belong together.

The reprint address also contains an author name, and this information can therefore be used as an identifier of an author-affiliation relation.

Details about author-affiliation relations are not presently used in the calculation of any of the regular indicators that the SRC produces.

3.6 References
The reference lists of the publications in the SRC bibliometric database contain about 560 million references (outbound citations) to other publications.

Each of the publication records in the Thomson database contains a list of the publication’s references to other publications. These reference records contain, among other things, a generated reference ID number (RefNumberID), built by a special Thomson algorithm from selected key data of the referred publication. The same algorithm is also used to generate a unique identifier for every registered publication in the Thomson database, where it is called the ItemNumberID. By matching the RefNumberIDs from the reference lists of newly registered publications with the ItemNumberIDs of previously registered publications, the number of references to each publication (i.e. the number of inbound citations) can be calculated.

Since ItemNumberIDs are generated for all publications entered in the database, there is no problem if the chronological order of the referring publication and the cited publication is reversed (i.e. the referring publication is entered in the database before the cited publication). When the matching algorithm is run, a match will occur regardless.

Almost half (47%) of the references in the SRC database point to sources external to the SRC database, i.e. their generated RefNumberIDs do not match any item in the database. These non-matched references usually refer to conference proceedings, reports, books, web-published material or journals not covered by the Thomson database. See also section 4.6 for details on how the SRC deals with references.

4 Data preparation
After the tagged text data for the Thomson publication records has been loaded into the database, a number of SRC-specific refinements and calculations are performed to generate the underlying data needed for indicator calculations.

4.1 Re-classification of publication type Letter
At the SRC, bibliometric analyses are mostly based only on the document types article and review. Since documents of the Thomson type letter contribute significantly to the scientific communication in several fields, the Thomson document types are re-classified, so that the types article and letter (and the previously used document types note and chronology) are joined into a new SRC-specific document type, article, covering 70% of the publications in the database.

The reason for not treating letters as a separate document type is that grouping letters into subject categories tends to generate groups too small for stable statistics, and that the average citation level for letters tends to be rather low (even though there are well-cited letters). At the SRC, letters are therefore grouped together with original articles and compared with the citation rates of those when doing citation normalisations.

4.2 Subject field classification and fractionalisation
The SRC methodology uses the Thomson fields of the journal issue as a proxy to classify the individual publications, so that each publication is classified as belonging to 0 to 7 scientific fields via the issue it is published in. During this mapping process, an SRC re-classification of publications in journal issues classified as multidisciplinary is also applied, see 4.2.1 below.

The subject field mapping results in a number of links between Thomson subject fields and publications, and in a count of the number of fields used for classification, which is stored in each publication record. The information about the fields and the number of fields is used for subject fractionalisation in later calculations of citation reference values and citation indicators.

4.2.1 Re-classifying publications in Thomson subject field Multidisciplinary Sciences
Thomson Reuters uses the subject tag Multidisciplinary Sciences for journals that contain papers from many different scientific fields, such as PNAS, Science and Nature. This poses a well-known problem for bibliometric analyses involving field relative indexes, since it means that papers in these journals are not compared with other papers of their ‘true’ field, but rather with other papers in journals classified as Multidisciplinary Sciences.

For example, an article titled The origins of Afroasiatic was published in 2004 in the journal Science, which is tagged as Multidisciplinary Sciences, and by the end of 2008 this article had accumulated 5 citations. Since the average citation count for articles in that field that year was 2.13, the field normalised citation rate for this article was 2.35. However, if this article had been published in Journal of the Royal Asiatic Society, which is tagged as Asian Studies, it would have had a field normalised citation rate of 14.3, since the average citation count for Asian Studies articles that year was 0.35.

This problem is aggravated by the fact that the journals tagged with Multidisciplinary Sciences are regularly prominent journals with high average citation levels. An article published in any of these journals is thus likely to receive a lower field normalised citation rate than it would have received had it been published in a journal with more specific subject tagging.

In an effort to find more article-relevant scientific field classifications for publications in journals classified as Multidisciplinary Sciences, the publications in these journals are re-classified by an algorithm developed in-house, using the publication’s references and the (inbound) citations to the publication itself. The algorithm is based on the assumption that the Thomson subject fields of the articles referred to in the reference list and of the citing articles indicate which subject areas the referring/referred article belongs to.
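The arithmetic behind the Afroasiatic example above can be checked with a short sketch. It assumes, as the example implies, that the field normalised citation rate of a single publication is its citation count divided by the average citation count of its field for that year. The function name is hypothetical and is not part of the SRC system.

```python
def field_normalised_rate(citations: float, field_average: float) -> float:
    """Citation count divided by the field's average citation count.

    Assumption: this single-publication ratio is inferred from the worked
    example in the text; the function name is illustrative only.
    """
    if field_average <= 0:
        raise ValueError("field average must be positive")
    return citations / field_average


# Figures quoted in the text for the 2004 article with 5 citations.
as_multidisciplinary = field_normalised_rate(5, 2.13)  # field average 2.13
as_asian_studies = field_normalised_rate(5, 0.35)      # field average 0.35

print(round(as_multidisciplinary, 2))
print(round(as_asian_studies, 1))
```

The same five citations give very different normalised rates depending only on the reference value of the field the journal issue is assigned to, which is the effect the re-classification is meant to reduce.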
The classification algorithm is described in detail in a separate article from the SRC (Gunnarsson et al. 2009). Using the SRC reference-citation-based re-classification method, 50% of the publications in journals classified as multidisciplinary can be classified as belonging to other subject areas. For articles published after year 2001, 90% are moved from multidisciplinary to other subject areas using this method. The re-classification produced by this method is now included as a standard procedure in the SRC bibliometric database.

4.3 Address deduplication
As described in section 3.4 above, the set of reprint addresses and the set of author addresses partly overlap. This, in combination with the fact that the SRC often uses address fractionalisation, means that there is a need for a method to identify and remove address duplicates.

Originally, the production year could be used to identify the records for which the reprint address is a duplicate (before or after 1998), but since records older than 1998 have been added or amended according to the new method, this approach no longer works. After discussions with Thomson Reuters and after detailed tests, the SRC settled on the following criteria for identifying address duplicates:

• The reprint address of a publication record is a duplicate
    o if there is an author address that has the same organisation, sub-organisation, city, state and country
    o or if the organisation and sub-organisation fields of the reprint address are NULL and there is at least one author address in the publication record where those fields are not NULL.
• An author address of a publication is a duplicate if there is another author address in the publication record with the same organisation, sub-organisation, city, state and country.

4.4 Address matching
The addresses supplied by the journal are parsed into several fields by Thomson:

• Complete address as supplied by publisher
• Organisation
• Sub-organisation (e.g. department)
• Postal code
• Street address
• City
• State or region
• Country

The addresses supplied by the authors may be formatted in a number of different ways, not always with the main organisation name given as the first element of the address, which can make it difficult for Thomson to identify the main organisation, especially for organisations in non-English speaking countries. The organisation name may also be given in various synonym forms, especially if the name is translated ad hoc into English from a non-English name form.

Example 1
    Full address       Tufts Univ, Dept Mech Engn, Medford, MA 02155 USA
    Organisation       Tufts Univ
    Sub-organisation   Dept Mech Engn
    City               Medford
    State              MA
    Country            USA
    Street             NULL
    Postal code        02155 AP

Example 2
    Full address       Karolinska Inst, Huddinge Univ Hosp, Dept Immunol Microbiol Pathol & Infect Dis, Div Clin Bacteriol, S-14186 Huddinge, Sweden
    Organisation       Karolinska Inst
    Sub-organisation   NULL
    City               Huddinge
    State              NULL
    Country            Sweden
    Street             Huddinge Univ Hosp, Dept Immunol Microbiol Pathol & Infect Dis, Div Clin Bacteriol
    Postal code        S-14186 BC

Example 3
    Full address       Univ Lund, Ctr Chem, Dept Biotechnol, S-22100 Lund, Sweden
    Organisation       Univ Lund
    Sub-organisation   Ctr Chem; Dept Biotechnol
    City               Lund
    State              NULL
    Country            Sweden
    Street             NULL
    Postal code        S-22100 BC

Example 4
    Full address       Inst Microbiol, Dept Clin Immunol, Goteborg, Sweden
    Organisation       Inst Microbiol
    Sub-organisation   NULL
    City               Goteborg
    State              NULL
    Country            Sweden
    Street             Dept Clin Immunol
    Postal code        NULL

Fig 2. Four examples of Thomson address records. The second record is an example of two different organisations written in the same address and the fourth record is an example of a Swedish address with a hard-to-identify organisation field.
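The parsed fields shown in Fig 2 are the same ones that the duplicate criteria in section 4.3 compare. As a minimal sketch of those criteria, the Python below models an address by the five compared fields and applies the reprint-address and author-address rules. All names (Address, reprint_is_duplicate, drop_duplicate_author_addresses) are hypothetical; the SRC implementation is not shown in this document. Two readings are assumptions: NULL values are treated as equal to each other, and "not NULL" is taken to mean that both the organisation and the sub-organisation are present.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Address:
    """Subset of the parsed Thomson address fields used for comparison."""
    organisation: Optional[str]
    sub_organisation: Optional[str]
    city: Optional[str]
    state: Optional[str]
    country: Optional[str]

    def key(self) -> Tuple[Optional[str], ...]:
        # Assumption: two NULL (None) values count as equal here,
        # unlike a plain SQL comparison.
        return (self.organisation, self.sub_organisation,
                self.city, self.state, self.country)


def reprint_is_duplicate(reprint: Address, authors: List[Address]) -> bool:
    """Apply the two reprint-address criteria from section 4.3."""
    # Criterion 1: an author address agrees on all five fields.
    if any(author.key() == reprint.key() for author in authors):
        return True
    # Criterion 2: the reprint address lacks organisation and
    # sub-organisation while some author address has them.
    # Assumption: "not NULL" is read as both fields being present.
    if reprint.organisation is None and reprint.sub_organisation is None:
        return any(author.organisation is not None
                   and author.sub_organisation is not None
                   for author in authors)
    return False


def drop_duplicate_author_addresses(authors: List[Address]) -> List[Address]:
    """Keep the first occurrence of each distinct author address."""
    seen = set()
    unique = []
    for author in authors:
        if author.key() not in seen:
            seen.add(author.key())
            unique.append(author)
    return unique


# Records modelled on Examples 1 and 3 in Fig 2.
tufts = Address("Tufts Univ", "Dept Mech Engn", "Medford", "MA", "USA")
lund = Address("Univ Lund", "Ctr Chem; Dept Biotechnol", "Lund", None, "Sweden")
```

For instance, a reprint address identical to the Tufts record would be flagged as a duplicate when the Tufts record is among the author addresses, and a reprint address with empty organisation fields would be flagged whenever an author address such as the Lund record carries both fields.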
The variations in address formatting and naming of organisations lead to problems when trying to identify the publications of an organisation in a bibliometric analysis. The SRC has therefore developed a semi-automatic address-to-organisation matching process to improve the identification of publications from Swedish organisations.

The address matching is based on 17 000 rules containing unique strings that match text strings in the various Thomson address fields and link the 660 000 addresses designated with country SWEDEN to 600 Swedish organisation records. The organisation records also contain information on the type of organisation, for instance university, university college, university hospital, hospital, company, organisation, museum, governmental, etc.

The matching process is performed in a sequence of steps, beginning with the most straightforward synonym matching algorithms, followed by more intricate string matching rules:

1. The Thomson organisation field is searched for unique variants of Swedish organisation names. The 4 600 rules of this step match 610 000 Swedish addresses (around 90%).
2. The Thomson organisation and Thomson city fields are searched in conjunction for combinations making up unique identifications of Swedish organisations. The 9 000 rules of this step match 25 000 addresses.
3. A set of 500 unique string identifiers for Swedish universities and university colleges is matched against the full address field. This procedure also picks up addresses where several Swedish university names have been written into the same address and therefore produces some address duplicates (which are later handled as split addresses). This step matches around 42 000 addresses.
4. The Thomson organisation field is searched for unique identifying strings and combinations of identifying strings together with the Thomson city field. The same procedure is also repeated without the Thomson city field, to handle organisations located in several cities. These steps match approximately 13 000 addresses.
5. The full address field is searched once more for unique identifying strings and combinations of strings, this time trying to catch residuals from previous matches. This matches about 800 addresses.
6. A number of addresses that have been manually linked to organisations or considered unidentifiable are processed using the whole address field as identifier. This is used for 200 addresses.
7. Names of joint university organisations are searched for in the Thomson organisation and Thomson city fields in two steps, and split addresses are produced for those. This produces around 800 organisation matches.

The result of the address matching process is a relational database table with 690 000 links¹ between Swedish organisations and their publications. The links cover over 99%² of the Swedish publications in the SRC database, and work to reach 100% coverage is ongoing.

The matching process allows a single address to be matched to more than one organisation, which in practice means that addresses are split. Because of this, all statistics based on or including the number of addresses per publication are updated.

For Denmark, Norway and Finland a first rudimentary normalisation of organisation names for universities, university colleges and university hospitals has been performed, and there are plans to extend this normalisation in the future to a rule matching system of the same type as for Swedish addresses.

4.4.1 Country name normalisation
For all 56 million addresses in the SRC database a normalisation of country names is performed, using a system of 353 rules for matching name variants to 246 standardised³ country names. For instance “Scotland” and “England” are normalised to “United Kingdom”, “Cambodia” and “Khmer Republic” map to “Cambodia”, and “Fed Rep Ger” and “Ger Dem Rep” to “Germany”. No attempt is made to correct possible errors in country assignment (organisation names assigned to the wrong country), but there are plans for such a procedure in the future.

4.5 Address counting and fractionalisation
Since almost all indicators calculated by the SRC are weighted based on the analysed unit’s share of the addresses in the publication, the number of addresses in each publication needs to be counted and stored as a part of the publication record. After the address matching the total number of addresses is stored as a part of each publication record and used as a denominator when calculating fractionalised publication counts and weighted citation indicators.

One rationale for using address fractionalisation, and for using the fractions to weight citation averages, is the correlation between the number of addresses and the number of citations. Publications with many authors usually attract more citations, even disregarding self-citations. If citation values for publications were attributed to organisations without any form of fractionalisation and weighting, a sort of inflation in citations would be created, since highly-cited publications would be attributed to more organisations than lower-cited publications.

¹ The number of links between organisations and publications is larger than the number of addresses, due to joint organisations and several organisations contained within the same address.
² Measured September 2009.
³ ISO 3166-1, http://www.iso.org/iso/english_country_names_and_code_elements

The SRC presently does not use any kind of fractionalisation based on author names and/or the number of authors of the publications, since there presently are no links between authors and affiliations, and the SRC rarely does analyses of individual researchers.

4.6 Reference adjustments
As mentioned above, Thomson uses an elaborate mechanism for creating links between referring and cited documents, but the SRC makes some adjustments to this citation matching when counting citations to publications.

4.6.1 Split references
The Thomson algorithm that builds the RefNumberIDs and the ItemNumberIDs is not perfect; sometimes different publications get the same ItemNumberID¹. Of the 560 million references in the SRC database, about 160 000 (0.2‰) are duplicates, i.e. the same reference ID number points to two different publication records. When this occurs there is no way to automatically differentiate between the two records being referred to, and the citation is therefore split between them. The SRC database thus contains a few publications with a non-integer number of citations.

4.6.2 Duplicate references
In some scientific areas there are citing traditions that lead to a publication occurring several times in the same reference list. The references may for instance point to different pages of the cited publication. In the SRC database these duplicate references are counted only once, i.e. the same RefNumberID may occur only once in each reference list.
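The two reference adjustments described in sections 4.6.1 and 4.6.2 can be illustrated with a minimal Python sketch. The data layout (reference lists as lists of reference ID strings, and a mapping from each reference ID to the publication records it resolves to) and all names are assumptions made for illustration; they do not describe the SRC database schema.

```python
from collections import defaultdict
from fractions import Fraction


def count_citations(reference_lists, ref_targets):
    """Count citations per cited record.

    reference_lists: dict mapping a citing publication to its list of
                     reference IDs (IDs may repeat within one list).
    ref_targets:     dict mapping a reference ID to the cited records it
                     points to (two records when the ID is ambiguous).
    """
    citations = defaultdict(Fraction)
    for ref_ids in reference_lists.values():
        # Duplicate references: each ID counts once per reference list.
        for ref_id in set(ref_ids):
            targets = ref_targets.get(ref_id, [])
            # Split references: one citation is shared equally between
            # all records that the ambiguous ID points to.
            for target in targets:
                citations[target] += Fraction(1, len(targets))
    return dict(citations)


# Hypothetical data: "r1" is listed twice by A, and "r2" is ambiguous.
reference_lists = {"A": ["r1", "r1", "r2"], "B": ["r1"]}
ref_targets = {"r1": ["X"], "r2": ["Y", "Z"]}
counts = count_citations(reference_lists, ref_targets)
```

Here record X receives two citations (one from each citing publication), while Y and Z receive half a citation each, which is how non-integer citation counts arise.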
4.7 Citation windows
Citations to publications can be measured using different citation time windows. The citation window stipulates the publication years from which citing publications are searched for references when gathering citations to a publication published in a given year. For instance, a three-year citation window for a publication published in 2000 means that references from publications published in the same year as the cited publication (2000) and the two succeeding years (2001-2002) are counted as citations. A citation window of six years for the same publication counts references from publications published in 2000-2005. An open citation window counts references from all publications published up to the time the analysis is performed. For a publication published in 2000 and analysed in 2009, references from publications published in 2000-2009 would then be counted as citations.

The fixed citation windows of 3 and 6 years have the advantage that citation counts for all publications older than 3 or 6 years can be compared on like grounds, which makes them a good choice for preparing time series of citation counts. The citation count for a publication whose citation window has closed will also stay the same, no matter when the analysis is done; for instance, if the citations to a publication published in 2000 are measured using a 3-year window in 2003, 2005, 2008 or 2009, the result will be the same, since only citations from publications from 2000-2002 are counted.

¹ Usually, this occurs when two publications have the same first author, the same title, the same journal and the same first page number.
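The window arithmetic above can be sketched as follows. This is a minimal illustration in Python; the function name and the representation of citing publications as a plain list of publication years are assumptions, not part of the SRC implementation.

```python
def citations_in_window(pub_year, citing_years, window=None):
    """Count citing publications that fall inside a citation window.

    window=3 counts the publication year plus the two following years;
    window=None is the open window (everything published so far).
    """
    if window is None:
        return sum(1 for year in citing_years if year >= pub_year)
    last_year = pub_year + window - 1
    return sum(1 for year in citing_years if pub_year <= year <= last_year)


# Hypothetical publication years of seven citing publications.
citing_years = [2000, 2001, 2001, 2003, 2004, 2007, 2009]
cw3 = citations_in_window(2000, citing_years, 3)  # years 2000-2002
cw6 = citations_in_window(2000, citing_years, 6)  # years 2000-2005
cwo = citations_in_window(2000, citing_years)     # open window
```

For the hypothetical data the three counts are 3, 5 and 7. The 3-year and 6-year counts do not change if more citing publications from 2010 onwards are appended, whereas the open-window count does.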
The open citation window has the advantage of gathering as much citation data as possible for indicator calculation, and makes it possible to spot so-called "sleeping beauties": publications that do not start to be cited until a number of years after their publication. But the open window has the drawback of not being time-neutral, since the worldwide general rate of citations increases each year. Internal studies at the SRC have shown that there is a good correlation at aggregated levels between normalised citation rates calculated from 3-year, 6-year and open citation windows.

If an analysis using a fixed 3-year or 6-year citation window includes publications that are younger than 3 or 6 years, respectively, the window will act as an open window for those publications, i.e. all citations from the publication date up to the analysis date will be included in the citation count. Since the same holds for the publications included in the citation reference value for the same year (the denominator used for normalisation), the deviation from the fixed window will not cause any major differences in the normalised citation rate.

The SRC pre-calculates citation counts for all publications in the database using a 3-year window (Cw3), a 6-year window (Cw6) and an open window (Cwo).

4.8 Self citations
References between publications are usually considered to reflect some kind of scientific recognition, and the number of citations a publication receives can thus be said to be a measure of the amount of scientific recognition it has gained. But if researchers refer to their own previous work, this is not a measure of recognition from the rest of the research community. Therefore it is customary to try to remove these self-citations from the citation counts in bibliometric studies.

The SRC method for removing self-citations is based on all author names in both the referring and the cited publication. If any of the author names in Thomson format (lastname + initials) in the author list of the referring publication is found in the author list of the cited publication, the citation is considered to be a self-citation. No attempt is made to differentiate homonyms (different researchers sharing the same Thomson form of name), and there is no separate rule for publications with long author lists.

Using this process to identify self-citations, the SRC calculates the number of citations including self-citations (Csci) and the number of citations excluding self-citations (Cscx) for all three citation windows, and stores these pre-calculated values for each publication in the database. Each publication will thus have attributes for six citation values: Csci,w3, Csci,w6, Csci,wo, Cscx,w3, Cscx,w6 and Cscx,wo.
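The name-matching rule can be sketched in a few lines of Python. The function names, the example author names and the normalisation step (upper-casing and collapsing whitespace) are assumptions added for illustration; the text above only states that names are compared in Thomson format.

```python
def thomson_key(name):
    """Normalise a 'LASTNAME INITIALS' string (assumed normalisation)."""
    return " ".join(name.upper().split())


def is_self_citation(citing_authors, cited_authors):
    """True if any author name occurs in both author lists."""
    cited = {thomson_key(name) for name in cited_authors}
    return any(thomson_key(name) in cited for name in citing_authors)


def citation_counts(cited_authors, citing_author_lists):
    """Return (Csci, Cscx): citations with and without self-citations."""
    c_sci = len(citing_author_lists)
    c_scx = sum(
        not is_self_citation(authors, cited_authors)
        for authors in citing_author_lists
    )
    return c_sci, c_scx


# Hypothetical author lists in Thomson format.
cited = ["ANDERSSON A", "BERG B"]
citing = [["BERG B", "SMITH J"], ["JONES A", "LEE K"], ["Andersson  A"]]
c_sci, c_scx = citation_counts(cited, citing)
```

With these data Csci is 3 and Cscx is 1. Because homonyms are not separated, a different researcher who happens to be indexed as "BERG B" would also be treated as a self-citing author.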
4.9 Citation reference values for Thomson subject fields (μf)
After the various citation values have been calculated for all publications in the database, the basic data are in place for calculating the field reference values used for field normalisation of citations. The average number of citations to publications of a certain document type, published in a certain year in a certain field, is called the Field Reference Value and is denoted μf or FRV¹. The field reference value is calculated for each combination of subject field, publication year and document type according to the following formula:

\[ \mu_f = \frac{\sum_{i=1}^{P} \frac{C_i}{S_i}}{\sum_{i=1}^{P} \frac{1}{S_i}} \qquad [1] \]

where:
μf = weighted average citation rate for field, year and document type
P = the number of publications of the studied document type and year classified as belonging to the subject field in question
Ci = the number of citations to publication i (according to separately specified citation window and self-citation handling)
Si = the number of subject fields publication i has been classified as belonging to

¹ The Field Reference Value has previously been called the Field Citation Score, denoted as FCS (Karlsson & Wadskog 2005, pg 54.).

Because the number of fields each publication is classified in (Si) appears in the denominator of both sums in the formula, the average citation values are based on publication subject fractions, and the resulting field reference value (μf) will be a weighted average¹ with regard to the publications. Publications without any subject field classification are excluded from the calculation, as are publications without any author addresses, since the latter cannot be part of any country or organisation analysis and should therefore not be part of the reference value for the field. When calculating the field reference values, all references to the publications are counted as potential incoming citations, regardless of publication year, document type or field, before being filtered by the conditions regarding citation window and self-citations.

At the SRC the field reference values are calculated for all six variants of citation values for each publication: with (sci) or without (scx) self-citations, and with citation windows of 3 (w3) or 6 (w6) years or an open window (wo). The calculated values are denoted μf[sci,w3], μf[sci,w6], μf[sci,wo], μf[scx,w3], μf[scx,w6] and μf[scx,wo] respectively, and are stored in a separate field reference value table in the database as a foundation for the later calculation of field normalised citation rates.
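As a worked illustration of formula [1], the following Python sketch computes a field reference value for one combination of field, year and document type. The function name and the input layout (a list of citation count and field count pairs) are assumptions for the example only.

```python
def field_reference_value(publications):
    """Weighted mean citation rate, formula [1].

    publications: list of (citations, n_fields) pairs, one pair per
    publication of the studied document type and year in the field.
    """
    numerator = sum(c / s for c, s in publications)    # sum of C_i / S_i
    denominator = sum(1 / s for _, s in publications)  # sum of 1 / S_i
    return numerator / denominator


# Hypothetical field with three publications classified in 1, 2 and 4 fields.
pubs = [(10, 1), (4, 2), (0, 4)]
mu_f = field_reference_value(pubs)
unweighted_mean = sum(c for c, _ in pubs) / len(pubs)
```

For these data the weighted value is 12 / 1.75, roughly 6.86, compared with an unweighted mean of about 4.67. The publication classified in only this field dominates the average because it carries a full fraction weight.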
Since the field normalised citation rate is the most commonly used indicator, pre-calculated weighted average field normalised citation values for all combinations of citation windows and self-citation filtering are stored as part of each publication record in the database. The calculated values are denoted cf[sci,w3], cf[sci,w6], cf[sci,wo], cf[scx,w3], cf[scx,w6] and cf[scx,wo] respectively. See the section Indicators below for a description of how the field normalised citation rate is calculated at the SRC.

4.10 Citation reference values for journals (μj)
Sometimes it can be of interest to study how much a publication has been cited in relation to publications of the same document type and publication year in the same journal, an indicator called the journal normalised citation rate. To be able to do this, a mean citation value for each document type, year and journal has to be calculated. This value is called the journal reference value and is denoted μj or JRV². For each journal, identified by the field title_11 in the journal issue records, the journal reference values are calculated according to the following formula:

\[ \mu_j = \frac{1}{P} \sum_{i=1}^{P} C_i \qquad [2] \]

where:
μj = the journal reference value
Ci = the number of citations to publication i (according to separately specified citation window and self-citation handling)
P = the number of publications of the studied document type and year published in the journal in question

¹ A publication classified in only one field will have a fraction weight of 1, whereas a publication classified in 5 different fields will only have a fraction weight of 0.2 in the calculation of each of the field reference values.
² This value has previously been called the Journal Citation Score (JCS).

At the SRC the journal reference values are calculated for all six variants of citation values for each publication: with (sci) or without (scx) self-citations, and with citation windows of 3 (w3) or 6 (w6) years or an open window (wo). The calculated values are denoted μj[sci,w3], μj[sci,w6], μj[sci,wo], μj[scx,w3], μj[scx,w6] and μj[scx,wo] respectively, and are stored in a separate table in the database for later calculation of journal normalised citation rates. Publications without any author addresses are excluded from the calculation of the journal reference value, since these cannot be part of any country or organisation analysis and should therefore not be part of the reference value for the journal.

4.11 Citation percentile threshold values for fields (τf)
As a complement to studying publication citation rates in relation to mean values, it can be of interest to study how many, and what share, of publications are cited more than a specified percentile threshold of a subject field. Commonly used percentile thresholds are 90%, 95% and 99%; a publication is among the 10%, 5% or 1% most cited in a field if it has received more citations than the corresponding percentile threshold value.

The percentile threshold values are defined and calculated under the same conditions as the field reference values, i.e. publications of the same document type, publication year and Thomson subject field are grouped to calculate the percentile values. Publications without any subject field classification are excluded from the calculation, as are publications without any author addresses.

At the SRC the percentile thresholds for the fields are calculated on subject fractions of the publications: each publication is assigned a fraction weight that is the inverse of the number of fields it is classified in. The publications are sorted by their number of citations. Then groups of publications containing 90%, 95% and 99% of the total number of weighted publication fractions are extracted, and the number of citations to the publication at the top of each group is noted as the corresponding percentile threshold value. If the number of (weighted) publications is such that the 90% (or 95% or 99%) limit goes through a publication (and not between two publications), the average citation count of the two publications on either side of the limit is used as the percentile threshold value¹.

The percentile calculation is performed for all six combinations of citation windows and self-citation handling for each field, and the results are stored in the field reference value table in the database as 18 different threshold values, one for each combination of percentile threshold (90, 95 and 99), citation window (w3, w6, wo) and handling of self-citations (sci, scx). The values are denoted τ90f[sci,w3], τ95f[sci,w3], τ99f[sci,w3], etc.

¹ The calculation is made in the statistics software SAS, using the SAS standard definition for percentiles, PCTLDEF=5.
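The weighted percentile procedure can be sketched in Python as below. The function name and input layout are illustrative assumptions; the SRC performs the calculation in SAS. The sketch averages two neighbouring citation counts when the cumulative weight lands exactly on the limit, which is how the weighted PCTLDEF=5 rule is understood here; this reading is an assumption and should be checked against the SAS documentation before use.

```python
from fractions import Fraction


def percentile_threshold(publications, percentile):
    """Weighted citation percentile for one field, year and document type.

    publications: list of (citations, n_fields) pairs.
    percentile:   integer such as 90, 95 or 99.
    """
    # Sort by citations; each publication weighs 1 / n_fields.
    ranked = sorted((c, Fraction(1, s)) for c, s in publications)
    limit = Fraction(percentile, 100) * sum(w for _, w in ranked)
    cumulative = Fraction(0)
    for index, (citations, weight) in enumerate(ranked):
        cumulative += weight
        if cumulative == limit and index + 1 < len(ranked):
            # Assumed tie rule: average the two neighbouring values.
            return Fraction(citations + ranked[index + 1][0], 2)
        if cumulative >= limit:
            return Fraction(citations)
    raise ValueError("no publications to rank")


# Ten hypothetical single-field publications with 0..9 citations.
single_field = [(c, 1) for c in range(10)]
t90 = percentile_threshold(single_field, 90)
t95 = percentile_threshold(single_field, 95)

# Hypothetical mix of single-field and two-field publications.
mixed = [(0, 1), (1, 1), (2, 2), (3, 2), (8, 1)]
t90_mixed = percentile_threshold(mixed, 90)
```

Exact fractions are used instead of floating point so that the comparison between the cumulative weight and the limit is reliable.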
5 Methods for analyses
Data for bibliometric analyses are extracted from the SRC database in a number of different ways, the most common being to retrieve sets of publication data in the form of lists of publication fractions. Each publication is split into fractions by author address and subject classification, so that a publication with two addresses and three subject classifications is split into six fractions, one for each combination of address and subject. The basis of the subject classification is the Thomson subject classification of journal issues included in the tagged text files; see section 3.3 above.

The final analyses are performed either with the statistical program SAS or with in-house developed programs and scripts. Usually the analyses are post-processed in Microsoft Excel for tabulation and diagrams.
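The splitting of a publication into address and subject fractions can be sketched as follows. The record layout, the names and the explicit weight of 1/(addresses × subject fields) per fraction are assumptions for illustration; the extracted rows themselves carry the number of addresses and the number of subject fields rather than a ready-made weight.

```python
from fractions import Fraction
from itertools import product


def publication_fractions(pub_id, addresses, subject_fields):
    """Return one row per combination of address and subject field."""
    weight = Fraction(1, len(addresses) * len(subject_fields))
    return [
        {"pub": pub_id, "address": address, "field": field, "weight": weight}
        for address, field in product(addresses, subject_fields)
    ]


# Hypothetical publication with two addresses and three subject fields.
rows = publication_fractions(
    "pub-1",
    ["Univ A, Sweden", "Univ B, Denmark"],
    ["Biology", "Ecology", "Zoology"],
)
total_weight = sum(row["weight"] for row in rows)
share_univ_a = sum(
    row["weight"] for row in rows if row["address"] == "Univ A, Sweden"
)
```

The publication yields six fractions of 1/6 each. The weights sum to one publication, and the three fractions belonging to the first address add up to half a publication.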
5.1 Publication address/field fractions
The output from the SQL database is typically a list of relevant data for each publication to be analysed, with one row for each combination of publication, address and subject field. Each row usually contains the following data:

• Normalised organisation name of the Swedish address, or Thomson organisation name if non-Swedish address
• Normalised country name
• Full address
• Publication year
• Document type
• Subject field
• The number of subject fields in the classification of this publication
• The number of addresses for this publication
• The number of citations in 6 variants with different citation windows and self-citations included or removed
• Field reference values in corresponding 6 variants for the field the fraction is classified in
• Percentile thresholds for 90th, 95th, and 99th percentile with 3 different citation windows and self-citations excluded
• Journal reference value, open citation window, self-citations excluded

An extraction of fractions for Swedish publications 1982-2009 generates about 1 million rows. Extraction of world publication data for the same period generates about 73 million rows, which corresponds to about 27 GB of data.

6 Indicators
In this section the bibliometric indicators most commonly used at the SRC are presented, together with descriptions of how the SRC calculates them.

6.1 Publication counts (P)
An analysed unit's publications may be counted either in full counts or in fractional counts based on the unit's share of addresses in the publication and, in relevant cases, the number of subject fields assigned to the publication. The number of publications in full counts is denoted Ph and the number of fractionalised publications is denoted Pr. A commonly used bibliometric indicator is the number of publications produced per year, which is denoted Py for full counts and Py,r for fractionalised counts.

6.2 Share of un-cited publications (puc)
It can also be of interest to study what share of an analysed unit's publications have not made an impact in the scientific community, i.e. have not received any citations besides self-citations. The indicator is calculated according to the following formula:

\[ p_{uc} = \frac{\sum_{i=1}^{R_{uc}} \frac{1}{A_i}}{\sum_{i=1}^{R} \frac{1}{A_i}} \qquad [3] \]

where:
puc = weighted average share of uncited publications attributed to the analysed unit
Ruc = the number of publication fractions of uncited publications (self-citations removed) attributed to the analysed unit
R = the total number of publication fractions attributed to the analysed unit
Ai = the total number of addresses in the publication of fraction i

6.3 Share of self-citations (csc)
The analysed unit's share of self-citations is easily calculated by dividing the number of self-citations by the total number of citations to the unit's publications. Since the SRC calculates the weighted average based on the unit's share of addresses in the analysed publications, the formula gets a bit more complicated:

\[ c_{sc} = \frac{\sum_{i=1}^{R} \frac{C_{sci(i)} - C_{scx(i)}}{A_i}}{\sum_{i=1}^{R} \frac{C_{sci(i)}}{A_i}} \qquad [4] \]

where:
csc = weighted average share of self-citations for the analysed publications
Csci(i) = the number of citations to the publication of fraction i, self-citations included
Cscx(i) = the number of citations to the publication of fraction i, self-citations excluded
R = the number of publication fractions attributed to the analysed unit
Ai = the total number of addresses in the publication of fraction i

The share of self-citations can be calculated using different citation windows, depending on what is most suitable for the analysis in question.

6.4 Field normalised citation rate (cf)
The field normalised citation rate is one of the so-called "state-of-the-art" bibliometric indicators. The general idea of the indicator is to relate the number of citations to a publication, or a group of publications, to the average citation level of a group of comparable publications of the same document type, publication year and Thomson subject field.

The SRC calculates its field normalised citation rate (cf) using a publication-fraction-oriented method: the number of citations of each subject-address fraction of a publication is normalised against the average citation rate for the document type, publication year and subject field that the fraction belongs to. When the final average normalised citation rate for the analysed unit's publications is calculated, each publication fraction is weighted by its share of all subject-address fractions for that publication, so that the resulting average is a weighted average. The SRC average cf is calculated according to the following formula:

\[ c_f = \frac{\sum_{i=1}^{R} \frac{C_i}{\mu_{f(i)}} \cdot \frac{1}{A_i S_i}}{\sum_{i=1}^{R} \frac{1}{A_i S_i}} \qquad [5] \]

where:
cf = the weighted average field normalised citation rate
Ci = the number of citations to the publication of fraction i (according to separately specified citation window, self-citations removed)
μf(i) = the field reference value for the field of fraction i
R = the number of publication fractions attributed to the analysed unit
Si = the number of subject fields the publication of fraction i has been classified as belonging to
Ai = the total number of addresses in the publication of fraction i

The field normalised citation rate can be calculated using different citation windows, depending on the situation, but at the SRC it always excludes self-citations. Presently, the SRC does not make any adjustments when normalising against very low-cited fields.

Please note that even though the SRC cf indicator resembles the CWTS "crown" indicator CPP/FCSm, it is not the same indicator. The CWTS indicator groups publications and calculates average citation levels for both the numerator and the denominator before citation normalisation is done (Moed et al 1995), whereas the SRC indicator normalises at the publication fraction level and averages afterwards. This difference is described in detail by Lundberg (2007). Furthermore, the CWTS crown indicator does not seem to use address fraction weighting, which means that crown values will usually be higher than cf values for the same set of publications, due to the correlation between numbers of authors/addresses and citations discussed in the section about fractionalisation above.

6.5 Share of publications above the 90th, 95th and 99th citation percentile
If the average field normalised citation rate for an analysed unit says something about the average impact of the unit's publications, the share of publications cited above a certain citation percentile can tell us something about the distribution of that impact. Is an average normalised citation rate of 1.2 the result of a majority of well-cited publications or of a few highly-cited publications?

This indicator is calculated by looking at how many of a unit's publication fractions are cited more than the citation level of the percentile in question for the subject field the fraction is classified in. If the publication is cited more than the threshold for the field, the value is 1, otherwise it is 0. This value is called PPRC#f, the cited-over-threshold value for the #th percentile, in the formula below. The SRC calculation of the indicator is weighted by the analysed unit's share of address fractions in the publications according to the following formula:

\[ p_{PRC\#f} = \frac{\sum_{i=1}^{R} \frac{P_{PRC\#f(i)}}{A_i S_i}}{\sum_{i=1}^{R} \frac{1}{A_i S_i}} \qquad [6] \]

where:
pPRC#f = the weighted average share of publications cited above the #th percentile
R = the number of publication fractions attributed to the analysed unit
PPRC#f(i) = the cited-over-threshold value for the #th percentile of fraction i (according to separately specified citation window, self-citations removed)
Ai = the total number of addresses in the publication of fraction i
Si = the number of subject fields the publication of fraction i has been classified as belonging to

The indicator can be calculated using different citation windows, but it seldom includes self-citations. It is worth pointing out that the share of publications cited more than the 99th percentile is usually less than 1%, and correspondingly for the other percentile values.

6.6 Journal normalised citation rate (cj)
This indicator shows how an analysed unit's publications are cited in relation to the average citation rate for publications of the same document type and publication year in the same journal. Since no normalisation against subject fields is performed here, the data set for the calculation is only fractionalised on addresses (not on subject-address combinations, as is customary). The indicator is calculated according to the following formula:

\[ c_j = \frac{\sum_{i=1}^{R} \frac{C_i}{\mu_{j(i)} A_i}}{\sum_{i=1}^{R} \frac{1}{A_i}} \qquad [7] \]

where:
cj = the weighted average journal normalised citation rate
R = the number of publication fractions attributed to the analysed unit
Ci = the number of citations to the publication of fraction i (according to separately specified citation window, self-citations removed)
Ai = the total number of addresses in the publication of fraction i
μj(i) = the journal reference value for the publication of fraction i

The indicator can be calculated using different citation windows, but it seldom includes self-citations. Only publications with at least one subject classification and one address are considered in this calculation.
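A minimal Python sketch of the address-weighted journal normalisation is given below. The function name and the tuple layout are assumptions for illustration, and the sketch assumes that every journal reference value is greater than zero.

```python
def journal_normalised_citation_rate(fractions):
    """Address-weighted mean of C_i / mu_j(i) over a unit's fractions.

    fractions: list of (citations, journal_reference_value, n_addresses),
    one tuple per address fraction attributed to the analysed unit.
    """
    numerator = sum(c / (mu_j * a) for c, mu_j, a in fractions)
    denominator = sum(1 / a for _, _, a in fractions)
    return numerator / denominator


# Hypothetical unit with two address fractions:
# a sole-address paper cited twice the journal average, and
# a two-address paper cited half the journal average.
unit_fractions = [(6, 3.0, 1), (2, 4.0, 2)]
c_j = journal_normalised_citation_rate(unit_fractions)
```

The result is 1.5 rather than the unweighted mean of 1.25, because the unit holds the whole of the well-cited publication but only half of the other one.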
6.7 Journal to field normalised citation rate (jf)
Sometimes it can be of interest to study the average citation rate of the journals a unit publishes in, in relation to the average citation rate of the fields the journals are classified in. This indicator is called the journal to field normalised citation rate and is calculated according to the following formula:

\[ j_f = \frac{\sum_{i=1}^{R} \frac{\mu_{j(i)}}{\mu_{f(i)}} \cdot \frac{1}{A_i S_i}}{\sum_{i=1}^{R} \frac{1}{A_i S_i}} \qquad [8] \]

where:
jf = the weighted journal to field normalised citation rate
R = the number of publication fractions attributed to the analysed unit
μj(i) = the journal reference value for the publication of fraction i (according to separately specified citation window, self-citations removed)
μf(i) = the field reference value for the field where fraction i is classified (according to separately specified citation window, self-citations removed)
Si = the number of subject fields the journal of fraction i has been classified as belonging to
Ai = the total number of addresses in the publication of fraction i

The indicator can be calculated using different citation windows, but self-citations are seldom included.

7 References
Gunnarsson, M., Fröberg, J., Jacobsson, C. & Karlsson, S. (2008). Subject classification of publications in the Thomson database based on references and citations. Vetenskapsrådet, Stockholm.

Karlsson, S. & Wadskog, D. (2006). Hur mycket citeras svenska publikationer? Bibliometrisk översikt över Sveriges vetenskapliga publicering mellan 1982 och 2004. Vetenskapsrådets rapportserie 13:2006. (www.vr.se/download/18.5b5b80b310e317e3c0680001207/Bibliometrirapport)

Lundberg, J. (2007). Lifting the crown – citation z-score. Journal of Informetrics, 1 (2007), 145-154.

Moed, H. F., Debruin, R. E. & Vanleeuwen, T. N. (1995). New bibliometric tools for the assessment of National Research Performance – Database description, overview of indicators and first applications. Scientometrics, 33(3), 381-422.

8 Appendices

8.1 SRC denotation of parameters and indicators

8.1.1 Denotation overview
General rules guiding a suggested extendable denotation scheme:

• Absolute numbers are denoted with upper case letters (P, C).
• Relative numbers (quotients) are denoted with lower case letters (p, c).
• Reference values for normalisations are denoted using Greek characters (μ, τ).
• Index letters are used to indicate special conditions regarding the indicator or the reference value. In situations where it is not possible to use index letters, indices may be written using brackets or underscores, i.e. cf may be denoted as c[f] or c_f.
• Methodological aspects such as fractionalisation, weighting, averaging, normalisation level, length of citation windows and removal of self-citations are not indicated in suggested denotations where those can be considered to be an integral part of the resulting indicator. However, where the handling of self-citations or fractionalisation is a vital part of the resulting indicator, they are used. Please see the method indices below the denotation list.
• In situations where the same methods are used throughout a study and the methods are clearly stated, methodological indices may be omitted and raw denotations such as P, C or c may be used to make the presentation less cluttered.

8.1.2 General abbreviations
p        publication
c        citation, cited
sc       self-citation
h        whole counts
r        fractionalised
j        journal
t        top
%, prc   percentile
u        un-, non-, zero
a        author
y        year
f        field
g        group
w        citation window

8.1.3 Specific abbreviations
Denotation | English description | Swedish description
P | Number of publications (counted as separately defined) | Antal publikationer (beräknat enligt separat definition)
Ph | Number of publications, whole counts | Antal publikationer, utan fraktionering
Pr | Number of publications, fractionalized counts (as separately defined) | Antal publikationer, fraktionerat (enligt separat definition)
Pf#% | Number of publications cited more than the #th percentile of the field, usually 90, 95 and 99 | Antal publikationer citerade mer än #:e percentilen i fältet, vanligen 90, 95 och 99
Puc | Number of non-cited publications | Antal ej citerade publikationer
pf#% | Share of publications cited more than the #th percentile of the field, usually 90, 95 and 99 | Andel publikationer citerade mer än #:e percentilen i fältet, vanligen 90, 95 och 99
pf50% | Share of publications cited more than the median (50th percentile) of the field | Andel publikationer citerade mer än medianen (50:e percentilen) i fältet
top#f | World-relative share of publications cited more than the #th percentile in the field | Relativ andel publikationer citerade fler gånger än den #:e percentilen för fältet
puc | Share of non-cited publications | Andel ej citerade publikationer
C | Total number of citations | Totalt antal citeringar
Cp | Number of citations to a single publication | Antal citeringar till en publikation
Cy | Number of citations to publications published year y | Antal citeringar till publikationer publicerade år y
cp | Average number of citations per publication | Genomsnittligt antal citeringar per publikation
cy | Average number of received citations per publication year | Genomsnittligt antal mottagna citeringar per publiceringsår
cp,y | Average number of received citations per publication and year | Genomsnittligt antal mottagna citeringar per publikation och år
cf | Field normalised citation rate | Fältnormerad citeringsgrad
CPP/FCSm | Crown indicator: CWTS field normalised citation score | CWTS kronindikator
Cf | Sum of field normalised citations | Summa av fältnormerade citeringar
cj | Journal normalised citation rate | Tidskriftsnormerad citeringsgrad
cjg | Journal group normalised citation rate (as separately defined) | Tidskriftsgruppsnormerad citeringsgrad (enligt separat definition)
csc | Share of self-citations (as separately defined) | Andel självciteringar (enligt separat definition)
μf | Field reference value (FRV) (field citation score, FCS) for publications of the same type, age and in the same field of research | Fältets genomsnittliga citeringsgrad för publikationer av samma typ och ålder
τf#% | Citation threshold value for the #th percentile of the most cited in the field, usually 90, 95 and 99 | Tröskelvärde för #:e percentilen för publikationer citeringsfördelade inom ett forskningsfält, vanligen 90, 95 och 99
μj | Journal reference value (JRV) for publications of the same type and age published in the same journal | Genomsnittlig citeringsgrad för publikationer av samma typ, publiceringsår och samma tidskrift
jcf | Journal to field citation rate | Fältnormerad medelcitering för tidskrift

8.1.4 Method indices
In those cases where the choice of methods makes considerable differences to indicators, methods can be indicated by adding extra information to the index, preceded by a comma sign. Verbose example: cf,scx,wo,r means field normalised citation rate with self-citations excluded, measured with an open citation window, averaged with fraction weighting. Here is an extendable list of suggested methodological indices:

Index     Meaning
w<n|o>    Citation window. Examples: w3: publication year plus 2 following years. w6: publication year plus 5 years. wo: open citation window; all citations from publication date up to measuring date are counted.
sc        Self-citations
sci       Self-citations included
scx       Self-citations excluded
h         Whole counts
r         Fractionalised counts

8.1.5 Other Terminology
Citation windows Citation window numbers count the publication year separately, so that a 3‐year citation window for a paper published 2005 means citations from 2005, 2006 and 2007. Fractionalisation Fractionalisation can be made on subjects, addresses, authors, or any other publication property with more than one value. Usually the type of fractionalisation is easy to infer from the context, but sometimes the type needs to be specified: subject fractionalisation, address fractionalisation, etc. For publication counts for countries, address fractionalisation can be done in (at least) two ways. •
Weighted country fractionalisation: Each address is given one fraction of the publication and the fraction for each country is the sum of that country’s address fractions. Unweighted country fractionalisation: The countries represented in the address list are given equal fractions, regardless of how many addresses the countries have. •
For example, an article with 3 Swedish addresses and 1 Danish address will in the first case, using weighted country fractionalisation, count as 3/4 of a publication for Sweden and 1/4 of a publication for Denmark. In the second case, using unweighted country fractionalisation, it will count as 1/2 of a publication for Sweden and 1/2 of a publication for Denmark.
8.2 Thomson document types
Document types in the Thomson database, together with the number of records and the share for each document type in the SRC database.
Document type | Number of publications | Share
Art Exhibit Review | 74 039 | 0.2%
Article | 19 202 324 | 64.1%
Bibliography | 10 987 | 0.0%
Biographical‐Item | 135 386 | 0.5%
Book Review | 2 189 037 | 7.3%
Chronology (coded as Article effective 1996) | 762 | 0.0%
Correction, Addition | 186 480 | 0.6%
Dance Performance Review | 15 862 | 0.1%
Database Review | 1 334 | 0.0%
Discussion (coded as Editorial effective 1996) | 35 015 | 0.1%
Editorial Material | 1 328 252 | 4.4%
Excerpt | 5 746 | 0.0%
Fiction, Creative Prose | 33 966 | 0.1%
Film Review | 44 413 | 0.1%
Hardware Review | 2 509 | 0.0%
Letter | 1 137 212 | 3.8%
Meeting Abstract | 3 355 383 | 11.2%
Music Performance Review | 45 993 | 0.2%
Music Score | 644 | 0.0%
Music Score Review | 14 118 | 0.0%
News Item (new in 1996) | 345 802 | 1.2%
Note (coded as Article effective 1996) | 743 090 | 2.5%
Poetry | 171 525 | 0.6%
Record Review | 49 295 | 0.2%
Reprint | 14 096 | 0.0%
Review | 753 167 | 2.5%
Script | 1 999 | 0.0%
Software Review | 15 189 | 0.1%
Theater Review | 21 811 | 0.1%
TV Review, Radio Review, Videocassette Review | 9 306 | 0.0%
Sum | 29 944 742 |
8.3 Thomson subject fields
Thomson subject | Publ. count
Acoustics | 85985
Agricultural Economics & Policy | 21391
Agricultural Engineering | 14072
Agricultural Experiment Station Reports | 4420
Agriculture, Dairy & Animal Science | 146547
Agriculture, Multidisciplinary | 48617
Agronomy | 208092
Allergy | 81222
Anatomy & Morphology | 45688
Andrology | 10271
Anesthesiology | 137793
Anthropology | 99716
Archaeology | 47122
Architecture | 85042
Area Studies | 82326
Art | 187655
Asian Studies | 56278
Astronomy & Astrophysics | 279348
Automation & Control Systems | 81375
Behavioral Sciences | 104985
Biochemical Research Methods | 166786
Biochemistry & Molecular Biology | 1379443
Biodiversity Conservation | 26305
Biology | 332101
Biology, Miscellaneous | 31575
Biophysics | 315989
Biotechnology & Applied Microbiology | 314415
Business | 116738
Business, Finance | 73506
Cardiac & Cardiovascular System | 519925
Cell Biology | 655843
Chemistry, Analytical | 339292
Chemistry, Applied | 168278
Chemistry, Inorganic & Nuclear | 254715
Chemistry, Medicinal | 120445
Chemistry, Multidisciplinary | 918875
Chemistry, Organic | 392178
Chemistry, Physical | 571342
Classics | 68297
Engineering, Aerospace | 95759
Clinical Neurology | 561996
Engineering, Biomedical | 107496
Communication | 45180
Engineering, Chemical | 412839
Computer Applications & Cybernetics | 24898
Engineering, Civil | 162070
Engineering, Electrical & Electronic | 689822
Computer Critical Reviews | 59
Computer Science, Artificial Intelligence | 104373
Computer Science, Cybernetics | 20825
Computer Science, Hardware & Architecture | 96941
Computer Science, Information Systems | 111094
Computer Science, Interdisciplinary Applications | 133667
Computer Science, Software Engineering | 132148
Computer Science, Theory & Methods | 184758
Construction & Building Technology | 54721
Criminology & Penology | 29691
Critical Care Medicine | 73634
Crystallography | 149707
Cytology & Histology | 17755
Dance | 47728
Demography | 24156
Dentistry, Oral Surgery & Medicine | 211839
Dermatology | 191229
Developmental Biology | 92738
Ecology | 209848
Economics | 265903
Education & Educational Research | 159055
Education, Scientific Disciplines | 51862
Education, Special | 28770
Electrochemistry | 110456
Emergency Medicine | 62080
Endocrinology & Metabolism | 406380
Energy & Fuels | 199547
Engineering, Environmental | 91093
Engineering, Geological | 16737
Engineering, Industrial | 72304
Engineering, Manufacturing | 55927
Engineering, Marine | 23102
Engineering, Mechanical | 227900
Engineering, Multidisciplinary | 156612
Engineering, Ocean | 10393
Engineering, Petroleum | 85684
Entomology | 117369
Environmental Sciences | 380080
Environmental Studies | 67478
Ergonomics | 20867
Ethics | 19910
Ethnic Studies | 11749
Evolutionary Biology | 48497
Family Studies | 36533
Film, Radio, Television | 90121
Fisheries | 73736
Folklore | 20842
Food Science & Technology | 264573
Forestry | 65061
Gastroenterology & Hepatology | 401861
Genetics & Heredity | 384517
Geochemistry & Geophysics | 160495
Geography | 72994
Geography, Physical | 29492
Geology | 46375
Geosciences, Multidisciplinary | 266054
Geriatrics & Gerontology | 86708
Gerontology | 47937
Health Care Sciences & Services | 67255
Materials Science, Biomaterials | 29272
Health Policy & Services | 69610
Materials Science, Ceramics | 95555
Materials Science, Characterization, Testing | 38876
Materials Science, Coatings & Films | 97232
Materials Science, Composites | 37727
Hematology | 496447
History | 549641
History & Philosophy of Science | 55893
History of Social Sciences | 39115
Horticulture | 55435
Hospitality, Leisure, Sport & Tourism | 3943
Humanities, Multidisciplinary | 340095
Imaging Science & Photographic Technology | 28736
Immunology | 528013
Industrial Relations & Labor | 29338
Infectious Diseases | 190386
Information Science & Library Science | 216751
Instruments & Instrumentation | 226017
Integrative & Complementary Medicine | 11700
International Relations | 96721
Materials Science, Multidisciplinary |
Materials Science, Paper & Wood | 49009
Materials Science, Textiles | 23034
Mathematical & Computational Biology | 21228
Mathematics | 339520
Mathematics, Applied | 264313
Mathematics, Interdisciplinary Applications Mechanics Medical Ethics 93543 Medicine, General & Internal Law 128033 Medicine, Miscellaneous Limnology 31337 Medicine, Research & Experimental Linguistics 28782 Medieval & Renaissance Studies Literature 14057 241183 7665 Medical Laboratory Technology Medicine, Legal Literary Theory & Criticism 207592 37294 101074 284641 62817 Medical Informatics Language & Linguistics Literary Reviews 610147 Metallurgy & Metallurgical Engineering Metallurgy & Mining 874288 33761 8027 363217 17634 219100 39367
Meteorology & Atmospheric Sciences | 139182
Microbiology | 321346
Literature, African, Australian, Canadian | 19768
Literature, American | 21609
Microscopy | 30311
Literature, British Isles | 21560
Mineralogy | 43500
Literature, German, Dutch, Scandinavian | 37729
Mining & Mineral Processing | 48345
Literature, Romance | 119466
Literature, Slavic | 16591
Management | 100834
Marine & Freshwater Biology | 169383
Multidisciplinary Sciences | 553126
Music | 227333
Mycology | 32519
Nanoscience & Nanotechnology | 77521
Neuroimaging | 22424
Neurosciences | 732657
Politics & Policy | 916
Nuclear Science & Technology | 210588
Polymer Science | 270229
Psychiatry | 399316
Nutrition & Dietetics | 152242
Psychology | 216420
Obstetrics & Gynecology | 233374
Psychology, Applied | 63720
Psychology, Biological | 50099
Nursing | 80526
Oceanography | 99869
Oncology | 584071
Operations Research & Management Science | 93898
Ophthalmology | 235581
Optics | 290599
Oriental Studies | 21
Ornithology | 26997
Orthopedics | 140616
Otorhinolaryngology | 95314
Paleontology | 35862
Parasitology | 63571
Psychology, Clinical | 126379
Psychology, Developmental | 72283
Psychology, Educational | 37386
Psychology, Experimental | 114480
Psychology, Mathematical | 18897
Psychology, Multidisciplinary | 119068
Psychology, Psychoanalysis | 22335
Psychology, Social | 58804
Public Administration | 33068
Public, Environmental & Occupational Health | 362167
Radiology, Nuclear Medicine & Medical Imaging | 401979
Rehabilitation | 100513
Religion | 180082
Pathology | 262707
Pediatrics | 330857
Peripheral Vascular Disease | 372704
Pharmacology & Pharmacy | 762410
Philosophy | 159292
Physics, Applied | 611786
Physics, Atomic, Molecular & Chemical | 292763
Physics, Condensed Matter | 500788
Robotics | 10251
Physics, Fluids & Plasmas | 102655
Social Issues | 84277
Physics, Mathematical | 139151
Social Sciences, Biomedical | 43175
Physics, Multidisciplinary | 480473
Social Sciences, Interdisciplinary | 34575
Physics, Nuclear | 143513
Physics, Particles & Fields | 167225
Social Sciences, Mathematical Methods |
Physiology | 305364
Social Work | 41780
Planning & Development | 63976
Plant Sciences | 394973
Poetry | 54425
Political Science | 239382
Remote Sensing | 26342
Reproductive Biology | 106158
Respiratory System | 184139
Rheumatology | 126216
Sociology Soil Science 104802 145798 81362
Spectroscopy | 153545
Sport Sciences | 104188
Statistics & Probability | 119479
Substance Abuse | 54606
Surgery | 660232
Telecommunications | 152432
Theater | 59549
Thermodynamics | 84363
Toxicology | 189145
Transplantation | 134265
Transportation | 15327
Transportation Science & Technology | 18813
Tropical Medicine | 50651
Urban Studies | 34322
Urology & Nephrology | 285444
Water Resources | 133593
Veterinary Sciences | 346116
Virology | 119414
Women’s Studies | 30876
Zoology | 221093