Jointly published by Akadémiai Kiadó, Budapest and Kluwer Academic Publishers, Dordrecht
Scientometrics, Vol. 56, No. 3 (2003) 357–367

A new classification scheme of science fields and subfields designed for scientometric evaluation purposes

WOLFGANG GLÄNZEL,*,** ANDRÁS SCHUBERT**
*Katholieke Universiteit Leuven, Steunpunt O&O Statistieken, Leuven (Belgium)
**Hungarian Academy of Sciences, Institute for Research Organisation, Budapest (Hungary)

A two-level hierarchic system of fields and subfields of the sciences, social sciences and arts & humanities is proposed. The system was specifically designed for scientometric (evaluation) purposes with the ultimate goal of classifying every single document into a well-defined category. This goal was achieved using a three-step iterative process. The basic concepts and some preliminary results are presented.

Introduction

Classification of science into a disciplinary structure is at least as old as science itself. After many centuries of constructive yet inconclusive search for a perfect classification scheme, the only sensible approach to the question appears to be the pragmatic one: what is the optimal scheme for a given practical purpose? To this end, a great many systems have been conceived and installed by general and special libraries, publishers, encyclopedias and, in ever-growing numbers, by electronic databases, internet-based information services, web crawlers, etc.

Classification systems developed by the producer of the Science Citation Index (SCI; ISI – Thomson Scientific, PA, USA), by institutions working extensively with this database and by the producers of other multidisciplinary science journal databases deserve special attention (see, for instance, Narin, 1976). These classification systems are mostly based on journal assignment and were originally created for retrieval purposes. Most existing systems, however, have proved to have shortcomings when used in the context of research evaluation.
The classification of scientific literature into appropriate subject fields is, nevertheless, one of the basic preconditions of valid scientometric analyses. Publication activity and citation habits differ considerably among subfields. In comparative studies, inappropriate reference standards obtained from questionable subject assignment might result in misleading conclusions. This paper therefore aims at developing a new classification system that also covers papers published in multidisciplinary journals and is designed especially for research evaluation purposes.

Received October 11, 2002.
Address for correspondence: WOLFGANG GLÄNZEL, Katholieke Universiteit Leuven, Steunpunt O&O Statistieken, Dekenstraat 2, B-3000 Leuven, Belgium
E-mail: h1533bra@ella.hu
0138–9130/2003/US $ 20.00 Copyright © 2003 Akadémiai Kiadó, Budapest. All rights reserved

Methods

In practice, two different basic types of scheme are used: hierarchic and fine-structured classification systems used in information retrieval, and more “robust” schemes emphasizing science organisation aspects and science policy needs. In this paper, a two-level hierarchical classification scheme has been constructed so that the categories cover the whole scope of the sciences more or less evenly and the subfields behave consistently in scientometric evaluations, i.e., common standards can be set in each of them regarding publication and citation habits.

The objectives of the work have been approached in three successive steps, allowing multiple feedback loops throughout the whole process.

1. The “cognitive” approach (setting the categories): In this iterative process, an initial scheme has been elaborated on the basis of the experience of both scientometricians and external experts.
2. The “pragmatic” approach (journal classification): On the basis of existing journal classification schemes, the majority of the journal set extracted from the SCI has been classified into the preset subfields. The classification scheme has been adjusted according to co-heading frequency to keep multiple assignments within reasonable limits.
3. The “scientometric” approach (article classification): Articles published in core journals can be unambiguously classified into the subfield of the given journals. Articles of unassignable or ambiguously assignable journals are classified individually using the analysis of references. The results of this classification exercise fed back into the journal classification and also into the basic field/subfield structure.

Results

Step 1 – The “cognitive” approach (setting the categories)

The application of the above methods resulted in a system with 12 first-level categories (fields) and 60 second-level categories (subfields) of the sciences. For the social sciences and the humanities, 3 major fields and 7 subfields were obtained. The results are presented in Table 1.

Table 1. Fields and subfields of sciences, social sciences and arts & humanities

1. AGRICULTURE & ENVIRONMENT
   A1 Agricultural Science & Technology
   A2 Plant & Soil Science & Technology
   A3 Environmental Science & Technology
   A4 Food & Animal Science & Technology
2. BIOLOGY (ORGANISMIC & SUPRAORGANISMIC LEVEL)
   Z1 Animal Sciences
   Z2 Aquatic Sciences
   Z3 Microbiology
   Z4 Plant Sciences
   Z5 Pure & Applied Ecology
   Z6 Veterinary Sciences
3. BIOSCIENCES (GENERAL, CELLULAR & SUBCELLULAR BIOLOGY; GENETICS)
   B0 Multidisciplinary Biology
   B1 Biochemistry/Biophysics/Molecular Biology
   B2 Cell Biology
   B3 Genetics & Developmental Biology
4. BIOMEDICAL RESEARCH
   R1 Anatomy & Pathology
   R2 Biomaterials & Bioengineering
   R3 Experimental/Laboratory Medicine
   R4 Pharmacology & Toxicology
   R5 Physiology
5. CLINICAL AND EXPERIMENTAL MEDICINE I (GENERAL & INTERNAL MEDICINE)
   I1 Cardiovascular & Respiratory Medicine
   I2 Endocrinology & Metabolism
   I3 General & Internal Medicine
   I4 Hematology & Oncology
   I5 Immunology
6. CLINICAL AND EXPERIMENTAL MEDICINE II (NON-INTERNAL MEDICINE SPECIALTIES)
   M1 Age & Gender Related Medicine
   M2 Dentistry
   M3 Dermatology/Urogenital System
   M4 Ophthalmology/Otolaryngology
   M5 Paramedicine
   M6 Psychiatry & Neurology
   M7 Radiology & Nuclear Medicine
   M8 Rheumatology/Orthopedics
   M9 Surgery
7. NEUROSCIENCE & BEHAVIOR
   N1 Neurosciences & Psychopharmacology
   N2 Psychology & Behavioral Sciences
8. CHEMISTRY
   C0 Multidisciplinary Chemistry
   C1 Analytical, Inorganic & Nuclear Chemistry
   C2 Applied Chemistry & Chemical Engineering
   C3 Organic & Medicinal Chemistry
   C4 Physical Chemistry
   C5 Polymer Science
   C6 Materials Science
9. PHYSICS
   P0 Multidisciplinary Physics
   P1 Applied Physics
   P2 Atomic, Molecular & Chemical Physics
   P3 Classical Physics
   P4 Mathematical & Theoretical Physics
   P5 Particle & Nuclear Physics
   P6 Physics of Solids, Fluids and Plasmas
10. GEOSCIENCES & SPACE SCIENCES
   G1 Astronomy & Astrophysics
   G2 Geosciences & Technology
   G3 Hydrology/Oceanography
   G4 Meteorology/Atmospheric & Aerospace Science & Technology
   G5 Mineralogy & Petrology
11. ENGINEERING
   E1 Computer Science/Information Technology
   E2 Electrical & Electronic Engineering
   E3 Energy & Fuels
   E4 General & Traditional Engineering
12. MATHEMATICS
   H1 Applied Mathematics
   H2 Pure Mathematics
13. SOCIAL SCIENCES I (GENERAL, REGIONAL & COMMUNITY ISSUES)
   S1 Education & Information
   S2 General, Regional & Community Issues
14. SOCIAL SCIENCES II (ECONOMICAL & POLITICAL ISSUES)
   O1 Economics, Business & Management
   O2 History, Politics & Law
15. ARTS & HUMANITIES
   U1 Arts & Literature
   U2 Language & Culture
   U3 Philosophy & Religion

An interesting side effect of this new category system is that some of the life-science-related fields covered by the SSCI, such as parts of Psychology & Behavior and Paramedicine, are integrated into the corresponding science areas (see subfields N2 and M5, respectively).

Step 2 – The “pragmatic” approach (journal classification)

The majority of the journal set extracted from the SCI could be classified on the basis of existing journal classification schemes into the preset subfields presented in Table 1. The scheme had to be adjusted according to co-heading frequency to keep multiple assignments within reasonable limits. Examples of journal assignments obtained this way are given in Table 2.

Table 2. Examples of journal classification based on the ‘pragmatic’ approach

Journal title                                                   Vol. year  F1
Natural Product Reports                                         2000       B1
Natural Resources Journal                                       2000       A3
Natural Toxins                                                  1999       R4
Nature*                                                         2001       X0
Nature & Resources                                              2000       A3
Nature Biotechnology                                            1999       Z3
Nature Cell Biology                                             2001       B1
Journal of the American Chemical Society*                       2001       C0
Journal of the American Leather Chemists Association            1996       C2
Journal of the American Musicological Society                   2000       U1
Schweizer Archiv für Tierheilkunde                              1998       Z6
Schweizerische Mineralogische und Petrographische Mitteilungen  2000       G2
Schweizerisches Archiv für Volkskunde                           2001       S2
Science*                                                        2001       X0

Columns F2 to F4 list the further codes C3, O2, B2, C6 and G5; their row alignment is not preserved here.
* These journals are subject to the ‘scientometric’ approach in step 3

The journals assigned to category ‘X0’, i.e., to multidisciplinary sciences, were subjected to further treatment according to the ‘scientometric approach’ as described in step 3.
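The journal-level step described above can be illustrated with a minimal Python sketch. The F1 codes are copied from Table 2 and the field names from Table 1. The function name `classify_by_journal`, the lookup of the field through the first letter of the code, and the rule that every code ending in ‘0’ fixes only the field are assumptions made for illustration; the source confirms this behaviour only for ‘X0’ and for JACS with ‘C0’.

```python
# Field names keyed by the leading letter of a subfield code (Table 1).
FIELD_BY_PREFIX = {
    "A": "Agriculture & Environment",
    "Z": "Biology",
    "B": "Biosciences",
    "R": "Biomedical Research",
    "I": "Clinical and Experimental Medicine I",
    "M": "Clinical and Experimental Medicine II",
    "N": "Neuroscience & Behavior",
    "C": "Chemistry",
    "P": "Physics",
    "G": "Geosciences & Space Sciences",
    "E": "Engineering",
    "H": "Mathematics",
    "S": "Social Sciences I",
    "O": "Social Sciences II",
    "U": "Arts & Humanities",
}

# First-ranked codes (F1) for a few journals, copied from Table 2.
JOURNAL_F1 = {
    "Natural Toxins": "R4",
    "Nature": "X0",
    "Journal of the American Chemical Society": "C0",
    "Schweizer Archiv für Tierheilkunde": "Z6",
}


def classify_by_journal(journal):
    """Return (field, subfield); None marks a level left to article-level step 3."""
    code = JOURNAL_F1[journal]
    if code == "X0":
        # Multidisciplinary sciences: neither level is fixed by the journal.
        return (None, None)
    field = FIELD_BY_PREFIX[code[0]]
    if code.endswith("0"):
        # Assumption: a '0' code (e.g. C0) fixes the field but not the subfield.
        return (field, None)
    return (field, code)


for title in JOURNAL_F1:
    print(title, classify_by_journal(title))
```

Returning `None` for an open level keeps the hand-over to the article-level step explicit: a caller can route every paper whose subfield is `None` to individual classification.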
The papers published in the journals Nature and Science (see Table 2), in particular, were individually assigned to both subfields and major fields. Similarly, papers published in JACS are individually assigned to second-level categories, whereas they were automatically assigned to the first-level field Chemistry through the journal assignment ‘C0’ (see Table 2).

Figure 1. Percentage shares of fields in the total (1998)

Figure 1 gives a first impression of the distribution of publications and citations over fields. For this sample, all papers indexed in the 1998 volume of the CD-Edition of the SCI as Articles, Letters, Notes and Reviews have been taken into consideration. Citations have been counted for a three-year citation window beginning with the publication year, that is, for the 1998-2000 period. The distribution by fields is more balanced than it was in the case of the schemes comprising five and eight fields, respectively, previously used at ISSRU, Budapest. Nevertheless, Chemistry is the largest field in terms of publication output, followed by Physics and the two clinical and experimental medicine fields. The smallest ones are Mathematics, Neuroscience & Behavior and Geosciences & Space Sciences. From the viewpoint of citations, the field Biosciences (General, Cellular & Subcellular Biology; Genetics) receives the lion’s share, followed by the two Clinical and Experimental Medicine fields and the natural science fields, Chemistry and Physics. This is in part a consequence of the known field biases in scientific communication.

A breakdown by second-level categories has been made to visualise the distribution of publication output and citation impact over subfields within major fields. Table 3 gives an insight into the weight and influence the individual subfields have on the field total.
The disciplinary citation impact ranges from 0.68 for H2 (Pure Mathematics) to 10.14 for B2 (Cell Biology).

Table 3. Percentage shares of subfields in the main fields and their citation impact (Publications: 1998, Citation window: 1998-2000)

FIELD / Subfield    Share of subfield in the field total    Subfield
                    Publications    Citations               impact

Agriculture & Environment
  A1                 8.6%            7.4%                    1.55
  A2                23.6%           18.6%                    1.42
  A3                38.3%           46.5%                    2.20
  A4                34.9%           33.2%                    1.72
Biology
  Z1                15.7%            9.7%                    1.96
  Z2                10.3%            6.7%                    2.07
  Z3                41.0%           57.1%                    4.42
  Z4                18.4%           17.2%                    2.97
  Z5                10.2%            9.2%                    2.85
  Z6                11.0%            4.9%                    1.40
Biosciences
  B0                 7.4%            3.7%                    3.34
  B1                66.9%           72.3%                    7.23
  B2                21.1%           32.0%                   10.14
  B3                24.3%           23.5%                    6.47
Biomedical Research
  R1                15.3%           13.6%                    3.17
  R2                 7.2%            4.0%                    1.97
  R3                16.5%           29.5%                    6.39
  R4                47.5%           41.2%                    3.10
  R5                17.3%           14.5%                    2.99
Clinical and Experimental Medicine I
  I1                20.4%           17.7%                    3.98
  I2                11.1%           11.9%                    4.89
  I3                27.3%           20.0%                    3.36
  I4                25.4%           31.8%                    5.73
  I5                20.3%           25.7%                    5.81
Clinical and Experimental Medicine II
  M1                14.9%           11.7%                    2.03
  M2                 4.4%            2.5%                    1.49
  M3                11.2%           11.0%                    2.52
  M4                 7.6%            5.5%                    1.88
  M5                24.9%           28.9%                    3.00
  M6                15.1%           20.2%                    3.45
  M7                 9.4%            9.0%                    2.45
  M8                 4.3%            3.7%                    2.21
  M9                19.3%           15.3%                    2.04
Neuroscience & Behavior
  N1                87.3%           93.8%                    5.58
  N2                21.5%           11.1%                    2.67
Chemistry
  C0                14.3%           21.8%                    4.06
  C1                21.9%           23.8%                    2.88
  C2                11.7%            6.8%                    1.54
  C3                15.4%           20.8%                    3.59
  C4                22.3%           21.7%                    2.58
  C5                 6.6%            6.6%                    2.64
  C6                23.6%           14.0%                    1.58
FIELD / Subfield    Share of subfield in the field total    Subfield
                    Publications    Citations               impact

Physics
  P0                17.5%           22.8%                    3.70
  P1                29.0%           22.5%                    2.20
  P2                10.9%           14.5%                    3.79
  P3                17.2%           12.1%                    2.00
  P4                 6.2%            5.7%                    2.58
  P5                 9.9%           12.1%                    3.50
  P6                28.3%           24.3%                    2.44
Geosciences & Space Sciences
  G1                33.8%           54.0%                    5.16
  G2                50.4%           41.1%                    2.63
  G3                16.3%           17.5%                    3.46
  G4                25.2%           23.0%                    2.94
  G5                 9.2%            3.5%                    1.23
Engineering
  E1                26.6%           25.8%                    1.11
  E2                42.8%           49.2%                    1.31
  E3                22.6%           21.4%                    1.08
  E4                22.8%           16.9%                    0.84
Mathematics
  H1                68.1%           74.0%                    1.03
  H2                48.8%           35.0%                    0.68

Step 3 – The “scientometric” approach (article classification)

All papers published in journals not assignable to ‘well-defined’ subject categories have to be assigned individually, i.e., paper by paper. Two levels can be distinguished: first, the Multidisciplinary Science journals like Nature, Science and PNAS US and, second, the general journals not specialised in any particular subject within one broader field, for instance, the chemistry journals Journal of the American Chemical Society (JACS) and Angewandte Chemie – International Edition. Among the possible approaches to this problem, we mention only the method of delimiting subfields on the basis of the analysis of cognitive words from the address field proposed by de Bruin and Moed (1993) and the method of analysing the reference literature proposed by Glänzel et al. (1999a, b).

The ‘scientometric approach’ applied here is based on the methodology of reference analysis according to Glänzel et al. (1999a, b). Tables 4 and 5 present examples of identified papers published in Nature and Science. As already mentioned in the paper by Glänzel et al. (1999a), a considerable number of papers (mainly papers without specific references and without institutional addresses) published in the two multidisciplinary journals Nature and Science could be considered scientific journalism rather than original reports on scientific research. Nevertheless, ISI usually regards these papers as scientific articles. In practice, such papers might be excluded from scientometric analyses.

Table 4. Examples of identified papers published in Nature (2000, Vol. 408)
(Fi (i = 1, 2, 3, 4) – subject codes with rank i by frequency, % – frequency in per cent)

F1  %     F2  %     F3  %     F4  %     SCI Refs.  1st author   1st page
P3  36.4  P4  27.3  P6  27.3  –   –     11         Zhang J      835
    Title: Flexible filaments in a flowing soap film as a model for one-dimensional flag in a two-dimensional wind
Z2  57.9  Z4  21.1  Z5  15.8  –   –     19         Salih A      850
    Title: Fluorescent pigments in corals are photoprotective
N1  61.9  N2  23.8  M6  19.0  M7  19.0  21         Stuphorn V   857
    Title: Performance monitoring by the supplementary eye field

Table 5. Examples of identified papers published in Science (2001, Vol. 294)
(Fi (i = 1, 2, 3, 4) – subject codes with rank i by frequency, % – frequency in per cent)

F1  %     F2  %     F3  %     F4  %     SCI Refs.  1st author    1st page
I3  15.4  Z3  15.4  B0  15.4  –   –     13         d’Aignaux JN  1729
    Title: Predictability of the UK variant Creutzfeldt-Jakob disease epidemic
P0  33.3  P6  25.0  P1  16.7  –   –     12         Matsuda T     2136
    Title: Oscillating rows of vortices in superconductors
G1  53.8  G2  53.8  G3  30.8  G4  30.8  26         Smith DE      2141
    Title: Seasonal variations of snow depth on Mars

The following examples are concerned with the individual assignment of papers published in ‘general’ chemistry journals in 1993. In particular, the American journal JACS and the German journal Angewandte Chemie – International Edition have been chosen. Figures 2 and 3 show the results.

Although the two journals had similar profiles in 1993, there are some differences that should be discussed. The largest share of both journals (30% and 36%, respectively) is devoted to Organic & Medicinal Chemistry (C3).
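The ranking of subject codes reported in Tables 4 and 5 can be sketched in a few lines of Python. This is only an illustration of counting codes over a paper's references, not the published procedure of Glänzel et al. (1999a, b): the function name `rank_subfields`, the tie-breaking by code, and the reference list itself are hypothetical, with the counts chosen so that the shares match the first row of Table 4 (11 references).

```python
from collections import Counter


def rank_subfields(reference_codes, top=4):
    """Rank subfield codes by the share of references carrying them.

    reference_codes holds one set of subfield codes per cited reference
    (an empty set if the cited source has no code). A reference with
    several codes counts once for each, so shares may exceed 100% in total.
    Returns up to `top` (code, percent) pairs, highest share first.
    """
    n_refs = len(reference_codes)
    if n_refs == 0:
        return []
    counts = Counter(code for codes in reference_codes for code in set(codes))
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(code, round(100 * n / n_refs, 1)) for code, n in ranked[:top]]


# Hypothetical reference list: 4 x P3, 3 x P4, 3 x P6, 1 without a code.
refs = [{"P3"}] * 4 + [{"P4"}] * 3 + [{"P6"}] * 3 + [set()]
print(rank_subfields(refs))  # [('P3', 36.4), ('P4', 27.3), ('P6', 27.3)]
```

Dividing by the number of references rather than by the number of codes is what allows the shares in a row to add up to more than 100%, as they do for the third paper in Table 4.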
The assignment of a relatively large share of papers to Multidisciplinary Chemistry (C0) is due to journal self-citations. About 14% of the papers published in JACS in the year under study are devoted to the subfield of Biochemistry/Biophysics/Molecular Biology (B1), whereas about the same share of papers published in Angewandte Chemie could be assigned to Analytical, Inorganic & Nuclear Chemistry (C1).

Figure 2. Example for identified papers published in JACS (1993)

Figure 3. Example for identified papers published in Angewandte Chemie – International Edition (1993)

The assignment of papers in both journals to Physics shows that publications may well belong to other fields even though the journal is a typical chemistry journal. This illustrates that research has become increasingly interdisciplinary. The share of unidentified papers amounts to 6.4% (JACS) and 13.9% (Angewandte Chemie). For these papers, the assignment to the category Multidisciplinary Chemistry (C0) seems to be justified. However, the two examples show that the majority of the papers can be individually assigned to ‘well-defined’ second-level categories.

Conclusions

Beyond the standard uses of the classification scheme, such as the determination of publication profiles for institutions or countries or the calculation of reference standards for relative citation indicators, the profiling of authors and research groups is a further important application. Given the results of article classification, the disciplinary affiliation of their authors can be determined, either individually or by group. The authors’ activity is often not limited to a single subfield; it usually covers a range of subfields with varying weights, and their field/subfield profile can be constructed.
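As a minimal sketch of how such a profile could be built from article-level assignments, the Python fragment below computes the share of an author's papers in each subfield. The function name `subfield_profile` and the list of papers are hypothetical. Counting a multiply assigned paper in full for each of its subfields is an assumption modelled on Table 3, where the shares within a field add up to more than 100%.

```python
from collections import Counter


def subfield_profile(paper_subfields):
    """Share of an author's (or group's) papers falling into each subfield.

    paper_subfields holds one set of subfield codes per paper. A paper with
    several codes counts fully in each of them, so shares may sum to more than 1.
    """
    n_papers = len(paper_subfields)
    if n_papers == 0:
        return {}
    counts = Counter(code for codes in paper_subfields for code in codes)
    return {code: k / n_papers for code, k in counts.most_common()}


# Hypothetical author with four classified papers.
papers = [{"C3"}, {"C3", "B1"}, {"C1"}, {"C3"}]
print(subfield_profile(papers))
```

For this made-up author, three of the four papers carry C3, and one paper each carries B1 and C1.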
Such profiles are of primary importance in scientometric evaluation. Since standards of scientometric indicators can be set only within subfields, it is only the activity profile that can be accompanied by matching profiles of indicators such as impact measures, citation rates or reference age.

References

DE BRUIN, R. E., H. F. MOED, Delimitation of scientific subfields using cognitive words from corporate addresses in scientific publications, Scientometrics, 26 (1993) 65–80.
GLÄNZEL, W., A. SCHUBERT, H. J. CZERWON, An item-by-item subject classification of papers published in multidisciplinary and general journals using reference analysis, Scientometrics, 44 (1999a) 427–439.
GLÄNZEL, W., A. SCHUBERT, U. SCHOEPFLIN, H. J. CZERWON, An item-by-item subject classification of papers published in journals covered by the SSCI database using reference analysis, Scientometrics, 46 (1999b) 431–441.
NARIN, F., Evaluative Scientometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity, Computer Horizons, Inc., Washington, D.C., 1976.