Method

Collaboration of Canada with low/middle income economies and the Republic of Korea: Scientometric Profile in Health Biotechnology
March 31, 2006
By David Campbell, M.Sc., and Grégoire Côté, B.Sc.
Presented to the University of Toronto – Joint Center for Bioethics

514.495.6505
4572 avenue de Lorimier
Montréal, Québec, Canada H2H 2B5
info@science-metrix.com
www.science-metrix.com

Constitution of datasets

This scientometric study is based on the Thomson ISI Science Citation Index Expanded database (SCI Expanded), which contains papers from more than 6,000 journals.[1] These journals are considered the most important peer-reviewed journals in their respective fields. They reflect significant scientific achievements and are the most widely cited journals in the world, accounting for more than 80% of the world's citations. The statistics are drawn from four types of documents considered to be original contributions to scientific knowledge: articles, notes, reviews, and conference proceedings. The tables in this report refer to these four document types as "papers."

This analysis is based on the subset of SCI Expanded papers that fall within the domain of health biotechnology. The health biotechnology dataset was built in two steps. First, biotechnology papers were retrieved from SCI Expanded using keyword-in-title searches. To choose the keywords, papers were randomly selected from journals specializing in biotechnology. Keywords and keyword combinations were then taken from the titles of these papers so as to retrieve other biotechnology papers. Second, the CHI Research Inc. journal classification was used to retain only those biotechnology papers published in journals classified in one of three fields of science: biomedical research, clinical medicine, and the health sciences.
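The two-step construction of the dataset can be sketched in Python as a minimal illustration. It is not the procedure actually run on SCI Expanded. Only four elements come from this methodology: the three retained CHI fields, the four document types, and the 1993-2004 window. The rest is hypothetical and is used only to make the example run: the `Record` fields, the document-type labels, the sample keywords, and the journal-to-field lookup. Title matching is assumed to be a naive, case-insensitive substring test.

```python
from dataclasses import dataclass

# From the text: CHI fields retained in step 2 and document types counted as "papers".
HEALTH_FIELDS = {"biomedical research", "clinical medicine", "health sciences"}
PAPER_TYPES = {"article", "note", "review", "conference proceeding"}  # labels are hypothetical


@dataclass
class Record:
    """Hypothetical, simplified bibliographic record."""
    title: str
    journal: str
    doc_type: str
    year: int


def title_matches(title, keyword_sets):
    """Step 1: True if every term of at least one keyword combination occurs in the title.

    Assumption: naive case-insensitive substring matching.
    """
    t = title.lower()
    return any(all(term in t for term in combo) for combo in keyword_sets)


def health_biotech(records, keyword_sets, journal_field):
    """Keep papers that pass the keyword step and appear in a journal of a retained CHI field."""
    kept = []
    for r in records:
        if r.doc_type not in PAPER_TYPES or not 1993 <= r.year <= 2004:
            continue
        if not title_matches(r.title, keyword_sets):  # step 1: keyword-in-title search
            continue
        if journal_field.get(r.journal) not in HEALTH_FIELDS:  # step 2: CHI journal field
            continue
        kept.append(r)
    return kept


# Hypothetical keywords and journal classification, for illustration only.
keywords = [("recombinant",), ("gene", "therapy")]
fields = {"J Clin Invest": "clinical medicine", "Plant Cell": "biology"}
records = [
    Record("Recombinant vaccine trial", "J Clin Invest", "article", 1999),
    Record("Gene therapy in mice", "Plant Cell", "article", 2001),       # field not retained
    Record("Recombinant antibodies", "J Clin Invest", "letter", 2000),   # not a counted type
    Record("Recombinant enzymes", "J Clin Invest", "article", 2006),     # outside 1993-2004
]
result = health_biotech(records, keywords, fields)
print([r.title for r in result])
```

The second record shows why the journal-level field filter matters. It passes the keyword step but is dropped because its journal is not classified in a health-related field.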
Papers published in the 1993-2004 period were retained. For the whole dataset, author addresses were standardized by country. The study focuses on scientific collaboration between Canada and developing countries. Countries were therefore classified as high income economies or low/middle income economies (i.e., developing countries) using the World Bank classification of countries, which is based on gross national income (GNI) per capita.[2] Given their historical context, former Warsaw Pact countries were deliberately excluded from the low/middle income economies.

Canada's collaboration with each of the top 15 developing countries with which it collaborates in health biotechnology was then examined in more detail. These collaboration papers are health biotechnology papers co-authored by at least one researcher with an address in Canada and at least one researcher with an address in a developing country. Their addresses were further standardized by city and institution. They were then classified into four main sectors (university, government, clinics and hospitals, and company) and a residual category (other). This procedure made it possible to identify the most active Canadian and developing-country cities, sectors, institutions, and researchers collaborating in health biotechnology. Health biotechnology papers were also broken down by field and subfield (CHI Research Inc. classification).

The main caveat in the construction of the dataset stems from the use of the CHI Research Inc. journal classification to retain health-related papers from the biotechnology dataset. This procedure introduced false positives, i.e., biotechnology papers that are not health-related, into the health biotechnology dataset. The reason is that journals classified in biomedical research, clinical medicine, and the health sciences sometimes publish papers that are not health-related (e.g., Science, Nature). Although this method is imperfect, false positives occur for all countries. The dataset therefore still provides a sound basis for comparing scientific output and analyzing collaboration at the country level. However, a more detailed analysis of health biotechnology collaboration between two countries, at the city, sector, institution, and researcher levels, requires that false positives be removed. False positives were therefore removed manually from the papers resulting from collaboration between Canada and the top 15 developing countries with which Canada collaborates in health biotechnology.

[1] Data derived from information prepared by the Institute for Scientific Information, Inc. (ISI, Philadelphia, Pennsylvania, USA). Copyright Institute for Scientific Information. All rights reserved.
[2] http://www.worldbank.org/

Scientometric indicators

Statistics were produced using the following indicators.

Number of papers: Number of scientific papers written by authors associated with geographic areas (e.g., countries and cities), sectors, or institutions.

International collaboration: Number of scientific papers co-authored by at least one researcher with an address in country A and at least one researcher with an address in country B. At the world level, international collaboration is the number of papers with at least two addresses in different countries.

Collaboration rate: An indicator of the relative importance of international collaboration for a country. It is calculated by dividing the country's number of health biotechnology papers written in collaboration with other countries by its total number of health biotechnology papers.

Average of relative impact factors (ARIF): This indicator is a proxy for the quality of the journals in which an entity publishes.
Each journal has an impact factor (IF), calculated annually by Thomson Scientific from the number of citations the journal received relative to the number of papers it published. Each paper is assigned the IF of the journal in which it was published. Citation patterns differ across fields and subfields of science; for example, biomedical research papers typically receive more citations than mathematics papers. To account for this, each paper's IF is divided by the average IF of the papers in its subfield, giving a relative impact factor (RIF). For the ARIF of a country's papers, the average IF by subfield is based on all papers in the health biotechnology dataset. For the ARIF of a country's collaboration, it is based on the international collaboration papers in that dataset. The ARIF of a given entity is the average RIF of its papers. An ARIF above 1 means that the entity scores better than the world average. An ARIF below 1 means that, on average, the entity publishes in journals cited less often than the world average.

Specialization index (SI): An indicator of the research intensity of a given geographic or organizational entity (e.g., a country) in a given research area (domain, field, or subfield), relative to the intensity of a reference entity (e.g., the world) in the same research area. The SI is calculated as follows:

$$\mathrm{SI} = \frac{X_S / X_T}{N_S / N_T}$$

where
$X_S$ = papers from entity X in a given research area (e.g., Canada in health biotechnology);
$X_T$ = papers from entity X in a reference set of papers (e.g., Canada in SCI Expanded);
$N_S$ = papers from the reference entity N in a given research area (e.g., world in health biotechnology);
$N_T$ = papers from the reference entity N in a reference set of papers (e.g., world in SCI Expanded).
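The indicators defined so far (collaboration rate, ARIF, and SI) can be sketched as small Python functions. This is a minimal illustration under assumptions, not the code used for the report. Papers are represented as hypothetical dictionaries with `countries`, `subfield`, and `jif` (journal impact factor) keys, and the toy numbers are invented. To obtain the ARIF of a country's collaboration rather than of all its papers, the country's internationally co-authored papers would be passed as `entity_papers` and the dataset's international collaboration papers as `reference_papers`.

```python
from statistics import mean


def collaboration_rate(papers, country):
    """Share of a country's papers that also carry an address from another country."""
    own = [p for p in papers if country in p["countries"]]
    if not own:
        return float("nan")
    international = [p for p in own if len(set(p["countries"])) > 1]
    return len(international) / len(own)


def arif(entity_papers, reference_papers):
    """Average of relative impact factors (ARIF).

    Each paper carries its journal's IF; its RIF is that IF divided by the mean IF
    of the reference papers in the same subfield.
    """
    by_subfield = {}
    for p in reference_papers:
        by_subfield.setdefault(p["subfield"], []).append(p["jif"])
    avg_if = {s: mean(v) for s, v in by_subfield.items()}
    return mean(p["jif"] / avg_if[p["subfield"]] for p in entity_papers)


def specialization_index(x_s, x_t, n_s, n_t):
    """SI = (X_S / X_T) / (N_S / N_T)."""
    return (x_s / x_t) / (n_s / n_t)


# Toy data (hypothetical): countries on each paper, subfield, journal IF.
papers = [
    {"countries": ["CA"], "subfield": "Immunology", "jif": 4.0},
    {"countries": ["CA", "BR"], "subfield": "Immunology", "jif": 2.0},
    {"countries": ["BR"], "subfield": "Genetics", "jif": 3.0},
    {"countries": ["CA", "CN"], "subfield": "Genetics", "jif": 9.0},
]
canada = [p for p in papers if "CA" in p["countries"]]
print(collaboration_rate(papers, "CA"))          # 2 of Canada's 3 papers are international
print(arif(canada, papers))                      # above 1: above the dataset average
print(specialization_index(10, 100, 50, 1000))   # hypothetical counts
```

In the toy data, Canada's ARIF is above 1 mainly because of its Genetics paper. That paper appears in a journal whose IF is well above the Genetics subfield average, which shows why normalizing by subfield matters.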
When $X_S$ and $N_S$ refer to an entity in a field or subfield of health biotechnology, $X_T$ and $N_T$ refer to that entity in health biotechnology rather than in SCI Expanded. An index value above 1 means that a given entity is specialized relative to the reference entity, while a value below 1 means the opposite.

In this study, the SI was also used to measure the intensity of Canada's collaboration with a given country in a field or subfield of health biotechnology. This is measured relative to the intensity of Canada's collaboration with the world in the same research area. An index value above 1 means that Canada's collaborations with that country are specialized in the field or subfield relative to its collaborations with the world, while a value below 1 means the opposite. In this case:
$X_S$ = papers resulting from collaboration between Canada and country X in a given field or subfield (e.g., Canada-Brazil in biomedical research);
$X_T$ = papers resulting from collaboration between Canada and country X in a reference set of collaborations (e.g., Canada-Brazil in health biotechnology);
$N_S$ = papers resulting from collaboration between Canada and the world in a given field or subfield (e.g., Canada-world in biomedical research);
$N_T$ = papers resulting from collaboration between Canada and the world in a reference set of collaborations (e.g., Canada-world in health biotechnology).

Preference index (PI): An indicator of the intensity of scientific collaboration between two countries. It is based on the number of bilateral collaborations between countries, not on the number of papers in international collaboration. The distinction matters. One internationally co-authored paper equals one bilateral collaboration only when it is co-authored by researchers from exactly two countries. For example, a paper co-authored by a Canadian and a US researcher counts as one bilateral collaboration between Canada and the US.
However, a paper co-authored by researchers from more than two countries yields more than one bilateral collaboration. For example, a paper co-authored by a Canadian, an Italian, and a US researcher counts as three bilateral collaborations: Canada-Italy, Canada-US, and Italy-US. In this case, Canada has two bilateral collaborations but only one paper in international collaboration. The PI compares the observed number of bilateral collaborations between two countries with the number expected given each country's share of world bilateral collaborations:

$$\mathrm{PI} = \frac{N_O}{N_E}$$

where
$N_O$ = observed number of bilateral collaborations between country x and country y;
$N_E$ = expected number of bilateral collaborations between country x and country y.

The expected number is derived from the probability of a bilateral collaboration between the two countries, given each country's number of bilateral collaborations relative to the world total. Because a collaboration must involve two distinct countries, the calculation of this expected probability corrects for the null diagonal of the collaboration matrix (a country cannot collaborate with itself). An index value above 1 means that countries x and y collaborate more than expected, whereas a value below 1 means the opposite.
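The counting of bilateral collaborations and the PI can be sketched in Python. The report does not spell out the exact formula for the expected number, so this sketch adopts one plausible null-diagonal model, which is an assumption. In this model, one end of a collaboration belongs to country x with probability proportional to x's bilateral total. The other end is drawn only from the remaining countries, using $n_y / (S - n_x)$, where $S$ is the sum of all countries' bilateral totals. Summing both orderings gives the probability that a given collaboration links x and y. The function names and the toy papers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def bilateral_counts(papers):
    """Each unordered pair of distinct countries on a paper is one bilateral collaboration."""
    counts = Counter()
    for countries in papers:
        for pair in combinations(sorted(set(countries)), 2):
            counts[pair] += 1
    return counts


def preference_index(counts, x, y):
    """PI = observed / expected bilateral collaborations between countries x and y.

    Assumed model: each collaboration is an edge between two distinct countries;
    the first end is x with probability n_x / S, and the second end is drawn from
    the other countries only (null diagonal), giving n_y / (S - n_x).
    """
    totals = Counter()  # bilateral collaborations involving each country
    for (a, b), k in counts.items():
        totals[a] += k
        totals[b] += k
    ends = sum(totals.values())  # S: every collaboration contributes two ends
    n_x, n_y = totals[x], totals[y]
    if n_x == 0 or n_y == 0:
        return float("nan")  # PI is undefined for a country with no collaborations
    p = (n_x / ends) * (n_y / (ends - n_x)) + (n_y / ends) * (n_x / (ends - n_y))
    expected = (ends / 2) * p  # ends / 2 = world number of bilateral collaborations
    observed = counts[tuple(sorted((x, y)))]
    return observed / expected


# The Canada-Italy-US paper from the text yields three bilateral collaborations.
print(bilateral_counts([{"CA", "IT", "US"}]))

# Hypothetical toy set of papers, each given as the set of countries on its addresses.
papers = [{"CA", "IT", "US"}, {"CA", "US"}, {"CA", "US"}, {"IT", "FR"}, {"FR", "US"}]
counts = bilateral_counts(papers)
print(preference_index(counts, "CA", "US"))  # above 1: more than expected
print(preference_index(counts, "CA", "FR"))  # no observed collaboration
```

Excluding self-pairs raises the expected value for large collaborators relative to the naive estimate $n_x n_y / S$. Without this correction, their PI would be overstated.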