Method
Collaboration of Canada with low/middle income economies and the Republic of Korea: Scientometric Profile in Health Biotechnology
March 31, 2006
By
David Campbell, M.Sc.
Grégoire Côté, B.Sc.
Presented to the
University of Toronto – Joint Center for Bioethics
514.495.6505, 4572 avenue de Lorimier, Montréal, Québec, Canada H2H 2B5
info@science-metrix.com  www.science-metrix.com
Construction of datasets
This scientometric study is based on the use of the Thomson ISI Science Citation Index
Expanded database (SCI Expanded), which contains papers from more than 6,000 journals.¹
These journals are considered to be the most important peer-reviewed journals in their
respective fields. They reflect significant scientific achievements and are the most widely cited
journals in the world (more than 80% of the world’s citations). The statistics are drawn from
four types of documents that are considered to be original contributions to scientific knowledge:
articles, notes, reviews, and conference proceedings. The tables presented in this report refer
to these four types of documents as "papers."
This scientometric analysis is based on a subset of SCI Expanded papers that fall within the
domain of health biotechnology. The health biotechnology dataset was constructed in two steps.
First, biotechnology papers were retrieved from SCI Expanded using keyword-in-title searches.
The keywords were selected by randomly sampling papers from journals specializing in
biotechnology and then choosing, from the titles of these papers, keywords and keyword
combinations that would retrieve other biotechnology papers. Second, the CHI Research Inc.
classification of journals was used to retain only those biotechnology papers published in
journals classified in any of the following three fields of science: biomedical research, clinical
medicine, and the health sciences. Only papers published in the 1993-2004 period were retained.
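The two-step selection described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually run on SCI Expanded: the keyword queries, the CHI field labels, the document-type labels, and the record fields (`title`, `journal`, `year`, `doc_type`) are all hypothetical, and matching is a simple case-insensitive substring test.

```python
# Hypothetical keyword queries: the report does not publish its actual list.
# Each query is a tuple of terms that must ALL appear in the title, so single
# keywords and keyword combinations can be expressed in the same list.
BIOTECH_QUERIES = [
    ("recombinant",),
    ("monoclonal", "antibod"),
    ("gene", "therap"),
]

# CHI Research field labels used to keep health-related journals (spelling assumed).
HEALTH_FIELDS = {"Biomedical Research", "Clinical Medicine", "Health Sciences"}

# Document types counted as "papers" (labels assumed).
PAPER_TYPES = {"article", "note", "review", "proceedings"}


def matches_biotech(title, queries=BIOTECH_QUERIES):
    """Keyword-in-title test: True if every term of at least one query occurs."""
    t = title.lower()
    return any(all(term in t for term in query) for query in queries)


def health_biotech_subset(records, journal_field, first_year=1993, last_year=2004):
    """Step 1: keyword-in-title retrieval; step 2: keep journals in health fields."""
    subset = []
    for rec in records:
        if rec["doc_type"] not in PAPER_TYPES:
            continue
        if not first_year <= rec["year"] <= last_year:
            continue
        if not matches_biotech(rec["title"]):  # step 1: biotechnology papers
            continue
        if journal_field.get(rec["journal"]) not in HEALTH_FIELDS:  # step 2
            continue
        subset.append(rec)
    return subset
```

Here `journal_field` is a hypothetical mapping from journal name to its CHI field; papers from journals with no known field are dropped.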
For the whole dataset, addresses on papers were standardized by country. Because the study
focuses on scientific collaboration between Canada and developing countries, countries were
classified as high income economies or low/middle income economies (i.e., developing
countries) using the World Bank classification of countries, which is based on gross national
income (GNI) per capita.² Given their historical context, former Warsaw Pact countries were
deliberately excluded from the low/middle income economies.
For papers resulting from collaboration between Canada and one of the top 15 developing
countries with which Canada collaborates in health biotechnology (i.e., health biotechnology
papers co-authored by researcher(s) with address(es) in Canada and researcher(s) with
address(es) in a developing country), addresses were further standardized by city and
institution and then classified into four main sectors (university, government, clinics and
hospitals, and company) and a residual category (other). This procedure made it possible to
identify the most active cities, sectors, institutions, and researchers from Canada and
developing countries that collaborate in health biotechnology. Papers in health biotechnology
were also broken down by field and subfield (CHI Research Inc. classification).
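As an illustration of how the top 15 partner countries could be ranked, the sketch below counts, for each eligible country, the papers it co-authors with Canada. It assumes each paper is represented by the set of countries in its standardized addresses and that `eligible` holds the low/middle income economies retained for the study (World Bank classification, former Warsaw Pact countries excluded); the function name and example country sets are hypothetical.

```python
from collections import Counter


def top_partners(papers, eligible, home="Canada", n=15):
    """Rank eligible countries by the number of papers co-authored with `home`.

    `papers` is an iterable of country collections (one per paper, taken from
    the standardized addresses); `eligible` is the set of low/middle income
    economies retained for the study.
    """
    counts = Counter()
    for countries in papers:
        cs = set(countries)
        if home in cs:
            # Each eligible partner on the paper gets one co-authored paper.
            counts.update(c for c in cs - {home} if c in eligible)
    return counts.most_common(n)
```

High income partners (e.g., the United States) and excluded countries never enter the ranking, even when they appear on the same paper as an eligible partner.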
The main caveat in the construction of the dataset results from the use of the CHI Research
Inc. classification of journals to retain health-related papers from the biotechnology dataset.
This procedure led to the inclusion of false positives in the health biotechnology dataset (i.e.,
biotechnology papers that are not health-related), because journals classified in biomedical
research, clinical medicine, and the health sciences sometimes publish papers that are not
health-related (e.g., Science, Nature).
¹ Data derived from information prepared by the Institute for Scientific Information, Inc. (ISI, Philadelphia,
Pennsylvania, USA). Copyright Institute for Scientific Information. All rights reserved.
² http://www.worldbank.org/
Although this method is imperfect, false positives affect the data of all countries, so the
dataset still provides a strong foundation for comparing scientific output and analyzing
collaboration at the country level. However, a more in-depth analysis of collaboration in health
biotechnology between two countries (at the city, sector, institution, and researcher levels)
requires that false positives be removed from the dataset. Accordingly, false positives were
removed manually from the papers that resulted from collaboration between Canada and the
top 15 developing countries with which Canada collaborates in health biotechnology.
Scientometric indicators
Statistics were produced based on the following indicators:
Number of papers: Number of scientific papers written by authors associated with geographic
areas (e.g., countries and cities), sectors, or institutions.
International collaboration: Number of scientific papers that are co-authored by
researcher(s) with address(es) from country A and researcher(s) with address(es) from
country B. At the world level, international collaboration is the number of papers with at least
two addresses from different countries.
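The two counting indicators above can be sketched as follows, assuming each paper is represented by the countries in its standardized addresses. The report does not state whether full or fractional counting is used; this sketch assumes full counting, in which a paper counts once for every country that appears in its addresses. Function names are hypothetical.

```python
from collections import Counter


def papers_per_country(papers):
    """Number of papers per country (full counting: once per country on a paper)."""
    counts = Counter()
    for countries in papers:
        counts.update(set(countries))  # set() so repeated addresses count once
    return counts


def collaboration_count(papers, a, b):
    """Papers with at least one address in country a and one in country b."""
    return sum(1 for countries in papers if a in countries and b in countries)


def world_international_collaboration(papers):
    """Papers with addresses from at least two different countries."""
    return sum(1 for countries in papers if len(set(countries)) >= 2)
```

A paper listing two addresses in the same country is counted once for that country and is not an international collaboration.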
Collaboration rate: This is an indicator of the relative importance of international collaboration
in a country. The indicator is calculated by dividing the country’s number of papers written in
collaboration with other countries in health biotechnology by the country's total number of
papers in health biotechnology.
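A minimal sketch of the collaboration rate under the same assumptions (papers as collections of countries from standardized addresses, full counting); the function name is hypothetical.

```python
def collaboration_rate(papers, country):
    """Share of a country's papers that also carry at least one foreign address."""
    own = [set(c) for c in papers if country in c]
    if not own:
        raise ValueError(f"no papers for {country}")
    # A paper is international if any country other than `country` appears on it.
    international = sum(1 for cs in own if len(cs) >= 2)
    return international / len(own)
```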
Average of relative impact factors (ARIF): This indicator is a proxy for the quality of the
journals in which an entity publishes. Each journal has an impact factor (IF) which is calculated
annually by Thomson Scientific based on the number of citations it received relative to the
number of papers it published. Each paper is assigned the IF of the journal in which it is
published. Then, to account for different citation patterns across fields and subfields of
science (e.g., there are more citations in biomedical research than in mathematics), each
paper's IF is divided by the average IF of the papers in its subfield to obtain a Relative Impact
Factor (RIF). For the ARIF of a country's papers, the average IF by subfield is based on the
papers in the health biotechnology dataset. For the ARIF of a country's collaboration, the
average IF by subfield is based on the international collaboration papers in the health
biotechnology dataset. The ARIF of a given entity is the average RIF of the papers belonging
to it. An ARIF above 1 means that, on average, an entity publishes in journals that are cited
more often than the world average; an ARIF below 1 means that, on average, it publishes in
journals that are cited less often than the world average.
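The ARIF calculation above can be sketched as follows. The record fields (`subfield`, `impact_factor`) and subfield labels are hypothetical, and each record is assumed to already carry the IF of its journal. Passing the whole health biotechnology dataset as `reference_papers` corresponds to the ARIF of a country's papers; passing only the international collaboration papers corresponds to the ARIF of a country's collaboration.

```python
from collections import defaultdict


def subfield_average_if(reference_papers):
    """Average journal IF per subfield over a reference set of papers."""
    totals, counts = defaultdict(float), defaultdict(int)
    for p in reference_papers:
        totals[p["subfield"]] += p["impact_factor"]
        counts[p["subfield"]] += 1
    return {s: totals[s] / counts[s] for s in totals}


def arif(entity_papers, reference_papers):
    """Average of relative impact factors (RIF = paper IF / subfield average IF)."""
    avg = subfield_average_if(reference_papers)
    rifs = [p["impact_factor"] / avg[p["subfield"]] for p in entity_papers]
    return sum(rifs) / len(rifs)
```

Because subfield averages are computed over the reference set itself, the ARIF of the whole reference set is exactly 1, which is what makes 1 the world benchmark.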
Specialization index (SI): This is an indicator of the intensity of research of a given
geographic or organizational entity (e.g., a country) in a given research area (domain, field,
subfield) relative to the intensity of the reference entity (e.g., the world) in the same research
area. The SI can be formulated as follows:
$$\mathrm{SI} = \frac{X_S / X_T}{N_S / N_T}$$
Where,
XS = Papers from entity X in a given research area (e.g., Canada in health biotechnology)
XT = Papers from entity X in a reference set of papers (e.g., Canada in SCI Expanded)
NS = Papers from the reference entity N in a given research area (e.g., world in health
biotechnology)
NT = Papers from the reference entity N in a reference set of papers (e.g., world in SCI
Expanded).
When XS and NS represent an entity in a field or a subfield in health biotechnology, XT and NT
represent an entity in health biotechnology instead of an entity in SCI Expanded. An index
value above 1 means that a given entity is specialized relative to the reference entity, while an
index value below 1 means the opposite.
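In code, the index is a single ratio. The sketch below (hypothetical function name) computes it as (XS × NT) / (XT × NS), which is algebraically identical to the formula above and avoids two intermediate divisions.

```python
def specialization_index(xs, xt, ns, nt):
    """SI = (XS / XT) / (NS / NT), computed as (XS * NT) / (XT * NS)."""
    if min(xt, ns, nt) <= 0:
        raise ValueError("XT, NS and NT must be positive")
    return (xs * nt) / (xt * ns)
```

With illustrative counts only (not results from this study), an entity with 50 of its 1,000 papers in a research area, compared with a world that has 5,000 of 200,000 papers in that area, gets SI = 0.05 / 0.025 = 2.0.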
In this study, the SI has also been applied as an indicator of the intensity of Canada's
collaboration with another country in a field or subfield of health biotechnology relative to the
intensity of Canada's collaboration with the world in the same research area. An index value
above 1 means that Canada's collaboration with that country is specialized in the field or
subfield relative to Canada's collaboration with the world, while an index value below 1
means the opposite.
Here,
XS = Papers that resulted from collaboration between Canada and country X in a given field or
subfield (e.g., Canada-Brazil in biomedical research)
XT = Papers that resulted from collaboration between Canada and country X in a reference set
of collaborations (e.g., Canada-Brazil in health biotechnology)
NS = Papers that resulted from collaboration between Canada and the world in a given field or
subfield (e.g., Canada-world in biomedical research)
NT = Papers that resulted from collaboration between Canada and the world in a reference set
of collaborations (e.g., Canada-world in health biotechnology).
Preference index (PI): This is an indicator of the intensity of scientific collaboration between
two countries. It is based on the number of bilateral collaborations of countries and not on the
number of international collaborations of countries. The distinction here is important. One
paper in international collaboration equals one bilateral collaboration only when the paper is
co-authored by researchers from only two countries (e.g., a paper co-authored by a Canadian
and a US researcher is equal to one bilateral collaboration between Canada and the US).
However, when a paper is co-authored by researchers from more than two countries, there is
more than one bilateral collaboration. For example, a paper co-authored by a Canadian, an Italian,
and a US researcher is equal to three bilateral collaborations: one between Canada and Italy,
one between Canada and the US, and one between Italy and the US. In this case, Canada
would therefore have two bilateral collaborations, but only one paper in international
collaboration.
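Counting bilateral collaborations amounts to enumerating every unordered pair of distinct countries on each paper, so a paper with n countries yields n(n-1)/2 bilateral collaborations. A minimal sketch, with hypothetical function names:

```python
from collections import Counter
from itertools import combinations


def bilateral_pairs(countries):
    """All unordered pairs of distinct countries on one paper."""
    return list(combinations(sorted(set(countries)), 2))


def bilateral_counts(papers):
    """Bilateral collaborations per country pair and per country."""
    per_pair, per_country = Counter(), Counter()
    for countries in papers:
        for a, b in bilateral_pairs(countries):
            per_pair[(a, b)] += 1
            # Each pair is one bilateral collaboration for both countries.
            per_country[a] += 1
            per_country[b] += 1
    return per_pair, per_country
```

For the Canada-Italy-US paper described above, this yields three pairs and two bilateral collaborations for Canada, from a single paper in international collaboration.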
The indicator compares the observed number of bilateral collaborations between two countries
with the number expected, given their individual share of world bilateral collaborations. The
index is computed as follows:
$$\mathrm{PI} = \frac{N_O}{N_E}$$
Where,
NO = Observed number of bilateral collaborations between country x and country y
NE = Expected number of bilateral collaborations between country x and country y. The
expected number is calculated using the probability of having bilateral collaborations
between the two countries given their individual number of bilateral collaborations relative
to the total number of bilateral collaborations in the world. The calculation of the expected
probability assumes that collaboration must involve two distinct countries and so corrects
for the null diagonal in the collaboration matrix.
An index value above 1 means that country x and country y collaborate more than expected,
whereas an index value below 1 means the opposite.
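The report does not give the exact formula for the expected number NE. One standard way to obtain expected counts for a symmetric collaboration matrix with a structurally null diagonal is to fit a quasi-independence model by iterative proportional fitting (IPF); the sketch below assumes that approach, which may differ from the correction actually used in this study. The input is a symmetric matrix of observed bilateral collaborations with zeros on the diagonal; function names are hypothetical.

```python
def expected_bilateral(observed, max_iter=10_000, tol=1e-10):
    """Expected bilateral collaborations under quasi-independence (null diagonal).

    `observed` is a symmetric n x n list of lists with zeros on the diagonal.
    IPF alternately rescales rows and columns so that each country's expected
    total matches its observed total of bilateral collaborations, while the
    diagonal (self-collaboration) stays at zero.
    """
    n = len(observed)
    totals = [float(sum(row)) for row in observed]
    expected = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    scale = max(max(totals, default=0.0), 1.0)
    for _ in range(max_iter):
        for i in range(n):  # row step
            s = sum(expected[i])
            if s > 0:
                expected[i] = [e * totals[i] / s for e in expected[i]]
        for j in range(n):  # column step
            s = sum(expected[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    expected[i][j] *= totals[j] / s
        # Columns match after the column step; stop once rows match too.
        gap = max((abs(sum(expected[i]) - totals[i]) for i in range(n)), default=0.0)
        if gap <= tol * scale:
            break
    # The fitted matrix is symmetric at convergence; averaging with the
    # transpose removes residual asymmetry left by stopping early.
    return [[(expected[i][j] + expected[j][i]) / 2 for j in range(n)] for i in range(n)]


def preference_index(observed):
    """PI = NO / NE for each pair of distinct countries (None on the diagonal)."""
    expected = expected_bilateral(observed)
    n = len(observed)
    return [
        [observed[i][j] / expected[i][j] if i != j and expected[i][j] > 0 else None
         for j in range(n)]
        for i in range(n)
    ]
```

With only three countries this model reproduces the observed matrix exactly, so every PI equals 1; departures from 1 become informative only when four or more countries are compared.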