Application of bibliometric analysis

Advantages & pitfalls
Thed van Leeuwen
Workshop on Research Evaluation in Statistical Sciences,
Bologna, 25th March 2010
Introduction of bibliometrics
• Bibliometrics can be defined as the quantitative analysis of science and
technology performance and the cognitive and organizational structure of
science and technology.
• The basis for these analyses is the scientific communication between scientists
through (mainly) journal publications.
• Key concepts in bibliometrics are output and impact, as measured through
publications and citations.
• Important starting point in bibliometrics: through the citations in their
scientific publications, scientists express a certain degree of influence of
others on their own work.
• Quantified on a large scale, citations indicate the influence or (inter)national
visibility of scientific activity, but should not be interpreted as a synonym
for ‘quality’.
CWTS data system
• CWTS has a full bibliometric license from Thomson
Reuters Scientific to conduct evaluation studies
using the Web of Science.
• Our database covers the period 1981-2009.
• Some characteristics:
– Over 31,000,000 publications.
– Over 350,000,000 citation relations between source papers.
– 100,000,000 authors (incl. variations), 15,000,000 ‘unique’ names.
– Over 60,000,000 addresses, some 90% cleaned up over the last 10 years.
– Contains reference sets for journal and field citation data.
Bibliometric indicators
produced by CWTS
Some basic indicators are …
• P: number of publications in journals processed for the
Web of Science.
• C: number of received citations, excl. self-citations.
• CPP: mean number of citations per publication, excl. self-citations.
• Pnc: percentage of the publications not cited (within a
certain time frame!).
• % SC: percentage of self-citations related to an output set.
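As a minimal sketch of how these basic indicators could be derived from a list of publications, the Python below uses a hypothetical `Publication` record and a hypothetical `basic_indicators` function; neither comes from the CWTS system. Two choices are assumptions rather than CWTS definitions: a publication counts as "not cited" when it has no citations left after removing self-citations, and % SC is taken relative to all citations including self-citations.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    citations: int       # all citations received within the chosen time frame
    self_citations: int  # the part of `citations` that are self-citations


def basic_indicators(pubs):
    """Return P, C, CPP, Pnc and %SC for a list of Publication records."""
    p = len(pubs)
    # Citations per publication after removing self-citations.
    external = [pub.citations - pub.self_citations for pub in pubs]
    c = sum(external)
    total = sum(pub.citations for pub in pubs)
    cpp = c / p if p else 0.0
    # Assumption: "not cited" means zero citations excluding self-citations.
    pnc = 100.0 * sum(1 for e in external if e == 0) / p if p else 0.0
    # Assumption: %SC is the share of self-citations in all citations.
    pct_sc = 100.0 * (total - c) / total if total else 0.0
    return {"P": p, "C": c, "CPP": cpp, "Pnc": pnc, "%SC": pct_sc}


# Illustrative input only, not real data.
example = [Publication(10, 2), Publication(0, 0), Publication(3, 3), Publication(5, 1)]
print(basic_indicators(example))
```

The time frame is applied before this step: only citations inside the chosen window should be counted in `citations`.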
Important indicators are…
• CPP/JCSm: ratio between the actual impact
and the mean journal impact.
• CPP/FCSm: ratio between the actual impact
and the mean field impact.
• JCSm/FCSm: ratio between the journal impact
and the field impact, indicative of the ‘quality’ of
the journal package in the field.
Various types of analysis focus on …
• Research profiles: a breakdown of the output over
various fields of science.
• Scientific cooperation analysis: a breakdown of the
output over various types of scientific collaboration.
• Knowledge user analysis: a breakdown of the
‘responding’ output into citing fields, countries or
institutions.
• Highly cited paper analysis: which publications are
among the most highly cited (top 10%, 5%, 1%) of
the global literature in the same field(s).
• Social network analysis: how the network of partners
is composed, based on scientific cooperation.
Journal & Field Normalization
Calculating the JCSm & FCSm
---------------------------------------------------------------------------------------------
        Type      Publ.   Journal      Journal category   # citations
                  year                                    until 1999
---------------------------------------------------------------------------------------------
I       review    1996    CANCER RES   Oncology           17
II      note      1997    J CLIN END   Endocrinology       4
III     article   1999    J CLIN END   Endocrinology       6
IV      article   1999    J CLIN END   Endocrinology       8
---------------------------------------------------------------------------------------------
Calculating the JCSm & FCSm 2
-----------------------------------------------------------------
        CPP     JCS     FCS
-----------------------------------------------------------------
I       17      16.9    23.7
II       4       3.1     3.0
III      6       4.8     4.1
IV       8       4.8     4.1
-----------------------------------------------------------------
Calculating the JCSm & FCSm 3
The mean citation score is determined as:
CPP = (17 + 4 + 6 + 8) / (1 + 1 + 1 + 1) = 8.8
The mean journal citation score as:
JCSm = ((1 x 16.9) + (1 x 3.1) + (2 x 4.8)) / (1 + 1 + 2) = 7.4
The mean field citation score as:
FCSm = ((1 x 23.7) + (1 x 3.0) + (2 x 4.1)) / (1 + 1 + 2) = 8.7
CPP / JCSm = 8.8 / 7.4 = 1.19
CPP / FCSm = 8.8 / 8.7 = 1.01
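The worked example can be reproduced in a few lines of Python. This is only a sketch of the arithmetic on the four example papers; the names `papers` and `mean` are illustrative and not part of any CWTS tooling.

```python
# Four example papers: (citations, journal citation score JCS, field citation score FCS)
papers = {
    "I":   (17, 16.9, 23.7),
    "II":  (4,  3.1,  3.0),
    "III": (6,  4.8,  4.1),
    "IV":  (8,  4.8,  4.1),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


cpp = mean(c for c, _, _ in papers.values())   # mean citations per publication
jcsm = mean(j for _, j, _ in papers.values())  # mean journal citation score
fcsm = mean(f for _, _, f in papers.values())  # mean field citation score

print(f"CPP = {cpp:.2f}, JCSm = {jcsm:.2f}, FCSm = {fcsm:.2f}")
print(f"CPP/JCSm = {cpp / jcsm:.2f}, CPP/FCSm = {cpp / fcsm:.2f}")
```

With unrounded intermediate values the ratios come out at 1.18 and 1.00; the 1.19 and 1.01 in the example result from dividing the rounded values 8.8, 7.4 and 8.7.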
Citation Windows
& Impact Measurement
Citation measurement and ‘windows’
• Publication years, fixed citation ‘window’.
Publications of 2002, with three citation years (namely 2002,
2003, and 2004), followed by 2003, with three years, etc.
• Blocks of publication years with a window decreasing in
length.
Publications of 2002-2005, with citation window of 4 years
(2002-2005), 3 years (2003-2005), 2 years (2004-2005), and 1
year (2005).
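The two window schemes can be sketched as follows. The function names `fixed_window` and `block_window` are hypothetical, and cutting a fixed window off at the last available citation year is an assumption about how the end of the database period is handled.

```python
def fixed_window(pub_years, length, last_citation_year):
    """Map each publication year to its citation years under a fixed window.

    Assumption: windows running past `last_citation_year` are cut off there.
    """
    windows = {}
    for year in pub_years:
        end = min(year + length - 1, last_citation_year)
        windows[year] = list(range(year, end + 1))
    return windows


def block_window(first_year, last_year):
    """Map each publication year in a block to the citation years up to the block end."""
    return {
        year: list(range(year, last_year + 1))
        for year in range(first_year, last_year + 1)
    }


# Publications of 2002 and 2003, each followed for three citation years.
print(fixed_window([2002, 2003], 3, 2009))
# Block 2002-2005: windows of 4, 3, 2 and 1 years.
print(block_window(2002, 2005))
```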
Citation measurement with ‘fixed window’
[Figure: publication years 2002-2009 (rows) against citation years 2002-2009 (columns). Each publication year is followed over three citation years: 2002 over 2002-2004, 2003 over 2003-2005, and so on up to 2007 over 2007-2009; for 2008 and 2009 only two and one citation years are available.]
Citation measurement with ‘year blocks’
[Figure: blocks of publication years (rows) against citation years 2002-2009 (columns); within each block the citation window becomes shorter for later publication years.]
Methodological issues
Adequacy of citation indexes:
implications for bibliometric studies
How to tackle this issue?
• We analyze the adequacy of the citation indexes
across disciplines, based on the referencing
behavior of the researchers themselves.
• The share of references to other indexed
literature indicates the importance of journal
literature in the scientific communication
process.
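A minimal sketch of such a coverage measure, assuming each cited reference has already been classified as pointing to an indexed source or not. The function name `wos_coverage` and the boolean input format are illustrative assumptions, and the sample list is not real data.

```python
def wos_coverage(references):
    """Percentage of cited references that point to WoS-indexed sources.

    `references` is an iterable of booleans: True if the cited item appeared
    in a WoS-covered journal, False otherwise (books, reports, proceedings, ...).
    """
    refs = list(references)
    if not refs:
        raise ValueError("no references supplied")
    return 100.0 * sum(refs) / len(refs)


# Illustrative reference list: six indexed references, four non-indexed.
sample = [True] * 6 + [False] * 4
print(wos_coverage(sample))
```

Computed per discipline and per year, a percentage of this kind indicates how much of the literature a field actually cites is visible to a citation index.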
Assessment of WoS Coverage
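[Diagram: references in citing (source) WoS publications point either to cited (target) items inside WoS (?%) or to non-WoS items (?%): non-WoS journals, books, conference proceedings, reports, etc.]
Total ISI/WoS Database (2002)
• References to WoS-covered literature: 75%
• References to non-WoS literature: 25%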
The medical & life sciences
[Figure: stacked bars (0-100%) of references to ISI versus non-ISI sources in 1991, 1996, 2001 and 2006, for: Agriculture and food science; Basic life sciences; Basic medical sciences; Biological sciences; Biomedical sciences; Clinical medicine; Health sciences.]
The natural sciences
[Figure: stacked bars (0-100%) of references to ISI versus non-ISI sources in 1991, 1996, 2001 and 2006, for: Astronomy and astrophysics; Chemistry and chemical engineering; Computer sciences; Earth sciences and technology; Environmental sciences and technology; Mathematics; Physics and materials science; Statistical sciences.]
Statistical sciences
[Figure: bars (0-100%) of references to ISI versus non-ISI sources in 1991, 1996, 2001 and 2006.]
The engineering sciences
[Figure: stacked bars (0-100%) of references to ISI versus non-ISI sources in 1991, 1996, 2001 and 2006, for: Civil engineering and construction; Electrical engineering and telecommunication; Energy science and technology; General and industrial engineering; Instruments and instrumentation; Mechanical engineering and aerospace.]
The social and behavioral sciences
[Figure: stacked bars (0-100%) of references to ISI versus non-ISI sources in 1991, 1996, 2001 and 2006, for: Economics and business; Educational sciences; Management and planning; Political science and public administration; Psychology; Social and behavioral sciences, interdisciplinary; Sociology and anthropology.]
The humanities
[Figure: stacked bars (0-100%) of references to ISI versus non-ISI sources in 1991, 1996, 2001 and 2006, for: Information and communication sciences; Language and linguistics; Creative arts, culture and music; History, philosophy and religion; Law and criminology; Literature.]
Overall WoS coverage by main field
EXCELLENT (> 80%)
– Biochem & Mol Biol
– Biol Sci – Humans
– Chemistry
– Clin Medicine
– Phys & Astron
VERY GOOD (60-80%)
– Appl Phys & Chem
– Biol Sci – Anim & Plants
– Psychol & Psychiat
– Geosciences
– Soc Sci ~ Medicine
GOOD (40-60%)
– Mathematics & Statistical sciences
– Economics
– Engineering
MODERATE (< 40%)
– Other Soc Sci
– Humanities & Arts
Conclusions on adequacy issue
• We can clearly conclude that the application of
bibliometric techniques based solely on WoS
(and very likely also on Scopus) will not be valid for
some of the ‘soft’ fields in the social sciences
and the humanities.
• That is why the toolbox has to be extended!
The H-Index and its limitations
The H-Index, defined as …
• The H-Index of a set of publications is the largest
number h such that h publications in the set have each
received at least h citations. With the publications
ranked by descending citations, it is the last ranking
position at which the number of received citations is
still at least equal to that position.
• Idea of an American physicist, J. Hirsch,
who published this index in the Proc.
NAS USA.
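The definition translates directly into a short routine: rank the publications by descending citations and find the last rank at which the citation count is still at least the rank. The function name `h_index` and the sample citation counts are illustrative only.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this publication still has at least `rank` citations
        else:
            break     # all later publications have even fewer citations
    return h


# Illustrative citation counts for seven publications.
print(h_index([25, 8, 5, 3, 3, 1, 0]))
```

In this example the third-ranked paper has 5 citations and the fourth has only 3, so the index is 3; the 25 citations of the top paper do not raise it.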
Examples of Hirsch-index values
• Environmental biologist, output
of 188 papers, cited 4,788 times
in the period 80-04.
• Hirsch-index value of 31
[Figure: citations (0-350) against publications (0-200); value of H-Index = 31.]
• Clinical psychologist, output of
72 papers, cited 760 times in the
period 80-04.
• Hirsch-index value of 14
[Figure: citations (0-80) against publications (0-80); value of H-Index = 14.]
Problems with the H-Index
• For serious evaluation of scientific
performance, the H-Index is not a suitable
indicator, as the index:
– Is insensitive to field-specific characteristics (e.g.,
differences in citation cultures between medicine and
other disciplines).
– Does not take into account the age and career length of
scientists; a small oeuvre necessarily leads to a low
H-Index value.
– Is inconsistent in its ‘behaviour’.
• Actual versus field
normalized impact
(CPP/FCSm)
displayed against
the output.
• Large output can
be combined with
a relatively low
impact.
[Figure: scatter plot of CPP/FCSm (0.00-7.00) against total publications (0-250); points labelled by field (Phy, Med, Soc, Psy, Eng, Hum, Mat, Che, Env, Bio).]
• H-Index displayed
against the output.
• Larger output is
strongly correlated
with a high H-Index value.
[Figure: scatter plot of H-index (0-60) against total publications (0-250); points labelled by field (Med, Bio, Phy, Env, Psy, Che, Eng, Soc, Mat, Hum).]
Consistency: Definition
Definition. A scientific performance measure is
said to be consistent if and only if for any two
actors A and B and for any number n ≥ 0 the
ranking of A and B given by the performance
measure does not change when A and B both
have a new publication with n citations.
Consistency: Motivation
• Consistency ensures that if the publishing
behavior of two actors does not change over
time, their ranking relative to each other also
does not change
• Consistency ensures that if the individual
researchers in one research group X outperform
the individual researchers in another research
group Y, the former research group X as a whole
outperforms the latter research group Y.
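The consistency property can be checked numerically. The sketch below uses two hypothetical actors whose citation counts are chosen only for illustration and are not taken from the slides. Both actors receive the same new publication with n = 6 citations, twice in a row, and the h-index ranking changes.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list the condition holds for exactly the first h ranks.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


# Hypothetical oeuvres (citations per publication).
actor_a = [4, 4, 4, 4]  # h = 4
actor_b = [6, 6, 6]     # h = 3

for step in range(3):
    print(f"after {step} new publications: A h={h_index(actor_a)}, B h={h_index(actor_b)}")
    # Both actors gain an identical publication with n = 6 citations.
    actor_a = actor_a + [6]
    actor_b = actor_b + [6]
```

Actor A starts ahead (4 against 3), is tied after the first identical addition (4 against 4) and is behind after the second (4 against 5), although both actors added exactly the same publications. A consistent measure would keep the original ranking.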
Inconsistency of the h-index
[Figure: four panels for Actor A and Actor B, each showing citations (0-9) against publications (0-12), with the marked h-index values h=4, h=8, h=6 and h=6.]
ISI Impact Factors:
calculation and validity
Methodology: ISI’s classical IF
• The ISI Impact Factor (IF) is defined as the
number of citations received by a journal in year
t to its publications from the years t-1 and t-2,
divided by the number of citeable documents
in that same journal in the years t-1 and t-2.
• Or, as a formula:
IF(t) = Citations in year t / Number of ‘citeable documents’ in t-1 & t-2
Share ‘citations-for-free’ for The Lancet
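-----------------------------------------------------
             Publications      Citations
             90+91             1992
-----------------------------------------------------
Article       784               2986
Note          144                593
Review         29                232
Sub-total     957 (a)           7959 (b)
Letter       4181 (d)           4264 (e)
Editorial    1313                905
Other        1421                909
Total        7872              14037 (c)
-----------------------------------------------------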
• ISI Method:
IF = Citations in 2000 / Citeable documents in ‘98 and ‘99
   = 14037 (c) / 957 (a) = 14.7
• CWTS Method:
IF = Citations to Art/Not/Rev in 2000 / Art/Not/Rev in ‘98 and ‘99
   = 7959 (b) / 957 (a) = 8.3
IF = Citations to Art/Let/Not/Rev in 2000 / Art/Let/Not/Rev in ‘98 and ‘99
   = (7959 + 4264) (b+e) / (957 + 4181) (a+d) = 2.4
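The three variants differ only in which document types enter the numerator and the denominator. The short sketch below recomputes them from the counts labelled (a) to (e) for The Lancet; the helper name `impact_factor` is illustrative.

```python
def impact_factor(citations, documents):
    """Citations divided by the number of documents they are attributed to."""
    return citations / documents


# Counts for The Lancet, using the labels (a)-(e) from the table.
a = 957    # articles, notes and reviews published
d = 4181   # letters published
b = 7959   # citations to articles, notes and reviews
e = 4264   # citations to letters
c = 14037  # citations to all document types

isi = impact_factor(c, a)                   # all citations over 'citeable' documents only
cwts = impact_factor(b, a)                  # same document types above and below the line
cwts_letters = impact_factor(b + e, a + d)  # letters included on both sides

print(f"ISI: {isi:.1f}  CWTS: {cwts:.1f}  CWTS incl. letters: {cwts_letters:.1f}")
```

The gap between the first two values comes entirely from citations to document types that are counted in the numerator but not in the denominator, the ‘citations-for-free’.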
ISI Impact Factors
• From 1995 onwards CWTS has analyzed the uses
and validity of the ISI Journal Impact Factor (IF).
• The most important points of criticism were:
– Calculated erroneously.
– Not sensitive to the composition of the
journal in terms of document types.
– Not sensitive to the science fields a journal
is attached to …
– Based on too short ‘citation windows’.
Distribution of citations used for the calculation of the IF value of The Lancet
• The red area indicates citations ‘for free’,
while the blue area indicates ‘correct
citations’.
• The IF-score of The Lancet is seriously
‘overrated’ by the scientific ‘audience’ of
the journal.
[Figure: stacked area chart (0-100%) of the two citation shares for the years 82-98.]
Impact Factors for Br. J. Clin. Pharm. and Clin. Pharm. & Ther.
• The graph shows the
correct and erroneous
impact factors of BJCP and
CPT.
• In the case of CPT,
citations to published
meeting abstracts are
included, while BJCP
has stopped publishing
meeting abstracts!
[Figure: line chart of impact factor (0.00-4.50) for the years 83-98, with the series CPT Err IF, CPT IF, BJCP Err IF and BJCP IF.]
Document types and fields
--------------------------------------------------------------------------------------------
Field                      Journal                        IF      Rank   JFIS   Rank
--------------------------------------------------------------------------------------------
IMMUNOLOGY                 ANN REV IMMUNOL                50.49    1     5.18    1
BIOCHEM & MOLECULAR BIOL   ANN REV BIOCHEM                34.61    1     4.10    3
PHARMACOL & PHARMACY       PHARMACOLOGICAL REV            27.74    1     4.75    1
CELL BIOL                  ANN REV CELL & DEVELOPM BIOL   27.53    1     1.72   13
DEVELOPMENTAL BIOL         ANN REV CELL & DEVELOPM BIOL   27.53    1     1.72    3
PHYSIOLOGY                 PHYSIOLOGICAL REV              24.82    1     3.18    1
CELL BIOLOGY               NATURE REV MOL CELL BIOL       22.21    4     2.76    8
ENDOCRINOL & METABOLISM    ENDOCRINE REV                  21.98    1     2.87    1
NEUROSCIENCES              ANN REV NEUROSCIENCE           21.89    1     3.12    4
PHYSICS                    REV MODERN PHYSICS             20.14    1     5.02    1
CHEMISTRY                  CHEMICAL REV                   19.67    1     2.89    2
--------------------------------------------------------------------------------------------
The IF is for ‘02,
JFIS covers ‘98-‘02
Fields and Citation windows
[Figure: horizontal bar chart (axis 0 to 4.5) per subject category, grouped into Chemistry, Engineering sciences and Physics; number of journals in parentheses.]
Chemistry: POLYMER SCIENCE (55); CHEM, APPLIED (25); CHEM, CLIN&MEDIC (8); CHEM, PHYSICAL (78); CRYSTALLOGRAPHY (18); ELECTROCHEMISTRY (10); CHEM, INORG&NUC (37); BIOCH & MOL BIOL (169); CHEM, ORGANIC (42); CHEMISTRY (128); CHEM, MISCELLAN (7); CHEM, ANALYTICAL (54)
Engineering sciences: ENG, INDUSTRIAL (14); ENG, MANUFACT (5); ENGINEERING (84); ENG, BIOMEDICAL (33); ENG, PETROLEUM (8); ENG, MECHANIC (69); ENG, CIVIL (49); ENG, ENVIRONM (6); ENG, CHEMICAL (69); ENG, MARINE (8); ENG, ELECTRICAL (127)
Physics: PHYSICS, MATHEMA (10); ACOUSTICS (20); THERMODYNAMICS (11); PHYSICS, FLUIDS (16); PHYSICS, MISCELL (6); PHYSICS, AT,M,C (22); OPTICS (37); PHYSICS, APPLIED (49); PHYSICS, COND MA (36); PHYSICS (85); PHYSICS, NUCLEAR (16); PHYSICS, PART&FI (11)
Citation measurement of IF
[Figure: grid of publication years 2002-2009 (rows) against citation years 2002-2009 (columns), illustrating the citation window used for the IF.]
CWTS answer to the problems of the IF
• This indicator is the JFIS, the Journal-to-Field Impact
Score.
• The JFIS meets the main objections against the
Impact Factor, as
– the calculation of JFIS is based on equally large
entities,
– document types are taken into account,
– JFIS is field-normalized, and finally,
– JFIS is based on longer citation windows (1-4 years).
Citation measurement of JFIS
[Figure: blocks of publication years (rows) against citation years 2002-2009 (columns), illustrating the citation windows of 1-4 years used for the JFIS.]
End of the presentation
For questions regarding the contents of the
presentation, mail to: leeuwen@cwts.nl