H-Index Workshop

Workshop on the Fundamentals and Indicators of Scientometrics:
The H-Index and How to Calculate It
Organizers:
Deputy for Research and Technology, Ministry of Health and Medical Education,
in collaboration with Tehran University of Medical Sciences
Payam Kabiri, MD, PhD
Epidemiologist
Tehran & Isfahan
Universities of Medical Sciences
Today's Agenda
• Informetrics and scientometrics: concepts and applications
• A brief overview of scientometric indicators
• Introduction to citation indexes
• Introducing the H-Index
• Advantages and disadvantages of the H-Index
• Methods for calculating the H-Index
• Hands-on practice
Scientometrics (bibliometrics)
• Scientometrics (bibliometrics): the measurement of scientific output through statistics on academic publications.
• The scope of bibliometrics includes “all quantitative aspects and models of science communication, storage, dissemination and retrieval of scientific information”.
Definition of Scientometrics
• Quantitative methods that analyze science as an information process are called “scientometrics”.
• Put more simply, scientometrics is the science of measuring science.
Scientometrics
[Figure: related fields: informetrics, bibliometrics, scientometrics, cybermetrics and webometrics]
Bibliometric data are used for:
• Scientific output evaluation (impact, citations)
• History of science
• Publication strategies
• Science policy; resource allocation
• Collection management
• Sociology of science
• Information organization
• Information management & utilization
Links of bibliometrics with related research fields and application services
[Figure: applied side, services for science policy, research management, scientific information and librarianship; basic side, research in economics, sociology of science, library and information science, history of science, life sciences and mathematics/physics; related fields: scientometrics, informetrics, webometrics]
Why do we evaluate scientific output?
[Figure: evaluation needs split by level (international, national, institutional, faculty, researchers): grant allocations, policy decisions, benchmarking, promotion, collection management, funding allocations, research]
Scientists Ranking Methods
• Evaluation of scientists by “experts” (e.g., surveys)
• Citation analysis
  Task: compute a score for the “objects”
• A hybrid of the two previous methods
Three Kinds of Citation Data and Their Indexes
Articles
• Citation impact
Authors
• Number of papers (quantity)
• Number of citations (quality)
• Average number of citations per article
• h-index & g-index (both quantity and quality)
Journals
• Journal Impact Factor
• h-index
A Sample Scientometrics Report
ISI Impact Factor (example for 1992)
A = total citations in 1992
B = citations in 1992 to articles published in 1990-91 (a subset of A)
C = number of articles published in 1990-91
D = B/C = 1992 impact factor
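The impact factor above is a single division of B by C. A minimal Python sketch, where the journal's counts (600 citations in 1992 to 200 articles published in 1990-91) are hypothetical and chosen only for illustration:

```python
def impact_factor(cites_to_prior_two_years: int, items_prior_two_years: int) -> float:
    """D = B / C: citations received in year Y by items published in
    Y-1 and Y-2, divided by the number of items published in Y-1 and Y-2."""
    if items_prior_two_years <= 0:
        raise ValueError("the number of published items must be positive")
    return cites_to_prior_two_years / items_prior_two_years


# Hypothetical journal: B = 600 citations in 1992 to its 1990-91 articles,
# C = 200 articles published in 1990-91.
print(impact_factor(600, 200))  # 3.0
```

Note that A (all citations in 1992) does not enter the calculation; only the subset B and the article count C do.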
Citation Databases
• Web of Science
• Scopus
• Google Scholar
Other Tools Available
• Other bibliometric indicators: Journal Citation Reports (JCR)
• Other indicator databases (national, essential, university, institutional)
• ISIHighlyCited.com
WoS and Scopus: Subject Coverage (% of total records)
WoS: Science 77, Social Sciences 14, Arts & Humanities 9
Scopus: Health & Life Sciences 60, Physical Sciences 25, Biological & Environmental Sciences 13, Social Sciences 2
Google Scholar: ?
Web of Science
• Covers around 9,000 journal titles and 200 book series, divided between SCI, SSCI and A&HCI.
• Electronic back files available to 1900 for SCI, the mid-1950s for SSCI and the mid-1970s for A&HCI.
• Very good coverage of the sciences; patchy on “softer” sciences, social sciences, and arts and humanities.
• US- and English-language biased.
• Full coverage of citations.
• Name disambiguation tool.
• Limited downloading options.
Scopus
• Positions itself as an alternative to ISI.
• More journals from smaller publishers and open access (15,000+ journals; 750 conference proceedings).
• Source data back to 1960.
• Excellent for physical and biological sciences; poor for social sciences; does not cover humanities or arts.
• Better international coverage (60% of titles are non-US).
• Citation data go back only to 1996 (i.e., the last decade only).
• Not “cover to cover” and not up to date.
• Easy to use when searching for source publications; clumsy when searching cited publications.
• Citation Tracker works for up to 1,000 records only.
• Limited downloading options.
Google Scholar
• Better coverage of citations, as it retrieves from the web
• More coverage of references, including gray literature
• Coverage and scope?
• Inclusion criteria?
• Very limited search options
• No separate cited-author search
• Goes back only to 1990, not further
• Free!
What is the Scopus Database?
• Covers more than 15,200 journal titles
• Contains more than 30 million abstracts from 4,000 major international publishers
• Contains more than 265 million citations
• Includes all journals indexed in Medline
What is Scopus?
• 15,200+ titles from more than 4,000 publishers
• 1,000+ Open Access journals
• 500+ conference proceedings
[Figure: Scopus sources: peer-reviewed literature (science, medicine, technology, social sciences), websites and digital archives (400M web pages), patents (21M), institutional repositories]
Content Update
• 30 million records, of which:
  - 15 million records include references going back to 1996
  - 15 million pre-1996 records go back as far as 1900
• 265 million references, added to records from 1996 onwards
• In addition to traditional scientific and academic journals, Scopus covers:
  - 1,000 Open Access journals
  - 500 conference proceedings
  - 600 trade publications
  - 125 book series
  - Medline (100% coverage)
  - 275 million quality web sites, including 21 million patents from 5 patent offices
  - UK patents added to Scirus
What is Scopus?
[Figure: Scopus as the world's largest abstract & citation database: 15,100 STM and social sciences titles from 4,000 publishers, plus focused web information (240 million scholarly web items, including e-prints, theses, dissertations and 13M patents) and library sources; the fastest route to full text]
• 15% Elsevier sources, 85% other publishers
• Valuable archive included: abstracts from 1966 onwards, 30 million in total, +1.1 million per year
• Abstracts plus references from 1996 to 2006, covering 15,100 current journal sources
• Cited references: 265 million over 10 years, +25 million each year
• Currency: updated daily
Scopus Coverage
• 15,100 unique titles
• 5,900: Life & Health (100% Medline)
• 4,500: Chemistry, Physics, Engineering
• 2,500: Biological, Agricultural, Environmental
• 2,700: Social Sciences, Psychology, Economics
International distribution of titles
[Figure: geographical spread of Scopus content across North America, South America, Asia Pacific, and Europe, Middle East & Africa]
Iranian Titles indexed in Scopus
• Iranian Biomedical Journal
• Archives of Iranian Medicine
• Daru
• Iranian Journal of Diabetes and Lipid Disorders
• Iranian Journal of Medical Sciences
• Iranian Journal of Public Health
• Journal of Medicinal Plants
• Yakhteh
Bibliometric Tool Development of Scopus
[Figure: timeline of Scopus bibliometric tools. 2004: Scopus launched (strategy: literature search). 2005-2006: introduction of RPM tools (Citation Tracker, Author Identifier, WebCites, PatentCites), implemented through market feedback & development. 2007: h-index and Custom Data (end-2007 release), Scopus for science evaluation.]
Difficulties of Old Criteria
• Total number of papers (quantity)
• Total number of citations (quality)
• Average number of citations per article (dependent on outliers)
• Journal Impact Factor (discipline-based, dependent on outliers)
The h-index was born!
• We need an index that captures both the quantity and the quality of an author's papers:
  - Productivity
  - Impact
  - Not affected by “big hits”
  - Not affected by “noise”
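To illustrate the last two properties, the sketch below compares the mean citations per paper with the h-index for a hypothetical record, then turns one paper into a “big hit” and, separately, adds five uncited papers (“noise”). The citation counts are invented for the example.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list, once cites < rank it stays that way,
    # so counting the ranks that pass gives h.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def mean_citations(citations):
    return sum(citations) / len(citations)


base = [10, 8, 5, 4, 3]          # hypothetical record
big_hit = [1000, 8, 5, 4, 3]     # same record, first paper becomes a "big hit"
noisy = base + [0, 0, 0, 0, 0]   # same record plus five uncited papers

for label, record in [("base", base), ("big hit", big_hit), ("noise", noisy)]:
    print(f"{label:8} mean = {mean_citations(record):6.1f}   h = {h_index(record)}")
```

The mean jumps from 6.0 to 204.0 with the big hit and falls to 3.0 with the noise, while h stays at 4 in all three cases.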
The h-index
• Hirsch, J. E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569-16572.
• Meaningful only when compared with others in the same discipline. Researchers in one field may have very different h-indices from researchers in another (e.g., Life Sciences vs. Physics).
• Available at: http://arxiv.org/pdf/physics/0508025
The H-index: a definition
• ‘The H-index is the highest number of papers a scientist has that have at least that number of citations.’ Nature (2005)
What is the h-Index?
• Performance measurement tool for scientific authors (similar in idea to journal impact factors, but for individuals)
• Established by Jorge Hirsch at UC San Diego
• “A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np - h) papers have no more than h citations each.”
Source: Hirsch, J. E. (2005, September 29). An index to quantify an individual’s scientific research output. Retrieved from http://arxiv.org/abs/physics/0508025
The h-index
• Definition: a researcher has h-index h if
  - h of his or her Np articles have received at least h citations each, and
  - the remaining Np - h articles have received no more than h citations each.
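The definition translates directly into code: sort the citation counts in decreasing order and find the largest rank h whose paper still has at least h citations. A minimal Python sketch; the sample counts are hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Return the h-index of a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # this and every later paper has <= h citations
    return h


print(h_index([3, 0, 6, 1, 5]))  # 3
```

Because the counts are sorted first, the result does not depend on the order in which the papers are listed, and an empty record gives 0.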
The h-index Concept through Its Graph
[Figure: the h-graph]
The h-index
• The h-index is a new scientometric indicator, devised in 2005 by Jorge Hirsch at the University of California.
• It was designed to assess both the impact (quality) and the quantity of researchers' research output.
The h-index
• The H-index is the number h of an author's papers that have received at least h citations each. For example, if a researcher's H-index is 5, this researcher has 5 published papers with at least 5 citations each; in other words, the researcher's other papers have no more than 5 citations each.
• Today this index is regarded as the equivalent of the Impact Factor for researchers.
The Highest h-index in the World & Iran
• The highest h-index in the world, 197, belongs to the life sciences. The highest h-index among Iranian researchers, 33, belongs to Dr. Shamsipur, professor of chemistry at Razi University, Kermanshah.
Terminology
• Np: total number of papers
• Nc,tot: total number of citations
• Y(now): present year
• Y(i): year of publication of paper i
• C(i): set of citations to paper i
The h-index
A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np - h) papers have no more than h citations each.
Doc:  1   2   3   4   5   6   7   8   9   10  11
Cit:  49  23  15  14  6   3   1   1   0   0   0
H-index example
Author A
Doc:  1   2   3   4   5   6   7   8   9
Cit:  55  45  20  10  5   4   3   2   1
Author B
Doc:  1   2   3   4
Cit:  25  20  9   6
H-index example
Author X has 5 published articles:
Article1, citations 5
Article2, citations 10
Article3, citations 100
Article4, citations 6
Article5, citations 4
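The H-index of X is 4: four papers (with 100, 10, 6 and 5 citations) have at least 4 citations each, but only four papers, not five, have at least 5 citations, so h cannot be 5.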
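The worked examples above can be checked mechanically. This sketch applies the same sort-and-scan rule to each record and prints its total citations and h-index; the record labels are ours, and the counts are copied from the tables.

```python
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


examples = {
    "11-document table": [49, 23, 15, 14, 6, 3, 1, 1, 0, 0, 0],
    "Author A": [55, 45, 20, 10, 5, 4, 3, 2, 1],
    "Author B": [25, 20, 9, 6],
    "Author X": [5, 10, 100, 6, 4],
}

for name, cites in examples.items():
    print(f"{name}: total citations = {sum(cites)}, h = {h_index(cites)}")
```

Author A has more than twice Author B's citations (145 vs 60), yet their h-indices differ by only one (5 vs 4).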
The h-index
• It can be used for a specific author, to evaluate that author's research performance.
• It can also be used for a group of papers from an institution, department or journal, to evaluate the impact of that group of papers.
H-index drawbacks
• Like impact factors, it depends on the subject area
• It can only grow over time (it never decreases)
• It does NOT show whether the author is currently active or inactive
• It disadvantages younger researchers (without a previous track record)
• Scientists with a short scientific career are out of the competition
The Contemporary h-index
• The contemporary h-index was proposed by Antonis Sidiropoulos, Dimitrios Katsaros and Yannis Manolopoulos.
• It adds an age-related weighting to each cited article, giving less weight to older articles.
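A sketch of the contemporary h-index in Python, using the notation of the Terminology slide. It assumes the age-weighted score S^c(i) = γ · (Y(now) − Y(i) + 1)^(−δ) · |C(i)| with γ = 4 and δ = 1, which we understand to be the values used by Sidiropoulos et al.; treat both the formula and the defaults as assumptions to check against the original paper. The two publication records are hypothetical.

```python
def contemporary_h_index(papers, year_now, gamma=4.0, delta=1.0):
    """papers: iterable of (Y(i), |C(i)|) pairs, i.e. (publication year, citations).

    Each paper gets the age-weighted score
        gamma * (year_now - year + 1) ** (-delta) * citations
    and the index is the largest h such that h papers score at least h.
    """
    scores = sorted(
        (gamma * (year_now - year + 1) ** (-delta) * cites for year, cites in papers),
        reverse=True,
    )
    hc = 0
    for rank, score in enumerate(scores, start=1):
        if score < rank:
            break
        hc = rank
    return hc


older = [(1990, 12), (1992, 10), (1994, 8), (1996, 7), (1998, 6)]   # hypothetical
recent = [(2004, 6), (2005, 6), (2006, 5), (2007, 4), (2008, 2)]    # hypothetical

# gamma=1, delta=0 removes the age weighting, which gives the ordinary h-index.
print(contemporary_h_index(older, 2008, gamma=1.0, delta=0.0))   # 5
print(contemporary_h_index(older, 2008))                         # 2
print(contemporary_h_index(recent, 2008, gamma=1.0, delta=0.0))  # 4
print(contemporary_h_index(recent, 2008))                        # 4
```

With these invented records, the older researcher's ordinary h of 5 drops to a contemporary value of 2, while the recently active researcher keeps 4.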
The g-index
• The g-index was proposed by Leo Egghe. It is defined as follows:
• [Given a set of articles] ranked in decreasing order of the number of citations that they received, the g-index is the (unique) largest number such that the top g articles received (together) at least g² citations.
• It aims to improve on the h-index by giving more weight to highly cited articles.
The g-index
• Suggested in 2006 by Leo Egghe.
• The index is calculated from the distribution of citations received by a given researcher's publications.
• This index is very similar to the h-index and attempts to address its shortcomings.
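A minimal Python sketch of the g-index definition given above. It assumes g cannot exceed the number of papers; some formulations let g grow beyond that by padding the list with uncited papers, which can give a larger value. The citation counts are hypothetical.

```python
from itertools import accumulate


def g_index(citations):
    """Largest g such that the top g papers together have at least g**2 citations
    (g is capped at the number of papers in this sketch)."""
    ranked = sorted(citations, reverse=True)
    g = 0
    for rank, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= rank * rank:
            g = rank
    return g


record = [20, 10, 5, 2, 1, 1, 0, 0, 0, 0]   # hypothetical
print(g_index(record))  # 6
```

For the same record the h-index is only 3 (the fourth paper has just 2 citations), so the g-index gives visibly more weight to the highly cited papers at the top of the list.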
The h-b-index
• The h-b-index, developed by Michael Banks of the Max Planck Institute for Solid State Research in Germany, extends the index by evaluating the impact of compounds used in solid-state physics and of scientific topics in general.
• The h-b-index is defined in the same manner as the h-index, but is based on a topic (or compound) search instead of a scientist's name.
The h-b-index
• A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np - h) papers have at most h citations each.
• For a topic, it is useful to express the h-b-index in terms of the number of years n, as h = n·m.
• If the h-b-index grows linearly with the number of years, then m is the gradient. In this respect, a compound or topic with a large m and a large h-b-index can be defined as a hot topic.
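If a topic's h-b-index grows roughly linearly, h ≈ n·m, the gradient m can be estimated from yearly values. The sketch below fits a least-squares line through the origin, m = Σ n·h / Σ n², which is our own choice of fitting method and not necessarily the one Banks used; the yearly h-b values are hypothetical.

```python
def hb_gradient(years, hb_values):
    """Least-squares estimate of m in h = n * m (line through the origin).

    years: number of years n since the topic first appeared (1, 2, 3, ...)
    hb_values: the h-b-index observed after each of those years
    """
    if len(years) != len(hb_values) or not years:
        raise ValueError("need equally long, non-empty sequences")
    numerator = sum(n * h for n, h in zip(years, hb_values))
    denominator = sum(n * n for n in years)
    return numerator / denominator


years = [1, 2, 3, 4, 5]
hb = [2, 5, 6, 8, 10]            # hypothetical h-b-index after each year
m = hb_gradient(years, hb)
print(f"m = {m:.2f}")            # m = 2.04
```

Comparing m across topics searched the same way gives a rough ranking of how quickly each topic is accumulating highly cited papers.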
The H-Graphs in Scopus
• A more comprehensive way of evaluating an author
• Using Author Search, Scopus gives us three different graphs:
  - h-index graph of the given author
  - number of the author's papers (articles) per year
  - number of the author's citations per year
The h-index
• Plots citations per article
• The point where the citation curve crosses the diagonal gives the h-index
• Shows low and high cited-by counts
• Completely transparent
• The date range can be changed
Practical interpretation: promotion, evaluation, funding, tenure, benchmarking
Author articles history
• Shows level of activity
• Shows peaks and troughs in publication history
• The date range can be changed
Author Cited-by’s
• Shows level of activity
• Shows highs and lows
• The date range can be changed
• Time lag!
How to Calculate the h-index through Scopus
• There are two ways to calculate it, depending on what you need:
  - For an author: search for the author, and Scopus calculates it automatically.
  - For a group of papers: search for the papers, then use the Citation Tracker and sort them to count and calculate it manually.
The Hirsch Index:
For a Group of Papers
• Run an author search.
• Sort the results by citations, by clicking on “Cited by”.
• Scroll down the sorted results until the ranking number becomes greater than the number of citations.
• The last ranking number that is still equal to or less than its number of citations is the Hirsch index for that author.
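The manual procedure above can also be run on an exported list of documents. This sketch assumes a CSV export with a citation-count column named “Cited by” (check the header of your own export); that column name, the sample rows and the treatment of empty cells as zero citations are assumptions for illustration.

```python
import csv
import io

# Hypothetical export of a group of papers; only the "Cited by" column is used.
SAMPLE_EXPORT = """Title,Year,Cited by
Paper A,2001,40
Paper B,2003,12
Paper C,2004,7
Paper D,2006,3
Paper E,2007,
"""


def h_index_from_export(csv_text, column="Cited by"):
    """Sort by citations (descending) and scan ranks, as in the manual procedure."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = sorted((int(row[column] or 0) for row in reader), reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if rank > cites:   # first rank that exceeds its citation count: stop
            break
        h = rank           # last rank still <= its citation count
    return h


print(h_index_from_export(SAMPLE_EXPORT))  # 3
```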
Author Identifier functionality
• Author Identifier enables Scopus users to avoid two major problems that affect most A&I (abstracting and indexing) databases:
  - How to distinguish between an author’s articles and those of another author sharing the same name?
  - How to group an author’s articles together when his or her name has been recorded in different ways?
• With other databases, these problems can result in retrieving incomplete or inaccurate results.
Calculating the H-index:
For a Group of Papers
Indicators of quality as measured
using published outputs
• Number of publications
• Citation counts to these publications (adjusted for self-citations): what “window” should be used, 4, 5 or 10 years?
• Citations per publication
• Percentage of uncited papers
• Impact factors (of publishing journals)
• Diffusion factor (of citing journals): profile of users of research (who, where, when and what)
• “Impact factor” of a scholar: the Hirsch index (h-index), i.e., the number h of papers with at least h citations each.
  Your h-index is 75 if 75 of your papers have at least 75 citations each.
Note: These should not be seen as “absolute” numbers but
always seen in the context of the discipline, research type,
institution profile, seniority of a researcher, etc.
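The first four indicators can be computed together once each paper's citing records are available. The sketch below is one possible approach under stated assumptions: a citation counts as a self-citation if the citing and cited papers share any author name, and the window covers the publication year plus the following window_years - 1 years. The data classes and sample records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    year: int
    authors: frozenset


@dataclass
class Paper:
    year: int
    authors: frozenset
    citations: list = field(default_factory=list)


def indicators(papers, window_years=5):
    """Publications, windowed non-self citations, citations per paper, % uncited."""
    counts = []
    for paper in papers:
        counts.append(sum(
            1 for c in paper.citations
            if 0 <= c.year - paper.year < window_years   # inside the citation window
            and not (c.authors & paper.authors)          # exclude self-citations
        ))
    n = len(papers)
    total = sum(counts)
    return {
        "publications": n,
        "citations": total,
        "citations_per_publication": total / n if n else 0.0,
        # "uncited" here means no citations left after the adjustments above
        "percent_uncited": 100 * counts.count(0) / n if n else 0.0,
    }


papers = [
    Paper(2000, frozenset({"A"}), [Citation(2001, frozenset({"B"})),
                                   Citation(2002, frozenset({"A"})),    # self-citation
                                   Citation(2010, frozenset({"C"}))]),  # outside window
    Paper(2003, frozenset({"A", "D"}), [Citation(2004, frozenset({"E"})),
                                        Citation(2005, frozenset({"F"}))]),
    Paper(2005, frozenset({"A"})),
]
print(indicators(papers))
```

Changing window_years (for example to 4, 5 or 10) shows directly how sensitive the counts are to the window question raised on the slide.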
Calculating h-index using
Thomson ISI Web of Science
1) Conduct a General Search.
2) Automatic: click on “Citation Report”, or
3) Manual: sort by “Times Cited”.
Calculating h-index using
Google Scholar
There are different ways to do it, with different interfaces:
1- The Publish or Perish software (available for download)
2- Other scripts written for this purpose
Compare like with like!
• Applied research attracts fewer citations than basic research.
• Citation behaviour differs between disciplines (e.g., papers in organisational behaviour attract 5 times as many citations as papers in accounting).
• The highest-IF journal in immunology is Ann Rev Immun (IF 47.3; category mean 4.02), whereas in the health care and services category it is Milbank Q. (IF 3.8; category mean 1.09).
• Matthew effect.
Benchmarking must be done using comparable variables!
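One simple way to compare like with like is to divide a value by the mean of its own field or category. The sketch applies that ratio to the two journals quoted above; it is a plain illustration, not a standard normalized indicator from any particular database.

```python
def relative_to_field(value, field_mean):
    """How many times the field/category mean a value is."""
    if field_mean <= 0:
        raise ValueError("field mean must be positive")
    return value / field_mean


journals = {
    "Ann Rev Immun (immunology)": (47.3, 4.02),
    "Milbank Q. (health care and services)": (3.8, 1.09),
}

for name, (impact_factor, category_mean) in journals.items():
    ratio = relative_to_field(impact_factor, category_mean)
    print(f"{name}: IF {impact_factor} = {ratio:.2f} x category mean")
```

The raw impact factors differ by a factor of about 12, but relative to their own categories (11.77 vs 3.49) the gap shrinks to roughly 3.4.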
Harzing’s Publish or Perish
• A software program that retrieves and analyzes academic citations. It uses Google Scholar to obtain the raw citations, then analyzes them and calculates a series of citation metrics.
H-Index Advantages
• The h-index was intended to address the main disadvantages of other bibliometric indicators, such as the total number of papers or the total number of citations.
• It simultaneously measures the quality and sustainability of scientific output, as well as, to some extent, the diversity of scientific research.
H-Index Advantages
• The h-index is much less affected by methodological papers proposing successful new techniques, methods or approximations, which can be extremely highly cited. For example, one of the most cited condensed-matter theorists, John P. Perdew, has been very successful in devising new approximations within the widely used density functional theory. He has published 3 papers cited more than 5,000 times and 2 cited more than 4,000 times. Several thousand papers using density functional theory are published every year, most of them citing at least one paper by J. P. Perdew. His total citation count is close to 39,000, and his h-index is large, 51, but not unique. In contrast, the condensed-matter theorist with the highest h-index (94), Marvin L. Cohen, has a lower total citation count of 35,000. One can argue that in this case the h-index reflects the broader impact of Cohen's papers in solid-state physics, owing to his larger number of highly cited papers.
H-Index Problems
• The h-index is bounded by the total number of publications. This means that scientists with a short career are at an inherent disadvantage, regardless of the importance of their discoveries. For example, Évariste Galois' h-index is 2, and will remain so forever. Had Albert Einstein died in early 1906, his h-index would be stuck at 4 or 5, despite his being widely acknowledged as one of the most important physicists, even considering only his publications to that date.
• The h-index does not consider the context of citations. For example, citations in a paper are often made simply to flesh out an introduction, with no other significance to the work. Nor does h resolve other contextual instances: citations made in a negative context, and citations made to fraudulent or retracted work. (This is true for other metrics using citations, not just for the h-index.)
• The h-index does not account for confounding factors. These include the practice of "gratuitous authorship", which is still common in some research cultures, the so-called Matthew effect, and the favorable citation bias associated with review articles.
H-Index Problems
• The h-index has been found to have slightly less predictive accuracy and precision than the simpler measure of mean citations per paper.
• While the h-index de-emphasizes singular successful publications in favor of sustained productivity, it may do so too strongly. Two scientists may have the same h-index, say, h = 30, but one has 20 papers that have been cited more than 1,000 times and the other has none. Clearly, the scientific output of the former is more valuable.
H-Index Problems
• The h-index is affected by limitations in citation databases. Some automated searching processes find citations to papers going back many years, while others find only recent papers or citations. This issue is less important for those whose publication record started after automated indexing began around 1990. Citation databases contain some citations that are not quite correct and therefore will not properly match the correct paper or author.
• The h-index does not account for the number of authors of a paper. If the impact of a paper is the number of citations it receives, it might be logical to divide that impact by the number of authors involved. (Some authors will have contributed more than others, but in the absence of information on contributions, the simplest assumption is to divide credit equally.) Not taking the number of authors into account could allow gaming of the h-index and other similar indices: for example, two equally capable researchers could agree to share authorship on all their papers, thus increasing each of their h-indices. Even in the absence of such explicit gaming, the h-index and similar indices tend to favor fields with larger groups, e.g., experimental over theoretical.
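The author-count problem can be seen in the sharing scenario just described. The sketch divides each paper's citations by its number of authors before computing h; this equal-credit fractional counting is one simple variant chosen for illustration, not the definition of a specific published index. All numbers are hypothetical.

```python
def h_from_scores(scores):
    """Largest h such that h items have a score of at least h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score < rank:
            break
        h = rank
    return h


def fractional_h_index(papers):
    """papers: iterable of (citations, number_of_authors) pairs."""
    return h_from_scores(cites / n_authors for cites, n_authors in papers)


# Two equally capable researchers, each with four solo papers of the same impact.
solo = [(10, 1), (8, 1), (6, 1), (4, 1)]
# If they co-author everything, each lists all eight papers (two authors each).
shared = [(c, 2) for c in (10, 10, 8, 8, 6, 6, 4, 4)]

print(h_from_scores(c for c, _ in solo))     # 4: plain h, solo
print(h_from_scores(c for c, _ in shared))   # 6: plain h, shared authorship
print(fractional_h_index(shared))            # 4: fractional h, shared authorship
```

Sharing authorship raises each researcher's plain h from 4 to 6, while the fractional variant brings it back to 4.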
My h-index is bigger
than yours!
Edward Witten
Physicist
h=132
But more people know
who I am!
Stephen Hawking
Physicist
h=62
Feel free to email me!
payam.kabiri@gmail.com