Workshop: Introduction to the Fundamentals and Indicators of Scientometrics: The H-Index and How to Calculate It
Organizers: Deputy for Research and Technology, Ministry of Health and Medical Education, in collaboration with Tehran University of Medical Sciences

Payam Kabiri, MD, PhD
Epidemiologist
Tehran & Isfahan Universities of Medical Sciences

Today's program!
• Informetrics and scientometrics: concepts and applications
• A brief introduction to scientometric indicators
• Introduction to citation indexes
• Introduction to the H-Index
• Advantages and disadvantages of the H-Index
• Methods for calculating the H-Index
• Hands-on practice

Scientometrics (bibliometrics)
• The measurement of scientific output activity through statistics on academic publications.
• The scope of bibliometrics includes "all quantitative aspects and models of science communication, storage, dissemination and retrieval of scientific information".

Definition of scientometrics
The quantitative methods that emphasize the analysis of science as an information process are called "scientometrics". Put more simply, scientometrics is the science of measuring science.

Figure: The relationship between informetrics, bibliometrics, scientometrics, cybermetrics and webometrics.

Bibliometric data are used for:
• Scientific output evaluation
• Impact
• Citations
• History of science
• Publication strategies
• Science policy; resource allocation
• Collection management
• Sociology of science
• Information organization
• Information management & utilization

Figure: Links of bibliometrics with related research fields and application services (science policy, research management, scientific information, librarianship, services for research, economics, sociology of science, library and information science, history of science, scientometrics, informetrics, webometrics, life sciences, mathematics/physics; applied vs. basic).

Why do we evaluate scientific output?
Needs differ by level (international, national, institutional, faculty, researchers):
• Grant allocations
• Policy decisions
• Benchmarking
• Promotion
• Collection management
• Funding allocations
• Research

Methods for ranking scientists
• Evaluation of scientists by "experts" (e.g., surveys)
• Citation analysis (task: compute a score for the "objects")
• A hybrid of the previous two methods

3 kinds of citation data indexes
• Articles: citation impact
• Authors: number of papers (quantity); number of citations (quality); average number of citations per article; h-index and g-index (both quantity and quality)
• Journals: Journal Impact Factor; h-index

Figure: A sample scientometrics report.

ISI Impact Factor
• A = total citations in 1992
• B = citations in 1992 to articles published in 1990-91 (a subset of A)
• C = number of articles published in 1990-91
• D = B/C = the 1992 impact factor

Citation databases
• Web of Science
• Scopus
• Google Scholar

Other tools available
• Other bibliometric indicators: Journal Citation Reports (JCR)
• Other indicator databases (national, essential, university, institutional)
• ISIHighlyCited.com

WoS and Scopus: subject coverage (% of total records)
• Web of Science: Science 77; Social Sciences 14; Arts & Humanities 9
• Scopus: Health & Life Sciences 60; Physical Sciences 25; Biological & Environmental Sciences 13; Social Sciences 2
• Google Scholar: unknown

Web of Science
• Covers around 9,000 journal titles and 200 book series, divided between the Science Citation Index (SCI), the Social Sciences Citation Index (SSCI) and the Arts & Humanities Citation Index (A&HCI).
• Electronic back files are available to 1900 for SCI, to the mid-1950s for SSCI and to the mid-1970s for A&HCI.
• Very good coverage of the sciences; patchy on the "softer" sciences, social sciences, and arts and humanities.
• US and English-language biased.
• Full coverage of citations.
• Name disambiguation tool.
• Limited downloading options.

Scopus
• Positions itself as an alternative to ISI.
• More journals from smaller publishers and open access (15,000+ journals; 750 conference proceedings).
• Source data back to 1960.
• Excellent for the physical and biological sciences; poor for the social sciences; does not cover the humanities or arts.
• Better international coverage (60% of titles are non-US).
• Citation data go back only to 1996 (i.e., citation data for the last decade only)!
• Not "cover to cover" and not up to date.
• Easy to use when searching for source publications; clumsy when searching for cited publications.
• The Citation Tracker works for up to 1,000 records only.
• Limited downloading options.

Google Scholar
• Broader citation coverage, as it retrieves from the web!
• More coverage of references, including grey literature!
• Coverage, scope and inclusion criteria are unclear.
• Very limited search options.
• No separate cited-author search.
• Coverage goes back only to 1990, no further!
• Free!

What is the Scopus database?
Coverage:
• More than 15,200 journal titles
• From more than 4,000 major international publishers
• More than 30 million abstracts
• More than 265 million citations
• All journals indexed in Medline
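Returning to the ISI impact factor defined earlier (A, B, C and D = B/C): the arithmetic is simple, and the minimal Python sketch below encodes it, assuming the three counts have already been extracted from a citation database. The class and field names are hypothetical, and the 1992 numbers are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class JournalCitationCounts:
    """Counts for one journal in one census year, using the A/B/C labels above.

    Hypothetical helper for illustration; not part of any real library.
    """

    total_cites: int          # A: all citations the journal received in the census year
    cites_to_prior_two: int   # B: census-year citations to items from the two prior years
    items_prior_two: int      # C: number of articles published in those two prior years

    def impact_factor(self) -> float:
        """Return D = B / C, after checking that B is a subset of A and C > 0."""
        if not 0 <= self.cites_to_prior_two <= self.total_cites:
            raise ValueError("B must satisfy 0 <= B <= A")
        if self.items_prior_two <= 0:
            raise ValueError("C must be a positive article count")
        return self.cites_to_prior_two / self.items_prior_two


# Hypothetical 1992 counts for an imaginary journal (not real data).
journal_1992 = JournalCitationCounts(total_cites=5000, cites_to_prior_two=1200, items_prior_two=400)
print(journal_1992.impact_factor())  # 3.0
```

Note that A only enters as a sanity check: the impact factor itself uses just B and C.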
Figure: What is Scopus? 15,200+ titles from more than 4,000 publishers, including 1,000+ open access journals and 500+ conference proceedings; peer-reviewed literature plus websites and digital archives (400 million web pages), patents (21 million) and institutional repositories; subject areas: science, medicine, technology, social sciences.

Content update
• 30 million records, of which:
  • 15 million records include references going back to 1996
  • 15 million pre-1996 records go back as far as 1900
• 265 million references, added to records from 1996 onwards
In addition to traditional scientific and academic journals, Scopus covers:
• 1,000 open access journals
• 500 conference proceedings
• 600 trade publications
• 125 book series
• Medline (100% coverage)
• 275 million quality websites, including 21 million patents from 5 patent offices
• UK patents added to Scirus

Figure: What is Scopus? The world's largest abstract and citation database: 15,100 current journal titles (STM and social sciences) from 4,000 publishers (15% Elsevier, 85% other publishers); 30 million abstracts going back to 1966 (+1.1 million per year); abstracts plus references from 1996; 265 million cited references (+25 million each year); updated daily; focused web search covering 240 million scholarly web items (e-prints, theses, dissertations) and 13 million patents; links to library sources as the fastest route to full text.

Scopus coverage: 15,100 unique titles
• 5,900 Life & Health Sciences (100% Medline)
• 4,500 Chemistry, Physics, Engineering
• 2,500 Biological, Agricultural, Environmental Sciences
• 2,700 Social Sciences, Psychology, Economics

Figure: International distribution of titles and geographical spread of Scopus content (North America; South America; Asia Pacific; Europe, Middle East & Africa).

Iranian titles indexed in Scopus
• Iranian Biomedical Journal
• Archives of Iranian Medicine
• Daru
• Iranian Journal of Diabetes and Lipid Disorders
• Iranian Journal of Medical Sciences
• Iranian Journal of Public Health
• Journal of Medicinal Plants
• Yakhteh

Figure: Bibliometric tool development of Scopus, 2004 to end of 2007: launch of Scopus for literature search (2004); Citation Tracker, Author Identifier, WebCites and PatentCites; introduction of RPM tools (h-index, custom data) driven by market feedback, with an end-2007 release.

Scopus for science evaluation
Difficulties of the old criteria:
• Total number of papers (quantity)
• Total number of citations (quality)
• Average number of citations per article (depends on outliers)
• Journal Impact Factor (discipline-based; depends on outliers)

The h-index was born!
We need an index that captures both the quantity and the quality of an author's papers:
• Productivity
• Impact
• Not affected by "big hits"
• Not affected by "noise"

The h-index
Hirsch, J. E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 102(46), 16569-16572. Available at: http://arxiv.org/pdf/physics/0508025
• It is meaningful only when compared with others within the same discipline.
• Researchers in one field may have very different h-indices from researchers in another (e.g., life sciences vs. physics).

The h-index: a definition
"The h-index is the highest number of papers a scientist has that have at least that number of citations." (Nature, 2005)

What is the h-index?
• A performance measurement tool for scientific authors (a similar idea to journal impact factors, but for individuals)
• Established by Jorge Hirsch at UC San Diego
• "A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np - h) papers have no more than h citations each."
Source: Hirsch, J. E. (2005, September 29). An index to quantify an individual's scientific research output.
Retrieved from http://arxiv.org/abs/physics/0508025

The h-index
Definition: a researcher has h-index h if h of his or her Np articles have received at least h citations each, and the remaining (Np - h) articles have received no more than h citations each.

The h-index concept through its graph
Figure: The h-graph.

The h-index
The h-index is a new scientometric indicator, devised in 2005 by Jorge Hirsch at the University of California. It was designed to assess both the impact (quality) and the quantity of researchers' research output.

The h-index
The h-index is the number h of an author's published papers that have each received h or more citations. For example, if a researcher's h-index is 5, the researcher has 5 papers that each have at least 5 citations; in other words, the researcher's other papers have no more than 5 citations each. Today this index is regarded as the equivalent of the Impact Factor for researchers.

The highest h-index in the world and in Iran
The highest h-index in the world, 197, is in the life sciences. The highest h-index among Iranian researchers belongs to Dr. Shamsipur, professor of chemistry at Razi University, Kermanshah, with h = 33.

Terminology
• Np: total number of papers
• Nc,tot: total number of citations
• Y(now): present year
• Y(i): year of publication of paper i
• C(i): set of citations to paper i

The h-index
A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np - h) papers have no more than h citations each.

H-index example
Author A
Doc: 1 2 3 4 5 6 7 8 9 10 11
Cit: 49 23 15 14 6 3 1 1 0 0 0

Author B
Doc: 1 2 3 4 5 6 7 8 9
Cit: 55 45 20 10 5 4 3 2 1

Doc: 1 2 3 4
Cit: 25 20 9 6

H-index example
Author X has 5 published articles:
• Article 1: 5 citations
• Article 2: 10 citations
• Article 3: 100 citations
• Article 4: 6 citations
• Article 5: 4 citations
The h-index of X is 4: there are 4 papers with at least 4 citations each.
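The definition above translates almost word for word into code. The minimal Python sketch below (the function name `h_index` is hypothetical, and the citation counts are simply the example tables above) tries each candidate h and counts the papers with at least h citations; because that condition only gets harder as h grows, the loop can stop at the first failure.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations each.

    Follows the Hirsch (2005) definition literally: for each candidate h,
    count the papers with at least h citations. For the largest such h,
    the remaining papers automatically have no more than h citations each.
    """
    h = 0
    for candidate in range(1, len(citations) + 1):
        papers_with_enough = sum(1 for c in citations if c >= candidate)
        if papers_with_enough >= candidate:
            h = candidate
        else:
            # The condition only gets harder as the candidate grows, so stop here.
            break
    return h


examples = {
    "Author A": [49, 23, 15, 14, 6, 3, 1, 1, 0, 0, 0],
    "Author B": [55, 45, 20, 10, 5, 4, 3, 2, 1],
    "Four-paper example": [25, 20, 9, 6],
    "Author X": [5, 10, 100, 6, 4],
}

for name, cites in examples.items():
    print(f"{name}: h = {h_index(cites)}, total citations = {sum(cites)}")
```

Running it prints h = 5 for both Author A (112 total citations) and Author B (145 total citations), h = 4 for the four-paper example, and h = 4 for Author X: two quite different citation profiles can collapse to the same h.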
The h-index can be used:
• For a specific author, to evaluate that author's research performance; or
• For a group of papers from an institution, department or journal, to evaluate the impact of that particular group of papers.

H-index drawbacks
• Like impact factors, it depends on the subject area.
• It can only increase over time.
• It does NOT show the current activity or inactivity of the author.
• It disadvantages younger researchers (who have no previous track record).
• Scientists with a short scientific career are out of the competition.

The contemporary h-index
The contemporary h-index was proposed by Antonis Sidiropoulos, Dimitrios Katsaros and Yannis Manolopoulos. It adds an age-related weighting to each cited article, giving less weight to older articles.

The g-index
• Suggested in 2006 by Leo Egghe.
• It is calculated from the distribution of citations received by a given researcher's publications.
• Definition: given a set of articles ranked in decreasing order of the number of citations they received, the g-index is the (unique) largest number such that the top g articles received (together) at least g² citations.
• It is very similar to the h-index and attempts to address its shortcomings by giving more weight to highly cited articles.

The h-b-index
The h-b-index, developed by Michael Banks of the Max Planck Institute for Solid State Research in Germany, takes the index further by evaluating the impact of compounds used in solid-state physics, and of scientific topics in general. It is defined in the same manner as the h-index, but is based on a topic (or compound) search instead of a scientist's name.
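The g-index and the contemporary h-index introduced above can be sketched in Python as follows. Two assumptions, both illustrative rather than authoritative: the g-index here is capped at the number of papers (some formulations pad the list with zero-cited papers, which this sketch does not do), and the contemporary h-index uses the age-weighted score gamma * (Y(now) - Y(i) + 1)^(-delta) * |C(i)|, in the symbols of the Terminology slide, with gamma = 4 and delta = 1, the values commonly associated with Sidiropoulos et al. The function names and the (year, citations) records in the second example are hypothetical.

```python
def g_index(citations: list[int]) -> int:
    """Largest g such that the top g papers together have >= g**2 citations.

    Assumption: g is capped at the number of papers (no padding with
    fictitious zero-cited papers).
    """
    ranked = sorted(citations, reverse=True)
    g, running_total = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g


def contemporary_h_index(papers: list[tuple[int, int]], now: int,
                         gamma: float = 4.0, delta: float = 1.0) -> int:
    """Contemporary h-index sketch.

    Each paper (year, citations) gets the age-weighted score
    gamma * (now - year + 1) ** (-delta) * citations; the result is the
    largest h such that h papers have a score of at least h.
    """
    scores = []
    for year, cites in papers:
        if year > now:
            raise ValueError("publication year cannot be after 'now'")
        scores.append(gamma * (now - year + 1) ** (-delta) * cites)
    scores.sort(reverse=True)
    hc = 0
    for rank, score in enumerate(scores, start=1):
        if score >= rank:
            hc = rank
        else:
            break  # scores are sorted, so no later paper can qualify
    return hc


author_a = [49, 23, 15, 14, 6, 3, 1, 1, 0, 0, 0]
print(g_index(author_a))  # 10 (its h-index is 5)

# Hypothetical (year, citations) records, evaluated as of 2009.
papers = [(2004, 30), (2005, 12), (2008, 6), (2009, 3)]
print(contemporary_h_index(papers, now=2009))  # 4 (the plain h-index of 30, 12, 6, 3 is 3)
```

In the second example the weighting lifts the two recent papers above the older ones, so the contemporary index (4) exceeds the plain h-index (3), which is the intended effect of favouring current activity.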
The h-b-index
A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np - h) papers have at most h citations each. For a topic, it is useful to express the h-b-index in terms of the number of years, n:

h = n·m

If the h-b-index grows linearly with the number of years, m is the gradient (slope). In this respect, a compound or topic with a large m and a large h-b-index can be defined as a "hot topic".

The h-graphs in Scopus
A more comprehensive way of evaluating an author: using Author Search, Scopus provides three different graphs:
• The h-index graph of a given author
• The number of the author's papers (articles) per year
• The number of the author's citations per year

The h-index plot
• Plots citations per article
• The intersection point gives the h-index
• Shows both low and high cited-by counts
• Completely transparent
• The date range can be changed
• Practical interpretation: promotion, evaluation, funding, tenure, benchmarking

Author articles history
• Shows the level of activity
• Shows peaks and troughs in the publication history
• The date range can be changed
• Practical interpretation: promotion, evaluation, funding, tenure, benchmarking

Author cited-by counts
• Shows the level of activity
• Shows highs and lows
• The date range can be changed
• Time lag!
• Practical interpretation: promotion, evaluation, funding, tenure, benchmarking

How to calculate the h-index through Scopus
There are two ways, depending on what you want:
• For an author: search for the author, and Scopus calculates it automatically for you.
• For a group of papers: search for the papers, then use the citation tracker and sort the results to count and calculate it manually.

The Hirsch index: for a group of papers
1) Run an author search.
2) Sort the results by citations by clicking on "Cited by".
3) Scroll down the re-sorted results to the last paper whose ranking number is still equal to or less than its number of citations.
That ranking number is the Hirsch index for that author.
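The manual Scopus procedure above (sort by "Cited by", then scroll down while the ranking number is still equal to or less than the citation count) can be mimicked in a few lines of Python. This is a sketch assuming the "Cited by" counts have already been copied or exported from the result list; the function name and the counts are hypothetical, and no Scopus API is involved.

```python
def hirsch_by_scrolling(cited_by: list[int]) -> int:
    """Mimic the manual procedure: sort by 'Cited by' (highest first), walk
    down the ranking, and return the last rank that is <= that paper's count."""
    ranked = sorted(cited_by, reverse=True)
    print(f"{'Rank':>4} {'Cited by':>8}")
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        marker = "rank <= cited by" if rank <= cites else "stop here"
        print(f"{rank:>4} {cites:>8}  {marker}")
        if rank > cites:
            break  # every later paper has a higher rank and no more citations
        h = rank
    return h


# Hypothetical "Cited by" counts for a group of papers, in search-result order.
results = [12, 40, 7, 3, 9, 0, 5, 22]
print("h =", hirsch_by_scrolling(results))
```

After sorting, the counts read 40, 22, 12, 9, 7, 5, ...; rank 5 still has 7 citations, but rank 6 has only 5, so the scan stops there and reports h = 5.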
Author Identifier functionality
• Author Identifier enables Scopus users to avoid two major problems that affect most abstracting and indexing (A&I) databases:
  • How to distinguish between an author's articles and those of another author sharing the same name?
  • How to group an author's articles together when his or her name has been recorded in different ways?
• With other databases, these problems can result in incomplete or inaccurate results.

Calculating the h-index: for a group of papers

Indicators of quality as measured using published outputs
• Number of publications
• Citation counts to these publications (adjusted for self-citations): what "window" should be used? 4, 5 or 10 years?
• Citations per publication
• Percentage of uncited papers
• Impact factors (of the publishing journals)
• Diffusion factor (of the citing journals): profile of the users of research (who, where, when and what)
• "Impact factor" of a scholar: the Hirsch index (h-index), i.e., the number of papers with at least that number of citations. Your h-index is 75 if 75 of your papers have at least 75 citations each and you do not have 76 papers with at least 76 citations each.
Note: these should not be seen as "absolute" numbers, but always in the context of the discipline, research type, institutional profile, seniority of the researcher, etc.

Calculating the h-index using Thomson ISI Web of Science
1) Conduct a General Search.
2) Automatic: click on "Citation Report"; or
3) Manual: sort by "Times Cited".

Calculating the h-index using Google Scholar
There are different ways to do it, with different interfaces:
1- The Publish or Perish interface (available for download).
2- Another script.
3- A third option.

Compare like with like!
• Applied research attracts fewer citations than basic research.
• Citation behaviour differs between disciplines (e.g., papers in organisational behaviour attract 5 times as many citations as papers in accounting).
• The highest-IF journal in immunology is Annual Review of Immunology (IF 47.3; category mean 4.02), while in the health care and services category it is Milbank Quarterly (IF 3.8; category mean 1.09).
• Matthew effect.
Benchmarking must be done using comparable variables!

Harzing's Publish or Perish
A software program that retrieves and analyzes academic citations. It uses Google Scholar to obtain the raw citations, then analyzes them and calculates a series of citation metrics.

H-Index advantages
The h-index was intended to address the main disadvantages of other bibliometric indicators, such as the total number of papers or the total number of citations. It simultaneously measures the quality and sustainability of scientific output and, to some extent, the diversity of scientific research.

H-Index advantages
The h-index is much less affected by methodological papers proposing successful new techniques, methods or approximations, which can be extremely highly cited. For example, one of the most cited condensed-matter theorists, John P. Perdew, has been very successful in devising new approximations within the widely used density functional theory. He has published 3 papers cited more than 5,000 times and 2 cited more than 4,000 times. Several thousand papers using density functional theory are published every year, most of them citing at least one paper by J. P. Perdew. His total citation count is close to 39,000, while his h-index is large, 51, but not unique. In contrast, the condensed-matter theorist with the highest h-index (94), Marvin L. Cohen, has a lower total citation count of 35,000. One can argue that in this case the h-index reflects the broader impact of Cohen's papers in solid-state physics, due to his larger number of highly cited papers.

H-Index problems
• The h-index is bounded by the total number of publications. This means that scientists with a short career are at an inherent disadvantage, regardless of the importance of their discoveries. For example, Évariste Galois' h-index is 2, and will remain so forever.
Had Albert Einstein died in early 1906, his h-index would be stuck at 4 or 5, even though he is widely acknowledged as one of the most important physicists, even considering only his publications up to that date.
• The h-index does not consider the context of citations. For example, citations are often made simply to flesh out an introduction and otherwise have no significance to the work. The h-index also does not resolve other contextual cases: citations made in a negative context, and citations to fraudulent or retracted work. (This is true for other citation-based metrics, not just for the h-index.)
• The h-index does not account for confounding factors. These include the practice of "gratuitous authorship", which is still common in some research cultures, the so-called Matthew effect, and the favorable citation bias associated with review articles.

H-Index problems
• The h-index has been found to have slightly less predictive accuracy and precision than the simpler measure of mean citations per paper.
• While the h-index de-emphasizes singular successful publications in favor of sustained productivity, it may do so too strongly. Two scientists may have the same h-index, say h = 30, but one has 20 papers that have been cited more than 1,000 times and the other has none. Clearly, the scientific output of the former is more valuable.

H-Index problems
• The h-index is affected by limitations in citation databases. Some automated searching processes find citations to papers going back many years, while others find only recent papers or citations. This issue is less important for those whose publication record started after automated indexing began around 1990.
• Citation databases contain some citations that are not quite correct and therefore will not properly match the correct paper or author.
• The h-index does not account for the number of authors of a paper.
If the impact of a paper is the number of citations it receives, it might be logical to divide that impact by the number of authors involved. (Some authors will have contributed more than others, but in the absence of information on contributions, the simplest assumption is to divide credit equally.) Not taking the number of authors into account could allow gaming of the h-index and similar indices: for example, two equally capable researchers could agree to share authorship on all their papers, thus increasing each of their h-indices. Even without such explicit gaming, the h-index and similar indices tend to favor fields with larger groups, e.g., experimental over theoretical.

"My h-index is bigger than yours!"
Edward Witten, physicist, h = 132

"But more people know who I am!"
Stephen Hawking, physicist, h = 62

If you would like, send me an email!
payam.kabiri@gmail.com