Peter Andras
School of Computing Science, Newcastle University
peter.andras@ncl.ac.uk

Overview
- Introduction
- Research metrics
- Science domains and the evolution of science
- Quality and performance
- Science management

Introduction – 1
- RAE – performance and funding
- REF – a new, metrics-based performance measurement system (2014)
- Scientometrics, bibliometrics
- Other countries: Netherlands, Australia, US

Introduction – 2
- What do we measure by metrics?
- What can be measured by metrics?
- How are metrics related to performance and quality?
- How can higher education and research management use science metrics?

Origins of research metrics
- Derek De Solla Price – 1960s
- Measuring 'big' science – physics
- Measuring national science output – papers, citations, patents
- Bibliometrics and scientometrics – 1970s to today
  - Early: counts
  - Recent: networks, distributions and conditional counts

What is measured – 1
- Publication counts within a time window – at least a year, and usually within a range of publication venues (conferences, journals, books)
- Citation counts of the counted publications – citation counts per year may grow for a while and then shrink to zero over time
- Links – co-authorship, co-citation, citing – Google PageRank

What is measured – 2
- Indices:
  - h-index – n publications, each with at least n citations – the maximum such n
  - g-index – m publications with at least m citations on average – the maximum such m
- Journal metrics:
  - Impact factor – average (expected) citations per paper within a time window (2 years)
  - Immediacy index – the average number of citations a journal's papers receive within their year of publication
  - Citation half-life – the median age of the papers that are currently being cited
  - Journal h-index
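The index definitions above are easy to make concrete. Here is a minimal Python sketch of the h-index, g-index and 2-year impact factor (my own illustration, not part of the original slides; the function names and toy citation counts are invented):

```python
def h_index(citations):
    # h-index: the largest n such that n publications have at least n citations each
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    # g-index: the largest m such that the m most cited publications have
    # at least m citations on average (equivalently, at least m*m in total)
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def impact_factor(citations_to_last_two_years, items_in_last_two_years):
    # 2-year impact factor: citations received this year by the items published
    # in the previous two years, divided by the number of such items
    return citations_to_last_two_years / items_in_last_two_years

# Toy example: per-paper citation counts of one researcher
papers = [25, 12, 8, 7, 5, 3, 1, 0]
print(h_index(papers))  # 5 -> five papers with at least 5 citations each
print(g_index(papers))  # 7 -> top 7 papers total 61 citations, 61 >= 7*7
```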
Metrics
- Metrics in academia:
  - Individual
  - Group
  - Departmental
  - University level
- Larger integration implies more stability and less variability in the metrics over (shorter) time periods

Domain comparability
- Different science domains have different practices of doing research, and of publishing and discussing research results
- E.g. computer science – software engineering and software applications – uses predominantly conferences to present new results
- E.g. biology – uses predominantly journals to present new results

Normalisation
- Across most sciences the underlying generic pattern of publication and citation metrics is very similar – e.g. all follow power-law distributions
- Normalised metrics: Leiden research metrics centre – raw metrics are divided by domain-specific parameters to make them comparable across subjects
- Domain-specific parameters are calculated by analysing domain-specific journals and papers (from high-impact multi-domain journals) – journal classification
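A minimal sketch of the normalisation idea in Python (my own simplified illustration: the real Leiden parameters come from journal classification, which is replaced here by a per-field, per-year mean):

```python
from collections import defaultdict

def field_baselines(corpus):
    # Mean citations per (field, year) pair - a simplified stand-in for the
    # domain-specific parameters derived from journal classification
    sums, counts = defaultdict(float), defaultdict(int)
    for paper in corpus:
        key = (paper["field"], paper["year"])
        sums[key] += paper["citations"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def normalised_citations(paper, baselines):
    # Raw count divided by the field/year baseline: 1.0 means "cited exactly
    # as much as an average paper of the same field and year"
    baseline = baselines[(paper["field"], paper["year"])]
    return paper["citations"] / baseline if baseline else 0.0

# Toy corpus: 12 citations in computer science outscore 40 in biology
corpus = [
    {"field": "cs",  "year": 2010, "citations": 12},
    {"field": "cs",  "year": 2010, "citations": 4},
    {"field": "bio", "year": 2010, "citations": 40},
    {"field": "bio", "year": 2010, "citations": 24},
]
baselines = field_baselines(corpus)
print(normalised_citations(corpus[0], baselines))  # 1.5  (12 / 8)
print(normalised_citations(corpus[2], baselines))  # 1.25 (40 / 32)
```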
Domain nature
- Domain-specific public opinion:
  - How are new results presented?
  - How are they discussed?
  - How are they referenced?
  - What are the primary and preferred venues?
  - How is reputation established and maintained – for individuals, groups and institutes?
  - What are the socio-dynamics of the domain?
  - What are the ways of 'experimental' validation?

Distortion factors
- Review papers – very highly cited
- Methods papers – the most highly cited
- Self-citations – important at low counts, ignorable above medium counts (the thresholds are domain specific)
- Networking – citation cliques
- Spinning – relatively many highly cited papers contain strong claims that are refuted within a relatively short time
- 'Authority' – unjustified benefits from an already established authority status

Variability and noise
- The metrics data may be highly variable – especially at the individual or small-group level, and also if too-short time periods are considered
- Structural noise caused by the mechanisms of science – e.g. the time to the citations peak (domain specific)

Common research metrics
- Publication and citation counts within a time window
- Individual and group level h-index / g-index
- Patent counts and patent revenue value
- Research funding value
- Number of PhD students satisfying some quality minimum (e.g. submission of the thesis on time, publication of a journal paper)
- Science esteem evaluation – a measure of long-term reputation management and networking ability

Metrics and quality
- How are metrics related to the research quality of an individual / group / department / university?
- Are metrics purely retrospective, or do they also have predictive value with respect to research quality?

Evolution of science
- Karl Popper – continual additions to science, all additions being falsifiable and valid until falsified; revisions are caused by the falsification of previously accepted scientific knowledge
- Thomas Kuhn – normal science and revolutionary science; normal science grows by many little additions filling gaps in the current scientific knowledge; revolutionary science reorganises science fundamentally and grows rapidly until it matures into normal science

Progress in normal science
- Use existing knowledge to make highly likely predictions
- Check the validity of such predictions by appropriately designed 'experimental' tests
- E.g. demonstrating a statement or a new minor theorem in a well-established branch of mathematics

Progress in revolutionary science
- Use existing knowledge to discover fundamental inconsistencies or relationships in previously unquestioned or believed-to-be-unrelated pieces of scientific knowledge
- See what cannot be seen by most others, and re-shape the related science according to the new discovery
- E.g. non-Euclidean geometry (Bolyai, Lobachevsky), the alternative continuum hypothesis (Gödel, Cohen), alternatives to the axiom of choice (Zermelo, Gödel, Cohen)

Predictability of success – 1
Normal science:
- New hypotheses are about highly predictable outcomes
- The predictability of success is lower if the hypothesis is in an area where the experimental validation of earlier hypotheses is shaky – such areas are at the boundary of normal science, and are avoided by success-hungry researchers
- Most research is successful in filling a gap and adding new bits of knowledge to the science
- Most research is published and referenced by further publications – being more mainstream in the current science implies more expected citations

Predictability of success – 2
Revolutionary science:
- Most hypotheses turn out to be wrong
- Successful revolutionary science is built on a large volume of unsuccessful tries and takes a very long time (see the time between Euclid and non-Euclidean geometry, or between axiomatic set theory and the alternatives to the axiom of choice)
- It may take a long time to be noticed, due to established authorities and authoritative views of the science
- There is a high risk of achieving no success, and of being largely ignored for a good while even if success is achieved
- Many times it does not lead to publications, citations and reputation enhancement

Extent of impact
- The expected impact is inversely correlated with the predictability of success
- Normal science has impact – however, spinning may create the impression of a higher impact than the actual one; usually the impact is small, since it is about filling a usually small gap in the science
- Revolutionary science has major impact if it is successful; in this case it is likely to change (rewrite) a large part of the existing science, or to initiate a fundamentally new branch of science with rapid initial growth potential

Domain maturity
- Maturity is indicated by the proportion of normal-science progress within the science
- Mature science: most progress is made by small additions following normal-science research; there are rare and relatively marginal revolutionary events – e.g. physics
- Early-stage science: most progress follows the pattern of revolutionary-science discovery; normal-science progress happens for relatively short periods, and normal-science components are regularly undermined by new revolutionary science; discovering the same thing several times is not uncommon; spinning may play an important role in establishing reputation – e.g. management science

Domain specific progress & impact
- Progress and impact depend on the maturity of the science
- A more mature science implies more work following the pattern of normal science
- A more mature science also implies relatively little impact for most new discoveries

Domain public opinion – 1
- Science maturity determines the nature of the public opinion of the science
- Mature science is based on a stable and unchallenged body of scientific knowledge, which allows relatively few major novelties, implies well-established and standardised experimental validations, and implies relatively low impact for most discoveries
- Implications: archival-style publication in journals, the use of highly standardised language, large integrated scientific societies, and the use of spin to create an impression of revolutionary character and to achieve visibility

Domain public opinion – 2
- Early-stage or less mature science allows regular 'revolutionary' novelties, doesn't have an unquestionable body of established knowledge, allows experimentation with validation methods, allows repeated re-discoveries, and implies relatively high impact for the truly successful revolutionary innovations
- This environment may facilitate more informal ways of publishing results; it also implies fewer citations for most results, which turn out to be less revolutionary than expected; high impact may find its way along alternative routes – i.e. avoiding formal citations; spinning may play an important role

Measurability of normal science
- Normal science produces a large volume of output in well-established venues (journals, conference proceedings)
- Normal-science papers are regularly cited by other such papers – papers in the currently most fashionable mainstream areas can get many citations
- Judging the quality of usual normal-science publications is relatively straightforward; spinning usually has only short-term effects
- Research grants are frequently available for normal-science projects with highly predictable objectives

Measurability of revolutionary science
- Revolutionary-science output is rare; most publications intended as 'revolutionary' are rejected or end up in obscure journals or conferences; scientists working on revolutionary topics may refrain from publishing half-baked ideas
- Revolutionary-science papers may stay a 'sleeping beauty' for many years; they may get few citations for a long time, and may never get many citations (e.g. if follower papers collect most of the citations)
- It is hard to judge the quality of revolutionary-science work – it is outside, or at the boundaries of, normal science; there are no standardised methods or well-established referential knowledge base, and the impact is highly unpredictable
- A very large part of science intended to be revolutionary leads to dead ends, although the rare truly revolutionary works have an extremely high impact on their science
- Few research grants are available for revolutionary science, due to its unpredictable success and its out-of-the-ordinary methods and objectives; getting such grants often requires excessive spinning

Metrics measure normal science
- Research metrics measure almost exclusively normal science:
  - Publications
  - Citations
  - Research funding
  - PhD students

Measuring revolutionary science
- It is very difficult to measure revolutionary science
- It usually becomes obvious that a work is part of revolutionary science well after the work is published – 5 to 25 or more years later, depending on the science domain
- Revolutionary-science work is confirmed to be truly revolutionary at, or close to, the point when the new scientific knowledge based on this work becomes normal science
- It is very difficult to measure the expected impact of scientific work meant to be revolutionary
- Revolutionary science is not measured by the usual research metrics (or is at most measured to a very small extent before it becomes normal science)
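As a toy illustration of this measurement problem, here is one way a delayed-recognition ('sleeping beauty') citation trajectory might be flagged in Python (entirely my own sketch; the thresholds are arbitrary illustrations, not calibrated values):

```python
def looks_like_sleeping_beauty(yearly_citations, sleep_years=10,
                               sleep_max=2, wake_factor=5):
    # Heuristic: near-zero citations for sleep_years after publication,
    # then a mean yearly rate well above the sleeping-period rate.
    if len(yearly_citations) <= sleep_years:
        return False  # too recent to tell - the measurement problem itself
    sleep = yearly_citations[:sleep_years]
    wake = yearly_citations[sleep_years:]
    if max(sleep) > sleep_max:
        return False  # noticed early on, so not a sleeper
    sleep_rate = max(sum(sleep) / sleep_years, 0.1)  # avoid division by zero
    wake_rate = sum(wake) / len(wake)
    return wake_rate >= wake_factor * sleep_rate

# Ignored for a decade, then heavily cited - flagged only in retrospect:
print(looks_like_sleeping_beauty([0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 3, 9, 22, 31]))  # True
# A typical normal-science trajectory, cited early and fading - not flagged:
print(looks_like_sleeping_beauty([2, 5, 7, 6, 4, 3, 2, 1, 1, 0, 0, 0, 0, 0]))    # False
```

The first branch is the point of the sketch: within any normal evaluation window such a trajectory is indistinguishable from failure.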
Quality and metrics – 1
- Quality in the case of normal science can be evaluated and predicted reliably – the methods are standardised, it is relatively easy to check that they are followed correctly, the expected results are highly predictable, and their impact is also well predictable
- Research metrics are well correlated with research quality in the case of normal science

Quality and metrics – 2
- It is difficult to compose valid short- and medium-term metrics for revolutionary science – long-term metrics work in the sense of evaluating whether the work led to the emergence of new normal science
- Quality is hard to assess because results and impact are not easily predictable – an indicator can be the extent of the gap that would be bridged by the new intended discovery, but spinning may blur the reliability of the estimate of the extent of that gap
- Quality may be judged by others who are active in the same area and have a similarly wide range of knowledge, but even these judgements can easily be wrong
- There is not much relation between any common research metric and the quality of revolutionary research and its output

Output and funding
In the short-to-medium term (3-5 years):
- Funding by QR and research council grants (in the UK) follows performance in output generation (primarily citation counts) – a significant correlation
- Output generation (publication and citation counts) depends on earlier output performance (primarily citation counts) and much less on changes in the level of research funding
- Research funding performance reflects earlier output performance (primarily normal science)
- Extra research funding does not necessarily imply extra output above the level predicted from earlier output

Metrics for management
- How can we use research metrics in the context of management?
- What kinds of research metrics can be used?
- What can be achieved by using metrics?

Management objectives
- Management objectives should be set in clear terms, should be achievable, and progress towards their achievement should be measurable
- Metrics can help in formulating such research management objectives
- Objectives should be set after a sufficiently clear measurement and evaluation of the current status

Measuring progress and achievement – 1
- Research metrics are key to setting objectives with measurable progress in normal science
- Normal-science-based, long-term, retrospective measurement of revolutionary-science work is the main way of measuring progress and achievement in revolutionary science

Measuring progress and achievement – 2
- Measurement should consider a large part of the spectrum of possible outcomes (in quality terms) at sufficient resolution (e.g. papers published in a wide range of venues, grant applications with all levels of rating)
- The context and methods of measurement should be established clearly and transparently (e.g. the set of considered journals, conferences, time windows, exclusion rules)

Metrics as a management tool – 1
Individual metrics:
- Publication and citation counts and distributions (yearly counting) for 3 / 5 / 7 years (depending on status) – aim: increasing distribution graphs, and preferably an exponentially increasing citation graph
- Discounted publication and citation metrics – elimination of reviews, editorials, database reports, publications with excessively many authors, etc. – aim: as above
- List of highly cited excluded publications for 3 / 5 / 7 years – the aim depends on the individual
- h-index and/or g-index – with and without the excluded publications – change in the index over the last 3 / 5 / 7 years – aim: steady increase (see the sketch below)
- Research grants – count and yearly distribution of applications, distribution of evaluations, amount and yearly distribution of awarded grants for the last 3 / 5 / 7 years – aim: steady level or increase
- For top-flyers – comparison with the metrics results of a set of agreed top-flyers elsewhere – this applies to metrics that can be calculated using common and accessible data sources
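The windowed index tracking suggested above can be computed mechanically once the counting rules are fixed. A Python sketch (my own illustration; the data layout, per-paper dictionaries of citations received per year, is an assumption):

```python
def h_index(citation_counts):
    # As sketched earlier: the largest n with n publications of >= n citations
    ranked = sorted(citation_counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def windowed_h_trajectory(papers, end_year, window=5):
    # Yearly h-index over a sliding citation window, to check the
    # "steady increase" aim; papers maps paper id -> {year: citations}
    trajectory = {}
    for year in range(end_year - window + 1, end_year + 1):
        counts = [
            sum(c for y, c in per_year.items() if year - window < y <= year)
            for per_year in papers.values()
        ]
        trajectory[year] = h_index(counts)
    return trajectory

papers = {
    "p1": {2019: 2, 2020: 4, 2021: 5, 2022: 6},
    "p2": {2020: 1, 2021: 2, 2022: 3},
    "p3": {2021: 1, 2022: 1},
}
print(windowed_h_trajectory(papers, end_year=2022, window=3))
# {2020: 1, 2021: 2, 2022: 2}
```

The exclusion rules above (reviews, editorials, publications with excessively many authors) would be applied when building `papers`, in line with the transparency requirements set out earlier.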
Metrics as a management tool – 2
Group & department metrics:
- The same set of metrics as for individuals, but counted for the group or department
- Distributions of individual metrics descriptors (e.g. publication and citation counts) over the group or department members – aim: a balanced distribution with a set of high-flyers
- Change in the above distributions over time – aim: steady or improved balance, with a steady or increasing count of high-flyers
- Cohesion metrics: member co-authorships of counted papers and grant applications, and the proportion of these within the whole measured sets – aim: sufficient but not excessive co-working relations (see the sketch below)
- Comparison with the metrics results of a set of departments considered close competitors – this applies to metrics that can be calculated using common and accessible data sources
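The cohesion metric reduces to a simple proportion once the counted papers and the member list are fixed. A minimal Python sketch (my own, with an assumed author-list layout):

```python
def cohesion(papers, members):
    # Proportion of the unit's counted papers that are co-authored by at
    # least two unit members - the quantity to keep "sufficient but not
    # excessive" according to the aim above
    members = set(members)
    internal = sum(1 for authors in papers if len(members & set(authors)) >= 2)
    return internal / len(papers) if papers else 0.0

papers = [
    ["alice", "bob", "ext1"],   # two members together -> internal
    ["alice", "ext2"],
    ["bob", "carol"],           # internal
    ["carol", "ext3", "ext4"],
]
print(cohesion(papers, ["alice", "bob", "carol"]))  # 0.5
```

The same function applied to grant applications gives the second cohesion figure mentioned above.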
Metrics as a management tool – 3
University metrics:
- The same metrics as for groups & departments
- Group distribution and cohesion metrics should be calculated by considering departments instead of individuals
- Comparison with the metrics results of a set of universities considered close competitors – this applies to metrics that can be calculated using common and accessible data sources

Setting targets
- Targets are short-term objectives – time limits/windows should be set very clearly
- Targets should be clearly defined, based on an evaluation of the current standing, achievable, and clearly measurable
- While targets apply to single years, they should be considered in the context of multi-year objectives

Domain specific impact
- The impact of the work is domain specific
- This should be considered when setting citation and related (e.g. h-index) targets:
  - Individual
  - Group & department
  - University

Clarity and predictability
- It is critical to use metrics in management with a clear setting of their context, methods, and evaluations
- It is also critical to have predictable career-related outcomes associated with metrics-based management evaluation
- It should be clear what gains/losses or rewards/punishments are associated with delivering the target/objective, and with its non-delivery, sub-delivery or over-delivery

Environment
- Work time allocations should explicitly consider the time needed to work towards the achievement of research performance targets
- Metrics, evaluations, and the related gains/losses should be generally accepted as valid

Competition and transparency
- Competition in achievement should be openly supported
- The performance evaluation process should be open and transparent
- Individuals / groups / departments should be able to compare themselves with others on the basis of openly available, timely data (considering normalisation due to science domain)

Hype and spin
- The requirement for hype and spin is induced by the larger market for research results (e.g. publishing papers, getting patents)
- Ideally, hype and spin should be discounted in internal evaluations, but without damaging the enthusiasm of the researchers who hype and spin their work
- A more successful, higher-reputation department can afford stricter hype and spin discounting – the negative subjective effects of discounting are compensated by the shared reputation dividend

Choice: normal vs. revolutionary science
- A choice about the extent of normal versus revolutionary science work should be made at all levels (individual, group, department, university)
- Early stage: the preference should be for normal science, to establish reputation
- Revolutionary science may be very exciting and challenging, but too much of it may lead to a drop in, or a lack of, reputation

Funding and impact expectation
- Funding and impact expectations depend on the normal / revolutionary science mix
- The normal-science part makes the expectations more predictable, and also limits them
- The revolutionary-science part makes the expectations less predictable, but may also imply very high longer-term expectations

Unit size – 1
- Revolutionary science implies high risk
- High risk is affordable only if it can be spread sufficiently widely – this has implications for research department size (see the toy simulation below)
- A small department may easily disappear by taking on too much risk
- Doing revolutionary science needs a sufficiently large department that can compensate for the risk by also doing a sufficient amount of low-risk normal-science work

Unit size – 2
- Phase transition: revolutionary science needs to be embedded in very large departments that also do a large amount of normal science; departments not doing revolutionary science need to be smaller, otherwise they have too much inefficiency (e.g. being unable to provide sufficient career perspectives for many members, too much overlap, too much hype-driven competition)
- Transition between the two phases is possible
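The size argument is essentially risk pooling, and a toy Monte Carlo makes it visible (the probabilities and payoffs below are invented purely for illustration; both project types are given the same expected payoff so that only the risk differs):

```python
import random
import statistics

def simulate_unit(n_projects, revolutionary_share, trials=10_000):
    # Toy model: a normal-science project pays 1 with probability 0.9;
    # a revolutionary one pays 30 with probability 0.03. The expected payoff
    # per project is 0.9 either way - only the variability differs.
    n_rev = round(n_projects * revolutionary_share)
    n_norm = n_projects - n_rev
    outcomes = []
    for _ in range(trials):
        total = sum(1 for _ in range(n_norm) if random.random() < 0.9)
        total += sum(30 for _ in range(n_rev) if random.random() < 0.03)
        outcomes.append(total / n_projects)
    mean = statistics.mean(outcomes)
    return mean, statistics.pstdev(outcomes) / mean  # mean, relative volatility

# Same research mix, very different stability: the small unit's yearly
# output swings wildly, the large unit's is comparatively steady.
print(simulate_unit(5, 0.4))    # mean ~0.9, relative volatility ~1.6
print(simulate_unit(100, 0.4))  # mean ~0.9, relative volatility ~0.36
```

In most years the small unit's revolutionary bets pay nothing and its output sits well below the long-run mean, with rare windfall years making up the difference; the large unit averages these swings away.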
Managing normal science
- Managing normal-science work is possible using metrics
- Since most well-established sciences consist mostly of normal science, this means that individuals / groups / departments / universities can use research-metrics-based management to a large extent

Managing revolutionary science – 1
- Revolutionary science is very difficult to measure, and can be measured reliably only by assessing its normal-science impact over the long or very long term
- Investment in revolutionary science is very risky, and most of it has no return
- Managing revolutionary-science work may be possible to some extent on the basis of the domain knowledge of established leaders of the domain – they may choose more junior colleagues deemed able and ready to get onto the track of revolutionary science – this is also a high-risk decision for these younger scientists (they may end up with very little, and may realise this only after a long time)

Managing revolutionary science – 2
- Revolutionary-science investment may be based on the long-term revolutionary-science track record of a department or university; it should be noted that any such track record is no guarantee of successful continuation – however, consecutive generations of revolutionary scientists may indicate that at least the previous generations knew how to select their followers

HR implications – normal science
- Mature science domains with sustained growth potential need many (possibly increasingly many) researchers who deliver results for normal-science progress
- The ability distribution of the pool of potential researchers does not change over short timescales; this implies that if more potential researchers find work as scientists, an increasing number of less able researchers work on normal-science topics, and the average ability of researchers working in the science domain is lowered
- The increasing standardisation of normal-science research compensates for the drop in average ability, making it possible for the less able researchers to achieve good results as well

HR implications – revolutionary science – 1
- As the body of normal science grows, more time is required for a researcher to show proficiency in the production of normal science, which is required for establishing a reputation
- The benefits of doing normal science may attract many who would in principle be able to do more than that in terms of revolutionary-science discoveries – as the science grows, the likelihood of finding such significant discoveries shrinks, increasing the chance that the ability to discover revolutionary science will be wasted in the end

HR implications – revolutionary science – 2
- Researchers working on revolutionary topics may become the funding and impact ballast of groups and departments, which may lead to pressure on them to switch to topics more rewarding in funding and impact terms
- Finding the right candidates for doing revolutionary work becomes increasingly difficult as delivering results in normal science becomes easier and the spin-noise of pretended 'revolutionariness' becomes louder

Summary – 1
- Most of what can be measured to compile research metrics belongs to normal science
- Revolutionary-science work disappears in the flood of normal science until around the point in time when it starts to turn into normal science
- Research metrics can be used to measure quality in the context of normal-science performance
- In the case of revolutionary-science performance, the metrics work only in the long term and in retrospect

Summary – 2
- Clarity, predictability, transparency, appropriate hype discounting, and competition are critical for the management of research
- Normal science is easier to practise, more researchers are available to do it, and its impact and expected funding are more predictable than in the case of revolutionary science
- Revolutionary science becomes increasingly risky as the body of normal science grows, but its potential impact also becomes increasingly large, while finding the right researchers to do it becomes increasingly difficult

Summary – 3
- Normal-science research is required in all sciences, and to an increasing extent as the science matures
- Having a sufficient amount of good-quality normal-science research provides predictability and safety (in funding, impact and reputation terms) for a research department

Summary – 4
- To cultivate revolutionary science, the management of the institution should make clear long-term decisions about getting into such a risky venture – size matters
- The expectations about impact, funding and personnel should be clear, and should be set in realistic time frames (depending on the science domain)
- The associated risk of non-delivery, the lack of ability to monitor progress in any close sense, and the autonomous funding requirements should be clear and acceptable to the management