Research: metrics, quality, and management

Peter Andras
School of Computing Science
Newcastle University
 Research metrics
 Science domains and evolution of science
 Quality and performance
 Science management
Introduction – 1
RAE – performance and funding
 REF – metrics based new performance
measurement system (2014)
 Scientometrics, bibliometrics
 Other countries: Netherlands, Australia,
Introduction – 2
What do we measure by metrics?
 What can be measured by metrics?
 How are metrics related to performance
and quality?
 How can higher education and research
management use science metrics?
Origins of research metrics
Derek de Solla Price – 1960s
 Measuring ‘big’ science – physics
 Measuring national science output –
papers, citations, patents
 Bibliometrics and scientometrics –
1970s – today
 Early: counts
 Recent: networks, distributions and
conditional counts
What is measured – 1
Publication counts within a time window
– at least a year, and usually within a
range of publication venues (conferences,
journals, books)
 Citation counts of counted publications –
citation counts per year may grow for a
while and then shrink to zero over
time
 Links – co-authorship, co-citation, citing
– Google PageRank
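The counts above can be sketched in a few lines. This is a minimal illustration, assuming each publication is represented only by its publication year and each citation only by the year in which it was made; the function names are hypothetical and not taken from any bibliometric tool.

```python
from collections import Counter


def publications_in_window(pub_years, start, end):
    """Count publications whose year falls in the inclusive window [start, end]."""
    return sum(1 for year in pub_years if start <= year <= end)


def citations_per_year(citation_years):
    """Return the yearly citation counts of one publication, ordered by year."""
    counts = Counter(citation_years)
    return dict(sorted(counts.items()))
```

Plotting the result of `citations_per_year` for a single paper would show the rise and later decline of yearly citations described above.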
What is measured – 2
 h-index – n publications, each with at least n
citations – max n
 g-index – m publications, with m citations on
average – max m
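Both indices can be computed from a plain list of per-paper citation counts. The sketch below assumes the usual definitions: the h-index is the largest n such that n papers have at least n citations each, and the g-index is the largest m such that the m most cited papers have at least m citations on average (that is, at least m² in total). Here the g-index is capped at the number of papers, which is one common convention and an assumption of this sketch.

```python
def h_index(citations):
    """Largest n such that n papers have at least n citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest m such that the top m papers have at least m*m citations in total."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g
```

For a citation list of 10, 8, 5, 4 and 3, this gives an h-index of 4 and a g-index of 5, which illustrates that the g-index gives more weight to the most highly cited papers.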
Journal metrics:
 Impact factor – average (expected) citation per
paper for a time window (2 years)
 Immediacy index – the average number of citations that papers in a
journal receive within the year of their publication
 Citation half-life – the median age of papers that
are cited currently
 Journal h-index
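The journal metrics above reduce to simple ratios and a median. The sketch below assumes the commonly used definitions: impact factor as citations in a year to papers from the preceding two years divided by the number of those papers, immediacy index as same-year citations per paper, and citation half-life as the plain median age of the papers cited in the current year. Commercial citation databases may compute the half-life differently; all function and parameter names here are hypothetical.

```python
from statistics import median


def impact_factor(citations_to_window, papers_in_window):
    """Average citations per paper for papers published in the time window."""
    if papers_in_window <= 0:
        raise ValueError("the window must contain at least one paper")
    return citations_to_window / papers_in_window


def immediacy_index(same_year_citations, papers_this_year):
    """Average citations received within the year of publication."""
    if papers_this_year <= 0:
        raise ValueError("the year must contain at least one paper")
    return same_year_citations / papers_this_year


def citation_half_life(current_year, cited_publication_years):
    """Median age of the papers cited in the current year (one entry per citation)."""
    ages = [current_year - year for year in cited_publication_years]
    return median(ages)
```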
Metrics in academia:
 Individual
 Group
 Departmental
 University level
Larger integration implies more stability
and less variability over (shorter) time in
the metrics
Domain comparability
Different science domains have different
practices of doing research, and
publishing and discussing research
 E.g. computer science – software
engineering and software applications –
use predominantly conferences to
present new results
 E.g. biology – uses predominantly
journals to present new results
Considering most sciences the underlying
generic pattern of publication and citation
metrics is very similar – e.g. all follow
power law distributions
 Normalised metrics:
 Leiden research metrics centre – raw metrics
are divided by domain specific parameters to
make them comparable across subjects
 Domain specific parameters are calculated by
analysing domain specific journals and papers
(from high impact multi-domain journals) –
journal classification
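The normalisation idea can be illustrated as follows. This is a minimal sketch, assuming the domain specific parameter is an expected (mean) citation count per paper for each domain; the exact parameters used in practice are not given here, and the domain values in the usage note are invented purely for illustration.

```python
def normalised_score(raw_value, domain_parameter):
    """Divide a raw metric by its domain specific parameter."""
    if domain_parameter <= 0:
        raise ValueError("domain parameter must be positive")
    return raw_value / domain_parameter


def mean_normalised_citations(papers, domain_parameters):
    """Average normalised citation score over (domain, citations) pairs."""
    if not papers:
        return 0.0
    scores = [
        normalised_score(citations, domain_parameters[domain])
        for domain, citations in papers
    ]
    return sum(scores) / len(scores)
```

With hypothetical parameters of 10.0 expected citations for biology and 2.5 for computer science, a biology paper with 20 citations and a computer science paper with 5 citations both score 2.0, so they become directly comparable.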
Domain nature
Domain specific public opinion
 How are new results presented
 How are they discussed
 How are they referenced
 What are the primary and preferred venues
 How is reputation established and maintained –
for individuals, groups and institutes
 What are the socio-dynamics of the domain
 What are the ways of ‘experimental’ validation
Distortion factors
Review papers – very highly cited
Methods papers – most highly cited
Self-citations – important at low counts,
ignorable above medium counts (domain
specific counts)
 Networking – citation cliques
 Spinning – relatively many highly cited
papers contain high claims that are refuted
in relatively short time
 ‘Authority’ – unjustified benefits from
already established authority status
Variability and noise
The metrics data may be highly variable
– especially at individual or small group
level, and also if too short time periods
are considered
 Structural noise caused by the
mechanisms of science – e.g. time to
citations peak (domain specific)
Common research metrics
Publication and citation counts considering
a time window
Individual and group level h-index / g-index
Patent counts and patent revenue value
Research funding value
Number of PhD students satisfying some
quality minimum (e.g. submission of thesis
on time, publication of journal paper)
Science esteem evaluation – measure of
long term reputation management and
networking ability
Metrics and quality
How are metrics related to research
quality of an individual / group /
department / university ?
 Are metrics purely retrospective or do
they have predictive value as well with
respect to research quality ?
Evolution of science
Karl Popper – continual additions to
science, all additions being falsifiable and
valid until falsified, revisions caused by
falsification of previously accepted
scientific knowledge
 Thomas Kuhn – normal science and
revolutionary science, normal science
grows by many little additions filling gaps in
the current scientific knowledge,
revolutionary science reorganises science
fundamentally and grows rapidly until
matures into normal science
Progress in normal science
Use existing knowledge to make highly
likely predictions
 Check the validity of such predictions by
appropriately designed ‘experimental’ validation
 E.g. demonstrating a statement or a new
minor theorem in a well-established
branch of mathematics
Progress in revolutionary science
Use existing knowledge to discover
fundamental inconsistencies or relationships
about previously unquestioned or believed-to-be-unrelated pieces of scientific knowledge
See what cannot be seen by most others and
re-shape the related science according to the
new discovery
E.g. non-Euclidean geometry (Bolyai,
Lobachevsky), alternative continuum
hypothesis (Godel, Cohen), axiom of choice
alternatives (Zermelo, Godel, Cohen)
Predictability of success – 1
Normal science:
 New hypotheses are about highly predictable
 The predictability of success is lower if the
hypothesis is in an area where experimental
validation of earlier hypotheses is shaky - these are
at the boundary of the normal science, and avoided
by success-hungry researchers
 Most research is successful in filling a gap and
adding new bits of knowledge to the science
 Most research is published and is referenced by
further publications – being more mainstream in the
current science implies more expected citations
Predictability of success – 2
Revolutionary science
 Most hypotheses turn out to be wrong
 Successful revolutionary science is built on a large
volume of unsuccessful tries and takes a very long
time (see the time between Euclid and non-Euclidean geometry, or the time between axiomatic
set theory and alternatives of the choice axiom)
 May take a long time to be noticed, due to established
authorities and authoritative views of the science
 High risk of achieving no success and being largely
ignored for a good while even if success is achieved
 Many times it does not lead to publications, citations
and reputation enhancement
Extent of impact
The expected impact is inversely correlated
with the predictability of success
Normal science has impact – however
spinning may create the impression of higher
impact than the actual one; usually the impact is
small, since it is about filling a usually small
gap in the science
Revolutionary science has major impact, if it is
successful, in this case it is likely to change
(rewrite) a large part of the existing science or
initiate a fundamentally new branch of science
with rapid initial growth potential
Domain maturity
Maturity is indicated by the proportion of normal
science progress within the science
Mature science: most progress is made by small
additions following normal science research, there
are rare and relatively marginal revolutionary events
– e.g. physics
Early stage science: most progress follows the
pattern of revolutionary science discovery, normal
science progress happens for relatively short
periods, and normal science components are
regularly undermined by new revolutionary science,
discovering the same several times is not
uncommon, spinning may play an important role in
establishing reputation – e.g. management science
Domain specific progress & impact
Progress and impact depend on the
maturity of the science
 More mature science implies more work
following the pattern of normal science
 More mature science also implies
relatively little impact for most new
Domain public opinion – 1
Science maturity determines the nature of the
public opinion of the science
Mature science is based on a stable and
unchallenged body of scientific knowledge,
which allows relatively few major novelties,
implies well established and standardised
experimental validations, and implies relatively
low impact for most discoveries
Implication: archival style publication in
journals, use of highly standardised language,
large integrated scientific societies, the use of
spin to create the impression of revolutionary-ness and to achieve visibility
Domain public opinion – 2
Early stage or less mature science allows regular
‘revolutionary’ novelties, doesn’t have an
unquestionable body of established knowledge,
allows experimentation with validation methods,
allows repeated re-discoveries, and implies relatively
high impact for the truly successful revolutionary
This environment may facilitate more informal
publication of results; it also implies fewer
citations for most results, which turn out to be less
revolutionary than expected; the high impact may
find its way via alternative routes – i.e. avoiding
formal citations; spinning may play an important role
Measurability of normal science
Normal science produces large volume output in
well-established venues (journals, conference
Normal science papers are regularly cited by other
such papers – papers in the currently most
fashionable mainstream areas can get many citations
Judging the quality of usual normal science
publications is relatively straightforward, spinning
has usually short-term effects
Research grants are frequently available for normal
science projects with high predictability objectives
Measurability of revolutionary science
Revolutionary science output is rare, most ‘revolutionary’-intended
publications are rejected or end up in obscure journals or
conferences, scientists working on revolutionary topics may refrain
from publication of half-baked ideas
Revolutionary science papers may stay as a ‘sleeping beauty’ for
many years, they may get few citations for a long time, and may not
get many citations ever (e.g. if other follower papers collect most of
the citations)
It is hard to judge the quality of revolutionary science work – it is out
of the boundaries or at the boundaries of normal science, there are
no standardised methods or well established referential knowledge
base, the impact is highly unpredictable
A very large part of science intended to be revolutionary leads to
dead-ends, although the rare truly revolutionary works have
extremely high impact on their science
Few research grants are available for revolutionary science, due to
its unpredictable success, and out-of-ordinary methods and
objectives, getting such grants many times needs excessive
Metrics measure normal science
Research metrics measure almost exclusively normal science
Research funding
PhD students
Measuring revolutionary science
It is very difficult to measure revolutionary science
It becomes obvious that a work is part of revolutionary
science usually well after the work is published – 5 – 25 or
more years later – depending on the science domain
Revolutionary science work is confirmed to be truly
revolutionary at or close to the point when the new
science knowledge based on this work becomes normal science
It is very difficult to measure the expected impact of
scientific work meant to be revolutionary
Revolutionary science is not measured by usual research
metrics (or at most it is measured to a very small extent before
it becomes normal science)
Quality and metrics – 1
Quality in case of normal science can be evaluated and
predicted reliably – the methods are standardised and
relatively easy to check that they are followed correctly,
the expected results are highly predictable, their impact is
also well predictable
 Research metrics are well correlated with research quality
in case of normal science
Quality and metrics – 2
It is difficult to compose valid short & medium term metrics for
revolutionary science – long term metrics work in the sense of
evaluating whether it led to emergence of new normal science
 Quality is hard to assess because results and impact are not
easily predictable – an indicator can be the extent of the gap
that would be bridged across by the new intended discovery,
but spinning may blur the reliability of the estimate of the extent
of the gap
 Quality may be judged by others who are active in the same
area and have similarly wide range of knowledge, but even
these judgements can be easily wrong
 There is not much relation between any common research
metric and the quality of revolutionary research and its output
Output and funding
In short-to-medium term (3-5 years):
 Funding by QR and research council grants (in the UK)
follows the performance in output generation (primarily
citation counts) – significant correlation
 Output generation (publication and citation counts) depends
on earlier output performance (primarily citation counts) and
much less on change in the level of research funding
Research funding performance reflects earlier output
performance (primarily normal science)
 Extra research funding does not necessarily imply extra
output above the level predicted based on earlier output
Metrics for management
How can we use research metrics in the
context of management ?
 What kind of research metrics can be used ?
 What can be achieved by using metrics ?
Management objectives
Management objectives should be set in
clear terms, should be achievable, and
the progress towards achievement
should be measurable
 Metrics can help in formulating such
research management objectives
 Objectives should be set after
sufficiently clear measurement and
evaluation of the current status
Measuring progress and
achievement – 1
Research metrics are key to set objectives with
measurable progress in normal science
Normal science based long-term retrospective
measurement of revolutionary science work is the
main way of measuring progress and achievement
in revolutionary science
Measuring progress and
achievement – 2
Measurement should consider a large part
of the spectrum of possible outcomes (in
quality terms) at sufficient resolution (e.g.
papers published in a wide range of
venues, grant applications with all levels of
 The context and methods of measurement
should be established clearly and
transparently (e.g. set of considered
journals, conferences, time windows,
exclusion rules)
Metrics as a management tool – 1
Individual metrics:
 Publication and citation counts and distributions (yearly counting) for 3 /
5 / 7 years (depending on status) – aim increasing distribution graphs
and preferably exponentially increasing citation graph
Discounted publication and citation metrics – elimination of reviews,
editorials, database reports, excessively-many-author publications, etc. –
aim similar as above
List of highly cited excluded publications for 3 / 5 / 7 years – aim
depends on individual
h-index and/or g-index – with and without excluded publications –
change in the index over the last 3 / 5 / 7 years – aim steady increase
Research grants – count and yearly distribution of applications,
distribution of evaluations, amount and yearly distribution of awarded
grants for the last 3 / 5 / 7 years – aim steady level or increase
For top-flyers – comparison with metrics results of a set of agreed top-flyers elsewhere – this applies to metrics that can be calculated using
common and accessible data sources
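The yearly and discounted counts listed above can be sketched as follows. This is an illustration under stated assumptions: each publication is a dictionary with hypothetical keys `year`, `type`, `n_authors` and `citations`; citations are attributed to the publication year of the cited paper; and the exclusion rules (type labels and author threshold) are invented placeholders that would need to be agreed locally.

```python
from collections import Counter

# Assumed exclusion rules: both the type labels and the author threshold
# are hypothetical and would be set by local agreement.
EXCLUDED_TYPES = {"review", "editorial", "database report"}
MAX_AUTHORS = 20


def is_counted(pub):
    """True if the publication survives the discounting rules."""
    return pub["type"] not in EXCLUDED_TYPES and pub["n_authors"] <= MAX_AUTHORS


def yearly_profile(pubs, end_year, window):
    """Map each year in the window to (publication count, citations to them)."""
    start_year = end_year - window + 1
    n_pubs = Counter()
    n_cites = Counter()
    for pub in pubs:
        if start_year <= pub["year"] <= end_year:
            n_pubs[pub["year"]] += 1
            n_cites[pub["year"]] += pub["citations"]
    return {y: (n_pubs[y], n_cites[y]) for y in range(start_year, end_year + 1)}


def discounted_profile(pubs, end_year, window):
    """Yearly profile after removing excluded publications."""
    return yearly_profile([p for p in pubs if is_counted(p)], end_year, window)


def excluded_highly_cited(pubs, min_citations):
    """Excluded publications that are nevertheless highly cited."""
    return [p for p in pubs if not is_counted(p) and p["citations"] >= min_citations]
```

Calling `yearly_profile` with a window of 3, 5 or 7 gives the raw distribution for the corresponding career stage, while `discounted_profile` gives the same distribution without reviews, editorials and similar items.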
Metrics as a management tool – 2
Group & department metrics
 Same set as metrics for individuals, but counted for the group or department
Distributions of individual metrics descriptors (e.g. publication
and citation count) over the group or department members – aim
balanced distribution with a set of high-flyers
Change in the above distributions over time – aim steady or
improved balance, with a steady or increasing count of high-flyers
Cohesion metrics: member co-authorships of counted papers
and grant applications and proportion of these within the whole
measured sets – aim sufficient but not excessive co-working
Comparison with metrics results for a set of departments
considered close competitors – this applies to metrics that can
be calculated using common and accessible data sources
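The distribution and cohesion metrics for a group can be sketched as below. This is a minimal illustration, assuming each counted paper is represented by its list of author names and that a paper counts as co-worked when at least two group members appear among its authors; that threshold and the function names are assumptions of this sketch.

```python
def member_publication_counts(papers, members):
    """Number of counted papers per group member (the group's distribution)."""
    counts = {member: 0 for member in members}
    for authors in papers:
        for author in set(authors):
            if author in counts:
                counts[author] += 1
    return counts


def cohesion(papers, members):
    """Proportion of counted papers co-authored by at least two group members."""
    if not papers:
        return 0.0
    member_set = set(members)
    joint = sum(1 for authors in papers if len(member_set & set(authors)) >= 2)
    return joint / len(papers)
```

A cohesion value near 0 suggests members work in isolation, while a value near 1 suggests the group publishes almost only together; the aim stated above is a value between these extremes. The same functions could be applied to grant applications by passing lists of applicants instead of authors.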
Metrics as a management tool – 3
University metrics
 Same metrics as for groups & departments
 Group distribution and cohesion metrics
should be calculated by considering
departments instead of individuals
 Comparison with metrics results for a set of
universities considered close competitors –
this applies to metrics that can be calculated
using common and accessible data sources
Setting targets
Targets are short term objectives – time
limits/windows should be set very clearly
 Should be clearly defined, based on
evaluation of current standing,
achievable, and clearly measurable
 While targets apply to single years, they
should be considered in the context of
multi-year objectives
Domain specific impact
The impact of the work is domain specific
 This should be considered when setting
citation and related (e.g. h-index) targets
 Individual
 Group & department
 University
Clarity and predictability
It is critical to use metrics in
management with clear setting of their
context, methods, and evaluations
 It is also critical to have predictable
career-related outcomes associated with
metrics-based management evaluation
 It should be clear what gains/losses or
rewards/punishments are associated with
delivering the target/objective and the non-delivery, sub-delivery or over-delivery of it
Work time allocations should consider
explicitly the time related to work for the
achievement of research performance
 Metrics, evaluations, related
gains/losses, should be generally
accepted as valid
Competition and transparency
Competition in achievement should be
openly supported
 The performance evaluation process
should be open and transparent
 Individuals / groups / departments
should be able to compare themselves
with others on the basis of openly
available timely data (considering
normalisation due to science domain)
Hype and spin
The requirement for hype and spin is induced
by the larger market of research results (e.g.
publishing papers, getting patents)
Ideally hype and spin should be discounted in
internal evaluations, but without damaging the
enthusiasm of researchers who hype and spin
their work
More successful and higher reputation
departments can afford stricter hype and spin
discounting – the negative subjective effects of
discounting are compensated by the shared
reputation dividend
Choice: normal vs. revolutionary
A choice between the extent of the normal
and revolutionary science work should be
made at all levels (individual, group,
department, university)
 Early stage: preference should be for
normal science to establish reputation
 Revolutionary science may be very exciting
and challenging, but too much of it may
lead to drop in or lack of reputation
Funding and impact expectation
Funding and impact expectation
depends on the normal / revolutionary
science mix
 The normal science part makes the
expectations more predictable, and also
limits these expectations
 The revolutionary science part makes
the expectations less predictable, but
may also imply very high longer term
Unit size – 1
Revolutionary science implies high risk
 High risk is affordable only if it can be
spread sufficiently widely → research
department size implication
 Small department may easily disappear
by taking too much risk
 Doing revolutionary science needs
sufficiently large size department that
can compensate for the risk by doing
sufficient amount of low risk normal
science work as well
Unit size – 2
Phase transition: revolutionary science needs
to be embedded in very large departments
doing large amount of normal science as well /
departments not doing revolutionary science
need to be smaller, otherwise they have too
much inefficiency (e.g. unable to provide
sufficient career perspective for many
members, too much overlap, too much hype-driven competition)
Transition between the two phases is possible
Managing normal science
Managing normal science work is
possible using metrics
 Since most of the well-established
sciences contain mostly normal science
this means that individuals / groups /
departments / universities can use
research metrics-based management to
a large extent
Managing revolutionary science – 1
Revolutionary science is very difficult to measure
and can be measured reliably only by assessing its
normal science impact over long or very long term
 Investment in revolutionary science is very risky, and
most of it has no return
 Managing revolutionary science work may be possible
to some extent on the basis of domain knowledge of
established leaders of the domain – they may choose
more junior colleagues deemed able and ready to get
on the tracks of revolutionary science – this is also a
high risk decision for these younger scientists (they
may end up with very little and they may realise this
only after long time)
Managing revolutionary science – 2
Revolutionary science investment may be based
on long-term revolutionary science track record of
a department or university, but should be noted
that any such track record is not a guarantee of
successful continuation – however consecutive
generations of revolutionary scientists may
indicate that at least the previous generations of
them knew how to select their followers
HR implications – normal science
Mature science domains with sustained growth potential
need many (possibly increasingly many) researchers who
deliver results for normal science progress
The ability distribution of the pool of potential researchers
does not change in short time, implying that if more potential
researchers find work as scientists, increasing number of
less able researchers work on normal science topics, and
the average ability of researchers working in the science
domain is lowered
The increasing standardisation of normal science research
compensates for the average ability drop, making it possible
for the less able researchers to achieve good results as well
HR implications –
revolutionary science – 1
As the body of normal science grows, a longer time
is required for a researcher to show proficiency in
the production of normal science, which is required for
establishing a reputation
The benefits of doing normal science may attract
many who in principle would be able to do more
than that in terms of revolutionary science
discoveries – as the science grows, the likelihood of
finding such significant discoveries reduces,
increasing the chance that the ability to discover
revolutionary science will be wasted in the end
HR implications –
revolutionary science – 2
Researchers working on revolutionary topics may
become the funding and impact ballast of groups
and departments, which may lead to pressure on
them to switch to topics more rewarding in
funding and impact terms
Finding the right candidates for doing revolutionary
work becomes increasingly difficult as delivering
results in normal science becomes easier and the
spin-noise of pretended ‘revolutionary’-ness
becomes louder
Summary – 1
Most of what can be measured to compile
research metrics belongs to normal science
Revolutionary science work disappears in the
flood of normal science until around the time point,
when it starts to turn into normal science
Research metrics can be used to measure quality
in the context of normal science performance
In the case of revolutionary science performance
the metrics work only in long term and retrospect
Summary – 2
Clarity, predictability, transparency, appropriate
hype discounting, and competition are critical for
the management of research
Normal science is easier to practice, more
researchers are available to do it, its impact and
expected funding are more predictable than in the
case of revolutionary science
Revolutionary science is increasingly risky as the
body of normal science grows, but its potential
impact is also increasingly larger, while finding the
right researchers to do it is increasingly difficult
Summary – 3
Normal science research is required in all
sciences, and to an increasing extent as the
science matures
Having sufficient amount of good quality normal
science research provides predictability and safety
(in funding, impact and reputation terms) for a
research department
Summary – 4
To cultivate revolutionary science, the
management of the institution should make clear
long term decisions about getting into such risky
venture – size matters
The expectations about impact, funding and
personnel should be clear and should be set in
realistic time frames (depending on the science domain)
The associated risk of non-delivery, the lack of
ability to monitor progress in any close sense, and
the autonomous funding requirements should be
clear and acceptable for the management