
The State of Open Data 2023

A Digital Science Report
November 2023
The longest-running longitudinal survey and analysis on open data.
With opening remarks from Springer Nature’s CPO, Harsh Jegadeesan, and Digital Science’s CEO, Daniel Hook.
Authors Mark Hahnel, Graham Smith, Niki Scaplehorn, Henning Schoenenberger and Laura Day.
DOI: https://doi.org/10.6084/m9.figshare.24428194
Data available at: https://doi.org/10.6084/m9.figshare.24517123
Contents
About the survey
Key takeaways from the State of Open Data 2023
Key insights: Analysis to action. The need for a nuanced approach to the momentum of open research data
Key insights: Beyond policy, meeting researchers’ practical needs
From generalized trends to a demographic-tailored approach
Researchers at all stages of their careers share the same optimism and concerns around open data
The importance of data management plans (DMPs)
AI and open science - the start of a beautiful relationship?
Recommendations for the academic community
Authors and acknowledgements
01
Opening remarks
“Over the past eight years, open data and open research have
undergone a rapid transition – from being academic concepts and the
domain of the few, to becoming more widely accepted and, in some
countries, mandatory for researchers and institutions.
At the same time, The State of Open Data Report has become a unique,
long-term resource chronicling the establishment of open data,
attitudes towards it, and researchers’ experiences of data sharing.
At Digital Science, we’re proud of the role we’ve played in capturing
researchers’ sentiments toward and experiences of open data during a
time when both practice and community have found their footing. We
hope that the act of reporting these early steps back to the wider research
community has helped to contextualize and normalize good practice
in why and how research data should be shared. Furthermore, these
reports themselves, together with the data that underpin them, now form
a valuable resource to the community to understand the longitudinal
development of open data practices and sentiments in our community.”
Dr Daniel Hook,
CEO
Digital Science
“Researchers are at the heart of what we do. Understanding their
motivations for engaging in open research practices is critical if we, as
a community, are to develop sustainable routes towards open science.
But why is open science important? Because ensuring easy and open
access to all parts of research supports accessibility, usability and
re-usability – and this is key in helping to ensure research can be built
upon and gets into the hands of those that can effect change to tackle
the world’s most challenging issues.
This is central to our mission at Springer Nature, and The State
of Open Data report, the only industry report offering an annual
snapshot of open research trends, plays a central role in this. The data
it provides enables publishers, funders and institutions to gain clear
insights from researchers, helping us understand the roles we need to
play in driving accessible research. This year, we have expanded that
further with the first publication of a partner report by the Computer
Network Information Center of the Chinese Academy of Sciences,
looking at open data in China.
Together with our partners Figshare and Digital Science, we are delighted
to present this report, and continue to work collaboratively to better
understand and drive the solutions needed to support data sharing.”
Harsh Jegadeesan,
Chief Publishing Officer
Springer Nature
02
About the survey
Background
The State of Open Data survey continues to provide a detailed and sustained insight into the motivations, challenges, perceptions and behaviours of researchers towards open data. The survey is a collaboration between Figshare, Digital Science and Springer Nature.
The State of Open Data survey contains approximately 55 questions and is intended to take 20 minutes for respondents to complete. There are also 3 open questions, all of which are optional. The survey is translated into French, German, Japanese and Simplified Chinese. An incentive of five $100 gift cards was offered through a prize draw, with ten stuffed toys for winners of a prize draw in China.
There were 6,091 usable responses from the State of Open Data survey this year. The survey received a total of 7,042 responses, some of which were screened out during the data-cleaning process. Please note that some of the questions in the survey were not mandatory or relevant to all respondents.
The largest proportion of responses were completed in English, representing 77% of the sample. This was significantly higher than the languages of other respondents, with the second largest being Chinese at 10%, followed by Japanese and German, which each made up 5% of responses. The smallest proportion of responses were completed in French, at 4%.
Professional experience
All respondents were questioned to ensure they were either currently participating or had recently participated in developing research results. Respondents who had not published within the last 5 years were screened out of the survey; in total, 332 responses were screened out at this stage.
79% had published or submitted a manuscript within the last year, while 15% did so within the last 1-2 years, and 6% within 3-5 years. This is slightly lower than previous responses – 82% of respondents in 2022 stated they had published or submitted a manuscript within the last year, and 25% within the last 5 years.
When asked in what year their first peer-reviewed research article was published, 24% of respondents reported this as being before 2000. This is also a decrease compared to 2022, where a third of respondents reported publishing their first peer-reviewed article before 2000.
When asked about their job title, 38% selected senior research roles (professors, associate professors, research directors or lab directors), while another 22% selected early career roles (PhD/master’s students, postdoctoral candidates, research assistants or undergraduate students). In comparison, in 2022, 31% of respondents classed themselves as senior researchers, and 31% as early career researchers.
Field of interest
The survey received responses from a broad range of
disciplines. The largest proportion of respondents were from
medicine (23%) which is consistent with previous responses
– in 2022, 23% were also from this field. This was followed
by biology (16%) and engineering (10%). The proportion
of responses from social sciences (9%) and earth and
environmental sciences (8%) were also consistent with 2022,
where 9% and 7% of respondents reported working in these
fields, respectively.
Respondent institutional information
The majority of respondents said they worked or studied in
a university (64%), up from 54% in 2022. This is followed by
research institutions (13%), which was similar in 2022, at 14%.
A further 7% worked in a hospital or other healthcare setting
and 5% worked at a medical school.
Regional comparison
The country with the largest individual response size was
India, representing 12% of responses. This was followed by
China at 11% and the US at 9%. This differs slightly from 2022,
when the highest response rates were from China and the US,
which each accounted for 11% of responses. Other nations
with response rates of over 100 were Germany (6%), Japan
(5%), Italy (4%), Ethiopia and the UK (each at 3%), as well as
Turkey, Brazil, Spain, Canada, Pakistan, Egypt, France and
Nigeria (each making up 2% of responses).
03
Key takeaways from the State of Open Data 2023
Support is not making its way to those who need it
Almost three-quarters of respondents had never received support with making their data openly available.
One size does not fit all
Variations in responses from different subject expertise and geographies highlight a need for a more nuanced approach to research data management support globally.
Challenging stereotypes
Are later career academics really opposed to progress? The results of the 2023 survey indicate that career stage is not a significant factor in open data awareness or support levels.
Credit is an ongoing issue
For eight years running, our survey has revealed a recurring concern among researchers: the perception that they don’t receive sufficient recognition for openly sharing their data.
AI awareness hasn’t translated to action
For the first time, this year we asked survey respondents to indicate if they were using ChatGPT or similar AI tools for data collection, processing and metadata creation.
Support is not making its way to those who need it
Almost three-quarters of respondents have never received support with planning, managing or sharing research data.
With the global increase in policies and mandates to share data openly, who researchers are approaching for support becomes a pertinent question.
Respondents who stated that they were aware of the concept of a data management plan were then asked if they have access to support from specialist data managers. Over 50% of those asked stated that they do have access to specialist research data managers in their research setting, but who else has been providing support?
Almost three quarters of respondents had never
received support with planning, managing or sharing
research data. When respondents were asked if they
had ever received support with managing or making
their data openly available, only 23% said they had. Of
that 23%, 61% received support from informal internal
sources such as colleagues or supervisors. Two other
sources of support ranked highly with our respondents;
their institutional libraries (31%) and their research
office/in-house institutional expertise (26%).
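As a rough worked example of how these nested percentages combine, the sketch below converts the "of that 23%" figures into approximate shares of all respondents. It assumes the reported percentages can be multiplied as exact values (they are rounded in the report) and that respondents could name more than one source of support, which the source shares summing to more than 100% suggests. The variable names are illustrative only and are not part of the survey data.

```python
# Share of all respondents who had ever received support (from the text).
received_support = 0.23

# Among those who received support, the share naming each source (from the text).
# The shares sum to more than 100%, so multiple selections are assumed.
sources = {
    "colleagues or supervisors": 0.61,
    "institutional library": 0.31,
    "research office / in-house expertise": 0.26,
}

# Multiply the nested shares to express each source as a share of all respondents.
share_of_all = {name: received_support * share for name, share in sources.items()}

for name, share in share_of_all.items():
    print(f"{name}: about {share:.0%} of all respondents")
```

Under these assumptions, roughly 14% of all respondents received support from colleagues or supervisors, about 7% from their institutional library and about 6% from a research office or in-house expertise.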
[Figure: Do you have access to support from specialist data managers? Number of respondents selecting each answer: ‘No’; ‘Yes, based in my department’; ‘Yes, based in my lab’; ‘Yes, based in my library’; ‘Don’t know’; ‘Other, please give details’.]
Graph showing the responses to the question ‘Do you have access to support from specialist data managers?’ This question was only asked if respondents stated that they were aware of the concept of a data management plan. This graph shows the number of respondents for each answer.
One size does not fit all
We need a nuanced approach towards research data management globally
The variation in responses from different geographies in The State of Open Data 2023 clearly indicates that there are vast differences in the current approaches and attitudes towards data sharing around the world. When looking at key awareness-focused questions such as whether respondents are aware of the concept of a data management plan, we see considerable variation across different regions.
[Figure: Awareness of DMP by country (top 10 countries). Percentage of respondents answering ‘Yes’, ‘No’ or ‘I don’t know’, by country of respondents: Brazil, China (Mainland), Ethiopia, Germany, India, Italy, Japan, Turkey, United Kingdom and United States.]
Awareness levels of the concept of a data management plan broken down by countries. This graphic highlights the awareness levels from the ten countries from which we saw most respondents.
[Figure: Awareness of the concept of a data management plan by primary area of expertise. Percentage of respondents answering ‘Yes’, ‘No’ or ‘I don’t know’, by primary area of expertise of respondents.]
Awareness of the concept of a data management plan broken down by primary area of expertise and ordered by percentage of those that said ‘yes’.
There is also significant variation when we break the respondents down by their primary discipline. Differing levels of data management plan awareness are evident from different expertise areas, as well as different motivations for data sharing.
[Figure: Motivations for sharing data openly by primary area of expertise. Percentage of respondents selecting each motivation, by primary area of expertise of respondents.]
Percentage distribution of responses to what the primary circumstances were that would motivate them to share their data, broken down by primary area of expertise.
These varying responses highlight that a nuanced approach to research data management is needed to proficiently encourage and support data sharing worldwide.
Challenging stereotypes: are later career academics really opposed to progress?
Career length is not a significant factor in open data awareness or support levels
From the responses to the 2023 survey, it appears that the length of academics’ careers is not a significant factor in awareness of the concept of a data management plan, or in levels of support for making data openly available as common practice.
[Figure: Awareness of data management plans by first year of publication. Percentage of respondents answering ‘Yes’, ‘No’ or ‘I don’t know’, by year of first paper publication.]
Awareness of the concept of a data management plan broken down by the first year that the respondent published a peer-reviewed article.
This indicates that the idea that early career researchers are driving data sharing forward and that more established, later career academics are opposed to progress in the space is a misconception. Researchers spanning all career lengths are meeting the same challenges and share the same motivations for data sharing.
[Figure: Percentage of ‘strongly agree’ responses for selected questions by job title. Practices shown: research articles open access, research data openly available, peer review open, and publishing a pre-print.]
Percentage of ‘strongly agree’ responses to the question of whether four selected core open practices should be ‘common scholarly practice’, broken down by job title.
[Figure: Motivations for data sharing by publication year. Percentage of respondents selecting each motivation, by year of first paper publication (1980-2023).]
Distribution of responses to the question of what circumstances would motivate the respondent most to openly share their data, broken down by the first year that the respondent published a peer-reviewed article.
Credit is an ongoing issue
Researchers still do not feel they receive appropriate credit for openly
sharing their data
For eight years running, The State of Open Data survey has revealed a recurring concern
among researchers: the perception that they don’t receive sufficient recognition for openly
sharing their data. The 2023 survey responses continue this trend, highlighting an ongoing issue
within the research community that needs to be addressed in the future.
[Figure: Do you think researchers currently get sufficient credit for sharing data? Percentage of respondents answering ‘Yes’ or ‘No’, by survey year (2019-2023).]
Longitudinal survey data from 2019-2023 for the question ‘Do you think researchers currently get sufficient credit for sharing data?’
In terms of what would motivate researchers to share their data, the responses remain very similar to previous years, with full citation of research papers or a data citation ranking highly. However, ‘public benefit’ was the second most popular selection for respondents when asked ‘which circumstances would motivate you most to share your data?’
Which of these circumstances would motivate you most to share your data?
Citation of my research papers: 20%
Public benefit: 12%
Increased impact and visibility of my research: 12%
Full data citation: 11%
Co-authorship on papers: 10%
Financial reward: 6%
Journal/Publisher requirement: 5%
Greater transparency and reuse: 5%
Funder requirement: 4%
Consideration in job reviews and funding applications: 3%
Direct request from another researcher: 3%
It was made simple and easy to do: 3%
Institution/Organisation requirement: 2%
None: 1%
Freedom of information request: 1%
It was a field/industry expectation: 1%
Open data badge: 1%
Other (please specify): 0%
Graph showing the percentage of respondents that would be motivated by certain circumstances to share their data openly.
AI awareness hasn’t translated to action
The space is nascent but opportunities abound
For the first time, this year we asked survey respondents to indicate if they were using ChatGPT or similar AI tools for data collection, processing and metadata creation. The most common response to all three questions was ‘I’m aware of these tools but haven’t considered it.’
[Figure: three charts showing the percentage of respondents selecting each answer to ‘Are you using ChatGPT or similar AI tools for data collection?’, ‘Are you using ChatGPT or similar AI tools for data processing?’ and ‘Are you using ChatGPT or similar AI tools for metadata creation?’ Answer options: ‘I’m aware of these tools but haven’t considered it’; ‘Not aware / don’t know’; ‘No but I’ve considered it’; ‘Yes, I’ve started using it’; ‘Yes, using it regularly’.]
We have now benchmarked researchers’ use of ChatGPT and similar AI tools with regard to research data and its management, and we’re looking forward to seeing how these responses develop in years to come, in light of the fast-moving space of AI tools and their applications.
04
Key insights: Analysis to action. The need for a nuanced approach to the momentum of open research data
Mark Hahnel, Founder, CEO - Figshare, Digital Science
The journey through the eight years of The State of Open Data survey has witnessed a remarkable evolution in the open data sphere, where the confluence of technology, policy, and community engagement has sculpted a dynamic and resilient environment. From fostering transparency to catalysing innovation, the impact of open data continues to be felt. Kicking off the commentary around the data (all of which is open for your own perusal here: https://doi.org/10.6084/m9.figshare.24517123), we must not forget the end goal of open academic data. In an era where data is often heralded as the ‘new oil,’ we should recognize the value and potential in driving innovation, policy-making, and societal advancements that open academic data holds.
The 2023 report is intentionally more analytical than in previous years. As the report has gained international traction, we want to provide as much detail as possible in this original 2023 report. This year also sees the launch of a partner report from the Computer Network Information Center of the Chinese Academy of Sciences - using The State of Open Data survey results to guide their understanding of how Chinese researchers are responding to data publishing requirements. The variance observed across different geographical locales, research disciplines, and career stages, however subtle, has also prompted us to create some recommendations for stakeholders in order to recognize disparities and address them through targeted interventions, policies, and support mechanisms. We anticipate and encourage further feedback on the report and national, funder or even organizational analysis of the data published alongside the report.
As a longitudinal survey, we want to track how both emotions and actions are driving an exponential increase in the number of datasets published year on year. In the survey data, we can see a shift as researchers respond to new funder policies coming into place, a focus on trust in a post-COVID world and the rapid emergence of new tools and technologies, such as artificial intelligence (AI). We see progress in researchers adopting data management plans and dig into the importance of this later in the report. This is an early signal of researchers recognising the importance, or the expectations, that their funders and institutions are placing on data.
The big challenges in promoting the publication of open academic data remain the same: Credit and concerns. Graham Smith talks more about these concerns in his key insights piece: ‘Beyond policy, meeting researchers’ practical needs’. We may also be seeing some fatigue in enthusiasm for open research in general as open data policies come into place and researchers find themselves with even less time to comply with the directives of their funders. 54% of respondents felt that funders should make sharing research data part of their requirements for awarding grants. In 2022, 52% agreed – slightly less. However, this proportion has dropped from 2019, where 69% agreed.
• Respondents from universities were more likely to disagree (26%).
• Similar to the national mandates, professors were less likely to agree to make the sharing of research data part of their requirements.
• The proportions were significantly higher for open science advocates.
46% of respondents agreed that they felt funders should withhold funding from, or penalize, researchers who do not share their data if it was previously mandated by the funder at the grant application stage. This was two percentage points higher than in 2022, but far lower than in 2019, when the agreement level was 69%. There were three areas which were common for respondents to feel they needed help with in regard to making their data openly available. ‘Copyright/licensing of data’ (55%), ‘Finding the time to manage their data’ (53%) and ‘Understanding the data management policies that apply to them’ (51%) were all selected by over half of the respondents, significantly high proportions.
[Figure: What areas, if any, do you feel you need help with in regard to making your research data openly available? Percentage of respondents by survey year (2020-2023) for: copyright / licensing of data; finding the time to manage my data; data management policies; finding appropriate repositories for deposition of data.]
Graph showing longitudinal data of what areas respondents feel they need help with in regard to making their research data openly available.
[Figure: How would you rate the importance of each of the following with reference to determining the quality of an openly accessible dataset? Response options: extremely important, somewhat important, moderately important, slightly important, I don’t know, not at all important. Segment values for each dataset quality indicator, in the order they appear in the chart:]
The data are available from a publicly available repository: 50%, 30%, 12%, 3%, 3%, 3%
The data come from a known source (e.g. familiar institution or researcher): 44%, 31%, 15%, 5%, 3%, 2%
The data are associated with a peer-reviewed article: 44%, 32%, 14%, 6%, 3%, 2%
The data are in a repository that I know checks/curates the data: 42%, 32%, 15%, 5%, 3%, 2%
The data set is complete for my needs: 41%, 33%, 14%, 6%, 3%, 3%
Data visualization (e.g. figures, charts) reflect the true state of the source data: 40%, 30%, 16%, 7%, 4%, 3%
Has a full set of associated metadata: 36%, 30%, 17%, 8%, 6%, 3%
Data is consistent with previously published findings: 35%, 30%, 17%, 8%, 7%, 3%
Finding appropriate funds was also a highly chosen area, with 45% of respondents selecting it. Interestingly, last year, the same concern about data rights and licensing topped the list with 55% of researchers wanting more guidance on it.
Citation credit
Researchers consistently say they will be motivated by citations of their data and their research papers. While paper citation is rewarded in the academic career progress scoring system, credit for data citations remains low. Tracking of data citations themselves is a nascent space. Make Data Count is soon to release a Data Citation Corpus, which will enable repositories to consistently track the impact of data citations. This provides funders with a stepping stone to start giving career-advancing credit to researchers. There is of course the question of whether researchers are actually re-using open data. One interesting area of the survey is investigating the perceived quality of open datasets, as a proxy to ‘trustworthy data’ and how comfortable those researchers are at having their own data re-used. The graph to the right suggests there are several essential criteria that need to be met by a dataset in order for it to be considered trustworthy. Novelty and previous findings are less relevant than the data being findable, accessible, interoperable and re-usable (FAIR).
33%
We hear the success stories of large-scale academic projects
making use of FAIR data and artificial intelligence (AI) to drive
systemic change in a research field (such as DeepChem,
ClimateNet and DeepTrio), but need to dig deeper into
academic motivations to re-use datasets. Interestingly, data
being publicly available in an open repository came out as
the most important factor when determining the quality of
29%
The data are new (e.g. released
within the last year)
17%
10%
9%
2%
Percentage of respondents
Graph showing how respondents rate the importance of certain criteria for
determining the quality of an openly accessible dataset.
11
A Digital Science Report
The State of Open Data
a dataset. We also see a little wariness in researchers about others ‘re-interpreting’ their data when compared to e.g. re-use for replication. Researchers are more comfortable than uncomfortable when it comes to their own data being re-used, as highlighted in the graph below. However, the fact that nearly 10% of those surveyed are still uncomfortable about their data being re-used in any format highlights potential roadblocks in the path to fully open research at scale.
This year’s report takes a more nuanced approach to the survey responses. We investigate whether the results are consistent across countries, research subjects and the career stage a researcher is at. It feels like the right time to do this. While a global funder push towards FAIR data has researchers globally moving in the same direction, it is important to recognize the subtleties in researchers’ behaviours based on variables in who they are and where they are.
[Figure: To what extent are you comfortable with others using your data for replication studies. Stacked bars of percentage of respondents for each context of data re-use. Contexts: Use of your data as a reference to determine whether they can be repeated under similar conditions; Reanalysing your data in combination with other data sets to answer a new question; Using a different method to analyse your dataset; Reanalysing your data to answer a new question; Using your data as is to answer another question. Response options: Extremely comfortable; Somewhat comfortable; Neutral / no opinion; Somewhat uncomfortable; Extremely uncomfortable; Not applicable.]
Graph showing to what extent respondents are comfortable with their dataset being used for replication studies.
The State of Open Data survey aspires to continue being a compass, guiding efforts, informing strategies, and
illuminating the pathway towards an open data future that is not just robust but also inclusive and equitable. As we
dig deeper into the findings of this report, it is important to not only celebrate the milestones achieved but also to
engage thoughtfully with the challenges and opportunities that lie embedded in the data. As such, we are also providing
some recommendations to different audiences in order to help move the space further, faster. May this call to action
serve as a catalyst for informed discussions, strategic alliances, and innovative solutions in our collective journey towards
a more open, inclusive, and data-driven future.
Graham Smith, Open Data
Programme Manager at
Springer Nature
05
Key insights: Beyond policy,
meeting researchers’ practical needs
Since its inception, The State of Open Data has charted the shifting landscape of data-sharing policies, with researchers increasingly facing requirements to make data available for evaluation and re-use. In 2023 a number of these have come into practical effect, for example, the NIH Data Management and Sharing Policy, and deadlines for implementing the WHOSTP Nelson memo into US federal agencies’ policies.
Already we see that researchers publishing in the last year are significantly more likely to share data due to a funder requirement than those publishing earlier. However there are also clear challenges in the level of support for researchers to comply with data policies; almost three-quarters of respondents had never received support with planning, managing or sharing research data. Where support is provided, it is more often from informal channels, such as colleagues and supervisors. This is noteworthy, particularly when half of the respondents who were aware of data management plans also indicated that they had access to data managers or curators at their institution.
We can infer that effective support is not making it to researchers, while they are being squeezed by growing requirements. The data also suggests areas of improvement for those providing support; those who had published or submitted a manuscript within the last year were significantly less likely to describe their support levels as ‘good’ or ‘excellent’. Additionally, these respondents were less likely to rely on funders and professional third parties for support.
Reducing the burden on researchers
Looking at the areas researchers tell us they need help, an ever-present issue has been that of time to curate data. While any data curator will tell you good data management takes time, certain main challenges in this area seem solvable, for example: ‘finding appropriate repositories for deposition of data’ (identified by 41% of respondents). Interestingly this is highlighted more (45%) in biological sciences where, broadly speaking, the establishment of data repositories is more mature than in other research areas. This suggests that while solutions exist, they are not easily accessible in the research lifecycle.
[Figure: During or after data collection, do you curate/prepare your data for sharing (whether privately or publicly)? Line chart of percentage of respondents by survey year (2017 to 2023) for: Yes, all data collected; Yes, some of the data collected; Yes, but only data to be shared with colleagues or beyond; Yes, but only data shared publicly; No, we do not have the resource to do this but we would like to; No, it is not important with our data.]
Graph showing the percentage of respondents who curate/prepare their data and to what extent.
Publishers can play a key role in reducing this burden. At Springer Nature, our vision for open science encompasses a set of open outputs such as data, code and protocols, all linked via the research article. We aim to make open science easily accessible, more prominent in our journals and embedded across disciplines, with authors empowered to share their data, opening them up for further re-use and interrogation.
One initiative in this area is the standardization of our research data policy which will embed the requirement for Data Availability Statements across over three thousand journals. This move is designed to increase transparency around underlying data and enhance the integrity of the scientific record. As part of the change, we are making author and editor guidance more straightforward, supporting authors and editors with compliance.
Alongside policy requirements, the rollout of the Figshare
integration across the Nature Portfolio is providing a
practical means to make data sharing in repositories
easier for authors. This streamlined tool has seen great
uptake from authors, with 7,500 data submissions since
its launch in April 2022, equivalent to 15% of manuscript
submissions. The first year’s worth of publications since
integration shows more authors are using repositories
overall (a 12% increase), which supports our hypothesis that
more prominent and accessible data solutions can have
a demonstrable impact on data sharing at a journal. The
initiative has since expanded to 37 Nature Portfolio titles,
including Nature and Nature Communications.
Complementing our data initiatives is a similar code-focused
integration with Code Ocean and the acquisition of protocols.io,
both strengthening our vision for linked open science
outputs available alongside the article.
But in order to take into account future developments and to
meet the constantly evolving needs and requirements of the
scientific community, our work must not stop here. A range
of commentators in The State of Open Data reports over the
years have highlighted the need for different entities in this
space to work together to solve the challenges of data sharing
and accessibility. Bodies such as the Research Data Alliance,
CODATA and FORCE11 have provided invaluable forums for
cooperation between institutes, researchers, publishers,
funders and numerous other actors. The ongoing collaboration
between the Computer Network Information Center of the
Chinese Academy of Sciences, Springer Nature, Digital Science
and Figshare on The State of Open Data is another such
positive sign in global data collaborations. One thing The State
of Open Data results tell us is there are solvable challenges
that need practical solutions to back up this cooperation.
One area we will undoubtedly see growing in response is
AI. While the results from this year’s survey don’t yet show
a clear picture, data on which issues to address will be key
as we think about the tasks best suited for automation. At
a time when much generative AI is focused on language,
data-focused automation is a more nascent area but one
that will likely play a large role in navigating current obstacles
in research. Niki Scaplehorn and Henning Schoenenberger
explore this further in the piece ‘AI and open science - the
start of a beautiful relationship?’
06
From generalized trends to a demographic-tailored approach
In all 8 editions of The State of Open Data, we have explored longitudinal trends in researchers’ attitudes towards data publishing. As this nascent field develops, we can delve deeper into the nuanced thinking among various researchers, considering their location, primary research field, and even research seniority. What follows begins to unravel how differences in the types of research being conducted can significantly affect researcher attitudes. Factors like cultural norms, funding, data privacy laws, perceived data value, and lack of resources can influence attitudes towards data sharing.
When it came to potential problems with sharing datasets, the most commonly reported was around ‘The inclusion of sensitive information or requiring study participant permissions before sharing’ (39%). Those within hospitals or other healthcare settings were significantly more likely than other organization or institution types to select this (51%), as were those in private companies (50%). This concern was greater for those based in China (57%), Canada (54%) and the United Kingdom (53%) than other territories, suggesting that local legislation may play a role.
‘Concerns about misuse of data’ and ‘Unsure about copyright and data licensing’ were the next most reported concerns, with just under a third (32%) selecting each. China and the US were more likely to select ‘Concerns about misuse of data’. Those who were concerned about the misuse of data were significantly more likely to work in social sciences (38%) or medicine (36%). A lack of clarity about copyright and data licensing was significantly more likely to be a concern for technicians/research assistants (45%).
Motivation to share data
The primary circumstance that would motivate respondents to share data was ‘Citation of their research papers’ (65%), which was also the top factor in 2022. Credit for dataset citation counts has not yet been quantified in a standard way. The Generalist Repository Ecosystem Initiative (GREI) is working on this through its Data Citation working group. However, until new reward systems are embedded into career progress, paper citations will remain the key driver.
[Figure: What circumstances would motivate you to share your data? Line chart of percentage of respondents by survey year (2021 to 2023) for: Citation of my research papers; Public benefit; Increased impact and visibility of my research; Full data citation; Co-authorship on papers; Financial reward; Journal/publisher requirement; Greater transparency and reuse; Funder requirement; Consideration in job reviews and funding applications; Direct request from another researcher; It was made easy and simple to do so; Institution/organisation requirement; Freedom of information request.]
Graph showing the percentage of respondents motivated by different circumstances to share their data.
• Respondents working within computing were significantly more likely to select paper citations as their primary motivation for sharing data (75%) than those in other disciplines. Additionally, those based in Japan were significantly more likely to select this (75%) than those in other locations.
• Respondents working in astronomy and planetary science (73%), arts and humanities (67%), computing (67%) and earth and environmental sciences (62%) were all significantly more likely to see ‘Full data citation’ as a motivating factor in sharing their data. Respondents affiliated with research institutions (60%) shared this perspective. Moreover, respondents from China (69%), Germany (62%), and the United States (58%) were also significantly more likely to select ‘Full data citation’ as their motivating factor.
National mandates
There was overall support for a national mandate for making research data openly available – 64% of respondents supported the idea. A little over a third (34%) strongly supported the idea. Only 11% opposed the idea, a significantly low proportion. Indian and German respondents were more likely to support this idea (both 71%). In fact, Japanese and Chinese respondents were more likely to be neutral around the idea of a national mandate (41% and 33%, respectively), than other countries. A similar pattern is seen when researchers were asked about their support for a mandate in their own country (below).
[Figure: Would you support a data mandate in your country? Stacked bars of percentage of respondents by country of respondents (Brazil, Turkey, United Kingdom, Ethiopia, Italy, Japan, Germany, United States, China (Mainland), India). Response options: Strongly support; Somewhat support; Neutral; Somewhat oppose; Strongly oppose.]
Graph showing the percentage of respondents that support national data mandates in their country, showing data for the 10 countries with the highest number of respondents.
When examining the ten countries with the most survey respondents, Ethiopia tops the chart in terms of the highest percentage of respondents who ‘Strongly support’ open data, followed by India, Germany, and the United Kingdom. Japan has the lowest percentage of respondents who ‘Strongly support’ open data. Interestingly, the biggest funder of Ethiopian-based publications is the Bill and Melinda Gates Foundation, which has a strong open data publishing policy: ‘Each accepted article must be accompanied by a Data Availability Statement that describes where any primary data, associated metadata, original software, and any additional relevant materials can be found.’
The biggest funder of Japanese-based publications is the Japan Society for the Promotion of Science (JSPS). The second biggest funder is Japan Science and Technology Agency, which does ‘require open data archiving’ and since 2017 has required funded researchers to develop a data management plan defining how to manage research data, and to manage data accordingly. We observe similar results when assessing awareness of data management plans. Ethiopia has the highest awareness as a percentage of those surveyed, while Japan appears to have the least awareness.
[Figure: Are you aware of DMPs (Top 10 country responses). Stacked bars of percentage of respondents by country of respondents (Ethiopia, United Kingdom, United States, India, Germany, Italy, Turkey, Brazil, China (Mainland), Japan). Response options: Yes; I don’t know; No.]
Graph showing the levels of awareness of the concept of a data management plan, broken down by country, showing data for the 10 countries with the highest number of respondents.
Subject-based trends
Nation-states may have different strengths in their policies and promotion of open academic data. The same could be true of research fields. Research always strives to be collaborative across multiple areas, but there is often a sentiment that single-subject-focused research domains can exist in a silo. Thus, it’s unsurprising that researchers in different fields have varied reasons for choosing to share or not share their data. Coupled with this is a real heterogeneity in the outputs of research in different fields. Data is a broad term but can be simplified to mean the outputs that researchers create to back up the findings they publish in papers or monographs. As such, the outputs generated in the social sciences may be quite different from those in the life sciences.
The previous State of Open Data surveys have shown that motivations for sharing data include advancing science, increasing transparency, gaining recognition, fulfilling funding requirements, and promoting interdisciplinary collaboration. For the first time this year, we decided to examine differences in research fields. Across all subjects, a consistent message emerges that research impact and credit are key drivers when asked, ‘What motivates you to share your data?’. In all fields, ‘Citation of research papers,’ ‘Full data citation,’ and ‘Increased impact and visibility of my research’
score highly, but there does seem to be a subject-based
preference for citation of papers, or citation of data. For
example, as highlighted below, researchers from ‘Astronomy
and planetary Science’ are more motivated by a ‘Full Data
Citation’ than more citations to their papers.
[Figure: Motivations for sharing data openly by primary area of expertise. Stacked bars of percentage of respondents by primary area of expertise of respondents. Motivations: Citation of my research papers; Co-authorship on papers; Consideration in job reviews and funding applications; Direct request from another researcher; Financial reward; Freedom of information request; Full data citation; Funder requirement; Greater transparency and reuse; Increased impact and visibility of my research; Institution/Organization requirement; It was a field/industry expectation; It was made simple and easy to do; Journal/Publisher requirement; None; Open data badge; Other (please specify); Public benefit.]
Graph showing the percentage of respondents that would be motivated by certain circumstances to share their data, broken down by subject area of expertise of the respondents.
Making data publicly available
89% of respondents make their data available publicly, with 11% having responded, ‘I do not share my data beyond my immediate collaborators’ (n=6091).
Most respondents indicated that they made their data publicly available ‘On the publication of the research’ (37%) or ‘When the research is complete’ (18%), suggesting this usually happens towards the end of the research cycle. Similarly, 13% selected that sharing happens ‘In the process of submitting a research article’, again indicating that sharing of data takes place in the latter stages.
• Respondents in biology were significantly more likely to make their data publicly available ‘On publication of the research’ (47%), whereas engineers were significantly more likely to do so ‘When research is complete’ (23%).
This being said, 17% indicated that they make their data publicly available only ‘Upon request from others’; indicating that they do not routinely share their data as part of the process. Similarly, 11% selected ‘I do not share my data beyond my immediate collaborators’, again suggesting sharing data is not the norm for all researchers.
• Those within medicine (20%) and social sciences (23%) were significantly more likely to share their data ‘Upon request from others’.
When we explore a general level of ‘Open to Openness’, we observe subject-based variations. The chart below shows what percentage of researchers ‘Strongly agreed’ with open practices around open research articles, datasets, peer review and preprints. The general trend is that researchers prioritize the importance of these outputs being open in that order, with open research articles evoking the strongest positive sentiment. Strong support for open data varies from 39% in Materials Science to 59% in Mathematics. Materials Science interestingly has less than 50% of researchers who ‘Strongly Agree’ with any of these open practices. This again highlights the need for a more subject-based approach to outreach and education around open practices.
[Figure: Open to Openness by Subject (Percentage that ‘Strongly agreed’). Percentage of respondents by subject expertise of respondents for four statements: Making research articles open access should be common scholarly practice; Making peer review open should be common scholarly practice; Making research data openly available should be common scholarly practice; Publishing a pre-print should be common scholarly practice.]
Graph showing the levels of ‘openness’ to making certain open practices ‘common scholarly practices’ broken down by subject area of expertise of the respondents.
[Figure: Do researchers receive enough credit for sharing their data? Stacked bars of percentage of respondents by subject expertise of respondents. Response options: Yes; No, they receive too little credit; No, they receive too much credit; I don't know.]
Graph showing whether respondents feel researchers receive enough credit for data sharing, broken down by the subject area of expertise of the respondents.
Researchers do not receive enough credit for sharing their data, according to researchers in all major fields except Mathematics, where the majority of researchers responded with ‘I don’t know’. Arts and Humanities and Social Science researchers have the largest majority who think that they do not get sufficient credit for sharing data.
In summary, the variations in responses from different
countries and subject areas suggest that a nuanced
approach to Research Data Management (RDM) is necessary
to effectively encourage and support data sharing. While
some subjects may not align as seamlessly with open data
practices as others, we have previously observed that when
communities support open data, it more reliably becomes
the norm. This is evidently less of a cultural shift for digital-born research practices, as observed in genomics. However,
we have also witnessed the effects in older research fields
such as ecology, where a concerted effort has driven open
data into the perceived requirements of research publishing.
A lack of subject-specific or thematic generalist repositories
may lead to a one-size-fits-all approach, and this is something
that could be addressed by societies and subject-based
research communities.
Even with a global push towards mandating open data
publishing, the follow-through from different countries and
funders will largely define the speed of success and return
on investment for the respective country. Now is the time
to compare and contrast how your country is performing
compared to those trying to reach similar goals.
07
Researchers at all stages of their careers share
the same optimism and concerns around open data
Planck’s principle is the view that scientific change does not occur because individual scientists change their minds, but rather that successive generations of scientists have different views.
‘A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it’.
Intriguingly, this year’s The State of Open Data report suggests that the differences observed between countries and primary subjects do not extend to researchers who have been publishing for a long time or to specific job descriptions. This challenges stereotypes and misconceptions about later career academics being opposed to progress and contradicts the message that early career researchers are the primary drivers of change in this area. Researchers, at all stages of their careers, face the same challenges, seek the same rewards, and demonstrate a similar level of openness to embracing open data.
It seems that the length of an academic career does not influence how aware researchers are of aspects like data management plans (DMPs), nor does it affect their motivations for sharing their data. There are indeed differences, but it is encouraging that there are not huge differences in the ways in which we have to educate academics on open data. We examined both academic job titles and the year in which a researcher first published a paper to gain deeper insights into their awareness and motivations regarding open data.
Incentives and recognition for sharing data are happening
Just over half (54%) of those surveyed already have received some form of recognition for sharing their data; the most common form of recognition received was ‘Full citation in another article’ (39%) followed by ‘Co-authorship on a paper that used my data’, which was selected by 23%. Other kinds of recognition included ‘Consideration of data sharing in a job review’ (7%), ‘Consideration of data sharing in a grant application’ (8%) and ‘Open data badge’ (4%).
• PhD/master’s students were significantly more likely to have received ‘Consideration of data sharing in a job review’ (10%), as had research assistants/technicians (15%) and undergraduate students (27%).
However, over a third (37%) had never received any credit/recognition for sharing their data. 33% reported they had been involved in collaboration as a result of data they had previously shared. 60% reported that they hadn’t.
• Professors were significantly more likely to indicate that they had been involved in collaboration (39%), as were research directors or VPs of research (48%) – perhaps suggesting those in senior roles are more likely to get these opportunities. Those working within earth and environmental sciences also indicated this (40%).
• PhD/master’s students were significantly more likely to indicate that they had not (69%), again suggesting collaboration opportunities may be more available to senior roles.
Trends based on year
of first publication
The motivations for sharing research data are largely consistent among academics who published their first paper between 1980 and 2023. Each line in this chart represents one of those groups. While there is some variation year by year, no general trend is apparent. A consistent message prevails across all stages of academic career progression: ‘Researchers receive insufficient credit for sharing their data openly.’ Coupled with the motivations for sharing research, it appears that we can communicate with all researchers on the same level regarding how to garner more credit for their research and how sharing their research data openly can help improve the trust around their research papers and result in more citations to the paper.
[Figure: Motivations for data sharing by publication year. Axes: year of first paper publication (1980 to 2023) and percentage of respondents. Motivations shown: Public benefit; Other (please specify); Open data badge; Journal/Publisher requirement; None; It was made simple and easy to do; Full data citation; It was a field/industry expectation; Freedom of information request; Institution/Organization requirement; Financial reward; Increased impact and visibility of my research; Direct request from another researcher; Greater transparency and reuse; Consideration in job reviews and funding applications; Funder requirement; Citation of my research papers; Co-authorship on papers.]
Graph showing the percentage of respondents that would be motivated to openly share their data by certain circumstances, broken down by the year of the respondent’s first paper publication.
When discussing the education of researchers regarding
the benefits of sharing research data, it’s noteworthy that,
although the desire for impact and career progression is
consistent and support for open practices is strong, potential
differences may exist in the concerns preventing researchers
from making data available.
Percentage of researchers who list the cost of data sharing as something that puts them off data sharing

Job title: Percentage of respondents
Undergraduate student: 34%
Principal Investigator: 33%
Technician / Research assistant: 31%
Laboratory Director/Head: 28%
Research director / VP of research: 26%
Professor: 25%
Other (please specify): 24%
Associate professor: 24%
PhD/Master's student: 23%
Research scientist: 23%
Physician/Clinician: 23%
Healthcare professional: 23%
Assistant professor: 22%
Postdoctoral candidate: 19%
Retired: 15%

Graph showing the percentage of respondents who would be deterred from sharing their data openly by the associated costs, broken down by job title.
A prime example involves researchers hindered from sharing their data openly due to cost-related concerns. There’s a slight skew towards undergraduate students and Principal Investigators (PIs). As budget holders, PIs likely harbour concerns about how this will impact the day-to-day management of their projects. Conversely, undergraduate students are more likely to worry about how it will affect their personal finances. Numerous avenues allow researchers to publish data at no direct cost to them, or to apply for support through their funding bodies. Just over a third of respondents (34%) indicated that their institution or organization would meet the costs of making research data openly accessible. This was followed by respondents’ own funds (30%) and then ‘Funds identified in your grant for this purpose’ (20%). Associate professors and professors were significantly more likely to meet this cost using their own funds (34% each). Research scientists (42%) and research assistants/technicians (52%) were significantly more likely to have this cost met by their institution/organization.

These results are encouraging, showcasing the academic community’s willingness to embrace open academic data. Unlike country- and subject-based approaches, we can ensure researchers at all stages comprehend the benefits of making data openly available while tailoring support to alleviate their concerns. In terms of actions, these data highlight a need for more inclusive outreach when organizing discussions, forums and panels in the open research space.
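
As a rough check on the slight skew towards undergraduate students and Principal Investigators noted in this section, the chart's percentages can be compared against their median. The Python sketch below uses only the values printed in the job-title chart. The unweighted median across job titles is an illustrative baseline chosen here, not a method described in the report, and it ignores how many respondents fall into each group.

```python
from statistics import median

# Percentage of respondents deterred from sharing data by cost, per job title
# (values read from the job-title chart in this section).
cost_concern = {
    "Undergraduate student": 34,
    "Principal Investigator": 33,
    "Technician / Research assistant": 31,
    "Laboratory Director/Head": 28,
    "Research director / VP of research": 26,
    "Professor": 25,
    "Other (please specify)": 24,
    "Associate professor": 24,
    "PhD/Master's student": 23,
    "Research scientist": 23,
    "Physician/Clinician": 23,
    "Healthcare professional": 23,
    "Assistant professor": 22,
    "Postdoctoral candidate": 19,
    "Retired": 15,
}

# Unweighted median across job titles. Illustrative baseline only: every job
# title counts equally, regardless of how many respondents hold it.
baseline = median(cost_concern.values())

# Rank job titles from most to least deterred by cost.
ranked = sorted(cost_concern.items(), key=lambda item: item[1], reverse=True)
top_two = [title for title, _ in ranked[:2]]

for title, pct in ranked:
    print(f"{title:<36} {pct:>3}%  ({pct - baseline:+} points vs median)")
```

Because the groups differ in size, gaps of this kind are descriptive only and say nothing about statistical significance.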
08
The importance of data management plans (DMPs)
Only 43% of respondents were aware of what a data management plan (DMP) is. Surprisingly, this represents a dip from the previous year, where over half (52%) claimed awareness. The awareness metrics further vary within specific demographics: those who recently published a manuscript were more familiar (44%) with DMPs, and those within the government or local government sectors showcased even higher awareness at 57%. However, certain disciplines like engineering, mathematics, material science, and physics lagged behind in awareness compared to other fields. The graph below represents respondents’ awareness of the concept of a data management plan. The data is segmented based on respondents’ primary area of expertise, highlighting differences in awareness across various fields.

[Figure: Are you familiar with the concept of a data management plan? Axes: subject expertise of respondents and percentage of respondents. Responses shown: Yes, No, I don’t know.]
Graph showing the percentage of respondents that are aware of the concept of a data management plan, broken down by subject area of expertise of the respondents.

While awareness is the first step, competence in drafting a functional DMP is another challenge. A mere 22% felt fully equipped to develop a DMP, while a significant 44% believed they would need moderate to extensive training on the subject. Interestingly, geographical location also played a role. Respondents from the US exuded greater confidence, with 31% feeling fully competent, whereas the figures were lower for Japan and Italy, with 8% and 9% respectively. Of those who know about DMPs, an encouraging 74% had created one. However, a notable subset of this demographic (3%) admitted to needing further training, underscoring the gap between awareness, actual implementation, and comprehensive understanding.

Encouragingly, there is a significant increase in the number of respondents who are fully implementing a data management plan in their current project compared to their last project. An impressive 79% created a DMP for their latest projects, with 96% doing so for ongoing projects. The percentage of those feeling they’ve fully implemented a DMP in ongoing projects (47%) is much greater than those from completed projects (20%).

[Figure: To what extent have you implemented a data management plan. Axes: extent of DMP implementation (Entirely, Somewhat, Not at all, Not applicable) and percentage of respondents. Series shown: Last Project, Current Project.]
Graph showing the extent to which respondents implemented a data management plan in their last project compared with their current project.
Diving deeper into the motivations, it emerged that a
considerable number were influenced by external factors
rather than personal convictions. Requirements from
funders (42%) and institutions (36%) ranked high, while
only 29% created a DMP out of personal choice. Other
driving factors included expectations within research fields,
recommendations from supervisors, and requirements from
specific journals. For those making their debut in research
publication in 2022 and 2023, the journal requirement
seemed more pronounced, hinting at a possible trend
towards mandatory DMP submissions in the future.
Are you familiar with Data Management Plans, sorted by last publication date

Familiarity level: Within the last year / 1-2 years ago / 3-5 years ago
Yes: 44% / 40% / 39%
No: 39% / 40% / 39%
I don’t know: 17% / 21% / 22%

Graph showing awareness levels of the concept of a data management plan (percentage of respondents), broken down by the year of the most recent publication date of the respondents.
Challenges
While the progress in the DMP space seems promising,
issues remain. Technical challenges topped the list with
34% struggling with data storage and organization. Time
constraints and the lack of trained staff followed closely.
Financial challenges, changing research dynamics, and
memory lapses in following the DMP also emerged as
barriers. Medicine and material sciences faced unique
challenges, emphasising the need for a tailored approach to
data management across different disciplines.
In conclusion, while the awareness and implementation of
data management plans are gaining traction, there exists
a gap in confidence and competence. With the myriad
challenges faced in its application, there’s an evident need
for structured training, tailored guidance, and perhaps a
rethinking of how DMPs are designed and executed across
various disciplines.
09
AI and open science - the start of a beautiful relationship?
Niki Scaplehorn, Director, Content Innovation, Springer Nature
Henning Schoenenberger, Vice President Content Innovation, Springer Nature
With the incredible pace of recent developments in AI, researchers are rapidly confronting the difficult question of how to safely use these tools to accelerate their everyday work. However, according to this year’s State of Open Data survey, only a small portion of researchers have so far considered using these tools to collect, analyse and annotate their data. In this section, we discuss the potential for AI to overcome barriers to open science, and highlight how open science will become ever more important in an AI-augmented world.

Unleashing the power of small data
AI techniques have been improving our article searches for longer than many would realize, but the ability of the latest generation of large language models (LLMs) to extract meaning from large volumes of text has already created a raft of new tools for scientists to help navigate and make sense of the scientific literature. These tools are among the first to gain the widespread attention of researchers exploring the possibilities of using generative AI. Rather than relying on keywords to identify papers of interest, LLM-based products can identify papers based on how closely they match a particular research question, and quickly extract and summarize details across a large number of papers to present a balanced answer, backed up by references. This approach has the potential to save hours or even weeks of research time, although it comes at the expense of a level of transparency and control over which papers are highlighted, and ultimately a risk of systematic biases that, for now, can only be mitigated by careful comparison of augmented and more manual approaches.

While these tools primarily search and process text, it is already possible for large ‘multimodal’ models to make sense of images and other data types. For open science, this raises the exciting prospect of being able to unlock the vast array of data that are hidden in previously machine-unreadable formats - within figures and tables or in supplementary files, for example. While open science often focuses its efforts on big, highly re-usable data, AI is poised to unleash and make sense out of a tidal wave of small data that is otherwise rarely re-used.

Making meta better
While sharing data can be easy, thanks in no small part to generalist repositories such as Figshare, sharing data in full accordance with the principles of findability, accessibility, interoperability and re-usability (FAIR) can be considerably more challenging. Researchers often share data without including the vital metadata required to understand how it was created and how it should be re-used. Very often, that information is buried within an associated research article, but inconsistencies in how data and articles are linked, and the complexity of the article itself, can make it prohibitively difficult for others to confidently re-use or reproduce the data.

With most science funders now encouraging and even mandating the sharing of all underlying data at the point of publication, journals are playing an increasingly proactive role in promoting good open science practices, and LLMs have the potential to support both authors and journals through that process. Although human oversight will always be necessary to detect potential errors, AI-based drafting of metadata, data availability statements and even associated data descriptor papers will be valuable tools to improve standards of data sharing.

Your personal data scientist
LLMs are not limited to searching and summarizing - they are already set to transform how we analyse and visualize data. For many of us, data analysis begins with a spreadsheet and ends with copying and pasting a graph into a figure, leaving the data behind to languish on our hard drives. Analysing and visualizing data using code is usually considered to be a more challenging option, reserved for more complex analyses and those with sufficient technical expertise.

The ability of LLMs to write code is, however, dramatically changing this. It’s now possible to use an LLM as a personal data scientist, which can explore a complex dataset, propose interesting research questions, design a step-by-step statistical analysis, and draw up graphical representations of the results in a matter of minutes. Importantly, the data and the resulting figure remain linked together by code, making the figure a dynamic artefact that can be reanalysed and redrawn, and making the raw data an integral part of the figure. These tools are still far from perfect: it’s still essential to make sure that an analysis driven by AI makes scientific and statistical sense. But for open science, AI raises the tantalizing prospect of data finally taking its place at the heart of our scientific articles, with integrated transparency and reproducibility as to how the data were processed and analysed.

A virtual collaborator
For most of us, until recently, the idea of an artificial scientist designing and even carrying out their own research projects felt very much like a prospect for the distant future. But just as LLMs can support human data analysis, the prospect of autonomous agents that can sift through the world’s scientific data and identify unanticipated patterns and connections is no longer far away. For example, AI could be used as a co-pilot within a research team - not only scanning the literature for new papers of interest but running analyses that integrate the team’s most recent results with publicly available data to propose new experiments and new collaborations.

The impact of these new players on the data ecosystem may be complex. On the one hand, the use of AI will greatly increase the chances that shared data will be re-used, creating new demand for open science. At the same time, fear of data ‘misuse’ is still a significant barrier to data sharing, and if mechanisms to ensure proper credit for data creators and good practices around authorship and collaboration are not in place, the threat of data being snapped up by a competitor bot could act as a disincentive to early sharing of data. These are not new dynamics, but their impact could be magnified and accelerated by the widespread use of AI.

Open science to the rescue
A clear risk of generative AI for research publishing more generally is the potential for paper mills to create fake articles much more quickly and easily than was previously possible. There are many beneficial uses of these technologies in scientific writing, especially as an opportunity to level the playing field and broaden access to publication, and so even if the detection of AI-generated text was completely reliable, banning AI-generated text from scientific publications would not represent a viable solution to this problem.

While addressing the misaligned incentive structures that lead researchers to use paper mills in the first place should be our ultimate goal, open science has an important part to play in providing proof of authenticity in research. Insisting on the sharing of all raw data makes it much more difficult to construct an entirely falsified article, especially with the development of increasingly sophisticated AI-based tools to detect data manipulation. The prospect of an AI ‘arms race’ between data falsifiers and manipulation detectors is a grim vision of the future, but without open science, the struggle to protect the integrity of the scientific record will be lost long before that. Instead, we should work towards ensuring that our vision for the positive impact of AI on open science becomes a reality.
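
The point made under ‘Your personal data scientist’, that code keeps the data and the figure linked, can be illustrated with a short script. The following is a minimal Python sketch rather than anything taken from the report or from a particular AI tool: the function name `draw_bar_chart` is hypothetical, and the three values are taken from the cost-by-job-title chart earlier in this report.

```python
def draw_bar_chart(data, width=34):
    """Return a text bar chart, one line per label.

    Bars are scaled so that the largest value fills `width` characters.
    """
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}%")
    return "\n".join(lines)


# Share of respondents put off data sharing by cost, for three job titles
# (values taken from the job-title chart earlier in this report).
cost_concern = {
    "Undergraduate student": 34,
    "Principal Investigator": 33,
    "Retired": 15,
}

print(draw_bar_chart(cost_concern))

# Changing the data and calling the function again redraws the figure:
# no copy-and-paste step separates the numbers from the picture.
cost_concern["Professor"] = 25
print(draw_bar_chart(cost_concern))
```

Because the chart is regenerated from the dictionary each time, sharing the script shares the underlying data as well, which is the sense in which the figure becomes a dynamic artefact.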
10
Recommendations for the academic community
Understanding the ‘state of open data’ in our specific research setting
Respondents from different geographies and different disciplines responded to the survey with great variation. A purely global and cross-disciplinary approach to research data management and its promotion is clearly not sufficient or sustainable. Understanding the ‘state of open data’ in our specific research setting, whether that be at a national level, a university level or a departmental level, could be the key to truly engaging with the communities we’re trying to reach. There is a clear need to tailor support that’s dependent on awareness and support levels specific to the particular context that we’re operating within.

Action Request:
• For policymakers: Engage with researchers and institutions to understand specific needs and challenges within their context and tailor open data policies accordingly. Establish advisory committees, consult with experts and collaborate with scientific organizations.
• For researchers: Actively participate in surveys, forums, and discussions to voice the specific challenges and needs related to open data within your specific setting.
• For academic institutions: Conduct internal surveys and workshops to understand the unique needs of different departments and establish data management support that is contextually relevant.
• For publishers: Facilitate collaboration between researchers and academic institutions by organizing workshops or conferences that bring stakeholders together to share experiences and knowledge.

Credit where credit’s due
The issue of credit, and researchers feeling they don’t receive enough of it for sharing their data openly, has consistently been evident in The State of Open Data since its inception. Credit for data sharing could take many forms, from a citation to career-related recognition or progression. Community initiatives like CoARA (Coalition for Advancing Research Assessment) are tackling this head-on. This initiative, launched by the EU Commission, the European University Association (EUA) and Science Europe, and with over 350 signatories, has principles guiding this space. We need to give more rewards and incentives for data sharing when assessing research. One way in which receiving credit for sharing data openly could be more likely is if it’s always linked to the final publication as clearly as possible. Some publishers are piloting innovative solutions, such as the Public Library of Science (PLOS) and the pilot of their ‘accessible data button’. This was the seemingly simple addition of a button linking to an open dataset associated with a PLOS article in selected repositories; the pilot saw a 20% relative increase in views of the datasets included. More eyes on a dataset could mean more potential for re-use, a citation or simply further recognition. Both publishers and repositories could begin thinking about the way they link datasets to articles and work to increase visibility, therefore creating opportunities for more recognition.

Action Request:
• For all participants in the space: Define an inclusive and all-stakeholder approach. Societal change requires looking outside of not-for-profits and into industry and beyond in order to effect change.
• For funders and academic institutions: Establish clear policies that recognize and reward researchers for sharing data openly and integrate these contributions into career progression evaluations.
• For publishers: Adopt and implement mechanisms, like the ‘accessible data button’, that enhance the visibility of open datasets linked to publications and explore additional methodologies to credit researchers for sharing their data.
• For researchers: Be sure to cite datasets appropriately in your research and advocate for policies within your institutions that recognize and reward open data sharing.

Help and guidance for the greater good
We need inter- and intra-community coordination when it comes to doing a better job at education. The NIH-funded GREI initiative is a prime example of a funder forcing change by preventing the siloed approaches of individual generalist data repositories. This prevents organizations from focusing on the functionality of their own platforms, with training or materials specific to their infrastructure or repository, but instead empowers them to offer more general guidance documentation and advice on data sharing that goes beyond a user choosing to use their specific product or platform. Being more collaborative and contributing to the bigger picture could be of greater benefit to researchers. While the awareness and implementation of data management plans are gaining traction, there exists a gap in confidence and competence. With the myriad challenges faced in its application, there’s an evident need for structured training, tailored guidance, and perhaps a rethinking of how DMPs are designed and executed across various disciplines.

Action Request:
• For publishers, librarians and software providers: Develop and disseminate open data guides that are not product-specific and conduct workshops and webinars that cater to a broad audience regarding the essentials and best practices of data sharing.
• For researchers: Advocate for, and participate in, forums and workshops on data sharing and bring forward your challenges and insights to help shape better platforms and policies.
• For funders: Ensure that funded projects allocate resources and time for data management, and provide clearer guidelines on data sharing requirements.

Making outreach inclusive
As we’ve discussed within our report, early career researchers are not the only ones who support and also struggle with data sharing. Researchers and academics at all stages of their careers share the same motivations, have the same concerns and exhibit similar levels of support when it comes to open data. In an organizational setting, it may be tempting to focus on instilling and promoting core open science values among early career researchers and those who are just starting their academic journeys. One takeaway from this year’s results is that those looking to engage research communities should be inclusive and deliberate with their outreach, engaging those who have not yet published their first paper as well as those who first published over 30 years ago.
Action Request:
• For conference organizers: Ensure representation from
various career stages in discussions, panels, and talks
related to open data and science.
• Implementation of mentorship programmes with
a specific focus on data sharing, aimed at fostering
collaborative learning from one another.
• Outreach in all areas: Have a standard set of support
tools, accessible via a central repository - similar to the
OA Books Toolkit for Researchers.
We would also like to make a final call to ALL actors to
actively and above all regularly participate. This can take
various forms, including participation in forums where
experiences, needs, and requirements are openly discussed.
The goal of these forums is to establish clear objectives that
aim to standardize the entire data-sharing process.
Moving forward, it’s essential to maintain a dynamic approach.
This involves regularly monitoring, evaluating, adapting,
and sharing feedback on the standardized processes. We
don’t see standardization as a fixed achievement but as an
evolving process that needs to keep up with technological
advancements and changing methodologies.
In the upcoming forums, the focus should be on discussing
progress and sharing constructive feedback. This ongoing
feedback loop ensures that our data-sharing standards
remain relevant and aligned with the latest developments in
the field, helping us stay current and effective.
11
Authors and acknowledgments
Graham Smith
Graham is the Open Data Programme Manager at Springer Nature. He works to develop and promote data-sharing tools, partnerships and initiatives across the organization’s publishing activities. He has a background in geophysics and has coordinated data curation activities across the Nature, BMC and Springer portfolios, and at the Natural History Museum in London.

Dr Mark Hahnel
Mark is the CEO and founder of Figshare, which he created while completing his PhD in stem cell biology at Imperial College London. Figshare currently provides research data infrastructure for institutions, publishers and funders globally. He is passionate about open science and the potential it has to revolutionize the research community.

Niki Scaplehorn
Niki is Editorial Director, Content Innovation, at Springer Nature, where he is developing solutions that make it easier for authors to share and interact with data, code and protocols. A cell biologist and neuroscientist by training, he was drawn to an editorial career at Cell before moving to Springer Nature, where he became Chief Life Science Editor of Nature Communications and Editorial Director for the Nature life science journals. He is based in Hamburg, Germany.

Henning Schoenenberger
Henning, Vice President Content Innovation at Springer Nature, is an accomplished leader in digital innovation and a pioneer in the early adoption of artificial intelligence in scholarly publishing. He ideated and product managed the first machine-generated research book published at Springer Nature. Henning graduated in Social Science and is based in Heidelberg, Germany.

Laura Day
Laura is the Marketing Director at Figshare, part of Digital Science. Before joining Digital Science, Laura worked in scholarly publishing, focusing on open access journal marketing and transformative agreements. In her current role, Laura focuses on marketing campaigns and outreach for Figshare. She is passionate about open science and is excited by the potential it has to advance knowledge sharing by enabling academic research communities to reach new and diverse audiences.

Contributors
Thanks to colleagues from Springer Nature, Figshare and Digital Science who also helped to shape this whitepaper.

Acknowledgments
Figshare, Digital Science and Springer Nature extend their thanks to Shift Insight, who collected the 2023 survey data and undertook the initial analysis, as well as our all-important respondents who provided us with unparalleled insights and enabled us to bring the survey findings to the wider research community.
A collaboration between Figshare, Digital Science, and Springer Nature