Tracking science in real-time from large-scale usage data Johan Bollen -

advertisement
Introduction MESUR Mapping Science Metrics Survey Discussion
Tracking science in real-time from large-scale
usage data
Johan Bollen - jbollen@indiana.edu
Indiana University
School of Informatics and Computing
Center for Complex Networks and Systems Research
Cognitive Science Program
July 29, 2010
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Outline
1 Introduction
Problem statement
Usage data
2 MESUR
MESUR overview
Creating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 Discussion
Overview
Future Research
Relevant papers
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Why study science?
Tremendous importance
Enormous amount of resources and people involved:
Allocation of resources: funding agencies, policy makers, the
public, corporations, the scientists themselves
Social systems research: Ability to learn about general properties
of similar social systems
Unrelated picture of my daughter
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Science as a social system
Actors: scientists, stakeholders
Relations (often focused on
exchange of knowledge)
Informal interactions:
meetings, workshops,
conversations, messages, etc
Formal: affiliations,
collaborations, projects,
publications
Artifacts: articles, journals,
reports, data, software
Johan Bollen - jbollen@indiana.edu
Xiaoming Liu, Johan Bollen, Michael L. Nelson, and
Herbert Van de Sompel. Co-authorship networks in
the digital library research community. Information
Processing and Management, 41(6):14621480,
December 2005
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
The measure of science in the print-era
A word on the traditional way
Gold standard: citation data
Extracted from “publication” data.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
The measure of science in the print-era
A word on the traditional way
Gold standard: citation data
Extracted from “publication” data.
From print comes citation:
Science is a gift-economy
Steal text, but not ideas.
Currency is acknowledgement
of influence: citations
More citations, more influence
Track scientific activity from
citation data
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
The measure of science in the print-era
A word on the traditional way
Gold standard: citation data
Extracted from “publication” data.
From print comes citation:
Science is a gift-economy
Steal text, but not ideas.
Currency is acknowledgement
of influence: citations
More citations, more influence
Track scientific activity from
citation data
Citations are derived from
peer-reviewed publications
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
From this springs an entire industry
Applications of citation data
End-user services and bibliometrics
Web of Science, Scopus, publishers, institutions, etc.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
From this springs an entire industry
Applications of citation data
End-user services and bibliometrics
Web of Science, Scopus, publishers, institutions, etc.
Citation-based services
Reference linking
Variety of end-user services
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
From this springs an entire industry
Applications of citation data
End-user services and bibliometrics
Web of Science, Scopus, publishers, institutions, etc.
Citation-based services
Reference linking
Variety of end-user services
Citation-based assessment and bibliometrics
Impact metrics
Analytics
Mapping
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Two problems with present citation-based approach
Data (1) and metrics (2)
Issues:
1
Inherent features of citation data due to its origins in
published material
2
Existing metrics fail to acknowledge (1), and ignore network
properties of science as a social system.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation data
Delays and partiality
Citation problem 1: Data
Citation data is late and partial indicator of scientific activity
time
0 year
+0.5 year
+1.5 years
+2 years
+2.5 years
+3.5 years
discovery
research
publication
discovery
research
publication
?
citation
citation data
Publication delays
Domain-dependent practices
Publicationt → publicationt−1
→ citation DB
Community: Publishing authors
only
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation data
Metrics vs. networks
Citation problem 2: networks
Ignoring network properties of science
Resource Management
+%*.%73$"%$,/
/*-%0%73$"%*86*+%$/3
*-3.%)'"*(
7,&'$+%'/,4%.*-*1
/*-%0%7,&'$+%&'$
$,34%$/3%$,/%*.%0
(*2,$'%:,-'%0
bad
X
0%*.%*/*2%2'&.*+,4
*&/"%2'&.*+,4
'/,4,1#
*/+*%",&+3/
!4*-+%*-2%$,34
!4*-+%/'44
z
X
Food
.'*+%$/3
*.%-*+
*.%0%/43-%-6+&
/&,!%$/3
)&3+%0%2'&.*+,4
43.-,4%,/'*-,1&
/43-%-'6&,!"#$3,4
0%';!%.*&%)3,4%'/,4
!4*-+%0
0%';!%),+
!4*-+*
.*&%)3,4
1*$+&,'-+'&,4,1#
Livestock
Dermatology
"#2&,)3,4,13*
,'/,4,13*
,39,$
",&+$/3'-/'
0%*.%$,/%",&+3/%$/3
0%"#2&,4
5*+'&%&'$,6&%&'$
"#2&,4%!&,/'$$
'(,46+3,'4'/+&,'-%/43-%-'6&,
0%*/,6$+%$,/%*.
)&3+%0%,!"+"*4.,4
.*&%'/,4!!&,1%$'&
.6+*+%&'$
.,4%'/,4
0%*-3.%$/3
*-3.%&'!&,2%$/3
&'(%)&*$%:,,+'/0%&'!&,2%7'&+34
0%2*3&#%$/3
+"'&3,1'-,4,1#
!"#+,/"'.3$+&#
'6&%0%*!!4%!"#$3,4
'6&%&'$!3&%0
*.%&'(%&'$!3&%23$
23*)'+'$
)'"*(%)&*3-%&'$
!$#/",!"*&.*/,4,1#
!"*&.*/,4%)3,/"'.%)'
0%*1&%7,,2%/"'.
)3,4%&'!&,2
!,64+&#%$/3
*&/"%,!"+"*4.,4!/"3/
,!"+"*4.,4,1#
.'2%$/3%$!,&+%';'&
*.%0%,!"+"*4.,4
*$3*-%*6$+&*4%0%*-3.
'6&%0%!"*&.*/,4
0%*!!4%!"#$3,4
/3&/%&'$
$"'4;$=>%$"'4;4=>
$*2*)$
Medicine
*.%0%&'$!%/&3+%/*&'
0%+",&*/%/*&23,(%$6&
*--%+",&*/%$6&1
/"'$+
0%!"*&.*/,4%';!%+"'&
*.%0%!"#$3,4!"'*&+%/
3-('$+%,!"+"%(3$%$/3
932-'#%3-+
/3&/64*+3,1'-'+3/$
+"&,.)%"*'.,$+*$3$
*&/"%(3&,4
.,4%)3,4%'(,4
)&*3-%&'$
3-+%0%/*&23,4
*.%"'*&+%0
0%/43-%3-('$+
*&+'&3,$/4%+"&,.%(*$
0%*.%/,44%/*&23,4
0%)3,4%/"'. 0%1'-%(3&,4
0%-*+%!&,26/+$
*.%0%/*&23,4
4*-/'+
'6&%"'*&+%0
-'5%'-14%0%.'2
0*.*!0%*.%.'2%*$$,/
'-2,/&3-,4,1#
0%/43-%'-2,/&%.'+*)
*-*4%)3,/"'.
*/+*%/&#$+*44,1&%2
-'6&,$/3%4'++
0%3..6-,4
*/+*%/&#$+*44,1&%'
!'23*+&3/$
0%-'6&,!"#$3,4
0%*!!4%/&#$+*44,1&*&/"%)3,/"'.%)3,!"#$
(3&,4,1#
0%(3&,40%!"#$3,4!4,-2,*/+*%/&#$+*44,1&%*
/,/"&*-'%2)%$#$+%&'(
!&,+'3-%$/3
'6&%0%)3,/"'.
0%';!%.'2 (*//3-'
.3/&,)3,4!69
!%-*+4%*/*2%$/3%6$*
0%-'6&,$/3
0%/"'.%$,/%!'&9%+%?
/43-%3-7'/+%23$
0%-'6&,/"'.
0%3-7'/+%23$
0%/'44%)3,4
*-*4%/"'.
-'6&,$/3'-/'
.'+",2%'-:#.,4
)3,/"'.%)3,!"%&'$%/,
1'-'%2'(
';!%)&*3-%&'$
)3,,&1*-%.'2%/"'.
.,4%/'44%)3,4
"'4(%/"3.%*/+*
)4,,2
!&,+'3-$';!%/'44%&'$
)3,,&1%.'2%/"'.%4'++
0%/,.!%-'6&,4
3-7'/+%3..6'.),%0
)3,/"'.%0
/'44'6&%0%-'6&,$/3
-'6&,.,4%)3,4%/'44
7')$%4'++
0%.'2%/"'.
+'+&*"'2&,-!*$#..'+&
*-+3.3/&,)%*1'-+$%/"
0%.,4%)3,47')$%0
0%,&1%/"'.
/'44%.,4%437'%$/3
0%/'44%$/3
$#-4'++
.,4%.3/&,)3,4
+'+&*"'2&,+'+&*"'2&,-%4'++
)3,/"'.3$+&#
,-/,1'-'
7'&+34%$+'&34
$#-+"'+3/%/,..6,&1%)3,.,4%/"'.
"6.%&'!&,2
$#-+"'$3$!$+6++1*&+
'6&%0%3..6-,4
-6/4'3/%*/32$%&'$
/*-%0%/"'.
0%/43-%.3/&,)3,4
3..6-,4%&'(
'6&%0%,&1%/"'.
1'-'
/"'.%)'&
)3,/"3.%)3,!"#$%*/+*
,&1%4'++
/43-%/*-/'&%&'$
0%/43-%,-/,4
/*-/'&%&'$
:%*-,&1%*441%/"'.
2'('4,!.'-+
2'(%)3,4
0%)*/+'&3,4
0%,&1*-,.'+%/"'.
0%-*+4%/*-/'&%3
/"'.%/,..6)3,3-7,&.*+3/$
/,,&23-%/"'.%&'(
'6&%0%3-,&1%/"'.
*-1'5%/"'.%3-+%'23+
$/3'-/'
)%/"'.%$,/%0!3-,&1%/"3.%*/+*
3-,&1%/"'.
)3,!"#$%0
!,4#"'2&,(3$3,-%&'$
2'(%2#-*.
,&1*-,.'+*443/$
0%*.%/"'.%$,/
3-+%0%&*23*+%,-/,4
*-+3/*-/'&%&'$
-*+6&'
0%.,4%/*+*4%*!/"'. /"'.!'6&%0
2*4+,-%+
/"'.%&'(
$+62%$6&7%$/3%/*+*4
7'.$%.3/&,)3,4%4'++
*!!4%'-(3&,-%.3/&,)
0%'4'/+&,*-*4%/"'.
/*-/'&
*.%0%,)$+'+%1#-'/,4
.3/&,!,&%.'$,!,&%.*+
*!!4%/*+*4%*!1'0%!",+,/"%!",+,)3,%*
0%/*+*40%.,4%$+&6/+
1#-'/,4%,-/,4
0%!,4#.%$/3%!,4%/"'.
!'&/'!+%!$#/",!"#$
,)$+'+%1#-'/,4
.*/&,.,4'/64'$
4*-1.63&
$!'/+&,/"3.%*/+*%*
0%/,44,32%3-+'&7%$/3
'4'/+&,/"3.%*/+*
0%!"#$%/"'.!6$
/*+*4%+,2*#
0%!"#$%/"'.%)
!"#$%/"'.%/"'.%!"#$
.*/&,.,4%/"'.%!"#$
0%/,.!6+%/"'.
.,4%!"#$ /"'.%.*+'&
/"'.,$!"'&'
0%*!!4%!,4#.%$/3
/,44,32%$6&7*/'%*
'-(3&,-%$/3%+'/"-,4
/"'.%1',4
0%!"#$%/"'.%*
0%!,4#.%$/3%!,4%!"#$
5*+'&%&'$
3-2%'-1%/"'.%&'$
!,4#.'&
0%/"'.%!"#$
*2(%.*+'&
/"'.%!"#$%4'++
/"'.%!"#$
+"',/"'.!0%.,4%$+&6/
1',/"3.%/,$.,/"3.%*/
0%'4'/+&,/"'.%$,/
1',/"'.%1',!"#%1',$#
0%.,4%$!'/+&,$/
74632%!"*$'%'86343)&
5*+'&%'-(3&,-%&'$
'*&+"%*-2%!4*-'+*&#%$/3'-/'%4'++'&$
1',!"#$%&'$%4'++
3-+%0%86*-+6.%/"'.
43+",$
$6&7%$/3 6-!6)
0%,!+%$,/%*.%*
/,-+&3)%.3-'&*4%!'+&
*3/"'%0
0%!'+&,4
*.%.3-'&*4
3/*&6$
*+.,$%'-(3&,0%-,-!/&#$+%$,432$
0%(,4/*-,4%1',+"%&'$
/,..6-3/*+3,0%.*+'&%$/3
/"'.%'-1%$/3
0%'6&%/'&*.%$,/
!"#$%&'(%'
+'/+,-,!"#$3/$
0%!"#$!/,-2'-$%.*+
1',4,1#
0%!"#$%)!*+%.,4%,!+
1',!"#$%0%3-+
0%*!!4%!"#$
0%!"#$%/%$,432%$+*+'
9'#%'-1%.*+'&
0%*+.,$%$/3
.,-%-,+%&%*$+&,-%$,/
*--%1',!"#$!*+.%"#2&
*!!4%$6&7%$/3 !"#$%&'(%*
0%!"#$%,/'*-,1&
0%*.%/'&*.%$,/
*$+&,-%*$+&,!"#$ !4*-'+%$!*/'%$/3
.,-%5'*+"'&%&'(
0%1',!"#$%&'$
+"3-%$,432%734.$
*3!%/,-7%!
!"#$3/*%*
*!!4%!"#$%4'++
!"#$%&'(%)
&'(%.,2%!"#$
.*+%$/3%'-1%*!$+&6/+
0%!"#$%2%*!!4%!"#$
.*+'&%$/3%7,&6.
!"#$3/*4%&'(3'5
0%-'6&,$6&1
*/+*%.*+'&
0%/43.*+'
'6&,!"#$%4'++
8%0%&,#%.'+',&%$,/
$6&1%-'6&,4
0*!*-'$'%0,6&-*4%,7%*!!43'2%!"#$3/$
*!!4%,!+3/$
!"#$%4'++%*
$,432%$+*+'%/,..6$!*/'%$/3%&'(
0%/&#$+%1&,5+"
0%$+&6/+%1',4
3-06&#
0%';!%+"',&%!"#$<!%$,/%!",+,!,!+%3-$ *$+&,!"#$%$!*/'%$/3
)%$'3$.,4%$,/%*.
!"#$3/*%)
.'+*44%.*+'&%+&*-$%*
-'6&,$6&1'&#
*--6%&'(%*$+&,-%*$+&
!6)4%*$+&,-%$,/%!*/
*/+*%.'+*44%.*+'&
,!+%/,..60%.*1-%.*1-%.*+'&
0%*+.,$%$,4!+'&&%!"#
0%*&+"&,!4*$+#
!"#$%&'(%4'++ ,!+%';!&'$$
*$+&,-%0
-6/4%3-$+&6.%.'+"%)
*$+&%$,/%!
!"#$%&'!
0%*44,#%/,.!2
0%!"#$%*!.*+"%1'*$+&,!"#$%0%$6!!4%$
'6&%!"#$%0%)
0%),-'%0,3-+%$6&1%)&
,!+%4'++
/43-%,&+",!%&'4*+%&
*--%!"#$!-'5%#,&9
0%),-'%0,3-+%$6&1%*.
0%74632%.'/"
*&+"&,$/,!#
*$+&,!"#$%0$,4%!"#$
0%!"#$%/"'.%$,432$
,!+%'-1
!"#$%74632$
$!3-'
-6/4%!"#$%*
0%!'23*+&%,&+",!'2
3-+%0%.,2%!"#$%)
0%!'23*+&%,&+",!%)
3'''%0%86*-+6.%'4'/+
!"#$%$+*+6$%$,4323%)
0%.*+"%!"#$
'6&%!"#$%0%*
!"#$%&'(%2
3-+%0%.,2%!"#$%*
*.%0%$!,&+%.'2
&'.,+'%$'-$%'-(3&,!"#$%!4*$.*$
!"#$3/*%/
!"#$%&'(%/
!"#$%4'++%)
-6/4%!"#$%)
0%431"+5*('%+'/"-,4
/4*$$3/*4%*-2%86*-+6.%1&*(3+#
0%!"#$%1%-6/4%!*&+3/
'4'/+&,-%4'++
-6/4%!"#$%)!!&,/%$6!
!4*$.*%!"#$%/,-+&%7
-6/4%2*+*%$"''+$
.,2%!"#$%4'++%*
/,..6-%.*+"%!"#$
'6&%!"#$%0%/
0%/"&,.*+,1&%*
0%*!!4%!,64+&#%&'$
0%$!''/"%4*-1%"'*&%&
0%$,6-2%(3)
23*)'+'$%/*&'
!4*-+%!"#$3,4
0%7,,2%$/3
0%$/3%7,,2%*1&
7,,2%/"'.
Measuring impact, influence,
prestige from citation data:
+*4*-+*
*-*4%)3,*-*4%/"'.
*-*4%/"3.%*/+*
good XX
z
X
'4'/+&,*-*4
Chemistry
ObGyn
Material Sciences
Geology
Sports Medicine
Physics
0%"31"%'-'&1#%!"#$
!"#$%*+,.%-6/4<
:%!"#$%/%!*&+%73'42$
0%/,$.,4%*$+&,!*&+%!
&*23,4,1#
3-+%0%&'.,+'%$'-$
3'''%+%3-7,&.%+"',&#
-6/4%3-$+&6.%.'+"%*
*.%0%&,'-+1'-,4
-6/4%76$3,3-+%0%"'*+%.*$$%+&*-
4'/+%-,+'$%/,.!6+%$/
More citations is better
than less citations
In citation network,
central position is better
than outskirts.
$3*.%0%/,.!6+
/,..6-%*/.
+"',&%/,.!6+%$/3
0%*$$,/%/,.!6+%.*/"
Computer Science
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation problem no. 2
Metrics vs. networks
This approach is epitomized in Thomson-Reuters Impact Factor:
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation problem no. 2
Metrics vs. networks
This approach is epitomized in Thomson-Reuters Impact Factor:
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation problem no. 2
Metrics vs. networks
This approach is epitomized in Thomson-Reuters Impact Factor:
Standard citation graph representation
G = (V , E , W )
E ⊆ V2
W : E → N+
However:
∀(ni , nj ) ∈ E : pubtime(ni ) > pubtime(ni )
Impact Factor: normalized in-degree
IFj =
P
i wij
Nj
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation problem no. 2
Between Big Star and Britney Spears
Is more always better?
Why context matters
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation problem no. 2
Between Big Star and Britney Spears
Is more always better?
Why context matters
Sold 50M records
Influenced who?
High popularity, low
prestige.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation problem no. 2
Between Big Star and Britney Spears
Is more always better?
Why context matters
Sold 50M records
Influenced who?
High popularity, low
prestige.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation problem no. 2
Between Big Star and Britney Spears
Is more always better?
Why context matters
Sold 50M records
Influenced who?
High popularity, low
prestige.
Johan Bollen - jbollen@indiana.edu
Sold 50k records
Influenced REM, B52s,
Teenage Fanclub, etc.
Low popularity, high
influence
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Citation problem no. 2
Between Big Star and Britney Spears
Silly? That’s how we evaluate scientific impact right now!
The Impact Factor and lots of other citation-based metrics are
presently being used to assess the quality of publications, journals,
authors, institutions, and even entire countries by proxy.
Note: Lots of developments on better metrics (cf. Eigenfactor:
Bergstrom and Rosvall), but still reliance on citation data.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
New developments
PageRank for citation graphs
P
5
v1
50
v3
15
w
Impact Factor: IFj = Ni j ij :
normalized citation count
origin of citation disregarded
v2
?
10
3
40
1
60
20
2
20
20
?
4
A different route:
P
IF (vi , ) ' λ j IFj
P
1
IF (vi , ) ' λ j IFj × O(v
j)
P
1
PR(vi ) ' λ j PR(vj ) × O(v
j)
P
(1−λ)
1
PR(vi ) ' N + λ j PR(vj ) × O(v
j)
P
PRw (vi ) = (1−λ)
j PRw (vj ) × w (vj , vi )
N +λ
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
New developments, contd
PageRank for citation graphs
ISI IF
PRw
rank
value
Journal
value (x 103 )
Journal
1
52.28
ANNU REV IMMUNOL
16.78
NATURE
2
37.65
ANNU REV BIOCHEM
16.39
J BIOL CHEM
3
36.83
PHYSIOL REV
16.38
SCIENCE
4
35.04
NAT REV MOL CELL BIO
14.49
P NATL ACAD SCI USA
5
34.83
NEW ENGL J MED
8.41
PHYS REV LETT
6
30.98
NATURE
5.76
CELL
7
30.55
NAT MED
5.70
NEW ENGL J MED
8
29.78
SCIENCE
4.67
J AM CHEM SOC
9
28.18
NAT IMMUNOL
4.46
J IMMUNOL
10
28.17
REV MOD PHYS
4.28
APPL PHYS LETT
Table: The highest ranking journals according to ISI IF and Weighted
PageRank (JCR2003)
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Two problems with present citation-based approach
Data (1) and metrics (2)
Issues:
1
Inherent features of citation data due to its origins in
published material
2
Existing metrics fail to acknowledge (1), and ignore network
properties of science as a social system.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Post-print, online era, aka known as “today”: usage data?
Most use of scholarly resources is now mediated by online services
time
0 year
+0.5 year
+1.5 years
+2 years
discovery
research
publication
discovery
+2.5 years
+3.5 years
research
publication
citation
usage data
search queries
text data
early indicators
Recorded by nearly every
scholarly service
Captures early phases of
scientific activity
Includes a wide variety of
resource types
Johan Bollen - jbollen@indiana.edu
citation data
late indicators
Activities of larger scientific
communities, e.g. not limited
only to authors
Scale: cf. Elsevier announced
1B downloads in 2006 vs. 650M
citation in WoS.
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Problem statement Usage data
Issues with usage data
Lots of interest in leveraging usage data for models of science,
tracking scientific activity, metrics, etc. Cf. COUNTER, Ex Libris
bX, and many more.
However many different issues:
Silo: recorded for particular service
for a particular user community
Lack of standards: recorded in
different manner, for different
systems
Lack of research: how leverage
usage data for scientific tracking,
modeling, metrics, etc.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
Outline
1 Introduction
Problem statement
Usage data
2 MESUR
MESUR overview
Creating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 Discussion
Overview
Future Research
Relevant papers
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR project: survey the potential of usage data at very
large-scale
Studying scientific activity
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR project: survey the potential of usage data at very
large-scale
Studying scientific activity
Can we model and study patterns of scientific activity from
large-scale, representative usage data?
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR project: survey the potential of usage data at very
large-scale
Studying scientific activity
Can we model and study patterns of scientific activity from
large-scale, representative usage data?
What can we learn about impact and structure from patterns
of scientific activity from usage data?
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR history
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. Mellon
Foundation
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. Mellon
Foundation
2008-2009: Los Alamos
National Laboratory
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. Mellon
Foundation
2008-2009: Los Alamos
National Laboratory
2009-: Indiana University,
School of Informatics and
Computing
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. Mellon
Foundation
2008-2009: Los Alamos
National Laboratory
2009-: Indiana University,
School of Informatics and
Computing
2009-2013: National Science
Foundation
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. Mellon
Foundation
2008-2009: Los Alamos
National Laboratory
2009-: Indiana University,
School of Informatics and
Computing
2009-2013: National Science
Foundation
Team: PI, co-PI, 2 scientists,
2 full-time developers, and 1
PhD student.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR objective
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR dataflow
Data
providers
Filtering &
deduplication
impact
metrics
Publishers
Usage
data
Parser 1
Aggregators
Usage
date
Parser 2
institutions
Usage
date
Parser 3
...
MESUR
ontology
ingestion
Models
MESUR
reference data set
Models
...
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR
ontology
Aggregators
Usage
date
Parser 2
ingestion
Models
0.0e+00
Parser 1
number of events
impact
metrics
Publishers
1.0e+07
Filtering &
deduplication
Usage
data
5.0e+06
Data
providers
1.5e+07
Creating MESUR’s data set
institutions
...
Usage
date
Parser 3
MESUR
reference data set
2004
2005
Models
2006
2007
2008
2009
date
...
Providers 2006-20010: BMC, Blackwell, UC, CSU (23),
EBSCO, ELSEVIER, EMERALD, INGENTA, JSTOR,
LANL, MIMAS/ZETOC, THOMSON, UPENN (9),
UTEXAS
events 1,000,000,000 usage events
citations +500,000,000 citations,
articles and serials +50M articles, +-100,000 serials
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
Data requirements
Minimum requirements
Separate user requests
Date-time stamp down to second
Document identifier or sufficient metadata to de-duplicate
Session ID (or anonymized user ID)
Request type identifier
http://tweety.lanl.gov/public/schemas/2007-01/mesur.owl
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
MESUR overview Creating MESUR’s reference data set
MESUR’s OWL/RDF ontology
1
1
Rodriguez, Bollen & Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly
Artifacts and their Usage. JCDL07 - Based on OntologyX work.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Outline
1 Introduction
Problem statement
Usage data
2 MESUR
MESUR overview
Creating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 Discussion
Overview
Future Research
Relevant papers
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Reference data set
Subsetting
Domain
Usage
UC Degrees
JCR
Natural Science
Social Sciences
Humanities
37%
45%
14%
39%
46%
15%
92.8%
7.2%
Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Reference data set
Subsetting
Common time period March 1st 2006 - February 1st 2007
Domain
Usage
UC Degrees
JCR
Natural Science
Social Sciences
Humanities
37%
45%
14%
39%
46%
15%
92.8%
7.2%
Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Reference data set
Subsetting
Common time period March 1st 2006 - February 1st 2007
Providers: Thomson Scientic (Web of Science), Elsevier
(Scopus), JSTOR, Ingenta, University of Texas (9
campuses, 6 health institutions), and California State
University (23 campuses)
Domain
Usage
UC Degrees
JCR
Natural Science
Social Sciences
Humanities
37%
45%
14%
39%
46%
15%
92.8%
7.2%
Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Reference data set
Subsetting
Common time period March 1st 2006 - February 1st 2007
Providers: Thomson Scientic (Web of Science), Elsevier
(Scopus), JSTOR, Ingenta, University of Texas (9
campuses, 6 health institutions), and California State
University (23 campuses)
Scale: 346,312,045 usage events, 97,532 serials (many of
which not journals)
Domain
Usage
UC Degrees
JCR
Natural Science
Social Sciences
Humanities
37%
45%
14%
39%
46%
15%
92.8%
7.2%
Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Article clickstreams
usage data log: U = {u1 , u2 , · · · , un }
u = {s, t, a}
s = session identifier , t = a date-time and a = article
f ∈ F = clickstreams extracted from U
f ⊂ U, where
f = (∀u ∈ U, ∃s : s(u) ∧ t(ui ) < t(ui+1 ))
s(u) and t(u): session identifier and date-time of interaction u.
clickstream c1
v1
clickstream c2
v2
v3
v1
v3
journals
a1
a2
a3
a4
a1
a4
articles
i1
i2
i3
i4
i1
i2
online
user interactions
session1
session2
time
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Journal Clickstreams
article clickstream: fa = (a1 , a2 , · · · , ak )
journal clickstream: fv = (v1 , v2 , · · · , vk )
We observe: N(vi , vj )
for all pairs (vi , vj ) in which j = i + 1
From which follows:
N(vi , vj )
P(vi , vj ) = P
j N(vi , vj )
and
M whose entries mi,j = P(vi , vj ).
Johan Bollen - jbollen@indiana.edu
v2
15
v1
5
v3
30
v4
v2
0.3
v1
0.1
v3
0.6
v4
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Examples of prominent connections
vi
American Journal of International Law
Journal of Educational Sociology
Surface Science
Journal of Organic Chemistry
Ecological Applications
Annals of Mathematics
vj
International Organization
International Affairs
International and Comparative Law Quarterly
Foreign Policy
American Political Science Association
American Journal of Sociology
Journal of Higher Education
Journal of Negro Education
American Sociological Review
Social Forces
Physical Review B
Applied Surface Science
Physical Review Letters
Journal of Chemical Physics
Applied Physics Letters
Journal of the American Chemical Society
Tetrahedron Letters
Tetrahedron
Organic Letters
Angewandte Chemie
Ecology
Conservation Biology
Bioscience
Annual Review of Ecology and Systematics
Clinical and Experimental Allergy
American Journal of Mathematics
American Mathematical Monthly
PNAS
Econometrica
Mathematics Magazine
Johan Bollen - jbollen@indiana.edu
p(vi , vj )
0.0207
0.0184
0.0171
0.0167
0.0140
0.0334
0.0303
0.0286
0.0276
0.0249
0.0704
0.0341
0.0339
0.0333
0.0327
0.0873
0.0865
0.0602
0.0532
0.0305
0.0965
0.0524
0.0215
0.0215
0.0191
0.0705
0.0579
0.0156
0.0082
0.0077
N(vi , vj )
9,292
8,254
7,654
7,500
6,291
2,790
2,529
2,389
2,303
2,076
2,555
1,239
1,230
1,207
1,188
4,141
4,105
2,857
2,526
1,448
13,659
7,408
3,043
3,043
2,699
5,392
4,432
1,195
624
587
N(vi )
448,034
83,419
36,282
47,439
141,481
76,526
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Network parameters
Parameter
Journals
Edges
Matrix density
Strongly Connected Components (SCC)
Journals in SCC
Average journal clustering coefficient (SCC)
Diameter of largest SCC
Johan Bollen - jbollen@indiana.edu
Network matrix
M
M0
97,532
2,307
6,783,552 50,000
0.071%
0.939%
16,474
236
80,934
1,944
0.285
0.514
37
14
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
common
date range
all edges
most reliable
edges
MESUR
log
data
matrix
M
matrix
M'
1B
interactions
346M
interactions
97k journals
6.7M edges
2307 journals
50k edges
36000
map
28000
Classification code:
AAT
taxonomy
20000
root
0
5000
12000
N(vi,vj)
rankings
0e+00
3e+04
1e+05
Dewey
JCR
2e+05
rank
1
2
3
4
5
Journals
Fruchterman (1991) Graph Drawing by Force-directed
Placement. Software, 21(11):1129-1164
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Minerology
Acoustics
Manufacturing
Material science
Engineering
Production
research
Law
Economics
Clinical
trials
Thermodynamics
Applied
physics
Clinical
Pharmacology
Demographics
Statistical
physics
International
studies
Sociology
Statistics
Electrochemistry
Physical
chemistry
Public
health
Polymers
Dermatology
Asian
studies
Organic
chemistry
Philosophy
Social work
Child
Psychology
Analytical
Chemistry
Education
Alternative
energy
Religion
Social and personality
psychology
Biochemistry
Nursing
Pharmaceutical
research
Anthropology
Psychology
Chemical
Engineering
Archeology
Human
geography
Brain studies
Cognitive
Science
Sports
medicine
Toxicology
Miocrobiology
Ecology
Biotechnology
Classical
studies
Nutrition
Language
Music
Architecture
Design
Plant
biology
Biodiversity
Geography
Tourism
Environmental
Science
Soil/Marine
biology
Animal
Behavior
Neurology
Plant
agriculture
Hydrology
Genetics
Agriculture
Geology
Physiology
Plant
genetics
Johan Bollen - jbollen@indiana.edu
Brain
research
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
The Arts and Architecture Thesaurus (Getty Research
Center)
Table: Distance from AAT root (α) and number of classifications
Nc at that level. Each α produces a finer-grained separation of
scientific disciplines.
Distance (α)
1
2
3
4
Nc
4
8
31
195
Example classifications
Natural sciences, social sciences, humanities, · · ·
Biology, chemistry, physics, · · ·
Classics, communication, engineering, · · ·
Allergy, anesthesiology, applied linguistics, · · ·
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
AAT
taxonomy
root
Dewey
JCR
1
Johan Bollen - jbollen@indiana.edu
2
3
4
5
Journals
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Cross-validating the clickstream map
(
1
f (vi , vj , α) =
0
AAT
log
data

a0,0

f (vi , vj , α) ! Aα =  ...
an,0
'
P (vi , vj )

p0,0
&
.
! M! = 
 ..
pn,0
···
..
.
···
···
..
.
···
Cα (vi ) = Cα (vj )
Cα (vi ) 6= Cα (vj )

a0,n
..  , ai,j = 1
.  ai,j = 0
an,n

& F2 &
p0,n
"
#
..  , Ni,j ≤ µ0.5 (Nk )"F
.  Ni,j > µ0.5 (Nk )$ 1
$
%
pn,n
χ2
contingency table at α
α = 1 : p < 0.0001
Figure
the map structure given by M ! to journal relationships derived
α 6.=Cross-validating
2 : p < 0.0001
from AAT journal classifications, i.e. matrix Aα .
α = 3 : p < 0.0001
α = 4 : p < 0.0001
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Betweenness centrality
Cb (vk ) =
X σi,j (vk )
σi,j
(1)
i6=j6=k
Table: Ranking of journals from M 0 according to betweenness
centrality.
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Journal
Top-level AAT classification
Science
Proceedings of the National Academy of Sciences
Environmental Health Perspectives
Chemosphere
Journal of Advanced Nursing
Nature
Ecology
Milbank Quarterly
Applied and Environmental Microbiology
Child Development
Behavioral Ecology and Sociobiology
Journal of Colloid and Information Science
American Anthropologist
Journal of Biogeography
Materials Science and Technology
Johan Bollen - jbollen@indiana.edu
Natural Sciences
Natural Sciences
Natural Science
Natural Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Social Sciences
Social Sciences
Natural Sciences
Social Sciences
Natural Sciences
Natural Sciences
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
PageRank
PR(vi ) =
X PR(vj )
1−λ
+λ
N
O(vj )
(2)
j
Table: Ranking of journals from M 0 according to PageRank
(λ = 0.85).
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Journal
Top-level AAT classification
Applied Physics Letters
Journal of Advanced Nursing
Journal of the American Chemical Society
Ecology
Nature
Physical Review B
Journal of Applied Physics
American Economic Review
American Historical Review
Physical Review Letters
Science
Langmuir
Journal of Chemical Physics
American Anthropologist
Annals of the American Academy of Political and Social Science
Johan Bollen - jbollen@indiana.edu
Natural Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Social Sciences
Social Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Natural Sciences
Social Sciences
Social Science
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Outline
1 Introduction
Problem statement
Usage data
2 MESUR
MESUR overview
Creating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 Discussion
Overview
Future Research
Relevant papers
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Metrics
ID
Type
Measure
Source
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Citation
Citation
Citation
Citaton
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Citation
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Scimago Journal Rank
Immediacy Index
Closeness Centrality
Cites per doc
Journal Impact Factor
Closeness centrality
Out-degree centrality
Out-degree centrality
Degree Centrality
Degree Centrality
H-Index
Scimago Total cites
Journal Cite Probability
In-degree centrality
In-degree centrality
PageRank
PageRank
PageRank
PageRank
Y-factor
Betweenness centrality
Betweenness centrality
Citation Half-Life
Closeness centrality
Closeness centrality
Degree centrality
PageRank
PageRank
In-degree centrality
Out-degree centrality
PageRank
PageRank
Betweenness centrality
Betweenness centrality
Degree centrality
Out-degree centrality
In-degree centrality
Journal Use Probability
Usage Impact Factor
Scimago/Scopus
JCR 2007
JCR 2007
Scimago/Scopus
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
Scimago/Scopus
Scimago/Scopus
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
JCR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
MESUR 2007
Johan Bollen - jbollen@indiana.edu
Network parameters
Undirected, weighted
Undirected, unweighted
Directed, weighted
Directed, unweighted
Undirected, weighted
Undirected, unweighted
Directed, unweighted
Directed, weighted
Directed, unweighted
Undirected, unweighted
Undirected, weighted
Directed, weighted
Directed, weighted
Undirected, weighted
Undirected, unweighted
Undirected, weighted
Undirected, unweighted
Undirected, unweighted
Undirected, unweighted
Directed, unweighted
Directed, unweighted
Directed, unweighted
Directed, weighted
Undirected, weighted
Undirected, unweighted
Undirected, weighted
Undirected, weighted
Directed, weighted
Directed, weighted
PC1
PC2
-0.974
1.659
0.339
-1.311
-1.854
-1.388
-3.191
-2.703
-4.850
-4.398
-3.326
-4.926
-5.389
-5.302
-5.380
-4.476
-4.929
-4.160
-3.103
-2.971
-0.462
-0.474
/
3.130
3.100
3.271
3.327
3.463
3.463
3.484
3.780
3.813
3.988
3.957
5.293
5.302
5.286
8.914
/
-8.296
-7.046
-6.284
-6.192
-5.937
-4.827
-4.215
-4.015
-2.834
-2.643
-2.003
-1.722
-1.647
-1.429
-1.554
0.108
0.731
0.864
0.333
0.317
0.872
1.609
/
2.683
3.899
3.873
4.192
4.336
4.015
3.994
4.217
4.223
4.271
3.698
3.528
3.518
3.531
1.833
/
ρ̄
0.556?
0.508?
0.565?
0.588?
0.592?
0.619
0.642
0.640
0.690
0.691
0.681
0.712
0.710
0.717
0.712
0.693
0.726
0.696
0.659
0.657
0.643
0.642
0.037
0.703
0.731
0.729
0.728
0.727
0.728
0.727
0.710
0.710
0.699
0.693
0.683
0.683
0.683
0.593
0.279
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
01.00
R10×10 =
B0.71
B0.77
B
B0.52
B
B
B0.79
B
B0.55
B
B0.69
B
B0.63
@
0.60
0.18
0.71
0.99
0.52
0.69
0.79
0.85
0.49
0.44
0.49
0.22
0.77
0.52
1.00
0.62
0.63
0.39
0.70
0.73
0.68
0.20
0.52
0.69
0.62
1.00
0.68
0.78
0.49
0.56
0.65
0.06
0.79
0.79
0.63
0.68
1.00
0.82
0.66
0.62
0.66
0.15
0.55
0.85
0.39
0.78
0.82
1.00
0.40
0.40
0.50
0.13
Johan Bollen - jbollen@indiana.edu
0.69
0.49
0.70
0.49
0.66
0.40
1.00
0.89
0.85
0.53
0.63
0.44
0.73
0.56
0.62
0.40
0.89
1.00
0.97
0.45
0.60
0.49
0.68
0.65
0.66
0.50
0.85
0.97
1.00
0.42
0.181
0.22C
0.20C
C
0.06C
C
0.15C
C
0.13C
C
C
0.53C
C
0.45C
A
0.42
1.00
19: Citation PageRank
5: Journal Impact Factor
22: Citation Betweenness
6: Citation Closeness
11: Citation H-index
1: Citation Scimago Journal Rank
31: Usage PageRank
34: Usage Betweenness
24: Usage Closeness
39: Usage Impact Factor
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
data sources
citation data
Scimago
citation data
2007 JCR
12,751
journals
7,584
journals
citation
network
7,388 x 7,388
1,4,11,12
2,5,23
3,6,7,8,9,10,13,14
15,16,17,18,19,20
21,22
usage log data
MESUR
usage
network
7,575 x 7,575
intersection
24,25,26,27,28,29,39
31,32,33,34,35,36,37,
38,39
impact measures
39x39
correlation
matrix
Schematic representation of data sources and processing. Impact measure identifiers.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Principal Component Analysis
Proportion of Variance
Cumulative Proportion
PC1
66.1%
66.1%
Johan Bollen - jbollen@indiana.edu
PC2
17.3%
83.4%
PC3
9.2%
92.6%
PC4
4.8%
97.4%
PC5
0.9%
98.3%
Tracking science in real-time from large-scale usage data
!
Introduction MESUR Mapping Science Metrics Survey Discussion
28
27 31 33
29 32
25
26 30
34
35
36
37
24
38
22
18
17
"
16
$%&
15
14
13 12
9
21
19 20
11
10
7
8
!!
6
5
4
3
2
1
!!
"
!
#"
$%#
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Usage
Betweenness
PageRank
+
Usage
Probability
PC2
citation
Betweenness
PageRank
Total
Cites
-
JIF06
Citation
Immediacy
Cites per
Doc
-
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time
+ from large-scale usage data
PC1
Introduction MESUR Mapping Science Metrics Survey Discussion
4: Cites per doc
5: Journal Impact Factor
1: Scimago Journal Rank
2: Citation Immediacy Index
7: Citation Out.degree centr.
8: Citation Out.degree centr.
6: Citation Closen. centr.
3: Citation Closen. centr.
18: Citation PageRank
16: Citation PageRank
20: Citation Y.factor
19: Citation PageRank
21: Citation Between. centr.
22: Citation Between. centr.
9: Citation Degree centr.
10: Citation Degree centr.
17: Citation PageRank
13: Cite Probability
15: Citation In.degree centr.
14: Citation In.degree centr.
12: Scimago Total Cites
11: Scimago H.index
30: Usage Out.degree centr.
29: Usage In.degree centr.
27: Usage PageRank
28: Usage PageRank
26: Usage Degree centr.
25: Usage Closen. centr.
34: Usage Betw. centr.
33: Usage Betw. centr.
24: Usage Closen. centr.
35: Usage Degree centr.
37: Usage In.degree centr.
36: Usage Out.degree
32: Usage PageRank
31: Usage PageRank
38: Journal Use Probability
2.5
Cluster
1
2
3
4
5
2.0
1.5
1.0
0.5
Measures
38
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37
1, 2, 3, 4, 5
6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16, 17, 18, 19, 20, 21, 22
Johan Bollen - jbollen@indiana.edu
0.0
Interpretation
Journal Use Probability
Usage measures
JIF, SJR, Cites per Document measures
Total Citation rates and distributions
Citation Betweenness and PageRank
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Overview Future Research Relevant papers
Outline
1 Introduction
Problem statement
Usage data
2 MESUR
MESUR overview
Creating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 Discussion
Overview
Future Research
Relevant papers
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Overview Future Research Relevant papers
MESUR: so far
Usage data: single largest reference data set of usage, citation
and bibliographic data
+1,000,000,000 usage events loaded
multiple publishers, aggregators and institutions
Infrastructure for research program
Usage graphs: track scientific flow of activity
real-time studies of science
inclusive of larger “scholarly community”
Metrics: valid, vetted indicators of scientific impact
Different facets of scholarly impact
Simple metrics, good results. Law of diminishing
returns?
Hybrid, consensus metrics?
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Overview Future Research Relevant papers
MESUR: the bad
Sustainability: burden of data collection
ad hoc, customized agreements with data
providers
restrictive agreements with regards to data
sharing
high costs of maintaining infrastructure: funding
Research program: too much to do
3rd party access: better science
many eyes on data
Metrics and services: Community support and acceptance
Can’t be limited to academic exercise
Investigations need to be useful
Accepted by community, become part of
scholarly assessment system
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Overview Future Research Relevant papers
Future Research: Two directions
Longitudinal research and dynamic models of science
Bursty behavior of science
Timeseries of reads over time are bursty
Coordinated: social networks at work?
Modeling
Stochastic and agent-based models of scientific activity
Citation following? Group-think? Find parameters of human
information searching behavior.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Overview Future Research Relevant papers
Bursty behavior
j[, 2]
100
300
500
700
APPLIED STATISTICS: time series
0
50
100
150
200
250
300
350
250
300
350
250
300
350
Index
400
200
0
−200
j[, 2] − m$y
APPLIED STATISTICS: residuals percentiles
0
50
100
150
200
Index
200
0
−200
mr$y
400
APPLIED STATISTICS: residuals smoothed
0
50
100
150
200
Index
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Overview Future Research Relevant papers
Contagion?
JOURNAL OF HIGH ENERGY PHYSICS
AMERICAN JOURNAL OF PHYSICS
EUROPHYSICS LETTERS
BIOCHIMICA ET BIOPHYSICA ACTA (BBA) − MOLECULAR BASI
ANNALS OF PHYSICS
ACTA CRYSTALLOGRAPHICA SECTION A CRYSTAL PHYSICS D
PHYSICS AND CHEMISTRY OF THE EARTH PARTS A/B/C
ASTROPHYSICS AND SPACE SCIENCE
BIOCHIMICA ET BIOPHYSICA ACTA (BBA) − BIOMEMBRANES
PHYSICS TODAY
FOUNDATIONS OF PHYSICS LETTERS
EUROPEAN JOURNAL OF PHYSICS
NATURE PHYSICS
SOLAR PHYSICS
PHYSICS AND CHEMISTRY OF LIQUIDS
ATMOSPHERIC CHEMISTRY AND PHYSICS
MODERN PHYSICS LETTERS A
PHYSICA E
PHYSICA B
PHYSICS OF THE EARTH AND PLANETARY INTERIORS
INTERNATIONAL JOURNAL OF MODERN PHYSICS A
QUARTERLY REVIEWS OF BIOPHYSICS
ASTRONOMY & GEOPHYSICS
PHYSICS OF ATOMIC NUCLEI
JAPANESE JOURNAL OF APPLIED PHYSICS
BIOCHIMICA ET BIOPHYSICA ACTA (BBA) − BIOENERGETICS
JOURNAL OF PHYSICS G NUCLEAR AND PARTICLE PHYSICS
PHYSICS LETTERS A
TECTONOPHYSICS
PHYSICS OF FLUIDS
JOURNAL OF PHYSICS A GENERAL PHYSICS
ASTRONOMY & ASTROPHYSICS
CZECHOSLOVAK JOURNAL OF PHYSICS
APPLIED PHYSICS B
JOURNAL OF THE MECHANICS AND PHYSICS OF SOLIDS
MATERIALS CHEMISTRY AND PHYSICS
PHYSICS OF PLASMAS
NEW JOURNAL OF PHYSICS
FOUNDATIONS OF PHYSICS
JOURNAL OF STATISTICAL PHYSICS
INTERNATIONAL JOURNAL OF THEORETICAL PHYSICS
PHYSICA C
NUCLEAR PHYSICS B
MEDICAL PHYSICS
NUCLEAR PHYSICS A
PHYSICA A
CURRENT APPLIED PHYSICS
JOURNAL OF PHYSICS D APPLIED PHYSICS
RADIATION PHYSICS AND CHEMISTRY
CHINESE PHYSICS LETTERS
APPLIED PHYSICS
PHYSICA D
CHEMICAL PHYSICS LETTERS
CANADIAN JOURNAL OF PHYSICS
PHYSICA SCRIPTA
INTERNATIONAL JOURNAL OF THERMOPHYSICS
NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH
PHYSICA STATUS SOLIDI (B)
CHEMICAL PHYSICS
NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH
CONTEMPORARY PHYSICS
PHYSICS LETTERS B
CELL BIOCHEMISTRY AND BIOPHYSICS
CENTRAL EUROPEAN JOURNAL OF PHYSICS
JOURNAL OF CHEMICAL PHYSICS
JOURNAL OF COMPUTATIONAL PHYSICS
JOURNAL OF PHYSICS B ATOMIC MOLECULAR AND OPTICAL
ADVANCES IN PHYSICS
MOLECULAR PHYSICS
PERCEPTION & PSYCHOPHYSICS
0
7
14
21
28
35
42
49
56
63
70
77
84
91
98
105
112
119
126
133
140
147
154
161
Johan Bollen - jbollen@indiana.edu
168
175
182
189
196
203
210
217
224
231
238
245
252
259
266
273
280
287
294
301
308
315
322
329
336
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Overview Future Research Relevant papers
Contagion?
63/37849
/.7,53053A83:9063/37849
63/3784041,/9328/6
>+.5-.4163/378490./;063/1-849
41--,/87@063/37849
8/735/.781/.20=1,5/.201<08--,/163/37849
E153./0=1,5/.201<063/37849
63/3784908/0-3;848/3
.;A./43908/063/37849
=1,5/.201<063/37849
4,553/7063/37849
63/37849093234781/03A12,781/
.-3584./0=1,5/.201<0-3;84.2063/378490>.5704093-8/.
-,7.781/05393.54+0!063/3784071D841216@0./;03/A851
.//.2390;3063/378C,3
63/378403>8;3-81216@
63/378403/68/3358/60/3:9
.-3584./0=1,5/.201<0-3;84.2063/378490>.570?0/3,51
3,51>3./0=1,5/.201<0+,-./063/37849
-1234,2.50./;063/35.2063/37849
3,51>3./0=1,5/.201<0-3;84.2063/37849
4,553/701>8/81/08/063/378490B0;3A321>-3/7
?-4063/37849
63/378490./;0-1234,2.50?81216@
>+.5-.4163/37849
63/3784073978/6
>9@4+8.7584063/37849
/3,5163/37849
63/378409148.20./;063/35.20>9@4+1216@0-1/165.>+
4@7163/37840./;063/1-305393.54+
=1,5/.201<0/3,5163/37849
63/3784.205393.54+
=1,5/.201<063/37840>9@4+1216@
63/390B063/378409@973-9
=1,5/.201<0+,-./063/37849
.//,.2053A83:01<063/1-8490./;0+,-./063/37849
+,-./063/37849
5,998./0=1,5/.201<063/37849
.-3584./0=1,5/.201<0-3;84.2063/378490>.570.
8--,/163/37849
753/;908/063/37849
.//,.2053A83:01<063/37849
>219063/37849
4./435063/378490./;04@7163/37849
=1,5/.201<0-3;84.2063/37849
?814+3-84.2063/37849
.-3584./0=1,5/.201<0+,-./063/37849
3,51>3./0=1,5/.201<08--,/163/37849
/3:063/378490B0914837@
982A.3063/3784.
63/3784.
/.7,53063/37849
41/935A.781/063/37849
?3+.A815063/37849
-1234,2.5063/378490./;0-37.?1289<,/6.2063/378490./;0?81216@
-1234,2.50>+@2163/378490./;03A12,781/
=1,5/.201<0./8-.20?533;8/60./;063/37849
=1,5/.201<0.998973;053>51;,4781/0./;063/37849
.//.2901<0+,-./063/37849
428/84.2063/37849
./8-.2063/37849
7:8/05393.54+0./;0+,-./063/37849
+,-./0-1234,2.5063/37849
!
"
#$
%#
%&
'(
$%
$)
(*
*'
"!
""
&$
)#
)&
#!(
##%
##)
#%*
#''
#$!
#$"
#($
#*#
Johan Bollen - jbollen@indiana.edu
#*&
#"(
#&%
#&)
#)*
%!'
%#!
%#"
%%$
%'#
%'&
%$(
%(%
%()
%**
%"'
%&!
%&"
%)$
'!#
'!&
'#(
'%%
'%)
''*
Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Overview Future Research Relevant papers
Relevant papers
Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan
Chute
A Principal Component Analysis of 39 Scientific Impact
Measures.
PLoS ONE, June 2009. URL:
http://dx.plos.org/10.1371/journal.pone.0006022.
Michael Kurtz and Johan Bollen.
Usage bibliometrics.
Annual Review of Information Science and Technology, 2010
Johan Bollen, Herbert Van de Sompel, Aric Hagberg,Luis
Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila
Balakireva.
Clickstream data yields high-resolution maps of science.
PLoS One, February 2009.
Johan Bollen - jbollen@indiana.edu
Tracking science in real-time from large-scale usage data
Download