Introduction MESUR Mapping Science Metrics Survey Discussion Tracking science in real-time from large-scale usage data Johan Bollen - jbollen@indiana.edu Indiana University School of Informatics and Computing Center for Complex Networks and Systems Research Cognitive Science Program July 29, 2010 Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Outline 1 Introduction Problem statement Usage data 2 MESUR MESUR overview Creating MESUR’s reference data set 3 Mapping Science 4 Metrics Survey 5 Discussion Overview Future Research Relevant papers Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Why study science? Tremendous importance Enormous amount of resources and people involved: Allocation of resources: funding agencies, policy makers, the public, corporations, the scientists themselves Social systems research: Ability to learn about general properties of similar social systems Unrelated picture of my daughter Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Science as a social system Actors: scientists, stakeholders Relations (often focused on exchange of knowledge) Informal interactions: meetings, workshops, conversations, messages, etc Formal: affiliations, collaborations, projects, publications Artifacts: articles, journals, reports, data, software Johan Bollen - jbollen@indiana.edu Xiaoming Liu, Johan Bollen, Michael L. Nelson, and Herbert Van de Sompel. Co-authorship networks in the digital library research community. Information Processing and Management, 41(6):14621480, December 2005 Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data The measure of science in the print-era A word on the traditional way Gold standard: citation data Extracted from “publication” data. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data The measure of science in the print-era A word on the traditional way Gold standard: citation data Extracted from “publication” data. From print comes citation: Science is a gift-economy Steal text, but not ideas. Currency is acknowledgement of influence: citations More citations, more influence Track scientific activity from citation data Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data The measure of science in the print-era A word on the traditional way Gold standard: citation data Extracted from “publication” data. From print comes citation: Science is a gift-economy Steal text, but not ideas. Currency is acknowledgement of influence: citations More citations, more influence Track scientific activity from citation data Citations are derived from peer-reviewed publications Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data From this springs an entire industry Applications of citation data End-user services and bibliometrics Web of Science, Scopus, publishers, institutions, etc. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data From this springs an entire industry Applications of citation data End-user services and bibliometrics Web of Science, Scopus, publishers, institutions, etc. Citation-based services Reference linking Variety of end-user services Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data From this springs an entire industry Applications of citation data End-user services and bibliometrics Web of Science, Scopus, publishers, institutions, etc. Citation-based services Reference linking Variety of end-user services Citation-based assessment and bibliometrics Impact metrics Analytics Mapping Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Two problems with present citation-based approach Data (1) and metrics (2) Issues: 1 Inherent features of citation data due to its origins in published material 2 Existing metrics fail to acknowledge (1), and ignore network properties of science as a social system. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation data Delays and partiality Citation problem 1: Data Citation data is late and partial indicator of scientific activity time 0 year +0.5 year +1.5 years +2 years +2.5 years +3.5 years discovery research publication discovery research publication ? citation citation data Publication delays Domain-dependent practices Publicationt → publicationt−1 → citation DB Community: Publishing authors only Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation data Metrics vs. networks Citation problem 2: networks Ignoring network properties of science Resource Management +%*.%73$"%$,/ /*-%0%73$"%*86*+%$/3 *-3.%)'"*( 7,&'$+%'/,4%.*-*1 /*-%0%7,&'$+%&'$ $,34%$/3%$,/%*.%0 (*2,$'%:,-'%0 bad X 0%*.%*/*2%2'&.*+,4 *&/"%2'&.*+,4 '/,4,1# */+*%",&+3/ !4*-+%*-2%$,34 !4*-+%/'44 z X Food .'*+%$/3 *.%-*+ *.%0%/43-%-6+& /&,!%$/3 )&3+%0%2'&.*+,4 43.-,4%,/'*-,1& /43-%-'6&,!"#$3,4 0%';!%.*&%)3,4%'/,4 !4*-+%0 0%';!%),+ !4*-+* .*&%)3,4 1*$+&,'-+'&,4,1# Livestock Dermatology "#2&,)3,4,13* ,'/,4,13* ,39,$ ",&+$/3'-/' 0%*.%$,/%",&+3/%$/3 0%"#2&,4 5*+'&%&'$,6&%&'$ "#2&,4%!&,/'$$ '(,46+3,'4'/+&,'-%/43-%-'6&, 0%*/,6$+%$,/%*. )&3+%0%,!"+"*4.,4 .*&%'/,4!!&,1%$'& .6+*+%&'$ .,4%'/,4 0%*-3.%$/3 *-3.%&'!&,2%$/3 &'(%)&*$%:,,+'/0%&'!&,2%7'&+34 0%2*3&#%$/3 +"'&3,1'-,4,1# !"#+,/"'.3$+&# '6&%0%*!!4%!"#$3,4 '6&%&'$!3&%0 *.%&'(%&'$!3&%23$ 23*)'+'$ )'"*(%)&*3-%&'$ !$#/",!"*&.*/,4,1# !"*&.*/,4%)3,/"'.%)' 0%*1&%7,,2%/"'. )3,4%&'!&,2 !,64+&#%$/3 *&/"%,!"+"*4.,4!/"3/ ,!"+"*4.,4,1# .'2%$/3%$!,&+%';'& *.%0%,!"+"*4.,4 *$3*-%*6$+&*4%0%*-3. '6&%0%!"*&.*/,4 0%*!!4%!"#$3,4 /3&/%&'$ $"'4;$=>%$"'4;4=> $*2*)$ Medicine *.%0%&'$!%/&3+%/*&' 0%+",&*/%/*&23,(%$6& *--%+",&*/%$6&1 /"'$+ 0%!"*&.*/,4%';!%+"'& *.%0%!"#$3,4!"'*&+%/ 3-('$+%,!"+"%(3$%$/3 932-'#%3-+ /3&/64*+3,1'-'+3/$ +"&,.)%"*'.,$+*$3$ *&/"%(3&,4 .,4%)3,4%'(,4 )&*3-%&'$ 3-+%0%/*&23,4 *.%"'*&+%0 0%/43-%3-('$+ *&+'&3,$/4%+"&,.%(*$ 0%*.%/,44%/*&23,4 0%)3,4%/"'. 0%1'-%(3&,4 0%-*+%!&,26/+$ *.%0%/*&23,4 4*-/'+ '6&%"'*&+%0 -'5%'-14%0%.'2 0*.*!0%*.%.'2%*$$,/ '-2,/&3-,4,1# 0%/43-%'-2,/&%.'+*) *-*4%)3,/"'. */+*%/&#$+*44,1&%2 -'6&,$/3%4'++ 0%3..6-,4 */+*%/&#$+*44,1&%' !'23*+&3/$ 0%-'6&,!"#$3,4 0%*!!4%/&#$+*44,1&*&/"%)3,/"'.%)3,!"#$ (3&,4,1# 0%(3&,40%!"#$3,4!4,-2,*/+*%/&#$+*44,1&%* /,/"&*-'%2)%$#$+%&'( !&,+'3-%$/3 '6&%0%)3,/"'. 0%';!%.'2 (*//3-' .3/&,)3,4!69 !%-*+4%*/*2%$/3%6$* 0%-'6&,$/3 0%/"'.%$,/%!'&9%+%? /43-%3-7'/+%23$ 0%-'6&,/"'. 0%3-7'/+%23$ 0%/'44%)3,4 *-*4%/"'. -'6&,$/3'-/' .'+",2%'-:#.,4 )3,/"'.%)3,!"%&'$%/, 1'-'%2'( ';!%)&*3-%&'$ )3,,&1*-%.'2%/"'. .,4%/'44%)3,4 "'4(%/"3.%*/+* )4,,2 !&,+'3-$';!%/'44%&'$ )3,,&1%.'2%/"'.%4'++ 0%/,.!%-'6&,4 3-7'/+%3..6'.),%0 )3,/"'.%0 /'44'6&%0%-'6&,$/3 -'6&,.,4%)3,4%/'44 7')$%4'++ 0%.'2%/"'. +'+&*"'2&,-!*$#..'+& *-+3.3/&,)%*1'-+$%/" 0%.,4%)3,47')$%0 0%,&1%/"'. /'44%.,4%437'%$/3 0%/'44%$/3 $#-4'++ .,4%.3/&,)3,4 +'+&*"'2&,+'+&*"'2&,-%4'++ )3,/"'.3$+&# ,-/,1'-' 7'&+34%$+'&34 $#-+"'+3/%/,..6,&1%)3,.,4%/"'. "6.%&'!&,2 $#-+"'$3$!$+6++1*&+ '6&%0%3..6-,4 -6/4'3/%*/32$%&'$ /*-%0%/"'. 0%/43-%.3/&,)3,4 3..6-,4%&'( '6&%0%,&1%/"'. 1'-' /"'.%)'& )3,/"3.%)3,!"#$%*/+* ,&1%4'++ /43-%/*-/'&%&'$ 0%/43-%,-/,4 /*-/'&%&'$ :%*-,&1%*441%/"'. 2'('4,!.'-+ 2'(%)3,4 0%)*/+'&3,4 0%,&1*-,.'+%/"'. 0%-*+4%/*-/'&%3 /"'.%/,..6)3,3-7,&.*+3/$ /,,&23-%/"'.%&'( '6&%0%3-,&1%/"'. *-1'5%/"'.%3-+%'23+ $/3'-/' )%/"'.%$,/%0!3-,&1%/"3.%*/+* 3-,&1%/"'. )3,!"#$%0 !,4#"'2&,(3$3,-%&'$ 2'(%2#-*. ,&1*-,.'+*443/$ 0%*.%/"'.%$,/ 3-+%0%&*23*+%,-/,4 *-+3/*-/'&%&'$ -*+6&' 0%.,4%/*+*4%*!/"'. /"'.!'6&%0 2*4+,-%+ /"'.%&'( $+62%$6&7%$/3%/*+*4 7'.$%.3/&,)3,4%4'++ *!!4%'-(3&,-%.3/&,) 0%'4'/+&,*-*4%/"'. /*-/'& *.%0%,)$+'+%1#-'/,4 .3/&,!,&%.'$,!,&%.*+ *!!4%/*+*4%*!1'0%!",+,/"%!",+,)3,%* 0%/*+*40%.,4%$+&6/+ 1#-'/,4%,-/,4 0%!,4#.%$/3%!,4%/"'. !'&/'!+%!$#/",!"#$ ,)$+'+%1#-'/,4 .*/&,.,4'/64'$ 4*-1.63& $!'/+&,/"3.%*/+*%* 0%/,44,32%3-+'&7%$/3 '4'/+&,/"3.%*/+* 0%!"#$%/"'.!6$ /*+*4%+,2*# 0%!"#$%/"'.%) !"#$%/"'.%/"'.%!"#$ .*/&,.,4%/"'.%!"#$ 0%/,.!6+%/"'. .,4%!"#$ /"'.%.*+'& /"'.,$!"'&' 0%*!!4%!,4#.%$/3 /,44,32%$6&7*/'%* '-(3&,-%$/3%+'/"-,4 /"'.%1',4 0%!"#$%/"'.%* 0%!,4#.%$/3%!,4%!"#$ 5*+'&%&'$ 3-2%'-1%/"'.%&'$ !,4#.'& 0%/"'.%!"#$ *2(%.*+'& /"'.%!"#$%4'++ /"'.%!"#$ +"',/"'.!0%.,4%$+&6/ 1',/"3.%/,$.,/"3.%*/ 0%'4'/+&,/"'.%$,/ 1',/"'.%1',!"#%1',$# 0%.,4%$!'/+&,$/ 74632%!"*$'%'86343)& 5*+'&%'-(3&,-%&'$ '*&+"%*-2%!4*-'+*&#%$/3'-/'%4'++'&$ 1',!"#$%&'$%4'++ 3-+%0%86*-+6.%/"'. 43+",$ $6&7%$/3 6-!6) 0%,!+%$,/%*.%* /,-+&3)%.3-'&*4%!'+& *3/"'%0 0%!'+&,4 *.%.3-'&*4 3/*&6$ *+.,$%'-(3&,0%-,-!/&#$+%$,432$ 0%(,4/*-,4%1',+"%&'$ /,..6-3/*+3,0%.*+'&%$/3 /"'.%'-1%$/3 0%'6&%/'&*.%$,/ !"#$%&'(%' +'/+,-,!"#$3/$ 0%!"#$!/,-2'-$%.*+ 1',4,1# 0%!"#$%)!*+%.,4%,!+ 1',!"#$%0%3-+ 0%*!!4%!"#$ 0%!"#$%/%$,432%$+*+' 9'#%'-1%.*+'& 0%*+.,$%$/3 .,-%-,+%&%*$+&,-%$,/ *--%1',!"#$!*+.%"#2& *!!4%$6&7%$/3 !"#$%&'(%* 0%!"#$%,/'*-,1& 0%*.%/'&*.%$,/ *$+&,-%*$+&,!"#$ !4*-'+%$!*/'%$/3 .,-%5'*+"'&%&'( 0%1',!"#$%&'$ +"3-%$,432%734.$ *3!%/,-7%! !"#$3/*%* *!!4%!"#$%4'++ !"#$%&'(%) &'(%.,2%!"#$ .*+%$/3%'-1%*!$+&6/+ 0%!"#$%2%*!!4%!"#$ .*+'&%$/3%7,&6. !"#$3/*4%&'(3'5 0%-'6&,$6&1 */+*%.*+'& 0%/43.*+' '6&,!"#$%4'++ 8%0%&,#%.'+',&%$,/ $6&1%-'6&,4 0*!*-'$'%0,6&-*4%,7%*!!43'2%!"#$3/$ *!!4%,!+3/$ !"#$%4'++%* $,432%$+*+'%/,..6$!*/'%$/3%&'( 0%/&#$+%1&,5+" 0%$+&6/+%1',4 3-06&# 0%';!%+"',&%!"#$<!%$,/%!",+,!,!+%3-$ *$+&,!"#$%$!*/'%$/3 )%$'3$.,4%$,/%*. !"#$3/*%) .'+*44%.*+'&%+&*-$%* -'6&,$6&1'&# *--6%&'(%*$+&,-%*$+& !6)4%*$+&,-%$,/%!*/ */+*%.'+*44%.*+'& ,!+%/,..60%.*1-%.*1-%.*+'& 0%*+.,$%$,4!+'&&%!"# 0%*&+"&,!4*$+# !"#$%&'(%4'++ ,!+%';!&'$$ *$+&,-%0 -6/4%3-$+&6.%.'+"%) *$+&%$,/%! !"#$%&'! 0%*44,#%/,.!2 0%!"#$%*!.*+"%1'*$+&,!"#$%0%$6!!4%$ '6&%!"#$%0%) 0%),-'%0,3-+%$6&1%)& ,!+%4'++ /43-%,&+",!%&'4*+%& *--%!"#$!-'5%#,&9 0%),-'%0,3-+%$6&1%*. 0%74632%.'/" *&+"&,$/,!# *$+&,!"#$%0$,4%!"#$ 0%!"#$%/"'.%$,432$ ,!+%'-1 !"#$%74632$ $!3-' -6/4%!"#$%* 0%!'23*+&%,&+",!'2 3-+%0%.,2%!"#$%) 0%!'23*+&%,&+",!%) 3'''%0%86*-+6.%'4'/+ !"#$%$+*+6$%$,4323%) 0%.*+"%!"#$ '6&%!"#$%0%* !"#$%&'(%2 3-+%0%.,2%!"#$%* *.%0%$!,&+%.'2 &'.,+'%$'-$%'-(3&,!"#$%!4*$.*$ !"#$3/*%/ !"#$%&'(%/ !"#$%4'++%) -6/4%!"#$%) 0%431"+5*('%+'/"-,4 /4*$$3/*4%*-2%86*-+6.%1&*(3+# 0%!"#$%1%-6/4%!*&+3/ '4'/+&,-%4'++ -6/4%!"#$%)!!&,/%$6! !4*$.*%!"#$%/,-+&%7 -6/4%2*+*%$"''+$ .,2%!"#$%4'++%* /,..6-%.*+"%!"#$ '6&%!"#$%0%/ 0%/"&,.*+,1&%* 0%*!!4%!,64+&#%&'$ 0%$!''/"%4*-1%"'*&%& 0%$,6-2%(3) 23*)'+'$%/*&' !4*-+%!"#$3,4 0%7,,2%$/3 0%$/3%7,,2%*1& 7,,2%/"'. Measuring impact, influence, prestige from citation data: +*4*-+* *-*4%)3,*-*4%/"'. *-*4%/"3.%*/+* good XX z X '4'/+&,*-*4 Chemistry ObGyn Material Sciences Geology Sports Medicine Physics 0%"31"%'-'&1#%!"#$ !"#$%*+,.%-6/4< :%!"#$%/%!*&+%73'42$ 0%/,$.,4%*$+&,!*&+%! &*23,4,1# 3-+%0%&'.,+'%$'-$ 3'''%+%3-7,&.%+"',&# -6/4%3-$+&6.%.'+"%* *.%0%&,'-+1'-,4 -6/4%76$3,3-+%0%"'*+%.*$$%+&*- 4'/+%-,+'$%/,.!6+%$/ More citations is better than less citations In citation network, central position is better than outskirts. $3*.%0%/,.!6+ /,..6-%*/. +"',&%/,.!6+%$/3 0%*$$,/%/,.!6+%.*/" Computer Science Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation problem no. 2 Metrics vs. networks This approach is epitomized in Thomson-Reuters Impact Factor: Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation problem no. 2 Metrics vs. networks This approach is epitomized in Thomson-Reuters Impact Factor: Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation problem no. 2 Metrics vs. networks This approach is epitomized in Thomson-Reuters Impact Factor: Standard citation graph representation G = (V , E , W ) E ⊆ V2 W : E → N+ However: ∀(ni , nj ) ∈ E : pubtime(ni ) > pubtime(ni ) Impact Factor: normalized in-degree IFj = P i wij Nj Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation problem no. 2 Between Big Star and Britney Spears Is more always better? Why context matters Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation problem no. 2 Between Big Star and Britney Spears Is more always better? Why context matters Sold 50M records Influenced who? High popularity, low prestige. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation problem no. 2 Between Big Star and Britney Spears Is more always better? Why context matters Sold 50M records Influenced who? High popularity, low prestige. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation problem no. 2 Between Big Star and Britney Spears Is more always better? Why context matters Sold 50M records Influenced who? High popularity, low prestige. Johan Bollen - jbollen@indiana.edu Sold 50k records Influenced REM, B52s, Teenage Fanclub, etc. Low popularity, high influence Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Citation problem no. 2 Between Big Star and Britney Spears Silly? That’s how we evaluate scientific impact right now! The Impact Factor and lots of other citation-based metrics are presently being used to assess the quality of publications, journals, authors, institutions, and even entire countries by proxy. Note: Lots of developments on better metrics (cf. Eigenfactor: Bergstrom and Rosvall), but still reliance on citation data. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data New developments PageRank for citation graphs P 5 v1 50 v3 15 w Impact Factor: IFj = Ni j ij : normalized citation count origin of citation disregarded v2 ? 10 3 40 1 60 20 2 20 20 ? 4 A different route: P IF (vi , ) ' λ j IFj P 1 IF (vi , ) ' λ j IFj × O(v j) P 1 PR(vi ) ' λ j PR(vj ) × O(v j) P (1−λ) 1 PR(vi ) ' N + λ j PR(vj ) × O(v j) P PRw (vi ) = (1−λ) j PRw (vj ) × w (vj , vi ) N +λ Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data New developments, contd PageRank for citation graphs ISI IF PRw rank value Journal value (x 103 ) Journal 1 52.28 ANNU REV IMMUNOL 16.78 NATURE 2 37.65 ANNU REV BIOCHEM 16.39 J BIOL CHEM 3 36.83 PHYSIOL REV 16.38 SCIENCE 4 35.04 NAT REV MOL CELL BIO 14.49 P NATL ACAD SCI USA 5 34.83 NEW ENGL J MED 8.41 PHYS REV LETT 6 30.98 NATURE 5.76 CELL 7 30.55 NAT MED 5.70 NEW ENGL J MED 8 29.78 SCIENCE 4.67 J AM CHEM SOC 9 28.18 NAT IMMUNOL 4.46 J IMMUNOL 10 28.17 REV MOD PHYS 4.28 APPL PHYS LETT Table: The highest ranking journals according to ISI IF and Weighted PageRank (JCR2003) Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Two problems with present citation-based approach Data (1) and metrics (2) Issues: 1 Inherent features of citation data due to its origins in published material 2 Existing metrics fail to acknowledge (1), and ignore network properties of science as a social system. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Post-print, online era, aka known as “today”: usage data? Most use of scholarly resources is now mediated by online services time 0 year +0.5 year +1.5 years +2 years discovery research publication discovery +2.5 years +3.5 years research publication citation usage data search queries text data early indicators Recorded by nearly every scholarly service Captures early phases of scientific activity Includes a wide variety of resource types Johan Bollen - jbollen@indiana.edu citation data late indicators Activities of larger scientific communities, e.g. not limited only to authors Scale: cf. Elsevier announced 1B downloads in 2006 vs. 650M citation in WoS. Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Problem statement Usage data Issues with usage data Lots of interest in leveraging usage data for models of science, tracking scientific activity, metrics, etc. Cf. COUNTER, Ex Libris bX, and many more. However many different issues: Silo: recorded for particular service for a particular user community Lack of standards: recorded in different manner, for different systems Lack of research: how leverage usage data for scientific tracking, modeling, metrics, etc. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set Outline 1 Introduction Problem statement Usage data 2 MESUR MESUR overview Creating MESUR’s reference data set 3 Mapping Science 4 Metrics Survey 5 Discussion Overview Future Research Relevant papers Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR project: survey the potential of usage data at very large-scale Studying scientific activity Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR project: survey the potential of usage data at very large-scale Studying scientific activity Can we model and study patterns of scientific activity from large-scale, representative usage data? Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR project: survey the potential of usage data at very large-scale Studying scientific activity Can we model and study patterns of scientific activity from large-scale, representative usage data? What can we learn about impact and structure from patterns of scientific activity from usage data? Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR history Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR history 2006-2008: Andrew W. Mellon Foundation Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR history 2006-2008: Andrew W. Mellon Foundation 2008-2009: Los Alamos National Laboratory Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR history 2006-2008: Andrew W. Mellon Foundation 2008-2009: Los Alamos National Laboratory 2009-: Indiana University, School of Informatics and Computing Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR history 2006-2008: Andrew W. Mellon Foundation 2008-2009: Los Alamos National Laboratory 2009-: Indiana University, School of Informatics and Computing 2009-2013: National Science Foundation Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR history 2006-2008: Andrew W. Mellon Foundation 2008-2009: Los Alamos National Laboratory 2009-: Indiana University, School of Informatics and Computing 2009-2013: National Science Foundation Team: PI, co-PI, 2 scientists, 2 full-time developers, and 1 PhD student. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR objective Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR dataflow Data providers Filtering & deduplication impact metrics Publishers Usage data Parser 1 Aggregators Usage date Parser 2 institutions Usage date Parser 3 ... MESUR ontology ingestion Models MESUR reference data set Models ... Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR ontology Aggregators Usage date Parser 2 ingestion Models 0.0e+00 Parser 1 number of events impact metrics Publishers 1.0e+07 Filtering & deduplication Usage data 5.0e+06 Data providers 1.5e+07 Creating MESUR’s data set institutions ... Usage date Parser 3 MESUR reference data set 2004 2005 Models 2006 2007 2008 2009 date ... Providers 2006-20010: BMC, Blackwell, UC, CSU (23), EBSCO, ELSEVIER, EMERALD, INGENTA, JSTOR, LANL, MIMAS/ZETOC, THOMSON, UPENN (9), UTEXAS events 1,000,000,000 usage events citations +500,000,000 citations, articles and serials +50M articles, +-100,000 serials Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set Data requirements Minimum requirements Separate user requests Date-time stamp down to second Document identifier or sufficient metadata to de-duplicate Session ID (or anonymized user ID) Request type identifier http://tweety.lanl.gov/public/schemas/2007-01/mesur.owl Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion MESUR overview Creating MESUR’s reference data set MESUR’s OWL/RDF ontology 1 1 Rodriguez, Bollen & Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage. JCDL07 - Based on OntologyX work. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Outline 1 Introduction Problem statement Usage data 2 MESUR MESUR overview Creating MESUR’s reference data set 3 Mapping Science 4 Metrics Survey 5 Discussion Overview Future Research Relevant papers Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Reference data set Subsetting Domain Usage UC Degrees JCR Natural Science Social Sciences Humanities 37% 45% 14% 39% 46% 15% 92.8% 7.2% Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9) Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Reference data set Subsetting Common time period March 1st 2006 - February 1st 2007 Domain Usage UC Degrees JCR Natural Science Social Sciences Humanities 37% 45% 14% 39% 46% 15% 92.8% 7.2% Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9) Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Reference data set Subsetting Common time period March 1st 2006 - February 1st 2007 Providers: Thomson Scientic (Web of Science), Elsevier (Scopus), JSTOR, Ingenta, University of Texas (9 campuses, 6 health institutions), and California State University (23 campuses) Domain Usage UC Degrees JCR Natural Science Social Sciences Humanities 37% 45% 14% 39% 46% 15% 92.8% 7.2% Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9) Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Reference data set Subsetting Common time period March 1st 2006 - February 1st 2007 Providers: Thomson Scientic (Web of Science), Elsevier (Scopus), JSTOR, Ingenta, University of Texas (9 campuses, 6 health institutions), and California State University (23 campuses) Scale: 346,312,045 usage events, 97,532 serials (many of which not journals) Domain Usage UC Degrees JCR Natural Science Social Sciences Humanities 37% 45% 14% 39% 46% 15% 92.8% 7.2% Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9) Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Article clickstreams usage data log: U = {u1 , u2 , · · · , un } u = {s, t, a} s = session identifier , t = a date-time and a = article f ∈ F = clickstreams extracted from U f ⊂ U, where f = (∀u ∈ U, ∃s : s(u) ∧ t(ui ) < t(ui+1 )) s(u) and t(u): session identifier and date-time of interaction u. clickstream c1 v1 clickstream c2 v2 v3 v1 v3 journals a1 a2 a3 a4 a1 a4 articles i1 i2 i3 i4 i1 i2 online user interactions session1 session2 time Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Journal Clickstreams article clickstream: fa = (a1 , a2 , · · · , ak ) journal clickstream: fv = (v1 , v2 , · · · , vk ) We observe: N(vi , vj ) for all pairs (vi , vj ) in which j = i + 1 From which follows: N(vi , vj ) P(vi , vj ) = P j N(vi , vj ) and M whose entries mi,j = P(vi , vj ). Johan Bollen - jbollen@indiana.edu v2 15 v1 5 v3 30 v4 v2 0.3 v1 0.1 v3 0.6 v4 Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Examples of prominent connections vi American Journal of International Law Journal of Educational Sociology Surface Science Journal of Organic Chemistry Ecological Applications Annals of Mathematics vj International Organization International Affairs International and Comparative Law Quarterly Foreign Policy American Political Science Association American Journal of Sociology Journal of Higher Education Journal of Negro Education American Sociological Review Social Forces Physical Review B Applied Surface Science Physical Review Letters Journal of Chemical Physics Applied Physics Letters Journal of the American Chemical Society Tetrahedron Letters Tetrahedron Organic Letters Angewandte Chemie Ecology Conservation Biology Bioscience Annual Review of Ecology and Systematics Clinical and Experimental Allergy American Journal of Mathematics American Mathematical Monthly PNAS Econometrica Mathematics Magazine Johan Bollen - jbollen@indiana.edu p(vi , vj ) 0.0207 0.0184 0.0171 0.0167 0.0140 0.0334 0.0303 0.0286 0.0276 0.0249 0.0704 0.0341 0.0339 0.0333 0.0327 0.0873 0.0865 0.0602 0.0532 0.0305 0.0965 0.0524 0.0215 0.0215 0.0191 0.0705 0.0579 0.0156 0.0082 0.0077 N(vi , vj ) 9,292 8,254 7,654 7,500 6,291 2,790 2,529 2,389 2,303 2,076 2,555 1,239 1,230 1,207 1,188 4,141 4,105 2,857 2,526 1,448 13,659 7,408 3,043 3,043 2,699 5,392 4,432 1,195 624 587 N(vi ) 448,034 83,419 36,282 47,439 141,481 76,526 Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Network parameters Parameter Journals Edges Matrix density Strongly Connected Components (SCC) Journals in SCC Average journal clustering coefficient (SCC) Diameter of largest SCC Johan Bollen - jbollen@indiana.edu Network matrix M M0 97,532 2,307 6,783,552 50,000 0.071% 0.939% 16,474 236 80,934 1,944 0.285 0.514 37 14 Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion common date range all edges most reliable edges MESUR log data matrix M matrix M' 1B interactions 346M interactions 97k journals 6.7M edges 2307 journals 50k edges 36000 map 28000 Classification code: AAT taxonomy 20000 root 0 5000 12000 N(vi,vj) rankings 0e+00 3e+04 1e+05 Dewey JCR 2e+05 rank 1 2 3 4 5 Journals Fruchterman (1991) Graph Drawing by Force-directed Placement. Software, 21(11):1129-1164 Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Minerology Acoustics Manufacturing Material science Engineering Production research Law Economics Clinical trials Thermodynamics Applied physics Clinical Pharmacology Demographics Statistical physics International studies Sociology Statistics Electrochemistry Physical chemistry Public health Polymers Dermatology Asian studies Organic chemistry Philosophy Social work Child Psychology Analytical Chemistry Education Alternative energy Religion Social and personality psychology Biochemistry Nursing Pharmaceutical research Anthropology Psychology Chemical Engineering Archeology Human geography Brain studies Cognitive Science Sports medicine Toxicology Miocrobiology Ecology Biotechnology Classical studies Nutrition Language Music Architecture Design Plant biology Biodiversity Geography Tourism Environmental Science Soil/Marine biology Animal Behavior Neurology Plant agriculture Hydrology Genetics Agriculture Geology Physiology Plant genetics Johan Bollen - jbollen@indiana.edu Brain research Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion The Arts and Architecture Thesaurus (Getty Research Center) Table: Distance from AAT root (α) and number of classifications Nc at that level. Each α produces a finer-grained separation of scientific disciplines. Distance (α) 1 2 3 4 Nc 4 8 31 195 Example classifications Natural sciences, social sciences, humanities, · · · Biology, chemistry, physics, · · · Classics, communication, engineering, · · · Allergy, anesthesiology, applied linguistics, · · · Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion AAT taxonomy root Dewey JCR 1 Johan Bollen - jbollen@indiana.edu 2 3 4 5 Journals Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Cross-validating the clickstream map ( 1 f (vi , vj , α) = 0 AAT log data a0,0 f (vi , vj , α) ! Aα = ... an,0 ' P (vi , vj ) p0,0 & . ! M! = .. pn,0 ··· .. . ··· ··· .. . ··· Cα (vi ) = Cα (vj ) Cα (vi ) 6= Cα (vj ) a0,n .. , ai,j = 1 . ai,j = 0 an,n & F2 & p0,n " # .. , Ni,j ≤ µ0.5 (Nk )"F . Ni,j > µ0.5 (Nk )$ 1 $ % pn,n χ2 contingency table at α α = 1 : p < 0.0001 Figure the map structure given by M ! to journal relationships derived α 6.=Cross-validating 2 : p < 0.0001 from AAT journal classifications, i.e. matrix Aα . α = 3 : p < 0.0001 α = 4 : p < 0.0001 Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Betweenness centrality Cb (vk ) = X σi,j (vk ) σi,j (1) i6=j6=k Table: Ranking of journals from M 0 according to betweenness centrality. Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Journal Top-level AAT classification Science Proceedings of the National Academy of Sciences Environmental Health Perspectives Chemosphere Journal of Advanced Nursing Nature Ecology Milbank Quarterly Applied and Environmental Microbiology Child Development Behavioral Ecology and Sociobiology Journal of Colloid and Information Science American Anthropologist Journal of Biogeography Materials Science and Technology Johan Bollen - jbollen@indiana.edu Natural Sciences Natural Sciences Natural Science Natural Sciences Natural Sciences Natural Sciences Natural Sciences Natural Sciences Natural Sciences Social Sciences Social Sciences Natural Sciences Social Sciences Natural Sciences Natural Sciences Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion PageRank PR(vi ) = X PR(vj ) 1−λ +λ N O(vj ) (2) j Table: Ranking of journals from M 0 according to PageRank (λ = 0.85). Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Journal Top-level AAT classification Applied Physics Letters Journal of Advanced Nursing Journal of the American Chemical Society Ecology Nature Physical Review B Journal of Applied Physics American Economic Review American Historical Review Physical Review Letters Science Langmuir Journal of Chemical Physics American Anthropologist Annals of the American Academy of Political and Social Science Johan Bollen - jbollen@indiana.edu Natural Sciences Natural Sciences Natural Sciences Natural Sciences Natural Sciences Natural Sciences Natural Sciences Social Sciences Social Sciences Natural Sciences Natural Sciences Natural Sciences Natural Sciences Social Sciences Social Science Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Outline 1 Introduction Problem statement Usage data 2 MESUR MESUR overview Creating MESUR’s reference data set 3 Mapping Science 4 Metrics Survey 5 Discussion Overview Future Research Relevant papers Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Metrics ID Type Measure Source 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 Citation Citation Citation Citaton Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Citation Usage Usage Usage Usage Usage Usage Usage Usage Usage Usage Usage Usage Usage Usage Usage Usage Scimago Journal Rank Immediacy Index Closeness Centrality Cites per doc Journal Impact Factor Closeness centrality Out-degree centrality Out-degree centrality Degree Centrality Degree Centrality H-Index Scimago Total cites Journal Cite Probability In-degree centrality In-degree centrality PageRank PageRank PageRank PageRank Y-factor Betweenness centrality Betweenness centrality Citation Half-Life Closeness centrality Closeness centrality Degree centrality PageRank PageRank In-degree centrality Out-degree centrality PageRank PageRank Betweenness centrality Betweenness centrality Degree centrality Out-degree centrality In-degree centrality Journal Use Probability Usage Impact Factor Scimago/Scopus JCR 2007 JCR 2007 Scimago/Scopus JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 Scimago/Scopus Scimago/Scopus JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 JCR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 MESUR 2007 Johan Bollen - jbollen@indiana.edu Network parameters Undirected, weighted Undirected, unweighted Directed, weighted Directed, unweighted Undirected, weighted Undirected, unweighted Directed, unweighted Directed, weighted Directed, unweighted Undirected, unweighted Undirected, weighted Directed, weighted Directed, weighted Undirected, weighted Undirected, unweighted Undirected, weighted Undirected, unweighted Undirected, unweighted Undirected, unweighted Directed, unweighted Directed, unweighted Directed, unweighted Directed, weighted Undirected, weighted Undirected, unweighted Undirected, weighted Undirected, weighted Directed, weighted Directed, weighted PC1 PC2 -0.974 1.659 0.339 -1.311 -1.854 -1.388 -3.191 -2.703 -4.850 -4.398 -3.326 -4.926 -5.389 -5.302 -5.380 -4.476 -4.929 -4.160 -3.103 -2.971 -0.462 -0.474 / 3.130 3.100 3.271 3.327 3.463 3.463 3.484 3.780 3.813 3.988 3.957 5.293 5.302 5.286 8.914 / -8.296 -7.046 -6.284 -6.192 -5.937 -4.827 -4.215 -4.015 -2.834 -2.643 -2.003 -1.722 -1.647 -1.429 -1.554 0.108 0.731 0.864 0.333 0.317 0.872 1.609 / 2.683 3.899 3.873 4.192 4.336 4.015 3.994 4.217 4.223 4.271 3.698 3.528 3.518 3.531 1.833 / ρ̄ 0.556? 0.508? 0.565? 0.588? 0.592? 0.619 0.642 0.640 0.690 0.691 0.681 0.712 0.710 0.717 0.712 0.693 0.726 0.696 0.659 0.657 0.643 0.642 0.037 0.703 0.731 0.729 0.728 0.727 0.728 0.727 0.710 0.710 0.699 0.693 0.683 0.683 0.683 0.593 0.279 Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion 01.00 R10×10 = B0.71 B0.77 B B0.52 B B B0.79 B B0.55 B B0.69 B B0.63 @ 0.60 0.18 0.71 0.99 0.52 0.69 0.79 0.85 0.49 0.44 0.49 0.22 0.77 0.52 1.00 0.62 0.63 0.39 0.70 0.73 0.68 0.20 0.52 0.69 0.62 1.00 0.68 0.78 0.49 0.56 0.65 0.06 0.79 0.79 0.63 0.68 1.00 0.82 0.66 0.62 0.66 0.15 0.55 0.85 0.39 0.78 0.82 1.00 0.40 0.40 0.50 0.13 Johan Bollen - jbollen@indiana.edu 0.69 0.49 0.70 0.49 0.66 0.40 1.00 0.89 0.85 0.53 0.63 0.44 0.73 0.56 0.62 0.40 0.89 1.00 0.97 0.45 0.60 0.49 0.68 0.65 0.66 0.50 0.85 0.97 1.00 0.42 0.181 0.22C 0.20C C 0.06C C 0.15C C 0.13C C C 0.53C C 0.45C A 0.42 1.00 19: Citation PageRank 5: Journal Impact Factor 22: Citation Betweenness 6: Citation Closeness 11: Citation H-index 1: Citation Scimago Journal Rank 31: Usage PageRank 34: Usage Betweenness 24: Usage Closeness 39: Usage Impact Factor Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion data sources citation data Scimago citation data 2007 JCR 12,751 journals 7,584 journals citation network 7,388 x 7,388 1,4,11,12 2,5,23 3,6,7,8,9,10,13,14 15,16,17,18,19,20 21,22 usage log data MESUR usage network 7,575 x 7,575 intersection 24,25,26,27,28,29,39 31,32,33,34,35,36,37, 38,39 impact measures 39x39 correlation matrix Schematic representation of data sources and processing. Impact measure identifiers. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Principal Component Analysis Proportion of Variance Cumulative Proportion PC1 66.1% 66.1% Johan Bollen - jbollen@indiana.edu PC2 17.3% 83.4% PC3 9.2% 92.6% PC4 4.8% 97.4% PC5 0.9% 98.3% Tracking science in real-time from large-scale usage data ! Introduction MESUR Mapping Science Metrics Survey Discussion 28 27 31 33 29 32 25 26 30 34 35 36 37 24 38 22 18 17 " 16 $%& 15 14 13 12 9 21 19 20 11 10 7 8 !! 6 5 4 3 2 1 !! " ! #" $%# Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Usage Betweenness PageRank + Usage Probability PC2 citation Betweenness PageRank Total Cites - JIF06 Citation Immediacy Cites per Doc - Johan Bollen - jbollen@indiana.edu Tracking science in real-time + from large-scale usage data PC1 Introduction MESUR Mapping Science Metrics Survey Discussion 4: Cites per doc 5: Journal Impact Factor 1: Scimago Journal Rank 2: Citation Immediacy Index 7: Citation Out.degree centr. 8: Citation Out.degree centr. 6: Citation Closen. centr. 3: Citation Closen. centr. 18: Citation PageRank 16: Citation PageRank 20: Citation Y.factor 19: Citation PageRank 21: Citation Between. centr. 22: Citation Between. centr. 9: Citation Degree centr. 10: Citation Degree centr. 17: Citation PageRank 13: Cite Probability 15: Citation In.degree centr. 14: Citation In.degree centr. 12: Scimago Total Cites 11: Scimago H.index 30: Usage Out.degree centr. 29: Usage In.degree centr. 27: Usage PageRank 28: Usage PageRank 26: Usage Degree centr. 25: Usage Closen. centr. 34: Usage Betw. centr. 33: Usage Betw. centr. 24: Usage Closen. centr. 35: Usage Degree centr. 37: Usage In.degree centr. 36: Usage Out.degree 32: Usage PageRank 31: Usage PageRank 38: Journal Use Probability 2.5 Cluster 1 2 3 4 5 2.0 1.5 1.0 0.5 Measures 38 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37 1, 2, 3, 4, 5 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 16, 17, 18, 19, 20, 21, 22 Johan Bollen - jbollen@indiana.edu 0.0 Interpretation Journal Use Probability Usage measures JIF, SJR, Cites per Document measures Total Citation rates and distributions Citation Betweenness and PageRank Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Overview Future Research Relevant papers Outline 1 Introduction Problem statement Usage data 2 MESUR MESUR overview Creating MESUR’s reference data set 3 Mapping Science 4 Metrics Survey 5 Discussion Overview Future Research Relevant papers Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Overview Future Research Relevant papers MESUR: so far Usage data: single largest reference data set of usage, citation and bibliographic data +1,000,000,000 usage events loaded multiple publishers, aggregators and institutions Infrastructure for research program Usage graphs: track scientific flow of activity real-time studies of science inclusive of larger “scholarly community” Metrics: valid, vetted indicators of scientific impact Different facets of scholarly impact Simple metrics, good results. Law of diminishing returns? Hybrid, consensus metrics? Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Overview Future Research Relevant papers MESUR: the bad Sustainability: burden of data collection ad hoc, customized agreements with data providers restrictive agreements with regards to data sharing high costs of maintaining infrastructure: funding Research program: too much to do 3rd party access: better science many eyes on data Metrics and services: Community support and acceptance Can’t be limited to academic exercise Investigations need to be useful Accepted by community, become part of scholarly assessment system Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Overview Future Research Relevant papers Future Research: Two directions Longitudinal research and dynamic models of science Bursty behavior of science Timeseries of reads over time are bursty Coordinated: social networks at work? Modeling Stochastic and agent-based models of scientific activity Citation following? Group-think? Find parameters of human information searching behavior. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Overview Future Research Relevant papers Bursty behavior j[, 2] 100 300 500 700 APPLIED STATISTICS: time series 0 50 100 150 200 250 300 350 250 300 350 250 300 350 Index 400 200 0 −200 j[, 2] − m$y APPLIED STATISTICS: residuals percentiles 0 50 100 150 200 Index 200 0 −200 mr$y 400 APPLIED STATISTICS: residuals smoothed 0 50 100 150 200 Index Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Overview Future Research Relevant papers Contagion? JOURNAL OF HIGH ENERGY PHYSICS AMERICAN JOURNAL OF PHYSICS EUROPHYSICS LETTERS BIOCHIMICA ET BIOPHYSICA ACTA (BBA) − MOLECULAR BASI ANNALS OF PHYSICS ACTA CRYSTALLOGRAPHICA SECTION A CRYSTAL PHYSICS D PHYSICS AND CHEMISTRY OF THE EARTH PARTS A/B/C ASTROPHYSICS AND SPACE SCIENCE BIOCHIMICA ET BIOPHYSICA ACTA (BBA) − BIOMEMBRANES PHYSICS TODAY FOUNDATIONS OF PHYSICS LETTERS EUROPEAN JOURNAL OF PHYSICS NATURE PHYSICS SOLAR PHYSICS PHYSICS AND CHEMISTRY OF LIQUIDS ATMOSPHERIC CHEMISTRY AND PHYSICS MODERN PHYSICS LETTERS A PHYSICA E PHYSICA B PHYSICS OF THE EARTH AND PLANETARY INTERIORS INTERNATIONAL JOURNAL OF MODERN PHYSICS A QUARTERLY REVIEWS OF BIOPHYSICS ASTRONOMY & GEOPHYSICS PHYSICS OF ATOMIC NUCLEI JAPANESE JOURNAL OF APPLIED PHYSICS BIOCHIMICA ET BIOPHYSICA ACTA (BBA) − BIOENERGETICS JOURNAL OF PHYSICS G NUCLEAR AND PARTICLE PHYSICS PHYSICS LETTERS A TECTONOPHYSICS PHYSICS OF FLUIDS JOURNAL OF PHYSICS A GENERAL PHYSICS ASTRONOMY & ASTROPHYSICS CZECHOSLOVAK JOURNAL OF PHYSICS APPLIED PHYSICS B JOURNAL OF THE MECHANICS AND PHYSICS OF SOLIDS MATERIALS CHEMISTRY AND PHYSICS PHYSICS OF PLASMAS NEW JOURNAL OF PHYSICS FOUNDATIONS OF PHYSICS JOURNAL OF STATISTICAL PHYSICS INTERNATIONAL JOURNAL OF THEORETICAL PHYSICS PHYSICA C NUCLEAR PHYSICS B MEDICAL PHYSICS NUCLEAR PHYSICS A PHYSICA A CURRENT APPLIED PHYSICS JOURNAL OF PHYSICS D APPLIED PHYSICS RADIATION PHYSICS AND CHEMISTRY CHINESE PHYSICS LETTERS APPLIED PHYSICS PHYSICA D CHEMICAL PHYSICS LETTERS CANADIAN JOURNAL OF PHYSICS PHYSICA SCRIPTA INTERNATIONAL JOURNAL OF THERMOPHYSICS NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH PHYSICA STATUS SOLIDI (B) CHEMICAL PHYSICS NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH CONTEMPORARY PHYSICS PHYSICS LETTERS B CELL BIOCHEMISTRY AND BIOPHYSICS CENTRAL EUROPEAN JOURNAL OF PHYSICS JOURNAL OF CHEMICAL PHYSICS JOURNAL OF COMPUTATIONAL PHYSICS JOURNAL OF PHYSICS B ATOMIC MOLECULAR AND OPTICAL ADVANCES IN PHYSICS MOLECULAR PHYSICS PERCEPTION & PSYCHOPHYSICS 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 Johan Bollen - jbollen@indiana.edu 168 175 182 189 196 203 210 217 224 231 238 245 252 259 266 273 280 287 294 301 308 315 322 329 336 Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Overview Future Research Relevant papers Contagion? 63/37849 /.7,53053A83:9063/37849 63/3784041,/9328/6 >+.5-.4163/378490./;063/1-849 41--,/87@063/37849 8/735/.781/.20=1,5/.201<08--,/163/37849 E153./0=1,5/.201<063/37849 63/3784908/0-3;848/3 .;A./43908/063/37849 =1,5/.201<063/37849 4,553/7063/37849 63/37849093234781/03A12,781/ .-3584./0=1,5/.201<0-3;84.2063/378490>.5704093-8/. -,7.781/05393.54+0!063/3784071D841216@0./;03/A851 .//.2390;3063/378C,3 63/378403>8;3-81216@ 63/378403/68/3358/60/3:9 .-3584./0=1,5/.201<0-3;84.2063/378490>.570?0/3,51 3,51>3./0=1,5/.201<0+,-./063/37849 -1234,2.50./;063/35.2063/37849 3,51>3./0=1,5/.201<0-3;84.2063/37849 4,553/701>8/81/08/063/378490B0;3A321>-3/7 ?-4063/37849 63/378490./;0-1234,2.50?81216@ >+.5-.4163/37849 63/3784073978/6 >9@4+8.7584063/37849 /3,5163/37849 63/378409148.20./;063/35.20>9@4+1216@0-1/165.>+ 4@7163/37840./;063/1-305393.54+ =1,5/.201<0/3,5163/37849 63/3784.205393.54+ =1,5/.201<063/37840>9@4+1216@ 63/390B063/378409@973-9 =1,5/.201<0+,-./063/37849 .//,.2053A83:01<063/1-8490./;0+,-./063/37849 +,-./063/37849 5,998./0=1,5/.201<063/37849 .-3584./0=1,5/.201<0-3;84.2063/378490>.570. 8--,/163/37849 753/;908/063/37849 .//,.2053A83:01<063/37849 >219063/37849 4./435063/378490./;04@7163/37849 =1,5/.201<0-3;84.2063/37849 ?814+3-84.2063/37849 .-3584./0=1,5/.201<0+,-./063/37849 3,51>3./0=1,5/.201<08--,/163/37849 /3:063/378490B0914837@ 982A.3063/3784. 63/3784. /.7,53063/37849 41/935A.781/063/37849 ?3+.A815063/37849 -1234,2.5063/378490./;0-37.?1289<,/6.2063/378490./;0?81216@ -1234,2.50>+@2163/378490./;03A12,781/ =1,5/.201<0./8-.20?533;8/60./;063/37849 =1,5/.201<0.998973;053>51;,4781/0./;063/37849 .//.2901<0+,-./063/37849 428/84.2063/37849 ./8-.2063/37849 7:8/05393.54+0./;0+,-./063/37849 +,-./0-1234,2.5063/37849 ! " #$ %# %& '( $% $) (* *' "! "" &$ )# )& #!( ##% ##) #%* #'' #$! #$" #($ #*# Johan Bollen - jbollen@indiana.edu #*& #"( #&% #&) #)* %!' %#! %#" %%$ %'# %'& %$( %(% %() %** %"' %&! %&" %)$ '!# '!& '#( '%% '%) ''* Tracking science in real-time from large-scale usage data Introduction MESUR Mapping Science Metrics Survey Discussion Overview Future Research Relevant papers Relevant papers Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan Chute A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE, June 2009. URL: http://dx.plos.org/10.1371/journal.pone.0006022. Michael Kurtz and Johan Bollen. Usage bibliometrics. Annual Review of Information Science and Technology, 2010 Johan Bollen, Herbert Van de Sompel, Aric Hagberg,Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Clickstream data yields high-resolution maps of science. PLoS One, February 2009. Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data