Intellectual diversity of scientific disciplines and research teams Staša Milojević School of Informatics and Computing, Indiana University Bloomington Exponential rise in scientific literature • Doubling time 10-20 years 20 papers per day Number of articles published annually 8000 7000 Astronomy 6000 5000 4000 3000 1 paper per day 2000 1000 0 1880 1900 1920 1940 1960 Publication year 1980 2000 Is publication volume a good proxy for the growth of science? 1960 textbook ? Measure concepts instead of publications! 2015 textbook? Cognitive concepts from document titles • Full text not normally available • Abstracts: availability incomplete • Keywords: very restricted • Titles function as “attention triggers” (Bazerman 1985,1989) • Titles have undergone a change during the 20th century, becoming • More informative • More specific • Containing a larger number of words that indicate article content • Any given article title captures the concepts only crudely and incompletely => Use very large number of articles! Measuring cognitive content from titles Intensity [# of occurrences] Title length (words) • Idea: count number of unique phrases in titles • Must normalize for unequal volume of literature => Cognitive content: number of unique phrases in 10,000 title words (~1000 articles) A 14 12 10 8 6 4 2 0 1880 1890 1900 1910 1920 1930 1940 1950 1960 Publication year 1970 Literature Y Literature X Same volume of literature Cognitive extent [number of different unique concepts] 1980 1990 2000 2010 Concepts from titles • Automatically identify phrases • Use general words to separate phrases • General words: prepositions, articles + non-specific words, such as: study, analysis, result, determination, comparison, discovery • Examples of titles with phrases capitalized: • HALOS and VOIDS in MULTIFRACTAL+MODEL of COSMIC+STRUCTURE • COLLISION+STRENGTHS for ELECTRON+IMPACT+EXCITATION of FINE+STRUCTURE+LEVELS in FE+XIII • Old words in new context: • MASS, ENERGY -> MASS-ENERGY EQUIVALENCE • considered a new phrase (introduced in relativity theory) Data • Research articles for: • Astronomy (160,000 articles since 1892) • Physics (440,000 articles since 1895) • Biomedicine (20 million articles since 1946) • Sources: • Thompson-Reuters Web of Science • NASA ADS • American Physical Society • PubMed Evolution of the cognitive extent of astronomy 7000 Cognitive extent 6000 Astronomy 5000 4000 3000 2000 1880 1890 1900 1910 1920 1930 1940 1950 1960 Publication year 1970 1980 1990 Dot = ca. 1000 articles Blue line = 2-year average Standard deviation from independent batches of articles: ~ 1.5% 2000 2010 Evolution of the cognitive extent of astronomy 7000 6500 6000 Astronomy 5000 4500 Phase II se III Pha WWII 3500 3000 Sputnik 4000 Number of articles published annually 8000 2000 1880 1890 1900 1910 1920 1930 1940 1950 1960 Publication year Astronomy 7000 6000 5000 4000 3000 2000 1000 0 1880 2500 Astronomy ->" astrophysics Cognitive extent 5500 1970 1980 1900 1990 1920 1940 1960 Publication year 2000 1980 2000 2010 Milojević, S. (in prep.) Cognitive extent by journal • Cognitive extent should not be equaled with quality! 2400 2200 Cognitive extent 2000 1800 MNRAS ApJ AJ A&A 1600 1400 1200 1000 800 1880 1890 1900 1910 1920 1930 1940 1950 1960 Publication year 1970 1980 1990 2000 2010 Milojević, S. (in prep.) Cognitive content by country • Country of lead author • Again, this is all normalized to the same publication volume 6500 USA Cognitive extent 6000 5500 UK France Italy Germany Japan 5000 Netherlands Canada 4500 4000 1960 China 1970 1980 1990 Publication year 2000 2010 Milojević, S. (in prep.) Science as a collaborative effort • Collaboration is one of the defining features of modern science • Most common way of studying collaboration: through coauthorship networks • Connect authors on a paper • Add to the network Börner et al. 2004 Teams as foundation for collaboration • Teams are becoming dominant in the production of knowledge (Wuchty, Jones, & Uzzi 2007) • Teams are growing in size (Bordons & Gomez 2000) • High-impact research is increasingly attributed to large teams (Wuchty, Jones, & Uzzi 2007; Börner et al. 2010) • Research that features novel ideas is attributed to large teams (Uzzi, Mukherjee, Stringer, & Jones 2013) Change in the character of team size distribution • Poisson function used to be a good fit, but not any more! Astronomy Team formation essentially a Poisson process, which somehow transformed Milojević, S. (2014). PNAS, 111(11), 3984 Functional description of different team types Distribution of each team type, as determined in the modeling, can be described by a corresponding analytical function Functional forms allow the empirical distribution to be analyzed without running a model simulation 1E+0 1 1E+0 1 ALL TEAMS Standard core teams Core +1 teams Extended teams 1E-­‐1 0.1 0.01 1E-­‐2 10-­‐5 1E-­‐5 POISSON+1" (shifted) TRUNCATED POWER LAW SUM Poisson (standard) Poisson w/extra member Truncated power law 1E-­‐1 0.1 0.01 1E-­‐2 Articles s 10-­‐3 el 1E-­‐3 ci tr 10-­‐4 A 1E-­‐4 POISSON 10-­‐3 1E-­‐3 10-­‐4 1E-­‐4 10-­‐5 1E-­‐5 2006-­‐10 10-­‐6 1E-­‐6 10-­‐7 1E-­‐7 1 2006-­‐10 10-­‐6 1E-­‐6 10 100 Article team size (k) 10-­‐7 1E-­‐7 1 Milojević, S. (2014). PNAS, 111(11), 3984 10 100 Article team size (k) Cognitive extent covered by different types of teams • Large teams do net (yet) cover the entire field 7000 Astronomy Single authors 6000 Author pairs Cognitive extent Core teams 5000 Extended teams Large extended teams 4000 3000 2000 1880 1890 1900 1910 1920 1930 1940 1950 1960 Publication year 1970 1980 1990 2000 2010 Currently: core (3-9 authors), extended (10-20), large extended (>20) Milojević, S. (in prep.) Cognitive extent covered by different types of teams • Large teams do net (yet) cover the entire field 8000 Physics Single authors Cognitive extent 7000 Author pairs Core teams 6000 Extended teams Large extended teams 5000 4000 3000 1880 1890 1900 1910 1920 1930 1940 1950 1960 Publication year 1970 1980 1990 2000 2010 Milojević, S. (in prep.) Cognitive extent covered by different types of teams • Single authors cover less than small teams 9000 00 00 Physics Author pairs Core teams Extended teams Large extended teams Cognitive extent 00 8000 Single authors Biomedicine 7000 6000 00 5000 00 00 1880 4000 1940 1890 1900 1910 1920 1930 1950 1960 1940 1950 Publication year 1970 1980 1990 Publication year 1960 1970 1980 1990 2000 2010 2000 2010 Milojević, S. (in prep.) Thank you! Staša Milojević School of Informatics and Computing Indiana University Bloomington smilojev@indiana.edu