Understanding outside collaborations of the Chinese Academy of Science using Jensen-Shannon divergence Visualization and Data Analysis 2009 San Jose, Jose California California, USA 19 January 2009 Russell J. Duhon rduhon@indiana.edu Cyberinfrastructure for Network Science Center School of Library and Information Science Indiana University Bl Bloomington, i t IN, IN USA Russell J. Duhon: Understanding Outside Collaborations 01/28 Table of Contents 1. Collaboration 2. Entropy 3 Collaboration and Entropy 3. 4. Information 5. Insight (Perhaps) 6 Applications 6. A li ti iin th the S Smallll and dL Large Russell J. Duhon: CIShell, Network Workbench, and the Scholarly Database 02/28 Collaboration Wh t is What i C Collaboration? ll b ti ? 03/28 Russell J. Duhon: Collaboration Collaboration Wh t is What i C Collaboration? ll b ti ? Good question . . . but a lot of collaboration is pretty easy to recognize. Co-authorship is a common sort looked at in scientometrics. Russell J. Duhon: Collaboration 04/28 Collaboration H How d do we measure collaboration? ll b ti ? In a lot of ways, some of which we’ve h d about heard b t earlier li ttoday. d R Returning t i tto coauthorship, one way is to make networks and analyze those. 05/28 Russell J. Duhon: Collaboration Collaboration Wh t if we don’t What d ’t h have a nice i coauthorship network? In other words, how do we uncover less obvious b i characteristics h t i ti off iinterest? t t? Russell J. Duhon: Collaboration 06/28 Entropy 07/28 Russell J. Duhon: Entropy Entropy Wh t if we don’t What d ’t h have a nice i coauthorship network? In other words, how do we uncover less obvious b i characteristics h t i ti off iinterest? t t? Russell J. Duhon: Entropy 08/28 Entropy Many conceptualizations, M t li ti and d iin one sense helps measure distinguishability. 09/28 Russell J. Duhon: Entropy Entropy Sh Shannon E Entropy t pxi log pxi Russell J. Duhon: Entropy 10/28 Entropy J Jensen-Shannon Sh Divergence Di H .5 p .5q .5 H p .5 H (q ) where H is the Shannon Entropy. Russell J. Duhon: Entropy 11/28 Collaboration and Entropy So, given how a set of collaborators co So co-author author with some control set, we can treat those as distributions and uncover the distinguishability of those distributions. Russell J. Duhon: Collaboration and Entropy 12/28 Collaboration and Entropy Then, given a pairwise distinguishability Then metric, algorithms taking advantage of metric similarity become available. Russell J. Duhon: Collaboration and Entropy 13/28 Collaboration and Entropy Take those pair pair-wise wise metrics metrics, and do Agglomerative Hierarchical Clustering with Ward’s Algorithm to look for structure. Russell J. Duhon: Collaboration and Entropy 14/28 Information 15/28 Russell J. Duhon: Information Information Ignoring Beijing Beijing-only only collaborators, collaborators there are two primary clusters: Russell J. Duhon: Information 16/28 Information More developed, more politically normalized countries, 17/28 Russell J. Duhon: Information Information and less developed, less politically normalized countries. Russell J. Duhon: Information 18/28 Information Within the cluster of more developed developed, more politically normalized countries, there are again two clear clusters: 19/28 Russell J. Duhon: Information Information Larger, g , more developed p countries,, including the up-and-coming BRIC members members, Russell J. Duhon: Information 20/28 Information and much smaller countries, for the most part 21/28 Russell J. Duhon: Information Information Looking more closely, there are many small clusters that show expected patterns patterns. Russell J. Duhon: Information 22/28 Insight (Perhaps) Starting to approach insight insight, the locations of Chinese collaborators hint at potential explanations explanations. 23/28 Russell J. Duhon: Insight (Perhaps) Insight (Perhaps) , , ? One of the strongest g p potential applications pp will be identifying possible sites for the application of policy tools to strengthen, sustain, and transform collaborative structures structures. Russell J. Duhon: Insight (Perhaps) 24/28 Applications in the Small and Large Remember the motivation: understanding structure when there is less information than ideal. Russell J. Duhon: Applications in the Small and Large 25/28 Applications in the Small and Large Small research groups generally know who their members collaborate with and how much, but rarely how those people collaborate with each other. Russell J. Duhon: Applications in the Small and Large 26/28 Applications in the Small and Large Large organizations often have similar information limitations, particularly due to identification and data dirtiness. Russell J. Duhon: Applications in the Small and Large 27/28 Applications in the Small and Large In both cases cases, this procedure may be very useful. The insights hint at potential relations among outside collaborators, and comparison of distributions minimizes the impact of dirty data. Russell J. Duhon: Applications in the Small and Large 28/28 Th k Y Thank You Russell Duhon rduhon@indiana.edu @ NSF IIS-0534909 and IIS-0513650 awards