Understanding outside collaborations of the Chinese Academy of Science using

advertisement
Understanding outside collaborations of
the Chinese Academy of Science using
Jensen-Shannon divergence
Visualization and Data Analysis 2009
San Jose,
Jose California
California, USA
19 January 2009
Russell J. Duhon
rduhon@indiana.edu
Cyberinfrastructure for Network Science Center
School of Library and Information Science
Indiana University
Bl
Bloomington,
i t
IN,
IN USA
Russell J. Duhon: Understanding Outside Collaborations
01/28
Table of Contents
1. Collaboration
2. Entropy
3 Collaboration and Entropy
3.
4. Information
5. Insight (Perhaps)
6 Applications
6.
A li ti
iin th
the S
Smallll and
dL
Large
Russell J. Duhon: CIShell, Network Workbench, and the Scholarly Database
02/28
Collaboration
Wh t is
What
i C
Collaboration?
ll b ti ?
03/28
Russell J. Duhon: Collaboration
Collaboration
Wh t is
What
i C
Collaboration?
ll b ti ?
Good question . . .
but a lot of collaboration is pretty easy to
recognize. Co-authorship is a common sort
looked at in scientometrics.
Russell J. Duhon: Collaboration
04/28
Collaboration
H
How
d
do we measure collaboration?
ll b ti ?
In a lot of ways, some of which we’ve
h d about
heard
b t earlier
li ttoday.
d
R
Returning
t i tto coauthorship, one way is to make networks
and analyze those.
05/28
Russell J. Duhon: Collaboration
Collaboration
Wh t if we don’t
What
d ’t h
have a nice
i coauthorship network?
In other words, how do we uncover less
obvious
b i
characteristics
h
t i ti off iinterest?
t
t?
Russell J. Duhon: Collaboration
06/28
Entropy
07/28
Russell J. Duhon: Entropy
Entropy
Wh t if we don’t
What
d ’t h
have a nice
i coauthorship network?
In other words, how do we uncover less
obvious
b i
characteristics
h
t i ti off iinterest?
t
t?
Russell J. Duhon: Entropy
08/28
Entropy
Many conceptualizations,
M
t li ti
and
d iin one
sense helps measure distinguishability.
09/28
Russell J. Duhon: Entropy
Entropy
Sh
Shannon
E
Entropy
t
  pxi  log pxi 
Russell J. Duhon: Entropy
10/28
Entropy
J
Jensen-Shannon
Sh
Divergence
Di
H .5 p  .5q   .5 H  p   .5 H (q )
where H is the Shannon Entropy.
Russell J. Duhon: Entropy
11/28
Collaboration and Entropy
So, given how a set of collaborators co
So
co-author
author
with some control set, we can treat those as
distributions and uncover the distinguishability
of those distributions.
Russell J. Duhon: Collaboration and Entropy
12/28
Collaboration and Entropy
Then, given a pairwise distinguishability
Then
metric, algorithms taking advantage of metric
similarity become available.
Russell J. Duhon: Collaboration and Entropy
13/28
Collaboration and Entropy
Take those pair
pair-wise
wise metrics
metrics, and do
Agglomerative Hierarchical Clustering with
Ward’s Algorithm to look for structure.
Russell J. Duhon: Collaboration and Entropy
14/28
Information
15/28
Russell J. Duhon: Information
Information
Ignoring Beijing
Beijing-only
only collaborators,
collaborators there are
two primary clusters:
Russell J. Duhon: Information
16/28
Information
More developed, more politically normalized
countries,
17/28
Russell J. Duhon: Information
Information
and less developed, less politically
normalized countries.
Russell J. Duhon: Information
18/28
Information
Within the cluster of more developed
developed, more
politically normalized countries, there are
again two clear clusters:
19/28
Russell J. Duhon: Information
Information
Larger,
g , more developed
p countries,,
including the up-and-coming BRIC
members
members,
Russell J. Duhon: Information
20/28
Information
and much smaller countries, for the most
part
21/28
Russell J. Duhon: Information
Information
Looking more closely, there are many small
clusters that show expected patterns
patterns.
Russell J. Duhon: Information
22/28
Insight (Perhaps)
Starting to approach insight
insight, the locations of
Chinese collaborators hint at potential
explanations
explanations.
23/28
Russell J. Duhon: Insight (Perhaps)
Insight (Perhaps)
,
,
?
One of the strongest
g
p
potential applications
pp
will be identifying possible sites for the
application of policy tools to strengthen,
sustain, and transform collaborative
structures
structures.
Russell J. Duhon: Insight (Perhaps)
24/28
Applications in the Small and Large
Remember the motivation: understanding
structure when there is less information than
ideal.
Russell J. Duhon: Applications in the Small and Large
25/28
Applications in the Small and Large
Small research groups generally know
who their members collaborate with and
how much, but rarely how those people
collaborate with each other.
Russell J. Duhon: Applications in the Small and Large
26/28
Applications in the Small and Large
Large organizations often have similar
information limitations, particularly due to
identification and data dirtiness.
Russell J. Duhon: Applications in the Small and Large
27/28
Applications in the Small and Large
In both cases
cases, this procedure may be very
useful. The insights hint at potential
relations among outside collaborators, and
comparison of distributions minimizes the
impact of dirty data.
Russell J. Duhon: Applications in the Small and Large
28/28
Th k Y
Thank
You
Russell Duhon
rduhon@indiana.edu
@
NSF IIS-0534909 and IIS-0513650 awards
Download