Dynamic graph patterns in the High-Energy Physics/Theory Citation Networks

Victor O. Santos and Lawrence B. Holder
Electrical Engineering and Computer Science, Washington State University REU
INTRODUCTION
A citation network can be naturally represented as a graph in which each paper
is a vertex and each citation is an edge. The graph is directed because a
citation can only occur when a recently published paper cites a previously
published one. The citation network graphs we work with in this research are
dynamic. We study these dynamic graphs with several tools: DynGRL [1], which
analyzes the graphs with a two-step algorithm to determine whether these
networks contain patterns, and a conversion algorithm that eliminates the
inconsistencies in the raw data and generates the dynamic graphs.
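As a minimal sketch of this representation (not the code used in this research), a citation network can be held as a directed adjacency map from each citing paper to the papers it cites. The helper name `build_citation_graph` and the paper identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Build a directed graph: citing paper -> set of cited papers.

    `citations` is an iterable of (citing, cited) pairs. Every paper that
    appears in a pair becomes a vertex, even if it cites nothing.
    """
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[citing].add(cited)
        graph[cited]  # touch the key so cited-only papers become vertices
    return dict(graph)


# Hypothetical paper identifiers, for illustration only.
example = build_citation_graph([("p3", "p1"), ("p3", "p2"), ("p2", "p1")])
num_vertices = len(example)
num_edges = sum(len(cited) for cited in example.values())
```

Each edge points from the newer paper to the older one, which is why the structure is directed rather than symmetric.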
We work with the High Energy Physics and High Energy Physics Theory citation
networks, which contain ten years of information given in the following
format.
[Figure: raw data format, with the field labels Target paper, Cited paper,
Paper, and Publication date]
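The following sketch shows one way such raw records could be read and cleaned. It rests on several assumptions that the poster does not confirm: records are whitespace-separated text lines, the target paper is the citing paper, dates are in ISO format, and the inconsistencies to remove are self-citations, duplicate citations, papers without a publication date, and citations to later publications. The function names are hypothetical.

```python
from datetime import date


def parse_dates(lines):
    """Parse '<paper> <YYYY-MM-DD>' lines into {paper: date} (assumed layout)."""
    dates = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and lines with the wrong field count
        paper, stamp = parts
        dates[paper] = date.fromisoformat(stamp)
    return dates


def parse_citations(lines, dates):
    """Parse '<target> <cited>' lines, dropping inconsistent records."""
    edges = set()  # a set silently removes duplicate citations
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        target, cited = parts
        if target == cited:
            continue  # self-citation
        if target not in dates or cited not in dates:
            continue  # no publication date, so it cannot be placed in time
        if dates[cited] > dates[target]:
            continue  # a paper cannot cite a later publication
        edges.add((target, cited))
    return edges
```

Keeping the cleaning rules in one function makes it easy to rerun the conversion whenever a rule changes.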
EXPERIMENT
The experiment proceeded as follows:
• Six new dynamic graphs were generated to search for patterns: the full
monthly, weekly, and daily graphs for both citation networks.
• Because of the high time consumption of DynGRL’s first-step algorithm, the
two citation network graphs were processed on different machines at the same
time.
• After eight days of execution, all the High Energy Physics processes
finished, and the pattern discovery results for both citation networks
differed from the earlier, inconsistent results.
• We realized that we should increase DynGRL’s precision parameters to find
more complex substructures and more interesting patterns.
• After executing DynGRL with these new parameters, we noticed that the
execution time grows approximately exponentially as the snapshots grow.
The total execution time for a ten-month dynamic graph can be expressed as
follows.
[Equation: total execution time for a ten-month dynamic graph]
RESULTS
Despite the difficulties in processing the different dynamic graphs, three
patterns were found in the High Energy Physics citation network.
Graph (A) represents the pattern discovered with one-day time snapshots.
Graph (B) represents the pattern discovered with two-day time snapshots.
APPROACH
Before starting the pattern discovery process, we performed the following
steps:
• Build a conversion algorithm that is reused many times throughout the
experimental process.
- The conversion algorithm generates the dynamic graph by splitting the whole
graph into time snapshots.
[Figure (A): dynamic graph representation. Each snapshot, from Time 1 to
Time i, lists its vertices as "v <id> paper" and its citations as
"d <id> <id> citation".]
Framework of dynamic graph analysis [2]. Step (A) represents a dynamic graph
with ten time snapshots. Step (B) discovers graph rewriting rules from each
pair of consecutive snapshots. Step (C) learns the rewriting rules generated
by the previous step. Step (D) generates the dynamic graph transformation
patterns by abstracting the learned rewriting rules.
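Step (B) can be illustrated with a much simpler stand-in than DynGRL’s substructure-based discovery: comparing the vertex and edge sets of two consecutive snapshots to see what was added and what was removed. The `Snapshot` and `Rewrite` types and the function names are hypothetical, and this plain set difference is only a sketch of the idea, not DynGRL’s algorithm.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    vertices: frozenset
    edges: frozenset  # (citing, cited) pairs


class Rewrite(NamedTuple):
    added_vertices: frozenset
    removed_vertices: frozenset
    added_edges: frozenset
    removed_edges: frozenset


def diff_snapshots(before, after):
    """Describe how `before` must be rewritten to obtain `after`."""
    return Rewrite(
        added_vertices=after.vertices - before.vertices,
        removed_vertices=before.vertices - after.vertices,
        added_edges=after.edges - before.edges,
        removed_edges=before.edges - after.edges,
    )


def rewrites(snapshots):
    """Diff every pair of consecutive snapshots, in time order."""
    return [diff_snapshots(a, b) for a, b in zip(snapshots, snapshots[1:])]
```

If each snapshot contains everything published up to that time, the removal sets are always empty for a citation network, since papers and citations are never withdrawn.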
• Convert the raw graph data into its dynamic graph representation, as shown
in figure (A).
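The conversion step can be sketched as follows. This is an illustration under stated assumptions, not the conversion algorithm used in this research: snapshots are assumed to be cumulative, a month is approximated as 30 days, vertices are numbered in publication order, and "d x y citation" is assumed to mean that paper x cites paper y. The function name `to_dynamic_graph` is hypothetical; the "Time", "v", and "d" labels follow figure (A).

```python
from datetime import date

# Snapshot length in days; the 30-day month is a simplifying assumption.
SNAPSHOT_DAYS = {"days": 1, "weeks": 7, "months": 30}


def to_dynamic_graph(dates, edges, granularity):
    """Render cumulative snapshots in the 'Time' / 'v' / 'd' text layout.

    dates: {paper: publication date}; edges: (citing, cited) pairs.
    Edges that mention a paper without a date are ignored.
    """
    step = SNAPSHOT_DAYS[granularity]
    start = min(dates.values())
    # Number papers in publication order so vertex ids stay stable over time.
    order = sorted(dates, key=lambda p: (dates[p], p))
    vertex_id = {paper: i + 1 for i, paper in enumerate(order)}
    known = sorted(
        (vertex_id[a], vertex_id[b])
        for a, b in edges
        if a in vertex_id and b in vertex_id
    )
    # Interval index of each paper; non-decreasing because `order` is by date.
    interval = [(dates[p] - start).days // step for p in order]
    lines = []
    for t in range(interval[-1] + 1):
        n = sum(1 for i in interval if i <= t)  # papers published so far
        lines.append(f"Time {t + 1}")
        lines += [f"v {i} paper" for i in range(1, n + 1)]
        lines += [f"d {a} {b} citation" for a, b in known if a <= n and b <= n]
    return "\n".join(lines)


# Hypothetical two-paper example: "b" is published one day after "a".
example = to_dynamic_graph(
    {"a": date(2000, 1, 1), "b": date(2000, 1, 2)}, {("b", "a")}, "days"
)
```

Switching the `granularity` argument between "days", "weeks", and "months" regenerates the same network at a different snapshot length without touching the raw data.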
CONCLUSION
VISUALIZATION
Other analyses were also performed on both citation networks, including
calculating the weight of the vertices, community detection, and
visualization with the Gephi [3] graph visualization tool.
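The poster does not define how vertex weight was calculated. One common choice, assumed here purely for illustration, is the in-degree of a paper, that is, the number of citations it receives. The function name `citation_counts` and the paper identifiers are hypothetical.

```python
from collections import Counter


def citation_counts(edges):
    """Count incoming citations (in-degree) for every paper in `edges`."""
    counts = Counter()
    for citing, cited in edges:
        counts[cited] += 1
        counts[citing] += 0  # make sure uncited papers appear with weight 0
    return counts


# Hypothetical paper identifiers, for illustration only.
weights = citation_counts([("p3", "p1"), ("p2", "p1"), ("p3", "p2")])
most_cited = weights.most_common(1)
```

The resulting counts could be exported as a vertex attribute so that a visualization tool can size nodes by weight.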
COMPUTATIONAL POWER ISSUE
Graph visualization (A) represents the
first 500 days of the High Energy Physics
Theory citation network.
DynGRL is designed to run as a single-threaded process, which can make
pattern discovery very slow on large graphs such as the ones studied in this
research.
- Snapshots were created at three time granularities: months, weeks, and
days.
Graph (C) represents the pattern discovered with one-month time snapshots.
• DynGRL is designed to process graphs that change over time, where the
changes include additions and removals of vertices and edges. Citation
graphs do change over time, but they never lose vertices or edges.
• The execution time of DynGRL depends on the size of the graph’s time
snapshots.
• Processing 500 daily snapshots of the graphs can take more than 72 hours,
even with the lowest accuracy parameters.
• A good solution to this issue is to change DynGRL’s single-threaded design
to a parallel design that can use the resources of today’s multi-core
systems.
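A parallel design of this kind could, for example, distribute independent pieces of work across cores. The sketch below assumes that the work on each pair of consecutive snapshots is independent, which the poster does not confirm for DynGRL. The worker `count_new_edges` is a trivial stand-in for the real per-pair computation, and all names are hypothetical.

```python
import concurrent.futures


def count_new_edges(pair):
    """Stand-in for per-pair work: count edges added between two snapshots."""
    before, after = pair
    return len(set(after) - set(before))


def process_pairs(snapshots, worker=count_new_edges, executor_cls=None,
                  max_workers=None):
    """Apply `worker` to every pair of consecutive snapshots in parallel.

    By default the pairs are spread over worker processes, one per core.
    """
    if executor_cls is None:
        executor_cls = concurrent.futures.ProcessPoolExecutor
    pairs = list(zip(snapshots, snapshots[1:]))
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with the pairs.
        return list(pool.map(worker, pairs))
```

Processes are used by default because CPU-bound Python code does not speed up with threads. On platforms that start worker processes by spawning, the call to `process_pairs` should sit under an `if __name__ == "__main__":` guard.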
Graph visualization (B) represents the whole High Energy Physics Theory
citation network. The colors in the different areas are the graph’s
communities.
This work was supported by the National Science Foundation’s REU
program under grant number IIS-0647705
The pattern discovery technique proved effective at discovering patterns in
these networks, which suggests that it can also be applied to other citation
networks that can be represented as graphs. The technique gives us an
abstract view of how the citation network behaves and therefore shows the
possibility of predicting when its structure will change. To test the
accuracy of the three patterns found, we can take a small sample of today’s
High Energy Physics citation network and check whether the patterns are
present. If the accuracy is high, we can conclude that these patterns
represent a persistent behavior in this citation network and, therefore, in
how researchers are related through the citations of their publications.
The research also gives an idea of how much computational power is needed to
process sophisticated graphs like these.
REFERENCES
[1] C. H. You, “DynGRL: Dynamic Graph-based Relational Learning,” 2011,
http://changhun.com/research.html.
[2] C. H. You, L. B. Holder, and D. J. Cook, “Learning Patterns in the
Dynamics of Biological Networks,” 2009, in press.
[3] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An Open Source Software
for Exploring and Manipulating Networks,” 2009, in press.