Dynamic graph patterns in the High-Energy Physics/Theory Citation Networks

Victor O. Santos and Lawrence B. Holder
Electrical Engineering and Computer Science, Washington State University REU

INTRODUCTION

A citation network can be represented naturally as a graph in which each paper is a vertex and each citation is an edge. The graph is directed, since a citation can only occur when a more recently published paper cites a previously published one. The citation network graphs we work with in this research are dynamic. We study them with two tools: a conversion algorithm that eliminates the inconsistencies in the raw data and generates the dynamic graphs, and DynGRL [1], which analyzes the dynamic graphs with a two-step algorithm to determine whether these networks contain patterns.

We work with the High Energy Physics and High Energy Physics Theory citation networks, which contain ten years of information in the following format: citation records (target paper, cited paper) and date records (paper, publication date).

EXPERIMENT RESULTS

The steps of the experiment were the following:
• Six new dynamic graphs were generated to search for patterns: the full month, week, and day graphs for each of the two citation networks.
• Because of the long running time of DynGRL's first step, the graphs of the two citation networks were processed on different machines at the same time.
• After eight days of execution, all the High Energy Physics processes finished, and the pattern discovery results for both citation networks differed from the earlier, inconsistent results.
• We realized that we should increase DynGRL's precision parameters to find more complex substructures and more interesting patterns.
• After executing DynGRL with these new parameters, we noticed that the execution time grows approximately exponentially as the snapshots grow.
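The dynamic graphs used in the steps above come from grouping citations into time snapshots. The following is only a minimal sketch of that idea, not the conversion algorithm used in this research: the function name `build_snapshots`, its parameters, and the cleaning rule (drop citations with an undated paper or that point forward in time) are assumptions made for illustration. Each snapshot here holds only the citations added during its period.

```python
from collections import defaultdict
from datetime import date


def build_snapshots(citations, pub_dates, days_per_snapshot=1):
    """Group citation edges into consecutive time snapshots.

    citations: iterable of (target_paper, cited_paper) pairs
    pub_dates: dict mapping paper id -> datetime.date
    Returns a list of edge lists, one per snapshot, ordered by time.
    """
    # Assumed cleaning rule: keep an edge only if both papers are dated
    # and the cited paper is not newer than the citing paper.
    clean = [
        (src, dst)
        for src, dst in citations
        if src in pub_dates and dst in pub_dates
        and pub_dates[dst] <= pub_dates[src]
    ]
    if not clean:
        return []

    # Snapshot 0 starts at the publication date of the earliest citing paper.
    start = min(pub_dates[src] for src, _ in clean)
    buckets = defaultdict(list)
    for src, dst in clean:
        index = (pub_dates[src] - start).days // days_per_snapshot
        buckets[index].append((src, dst))

    # Keep empty periods so every snapshot covers the same time span.
    return [buckets.get(i, []) for i in range(max(buckets) + 1)]


# Hypothetical toy data: three dated papers, one forward-in-time citation
# and one citation to an undated paper, both of which are dropped.
dates = {"p1": date(1993, 1, 1), "p2": date(1993, 1, 2), "p3": date(1993, 1, 4)}
cites = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p1", "p3"), ("p3", "p9")]

daily = build_snapshots(cites, dates, days_per_snapshot=1)
two_day = build_snapshots(cites, dates, days_per_snapshot=2)
```

Changing `days_per_snapshot` (for example 1, 7, or 30) gives coarser or finer snapshots from the same raw data, which is how a single citation network can yield day, week, and month graphs.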
The total execution time of DynGRL for a ten-month dynamic graph can be expressed as follows.

[Equation: total execution time for a ten-month dynamic graph]

Despite the difficulties in processing the different dynamic graphs, three patterns were found in the High Energy Physics citation network.

[Figure: discovered patterns (A), (B), and (C)]

Graph (A) represents the pattern discovered with one-day time snapshots. Graph (B) represents the pattern discovered with two-day time snapshots. Graph (C) represents the pattern discovered with one-month time snapshots.

APPROACH

Before starting the pattern discovery process, we performed the following steps:
• Build a conversion algorithm that is used repeatedly throughout the experimental process.
  - The conversion algorithm generates the dynamic graph by splitting the whole graph into time snapshots.
• Convert the raw graph data into its dynamic graph representation, as shown in figure (A).
  - Three time granularities were used to create snapshots: months, weeks, and days.

[Figure (A): dynamic graph representation; at each time step (Time 1, …, Time i, …) the vertices v_1, v_2, …, v_n are papers and the edges d are citations]

DynGRL is designed to process graphs that change over time, where the changes include additions and removals of vertices and edges. Citation graphs do change over time, but they never lose vertices or edges.

The framework of dynamic graph analysis [2] is as follows:
Step (A) represents a dynamic graph with ten time snapshots.
Step (B) discovers the graph rewriting rules from each pair of consecutive graph snapshots.
Step (C) learns the rewriting rules generated by the previous step.
Step (D) generates the dynamic graph transformation patterns by abstracting the learned rewriting rules.

VISUALIZATION

Other analyses were also performed on both citation networks, such as calculating the weight of the vertices, community detection, and visualization with the Gephi [3] graph visualization tool.

[Figure: graph visualizations (A) and (B)]

Graph visualization (A) represents the first 500 days of the High Energy Physics Theory citation network. Graph visualization (B) represents the whole High Energy Physics Theory citation network. The colors of the different areas indicate the graph's communities.

COMPUTATIONAL POWER ISSUE

DynGRL is designed to run as a single-threaded process, which can make pattern discovery very slow on large graphs like the ones used in this research.
• The execution time of DynGRL varies with the size of the graph's time snapshots.
• Processing 500 snapshots that represent the day-by-day evolution of the graphs can take more than 72 hours, even with the lowest accuracy parameters.
• A good solution to this issue is to change DynGRL's single-threaded design to a parallel design capable of using the resources of today's multi-core systems.

CONCLUSION

The pattern discovery technique has proven effective at discovering patterns in these kinds of networks, and therefore in other citation networks that can also be represented as graphs. The technique gives us an abstract idea of how a citation network behaves, and therefore shows the possibility of predicting when its structure will change.

To test the accuracy of the three patterns found, we can take a small sample of today's High Energy Physics citation network and check whether the patterns are present. If the accuracy is high, we can conclude that these patterns represent a persistent behavior in this citation network, and therefore in how researchers are related through the citations of their publications.

The research also gives an idea of how much computational power is needed to process sophisticated graphs like these.

This work was supported by the National Science Foundation's REU program under grant number IIS-0647705.

REFERENCES

[1] C. H. You, "DynGRL: Dynamic Graph-based Relational Learning," 2011, http://changhun.com/research.html.
[2] C. H. You, L. B. Holder, and D. J. Cook, "Learning Patterns in the Dynamics of Biological Networks," 2009, in press.
[3] M. Bastian, S. Heymann, and M. Jacomy, "Gephi: An Open Source Software for Exploring and Manipulating Networks," 2009, in press.