Ke, Weimao, Börner, Katy and Viswanath, Lalitha (in press) Major Information Visualization Authors, Papers and Topics in the ACM Library. IEEE Information Visualization Conference, Houston, Texas, Oct 10-12, 2004. This entry did win the 1st Price! Major Information Visualization Authors, Papers and Topics in the ACM Library Weimao Ke* Katy Börner+ Lalitha Viswanath‡ School of Library and Information Science Indiana University School of Library and Information Science Indiana University School of Informatics Indiana University ABSTRACT The presented work aims to identify major research topics, coauthorships, and trends in the IV Contest 2004 dataset. Co-author, paper-citation, and burst analysis were used to analyze the dataset. The results are visually presented as graphs, static Pajek [1] visualizations and interactive network layouts using Pajek’s SVG output feature. A complementary web page with all the raw data, details of the analyses, and high resolution images of all figures is available online at http://iv.slis.indiana.edu/ref/iv04contest/. these venues do not include the annual Information Visualization Conference in London, the annual SPIE Visualization and Data Analysis Conference in San Jose, or the new Information Visualization journal. Hence, the subsequent analysis will provide only a partial view of InfoVis research and education. During data cleaning, the total set of 1,161 authors was reduced to 1,036 unique authors and 1,859 keywords were reduced to 1,753 unique keywords. The year of publication was successfully retrieved for 8,178 out of the 8,502 citation references. Keywords: Citation analysis, co-author analysis, visualization 2 TASK 1: STATIC OVERVIEW OF 10 YEARS OF INFOVIS Knowledge domain visualization techniques [2] were applied to map the semantic space of the dataset via citation analysis. The results of the citation analysis were visualized with Pajek [1] and are shown in Figure 2. 1 THE INFOVIS CONTEST DATASET The InfoVis Contest dataset contains 614 papers that were published between 1974 and 2004. The papers come with a title, authors, abstracts, keywords, source, references, number of pages, and year of publication. One paper (acm673478) has no author. 429 papers have abstracts, 424 papers come with keywords and 340 papers have an abstract and keywords. The yearly distribution is plotted in Figure 1. Figure 2. Citation network of major papers Figure 1. Yearly distribution of papers, abstracts, keywords, references and citation counts. The dataset has a total number of 8,502 unique references. Out of these, 1,970 link to papers within the contest dataset, called the IV core. 1,801 references link to other ACM papers and 4,722 link to non-ACM papers. We identified 106 unique publication venues. Unfortunately, * email: wke@indiana.edu + email: katy@indiana.edu ‡ email: lviswana@indiana.edu Depicted are papers that got cited at least 20 times (15 papers) as well as papers which were cited at least 7 times AND that cited those 15 papers (44 papers). Elimination of duplicate entries resulted in 47 papers. Each paper is represented by a circle. Node size denotes the number of received citations. Node color denotes year of publication. Ring color denotes the average citation year. Links represent direct citation links. Within the IV core there are two papers that received 70 citations: Furnas’s 1986 paper entitled Generalized fisheye views, and Robertson’s 1991 paper on Cone trees: animated 3D visualizations of hierarchical information. Tufte’s 1986 paper entitled The visual display of quantitative information was cited 40 times (see article_cited_count_withinset). It is interesting to note that Bertin’s 1983 paper on the Semiology of Graphics is cited most often (14 times) among papers in the non-ACM category. It is followed by Spence & Apperley’s 1982 paper Database Navigation: An Office Environment for the Professional, which has 9 citations (see article_cited_count_outside_acm). 3 TASK 2: MAJOR RESEARCH TOPICS AND THEIR EVOLUTION Sudden increases in the usage frequency of keywords were identified using Kleinberg’s burst analysis algorithm [3]. The results for unique keywords (several are compound terms) are shown in Table 1. Table 1. Keyword burst analysis results Word Burst_Weight Data visualization 3.70 focus+context 4.29 hierarchy 3.95 human factors 3.42 information visualization 13.08 user interface 3.46 Burst_Years (1994 - 1995) (1999 - 2002) (2000 - 2002) (1983 - 1994) (1998 - present) (1983 - 1991) While InfoVis research seems to have started with user_interface and human_factors research, data_visualization work dominated in 94/95. Information Visualization has had a strong, ongoing burst since 1998. 4 TASK 3: THE AUTHORS IN THE INFOVIS CONTEST SET Scholars with more than 10 papers are Ben Shneiderman (23), Stuart K. Card (16), Jock D. Mackinlay (15), Steven F. Roth (12), George Robertson (11), Daniel A. Keim (11) and John T. Stasko (11). Authors that received more than 100 citation links are Stuart K. Card (236), Jock D. Mackinlay (212), George G. Robertson (180) and Ben Shneiderman (173). The top four authors with the largest number of unique coauthors are Ben Shneiderman (23), Stuart K. Card (17), Jock D. Mackinlay (17) and George G. Robertson (16). In the IV core, 93.3% of the authors have co-authored. Figure 3 shows a co-author network for the IV core authors that published no less than 10 papers OR got cited no less than 50 times OR have no less than 20 occurrences of co-authorship with other authors. 17 authors satisfied one or more of the three criteria. All of their co-authors are shown, resulting in 138 author nodes. The node size corresponds to the number of papers published. Node color denotes the total number of received citations. Edge thickness indicates the number of times authors co-authored together. The visualization reveals that Ben Shneiderman has authored the most papers (23), while Stuart K. Card received the most citations for his work. Diverse clusters of co-authors can also be identified and are discussed in the accompanying web page. 5 DISCUSSION We presented simple statistics, burst analysis results of keywords and semantic maps of major papers and authors based on the InfoVis Contest 2004 dataset. Given that the dataset does not cover papers presented at two of the three main annual InfoVis Conferences, the new Information Visualization journal, or books, only a partial picture of the domain has been drawn. ACKNOWLEDGEMENTS Ketan K. Mane & Ning Yu provided support in developing the visualizations for the citation networks and co-author networks. We appreciate the enormous effort by Jean-Daniel Fekete, Georges Grinstein and Catherine Plaisant and others in providing the contest dataset. This work is supported by a National Science Foundation CAREER Grant under IIS-0238261 and NSF grant DUE-0333623. REFERENCES 1. Batagelj, V. and A. Mrvar, Pajek: Program Package for Large Network Analysis, University of Ljubljana, Slovenia. 1997, http://vlado.fmf.uni-lj.si/pub/networks/pajek/. 2. Börner, K., C. Chen, and K. Boyack, Visualizing Knowledge Domains, In Annual Review of Information Science & Technology, B. Cronin, Editor. 2003, Information Today, Inc./American Society for Information Science and Technology: Medford, NJ. p. 179-255. 3. Kleinberg, J.M. Bursty and hierarchical structure in streams. In 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining. 2002: ACM Press. p. 91-101. Figure 3. Co-author space of highly productive, cited or co-authoring authors.