An Integrated Approach to Mapping Worldwide Bioterrorism Research Capabilities
MIS580, Spring 2009
Artificial Intelligence Lab, University of Arizona
7/12/2016

Outline
I. Introduction
II. Literature Review
III. Research Testbed
IV. Research Design
V. Experimental Study
VI. Conclusion and Directions
VII. References

I. Introduction

Bioterrorism Research
• Since the anthrax attacks after 9/11, bioterrorism has been given a high priority in national security.
• Biomedical research that is essential to creating the medicines, vaccines, and technologies to counter the threat of bioterrorism and naturally occurring disease might also be applied to biological weapons development.
• The U.S. Government has attempted to monitor and control biomedical research labs, especially those that study bioterrorism agents/diseases.
• However, monitoring worldwide biomedical research and researchers remains an open problem.

Research Literature
• Literature resources are very important to scientific research.
– In general, experimental labs collect literature in a specialized domain. As scientific cooperation becomes more common, sharing literature becomes a prerequisite (Huang et al., 2003).
• With the rapid development of science and technology, the number of scientific publications is growing exponentially.
– With the explosive growth of scientific information, there is an overwhelming number of journal articles in various research areas (Bruijn and Martin, 2002; Cohen and Hersh, 2005).
• Research literature may be used as a resource to monitor bioterrorism research.

Research Objectives
• In this study, we develop an integrated approach to monitoring and analyzing worldwide bioterrorism research literature by using knowledge mapping techniques.
• Our objectives are to identify:
– researchers who have expertise in the bioterrorism agents/diseases research domain,
– major institutions and countries where these researchers reside, and
– emerging topics and trends in bioterrorism agents/diseases research.

II. Literature Review

Bioterrorism Literature Analysis
• In recent years, biomedical informatics tools have been developed to protect populations from bioterrorism attacks (Kohane, 2002).
• Some previous studies used bibliometrics to examine terrorism research publications and trace the evolution of the field (Kennedy and Lum, 2003; Reid, 1997).
• Hu et al. (2006) used a text mining approach to identify candidate viruses and bacteria as potential bioterrorism weapons from PubMed.
• However, few efforts have been made to monitor and analyze worldwide bioterrorism research and researchers, and few studies have used knowledge mapping techniques to analyze the research status of the bioterrorism area.

Knowledge Mapping
• Three types of analysis are often adopted in knowledge mapping research: text mining, network analysis, and information visualization (Chen and Roco, 2008).
– Text mining consists of two significant classes of techniques: Natural Language Processing (NLP) and content analysis.
– Social network analysis (SNA) provides the means for studying the network of productive scholars.
– Information visualization techniques can be used to turn abstract textual documents into objects that can be displayed.

Natural Language Processing (NLP)
• In NLP, automatic indexing (Salton, 1989) is a method commonly used to represent the content of a document by means of a vector of keywords or terms (Chen and Roco, 2008).
– Noun-phrasing techniques can capture a richer linguistic representation of a document than the Bag of Words (BOW).
– Examples of noun-phrasing tools include MIT’s Chopper, NPtool (Voutilainen, 1997), and the Arizona Noun Phraser (Tolle and Chen, 2000).
• Information extraction is another computationally effective method to identify important concepts from text documents (Chen and Roco, 2008).
– The best systems have been shown to achieve more than 90% in both precision and recall when extracting persons, locations, organizations, dates, times, currencies, and percentages from newspaper articles (Chinchor, 1998).

Content Analysis
• By using content analysis, articles that are collected and grouped based on authors, institutions, topic areas, countries, or regions can be analyzed to identify the underlying themes, patterns, or trends (Chen and Roco, 2008).
• Popular content analysis techniques include:
– Clustering Algorithms,
– Self-Organizing Map (SOM),
– Multidimensional Scaling (MDS),
– Principal Component Analysis (PCA),
– Co-word Analysis, and
– PathFinder Network.

Social Network Analysis (SNA)
• SNA is capable of detecting subgroups (of scholars), discovering their pattern of interactions, identifying central individuals, and uncovering network organization and structure (Chen and Roco, 2008).
– Burt (1976) applied hierarchical clustering methods based on a structural equivalence measure (Lorrain and White, 1971) to detect subgroups in a social network.
– The blockmodel analysis approach can be used to discover patterns of interactions between subgroups (Wasserman and Faust, 1994; Xu and Chen, 2005).
– Several measures, such as degree, betweenness, and closeness, are related to centrality, which deals with the roles of individuals in a network (Wasserman and Faust, 1994).
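
The centrality measures named in the SNA discussion can be made concrete with a small sketch. The Python below is illustrative only and is not the tooling used in this study: the toy co-authorship graph, the author labels A to E, and both function names are assumptions. It computes degree centrality (the share of other researchers a node is directly linked to) and closeness centrality (the number of reachable researchers divided by the total shortest-path distance to them).

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph, source):
    """Reachable-node count divided by the sum of shortest-path lengths from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in an unweighted graph
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical undirected co-authorship network (adjacency sets); E is isolated.
coauthors = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
    "E": set(),
}

degree = degree_centrality(coauthors)
closeness = {author: closeness_centrality(coauthors, author) for author in coauthors}
```

In this toy graph, author A is linked to three of the four other authors and reaches each of them in one step, so A scores highest on both measures, while the isolated author E scores zero.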
Information Visualization
• The last step in the knowledge mapping process is to make knowledge transparent through the use of various information visualization (or mapping) techniques (Chen and Roco, 2008).
• Shneiderman (1996) proposed seven types of information representation methods:
– 1D (one-dimensional) representation,
– 2D representation,
– 3D representation,
– multi-dimensional representation,
– tree representation,
– network representation, and
– temporal representation.

III. Research Testbed

Research Testbed
• We built two sets of test data, based on human-related and animal-related bioterrorism agents/diseases respectively.
• For human bioterrorism agents/diseases, we retrieved 178,599 publication records from MEDLINE (1964-2005) by searching article abstracts and titles using 58 keywords from CDC’s list of agents by category (http://www.bt.cdc.gov/Agent/agentlist.asp).
• For animal bioterrorism agents/diseases, we retrieved 135,774 publication records from MEDLINE (1965-2005) by searching article abstracts and titles using 58 keywords from OIE’s list of diseases by species (http://www.oie.int/eng/maladies/en_classification.htm).

Human Agents/Diseases Research Collection
The dataset characteristics broken down by CDC’s agents category.

Disease                                       # Publications  # Unique Authors  # Unique Countries
Category A                                             8,635            23,891                  89
  Botulism                                             3,780             9,988                  56
  Anthrax                                              1,674             5,579                  54
  Plague                                               1,504             4,169                  55
  Smallpox                                               846             2,623                  43
  Viral hemorrhagic fever                                678             1,945                  35
  Tularemia                                              494             1,454                  30
Category B *                                         170,460           356,162                 157
  E. Coli                                            106,479           212,338                 124
  Q Fever                                             34,312           115,136                 144
Category C (only Nipah virus and hantavirus)             919             2,974                  50
Overall **                                           178,599           381,684                 159

* Only the 2 most researched diseases in Category B are shown here.
** Some articles mention multiple diseases.

Animal Agents/Diseases Research Collection
The dataset characteristics broken down by disease.
Disease                   # Publications  # Unique Authors  # Unique Countries
Q fever                           33,999           114,600                 144
Vesicular stomatitis               2,374             7,281                  41
Foot and mouth disease             2,338             7,159                  63
Rabies                             2,209             5,509                  81
Brucellosis                        1,955             5,585                  77
Anthrax                            1,240             4,236                  50
Paratuberculosis                     997             2,616                  37
Japanese Encephalitis                988             2,870                  39
West Nile Virus                      944             2,086                  35
Avian Influenza                      717             3,446                  41
Overall *                        135,774           320,630                 165

* Only the top 10 diseases are shown. There are about 80 diseases in the OIE’s list.

IV. Research Design

Research Design
[Figure: research design flow]
• Data Acquisition
– MEDLINE Abstracts
– Data Filtering
• Data Parsing and Cleaning
– Data Parsing
– Facts Consolidating
• Data Analysis
– Productivity: Bibliographic Analysis
– Collaboration: Co-authorship Analysis
– Emerging Topics: Content Map Analysis

Data acquisition involves gathering the bioterrorism agents/diseases related research literature from the MEDLINE database.
In data parsing and cleaning:
• the title, abstract, and authors’ information of each paper are parsed into a local database; and
• some variations of foreign institution names are consolidated.
Data analysis is performed as follows:
• bibliographic analysis is used to analyze the productivity status;
• co-authorship analysis is used to analyze the collaboration status; and
• content map analysis is used to analyze the emerging topics.
Details on these steps are discussed later.

Data Acquisition
• Research articles are retrieved from the MEDLINE database.
– Compiled by the U.S. National Library of Medicine (NLM) and published on the Web by Community of Science, MEDLINE® is the world's most comprehensive source of life sciences and biomedical bibliographic information. It contains nearly eleven million records from over 7,300 different publications from 1965 to November 16, 2005 (http://medline.cos.com/).
• All the related articles are collected by using keyword filtering.
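
The keyword filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the filter used to build the testbed: the record layout (dictionaries with `id`, `title`, and `abstract` keys), the three sample keywords, the sample records, and the function names are all hypothetical. It keeps a record when any keyword appears as a whole word or phrase in its title or abstract, ignoring case.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern that matches any keyword as a whole phrase."""
    # Longer keywords first so that a phrase wins over a shorter keyword it contains.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_records(records, keywords):
    """Return the records whose title or abstract mentions at least one keyword."""
    pattern = build_keyword_pattern(keywords)
    return [
        record
        for record in records
        if pattern.search(record.get("title", "") + " " + record.get("abstract", ""))
    ]


# Hypothetical keywords and records, for illustration only.
keywords = ["anthrax", "Q fever", "botulism"]
records = [
    {"id": 1, "title": "Anthrax spore detection", "abstract": "A rapid assay."},
    {"id": 2, "title": "Cardiac imaging", "abstract": "No agents are discussed."},
    {"id": 3, "title": "Serology survey", "abstract": "Patients with Q fever were followed."},
]

matched_ids = [record["id"] for record in filter_records(records, keywords)]
```

Matching on word boundaries is a design choice of this sketch: it avoids counting a keyword that only occurs inside a longer, unrelated word.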
Data Parsing and Cleaning
• Data Parsing
– The title, abstract, and authors’ information of each article are parsed and stored in a relational database.
– The institutions and countries of the authors are parsed out by using dictionaries of countries, states, cities, and institutions.
– All the author names of an article are parsed out, but only the first author’s institution is kept for later analysis.
• Facts Consolidating
– Some variations of foreign institution names and city names were spot-checked and fixed manually.

Data Analysis
• Productivity Status
– We use bibliographic analysis to study the productivity of authors, institutions, and countries.
– We also assess the trends and evolution of bioterrorism agents/diseases research activities.
• Collaboration Status
– We use co-authorship analysis to study the collaborations between researchers.
– We also detect the independent or isolated research groups in the field.
• Research Trend Topics
– We use SOMs to study the active research topics and to discover the emerging research topics in different time spans.

V. Experimental Study

Experimental Study
• Human Agents/Diseases Research
– Productivity Status
– Collaboration Status
– Emerging Topics
• Animal Agents/Diseases Research
– Productivity Status
– Collaboration Status
– Emerging Topics

Human Agents/Diseases Research: Productivity Status (Country Level)
Top 10 countries

Rank  Country          # Publications
1     United States            65,810
2     Japan                    16,023
3     United Kingdom           12,091
4     Germany                  10,598
5     France                    8,732
6     Canada                    6,367
7     Italy                     4,193
8     Sweden                    3,933
9     Spain                     3,847
10    India                     3,589

• The United States had the most publications in human agents/diseases research, followed by Japan and the United Kingdom.
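
A country-level productivity ranking of this kind amounts to counting publication records per first-author country, since only the first author's affiliation is kept. The sketch below shows that counting step under stated assumptions: the field name `first_author_country`, the sample records, and the function name `rank_by_publications` are hypothetical and do not come from the study's database.

```python
from collections import Counter


def rank_by_publications(records, field, top_n=10):
    """Count records per value of `field` and return the top_n (value, count) pairs."""
    counts = Counter(record[field] for record in records if record.get(field))
    return counts.most_common(top_n)


# Hypothetical parsed records; one entry per publication.
records = [
    {"pmid": "p1", "first_author_country": "United States"},
    {"pmid": "p2", "first_author_country": "Japan"},
    {"pmid": "p3", "first_author_country": "United States"},
    {"pmid": "p4", "first_author_country": "United Kingdom"},
    {"pmid": "p5", "first_author_country": "Japan"},
    {"pmid": "p6", "first_author_country": "United States"},
    {"pmid": "p7", "first_author_country": ""},  # unparsed affiliation is skipped
]

top_countries = rank_by_publications(records, "first_author_country", top_n=2)
```

The same function could be pointed at an institution or author field to produce the institution-level and researcher-level rankings, assuming those fields have been consolidated first.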
Human Agents/Diseases Research: Productivity Status (Institution Level)
Top 10 institutions

Rank  Institution                                         # Publications
1     Harvard University                                           1,389
2     University of Wisconsin-Madison                              1,131
3     Institute Pasteur-Paris                                      1,125
4     University of Tokyo                                            883
5     Centers for Disease Control and Prevention-Atlanta             849
6     Stanford University                                            815
7     University of Maryland-Baltimore                               813
8     Osaka University                                               798
9     Yale University                                                785
10    University of California-Davis                                 782

• Harvard University had the most publications, followed by University of Wisconsin-Madison and Institute Pasteur-Paris.
• Harvard University includes Harvard Medical School, the John F. Kennedy School of Government, and all the other departments.

Human Agents/Diseases Research: Productivity Status (Researcher Level)
Top 10 researchers

Rank  Researcher    # Publications  Institution
1     Raoult, D                220  WHO Collaborative Center for Rickettsial Reference and Research, France
2     Inouye, M                163  Robert Wood Johnson Medical School, New Jersey
3     Yamamoto, K              159  Tohoku University, Japan
4     Rowe, B                  148  Central Public Health Laboratory, London, UK
5     Peters, C J              145  University of Texas Medical Branch-Galveston
6     Levine, M M              143  University of Maryland-Baltimore
7     Dougan, G                140  Imperial College London, UK
8     Ito, K                   140  Kyoto University, Japan
9     Kaback, H R              136  Howard Hughes Medical Institute, UCLA
10    Watanabe, K              134  University of Tokyo, Japan

• Researchers with the most publications usually performed research related to CDC’s category B agents such as Q fever and E. Coli.

Human Agents/Diseases Research: Productivity Status (Researcher Level)
Top 5 researchers in selected regions
State sponsors of terrorism Cuba Guzman, M G (Cuba), Q fever Campos, J (Cuba), E. Coli Fando, R (Cuba), Cholera Kouri, G (Cuba), Q fever Silva, A (Cuba), E. Coli # Pubs 18 Jafari, A (Iran), E. Coli # Pubs 8 14 Katouli, M (Iran), E. Coli 7 9 Bouzari, S (Iran), E.
Coli 9 5 Iran Middle East and North Africa (exclude Israel) # Pubs # Pubs Eurasia Memish, Z A (Saudi Arabia), Brucellosis Al-Eissa, Y A (Saudi Arabia), Q fever 27 Ozen, S (Turkey), Q fever 30 21 Avaeva, S M (Russia), E. Coli 18 7 Majeed, H A (Kuwait), Q fever 15 17 Shokouhi, F (Iran), E. Coli 5 Botros, B A (Egypt), Q fever 13 Farhoudi-Moghaddam, A A (Iran), E. Coli, Salmonella 5 Araj, G F (Kuwait), Brucellosis 13 Baykov, A A (Russia), E. Coli Shchelkunov, S N (Russia), Small pox Skulachev, V P (Russia), E. Coli 16 15
• The table shows the top 5 researchers from countries listed as state sponsors of terrorism in the government’s Country Reports on Terrorism, such as Iran, Cuba, Sudan, Libya, North Korea, and Syria.
• The table also shows the top 5 researchers in the Middle East and North Africa (excluding Israel) and in Eurasia.
• Israel is excluded because the top 5 Israeli researchers had more publications than other researchers in the Middle East region.

Human Agents/Diseases Research: Collaboration Status
• Different collaboration networks for authors are generated based on different agents/diseases and regions.
• Each node in the network represents an individual researcher. The bigger the node, the more publications the researcher has published.
• A link between two researchers means that they have published one or more scientific articles together. The thicker the link, the more articles the two authors have published together.
• We included only researchers who published more than five articles.

Human Agents/Diseases Research: Collaboration Status (by Disease)
Collaboration Status on Anthrax
[Figure: co-authorship network; groups labeled United States, Italy, India, United Kingdom, and Israel]
• The figure shows the collaboration status of researchers on Anthrax.
• The largest group, in the center, consists of researchers from the United States.
• The second largest group is from France.
• The smaller groups are from India, Israel, Italy, and the United Kingdom.

Human Agents/Diseases Research: Collaboration Status (by Region)
Collaboration Status in State Sponsors of Terrorism
[Figure: co-authorship network; groups labeled Botulism and Botulism Toxin]
• The figure shows the co-authorship network for state sponsors of terrorism on CDC’s category A agents.
• There are only 6 groups, all from Iran.
• The two largest groups, both focusing on botulism agents, are:
– Pasteur Institute of Iran (top)
– Tehran University of Medical Sciences (bottom)

Human Agents/Diseases Research: Emerging Topics
• Content map analysis was used to identify the emerging topics and trends.
• The nodes in the folder tree and the colored regions are topics extracted from research papers.
– The topics are organized by the multi-level self-organizing map algorithm.
– Conceptually closer topics (according to co-occurrence patterns) are positioned closer together on the map.
– The number of papers belonging to each topic is shown after the topic label.
– The size of each topic region also corresponds to the number of documents assigned to the topic.
• Region color indicates the growth rate of the associated topic: the warmer the color, the higher the growth rate. For a particular topic (region), the growth rate is defined as the number of articles published in the later time period divided by the number of articles published in the previous time period.

Human Agents/Diseases Research: Emerging Topics
Emerging Topics in Human Agents/Diseases Research (2001-2005)
• During 2001-2005, the dominating topics are: “Yersinia pestis”, “Centers of Disease Control”, “Protective antigens”, “Francisella tularensis”, and “Botulinum neurotoxin”.
• The new important topics are: “Biological weapons”, “Anthracis spores”, and “Smallpox vaccination”.
• We can see the shift of research interest towards the use of anthrax spores and biological weapons after 2001.

Animal Agents/Diseases Research: Productivity Status (Country Level)
Top 10 countries

Rank  Country          # Publications
1     United States            39,901
2     Japan                    10,392
3     United Kingdom            9,369
4     France                    6,115
5     Italy                     5,970
6     Germany                   5,269
7     India                     4,308
8     Spain                     3,695
9     Canada                    3,568
10    Taiwan                    3,146

• The United States had the most publications in animal agents/diseases research, followed by Japan and the United Kingdom.

Animal Agents/Diseases Research: Productivity Status (Institution Level)
Top 10 institutions

Rank  Institution                                                   # Publications
1     Centers for Disease Control and Prevention-Atlanta                     1,300
2     National Taiwan University                                               685
3     Institute Pasteur-Paris                                                  638
4     University of California-San Francisco                                   602
5     University of California-Davis                                           551
6     University of Pittsburgh                                                 529
7     Mayo Clinic-Rochester                                                    521
8     University of Southern California                                        500
9     Mahidol University (Thailand)                                            479
10    U.S. Department of Agriculture-Agricultural Research Service             458

• CDC, Atlanta had the most publications, almost twice as many as National Taiwan University and Institute Pasteur, Paris, which ranked second and third.
Animal Agents/Diseases Research: Productivity Status (Researcher Level)
Top 10 researchers

Rank  Researcher    # Publications  Institution
1     Chen, D S                209  National Taiwan University
2     Williams, R              203  King's College Hospital, London, UK
3     Raoult, D                198  WHO Collaborative Center for Rickettsial Reference and Research, France
4     Lee, S D                 163  Veterans General Hospital-Taipei, Taiwan
5     Liaw, Y F                159  Chang Gung Memorial Hospital, Taiwan
6     Hayashi, N               151  Osaka University, Japan
7     Okamoto, H               142  Jichi Medical School, Japan
8     Carreno, V               139  Viral Hepatitis Research Foundation, Madrid, Spain
9     Prusiner, S B            137  University of California, San Francisco
10    Purcell, R H             130  National Institute of Allergy and Infectious Diseases, Maryland

Animal Agents/Diseases Research: Collaboration Status
Collaboration Status on West Nile Virus
• The diagram shows the co-authorship network for West Nile virus publications.
• We included only researchers who published more than 5 papers.
• The largest group is from the United States.

Animal Agents/Diseases Research: Emerging Topics
Research Emerging Topics Related to Foot-and-Mouth Disease (2001-2005)
• During 2001-2005, the dominating topics are: “Fibromuscular dysplasia”, “Foot-and-mouth disease virus”, “Outbreak of Foot-and-mouth disease”, “Reactive Hyperemia”, and “United Kingdom”.
• These dominating topics are also the emerging ones.
• These topics are due to the outbreak of FMD in the UK in 2001.

VI. Conclusion and Future Directions

Conclusion and Future Directions
• Monitoring worldwide bioterrorism research is becoming increasingly important and urgent.
• In this study, we built an integrated approach to mapping worldwide bioterrorism literature and capabilities. We analyzed the productivity status, collaboration status, and emerging topics by using knowledge mapping techniques.
• In the future, we plan to monitor and analyze more bioterrorism agents/diseases together with more literature sources. We also plan to develop and incorporate more advanced analysis and visualization techniques into our approach.

VII. References

References
• Bruijn, B. d. and J. Martin (2002). "Literature Mining in Molecular Biology." Proceedings of the EFMI Workshop on Natural Language Processing in Biomedical Applications. Nicosia, Cyprus. March 8-9: 1-5.
• Burt, R. S. (1976). "Positions in Networks." Social Forces 55(1): 93-122.
• Chen, H. and M. Roco (2008). Mapping Nanotechnology Innovations and Knowledge: Global, Longitudinal Patent and Literature Analysis.
• Chinchor, N. (1998). "MUC-7 test scores introduction." In Proceedings of the Seventh Message Understanding Conference.
• Cohen, A. M. and W. R. Hersh (2005). "A Survey of Current Work in Biomedical Text Mining." Briefings in Bioinformatics 6(1): 57-71.
• Hu, X., X. Zhang, et al. (2006). Text Mining the Biomedical Literature for Identification of Potential Virus / Bacterium as Bioterrorism Weapons, Springer.
• Huang, L., W. Chen, et al. (2003). "Literature Resource Portal Based on Virtual and Dynamic Hierarchical Architecture." Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'03) 17(3): 329-347.
• Kennedy, L. W. and C. M. Lum (2003). Developing a Foundation for Policy Relevant Terrorism Research in Criminology, New Brunswick, Rutgers University.
• Kohane, I. S. (2002). "The Contributions of Biomedical Informatics to the Fight Against Bioterrorism." Biomedical Informatics and Bioterrorism 9: 116-119.
• Lorrain, F. and H. C. White (1971). "Structural equivalence of individuals in social networks." Journal of Mathematical Sociology 1: 49-80.
• Reid, E. O. F. (1997). "Evolution of a body of knowledge: an analysis of terrorism research." Information Processing and Management 33(1): 91-106.
• Salton, G. (1989). Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley Longman Publishing Co., Inc.
• Shneiderman, B. (1996). "The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations." In Proceedings of the IEEE Symposium on Visual Languages, Washington, IEEE Computer Society Press.
• Tolle, K. M. and H. Chen (2000). "Comparing Noun Phrasing Techniques for Use with Medical Digital Library Tools." Journal of the American Society for Information Science 51(4): 352-370.
• Voutilainen, A. (1997). "A Short Introduction to NPtool." In http://www2.lingsoft.fi/doc/nptool/intro/.
• Wasserman, S. and K. Faust (1994). Social Network Analysis: Methods and Applications, Cambridge: Cambridge University Press.
• Xu, J. J. and H. Chen (2005). "Criminal Network Analysis and Visualization." Communications of the ACM 48(6): 101-107.