An Integrated Approach to Mapping Worldwide Bioterrorism Research Capabilities MIS580

An Integrated Approach to
Mapping Worldwide Bioterrorism
Research Capabilities
MIS580
Spring 2009
Artificial Intelligence Lab
University of Arizona
7/12/2016
Outline
I. Introduction
II. Literature Review
III. Research Testbed
IV. Research Design
V. Experimental Study
VI. Conclusion and Directions
VII. References
I. Introduction
Bioterrorism Research
• Since the Anthrax attacks after 9/11, bioterrorism has been given a
high priority in national security.
• Biomedical research that is essential to creating the medicines,
vaccines, and technologies to counter the threat of bioterrorism and
naturally occurring disease, might also be applied towards biological
weapons development.
• The U.S. Government has attempted to monitor and control
biomedical research labs, especially those that study bioterrorism
agents/diseases.
• However, monitoring worldwide biomedical research and
researchers is still an issue.
I. Introduction
Research Literature
• Literature resources are very important to scientific research.
– In general, experimental labs collect literature in a specialized domain. As
scientific cooperation becomes more common, sharing literature becomes
a prerequisite (Huang et al., 2003).
• With the rapid development of science and technology, the number of
scientific publications is growing exponentially.
– With the explosive growth of scientific information, there is an overwhelming
number of journal articles in various research areas (Bruijn and Martin, 2002;
Cohen and Hersh, 2005).
• Research literature may be used as a resource to monitor bioterrorism
research.
I. Introduction
Research Objectives
• In this study, we develop an integrated approach to
monitoring and analyzing worldwide bioterrorism
research literature by using knowledge mapping
techniques. Our objectives are to identify:
– researchers who have expertise in bioterrorism agents/diseases
research domain,
– major institutions and countries where these researchers reside,
and
– emerging topics and trends in bioterrorism agents/diseases
research.
II. Literature Review
Bioterrorism Literature Analysis
• In recent years, biomedical informatics tools have been developed to
protect populations from bioterrorism attacks (Kohane, 2002).
• Some previous studies used bibliometrics to examine terrorism research
publications and offer a view of how the field has evolved
(Kennedy and Lum, 2003; Reid, 1997).
• Hu et al. (2006) used a text mining approach to identify candidate viruses and
bacteria as potential bioterrorism weapons from PubMed.
• However, few efforts have been made to monitor and analyze worldwide
bioterrorism research and researchers. Few studies have used knowledge
mapping techniques to analyze the research status of the bioterrorism area.
II. Literature Review
Knowledge Mapping
• Three types of analysis are often adopted in knowledge
mapping research: text mining, network analysis, and
information visualization (Chen and Roco, 2008).
– Text mining consists of two significant classes of techniques:
Natural Language Processing (NLP) and content analysis.
– Social network analysis (SNA) has provided the means for
studying the network of productive scholars.
– Information visualization techniques can be used to turn abstract
textual documents into objects that can be displayed.
II. Literature Review
Natural Language Processing (NLP)
• In NLP, automatic indexing (Salton, 1989) is a method commonly
used to represent the content of a document by means of a vector of
keywords or terms (Chen and Roco, 2008).
– Noun-phrasing techniques can capture a richer linguistic representation
of a document than the Bag of Words (BOW) model.
– Examples of noun-phrasing tools include MIT’s Chopper, Nptool
(Voutilainen, 1997), and Arizona Noun Phraser (Tolle and Chen, 2000).
• Information extraction is another computationally effective method to
identify important concepts from text documents (Chen and Roco,
2008).
– The best systems have been shown to achieve more than 90% in both
precision and recall when extracting persons, locations,
organizations, dates, times, currencies, and percentages from
newspaper articles (Chinchor, 1998).
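As a rough illustration of automatic indexing, the sketch below turns a document into a term-frequency vector using a plain bag-of-words approach. The stop-word list and the function name index_document are assumptions made for this example; the noun-phrasing tools named above would produce richer phrase-level terms.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"a", "an", "and", "as", "by", "in", "is", "of", "the", "to"}

def index_document(text):
    """Return a term -> frequency mapping (a sparse keyword vector)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

vector = index_document("Anthrax spores and the detection of anthrax toxin")
print(vector.most_common(3))
```

Single-word terms lose phrases such as "anthrax toxin", which is the gap noun-phrasing techniques are meant to close.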
II. Literature Review
Content Analysis
• By using content analysis, articles that are collected and grouped
based on authors, institutions, topic areas, countries, or regions can
be analyzed to identify the underlying themes, patterns, or trends
(Chen and Roco, 2008).
• Popular content analysis techniques include:
– Clustering Algorithms,
– Self-Organizing Map (SOM),
– Multidimensional Scaling (MDS),
– Principal Component Analysis (PCA),
– Co-word Analysis, and
– PathFinder Network.
II. Literature Review
Social Network Analysis (SNA)
• SNA is capable of detecting subgroups (of scholars), discovering their
pattern of interactions, identifying central individuals, and uncovering
network organization and structure (Chen and Roco, 2008).
– Burt (1976) applied hierarchical clustering methods based on the
structural equivalence measure (Lorrain and White, 1971) to detect
subgroups in a social network.
– The blockmodel analysis approach can be used to discover patterns of
interactions between subgroups (Wasserman and Faust, 1994; Xu and
Chen, 2005).
– Several measures, such as degree, betweenness, and closeness, are
related to centrality, which deals with the roles of individuals in a network
(Wasserman and Faust, 1994).
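To make the centrality measures concrete, here is a minimal sketch of degree and closeness centrality on an undirected graph stored as an adjacency dictionary. The graph, node names, and function names are hypothetical. Closeness is computed only over the nodes reachable from each source, which is an assumption of this sketch rather than something the slides specify.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable-node count divided by total shortest-path distance (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical star-shaped co-authorship graph: A has worked with B, C, and D.
graph = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```

In this star graph the hub A scores highest on both measures, matching the intuition that a central individual connects otherwise separate scholars.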
II. Literature Review
Information Visualization
• The last step in the knowledge mapping process is to make
knowledge transparent through the use of various information
visualization (or mapping) techniques (Chen and Roco, 2008).
• Shneiderman (1996) proposed seven types of information
representation methods including:
– 1D (one-dimensional) representation,
– 2D representation,
– 3D representation,
– multi-dimensional representation,
– tree representation,
– network representation, and
– temporal representation.
III. Research Testbed
Research Testbed
• We built two sets of test data based on human and animal related
bioterrorism agents/diseases respectively.
• For human bioterrorism agents/diseases, we retrieved 178,599
publication records from MEDLINE (1964-2005), by searching article
abstracts and titles using 58 keywords from CDC’s list of agents by
category (http://www.bt.cdc.gov/Agent/agentlist.asp).
• For animal bioterrorism agents/diseases, we retrieved 135,774
publication records from MEDLINE (1965-2005) by searching article
abstracts and titles using 58 keywords from OIE’s list of diseases by
species (http://www.oie.int/eng/maladies/en_classification.htm).
III. Research Testbed
Human Agents/Diseases Research Collection
The dataset characteristics broken down by CDC’s agents category.
Disease | # Publications | # Unique Authors | # Unique Countries
Category A | 8,635 | 23,891 | 89
Botulism | 3,780 | 9,988 | 56
Anthrax | 1,674 | 5,579 | 54
Plague | 1,504 | 4,169 | 55
Smallpox | 846 | 2,623 | 43
Viral hemorrhagic fever | 678 | 1,945 | 35
Tularemia | 494 | 1,454 | 30
Category B * | 170,460 | 356,162 | 157
E. Coli | 106,479 | 212,338 | 124
Q Fever | 34,312 | 115,136 | 144
Category C (Only Nipah virus and hantavirus) | 919 | 2,974 | 50
Overall ** | 178,599 | 381,684 | 159
* Only 2 most researched diseases in category B are shown here.
** Some articles mention multiple diseases.
III. Research Testbed
Animal Agents/Diseases Research Collection
The dataset characteristics broken down by disease.
Diseases | # Publications | # Unique Authors | # Unique Countries
Q fever | 33,999 | 114,600 | 144
Vesicular stomatitis | 2,374 | 7,281 | 41
Foot and mouth disease | 2,338 | 7,159 | 63
Rabies | 2,209 | 5,509 | 81
Brucellosis | 1,955 | 5,585 | 77
Anthrax | 1,240 | 4,236 | 50
Paratuberculosis | 997 | 2,616 | 37
Japanese Encephalitis | 988 | 2,870 | 39
West Nile Virus | 944 | 2,086 | 35
Avian Influenza | 717 | 3,446 | 41
Overall * | 135,774 | 320,630 | 165
* Only top 10 diseases are shown. There are about 80 diseases in the OIE’s list.
IV. Research Design
Research Design
[Figure: research design with three stages. Data Acquisition (MEDLINE
abstracts, data filtering); Data Parsing and Cleaning (data parsing, facts
consolidating); Data Analysis (Productivity: bibliographic analysis;
Collaboration: co-authorship analysis; Emerging Topics: content map analysis).]
Data acquisition involves gathering the
bioterrorism agents/diseases related research
literature from the MEDLINE database.
In data parsing and cleaning:
• the title, abstract, and authors’ information of
each paper are parsed into a local database;
• and some variations of foreign institution names
need to be consolidated.
Data analysis is performed as follows:
• bibliographic analysis is used to analyze the
productivity status;
• co-authorship analysis is used to analyze the
collaboration status;
• and content map analysis is used to analyze
the emerging topics.
Details on these steps are discussed later.
IV. Research Design
Data Acquisition
• Research articles are retrieved from the MEDLINE
database.
– Compiled by the U.S. National Library of Medicine (NLM) and
published on the Web by Community of Science, MEDLINE® is
the world's most comprehensive source of life sciences and
biomedical bibliographic information. It contains nearly eleven
million records from over 7,300 different publications from 1965
to November 16, 2005 (http://medline.cos.com/).
• All the related articles are collected by using keyword
filtering.
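The keyword filtering step might look roughly like the sketch below, which keeps a record when any keyword appears as a whole word in its title or abstract. The record layout, the function name, and the short keyword list are assumptions for illustration; the actual keyword lists and matching rules of the study are not given in these slides.

```python
import re

# Illustrative subset only; the agent names are taken from the CDC
# Category A table in this deck, not from the study's full keyword lists.
KEYWORDS = ["anthrax", "botulism", "plague", "smallpox", "tularemia"]

def matched_keywords(record, keywords=KEYWORDS):
    """Return the keywords found as whole words in the title or abstract."""
    text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
    return [k for k in keywords
            if re.search(r"\b" + re.escape(k) + r"\b", text)]

# Hypothetical records standing in for MEDLINE entries.
records = [
    {"title": "Anthrax toxin structure", "abstract": "We study lethal factor."},
    {"title": "Diabetes care", "abstract": "Patients plagued by complications."},
    {"title": "Vaccine trial", "abstract": "A smallpox vaccination cohort."},
]
selected = [r for r in records if matched_keywords(r)]
```

Whole-word matching avoids false hits such as "plagued" for "plague", at the cost of missing inflected forms that a real filter might want to catch.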
IV. Research Design
Data Parsing and Cleaning
• Data Parsing
– The title, abstract, and authors’ information of each article are
parsed and stored in a relational database.
– The institutions and countries of the authors are parsed out by
using dictionaries of countries, states, cities, and institutions.
– All the author names of an article are parsed out, but only the
first author’s institution is kept for later analysis.
• Facts Consolidating
– Some variations of foreign institution names and city names
were spot-checked and fixed manually.
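A dictionary-based affiliation parser along the lines described above could be sketched as follows. The dictionaries, the alias table, and the function name are hypothetical and far smaller than anything a real system would need. The alias table stands in for the manual consolidation of name variants.

```python
# Hypothetical lookup tables; real dictionaries would be much larger.
COUNTRIES = {
    "usa": "United States",
    "united states": "United States",
    "france": "France",
    "japan": "Japan",
}
INSTITUTION_ALIASES = {
    "institut pasteur": "Institute Pasteur-Paris",
    "pasteur institute": "Institute Pasteur-Paris",
    "harvard medical school": "Harvard University",
    "harvard university": "Harvard University",
}

def parse_affiliation(affiliation):
    """Return (institution, country) for a first-author affiliation string."""
    parts = [p.strip() for p in affiliation.rstrip(".").split(",")]
    country = None
    for part in reversed(parts):  # the country usually comes last
        if part.lower() in COUNTRIES:
            country = COUNTRIES[part.lower()]
            break
    institution = None
    for part in parts:
        if part.lower() in INSTITUTION_ALIASES:
            institution = INSTITUTION_ALIASES[part.lower()]
            break
    return institution, country

parsed = parse_affiliation(
    "Department of Microbiology, Harvard Medical School, Boston, MA, USA.")
```

Mapping several spellings to one canonical name is what lets later counts treat, for example, Harvard Medical School as part of Harvard University.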
IV. Research Design
Data Analysis
• Productivity Status
– We use bibliographic analysis to study the productivity of authors,
institutions, and countries.
– We also assess the trends and evolution of bioterrorism
agents/diseases research activities.
• Collaboration Status
– We use co-authorship analysis to study the collaborations between
researchers.
– We also detect the independent or isolated research groups in the field.
• Research Trend Topics
– We use SOMs to study the active research topics, and discover the
emerging research topics in different time spans.
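The bibliographic (productivity) analysis reduces largely to counting. The sketch below tallies publications per author, per first-author country, and per year. The record fields and placeholder author names are assumptions for illustration, not data from the study.

```python
from collections import Counter

# Hypothetical parsed records; field names are assumptions for this sketch.
papers = [
    {"year": 2001, "authors": ["Author, A", "Author, B"], "country": "United States"},
    {"year": 2002, "authors": ["Author, A"], "country": "United States"},
    {"year": 2002, "authors": ["Author, C", "Author, B"], "country": "Japan"},
]

def productivity(papers):
    """Count publications per author, per first-author country, and per year."""
    by_author = Counter(a for p in papers for a in set(p["authors"]))
    by_country = Counter(p["country"] for p in papers)
    by_year = Counter(p["year"] for p in papers)
    return by_author, by_country, by_year

by_author, by_country, by_year = productivity(papers)
top_countries = by_country.most_common(10)  # ranked list, as in a top-10 table
```

The per-year counts give the raw series from which trends in research activity can be read.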
V. Experimental Study
Experimental Study
• Human Agents/Diseases Research
– Productivity Status
– Collaboration Status
– Emerging Topics
• Animal Agents/Diseases Research
– Productivity Status
– Collaboration Status
– Emerging Topics
V. Experimental Study
Human Agents/Diseases Research: Productivity Status (Country Level)
Top 10 countries
Rank | Country | # Publications
1 | United States | 65,810
2 | Japan | 16,023
3 | United Kingdom | 12,091
4 | Germany | 10,598
5 | France | 8,732
6 | Canada | 6,367
7 | Italy | 4,193
8 | Sweden | 3,933
9 | Spain | 3,847
10 | India | 3,589
• The United States had the most publications in human
agents/diseases research, followed by Japan and the United Kingdom.
V. Experimental Study
Human Agents/Diseases Research: Productivity Status (Institution Level)
Top 10 institutions
Rank | Institution | # Publications
1 | Harvard University | 1,389
2 | University of Wisconsin-Madison | 1,131
3 | Institute Pasteur-Paris | 1,125
4 | University of Tokyo | 883
5 | Centers for Disease Control and Prevention-Atlanta | 849
6 | Stanford University | 815
7 | University of Maryland-Baltimore | 813
8 | Osaka University | 798
9 | Yale University | 785
10 | University of California-Davis | 782
• Harvard University had the most publications, followed
by University of Wisconsin-Madison and Institute Pasteur-Paris.
• Harvard University included Harvard Medical School,
the John F. Kennedy School of Government, and
all the other departments.
V. Experimental Study
Human Agents/Diseases Research: Productivity Status (Researcher Level)
Top 10 researchers
Rank | Researchers | Institution | # Publications
1 | Raoult, D | WHO Collaborative Center for Rickettsial Reference and Research, France | 220
2 | Inouye, M | Robert Wood Johnson Medical School, New Jersey | 163
3 | Yamamoto, K | Tohoku University, Japan | 159
4 | Rowe, B | Central Public Health Laboratory, London, UK | 148
5 | Peters, C J | University of Texas Medical Branch-Galveston | 145
6 | Levine, M M | University of Maryland-Baltimore | 143
7 | Dougan, G | Imperial College London, UK | 140
8 | Ito, K | Kyoto University, Japan | 140
9 | Kaback, H R | Howard Hughes Medical Institute, UCLA | 136
10 | Watanabe, K | University of Tokyo, Japan | 134
• Researchers with the most publications usually performed research related to CDC’s
category B agents such as Q fever and E. Coli.
V. Experimental Study
Human Agents/Diseases Research: Productivity Status (Researcher Level)
Top 5 researchers in selected regions
State sponsors of terrorism: Cuba
Researcher (country), agent/disease | # Pubs
Guzman, M G (Cuba), Q fever | 18
Campos, J (Cuba), E. Coli | 14
Fando, R (Cuba), Cholera | 9
Kouri, G (Cuba), Q fever | 9
Silva, A (Cuba), E. Coli | 5
State sponsors of terrorism: Iran
Researcher (country), agent/disease | # Pubs
Jafari, A (Iran), E. Coli | 8
Katouli, M (Iran), E. Coli | 7
Bouzari, S (Iran), E. Coli | 7
Shokouhi, F (Iran), E. Coli | 5
Farhoudi-Moghaddam, A A (Iran), E. Coli, Salmonella | 5
Middle East and North Africa (excluding Israel)
Researcher (country), agent/disease | # Pubs
Memish, Z A (Saudi Arabia), Brucellosis | 27
Al-Eissa, Y A (Saudi Arabia), Q fever | 21
Majeed, H A (Kuwait), Q fever | 15
Botros, B A (Egypt), Q fever | 13
Araj, G F (Kuwait), Brucellosis | 13
Eurasia
Researcher (country), agent/disease | # Pubs
Ozen, S (Turkey), Q fever | 30
Avaeva, S M (Russia), E. Coli | 18
Baykov, A A (Russia), E. Coli | 17
Shchelkunov, S N (Russia), Smallpox | 16
Skulachev, V P (Russia), E. Coli | 15
• The table shows the top 5 researchers from countries listed as state sponsors of terrorism in the
U.S. government’s Country Reports on Terrorism, such as Iran, Cuba, Sudan, Libya, North Korea, and Syria.
• The table also shows the top 5 researchers in the Middle East and North Africa (excluding Israel), and Eurasia.
• Israel is excluded because the top 5 Israeli researchers had more publications than other researchers
in the Middle East region.
V. Experimental Study
Human Agents/Diseases Research: Collaboration Status
• Different collaboration networks for authors are generated based on
different agents/diseases and regions.
• The node in the network represents an individual researcher. The
bigger the node, the more publications the researcher has
published.
• The link between two researchers means that these two researchers
have published one or more scientific articles together. The thicker
the link, the more articles these two authors have published
together.
• We included only researchers who published more than five articles.
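The network construction described on this slide can be sketched as below: node size comes from each author's publication count, link weight from the number of co-authored articles, and authors at or below the threshold are dropped. The function name and toy data are hypothetical. The default min_papers=6 encodes "more than five articles".

```python
from collections import Counter
from itertools import combinations

def build_coauthor_network(author_lists, min_papers=6):
    """Return (nodes, links): publication counts and co-authorship weights."""
    pubs = Counter(a for authors in author_lists for a in set(authors))
    keep = {a for a, n in pubs.items() if n >= min_papers}
    links = Counter()
    for authors in author_lists:
        kept = sorted(set(authors) & keep)
        for pair in combinations(kept, 2):  # one link per co-author pair
            links[pair] += 1
    nodes = {a: pubs[a] for a in keep}
    return nodes, links

# Toy example with a lowered threshold so the small sample is not empty.
author_lists = [["A", "B"], ["A", "B"], ["A", "C"], ["B"]]
nodes, links = build_coauthor_network(author_lists, min_papers=2)
```

Here author C is filtered out with a single paper, leaving one link between A and B with weight 2.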
V. Experimental Study
Human Agents/Diseases Research: Collaboration Status (by Disease)
Collaboration Status on Anthrax
• The figure shows the collaboration status of
researchers on Anthrax.
• The largest group in the center consists of researchers
from the United States.
• The second largest group is from France.
• The smaller groups are from India, Israel, Italy, and the United
Kingdom.
[Figure: co-authorship network on Anthrax, with groups labeled The United
States, France, India, Israel, Italy, and United Kingdom.]
V. Experimental Study
Human Agents/Diseases Research: Collaboration Status (by Region)
Collaboration Status in State Sponsors of Terrorism
• The figure shows the co-authorship network for state
sponsors of terrorism on CDC’s category A agents.
• There are only 6 groups, all from Iran.
• The two largest groups, both focusing on botulism agents, are:
– Pasteur Institute of Iran (top)
– Tehran University of Medical Sciences (bottom)
[Figure: co-authorship network with group labels “Botulism” and “Botulism Toxin”.]
V. Experimental Study
Human Agents/Diseases Research: Emerging Topics
• Content map analysis was used to identify the emerging topics and trends.
• The nodes in the folder tree and colored regions are topics extracted from
research papers.
– The topics are organized by the multi-level self-organizing map algorithm.
– Conceptually closer topics (according to co-occurrence patterns)
are positioned closer together on the map.
– The number of papers belonging to each topic is shown after the topic label.
– The sizes of the topic regions also correspond to the number of documents
assigned to the topics.
• Region color indicates the growth rate of the associated topic: the warmer
the color, the higher the growth rate. For a particular topic (region), the
growth rate is defined as the number of articles published in the following
time period / the number of articles published in the previous time period.
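A per-topic growth rate can be computed as a simple ratio of article counts in two periods. In the sketch below the ratio is taken as the later period's count over the earlier period's count, an assumption made here so that larger values mean faster growth and therefore warmer colors. The function name, the handling of topics with no earlier articles, and the counts are all hypothetical.

```python
def growth_rate(earlier_count, later_count):
    """Ratio of later-period to earlier-period article counts for one topic."""
    if earlier_count == 0:
        # Assumption: a topic with no earlier articles counts as brand new.
        return float("inf") if later_count > 0 else 0.0
    return later_count / earlier_count

# Hypothetical counts: topic -> (earlier period, later period).
topic_counts = {"topic X": (10, 25), "topic Y": (40, 30), "topic Z": (0, 12)}
rates = {t: growth_rate(e, l) for t, (e, l) in topic_counts.items()}
emerging = sorted(rates, key=rates.get, reverse=True)
```

Sorting by this ratio puts brand-new and fast-growing topics first, which is the ordering the color scale is meant to convey.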
V. Experimental Study
Human Agents/Diseases Research: Emerging Topics
Emerging Topics in Human Agents/Diseases Research (2001-2005)
• During 2001-2005, dominating topics are:
“Yersinia pestis”, “Centers of Disease Control”,
“Protective antigens”, “Francisella tularensis”,
and “Botulinum neurotoxin”.
• The new important topics are: “Biological weapons”,
“Anthracis spores”, and “Smallpox vaccination”.
• We can see the shift of research interest towards
the use of Anthrax spores and biological weapons
after 2001.
V. Experimental Study
Animal Agents/Diseases Research: Productivity Status (Country Level)
Top 10 countries
Rank | Country | # Publications
1 | United States | 39,901
2 | Japan | 10,392
3 | United Kingdom | 9,369
4 | France | 6,115
5 | Italy | 5,970
6 | Germany | 5,269
7 | India | 4,308
8 | Spain | 3,695
9 | Canada | 3,568
10 | Taiwan | 3,146
• The United States had the most publications in animal
agents/diseases research, followed by Japan and the United Kingdom.
V. Experimental Study
Animal Agents/Diseases Research: Productivity Status (Institution Level)
Top 10 institutions
Rank | Institution | # Publications
1 | Centers for Disease Control and Prevention-Atlanta | 1,300
2 | National Taiwan University | 685
3 | Institute Pasteur-Paris | 638
4 | University of California-San Francisco | 602
5 | University of California-Davis | 551
6 | University of Pittsburgh | 529
7 | Mayo Clinic-Rochester | 521
8 | University of Southern California | 500
9 | Mahidol University (Thailand) | 479
10 | U.S. Department of Agriculture-Agricultural Research Service | 458
• CDC, Atlanta had the most publications, almost twice as many as
National Taiwan University and Institute Pasteur, Paris, which
ranked second and third.
V. Experimental Study
Animal Agents/Diseases Research: Productivity Status (Researcher Level)
Top 10 researchers
Rank | Researchers | Institution | # Publications
1 | Chen, D S | National Taiwan University | 209
2 | Williams, R | King's College Hospital, London, UK | 203
3 | Raoult, D | WHO Collaborative Center for Rickettsial Reference and Research, France | 198
4 | Lee, S D | Veterans General Hospital-Taipei, Taiwan | 163
5 | Liaw, Y F | Chang Gung Memorial Hospital, Taiwan | 159
6 | Hayashi, N | Osaka University, Japan | 151
7 | Okamoto, H | Jichi Medical School, Japan | 142
8 | Carreno, V | Viral Hepatitis Research Foundation, Madrid, Spain | 139
9 | Prusiner, S B | University of California, San Francisco | 137
10 | Purcell, R H | National Institute of Allergy and Infectious Diseases, Maryland | 130
V. Experimental Study
Animal Agents/Diseases Research: Collaboration Status
Collaboration Status on West Nile Virus
• The diagram shows the
co-authorship network for
West Nile virus
publications.
• We only included
researchers who published
more than 5 papers.
• The largest group is from
the United States.
V. Experimental Study
Animal Agents/Diseases Research: Emerging Topics
Research Emerging Topics Related to Foot-and-Mouth Disease 2001-2005
• During 2001-2005,
dominating topics are:
“Fibromuscular dysplasia”,
“Foot-and-mouth disease
virus”, “Outbreak of Foot-and-mouth disease”, “Reactive
Hyperemia”, and “United
Kingdom”.
• These dominating topics are
also the emerging ones.
• These topics are due to the
outbreak of FMD in the UK in
2001.
VI. Conclusion and Future Directions
Conclusion and Future Directions
• Monitoring worldwide bioterrorism research is becoming
more and more important and urgent.
• In this study, we built an integrated approach to mapping
worldwide bioterrorism literature and capabilities. We
analyzed the productivity status, collaboration status, and
emerging topics by using knowledge mapping techniques.
• In the future, we plan to monitor and analyze more
bioterrorism agents/diseases together with more literature
sources. We also plan to develop and incorporate more
advanced analysis and visualization techniques into our
approach.
VII. References
References
• Bruijn, B. d. and J. Martin (2002). "Literature Mining in Molecular Biology."
Proceedings of the EFMI Workshop on Natural Language Processing in Biomedical
Applications. Nicosia, Cyprus. March 8-9: 1-5.
• Burt, R. S. (1976). "Positions in Networks." Social Forces 55(1): 93-122.
• Chen, H. and M. Roco (2008). Mapping Nanotechnology Innovations and Knowledge:
Global, Longitudinal Patent and Literature Analysis.
• Chinchor, N. (1998). "MUC-7 test scores introduction." In Proceedings of the Seventh
Message Understanding Conference.
• Cohen, A. M. and W. R. Hersh (2005). "A Survey of Current Work in Biomedical Text
Mining." Briefings in Bioinformatics 6(1): 57-71.
• Hu, X., X. Zhang, et al. (2006). Text Mining the Biomedical Literature for Identification
of Potential Virus / Bacterium as Bioterrorism Weapons, Springer.
VII. References
References
• Huang, L., W. Chen, et al. (2003). "Literature Resource Portal Based on Virtual and Dynamic
Hierarchical Architecture." Proceedings of the Fifth International Conference on Computational
Intelligence and Multimedia Applications (ICCIMA'03) 17(3): 329-347.
• Kennedy, L. W. and C. M. Lum (2003). Developing a Foundation for Policy Relevant Terrorism
Research in Criminology, New Brunswick, Rutgers University.
• Kohane, I. S. (2002). "The Contributions of Biomedical Informatics to the Fight Against
Bioterrorism." Biomedical Informatics and Bioterrorism 9: 116-119.
• Lorrain, F. and H. C. White (1971). "Structural equivalence of individuals in social networks."
Journal of Mathematical Sociology 1: 49-80.
• Reid, E. O. F. (1997). "Evolution of a body of knowledge: an analysis of terrorism research."
Information Processing and Management 33(1): 91-106.
• Salton, G. (1989). Automatic Text Processing: the Transformation, Analysis, and Retrieval of
Information by Computer, Addison-Wesley Longman Publishing Co., Inc.
VII. References
References
• Shneiderman, B. (1996). The Eyes Have It: A Task by Data Type Taxonomy for
Information Visualizations. In Proceedings of the IEEE Symposium on Visual
Languages, Washington, IEEE Computer Society Press.
• Tolle, K. M. and H. Chen (2000). "Comparing Noun Phrasing Techniques for Use with
Medical Digital Library Tools." Journal of the American Society for Information
Science 51(4): 352-370.
• Voutilainen, A. (1997). "A Short Introduction to NPtool." In
http://www2.lingsoft.fi/doc/nptool/intro/.
• Wasserman, S. and K. Faust (1994). Social Network Analysis: Methods and
Applications, Cambridge: Cambridge University Press.
• Xu, J. J. and H. Chen (2005). "Criminal Network Analysis and Visualization."
Communications of the ACM 48(6): 101-107.