A Mathematical Basis for Science

Mathematical Underpinnings for Science-Based Cybersecurity
The Current Landscape and Need for Fundamental Research
Cybersecurity, as currently practiced, is a mixed bag of electronic patches and reactionary
physical and administrative controls aimed at fixing the crisis of the day. We rely heavily
on commercial solutions, despite the critical importance of our cyber resources and
infrastructure to the mission space of the Department of Energy (DOE) and more
generally to our nation’s security. As the cyber threat continues to grow, it becomes
increasingly clear that the DOE must embark on a scientific process of inquiry,
investigation, and sound decision-making. Rather than waiting to discover a cyber attack
(perhaps days, weeks, or months after it has happened), we need to implement a science-based
approach to cybersecurity with a rigorous technical foundation. Here, we propose
mathematical research that will pave the way for the interdisciplinary advances needed to
thwart the growing cyber threat and transform the DOE approach for protecting
electronic resources.
The Approach: Mathematical Underpinnings for Cybersecurity
A rigorous mathematical foundation is critical to the success of an interdisciplinary
science-based research program to address fundamental cybersecurity issues. Here we
identify five major topic areas that will establish a foundation for defining and modeling
problems in cyberspace and cybersecurity. These areas will enable us to develop
techniques for monitoring, probing, and even controlling cyber activity. They will also
motivate useful tools for risk assessment to aid decision makers in setting cybersecurity
priorities. The proposed mathematics will not only aid in reducing today’s vulnerabilities,
but will also provide guidance and modeling capabilities that are essential for the
development of a more secure Internet in the future.
1. Knowledge discovery and information science for massive real-time data. Human-made
computer networks generate massive amounts of data. Knowledge discovery tools
and integrated information science approaches are needed to interrogate data and extract
information in a rigorous process of scientific inquiry. The current state of the art in
threat detection relies on rule-based flagging or blocking of traffic based on signatures
that can be computed in real-time. These methods are not adaptive and so must be
manually updated every time a new threat is encountered. Machine learning and data
mining techniques are needed that can operate in real-time to distinguish between
harmless anomalies and malicious attacks, and automatically adapt to changing
environments over time. There is reason to believe that these methods could be effective,
because they have proved successful in academic settings on small data sets. The
challenge is in handling massive amounts of data with the highly heteroscedastic, nonstationary behaviors found in the cyber world. The inherent stochastic nature of cyber
activity calls for innovative data-driven statistical approaches for exploration,
characterization, and analysis. For example, advances in nonlinear time-series analysis,
for both discrete- and continuous-time processes and in both the time and frequency domains,
will be critical security enablers. Network tomography techniques can be developed for active statistical
probing of network traffic patterns. Additionally, for attacks that are not nullified,
knowledge discovery can be used for forensic analysis and to derive new rules to deter
future attacks. Because automated analysis may be too costly or too complex to handle all
aspects of discovery, participation of human experts and development of data analysis,
data summarization and visualization techniques will be needed.
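As one illustration of the adaptive, real-time detection called for above, the sketch below keeps an exponentially weighted mean and variance of a single traffic metric (for example, packets per second) and flags readings whose z-score exceeds a threshold. Because the baseline is re-estimated with every reading, it follows slow drift in the traffic instead of relying on a fixed signature. The class name `EwmaDetector` and the parameters `alpha`, `threshold`, and `warmup` are hypothetical choices for illustration; a realistic detector would have to cope with multivariate, heteroscedastic data at far higher volumes.

```python
import math


class EwmaDetector:
    """Streaming z-score detector with an exponentially weighted baseline.

    Hypothetical sketch: the names and default parameters are illustrative.
    """

    def __init__(self, alpha=0.05, threshold=4.0, warmup=30):
        self.alpha = alpha          # weight given to the newest observation
        self.threshold = threshold  # z-score above which a point is flagged
        self.warmup = warmup        # observations to absorb before flagging
        self.mean = 0.0
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Score x against the current baseline, then fold x into it."""
        flagged = False
        if self.n >= self.warmup and self.var > 0:
            z = abs(x - self.mean) / math.sqrt(self.var)
            flagged = z > self.threshold
        if self.n == 0:
            self.mean = float(x)  # seed the baseline with the first reading
        else:
            # Incremental exponentially weighted mean and variance updates.
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.n += 1
        return flagged


# Hypothetical metric: routine readings cycling through 100..104, then a spike.
detector = EwmaDetector()
routine = [detector.update(100 + (i % 5)) for i in range(200)]
spike = detector.update(1000)
```

Once the baseline has settled, the routine readings are not flagged while the spike is. In this sketch flagged readings are still folded into the baseline; a more careful design might exclude them so that a sustained attack cannot gradually become "normal."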
2. Graph analysis of the Internet and its inherent structure. Cyberspace can be
represented mathematically using a graph theoretic approach because the basic structure
of a graph is well-suited to the interconnected world of computer networks. Nodes can
be associated with various types of hardware or virtual systems as well as routers and
other Internet infrastructure, and edges can represent connections or information flow
between nodes. A key challenge in modeling the Internet is to understand its inherent
structure. We posit that metrology for the Internet is in its infancy; the distribution of the
edges and concepts such as scale-free are only the tip of the iceberg in terms of accurately
characterizing and measuring the structure of the Internet. In order to simulate the spread
of malware or the effects of an infrastructure attack, we require advances to the state of
the art in graph theory, graph theoretic analysis and large scale simulation. The fact that
the Internet is a network can be exploited in its defense by developing knowledge
discovery techniques on graphs that use the patterns of data flow between nodes for
characterization. This necessitates graph algorithms to understand the vulnerability of the
network to infrastructure attacks, matrix and tensor methods to rank the criticality and/or
influence of individual nodes or groups of nodes, and collective inference across nodes
for anomaly detection.
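To make the idea of matrix methods for ranking node criticality concrete, the sketch below scores the nodes of a small directed flow graph by PageRank-style power iteration, which approximates the dominant eigenvector of a damped transition matrix. PageRank is only one of many possible centrality measures and is used here as an assumption for illustration; the function name `rank_nodes`, the damping factor 0.85, and the example edge list are hypothetical.

```python
def rank_nodes(edges, n, damping=0.85, tol=1e-12, max_iter=1000):
    """Score nodes 0..n-1 of a directed graph by PageRank-style power iteration.

    edges is a list of (source, target) pairs, e.g. observed flows between hosts.
    """
    out_deg = [0] * n
    for u, _ in edges:
        out_deg[u] += 1
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if out_deg[u] == 0)
        new = [(1.0 - damping + damping * dangling) / n] * n
        # Each node passes a damped share of its rank along each out-edge.
        for u, v in edges:
            new[v] += damping * rank[u] / out_deg[u]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


# Hypothetical flow graph: hosts 0, 1, and 3 send to host 2; host 2 sends to host 0.
flows = [(0, 2), (1, 2), (3, 2), (2, 0)]
scores = rank_nodes(flows, 4)
```

Host 2, which aggregates traffic from every other host, receives the highest score, and hosts 1 and 3, which receive no traffic, keep only the uniform baseline of (1 - 0.85)/4. Scores always sum to 1, so they can be compared across snapshots of the same network.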
3. Understanding the Internet as a dynamic, complex network. The dynamic nature of
the Internet means that we need to understand its time-varying structure. Extensions to
graph theory for providing meaningful theoretical statements about large, time-varying
graphs will be an essential component of any mathematical framework for cybersecurity.
Moreover, since we can never produce a 100% secure general system/network, we need
methods to mitigate the spread of damage. Understanding the Internet’s mathematical
structure will allow us to develop theory for the analysis, detection, obliteration, and
prevention of malicious attacks and inadvertent lapses. The Internet is, in fact, a complex
system that adapts according to unknown rules. Controlling the network requires
advances in robust optimization and game theory, because cybersecurity implies the
existence of at least two intelligent, adversarial agents, each attempting to maximize its
own objective function.
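The simplest instance of this attacker/defender interaction is a two-player zero-sum game. The sketch below computes the mixed-strategy equilibrium of a 2 x 2 zero-sum game in closed form from the standard indifference conditions, p = (d - c)/(a - b - c + d), q = (d - b)/(a - b - c + d), and value = (ad - bc)/(a - b - c + d). The scenario (a defender who can harden one of two assets, an attacker who targets one) and the asset values 10 and 4 are hypothetical; realistic cyber games involve many strategies, incomplete information, and repeated play, and would call for linear programming or more specialized solvers.

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Mixed-strategy equilibrium of the zero-sum game [[a, b], [c, d]].

    Entries are payoffs to the row player (defender), who maximizes; the
    column player (attacker) minimizes. Returns (p, q, value), where p is the
    probability the defender plays row 1 and q the probability the attacker
    plays column 1.
    """
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        # A pure-strategy saddle point exists; the mixed formulas do not apply.
        raise ValueError("game has a pure-strategy saddle point")
    den = a - b - c + d
    p = (d - c) / den        # makes the attacker indifferent between columns
    q = (d - b) / den        # makes the defender indifferent between rows
    value = (a * d - b * c) / den
    return p, q, value


# Rows: defender protects asset 1 or asset 2. Columns: attacker hits asset 1 or 2.
# Each entry is the defender's payoff: minus the value of any asset lost.
p, q, value = solve_2x2_zero_sum(0, -4, -10, 0)
```

Here the defender protects the more valuable asset with probability 5/7 (about 71%), the attacker, anticipating this, targets it with probability only 2/7 (about 29%), and the defender's expected loss is 20/7, about 2.86.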
4. Statistical assessments of cyber traffic and risk analyses. Key to making science-based
decisions is the use of statistical approaches for evaluating risk. Although the most damaging
malicious attacks are statistically rare, their consequences are extreme and can be
compared to those of a major earthquake. Therefore, evaluating risk requires advances in
reliability theory, extreme value theory for highly unusual events, Dempster-Shafer
theory (uncertainty modeled as an interval), non-normal distributional analysis, and
risk-based pricing. Current risk analysis is based on reliability theory and limited work on
multiple dependent distributions using Gaussian copula formulations. The theory of risk
and its related disciplines of robust and stochastic optimization must be enhanced if we
are to have sufficient methods for decision analysis and risk assessment in future cyber
systems that balance the risks of potential threats against the impact and costs of cyber
responses. In general, we need new work on distribution characterization, analysis, and
comparison for the data sets associated with cybersecurity.
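As a small illustration of Dempster-Shafer theory, in which support for a hypothesis is an interval [belief, plausibility] rather than a single probability, the sketch below fuses evidence from two sources with Dempster's rule of combination. The two "sensors," the frame {benign, malicious}, and the mass assignments are hypothetical values chosen for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozenset."""
    combined = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the non-conflicting mass sums to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Mass committed to subsets of the hypothesis (lower bound)."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Mass not contradicting the hypothesis (upper bound)."""
    return sum(w for s, w in m.items() if s & hypothesis)


B, M = "benign", "malicious"
THETA = frozenset({B, M})  # "don't know": mass on the whole frame
sensor1 = {frozenset({M}): 0.6, THETA: 0.4}
sensor2 = {frozenset({M}): 0.5, frozenset({B}): 0.2, THETA: 0.3}
fused = combine(sensor1, sensor2)
interval = (belief(fused, frozenset({M})), plausibility(fused, frozenset({M})))
```

With these numbers, 0.12 of the combined mass is in conflict and is renormalized away, and the fused support for "malicious" is the interval of roughly [0.773, 0.909]. The width of the interval expresses how much of the evidence remains uncommitted.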
5. Structural study of accessibility/vulnerability space. There is a fundamental tension
between the competing desires for security and accessibility. A completely inaccessible
system is naturally secure; unfortunately, it is also unusable. As accessibility increases, so
too does vulnerability, requiring implementation of additional security measures. This
leads us to the idea of an accessibility/vulnerability space. The structure of this space may
be expressed axiomatically from first principles, using mathematical tools such as those
from lattice theory (in which case the bottom is a completely closed system, and the top
is a completely open one). Note that this is distinct from prior taxonomic work; the idea
is to build a common axiomatic framework into which existing taxonomies could be cast,
helping us to understand structural commonalities and to more easily find and mitigate
vulnerabilities that would otherwise be overlooked.
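One minimal way to make the accessibility/vulnerability lattice concrete is to describe a configuration by the set of access channels it leaves open. The configurations then form a power-set lattice ordered by inclusion, with union as join, intersection as meet, the empty set (completely closed) as bottom, and the full set (completely open) as top. The observation that vulnerability grows with accessibility becomes the requirement that a vulnerability score be monotone on this lattice. The channel names and exposure weights below are hypothetical, and an additive score is only the simplest monotone choice.

```python
from itertools import combinations

# Hypothetical access channels; a configuration is the set of channels left open.
CHANNELS = frozenset({"ssh", "web", "vpn"})
# Hypothetical per-channel exposure weights (illustrative only).
EXPOSURE = {"ssh": 3.0, "web": 2.0, "vpn": 1.0}


def configurations(channels):
    """All subsets of the channels: the elements of the power-set lattice."""
    items = sorted(channels)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def join(x, y):
    """Least upper bound: open everything that is open in either."""
    return x | y


def meet(x, y):
    """Greatest lower bound: open only what both leave open."""
    return x & y


def vulnerability(config):
    """Additive exposure score of a configuration."""
    return sum(EXPOSURE[ch] for ch in config)


def is_monotone(score, lattice):
    """Check that x <= y (subset) implies score(x) <= score(y)."""
    return all(score(x) <= score(y) for x in lattice for y in lattice if x <= y)


lattice = configurations(CHANNELS)
bottom, top = frozenset(), CHANNELS
```

Any candidate vulnerability measure can be checked with `is_monotone` against the same lattice, which is one way to test whether an existing taxonomy or scoring scheme is consistent with the axiomatic structure.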
In summary, establishment of a mathematical framework for science-based cybersecurity
will provide a common technical basis for a research program that will anticipate and
prevent future attacks, strengthen defenses and countermeasures, and build up intellectual
capital. A research program built on a common mathematical framework will facilitate
cooperative and rapid response within the DOE community, as well as the broader
populace of the Internet, replacing isolated, valiant efforts with an integrated, measured
response.
Defining Success of a Mathematical Framework
A successful mathematical framework will provide the tools needed to characterize and
understand distributions of flows and events. Using this knowledge, the cybersecurity
community can begin to develop electronic armor to protect entry points, put in place
virtual security guards at transfer points, establish capability to detect anomalies and
intrusions, and produce weapons to defend and counteract malicious attacks. Technical
advances will be shared and disseminated in the open scientific literature to the extent
possible without compromising security. Moreover, our ever-deepening mathematical
understanding of graphs and networks will inform the design of the future Internet and
build a critical mass of technical expertise. The mathematically rigorous structure and
tools for analysis, discovery, and renovation of our information resources will lead to a
science-informed decision process and an intellectual pool at the ready to face the
unknown challenges of the future.
Jumpstarting the Mathematical Framework
Sustained collaborative effort between cyber experts and scientists from other disciplines
must drive the research agenda. Development of a common mathematical structure for
representing cyber resources and activities will be an important first step toward
development of a coordinated cyber strategy. A preliminary list of cyber R&D topics
provides a starting point for identifying areas of mathematics and other disciplines
appropriate for tackling cyber challenges in parallel with the development of the
mathematical structure.
Meeting the mathematical challenges of cybersecurity will require the talents of the open
research community working in collaboration with DOE cybersecurity experts and
research scientists. Identification of key technical personnel within DOE and the open
science community will be needed to establish a detailed research agenda. Because of
the direct impact on DOE computing resources and the range of information of varying
sensitivities, success of this effort requires the continual involvement of DOE personnel,
not just a handoff to academic theoreticians. Opportunities for dissemination of research
results via workshops, conferences, traditional publications, or online journals will be an
important consideration in engaging the open science community. Involvement of
postdocs and students in this effort will help build the pipeline of trained cyber
professionals for the future. Additional partnerships with commercial hardware and
software vendors may be necessary to fully address the cyber threat.
Contributors
Joanne Wendelberger, Los Alamos National Laboratory
Edward Talbot, Sandia National Laboratories
Christopher Griffin, Oak Ridge National Laboratory
Louis Wilder, Oak Ridge National Laboratory
Yu Jiao, Oak Ridge National Laboratory
Tamara Kolda, Sandia National Laboratories
Chad Scherrer, Pacific Northwest National Laboratory