Mathematical Underpinnings for Science-Based Cybersecurity

The Current Landscape and Need for Fundamental Research

Cybersecurity, as currently practiced, is a mixed bag of electronic patches and reactionary physical and administrative controls aimed at fixing the crisis of the day. We rely heavily on commercial solutions, despite the critical importance of our cyber resources and infrastructure to the mission space of the Department of Energy (DOE) and, more generally, to our nation’s security. As the cyber threat continues to grow, it becomes increasingly clear that the DOE must embark on a scientific process of inquiry, investigation, and sound decision-making. Rather than waiting to discover a cyber attack (perhaps days, weeks, or months after it has happened), we need to implement a science-based approach to cybersecurity with a rigorous technical foundation. Here, we propose mathematical research that will pave the way for the interdisciplinary advances needed to thwart the growing cyber threat and transform the DOE approach for protecting electronic resources.

The Approach: Mathematical Underpinnings for Cybersecurity

A rigorous mathematical foundation is critical to the success of an interdisciplinary science-based research program to address fundamental cybersecurity issues. Here we identify five major topic areas that will establish a foundation for defining and modeling problems in cyberspace and cybersecurity. These areas will enable us to develop techniques for monitoring, probing, and even controlling cyber activity. They will also motivate useful tools for risk assessments to aid decision makers in setting cybersecurity priorities. The proposed mathematics will not only aid in reducing today’s vulnerabilities, but will also provide guidance and modeling capabilities that are essential for the development of a more secure Internet in the future.

1. Knowledge discovery and information science for massive real-time data.
Human-made computer networks generate massive amounts of data. Knowledge discovery tools and integrated information science approaches are needed to interrogate data and extract information in a rigorous process of scientific inquiry. The current state of the art for threat detection relies on rule-based flagging or blocking of traffic based on signatures that can be computed in real time. These methods are not adaptive and so must be manually updated every time a new threat is encountered. Machine learning and data mining techniques are needed that can operate in real time to distinguish between harmless anomalies and malicious attacks, and automatically adapt to changing environments over time. There is reason to believe that these methods could be effective, because they have proved successful in academic settings on small data sets. The challenge is in handling massive amounts of data with the highly heteroscedastic, nonstationary behaviors found in the cyber world.

The inherent stochastic nature of cyber activity calls for innovative data-driven statistical approaches for exploration, characterization, and analysis. For example, advances in nonlinear time-series analysis, both discrete and continuous and in both time and frequency domains, will be critical security enablers. Network tomography techniques can be developed for active statistical probing of network traffic patterns. Additionally, for attacks that are not nullified, knowledge discovery can be used for forensic analysis and to derive new rules to deter future attacks. Because automated analysis may be too costly or too complex to handle all aspects of discovery, participation of human experts and development of data analysis, data summarization, and visualization techniques will be needed.

2. Graph analysis of the Internet and its inherent structure.
Cyberspace can be represented mathematically using a graph-theoretic approach because the basic structure of a graph is well suited to the interconnected world of computer networks. Nodes can be associated with various types of hardware or virtual systems as well as routers and other Internet infrastructure, and edges can represent connections or information flow between nodes. A key challenge in modeling the Internet is to understand its inherent structure. We posit that metrology for the Internet is in its infancy; the distribution of the edges and concepts such as the scale-free property are only the tip of the iceberg in terms of accurately characterizing and measuring the structure of the Internet. In order to simulate the spread of malware or the effects of an infrastructure attack, we require advances to the state of the art in graph theory, graph-theoretic analysis, and large-scale simulation. The fact that the Internet is a network can be exploited in its defense by developing knowledge discovery techniques on graphs that use the patterns of data flow between nodes for characterization. This necessitates graph algorithms to understand the vulnerability of the network to infrastructure attacks, matrix and tensor methods to rank the criticality and/or influence of individual nodes or groups of nodes, and collective inference across nodes for anomaly detection.

3. Understanding the Internet as a dynamic, complex network.

The dynamic nature of the Internet means that we need to understand its time-varying structure. Extensions to graph theory for providing meaningful theoretical statements about large, time-varying graphs will be an essential component of any mathematical framework for cybersecurity. Moreover, since we can never produce a 100% secure general system or network, we need methods to mitigate the spread of damage.
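As a minimal sketch of what simulating and mitigating spread on a graph could look like, the following toy example propagates an infection outward from a seed node for a fixed number of hops, then repeats the run with the most connected node isolated. The topology, the node names, the worst-case "every link transmits" propagation rule, and the use of node degree as a criticality ranking are all assumptions made for illustration; they stand in for the richer models and matrix or tensor rankings that the proposed research would develop.

```python
from collections import deque

# Hypothetical toy topology: an undirected graph as an adjacency dict.
# Node names and edges are illustrative only, not measured Internet data.
GRAPH = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "e"},
    "b": {"hub"},
    "c": {"hub", "f"},
    "d": {"hub"},
    "e": {"a"},
    "f": {"c"},
}


def spread(graph, seed, max_hops, isolated=frozenset()):
    """Return the nodes reached from `seed` within `max_hops` hops.

    Assumed worst-case model: every link transmits the infection,
    except links that touch an isolated (quarantined) node.
    """
    if seed in isolated:
        return set()
    infected = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # hop budget exhausted along this path
        for neighbor in graph[node]:
            if neighbor not in infected and neighbor not in isolated:
                infected.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return infected


def most_connected(graph):
    """Rank criticality by degree, a crude stand-in for richer rankings."""
    return max(sorted(graph), key=lambda node: len(graph[node]))


baseline = spread(GRAPH, "e", max_hops=3)
critical = most_connected(GRAPH)
mitigated = spread(GRAPH, "e", max_hops=3, isolated={critical})
print(sorted(baseline), critical, sorted(mitigated))
```

In this toy graph, isolating the highest-degree node confines the spread to the seed and its immediate neighbor. Real networks are time-varying and far larger, so the sketch shows only the shape of the question, not a practical method.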
Understanding the Internet’s mathematical structure will allow us to develop theory for the analysis, detection, obliteration, and prevention of malicious attacks and inadvertent lapses. The Internet is, in fact, a complex system that adapts according to unknown rules. Controlling the network requires advances in robust optimization and game theory, because cybersecurity implies the existence of at least two intelligent agents interacting in an attempt to maximize an objective function.

4. Statistical assessments of cyber traffic and risk analyses.

Key to making science-based decisions is the use of statistical approaches for evaluating risk. Although malicious attacks are statistically rare, the consequences are extreme and can be compared to a major earthquake. Therefore, evaluating risk requires advances in reliability theory, extreme value theory for highly unusual events, Dempster-Shafer theory (uncertainty modeled as an interval), non-normal distributional analysis, and risk-based pricing. Current risk analysis is based on reliability theory and limited work in multiple dependent distributions using Gaussian copula formulations. The theory of risk and its related disciplines of robust and stochastic optimization must be enhanced if we are to have sufficient methods for decision analysis and risk assessment in future cyber systems that will balance the risks of potential threats with the impact and costs of cyber responses. In general, we need new work on distribution characterization, analysis, and comparison for the data sets associated with cybersecurity.

5. Structural study of accessibility/vulnerability space.

There is a fundamental tension between the competing desires for security and accessibility. A completely inaccessible system is naturally secure; unfortunately, it is also unusable. As accessibility increases, so too does vulnerability, requiring implementation of additional security measures.
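The tension described above can be made concrete with a small sketch. Assume, purely for illustration, that a system configuration is the set of access capabilities it exposes (the names `ssh`, `web`, and `mail` are hypothetical) and that vulnerability is crudely measured by the number of exposed capabilities. Ordered by set inclusion, these configurations form a lattice whose bottom is the completely closed system and whose top is the completely open one.

```python
from itertools import combinations

# Hypothetical access capabilities a system might expose (illustrative names).
CAPABILITIES = frozenset({"ssh", "web", "mail"})

BOTTOM = frozenset()  # completely closed system: secure but unusable
TOP = CAPABILITIES    # completely open system: maximally accessible


def all_configurations(capabilities):
    """Enumerate every subset of capabilities (the power set)."""
    items = sorted(capabilities)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def at_most_as_open(x, y):
    """Partial order: x is at most as accessible as y when x is a subset of y."""
    return x <= y


def meet(x, y):
    """Greatest lower bound: capabilities that both configurations expose."""
    return x & y


def join(x, y):
    """Least upper bound: capabilities that either configuration exposes."""
    return x | y


def exposure(config):
    """Toy vulnerability measure (an assumption): count of exposed capabilities."""
    return len(config)


configs = all_configurations(CAPABILITIES)

# Opening a system further never lowers this toy exposure measure.
monotone = all(
    exposure(x) <= exposure(y)
    for x in configs
    for y in configs
    if at_most_as_open(x, y)
)
print(len(configs), monotone)
```

The subset ordering and the counting measure are placeholders. An axiomatic treatment would need to justify its own ordering and its own notion of vulnerability rather than inherit these.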
This leads us to the idea of an accessibility/vulnerability space. The structure of this space may be expressed axiomatically from first principles, using mathematical tools such as those from lattice theory (in which case the bottom is a completely closed system, and the top is a completely open one). Note that this is distinct from prior taxonomic work; the idea is to build a common axiomatic framework into which existing taxonomies could be cast, helping us to understand structural commonalities and to more easily find and mitigate vulnerabilities that would otherwise be overlooked.

In summary, establishment of a mathematical framework for science-based cybersecurity will provide a common technical basis for a research program that will anticipate and prevent future attacks, strengthen defenses and countermeasures, and build up intellectual capital. A research program built on a common mathematical framework will facilitate cooperative and rapid response within the DOE community, as well as the broader populace of the Internet, replacing isolated, valiant efforts with an integrated, measured response.

Defining Success of a Mathematical Framework

A successful mathematical framework will provide the tools needed to characterize and understand distributions of flows and events. Using this knowledge, the cybersecurity community can begin to develop electronic armor to protect entry points, put in place virtual security guards at transfer points, establish the capability to detect anomalies and intrusions, and produce weapons to defend against and counteract malicious attacks. Technical advances will be shared and disseminated in the open scientific literature to the extent possible without compromising security. Moreover, our ever-deepening mathematical understanding of graphs and networks will inform the design of the future of the Internet and build a critical mass of technical expertise.
The mathematically rigorous structure and tools for analysis, discovery, and renovation of our information resources will lead to a science-informed decision process and an intellectual pool at the ready to face the unknown challenges of the future.

Jumpstarting the Mathematical Framework

Sustained collaborative effort between cyber experts and scientists from other disciplines must drive the research agenda. Development of a common mathematical structure for representing cyber resources and activities will be an important first step toward development of a coordinated cyber strategy. A preliminary list of cyber R&D topics provides a starting point for identifying areas of mathematics and other disciplines appropriate for tackling cyber challenges in parallel with the development of the mathematical structure.

Meeting the mathematical challenges of cybersecurity will require the talents of the open research community working in collaboration with DOE cybersecurity experts and research scientists. Identification of key technical personnel within DOE and the open science community will be needed to establish a detailed research agenda. Because of the direct impact on DOE computing resources and the range of information of varying sensitivities, success of this effort requires the continual involvement of DOE personnel, not just a handoff to academic theoreticians. Opportunities for dissemination of research results via workshops, conferences, traditional publications, or online journals will be an important consideration in engaging the open science community. Involvement of postdocs and students in this effort will help build the pipeline of trained cyber professionals for the future. Additional partnerships with commercial hardware and software vendors may be necessary to fully address the cyber threat.
Contributors

Joanne Wendelberger, Los Alamos National Laboratory
Edward Talbot, Sandia National Laboratories
Christopher Griffin, Oak Ridge National Laboratory
Louis Wilder, Oak Ridge National Laboratory
Yu Jiao, Oak Ridge National Laboratory
Tamara Kolda, Sandia National Laboratories
Chad Scherrer, Pacific Northwest National Laboratory