Secure Systems Engineering
This SOAR focuses primarily on current activities, techniques, technologies, standards, and organizations
that have as their main objective the production and/or sustainment of secure software. However,
software is typically an element or part of a larger system, whether it is a software-intensive system or a
system that is composed of both hardware and software elements. The perspective of this section, then,
is that of the systems engineer(s) who seeks to understand the security issues associated with the
software components of a larger secure system.
Within this section, we will summarize typical issues, topics, and techniques used by systems engineers
to build secure systems, with a view towards clarifying the relationships between those issues, topics,
and techniques and those that pertain specifically to secure software development.
According to Anderson—
Security engineering is about building systems to remain dependable in the face of malice, error, or
mischance. As a discipline, it focuses on the tools, processes, and methods needed to design, implement,
and test complete systems, and to adapt existing systems as their environment evolves….
Security engineering requires cross-disciplinary expertise, ranging from cryptography and computer
security, to hardware tamper-resistance and formal methods, to applied psychology, organizational
and audit methods and the law…
Security requirements differ greatly from one system to another. One typically needs some combination
of user authentication, transaction integrity and accountability, fault-tolerance, message secrecy, and
covertness. But many systems fail because their designers protect the wrong things, or protect the right
things but in the wrong way.
To create a context for discussing the development of secure software, we describe here the technical
processes and technology associated with the development of secure systems, and the relationship
between the technical aspects of secure systems engineering and those of secure software engineering
as it occurs within the systems engineering lifecycle.
“… there is growing awareness that systems must be designed, built, and operated with the
expectation that system elements will have known and unknown vulnerabilities. Systems must
continue to meet (possibly degraded) functionality, performance, and security goals despite
these vulnerabilities. In addition, vulnerabilities should be addressed through a more
comprehensive life cycle approach, delivering system assurance, reducing these vulnerabilities
through a set of technologies and processes.” [NDIA 1]
Although system assurance can be discussed relative to any desirable attribute of a system, this
document specifically addresses assurance of the security properties of the entire system across the
system lifecycle. For the purposes of this document, secure systems engineering equates to system
security engineering.
Assurance Cases
An assurance case (i.e., a system security assurance case in this document) is the primary mechanism
used in secure systems engineering to support claims that a system will operate as intended in its
operational environment, and to minimize the risk that it harbors security weaknesses and vulnerabilities.
Development and communication of the assurance case will be part of the overall risk management
strategy during the system lifecycle.
ISO/IEC 15026-2:2011 defines the parts of an assurance case as follows (with emphasis added):
“An assurance case includes a top-level claim for a property of a system or product (or set of
claims), systematic argumentation regarding this claim, and the evidence and explicit
assumptions that underlie this argumentation. Arguing through multiple levels of subordinate
claims, this structured argumentation connects the top-level claim to the evidence and assumptions.”
Assurance claims would be included in the system requirements list and tagged as assurance claims.
The assurance case is built and maintained throughout the systems engineering lifecycle and system
assurance activities become part of the lifecycle. Assurance cases are typically reviewed at major
milestones during the lifecycle.
As discussed in [NDIA 1], “Claims identify the system’s critical requirements for assurance, including the
maximum level of uncertainty permitted for them. A claim must be a clear statement that can be shown
to be true or false, not an action. An argument is a justification that a given claim (or sub-claim) is
true or false. The argument includes context, criteria, assumptions, and evidence. Evidence is
information that demonstrably justifies the argument.”
An example of a claim might be “unauthorized users are not able to gain entry into the system.”
A fault tree is an example of a technique and graphical notation used to represent arguments.
Mathematical proofs, verification of test results, and analysis tool results are examples of evidence.
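The claim–argument–evidence structure described above can be sketched as a simple tree data structure. This is a hypothetical illustration only; the class, field, and claim names below are not drawn from ISO/IEC 15026-2 or any other standard:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """One node of an assurance case: a claim, the argument for it,
    and the evidence and sub-claims that support the argument."""
    statement: str
    argument: str = ""
    evidence: list[str] = field(default_factory=list)
    subclaims: list["Claim"] = field(default_factory=list)

    def is_supported(self) -> bool:
        """A claim is supported if it has direct evidence, or if it has
        sub-claims and every sub-claim is itself supported."""
        if self.evidence:
            return True
        return bool(self.subclaims) and all(c.is_supported() for c in self.subclaims)

# Top-level claim from the example above, argued through two sub-claims.
top = Claim(
    statement="Unauthorized users are not able to gain entry into the system",
    argument="All entry points enforce authentication",
    subclaims=[
        Claim("Network entry points require authentication",
              evidence=["penetration test report"]),
        Claim("Console access requires authentication",
              evidence=["configuration audit"]),
    ],
)
print(top.is_supported())  # True: every leaf sub-claim carries evidence
```

The recursive check mirrors the standard’s structure: the top-level claim is connected to evidence only through intermediate levels of subordinate claims.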
Systems Engineering and Systems Security Engineering
[Vee Model figure here; to be updated to show the relationship with system assurance activities]
Systems Security Requirements Definition and Analysis
This section addresses software security requirements engineering. The methodologies described here
are also applicable at the system level.
Users may not be totally aware of the security risks, risks to the mission, and vulnerabilities associated
with their system. To define requirements, systems engineers may, in conjunction with users, perform a
top-down and bottom-up analyses of possible security failures that could pose risk to the organization,
and define requirements to address vulnerabilities.
Fault tree analysis for security (sometimes referred to as threat tree or attack tree analysis) is a top-down approach to identifying vulnerabilities. In a fault tree, the attacker’s goal is placed at the top of the
tree. Then, the analyst documents possible alternatives for achieving that attacker goal. For each
alternative, the analyst may recursively add precursor alternatives for achieving the subgoals that
compose the main attacker goal. This process is repeated for each attacker goal. By examining the
lowest level nodes of the resulting attack tree, the analyst can then identify all possible techniques for
violating the system’s security; preventions for these techniques could then be specified as security
requirements for the system.
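The leaf-enumeration step described above can be sketched in a few lines. The tree and goal names are hypothetical; the point is only that walking to the lowest-level nodes yields the concrete techniques that security requirements must prevent:

```python
# Minimal attack-tree sketch: each node is (goal, [subgoal trees]);
# a leaf (no subgoals) is a concrete attack technique.
def leaf_techniques(tree):
    goal, subgoals = tree
    if not subgoals:                      # leaf node: a concrete technique
        return [goal]
    techniques = []
    for sub in subgoals:                  # recurse through alternative subgoals
        techniques.extend(leaf_techniques(sub))
    return techniques

# Illustrative tree for the attacker goal "gain entry into the system".
attack_tree = ("gain entry into the system", [
    ("obtain a valid password", [
        ("guess the password", []),
        ("phish a user", []),
    ]),
    ("bypass authentication", [
        ("exploit an unpatched service", []),
    ]),
])

# Each leaf becomes a candidate security requirement (its prevention).
for t in leaf_techniques(attack_tree):
    print("derive requirement preventing:", t)
```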
Failure Modes and Effects Analysis (FMEA) is a bottom-up approach for analyzing possible security
failures. The consequences of a simultaneous failure of all existing or planned security protection
mechanisms are documented, and the impact of each failure on the system’s mission and stakeholders
is traced.
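One common bookkeeping device for the bottom-up analysis, not prescribed by the text above, is the FMEA risk priority number (RPN = severity × occurrence × detectability), used to rank failure modes by the impact traced to them. The mechanisms, failure modes, and ratings below are purely illustrative:

```python
# FMEA-style ranking sketch: score each failure mode of a protection
# mechanism on 1-10 scales and sort by risk priority number (RPN).
failure_modes = [
    # (mechanism, failure mode, severity, occurrence, detectability)
    ("firewall", "rule base corrupted", 8, 3, 4),
    ("authentication server", "service crash locks out users", 6, 4, 2),
    ("audit log", "log tampering goes unnoticed", 7, 2, 9),
]

# Higher RPN = higher-priority failure mode for mitigation.
ranked = sorted(failure_modes, key=lambda m: m[2] * m[3] * m[4], reverse=True)
for mech, mode, s, o, d in ranked:
    print(f"RPN {s * o * d:3d}  {mech}: {mode}")
```

Here a hard-to-detect failure (log tampering) outranks a more severe but more detectable one, which is the kind of insight the bottom-up tracing is meant to surface.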
Other techniques for developing system security requirements include threat modeling and misuse and
abuse cases. Both of these techniques are described in Section . Requirements may also be
derived from system security policy models and system security targets that describe the system’s
required protection mechanisms [e.g., the Target of Evaluation (TOE) descriptions produced for
Common Criteria (CC) evaluations].
Attack tree analyses and FMEAs augment and complement the security requirements derived from the
system’s threat models, security policy models, and/or security targets. The results of the system
security requirements analysis can be used as the basis for security test case scenarios to be used during
integration or acceptance testing.
Secure Systems Architecture and Design
In the design of secure systems, several key design features must be incorporated to address typical system
vulnerabilities: security protocol design, password management design, access control, addressing distributed
system issues, concurrency control, fault tolerance, and failure recovery. Appendix E describes security functions
that are typically incorporated in secure systems. This is not meant to be an exhaustive list, but rather to provide
illustrative examples. The following sections discuss two significant design issues with security implications, which
are not directly related to security functionality.
Timing and Concurrency Issues in Distributed Systems
As noted by Anderson, in large distributed systems (i.e., systems of systems), scale-up problems related to security are
not linear because there may be a large change in complexity. A systems engineer may not have total control or
awareness over all systems that make up a distributed system. This is particularly true when dealing with concurrency,
fault tolerance, and recovery. Problems in these areas are magnified when dealing with large distributed systems.
Controlling the concurrency of processes (whereby two or more processes execute simultaneously) presents a
security issue in the form of potential for denial of service by an attacker who intentionally exploits the system’s
concurrency problems to interfere with or lock up processes that run on behalf of other principals. Concurrency
design issues may exist at any level of the system, from hardware to application. Examples of
specific concurrency problems, and best practices for dealing with them, include—
 Processes Using Old Data (e.g., out of date credentials, cookies): Propagating security state changes is
a way to address this problem.
 Conflicting Resource Updates: Locking to prevent inconsistent updates (resulting from two programs
simultaneously updating the same resource) is a way to address this.
 Order of Update in Transaction-Oriented Systems and Databases: Order of arrival and update
needs to be considered in transaction-oriented system designs.
 System Deadlock, in which concurrent processes or systems are waiting for each other to act (often one
process is waiting for another to release resources): This is a complex issue, especially in dealing with lock
hierarchies across multiple systems. However, note that there are four necessary conditions, known as the
Coffman conditions (first identified by E.G. Coffman in 1971)[72] that must be present for a deadlock to
occur— mutual exclusion, hold and wait, no preemption, and circular wait.
 Nonconvergence in Transaction-Oriented Systems: Transaction-based systems rely on the ACID
(atomic, consistent, isolated, and durable) properties of transactions (e.g., the accounting books must
balance). Convergence means that when the volume of transactions subsides, the system settles
into a consistent state. In practice, when nonconvergence is observed, recovery from
failures must be addressed by the system design.
 Inconsistent or Inaccurate Time Across the System: Clock synchronization protocols, such as the
Network Time Protocol or Lamport’s logical clocks, can be run to address this issue.
The above list is merely illustrative. A number of other concurrency issues can arise in software-intensive systems.
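Of the four Coffman conditions, circular wait is often the easiest to design out: if every process acquires its locks in one global order, no cycle of waiters can form. A minimal sketch of this practice, with hypothetical resource names, follows:

```python
import threading

# Break the "circular wait" Coffman condition by imposing a global lock
# order: every thread acquires locks sorted by a fixed key (here the
# resource name), so no cycle of waiters can form.
locks = {"accounts_db": threading.Lock(), "audit_log": threading.Lock()}

def with_resources(names, action):
    """Acquire the named locks in global (sorted) order, run the action,
    then release the locks in reverse order."""
    ordered = sorted(names)
    for n in ordered:
        locks[n].acquire()
    try:
        action()
    finally:
        for n in reversed(ordered):
            locks[n].release()

# The two threads request the same resources in opposite orders, but
# with_resources normalizes the acquisition order, so they cannot deadlock.
t1 = threading.Thread(target=with_resources,
                      args=(["accounts_db", "audit_log"], lambda: None))
t2 = threading.Thread(target=with_resources,
                      args=(["audit_log", "accounts_db"], lambda: None))
t1.start(); t2.start(); t1.join(); t2.join()
print("completed without deadlock")
```

Without the sorting step, the two threads could each hold one lock while waiting for the other, which is exactly the hold-and-wait plus circular-wait combination the Coffman conditions describe.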
Fault Tolerance and Failure Recovery
In spite of all efforts to secure a system, failures may occur because of physical disasters or security failures.
Achieving system resilience through failure recovery and fault tolerance is an important part of a system engineer’s job,
especially as it relates to recovery from malicious attacks. Fault tolerance and failure recovery make denial of service
attacks more difficult and thus less attractive.
As noted by B. Selic [73], dealing with faults involves error detection, damage confinement, error recovery, and
fault treatment. Error detection detects that something in the system has failed. Damage confinement isolates the failure.
Error recovery removes the effects of the error by restoring the system to a valid state. Fault treatment involves identifying
and removing the root cause of the defect.
The systems engineer needs to develop failure models of the types of attacks that can be anticipated.
Resilience can then be achieved through fail-stop processors and redundancy, which protect the integrity of the data on a
system and constrain the failure rates.
A fail-stop processor automatically halts in response to any internal failure and before the effects of that
failure become visible. [74]
The systems engineer typically applies a combination of the following to achieve redundancy at multiple levels—
 Redundancy at the hardware level, through multiple processors, mirrored disks, multiple server farms, or
redundant arrays of independent disks (RAID).
 At the next level up, process redundancy allows software to be run simultaneously at multiple
geographically distributed locations, with voting on results. It can prevent attacks where the attacker gets
physical control of a machine, inserts unauthorized software, or alters data.
 At the next level is systems backup to unalterable media at regular intervals. For transaction-based
systems, transaction journaling can also be performed.
 At the application level, the fallback system is typically a less capable system that can be used if the main system
is compromised or unavailable.
Note that while redundancy can improve the speed of recovery from a security incident, none of the
techniques described above provides protection against attack or malicious code insertion.
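The process-redundancy-with-voting idea above can be sketched as follows. This is a hypothetical illustration: the replica functions stand in for the same computation run at geographically distributed locations, and a strict-majority vote outvotes a single compromised replica:

```python
from collections import Counter

def majority_vote(results):
    """Return the value produced by a strict majority of replicas;
    raise if no value reaches a majority."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: replicas disagree too widely")
    return value

# Two healthy replicas and one whose result was altered by an attacker
# with physical control of that machine.
healthy = lambda: 42
tampered = lambda: 41

results = [replica() for replica in (healthy, healthy, tampered)]
print(majority_vote(results))  # 42: the tampered replica is outvoted
```

With 2f + 1 replicas, voting of this kind masks up to f arbitrarily corrupted results, which is why the text notes that the attacker must compromise multiple locations for the attack to succeed.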
Security Testing
In A Practical Guide to Security Engineering and Information Assurance, [75] Debra Herrmann recommends that
because attackers are not biased by knowledge of a system’s design or security protection mechanisms, testing of the
integrated system by the system’s engineers be augmented by independent testing by a disinterested third party.
Tests to discover design defects are difficult to develop. Like the systems engineers developing security
designs, the testing group (whether independent or not) will be able to construct test cases based on an understanding
of the psychology of attackers and knowledge of typical software, hardware, and other system fault types.
Additional sources of information for development of test cases and scripts include—
 Misuse and abuse cases
 Threat tree analysis reports
 Threat models
 FMEA reports
 Security policy models
 Security targets
 System security requirements.
At a minimum, testing the resiliency of a system design to attack
would include—
 Testing for transient faults, such as an unusual combination or sequence of events, degradation of the
operating environment (temporary saturation of the network, power losses, environmental changes), or
induced temporary loss of synchronization among components of a system
 Testing for the ability of the system to withstand password guessing, masquerading, etc.
 Creative “what if” testing.
System Integration
To be effectively addressed during the integration phase, system security issues must first be identified during the
requirements and design phases. In today’s large distributed systems, system components typically interface with
outside systems whose security characteristics are uncertain or questionable. The security of an integrated system is
built on the behavior and interactions of its components and subsystems. On the basis of a risk analysis of systems
components, systems engineers must build in the necessary protection mechanisms.
As noted in DHS’s draft Security in the Software Lifecycle:
Determining whether the system that contains a given component, module, or program is secure
requires an analysis of how that component/module/program is used in the system, and how the system
as a whole will mitigate the impact of any compromise of its individual components that may arise from
a successful attack on those components or on the interfaces between them. Risks of insecurity can be
reduced through:
1. Vetting all acquired, reused, and from-scratch components prior to acceptance and integration into
the whole system;
2. Examining interfaces, observation of instances of trust relationships, and implementing wrappers
when needed;
3. Security testing of the system as a whole.
Certain systems design architectures and frameworks (e.g., application frameworks, publish/subscribe
architectures) can minimize the likelihood of security problems being introduced through improper integration of
application components.
Issues associated with the use of nondevelopmental components [e.g., commercial off-the-shelf (COTS), open source software
(OSS), legacy] are discussed in Sections 3.2.1 and . The same issues apply when selecting and
integrating components at the whole-system level, rather than specifically at the software level.
Relationship to Systems-of-Systems Security Engineering
As stated by Baldwin (2011),
“To implement the war-fighting strategies of today and tomorrow, systems are networked and designed
to share information and services in order to provide a flexible and coordinated set of war-fighting
capabilities. Under these circumstances, systems operate as part of an ensemble of systems supporting
broader capability objectives. This move to a system-of-systems (SoS) environment poses new
challenges to systems engineering in general and to specific DoD efforts to engineer secure military systems.”
A system of systems is a set or arrangement of systems that results when independent and useful
systems are integrated into a larger system that delivers unique capabilities.
While a system of systems is still a system, several challenges affect the security aspects of the SoS:
1. Principles of systems engineering are challenged because a system of systems does not have
clear boundaries and requirements.
2. Systems engineers may have to allocate functions suboptimally, given that they must use existing
systems to achieve required functionality.
3. Because component systems have independent ownership, there is limited control over the
development environment.
4. Systems engineers are limited to applying system security engineering to individual systems.
As a result, vulnerabilities in other system components could jeopardize the security status of
the overall SoS.
5. Interfaces between component systems increase the vulnerabilities and risks in the SoS.
6. Because limited attention is paid to the broader SoS context, and because that context is very
dynamic, an SoS may be viewed as secure at one point in time, yet new vulnerabilities may emerge as the
SoS context changes.
7. Because competing authorities and responsibilities exist across the component systems, systems
engineering is typically not applied to the SoS as a whole. Individual systems are only able to address their
individual security issues.
Some possible approaches to addressing these issues include:
 Data-continuity checking across systems
 Sharing of real-time risk assessments
 SoS configuration hopping, to deny adversaries knowledge of the SoS configuration.
At present, there are more challenges and questions than solutions.
Secure Systems Engineering and the CMMI and Other Models
CMMI is a major Process Model used by DoD suppliers and development organizations.
In 2010, Version 1.3 of CMMI® for Development was released; it addressed aspects of
secure systems engineering.
Reference Konrad’s presentation at to discuss how CMMI addressed Security
Controversy surrounding how CMMI should address Security at LinkedIn Discussion at (Accessed 10 May 2013)
Other Models:
FAA Integrated Capability Maturity Model (iCMM) and its Safety and Security extensions
ISO/IEC 21827 System Security Engineering Capability Maturity Model (SSE CMM)
Related Systems and Software Security Engineering Standards, Policies, Best
Practices, and Guides
Department of Homeland Security. Assurance Focus for CMMI (Summary of Assurance for CMMI Efforts),
Department of Defense and Department of Homeland Security. Software Assurance in Acquisition:
Mitigating Risks to the Enterprise, 2008.
International Organization for Standardization and International Electrotechnical Commission. ISO/IEC
27001 Information Technology – Security Techniques – Information Security Management Systems –
Requirements, 2005.
NDIA-1: NDIA System Assurance Committee. Engineering for System Assurance. Arlington, VA: NDIA, 2008.
ISO/IEC 12207:2008 Systems and Software Engineering – Software Life Cycle Processes [ISO 2008a]
ISO/IEC 15288:2008 Systems and Software Engineering – System Life Cycle Processes [ISO 2008b]
ISO/IEC 27001:2005 Information technology – Security techniques – Information Security Management
Systems – Requirements [ISO/IEC 2005]
ISO/IEC 14764:2006 Software Engineering – Software Life Cycle Processes – Maintenance [ISO 2006b]
ISO/IEC 20000 Information Technology – Service Management [ISO 2005b]
Assurance Focus for CMMI [DHS 2009],
Resiliency Management Model [SEI 2010c],
IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems
ISO/IEC 27001 Information Security Management System (ISMS)
ISO 9001 – Quality Management
IA Controls (NIST SP 800-53, DOD 8500.02) and C&A Methodologies (NIST SP 800-37, DIACAP)
ISO/IEC 15408, Common Criteria for IT Security Evaluation
ISO/IEC 15443, Information technology -- Security techniques -- A framework for IT security assurance
ISO/IEC 21827, System Security Engineering Capability Maturity Model (SSE CMM) revision
ISO/IEC 27000 series – Information Security Management System (ISMS)
Supply Chain Risk Management
An untrustworthy supply chain can lead to loss of assurance.
From Baldwin 2011, “Mitigating the opportunity for critical capabilities to be compromised through the
supply chain or system design is a relatively new focus for the defense department, and it requires
system engineering expertise that has not been involved in acquisition security.”
References for Section
INCOSE Systems Engineering Handbook (INCOSE 2012): Systems Engineering Handbook: A Guide for
System Life Cycle Processes and Activities, version 3.2.2. San Diego, CA, USA: International Council on
Systems Engineering (INCOSE), INCOSE-TP-2003-002-03.2.2.
Baldwin, K., Dahmann, J., and Goodnight, J., “Systems of Systems and Security: A Defense Perspective,” Insight,
Vol. 14, Issue 2, July 2011.
US Department of Defense. 2008. Systems Engineering Guide for Systems of Systems. Washington, DC.