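Secure Operating Systems
Department of Computer Science, University of Science and Technology of China
Chen Xianglan (0512-87161312), xlanchen@ustc.edu.cn
Teaching assistant: Pei Jianguo
Autumn 2008

Chapter 5: Operating System Security Architecture
- Meaning and types of security architecture
- Basic principles of security architecture design for computer systems
- The Flask architecture, LSM, and the application of Flask in LSM
- Capability architecture

Operating system design issues
- Operating system testing: penetration testing and tiger-team analysis
- Ethical hacking: hacking for legitimate, constructive purposes
- Patching: viable or not? Reasons:
  - old systems are put to new uses;
  - security was not adequately considered when the system was designed.
So what should the security architecture look like when building a secure operating system?

Meaning and Types of Security Architecture
The main task of architecture design: requirements come from many different perspectives and may conflict, so trade-offs are needed.
The security architecture of a computer system includes:
1. a detailed description of all security-related aspects of the system;
2. a description, at some level of abstraction, of the relationships among the security-related modules;
3. the basic principles that guide the design;
4. the basic framework of the development process and the hierarchy corresponding to that framework.
A security architecture is only a high-level design; it cannot be a description of system functionality.
It must take the effectiveness of evaluation and certification into account, drawing fully on:
- the Trusted Computer System Evaluation Criteria (TCSEC)
- the Common Criteria (CC)
TCSEC does not define "security architecture", but it imposes qualitative requirements on the documentation of the system architecture and system design, and it defines the top-level specification. CC does not define it either.
The U.S. Department of Defense Goal Security Architecture (DoD Goal Security Architecture) divides security architectures into four types:
- abstract architecture
- generic architecture
- logical architecture
- specific architecture

Basic Principles of Security Architecture Design
1. Consider security from the very start of system design.
2. Anticipate future security requirements as far as possible.
3. Isolate the security controls and keep them minimal.
4. Enforce least privilege.
5. Structure the security-related functions.
6. Make security-related interfaces user-friendly.
7. Do not make security depend on anything hidden.

I. The Flask Architecture
Flask history
In 1992 and 1993, researchers at the NSA and SCC worked on the design and implementation of DTMach, an outgrowth of the TMach project and the LOCK project. DTMach integrated a generalization of type enforcement, a flexible access control mechanism, into the Mach microkernel. The DTMach project was continued in the DTOS project. The DTOS project improved upon the earlier design and implementation work, yielding a prototype that was released to universities for research (e.g. Secure Transactional Resources, DX).
From: http://www.cs.utah.edu/flux/fluke/html/flask.html
After the DTOS project, a new joint effort was started by the NSA, SCC, and the University of Utah's Flux project to transfer the DTOS security architecture into the Fluke research OS. During the integration, the architecture was enhanced to provide better support for dynamic security policies. It was named Flask.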
Flask: Flux Advanced Security Kernel
Flask was ported to:
- OSKit
- Security-Enhanced Linux
Paper: Ray Spencer, et al., The Flask Security Architecture: System Support for Diverse Security Policies, in Proceedings of the 8th USENIX Security Symposium, Volume 8, 1999, USENIX Association, Washington, D.C.

The Flask Security Architecture: System Support for Diverse Security Policies
Ray Spencer, Secure Computing Corporation
Stephen Smalley, Peter Loscocco, National Security Agency
Mike Hibler, David Andersen, Jay Lepreau, University of Utah
(Slides adapted from a presentation by Jim Stevens.)

Outline
- Introduction
- Policy Flexibility
- Insufficiency of Popular Mechanisms
- Related Work
- Flask Design and Implementation
- Results
- Summary
- Other Flask object managers
- Current Status

Introduction
- The notion of "security" in a system is defined in terms of its security policy.
- A wide range of security policies exists because computing environments are diverse.
- Operating systems must be flexible in their support for security policies to accommodate this spectrum.
- Supporting policy flexibility is not as simple as just implementing multiple policies.
Three requirements of policy flexibility:
1. Support fine-grained access controls on low-level objects.
2. Propagate access rights according to the security policy.
3. Deal with changes in policy over time, including revoking previously granted permissions.
- Earlier systems provided some mechanisms for policy flexibility, but none addressed all three requirements at once.
- The paper describes the Flask architecture and a microkernel-based prototype to demonstrate that policy flexibility is feasible.
- Flask is based on the concept of mandatory access control (MAC), as opposed to discretionary access control (DAC).
What's ahead:
- Elaboration on the meaning of policy flexibility
- Discussion of two popular mechanisms that limit policy flexibility
- Flask architecture overview and prototype
- Evaluation of the Flask prototype

Policy Flexibility
How should it be defined? Listing all known security policies and defining flexibility by that list is unrealistic; a better definition is needed.
- It is more useful to define security policy flexibility by viewing the computer system as an abstract state machine with atomic state transformations.
- Total flexibility is achieved when the security policy knows the entire state of the system and can affect every operation in the system:
  - allow or deny the operation;
  - atomically inject handler routines;
  - modify the existing security policy and revoke any previously granted access.
- Total flexibility is obviously not possible in a real system. A more realistic approach is to ask which subset of the system state and operations is relevant to security.
- The flexibility of a practical system therefore depends on how complete the set of controlled operations is and what portion of the state is available to the security policy.
- The granularity of the controlled operations affects the degree of flexibility, because it determines the granularity at which sharing can be controlled.
- A policy-flexible system must be capable of supporting a wide variety of security policies.
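To make the state-machine view concrete, here is a minimal C sketch, assuming a toy system whose entire state is a value and an owner for each of four objects. All names (`struct state`, `do_op`, `current_policy`, and the two policies) are hypothetical and not taken from Flask. Every state transformation consults the currently installed policy, which sees the full state and may allow or deny; replacing the policy pointer models a policy change.

```c
#include <assert.h>
#include <stdbool.h>

#define NOBJ 4

/* Hypothetical toy model: the whole system state is a value and an owner
 * for each of NOBJ objects. */
struct state {
    int value[NOBJ];
    int owner[NOBJ];
};

enum op_kind { OP_READ, OP_WRITE };

/* A policy sees the entire state plus the requested operation and decides. */
typedef bool (*policy_fn)(const struct state *s, int subj, int obj, enum op_kind kind);

/* The active policy; replacing this pointer changes the policy at runtime. */
static policy_fn current_policy;

/* Every state transformation is a controlled operation: the policy is
 * consulted and the change applied in one step. The model is single-threaded,
 * so a policy change cannot be interleaved between decision and update. */
static bool do_op(struct state *s, int subj, int obj, enum op_kind kind, int arg)
{
    assert(obj >= 0 && obj < NOBJ);
    if (!current_policy(s, subj, obj, kind))
        return false;                 /* denied: state unchanged */
    if (kind == OP_WRITE)
        s->value[obj] = arg;
    return true;
}

/* Policy 1: allow every operation. */
static bool allow_all(const struct state *s, int subj, int obj, enum op_kind kind)
{
    (void)s; (void)subj; (void)obj; (void)kind;
    return true;
}

/* Policy 2: anyone may read, only the owner may write. This decision needs
 * part of the system state (the owner table). */
static bool owner_writes(const struct state *s, int subj, int obj, enum op_kind kind)
{
    return kind == OP_READ || s->owner[obj] == subj;
}
```

Switching `current_policy` from `allow_all` to `owner_writes` changes what is permitted without touching `do_op`. The second policy also illustrates why the portion of state visible to the policy matters: it cannot be expressed at all if the owner table is hidden from it.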
Security policies may be classified by:
- the need to revoke previously granted access;
- the type of input required to make access decisions;
- the sensitivity of policy decisions to external factors such as history or environment;
- the transitivity of access decisions.
Revocation is the most difficult characteristic to support.
- The security policy must deal with policy changes interleaved with the execution of controlled operations.
- The interleaving must be atomic, so that every controlled operation sees a consistent policy.
- Atomicity is difficult to achieve because access permissions tend to migrate throughout the system.
  Example: Unix write permission on a file is checked only when the file is opened. The granted permission is cached in the file descriptor, so changing the file's permissions only affects future open operations.
- Migrated permissions are common: in capabilities, access rights in page tables, open IPC connections, and other operations in progress.
- When the policy changes, the entire system must learn that a permission has been revoked. This is complicated and potentially expensive, and it requires identifying the relevant in-progress operations.
Three ways to handle revocation for an in-progress operation:
1. abort it and return an error;
2. restart the operation and check the permission again;
3. wait for the operation to complete.
Waiting is not safe: it does not enforce the policy and can take an unbounded amount of time.

Insufficiency of Popular Mechanisms
We will take a look at:
- capability-based systems
- intercepting requests

Capability-Based Systems
Capabilities are transferable tokens that reference an object together with access rights. A capability is an unforgeable data structure that:
- maps access rights to objects;
- can be passed around;
- is stored in kernel memory, so that users can manipulate it only through an interface, as with file descriptors.
Example OS implementations are Hydra, KeyKOS, EROS, SCAP, ICAP, and Trusted Mach.
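The Unix file-descriptor example above behaves like a capability: a kernel-held token that carries an access right checked once at open time. The following minimal C sketch (single file, single-threaded, all names hypothetical) contrasts trusting that migrated permission forever with re-checking it whenever the policy has changed. The policy sequence number used for the re-check is an illustrative assumption, not Unix behavior.

```c
#include <assert.h>
#include <stdbool.h>

#define NSUBJ 2

/* Hypothetical policy state: may subject i write the (single) file? */
static bool policy_allows_write[NSUBJ];
/* Incremented on every policy change, so cached decisions can detect staleness. */
static unsigned policy_seqno;

/* A descriptor-like token carrying a permission checked at open time. */
struct fdesc {
    int subj;
    bool can_write;    /* migrated permission: copied out of the policy */
    unsigned seqno;    /* policy version the copy was taken from */
};

static struct fdesc fd_open(int subj)
{
    assert(subj >= 0 && subj < NSUBJ);
    struct fdesc f = { subj, policy_allows_write[subj], policy_seqno };
    return f;
}

static void policy_revoke_write(int subj)
{
    assert(subj >= 0 && subj < NSUBJ);
    policy_allows_write[subj] = false;
    policy_seqno++;
}

/* Unix-style: the permission cached at open() is trusted forever, so a
 * revocation never reaches descriptors that are already open. */
static bool write_cached(const struct fdesc *f)
{
    return f->can_write;
}

/* Revocation-aware: if the policy changed since the permission was cached,
 * re-check it against the current policy before using it. */
static bool write_revalidated(struct fdesc *f)
{
    if (f->seqno != policy_seqno) {
        f->can_write = policy_allows_write[f->subj];
        f->seqno = policy_seqno;
    }
    return f->can_write;
}
```

Revalidation only takes effect at the next use of the token; an operation that is already in progress still needs one of the three strategies listed above (abort, restart, or wait).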
Capability mechanisms are poorly suited to providing policy flexibility because they allow the holder of a capability to control the propagation of that capability.
- The security policy MUST control the propagation of access rights to properly implement its rules.
- The capability holders cannot be trusted to implement the policy.
- Hydra and KeyKOS had enhancements to limit propagation, but they were specific to certain policies and very complex.

Intercepting Requests
A common approach to adding security is to intercept service requests with an additional security layer.
- It may be done in capability-based or non-capability-based systems.
- Examples: Kernel Hypervisors (not VMs!), SPIN, Lava, KeySAFE.
- It can work at kernel level or user level.
Limitations:
- The interface must expose all abstractions and information flows that the security policy wishes to control.
- State must be exposed to avoid redundancies and to make sure that the policy enforcement mechanisms know what to do.
- It can only affect an operation as the request passes through the interface.

Related Work
The security architecture of Flask is based on DTOS, which had similar goals.
- DTOS had mechanisms that were policy independent, but not rich enough to support some policies (particularly dynamic policies).
- It used the Mach microkernel design to handle revocation of memory permissions, but could not handle other permissions.
Generalized Framework for Access Control (GFAC)
- Assumes that every controlled operation is performed in the same atomic operation in which the policy is consulted.
- This is difficult to achieve in a practical system, and it was the primary obstacle that Flask had to overcome.
Multics
- Effectively provided immediate revocation of memory permissions by invalidating segment descriptors, which shows that this problem is not new.
Spring
- Had a capability revocation method, but it did not work for migrated permissions.

Flask Design and Implementation
- The Flask prototype is implemented in a microkernel-based multiserver OS. A microkernel is not essential, though: the architecture only requires that a reference monitor be consulted.
- The base system is Fluke, originally a capability-based system, modified to meet the requirements of the Flask architecture.
In operating system architecture, a reference monitor is a module that controls all software access to data objects or devices and is tamperproof, always invoked, and small enough to be fully tested and analyzed (verifiable). The reference monitor verifies the nature of each request against a table of allowable access types for each process on the system.
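The core check of a reference monitor can be sketched as a default-deny lookup in such a table. This is a minimal C illustration, assuming that access types are bit flags and that the table is a static array keyed by process and object; `access_table` and `ref_monitor_check` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Access types as bits in a mask. */
enum { ACC_READ = 1 << 0, ACC_WRITE = 1 << 1, ACC_EXEC = 1 << 2 };

struct access_entry {
    int pid;            /* process */
    int object;         /* data object or device */
    unsigned allowed;   /* allowable access types */
};

/* Hypothetical table of allowable access types per process. */
static const struct access_entry access_table[] = {
    { 100, 1, ACC_READ | ACC_WRITE },
    { 100, 2, ACC_READ },
    { 200, 2, ACC_READ | ACC_EXEC },
};

/* The single check every access must pass through (always invoked).
 * Default deny: a request is granted only if a matching entry exists and
 * it contains every requested access type. */
static bool ref_monitor_check(int pid, int object, unsigned requested)
{
    assert(requested != 0);   /* a request names at least one access type */
    for (size_t i = 0; i < sizeof access_table / sizeof access_table[0]; i++)
        if (access_table[i].pid == pid && access_table[i].object == object)
            return (access_table[i].allowed & requested) == requested;
    return false;
}
```

Keeping the check this small is what makes the "small enough to be fully tested and analyzed" property plausible; tamperproofing and complete mediation must come from where the function sits (e.g., inside the kernel), not from the code itself.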
Figure: The Flask architecture
- Object managers: components that enforce the security policy.
- Security server: the component that makes security policy decisions.
Primary goal: ensure that subsystems always have a consistent view of policy decisions, regardless of how the decisions are made and how they change over time.
Secondary goals: application transparency, defense in depth, ease of assurance, and minimal performance overhead.
Flask provides three primary elements for object managers:
1. Interfaces for obtaining security server decisions:
   - Access: the permissions between two entities;
   - Labeling: the security attributes to assign to an object;
   - Polyinstantiation: which member of a set of resources should be accessed for a particular request.
2. An access vector cache (AVC) that caches decisions to minimize performance overhead.
3. A registration service for receiving notifications when the policy changes.
Object managers must define:
- a mechanism to assign labels to their objects;
- a control policy, which specifies how security decisions are actually used and enforced;
- handling routines that are called when the policy changes.

Object Labeling
All objects controlled by the security policy are labeled with a set of security attributes, referred to as the security context. Flask provides two data types for labeling objects:
- Security contexts: variable-length strings that can be interpreted by any application or user that understands the security policy. A context can contain whatever the security policy needs and is therefore flexible.
- SIDs (security identifiers): fixed-size values used as references to security contexts. They exist for efficiency, since they are cheaper to pass around; the security server maintains the SID-to-context mapping.

General Support Mechanisms
- Client and server identification: IPC calls require the client and server to be identified so that their roles are known for a security decision.
- Caching security decisions: the AVC saves security decisions, because querying the security server is expensive (IPC plus the security computation). Coherence is provided by the policy-change handler routines.
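The interplay between an object manager, its AVC, and the security server can be sketched in a few dozen lines of C. This is a minimal single-threaded illustration under stated assumptions: SIDs are small integers indexing a toy policy matrix, the cache is direct-mapped, and the policy-change handler is a plain function call that flushes the whole cache. `ss_compute_av`, `avc_check`, and `ss_set_policy` are hypothetical names, not the Fluke or SELinux interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t security_id;    /* SID: fixed-size reference to a context */
typedef uint32_t access_vector;  /* one bit per permission */

#define NSIDS      4
#define PERM_READ  (1u << 0)
#define PERM_WRITE (1u << 1)

/* ---------- Security server (hypothetical toy policy) ---------- */
static access_vector policy[NSIDS][NSIDS];  /* policy[source][target] */
static unsigned ss_queries;                 /* counts (expensive) server calls */

static access_vector ss_compute_av(security_id ssid, security_id tsid)
{
    assert(ssid < NSIDS && tsid < NSIDS);
    ss_queries++;
    return policy[ssid][tsid];
}

/* ---------- AVC inside an object manager ---------- */
#define AVC_SLOTS 8

struct avc_entry {
    bool valid;
    security_id ssid, tsid;
    access_vector allowed;
};
static struct avc_entry avc[AVC_SLOTS];

/* Policy-change handler registered by the object manager: drop every
 * cached decision so none can outlive the policy that produced it. */
static void avc_flush(void)
{
    for (int i = 0; i < AVC_SLOTS; i++)
        avc[i].valid = false;
}

/* Enforcement point: answer from the cache when possible, otherwise ask
 * the security server and remember the whole access vector. */
static bool avc_check(security_id ssid, security_id tsid, access_vector requested)
{
    struct avc_entry *e = &avc[(ssid * NSIDS + tsid) % AVC_SLOTS];
    if (!e->valid || e->ssid != ssid || e->tsid != tsid) {
        e->allowed = ss_compute_av(ssid, tsid);   /* cache miss */
        e->ssid = ssid;
        e->tsid = tsid;
        e->valid = true;
    }
    return (e->allowed & requested) == requested;
}

/* Policy change: the server updates its policy, then notifies the object
 * manager (simplified here to a direct call to its handler). */
static void ss_set_policy(security_id ssid, security_id tsid, access_vector allowed)
{
    assert(ssid < NSIDS && tsid < NSIDS);
    policy[ssid][tsid] = allowed;
    avc_flush();
}
```

Caching the whole access vector, rather than a single yes/no answer, lets later requests for other permissions between the same pair of SIDs hit the cache without another server query.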
- Polyinstantiation support: the security server identifies which instantiation a client may access.
Figure: Requesting and caching security decisions in Flask
Figure: Polyinstantiation in Flask

Microkernel-Specific Features
- Binds an SID to each memory segment (the same SID as whatever object is stored in that memory), which lets Flask leverage Fluke's protection model.
- Associates a Flask permission with each memory access mode, based on the SIDs of the address space and the memory segment.
- These permissions are used to verify that accesses to mapped memory are allowed by the security policy.

Revocation Support
Requirements:
- After a policy change, the object manager's behavior must reflect the change.
- Policy changes must complete in a timely manner.
Three-step protocol:
1. The security server notifies every object manager that may previously have been exposed to the revoked permissions.
2. Each object manager updates its internal state.
3. Each object manager notifies the security server that the update is complete.
Sequence numbers are used to synchronize policy changes with policy decisions.

Security Server
Requirements:
- Provide the mapping from SIDs to security contexts.
- Allocate SIDs for newly created objects.
- Manage the AVCs of object managers (through handler callbacks).
- Provide an interface for changing the policy (if needed).
- Cache computations on the server side as well, because they can be expensive.
Distributed systems:
- In a homogeneous policy environment, the security server on each node merely acts as a local cache of the environment's policy.
- To support heterogeneous policy environments, it is desirable for each node to have its own security server with a locally defined policy component, with some degree of coordination at a higher level.
The Flask security server is defined through a combination of code and a policy database.
- The policy database language can express many policies: any security policy that can be expressed in the prototype's policy database language may be implemented simply by altering the policy database.
- Some policy changes may require code changes or completely replacing the security server.
- In every case, no changes to the object managers are required.
The policies enforced by the prototype server were:
- multi-level security, whose policy logic is largely defined through the security server code, aside from the labels themselves;
- type enforcement;
- identity-based access control;
- role-based access control.
The policy logic for the other subpolicies is primarily defined through the policy database language.

Flexibility of the Flask Prototype
Three sources of potential inflexibility:
1. the range of operations that the system can control;
2. limitations on the operations that may be invoked by the security policy (these depend on the object manager instances);
3. the amount of state information available to the security policy for making decisions: the prototype is limited to two SIDs per query and cannot handle parameters. The architecture does not impose this limit, but lifting it may reduce performance.

Performance
- Overhead for labeling is about 1% compared to Fluke.
- Table 2 presents measurements for IPC operations of various lengths (in bits); in all tests the decisions are found in the AVC.
- Table 3 presents the decision time when the decision is stored in different locations.
Table 2: IPC Time
- Naive: the same test as for Fluke, but run on Flask.
- Client identification: modified to use a Flask-specific server-side IPC to obtain the client SID on every call.
- Client impersonation: uses client-side IPC to specify an effective SID for every call.
Table 3: Security Decision Time
- trivSS: the computation is trivial, so only the communication overhead is measured.
- realSS: a combination of computation and communication.
Revocation time is shown in Table 4 for a varying number of connections. Revocation is the most expensive operation.
Table 4: Revocation Times
- In the prototype, revocation has the overhead of stopping all threads, which accounts for the majority of the time. Beyond that, it scales linearly with the number of connections.
- Although revocation is expensive, policy changes are relatively rare.

Macrobenchmark
- GNU build system (make, gcc, ld): compilation of about 8000 lines of .c and .h files.
- Also executed on FreeBSD for comparison.
Table 5: Execution Time
- Flask-FFS-PM: unmodified Fluke object managers.
- Memfs: memory file system (to reduce page faults).
- Hint: the decision is at a predetermined location in the cache.
- Cache: the decision must be looked up in the cache.
Table 6: Security Decision Resolution

Invasiveness of Flask Code
- Overall, the Fluke components increased in size by less than 8% (see Table 7); the kernel increased by 19%.
- 57% of the changes to the process manager and 61% of the changes to the kernel were "trivial".
- The Fluke API was only extended with security functionality, so it remains fully backwards compatible with Fluke.

Summary
- The paper provided a usable definition of policy flexibility.
- It showed that pure capability-based systems and request-interception systems are inadequate for achieving policy flexibility.
- It described an operating system security architecture capable of supporting a wide range of security policies.
- It demonstrated the practicality of the architecture with a prototype microkernel-based system.
- Appendix A has examples of a Flask-based file server, network server, and process manager.

Other Flask Object Managers
- File Server
- Network Server
- Process Manager
File Server
The Flask file server provides four types of controlled (labeled) objects: file systems, directories, files, and file description objects.
Since file systems, directories, and files are persistent objects, their labels must also be persistent.
Figure 6: Labeling of persistent objects
Table 8: Permission requirements for relabeling a file
Network Server
Table 9: Layered controls in the network protocol stack
Process Manager

Current Status
- The NSA implemented a Linux Security Module (LSM) called SELinux. It is an implementation of Flask, it is in the mainline kernel, and it ships in many distributions, including RHEL and Debian. It is often criticized for being overly complicated to set up and understand.
- TrustedBSD: part of this system is a port of the SELinux extensions to FreeBSD. TrustedDarwin is a port of TrustedBSD to the Darwin system. Some components of TrustedBSD have spilled over into OS X; it is unclear whether this includes the Flask implementation.

References
R. Spencer, S. Smalley, P. Loscocco, M. Hibler, D. Andersen, and J. Lepreau. The Flask Security Architecture: System Support for Diverse Security Policies. In Proceedings of the Eighth USENIX Security Symposium, pages 123-139, Aug. 1999.
http://www.cs.utah.edu/flux/fluke/html/flask.html
http://www.nsa.gov/selinux

II. LSM
SELinux and LSM
- SELinux motivated the creation of LSM.
- The goal is to separate security features from the kernel in order to minimize the impact on the kernel.
- LSM itself does not provide any security; it adds security fields to kernel data structures and provides interfaces for managing these fields, which security modules use to maintain security attributes.
Papers: 1, 2

Outline
- Introduction
- Design and Implementation
- Testing and Functionality
- Conclusions

Introduction
- Security is a chronic and growing problem, and Linux systems do experience a large number of software vulnerabilities.
- An important way to mitigate software vulnerabilities is through effective use of access controls:
  - DAC
  - non-DAC
- But there has been no real consensus on which is the one true access control model. Because of this lack of consensus, there are many patches to the Linux kernel that provide enhanced access controls [6, 10, 11, 13, 16, 18, 23, 19, 31], but none of them is a standard part of the Linux kernel.
- The Linux Security Modules (LSM) project seeks to solve this Tower of Babel quandary by providing a general-purpose framework for security policy modules. This allows many different access control models to be implemented as loadable kernel modules, enabling multiple threads of security policy engine development to proceed independently of the main Linux kernel.
- A number of existing enhanced access control implementations have already been adapted to use the LSM framework:
  - POSIX.1e capabilities
  - SELinux
  - Domain and Type Enforcement (DTE)

Design and Implementation
The problem: a constrained design space
- At the 2001 Linux Kernel Summit, the NSA presented its work on SELinux, an implementation of a flexible access control architecture in the Linux kernel.
- Linus Torvalds appeared to accept that a general access control framework for the Linux kernel is needed. However, given the many Linux kernel security projects, and his own lack of expertise in sophisticated security policy, he preferred an approach that allowed security models to be implemented as loadable kernel modules.
- In fact, Linus' response provided the seeds of the LSM design. The design of LSM was constrained by the practical and technical concerns of both the Linux kernel developers and the various Linux security projects.
Linus Torvalds specified that the security framework must be:
- truly generic, where using a different security model is merely a matter of loading a different kernel module;
- conceptually simple, minimally invasive, and efficient;
- able to support the existing POSIX.1e capabilities logic as an optional security module.
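The first requirement, that a different security model is just a different module, amounts to routing every check through a replaceable table of function pointers. The following user-space C sketch assumes that design; `struct sec_ops`, `active_ops`, `sec_register`, and the `hook_*` wrappers are hypothetical names that only loosely mirror the kernel's security_operations, security_ops, and register_security, and are not kernel code. The default table allows everything, which is the behavior wanted when no module is loaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of an LSM-style hook table. */
struct sec_ops {
    const char *name;
    int (*task_create)(int uid);                   /* 0 = allow, -1 = deny */
    int (*inode_mkdir)(int uid, const char *path);
};

/* Default module: allows everything, so the "kernel" behaves as if no
 * security module were loaded. */
static int dummy_task_create(int uid) { (void)uid; return 0; }
static int dummy_inode_mkdir(int uid, const char *path) { (void)uid; (void)path; return 0; }
static const struct sec_ops dummy_ops = { "dummy", dummy_task_create, dummy_inode_mkdir };

/* Global hook table consulted by every hook call site. */
static const struct sec_ops *active_ops = &dummy_ops;

/* Call sites in the "kernel" never know which module is behind the table. */
static int hook_task_create(int uid) { return active_ops->task_create(uid); }
static int hook_inode_mkdir(int uid, const char *path) { return active_ops->inode_mkdir(uid, path); }

/* Registration: only the first module may replace the default table. */
static int sec_register(const struct sec_ops *ops)
{
    assert(ops != NULL);
    if (active_ops != &dummy_ops)
        return -1;                /* a module is already registered */
    active_ops = ops;
    return 0;
}

/* An example policy module: only uid 0 may create tasks or directories. */
static int rootonly_task_create(int uid) { return uid == 0 ? 0 : -1; }
static int rootonly_inode_mkdir(int uid, const char *path) { (void)path; return uid == 0 ? 0 : -1; }
static const struct sec_ops rootonly_ops = { "rootonly", rootonly_task_create, rootonly_inode_mkdir };
```

The call sites (`hook_task_create`, `hook_inode_mkdir`) stay the same whichever module is registered; only the table changes. This sketch accepts a single module; combining several modules is a separate problem.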
The "LSM problem"
The "LSM problem" is to unify the functional needs of as many security projects as possible while minimizing the impact on the Linux kernel.
LSM takes the approach of mediating access to the kernel's internal objects: tasks, inodes, open files, etc., as shown in Figure 1.
Figure 1: LSM hook architecture
Why did LSM choose this approach instead of the alternatives?
- System call interposition: mediating system calls as they enter the kernel.
- Device mediation: mediating access to physical devices.
Reason: information critical to sound security policy decisions is not available at those points.
- At the system call interface, userspace data, such as a path name, has yet to be translated into the kernel object it represents, such as an inode. Thus system call interposition is both inefficient and prone to time-of-check-to-time-of-use (TOCTTOU) races.
- At the device interface, other critical information (such as the path name of the file being accessed) has already been thrown away.
- In between is where the full context of an access request can be seen, and where a fully informed access control decision can be made.
Figure 2: Permissive LSM hook

Implementation
- Overview
- Task Hooks
- Program Loading Hooks
- IPC Hooks
- Filesystem Hooks
- Network Hooks
- Other Hooks

Overview
The LSM kernel patch modifies the kernel in five primary ways:
1. it adds opaque security fields to certain kernel data structures;
2. it inserts calls to security hook functions at various points within the kernel code;
3. it adds a generic security system call;
4. it provides functions that allow kernel modules to register and unregister themselves as security modules;
5. it moves most of the capabilities logic into an optional security module.

Opaque Security Fields
The opaque security fields are void* pointers, which enable security modules to associate security information with kernel objects (Table 1).
Table 1: Kernel data structures modified by the LSM kernel patch and the corresponding abstract objects
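To show what an opaque security field means in practice, here is a minimal user-space C sketch, assuming a toy inode type with a `void *security` member. The "kernel" side never interprets the pointer; the "module" side allocates, reads, and frees its own label structure, and the mkdir path calls a hook before doing any work. `struct toy_inode`, `struct label`, the `mod_*` functions, and `toy_vfs_mkdir` are hypothetical; for brevity the hooks are called directly rather than through a hook table, and allocation and labeling are merged into a single hook.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical kernel object with an opaque security field. */
struct toy_inode {
    int ino;
    void *security;                 /* owned and interpreted only by the module */
};

/* ---------- Security module side: its private label format ---------- */
struct label { char ctx[32]; };

static int mod_inode_alloc(struct toy_inode *inode, const char *ctx)
{
    struct label *l = malloc(sizeof *l);
    if (l == NULL)
        return -1;
    strncpy(l->ctx, ctx, sizeof l->ctx - 1);
    l->ctx[sizeof l->ctx - 1] = '\0';
    inode->security = l;
    return 0;
}

static void mod_inode_free(struct toy_inode *inode)
{
    free(inode->security);
    inode->security = NULL;
}

/* Policy: deny creating entries in directories labeled "readonly_t". */
static int mod_inode_mkdir(const struct toy_inode *dir)
{
    const struct label *l = dir->security;
    return (l != NULL && strcmp(l->ctx, "readonly_t") == 0) ? -1 : 0;
}

/* ---------- "Kernel" side: sees only void *, calls hooks at fixed points ---------- */
static int toy_vfs_mkdir(struct toy_inode *dir, struct toy_inode *child, int ino)
{
    assert(dir != NULL && child != NULL);
    int err = mod_inode_mkdir(dir);          /* hook before the operation */
    if (err)
        return err;
    child->ino = ino;
    return mod_inode_alloc(child, "dir_t");  /* label the new object */
}
```

Because the kernel side only stores a `void *`, a different module could attach an entirely different label format without any change to `struct toy_inode`.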
The setting of these security fields and the management of the associated security data are handled by the security modules.

Calls to Security Hook Functions
Figure 3 shows the vfs_mkdir kernel function after the LSM kernel patch has been applied. In the figure: the caller (the hook call site in the kernel) and the callee (the security module's function).

Registering Security Modules
- The global variable security_ops provides the interface definition for the callees.
- In 2.6.26: look at the definition of struct security_operations and at how security_ops is initialized.
- Attaching a caller to a concrete callee: security_initcall in the source code.
- register_security can only be used to register the first security module; mod_reg_security registers further modules in a stacked fashion.

Task Hooks
- The task_struct structure; task_security_ops.
- In 2.6.26: the task-related members of security_operations, and the wrappers in security.h that define each task hook call site.
The LSM task hooks have full task life-cycle coverage.
- create() task hook: may a task spawn children?
- alloc_security() task hook: manages the new task's security field.
In 2.6.26, do_fork calls copy_process, which calls security_task_create (a hook call site).
- The wrapper definition when CONFIG_SECURITY is set, and the stub used otherwise.
- Where the hook is called; initialization of the security field; the hook call site that sets the security field; the definition of that call site.
- The actual contents of the security field depend on the callee that is attached.
When a task exits:
- kill() task hook: may the task signal its parent?
- In the parent, wait() task hook: may the parent task receive the child's signal?
- free_security() task hook: releases the task's security field.
A task may need to change some of its attributes while running, e.g. setuid(2):
- setuid() task hook;
- post_setuid() task hook.
To prevent (potentially) sensitive information about a task from leaking, LSM mediates queries of other tasks' state:
- getpgid()
- getscheduler()

Program Loading Hooks
- The linux_binprm structure represents a new program being loaded during an execve(2); its hooks are in binprm_security_ops.
- Security modules may want to change privileges when a new program is executed.
- The hooks are used to verify a task's ability to load a new program and to update the task's security field.
An opaque field is added. During execve(2):
- alloc_security(): allocates space for the opaque security field;
- set_security(): sets the opaque security field; it may be called multiple times during a single execve(2);
- compute_creds(): sets the new security attributes of the task. Typically, it calculates the task's new credentials based on both its old credentials and the security information stored in the linux_binprm security field.
- free_security(): once the new program is loaded, releases the space for the opaque security field.
In 2.6.26: look at how the security field is allocated, freed, and assigned.
- It is allocated and freed in do_execve, and set in prepare_binprm.
- compute_creds and its call sites, taking a.out as an example: the a.out program loader.

Filesystem Hooks
For file operations there are three sets of hooks: filesystem hooks, inode hooks, and file hooks. (Look at the relevant hook definitions and how they are used.)
LSM adds a security field to each of the associated kernel data structures: super_block, inode, and file. (Look at the security field definitions in these data structures.)

IPC Hooks
- Standard SysV IPC mechanisms: shared memory, semaphores, and message queues.
- ipc_security_ops, shm_security_ops, sem_security_ops, msg_queue_security_ops, msg_msg_security_ops.
- (Look at the related hook definitions in 2.6.26.)

Other Hooks
LSM provides two additional sets of hooks: module hooks and a set of top-level system hooks.
- Module hooks can be used to control the kernel operations that create, initialize, and delete kernel modules.
- System hooks can be used to control system operations, such as setting the system hostname, accessing I/O ports, and configuring process accounting.

Testing and Functionality
Performance Impact
The performance cost of the LSM framework is critical to its acceptance; it was a major part of the debate at the Linux 2.5 developer's summit that spawned LSM.
- Microbenchmarks and macrobenchmarks compared a stock Linux kernel to one modified with the LSM patch but with no modules loaded.
- Microbenchmarks (LMBench):
  - worst-case overhead: 6.2% for stat(), 6.6% for open/close, 7.2% for file delete;
  - typical overhead: often 0%, ranging up to 2%.
- Macrobenchmark (building the Linux kernel): even better, no measurable performance impact.

Security Impact
Does LSM provide real security value? This can be viewed in two ways.
- First, LSM must not create new security holes, and it needs to be thorough and consistent in its coverage.
  - A project from IBM performed static and dynamic analysis of the LSM framework.
- Second, LSM must be general enough to support a variety of access control models:
  - SELinux
  - DTE Linux
  - LSM port of the Openwall kernel patch
  - POSIX.1e capabilities
  - LIDS (Linux Intrusion Detection System)

Conclusions
LSM had to meet two criteria:
- be relatively painless for people who don't want it;
- be useful and effective for people who do want it.
LSM meets these criteria. The patch is relatively small, and the performance data show that it imposes nearly zero overhead. The broad suite of security products from around the world that have been implemented for LSM shows that the LSM API is useful and effective for developing Linux security enhancements.

Application of the Flask Architecture in Linux LSM
(Compare with the Flask architecture.)
Thanks! The end.