Chapter 4: Protection in a General-Purpose Operating System

This chapter discusses the services provided by a general-purpose operating system to facilitate data integrity and security. Although these services were developed to support time-sharing (a mode of computing not much used today), they remain relevant in the modern computing environment.

Early computers were single-user systems, with security features designed only to protect the operating system against accidental corruption. The introduction of multi-user time-sharing systems brought a new design challenge. With a number of users having programs executing on a computer at the same time, provisions had to be made to protect the operating system and each user from the actions of a rogue or incompetent user.

The author of these notes was the accidental beneficiary of a failure of such a protection scheme. This occurred when a poorly designed program hit an execution fault severe enough to cause a core dump. The core dump contained a hexadecimal representation of the entire contents of core memory, including the operating system area that had stored user IDs and unencrypted passwords for about twenty other users. Not being in the mood for mischief, the author of these notes declined the invitation.

One might protest that in most modern computing systems there is only one user – the person at the keyboard. This view is demonstrably too facile. A modern computing system, such as the one on this PC, comprises a number of cooperating independent processes. As this author is typing on the keyboard and interacting with the MS-Word program, there are a number of other programs running, including the clock (continually updating the time display on the screen), the e-mail program, and too many other programs to list.
Many computers also host web sites – there is a background program that is being accessed by who-knows-how-many outside users, some of whom might be malicious hackers. The conclusion of this introduction is that all modern computers should be considered time-sharing systems with many users, some of whom might compete for resources and some of whom may be either careless or malicious. Designing a system to tolerate such an environment is one of the primary goals in software security.

The central idea in such a design is a protected object, for which access control must be provided by some central system, such as the operating system. Objects that require protection include memory, I/O devices such as disks, printers, network connections, and tape drives, and system programs.

The importance of memory protection is seen in an early hack against passwords. Most computer systems use passwords to authenticate users, and while most systems store these passwords in encrypted form, there is a point in processing at which the unencrypted password is stored in a memory location. If I as a malicious hacker can monitor the location used by the operating system to hold the unencrypted password, I can access any account.

It should be noted at this point that some shared devices, such as printers, are often protected for reasons other than security. When two or more programs have direct access to a shared printer, the printed output is a mix of the output of the two programs – in other words, a very big and unusable mess. Modern operating systems provide that only one program – the print spooler – has direct access to the printer and that all other programs, including both user and operating-system services, access the printer indirectly through a process called print spooling. In print spooling, each program writes directly to a disk file and signals the operating system when it is finished.
The print spooler program then queues the file for printing and outputs its contents in proper order.

The Multi-Level Security Model

The issue of multi-level security is easily stated. Given a computer that allows, directly or indirectly, multiple users to access its assets, can one allow programs and data to be stored on the system if not all users have access rights to those programs and data? In the terminology of the US Department of Defense, we ask whether classified data and programs can be stored on the system if not all possible users have valid clearances for access to classified data.

In the early days of computing, the question was quite pressing. Should a defense contractor maintain two very expensive computer systems, one for company data and one for classified US government data? The answer at the time was “Yes” and seems to have remained the same for some time; consider the fact that computers storing sensitive data are not allowed to be connected to the Internet. In scenarios under which a computer would be allowed to do both classified and unclassified processing, there were protocols for changing its “mode” from classified to unclassified and back again, including the use of removable disk packs and procedures for wiping the primary memory, which in those days would often maintain its state after the power was turned off.

We see a partial solution to the multi-level security problem in some database products that allow for the definition of windows and views on a data table. Such structures prevent access to the data directly through the database tool, but do not provide security against someone who can locate the data and access them directly with another tool.
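Returning for a moment to print spooling: the discipline described earlier, in which programs write completed output to disk files and a single spooler drains a queue in order, can be sketched in a few lines of Python. This is a minimal sketch; the class and file names are illustrative, and a real spooler would of course talk to actual printer hardware.

```python
from collections import deque

# Minimal sketch of a print spooler. Programs never touch the printer
# directly: each submits a finished spool file to the queue, and one
# spooler process drains the queue in order, so output from different
# programs is never interleaved.
class PrintSpooler:
    def __init__(self):
        self.jobs = deque()               # FIFO queue of pending jobs

    def submit(self, owner, path):
        # Called when a program signals that its spool file is complete.
        self.jobs.append((owner, path))

    def print_next(self):
        # Send the oldest queued job to the printer (modeled as a string).
        if not self.jobs:
            return None
        owner, path = self.jobs.popleft()
        return f"printing {path} for {owner}"

spooler = PrintSpooler()
spooler.submit("alice", "/spool/job1.txt")
spooler.submit("bob", "/spool/job2.txt")
print(spooler.print_next())   # alice's job comes out first, intact
```

The essential design point is that exclusive access to the device is held by one trusted process; everything else is mediated through the queue.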
Virtual Memory

Most modern operating systems designed for use on general-purpose computers now employ virtual memory, in which a program issues logical addresses that are converted by some mechanism to physical addresses referring to actual locations in the computer’s primary memory or on a disk drive being used as secondary memory. While informal definitions of virtual memory focus on the use of the disk drive as a “backing store”, the student should remember that it is the translation between logical (program) addresses and physical (real memory) addresses that characterizes virtual memory. In other words, it is possible to have a virtual memory system without use of a disk drive, although that would throw away many of the very useful benefits of the approach.

It is the translation between logical addresses as issued by the program and physical addresses as accessed by the CPU that allows virtual memory to serve as a security feature in a modern operating system. Your program and my program may reference what appears to be the same address, but the operating system makes them different unless we both invoke some facility to allow sharing.

Two mechanisms associated with virtual memory represent different ways in which it is implemented in a multi-level memory system: segmentation and paging. In the first approach, programs are broken into logical segments, each representing a section of code or data to which one might assign specific access rights. Some of the segments we see in a modern program include the code segment, data segment, and stack segment. In the pure segmentation approach, each segment is loaded into a contiguous area of memory, beginning at a base address. All references to a segment are of the form <segment_name, offset>, where the segment name is made unique to a given process. Program segmentation provides for some very powerful management techniques, but it is somewhat difficult to implement because it involves moving variable-sized chunks of memory.
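The segmented translation just described can be sketched as a table lookup plus a bounds check. This is a minimal sketch; the segment names and the base and limit values in the table are made-up illustrative numbers, not those of any real system.

```python
# Sketch of segment-based address translation: each process has a
# segment table mapping a segment name to a (base, limit) pair, and
# every reference <segment_name, offset> is bounds-checked and then
# relocated to a physical address.
segment_table = {
    "code":  {"base": 0x1000, "limit": 0x0800},
    "data":  {"base": 0x4000, "limit": 0x1000},
    "stack": {"base": 0x8000, "limit": 0x0400},
}

def translate(segment, offset):
    entry = segment_table[segment]
    if offset >= entry["limit"]:        # protection check: out of bounds
        raise MemoryError("segmentation fault")
    return entry["base"] + offset       # relocate to a physical address

print(hex(translate("data", 0x20)))     # 0x4000 + 0x20 -> 0x4020
```

Note how protection falls out of the mechanism for free: a reference outside a segment’s limit, or to a segment not in the process’s table, simply cannot be translated.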
The other approach to virtual memory involves paged memory, in which the address space allocated to a program is divided into equal-sized pages, and primary memory is divided into equal-sized units called page frames. One page exactly fits any page frame in the memory. The size of a page is determined by the operating system, probably based on the structure of the disk that serves as the secondary memory. As the book notes, the size of a page is almost always a power of two in bytes, commonly between 512 (2^9) and 4096 (2^12) bytes. A page should be thought of as the size of the “chunk” of data that is most natural for a disk to transfer.

In these notes, we give a way to deduce the most probable page size. Briefly put, in MS-DOS and MS-Windows systems a page (and page frame) is most likely the size of a disk cluster, which is a collection of disk segments. On all disk drives, a disk segment (commonly called a sector) is a collection of 512 bytes of data, along with some extra control information for use by the hardware controlling the disk. It used to be that each segment on the disk was individually addressable, which placed an implicit upper limit on the disk size, set by the width of the segment address. For a long time this limit presented no difficulty. When it did, the first patch on the design was to group segments into clusters, with a power-of-two number of segments being addressed as a cluster. Since a cluster is the smallest unit of disk memory that can be directly addressed, a page is probably the size of a cluster.

The cluster approach is an artifact of the File Allocation Table (FAT-16) approach to disk addressing used in early forms of MS-DOS. In FAT-16, each segment had a unique 16-bit address, meaning that the maximum size of the disk was 2^16 512-byte segments, or 2^16 × 2^9 = 2^25 bytes = 32 × 2^20 bytes = 32 megabytes. When this quickly became inadequate, a number of tricks were used, including disk partitions and disk clusters, with a cluster being 4, 8, or 16 segments. Even with 16-segment clusters, the maximum disk size was 512 megabytes.
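Paged translation, and the FAT-16 arithmetic above, are easy to check mechanically. The sketch below assumes a 4096-byte page and a made-up page table; the frame numbers are illustrative only.

```python
# Sketch of paged address translation: the logical address splits into
# a page number and an offset, and the page table maps page numbers to
# same-sized page frames in primary memory.
PAGE_SIZE = 4096                    # 2**12 bytes, one common page size

page_table = {0: 5, 1: 2, 2: 7}     # page number -> frame number

def translate(logical_addr):
    page = logical_addr // PAGE_SIZE
    offset = logical_addr % PAGE_SIZE
    return page_table[page] * PAGE_SIZE + offset

print(hex(translate(0x1234)))       # page 1, offset 0x234 -> 0x2234

# The FAT-16 arithmetic from the text: 2**16 addressable 512-byte
# segments gives 2**25 bytes, i.e. 32 megabytes; 16-segment clusters
# raise the limit by a factor of 16, to 512 megabytes.
assert 2**16 * 512 == 32 * 2**20
assert 16 * 2**16 * 512 == 512 * 2**20
```

Because every page is the same size, there is no variable-sized chunk to move; this is the chief implementation advantage of paging over segmentation.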
After a transition through FAT-32, most of the MS-Windows products use 32-bit addressing, allowing for maximum disk sizes of 2^32 × 2^9 = 2^41 bytes = 2 terabytes, if each sector is individually addressable. We may be back to 512-byte pages for a while.

Control of Access to Objects

Most systems for controlling access to objects make a distinction based on what might be called a structured user ID. This sort of ID is commonly represented as a pair of integers of the form (group_id, user_id), in which a user is associated with a group, with access privileges being allocated on an individual or group basis as the need arises. As mentioned in the book, this group affiliation becomes a bit cumbersome when a user may function as a member of more than one group. Suppose that my structured ID is (120, 37); that is, I am user 37 in group 120. Suppose further that I need access to some material from group 162, but that I am the only member of my group requiring such access. There are a number of options, none natural but none very difficult, by which such access may be granted.

There are two basic approaches to granting access to an individual. The book calls these methods directory access and access control lists. In the directory access method, each user has a list of objects and files to which he or she has access. Within this context, the structured user ID has some advantage in that a user is assumed to have access to any object associated with his or her group unless that access is specifically withheld. In the access control list approach, each object has a list of users to which access has been authorized, along with the nature of the access granted to each user. There is always a generic user, often denoted (*, *), for which some level of access (perhaps no access) is specified.

The area of physical security has two access mechanisms that are analogous to those above. Suppose that I work for a company and am allowed access to a small number of locked rooms.
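The access control list just described, including the generic (*, *) entry, can be sketched as a lookup from (group_id, user_id) pairs to sets of rights. All of the IDs and rights below are made-up illustrative values, reusing the (120, 37) and group 162 example from the text.

```python
# Sketch of an access control list attached to one object: entries map
# (group_id, user_id) to a set of rights, "*" is a wildcard, and the
# generic (*, *) entry supplies the default for everyone else.
acl = {
    (120, 37):  {"read", "write"},   # one specific user
    (162, "*"): {"read"},            # every member of group 162
    ("*", "*"): set(),               # generic user: no access
}

def rights(group_id, user_id):
    # Most specific entry wins: exact user, then group, then generic.
    for key in [(group_id, user_id), (group_id, "*"), ("*", "*")]:
        if key in acl:
            return acl[key]
    return set()

print(sorted(rights(120, 37)))   # ['read', 'write']
print(sorted(rights(162, 5)))    # ['read']
print(sorted(rights(999, 1)))    # []
```

Note that the cross-group problem from the text has an easy fix here: granting user (120, 37) access to group 162’s material is just one more entry in the object’s list.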
One way would be for me to keep one key for each room to which I have access. This used to be the only way by which access to physical objects could be granted. With the introduction of card reader technology, computers, and communication networks, it is now possible for each room to have a reader and a list of employees allowed access to that room. Thus, in one job I had, I had access to the building but not to the communications closet. The first scheme is analogous to directory access (each user holds a list of objects); the second is analogous to an access control list (each object holds a list of users).

Capability

A capability is defined as “an unforgeable token that gives the possessor certain rights to an object”. With this definition, the cynic may be pardoned for asking whether such a thing as a capability can really exist. For now, the author of these notes wants to modify the definition to make it a token that is very difficult to forge.

Access rights for a process are determined by the execution environment in which that process has been created. Part of this environment is the domain or name space, which is the collection of objects to which a process has access. This concept leads immediately to a simple question: how can a process pass its privileges (a reference to its name space) to another process that has been invoked to execute a specific action? Suppose that the procedure invoked has greater privileges than the main procedure that invoked it. How does one reliably revoke the sub-procedure’s privileges when control returns to the main procedure? There are probably many good approaches to this; the designer of an operating system must decide.

File Protection Mechanisms

The book then discusses a number of file protection mechanisms, from early schemes that involved no protection to the standard group-user protection schemes. The classic UNIX file protection structure is typical in that a user can grant privileges to others in a group or to all users without regard to group. This division of the world into three layers (myself, my group, and everybody else) has some natural logic to it, in that it reflects corporate structures.
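The three-layer UNIX check can be sketched as follows. This is a minimal sketch of the read check only, with illustrative user and group IDs; the permission bit masks come from Python’s standard stat module.

```python
import stat

# Sketch of the classic UNIX owner/group/other check: the nine
# permission bits are consulted in order, and the first matching
# class (owner, then group, then everybody else) decides.
def may_read(mode, file_uid, file_gid, uid, gid):
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)   # owner read bit
    if gid == file_gid:
        return bool(mode & stat.S_IRGRP)   # group read bit
    return bool(mode & stat.S_IROTH)       # world read bit

mode = 0o640   # rw-r----- : owner may read/write, group may read
print(may_read(mode, 1000, 50, 1000, 50))  # True  (owner)
print(may_read(mode, 1000, 50, 1001, 50))  # True  (group member)
print(may_read(mode, 1000, 50, 1001, 51))  # False (everybody else)
```

Observe that the owner class is checked first and is decisive: with mode 0o040, the owner would be denied read access even though the group is allowed it.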
However, such a division is a bit too simplistic for some needs. The designers of UNIX added an additional permission that has both advantages and disadvantages: the SUID, or set-userid, feature. Since the book gives a good explanation of the feature on page 208, we quote directly from the book.

“The Unix operating system provides an interesting permission scheme based on a three-level user-group-world hierarchy. The Unix designers added a permission called set userid (suid). If this permission is set for a file to be executed, the permission level is that of the file’s owner, not the executor. To see how it works, suppose Tom owns a file and allows Ann to execute it with suid. When Ann executes the file, she has the protection rights of Tom, not of herself.”

“This peculiar-sounding permission has a useful application. It permits a user to establish data files to which access is allowed only through specified procedures.”

“This mechanism is convenient for system functions that general users should be able to perform only in a prescribed way. For example, only the system should be able to modify the file of users’ passwords, but individual users should be able to change their own passwords any time they wish.”

Passwords and Other Mechanisms for Verifying Users

The book then discusses methods for authenticating users, including the common method of memorized passwords. The student is invited to read this section. There is not much more to say about passwords, except “DO NOT WRITE THEM DOWN”. One of the more famous bank heists of the 1970s occurred when the operators of a wire transfer room wrote down the system password and an outside consultant used this slip to wire himself ten million dollars to a Swiss bank account. The consultant bought diamonds with the money and then, like an idiot, returned to the United States, where he was promptly arrested.
The bank is rumored to have made a profit on the sale of the diamonds, but the true story is that the consultant handled the diamonds badly, so that they became scratched and lost value. Your friendly author realizes that passwords are a nuisance, but an insecure system is much more of a nuisance.