Faculty of Engineering
EE5902 Multiprocessor Systems

Survey on Managing Shared Memory Architectures under UNIX

Wei Nan (HT042584N)
Department of Electrical and Computer Engineering, Faculty of Engineering
National University of Singapore
Contact No: 81186315
Email: g0405083@nus.edu.sg, weinan@nus.edu.sg

Abstract

In multiprocessor systems, the operating system must manage shared memory. Memory management algorithms have evolved over several decades in order to use memory resources efficiently. In this survey, we review several algorithms that have been implemented under UNIX, and we compare and analyze their performance through simple examples.

Introduction

In almost all multiprocessor systems, the processors share a global memory. There are commonly three kinds of shared memory architecture: uniform memory access (UMA), non-uniform memory access (NUMA) and cache-only memory architecture (COMA) [9]. In UMA systems, all processors access the shared memory through an interconnection network, and each processor has equal access time and equal opportunity to read or write any memory location. In NUMA systems, a part of the shared memory is attached to each processor; each processor uses real addresses to access memory locations, and the access time depends on the distance between the processor and the location. In COMA systems, a part of the shared memory is also attached to each processor, but this memory consists of cache memory, and a cache directory helps processors access the cache memory remotely. All of these shared memory systems need an operating system, such as Microsoft Windows or UNIX, to manage the memory. UNIX is an operating system that supports multiprocessor, multicomputer and multiuser operation, and various memory management algorithms have been implemented in it over recent decades.
Virtual memory is used in UNIX to give the user the impression of a main memory larger than the physical memory, to hide the layout of main memory, and to allow many processes to time-share the main memory. Virtual memory is simply the sum of physical memory and swap space [3]. Swap space resides on disk, but it is used as part of virtual memory: it stores data when physical memory is insufficient. The purpose of the memory manager in UNIX is to allocate and deallocate main memory to executing processes [1]. How to swap data in and out between swap space and main memory is one of the central concerns of a memory management algorithm. Table 1 lists several algorithms and the UNIX versions that implement them. We describe these algorithms and discuss their differences in the following sections.

Memory Management Algorithm                        Implementing UNIX
Swapping Systems                                   Early UNIX
Demand Paging Systems                              UNIX BSD 4.0, UNIX System V
Hybrid Memory Systems                              UNIX System V
Anticipatory Paging: Clustered Paging (no hint)    OSF/1 UNIX
Anticipatory Paging: with Hints                    Future

Table 1: Evolution of memory management algorithms

Memory Management Algorithms

In UNIX, both main memory and swap space are divided into pages, which are blocks of usually 4096 or 8192 bytes [3]. UNIX uses a mechanism called memory swapping to move pages between different levels of memory; in this survey, we focus on swapping between main memory and swap space. The algorithms that implement memory swapping have evolved over several decades. There are two swapping policies, non-preemptive and preemptive. Under the non-preemptive policy, incoming pages can be placed only in free space in main memory. The preemptive policy, by contrast, allows incoming pages to displace pages already placed by other processes. Under either policy, the memory manager tries to allocate free space first [1].
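The difference between the two policies can be made concrete with a minimal sketch. This is an illustration only, not code from any UNIX implementation; the class and method names (Memory, allocate) are ours.

```python
class Memory:
    """Toy model of main memory as a list of page frames."""

    def __init__(self, n_frames):
        self.frames = [None] * n_frames  # None marks a free frame

    def allocate(self, pid, n_pages, preemptive=False):
        """Place n_pages of process pid; free frames are always tried first."""
        free = [i for i, owner in enumerate(self.frames) if owner is None]
        if len(free) >= n_pages:
            for i in free[:n_pages]:
                self.frames[i] = pid
            return True
        if not preemptive:
            return False  # non-preemptive: incoming pages must fit in free space
        # Preemptive: use all free frames, then displace other processes' pages
        # (a real system would write the victims out to swap space first).
        for i in free:
            self.frames[i] = pid
        needed = n_pages - len(free)
        victims = [i for i, owner in enumerate(self.frames)
                   if owner is not None and owner != pid][:needed]
        for i in victims:
            self.frames[i] = pid
        return True

mem = Memory(4)
mem.allocate("A", 3)
print(mem.allocate("B", 2))                   # False: only 1 free frame left
print(mem.allocate("B", 2, preemptive=True))  # True: one of A's pages is displaced
```

In both branches the allocator consumes free frames first, matching the rule stated above; only when free space runs out do the two policies diverge.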
To compare and analyze these algorithms, we assume a system with 256K bytes of main memory, divided into 32 pages of 8K bytes each, and 1M bytes of swap space, divided into 128 pages of 8K bytes each. There are 16 processes, each of 64K bytes, stored in swap space as 8 pages of 8K bytes each (Figure 1). The pages of each process are mapped contiguously in swap space, which enables faster I/O than scattered single pages.

[Figure 1: Example system structure]

Swapping Systems

Swapping systems were used on the PDP-11 and in early UNIX. A swapping system swaps an entire process between main memory and swap space; it cannot swap the pages of a process individually. When a process blocks, it is swapped out to swap space (Figure 2-b). When the system later requires the process again, it is swapped back into main memory (Figure 2-a). Finally, if the process is no longer used, the system swaps it out. In UNIX, a special process 0 called the swapper swaps processes in and out of main memory; it swaps a process into main memory before that process is executed. The swapper runs only when there are processes that need to be swapped, and otherwise goes to sleep; the kernel wakes it up periodically when the situation demands [1]. Under this algorithm, 4 processes can be stored in main memory at the same time in the example system (Figure 3). The swapping system algorithm is very expensive because it swaps an entire process at once; however, it can provide a large amount of free memory when heavyweight processes are swapped out.
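The capacity figures quoted above for the example system can be checked with a few lines of arithmetic (the variable names are ours):

```python
KB = 1024
page = 8 * KB            # page size: 8K bytes
main_memory = 256 * KB   # main memory: 256K bytes
swap_space = 1024 * KB   # swap space: 1M bytes
proc_size = 64 * KB      # each process: 64K bytes

print(main_memory // page)              # 32 main-memory page frames
print(swap_space // page)               # 128 swap-space pages
print(proc_size // page)                # 8 pages per process
print(main_memory // proc_size)         # 4 whole processes resident under swapping
```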
[Figure 2-a: Swapping systems: swap in]

[Figure 2-b: Swapping systems: swap out]

[Figure 3: Swapping systems]

[Figure 4: Demand paging systems]

Demand Paging Systems

Demand paging systems swap pages individually instead of swapping entire processes. A process is divided into many pages when it is stored in memory, and not all pages of a process are used at the same time, so the UNIX system can swap individual pages in and out independently. When a page fault occurs, that is, a reference to a page that is not in main memory, the corresponding page is swapped in. Bach (1986) defined the working set as the set of pages that a process has referenced in its last n memory references, where n is called the window of the working set. Only the working set needs to be kept in main memory. Table 2 gives an example of working sets with window n = 3 under an LRU (least recently used) replacement policy. This algorithm allows more processes to share main memory at the same time than the swapping algorithm does. Assume that when the system executes a process, only two contiguous pages (2 × 8K = 16K) need to be swapped into main memory; then 256K / 16K = 16 processes can be stored in main memory at the same time in the example system (Figure 4). Demand paging is thus more efficient than swapping, since more processes can reside in main memory simultaneously.
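The working-set behaviour in Table 2 can be simulated in a few lines. The sketch below keeps 3 page frames under LRU replacement and reproduces the resident set after each reference of the page trace; the function name is ours.

```python
from collections import OrderedDict

def lru_trace(trace, n_frames):
    """Return the sorted resident set after each reference (LRU, n_frames)."""
    frames = OrderedDict()  # page -> None, ordered from least to most recently used
    history = []
    for page in trace:
        if page in frames:
            frames.move_to_end(page)        # hit: page becomes most recently used
        else:
            if len(frames) == n_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = None
        history.append(sorted(frames))
    return history

for step in lru_trace([9, 8, 15, 9, 23, 25, 1, 1, 8], 3):
    print(step)
```

The final resident set for the trace in Table 2 is {1, 8, 25}: the last reference to 8 evicts 23, which was least recently used.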
However, thrashing is a severe problem for this algorithm. One reason is long disk latency, since swap space is much slower than main memory. For instance, in the example system there are 16 processes in main memory at the same time, each with only two active pages, while another 16 processes wait to be swapped in. At each swap, only two pages can be moved from swap space into main memory, so the system wastes a great deal of time swapping the pages of all the processes in and out.

Page trace:    9    8    15    9    23    25    1    1    8
Working set:   9    9     9    9     9     9    1    1    1
                    8     8    8    23    23   23   23    8
                         15   15    15    25   25   25   25

Table 2: Working sets generated by a page trace (n = 3, LRU)

Hybrid Memory Systems

Hybrid memory systems have been implemented in UNIX System V and UNIX SVR4; they combine the advantages of both swapping and demand paging. In UNIX SVR4, a process called pageout does the demand paging work. When many heavyweight processes (such as netscape, sendmail or xterm) must execute simultaneously and the system does not have enough main memory, the swapper swaps out some of the least busy processes entirely in order to free memory for the heavyweight processes. This algorithm reduces thrashing while maintaining good performance.

Anticipatory Paging

In the anticipatory paging (prepaging) algorithm, the system predicts which pages will be required soon and fetches them before they are referenced. In the three algorithms above, pages are swapped in only when they are referenced; if only a few pages are referenced, much of main memory remains free while the system spends time fetching the next pages on demand. Prepaging is generally more efficient because it avoids long disk latencies by fetching pages in advance. Nonetheless, if memory is fully used, prepaging can increase page faults. In recent years, different anticipatory paging algorithms have been studied.
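The hybrid policy can be sketched as a reclamation step layered on top of demand paging: when free memory falls below a threshold, the swapper removes a whole, least recently run process. This is a toy illustration under our own assumptions (names and data structures are ours, not SVR4's):

```python
def hybrid_reclaim(resident, free_frames, threshold, last_run):
    """Swap out whole least-recently-run processes until free_frames
    reaches threshold. resident maps process name -> number of frames;
    last_run maps process name -> time of its last scheduling."""
    while free_frames < threshold and resident:
        victim = min(resident, key=lambda p: last_run[p])
        free_frames += resident.pop(victim)  # all of the victim's pages are freed
        print(f"swapper: swapped out {victim}")
    return resident, free_frames

resident = {"netscape": 10, "sendmail": 4, "xterm": 2}
last_run = {"netscape": 300, "sendmail": 120, "xterm": 250}
resident, free = hybrid_reclaim(resident, free_frames=1, threshold=5,
                                last_run=last_run)
print(free)  # sendmail (least recently run) is swapped out: 1 + 4 = 5 free frames
```

Individual page faults would still be handled by demand paging; the whole-process swap-out only kicks in when memory pressure is high, which is what limits thrashing.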
Some algorithms allow programmers or compilers to use hints to inform the system about future references in an application [5]. In contrast, algorithms such as OBL [6] and OSF/1 clustered paging [7] do not use hints. Assume that in the same example system only a few pages are fetched as page faults occur, leaving a large amount of free memory. Figure 5 illustrates this under demand paging: processes 1 to 8 are running and each has 2 pages in main memory, so the memory from address 128K to 256K is empty. An anticipatory paging algorithm can utilize this free memory: Figure 6 shows the system swapping in additional pages of processes 1 and 2 that will be needed soon, so that it does not have to fetch them when they are referenced in the near future.

[Figure 5: Demand paging systems]

[Figure 6: Anticipatory paging]

(1) OSF/1 Clustered Paging

The OSF/1 UNIX system implemented prepaging for its virtual memory system. It uses page clusters, which are groups of pages that are adjacent in the virtual address space and stored contiguously in swap space; each cluster holds 8 pages. When one page of a cluster is demanded by main memory, all of the non-resident pages of that cluster are fetched [4].

(2) Anticipatory Paging with Hints

This algorithm lets programmers or compilers decide which pages will be fetched before they are demanded, by inserting hints into the programs. In the normal case this is more efficient than the previous algorithm, because all of the anticipated pages will certainly be used soon, whereas OSF/1 clustered paging fetches pages by guessing. However, this algorithm also has several disadvantages.
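The clustered fetch rule can be written down directly. The sketch below uses 0-based page numbers and our own function name; it is an illustration of the rule in [4], not OSF/1 code.

```python
CLUSTER = 8  # OSF/1 clusters hold 8 pages adjacent in the virtual address space

def fault(page, resident):
    """On a fault, fetch every non-resident page of the faulting page's
    cluster; returns the set of pages actually read from swap space."""
    base = (page // CLUSTER) * CLUSTER
    cluster = set(range(base, base + CLUSTER))
    fetched = cluster - resident
    resident |= fetched
    return fetched

resident = set()
print(sorted(fault(1, resident)))  # first fault pulls in the whole cluster 0-7
print(sorted(fault(7, resident)))  # same cluster: nothing left to fetch
```

A later fault within the same cluster costs no disk I/O at all, which is exactly the benefit the clustering is meant to capture.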
Firstly, programmers must rewrite their applications to insert the hints. Secondly, some programmers cannot be trusted, because they may hint that all of their pages are important [8]. Table 3 shows an example comparing OSF/1 clustered paging with anticipatory paging with hints. We assume page faults occur on pages in the order 1, 10, 21, 7. Under the first algorithm, the system still needs a swap-in for three of the four faults, so the prepaging mechanism gains little in this case. Under the second algorithm, we assume the programmer inserts a hint that tells the system this page order; the system fetches pages 1, 10, 21 and 7 at the first reference and needs no further fetches. Consequently, in this situation anticipatory paging with hints is more efficient.

OSF/1 Clustered Paging (without hint)
Page fault (in order):  1          10               21                 7
Main memory:            1-8        1-8, 9-16        1-8, 9-16, 17-24   1-8, 9-16, 17-24
Need to swap in:        Y          Y                Y                  N

Anticipatory Paging with Hints
Page fault (in order):  1          10               21                 7
Main memory:            1,10,21,7  1,10,21,7        1,10,21,7          1,10,21,7
Need to swap in:        Y          N                N                  N

Table 3: Comparison between the two anticipatory paging algorithms

Conclusion

In this survey, we reviewed five memory management algorithms used under UNIX. These algorithms focus on how pages are swapped between swap space and main memory. Early UNIX used swapping systems and demand paging systems; from UNIX System V onward, hybrid memory systems and anticipatory paging without hints have been implemented. Anticipatory paging with hints, however, is seldom used, perhaps because it places a heavy burden on programmers, who must rewrite their programs with hints inserted. We also found that in most situations the newer algorithms are more efficient than the older ones, although every algorithm has its own advantages and disadvantages. Future work could focus on implementing more efficient algorithms that support prepaging with hints while combining the advantages of the other algorithms.
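The swap-in counts in Table 3 can be reproduced with a short simulation (our own sketch; 1-based page numbers as in the table, clusters 1-8, 9-16, 17-24):

```python
def clustered_faults(trace, cluster=8):
    """Count swap-in operations when each fault loads the whole cluster
    containing the faulting page, as in OSF/1 clustered paging."""
    resident, swapins = set(), 0
    for p in trace:
        if p not in resident:
            base = ((p - 1) // cluster) * cluster + 1
            resident |= set(range(base, base + cluster))
            swapins += 1
    return swapins

def hinted_faults(trace):
    """With a correct hint, all hinted pages arrive in a single prefetch."""
    return 1 if trace else 0

trace = [1, 10, 21, 7]
print(clustered_faults(trace))  # 3 swap-ins: clusters 1-8, 9-16, 17-24
print(hinted_faults(trace))     # 1 swap-in: pages 1, 10, 21, 7 prefetched together
```

Clustering only pays off when later faults land in an already-fetched cluster (here, only page 7 does); a correct hint collapses the whole trace into one fetch.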
References

[1] Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, pp. 243-247.
[2] Barry Wilkinson, Computer Architecture: Design and Performance, 2nd edition, 1996, pp. 115-152.
[3] Unix for Advanced Users (http://www.uwsg.iu.edu/UAU/memory/).
[4] Scott F. Kaplan, Lyle A. McGeoch and Megan F. Cole, Adaptive Caching for Demand Prepaging, 2002.
[5] T. C. Mowry, A. K. Demke and O. Krieger, Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications, in Proceedings of the Second Symposium on Operating Systems Design and Implementation (OSDI), November 1996.
[6] M. Joseph, An Analysis of Paging and Program Behaviour, 1970.
[7] D. Black, J. Carter, G. Feinberg, R. MacDonald, S. Mangalat, E. Sheinbrood, J. Van Sciver and P. Wang, OSF/1 Virtual Memory Improvements, in Proceedings of the USENIX Mach Symposium, pages 87-103, November 1991.
[8] Paging, A Memory Management Technique (http://www2.cs.uregina.ca/~hamilton/courses/330/notes/memory/paging.html).
[9] Advanced Computer Architecture, Chapter 3: Shared Memory Architecture.