Linux vs. Windows NT Memory Management
Contents
1. Introduction
2. Linux Memory Management
2.1 Address Generation in x86
2.2 How Linux Does This
2.3 Page Allocation
2.4 Page Replacement Algorithm
2.5 Kernel Memory Allocation
2.5.1 Slab Layout
3. Windows NT Memory Management
3.1 Reserved vs. Committed Memory
3.2 Page Frame Database
3.3 Page Replacement
References
1. Introduction
Linux and Windows NT (WNT) share a common concept in memory management: virtual
memory with paging. The purpose of this paper is to explain memory management in
Linux and to provide a brief introduction to WNT memory management for comparison
purposes. Research in memory management has had a great impact on hardware design,
so memory management cannot be explained without discussing the support that the
microprocessor or hardware provides to the operating system.
Although memory management in Linux is platform independent, most of the platforms
it runs on share a common architecture of page tables with varying numbers of paging
levels. Linux memory management was designed with the 64-bit Alpha processor in mind,
but it accommodates other platforms with slight modifications.
This discussion is therefore a little hardware dependent, although the basic idea is
the same. The hardware I have chosen is, of course, the Intel Pentium/x86. This
information is valid for any Intel general-purpose processor from the 80386 to the
latest Pentium.
2. Linux Memory Management
Like SVR4 and Solaris, Linux uses two separate memory management schemes: virtual
memory management for user processes and kernel memory management for the kernel's
own use. Linux divides the 4 GB address space into two parts. Addresses from 0 up to
3 GB (0x00000000 to 0xBFFFFFFF) are used for user processes, and addresses from 3 GB
up to 4 GB (0xC0000000 to 0xFFFFFFFF) are for the kernel. This arrangement is shown
in figure 2.1.
[Diagram: kernel space from 3 GB to 4 GB, above user space from 0 to 3 GB]
Figure 2.1 Linux Address Space
In user space, a demand-paged virtual memory scheme is used. To understand this
concept fully, let us consider the address generation mechanism of the x86.
2.1 Address Generation in x86
The Intel x86 supports both segmentation and paging. The maximum segment size is
4 GB, which is the complete linear address space of the processor. Smaller segments
are created by specifying the limit field in the segment's descriptor.
[Diagram: a logical address (selector and offset) is translated through a segment
descriptor in the Global Descriptor Table into a linear address (directory, table,
offset); the page directory entry and page table entry then translate the linear
address into a physical address]
Figure 2.2 Segmentation And Paging In x86
To locate a byte in a particular segment, a logical address must be provided. A
logical address consists of a segment selector and an offset. A selector is a unique
identifier for a segment. Among other things, it provides an offset into a descriptor
table to a data structure called a segment descriptor. A segment descriptor provides
the base address of the segment, along with the access rights and limit of the segment.
This base address is added to the offset from the logical address to generate a
linear address.
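The translation from a logical address to a linear address can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SegmentDescriptor tuple, the gdt list and the logical_to_linear function are hypothetical names, access rights are ignored, and the limit is treated as the highest valid offset within the segment.

```python
from collections import namedtuple

# Hypothetical, simplified descriptor: real descriptors also carry access rights.
SegmentDescriptor = namedtuple("SegmentDescriptor", ["base", "limit"])

def logical_to_linear(descriptor_table, selector, offset):
    """Translate (selector, offset) into a 32-bit linear address."""
    index = selector >> 3   # on x86 the upper 13 selector bits index the table
    descriptor = descriptor_table[index]
    if descriptor is None or offset > descriptor.limit:
        raise ValueError("offset outside the segment")
    return (descriptor.base + offset) & 0xFFFFFFFF

# Entry 0 is left empty; entry 1 describes a 4 KB segment based at 0x1000.
gdt = [None, SegmentDescriptor(base=0x1000, limit=0x0FFF)]
print(hex(logical_to_linear(gdt, 0x08, 0x10)))   # 0x1010
```

An offset beyond the limit (for example 0x1000 in this 4 KB segment) is rejected instead of being translated, which is how the limit field creates segments smaller than 4 GB.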
If paging is not used, the linear address space of the processor is mapped directly
onto its physical address space. If paging is used, the 32-bit linear address is
treated as follows:
Bits 31-22: Page Directory
Bits 21-12: Page Table
Bits 11-0: Offset
Figure 2.3 Linear Address
The leftmost (most significant) 10 bits select a second-level page table from the
first-level page table, called the page directory. The next 10 bits select a page from
the second-level page table, and the last 12 bits are the address of the byte within
the 4 KB page.
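As a small worked example, the split of a 32-bit linear address into its three fields can be written with shifts and masks. The function name split_linear_address is hypothetical; the field positions are those of figure 2.3, with the page directory index in bits 31-22, the page table index in bits 21-12 and the byte offset in bits 11-0.

```python
def split_linear_address(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory = (addr >> 22) & 0x3FF   # bits 31-22: entry in the page directory
    table = (addr >> 12) & 0x3FF       # bits 21-12: entry in the page table
    offset = addr & 0xFFF              # bits 11-0: byte within the 4 KB page
    return directory, table, offset

directory, table, offset = split_linear_address(0xC0101234)
print(hex(directory), hex(table), hex(offset))   # 0x300 0x101 0x234
```

Each index is 10 bits wide, so a page directory and a page table hold 1024 entries each, and 1024 x 1024 x 4 KB covers the full 4 GB linear address space.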
2.2 How Linux Does This
As noted above, segments can be any size up to 4 GB. Linux uses two sizes. All the
segments in user space, for all processes, are 3 GB, and the segments in kernel space
are 1 GB, starting at 3 GB. This means Linux uses a kind of flat memory model in which
all the segments in user space share the same address space. How, then, is memory
protected in this multitasking environment? Protection at the page level is used for
this purpose. In a sense, Linux uses a pure paging mechanism for virtual memory
management. Now let us consider the platform-independent paging scheme of Linux.
Linux makes use of a three-level page table structure consisting of the following
types of tables:
Page Directory: The top-level node, known as the Page Global Directory or “pgd”.
Page Middle Directory: The middle-level node, known as the Page Middle Directory or
“pmd”.
Page Table: The bottom-level node, which holds the actual page table entries (PTEs)
describing pages.
Since the x86 supports only two-level paging, the code that traverses the middle
level of the page tables does nothing on the x86 architecture: it is preprocessed
and compiled down to essentially nothing via platform-specific #ifdefs.
This allows other code to be written as though all machines had three-level page tables.
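The effect of folding the middle level can be illustrated with a generic address split whose middle field is allowed to be zero bits wide. This is a Python analogy, not the way the kernel is written (the kernel does this in the C preprocessor); the function name split_address and the bit widths used for the three-level case are hypothetical.

```python
def split_address(addr, pgd_bits, pmd_bits, pte_bits, page_bits):
    """Return (pgd index, pmd index, pte index, offset) for the given field widths."""
    offset = addr & ((1 << page_bits) - 1)
    addr >>= page_bits
    pte = addr & ((1 << pte_bits) - 1)
    addr >>= pte_bits
    pmd = addr & ((1 << pmd_bits) - 1)   # always 0 when pmd_bits is 0
    addr >>= pmd_bits
    pgd = addr & ((1 << pgd_bits) - 1)
    return pgd, pmd, pte, offset

# Two-level x86 layout: the middle level is folded away (zero bits wide).
print(split_address(0xC0101234, 10, 0, 10, 12))   # (768, 0, 257, 564)

# A hypothetical three-level layout handled by the same code.
addr = (5 << 33) | (6 << 23) | (7 << 13) | 9
print(split_address(addr, 10, 10, 10, 13))        # (5, 6, 7, 9)
```

The caller always sees three indices, so code that walks the tables can be written once; on the two-level layout the middle index is simply always zero.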
2.3 Page Allocation
The part of memory management that allocates pages and manages physical memory is
called the zone allocator. Different ranges of physical pages may have different
properties for the kernel's purposes. For example, DMA may only work for physical
addresses below 16 MB. The zone allocator handles such differences by dividing memory
into a number of zones and treating each zone as a unit for allocation purposes.
Within each zone, the buddy system is used to manage physical pages. Pages are always
allocated in blocks of 2^n pages aligned on a 2^n-page boundary.
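The power-of-two rule makes two calculations very cheap, which a short sketch can show. The names order_for and buddy_of are hypothetical, and page frame numbers stand in for physical pages; the XOR trick is the usual formulation of the buddy system and is not taken from the Linux source.

```python
def order_for(npages):
    """Smallest n such that a block of 2**n pages holds npages pages."""
    if npages < 1:
        raise ValueError("need at least one page")
    return (npages - 1).bit_length()

def buddy_of(first_page, order):
    """Page frame number of the buddy of the aligned 2**order block at first_page."""
    return first_page ^ (1 << order)

print(order_for(5))      # 3, so a request for 5 pages gets a block of 8
print(buddy_of(8, 2))    # 12: the 4-page blocks at 8 and 12 merge into one at 8
```

Because every block of 2^n pages is aligned on a 2^n-page boundary, a block and its buddy differ in exactly one bit of the page frame number, so finding the block to merge with on free needs no search.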
2.4 Page Replacement Algorithm
The major component of the page replacement mechanism is a clock algorithm.
The clock algorithm is used because it provides an approximation of LRU replacement
and is cheaper to implement. In addition, all common general-purpose CPUs provide
hardware support for the clock algorithm in the form of a reference bit that the
hardware maintains for each PTE. The simple clock scheme, which uses only one bit, is
known as the “second chance” algorithm because it gives a page a second chance to stay
in memory for one more sweep cycle.
Linux uses a simple second chance (one-bit clock) algorithm, but with several
elaborations and complications.
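A minimal sketch of the plain one-bit clock follows. It is a textbook version under simplifying assumptions, not the Linux implementation with its elaborations: the class name SecondChance is hypothetical, and the reference bit is set in software on every access where real hardware would set it in the PTE.

```python
class SecondChance:
    """One-bit clock over a fixed number of page frames (illustrative only)."""

    def __init__(self, nframes):
        self.pages = [None] * nframes         # page held by each frame
        self.referenced = [False] * nframes   # software stand-in for the reference bit
        self.hand = 0                         # position of the clock hand

    def access(self, page):
        """Touch a page; return the page evicted to make room, or None."""
        if page in self.pages:
            self.referenced[self.pages.index(page)] = True
            return None
        # Sweep: a referenced page gets its bit cleared and survives this pass.
        while self.pages[self.hand] is not None and self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.pages)
        victim = self.pages[self.hand]
        self.pages[self.hand] = page
        self.referenced[self.hand] = True
        self.hand = (self.hand + 1) % len(self.pages)
        return victim

clock = SecondChance(3)
evicted = [clock.access(p) for p in [1, 2, 3, 4, 2, 5]]
print(evicted)   # [None, None, None, 1, None, 3]
```

In this run page 2 is touched again just before page 5 arrives, so the hand clears its bit and passes over it, and page 3 is evicted instead. That is the second chance.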
2.5 Kernel Memory Allocation
The buddy-system-based zone allocator discussed above is simple and relatively fast,
but it is a poor allocator in many respects. Because it can only manage block sizes
that are powers of two, using it directly requires rounding each requested block size
up to a power of two, which can incur a large cost in internal fragmentation.
Linux therefore uses one more memory allocator for the kernel's use, called the slab
allocator. The basic idea behind the slab allocator is “object caching”, a technique
for dealing with objects that are frequently allocated and freed. In the kernel, small
objects, such as mutexes used for synchronization, are created and destroyed very
frequently. In many cases the cost of initializing and destroying such an object
exceeds the cost of allocating and freeing its memory. So the idea is to preserve the
invariant portion of an object's initial state (its constructed state) between uses,
so that it does not have to be destroyed and recreated every time the object is used.
This is achieved by caching the objects in small buffers.
The slab allocator uses the zone allocator to get largish hunks of memory and carves
them into smaller pieces as needed.
A slab consists of one or more pages of virtually contiguous memory carved up into
equal-size chunks, with a reference count of how many of those chunks have been
allocated.
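The idea of object caching can be sketched without any real memory management. In this illustration the ObjectCache class, its method names and the make_mutex constructor are all hypothetical; a Python list stands in for the slab's free buffers, and the point is only that the constructor runs when the cache grows, not on every allocation.

```python
class ObjectCache:
    """Keeps constructed objects between uses (illustrative only)."""

    def __init__(self, constructor, objects_per_slab):
        self.constructor = constructor
        self.objects_per_slab = objects_per_slab
        self.free_list = []     # constructed objects ready for reuse
        self.constructed = 0    # how many times the constructor has run
        self.in_use = 0         # reference count of allocated objects

    def _grow(self):
        # Stand-in for getting a new slab and constructing every object in it.
        for _ in range(self.objects_per_slab):
            self.free_list.append(self.constructor())
            self.constructed += 1

    def alloc(self):
        if not self.free_list:
            self._grow()
        self.in_use += 1
        return self.free_list.pop()

    def free(self, obj):
        # The object keeps its constructed state; nothing is destroyed here.
        self.in_use -= 1
        self.free_list.append(obj)

def make_mutex():
    return {"locked": False, "waiters": []}

cache = ObjectCache(make_mutex, objects_per_slab=4)
first = cache.alloc()
cache.free(first)
second = cache.alloc()
print(second is first, cache.constructed)   # True 4
```

The second allocation hands back the very object that was just freed, still in its constructed state, and the constructor has run only for the four objects of the first slab.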
2.5.1 Slab Layout
The contents of each slab are managed by a kmem_slab structure that maintains the
slab's linkage in the cache, its reference count, and its list of free buffers. In
turn, each buffer in the slab is managed by a kmem_bufctl structure that holds the
freelist linkage, the buffer address, and a back pointer to the controlling slab. This
arrangement is shown in figure 2.4.
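[Diagram: a kmem_slab structure linked to a chain of kmem_bufctl structures, each
pointing to one buffer (buf) in the slab; the space at the end of the slab is unused]
Figure 2.4 Slab Layout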
3. Windows NT Memory Management
Windows NT provides a page-based memory management scheme that allows applications
to realize a 32-bit linear address space of 4 GB. Like Linux, WNT divides the address
space into two parts, here two equal halves of 2 GB each. This is shown in figure 3.1.
As in Linux, the upper part of the address space is reserved for the system and the
lower part is for user processes. Similar to Linux, WNT did not choose a segmented
memory architecture but implemented a pure demand-paged virtual memory system. The
same discussion of how addresses are generated on the x86 architecture therefore also
applies to WNT.
[Diagram: the upper 2 GB (2 GB to 4 GB) is reserved for use by the system; the lower
2 GB (0 to 2 GB) is available for use by applications]
Figure 3.1 Windows NT's Address Space
As noted, the address space integrity of a process is preserved at the page level.
This is achieved in two ways. First, each process has its own page directory, so it
cannot access the address space of any other process. Second, the access rights bits
of the PTE can be used to protect individual pages from being accidentally corrupted
by the process itself.
3.1 Reserved vs. Committed Memory
In Windows NT, a distinction exists between memory and address space.
Although each process has a 4-GB address space, rarely if ever will it realize anywhere
near that amount of physical memory. Consequently, the virtual-memory manager must
keep track of the used and unused addresses of a process, independent of the pages of
memory it is actually using. In actuality this amounts to having a structure for
representing all of the physical memory in the system and a structure for representing
each process's address space.
As part of the process object (the overhead associated with every process in
Windows NT), the VMM stores a structure called the virtual address descriptor (VAD)
tree to represent the address space of a process. As address space gets used for a process,
the VMM updates the VAD tree to reflect which addresses are used and which are not.
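The separation between address space and memory can be modelled in a few lines. This is a toy model under stated assumptions, not the VAD tree itself: the AddressSpace class and its methods are hypothetical, a sorted list replaces the tree, and "reserve" is taken to mean marking a range of addresses as used while "commit" means giving a page within such a range its backing storage.

```python
class AddressSpace:
    """Tracks reserved address ranges separately from committed pages."""

    def __init__(self):
        self.reserved = []       # sorted (start_page, end_page) pairs, end exclusive
        self.committed = set()   # page numbers that have backing storage

    def reserve(self, start_page, npages):
        end_page = start_page + npages
        for start, end in self.reserved:
            if start_page < end and start < end_page:
                raise ValueError("range overlaps an existing reservation")
        self.reserved.append((start_page, end_page))
        self.reserved.sort()

    def is_reserved(self, page):
        return any(start <= page < end for start, end in self.reserved)

    def commit(self, page):
        if not self.is_reserved(page):
            raise ValueError("page is not reserved")
        self.committed.add(page)

space = AddressSpace()
space.reserve(0x100, 16)      # 16 pages of address space, no memory used yet
space.commit(0x100)           # only now does one page need backing storage
print(space.is_reserved(0x10F), len(space.committed))   # True 1
```

Sixteen pages of address space are marked as used, but only the single committed page has to be accounted for as memory, which is the distinction the two structures described above exist to maintain.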
3.2 The Page-Frame Database
The virtual-memory manager uses a private data structure for maintaining the status
of every physical page of memory in the system. The structure is called the page-frame
database. The database contains an entry for every page in the system, as well as a
status for each page. The status of each page falls into one of the following categories:
Valid: A page in use by an active process in the system. Its PTE is marked as valid.
Modified: A page that has been written to but not yet written to disk. Its PTE is
marked as invalid and in transition.
Standby: A page that has been removed from a process's working set and is not dirty.
Its PTE is marked as invalid and in transition.
Free: A page with no corresponding PTE and available for use. It must first be zeroed
before being used unless it is used as a read-only page.
Zeroed: A free page that has already been zeroed and is immediately available for use
by any process.
Bad: A page that has generated a hardware error and cannot be used by any process in
the system.
Most of the status types are common to most paged operating systems, but the
two transitional page status types are unique to Windows NT. If a process addresses a
location in one of these pages, a page fault is still generated, but very little work is
required of the VMM. Transitional pages are marked as invalid, but they are still resident
in memory, and their location is still valid in the PTE. The VMM merely has to change
the status on this page to reflect that it is valid in both the PTE and the page-frame
database, and let the process continue.
[Diagram: PTEs in a process page table pointing to entries in the page-frame database
whose states include Valid, Free, Modified and Standby]
Figure 3.2
3.3 Page Replacement
In Windows NT, the component responsible for making page replacement
decisions is called the working-set manager. When a process starts, the VMM assigns it a
default working set that indicates the minimum number of pages necessary for the
process to operate efficiently. The working-set manager periodically tests this quota by
stealing Valid pages of memory from a process. If the process continues to execute
without generating a page fault for this page, the working set is reduced by one, and the
page is made available to the system.
The act of stealing a page from a process actually occurs in two stages. First, the
working-set manager changes the PTE for the page to indicate an invalid page in
transition. Second, the working-set manager also updates the page-frame database entry
for the physical page, marking it as either Modified or Standby, depending on whether
the page is dirty or not.
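The two stages of stealing a page, and the cheap fault that brings a transitional page back, can be sketched as follows. Plain dictionaries stand in for a PTE and a page-frame database entry; the function names and field names are hypothetical and chosen only for this illustration.

```python
def steal_page(pte, frame):
    """Two-stage steal by the working-set manager."""
    # Stage 1: mark the PTE as an invalid page in transition.
    pte["valid"] = False
    pte["transition"] = True
    # Stage 2: update the page-frame database entry.
    frame["state"] = "Modified" if pte["dirty"] else "Standby"

def soft_fault(pte, frame):
    """Handle a fault on a transitional page; True means no disk read was needed."""
    if pte["transition"] and frame["state"] in ("Modified", "Standby"):
        pte["valid"] = True
        pte["transition"] = False
        frame["state"] = "Valid"
        return True
    return False

pte = {"valid": True, "transition": False, "dirty": True}
frame = {"state": "Valid"}
steal_page(pte, frame)
print(frame["state"])           # Modified
print(soft_fault(pte, frame))   # True
print(frame["state"])           # Valid
```

If the process touches the page again while it is still in transition, only the two status fields change and the page contents never leave memory.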
References
Curt Schimmel, UNIX Systems for Modern Architectures, Addison-Wesley.
Linux Memory Management Documentation, http://www.linuxmm.org/docs.shtml
Paul Wilson, The GNU/Linux 2.2 Virtual Memory System, Part I.
William Stallings, Operating Systems, Fourth Edition, Prentice Hall.
Rik van Riel, Linux MM: Design of a Zone Based Memory Allocator, July 1998.
Microsoft, Memory Management in Microsoft Windows, MSDN Library.