Memory Management

Memory Management
• The part of the O.S. that manages the memory hierarchy (i.e., main memory and disk) is called the memory manager.
• The memory manager allocates memory to processes when they need it and deallocates it when they are done. It also manages swapping between main memory and disk when main memory is too small to hold all processes.
Memory Management Schemes
• There are two classes of memory management schemes: those that move processes back and forth between main memory and disk (i.e., swapping and paging) and those that do not.
• An example of the second class is monoprogramming without swapping (rarely used today). In this approach only one program runs at a time, and memory is shared by that program and the O.S.
• Three ways of organizing memory in the monoprogramming approach are shown on the next slide.
Relocation and Protection
• When a program is linked (all parts of the program are combined into a single address space), the linker must know at what address the program will begin in memory.
• Because the program may be loaded into a different partition each time, its code must be relocatable.
• One solution is relocation during loading, which means the loader modifies the relocatable addresses as the program is brought into memory.
• The problem is that even this method cannot stop a program from reading or writing memory locations belonging to other users (i.e., the protection problem).
Relocation and Protection
• The solution to both of these problems is to use two special hardware registers, called the base and limit registers. When a process is scheduled, the base register is loaded with the address of the start of its partition, and the limit register with the length of its partition.
Example: let the base address be x. If the program is loaded at location x = 10, relocatable addresses are translated as follows:
x + 10 -> 20
2x + 3 -> 23
x + 40 -> 50
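Below is a minimal sketch of base/limit translation in Python; the register values and the fault behavior are illustrative assumptions, not a real MMU interface.

```python
# Base/limit relocation and protection; values match the slide's example.
BASE = 10      # start of the process's partition (base register)
LIMIT = 100    # length of the partition (limit register), assumed here

def translate(rel_addr):
    """Relocate a program-relative address and enforce protection."""
    if rel_addr < 0 or rel_addr >= LIMIT:
        raise MemoryError(f"protection fault: {rel_addr} outside partition")
    return BASE + rel_addr  # relocation: add the base register

print(translate(10))   # -> 20, matching x + 10 with x = 10
print(translate(40))   # -> 50, matching x + 40
```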
Swapping
• In interactive systems there is sometimes not enough memory for all the active processes, so excess processes must be kept on disk and brought in to run dynamically.
• Swapping is a memory management technique in which each process is brought in in its entirety, run for a while, and then put back on disk.
• The partitions can be variable, which means the number, location, and size of the partitions vary dynamically to improve memory utilization.
Swapping
• Swapping may create multiple holes in memory; in that case all the processes can be moved downward to combine the holes into one big one (memory compaction).
• Swapping is slow, and if processes' data segments grow dynamically, a little extra memory should be allocated whenever a process is swapped in. If there is no room in memory when a process grows, it must be swapped out of memory or killed (if there is no disk space either).
• If both the stack segment (for return addresses and local variables) and the data segment (for the heap and dynamically allocated variables) of a process grow, a special memory configuration should be used (see next slide).
Issues with Variable Partitions
• How do we choose which hole to assign to a process? (allocation strategy)
• How do we keep track of holes? (see next slide)
• How do we avoid many small holes? One possible solution is compaction, but there are several variations of compaction; the question is which processes should be moved.
Memory Management with Bitmaps
• Each allocation unit in memory corresponds to one bit in the bitmap (e.g., 0 if the unit is free and 1 if it is occupied).
• The size of the allocation unit is important: the smaller the allocation unit, the larger the bitmap.
• If the allocation unit is chosen large, the bitmap is smaller, but if the process size is not an exact multiple of the allocation unit, memory may be wasted.
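The sketch below shows bitmap allocation; the memory size and unit granularity are illustrative assumptions. Allocating k units means searching the bitmap for a run of k consecutive 0 bits, which is slow for long runs.

```python
# Bitmap memory management: one bit per allocation unit (0 = free, 1 = used).
bitmap = [0] * 32          # 32 allocation units, all initially free

def allocate(k):
    """Find the first run of k free units, mark them used, return the start."""
    run = 0
    for i, bit in enumerate(bitmap):
        run = run + 1 if bit == 0 else 0
        if run == k:                     # found k consecutive free units
            start = i - k + 1
            for j in range(start, i + 1):
                bitmap[j] = 1
            return start
    return None                          # no hole big enough

print(allocate(4))   # -> 0
print(allocate(3))   # -> 4
```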
Memory Management with Linked Lists
• A linked list keeps track of allocated and free segments (where a segment is either a process or a hole).
Memory Management with Linked Lists
• If the list is sorted by address, the possible neighbor combinations when a process X terminates are as shown on the next slide. The list update is different for each situation; using a doubly-linked list is more convenient.
Memory Management with Linked Lists
For allocating memory to a new process, the allocation strategies are (see the sketch after this list):
• First fit: takes the first hole that is big enough. It is fast and tends to leave large leftover holes.
• Next fit: like first fit, but the search starts from the place where the previous search ended instead of from the beginning of the list.
• Best fit: searches for the hole whose size is closest to the size needed. It is slower, and it results in useless tiny holes.
• Worst fit: takes the largest hole.
• Quick fit: maintains separate lists for holes of common sizes such as 4 KB, 12 KB, and 20 KB. It is fast, but the disadvantage is that merging free spaces is expensive.
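A minimal sketch of first fit versus best fit over a free list; the hole list (start address, size) is an illustrative assumption.

```python
# Hole selection over a free list sorted by address.
holes = [(0, 5), (14, 12), (30, 8), (50, 20)]   # (start, size) pairs

def first_fit(size):
    """Take the first hole big enough."""
    for start, length in holes:
        if length >= size:
            return start
    return None

def best_fit(size):
    """Take the smallest hole that is still big enough."""
    candidates = [(length, start) for start, length in holes if length >= size]
    return min(candidates)[1] if candidates else None

print(first_fit(8))   # -> 14 (first hole with size >= 8)
print(best_fit(8))    # -> 30 (the tightest fit: size exactly 8)
```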
Virtual Memory
• Overlays are pieces of a program created by the programmer. By keeping some overlays on disk it was possible to fit a program larger than memory into memory. Swapping overlays in and out (they called each other in) and splitting a program into overlays were boring and time-consuming.
• To solve this problem, virtual memory was devised in 1961. The idea is that if the size of a program exceeds the size of memory, the O.S. keeps the parts of the program currently in use in main memory and the rest on disk. It can be used in both single-programming and multiprogramming environments.
Paging
• Most virtual memory systems use a technique called paging, which is based on virtual addressing.
• When a program references an address, that address is a virtual address; the set of such addresses forms the virtual address space. A virtual address does not go directly onto the memory bus. Instead it goes to the Memory Management Unit (MMU), which maps the virtual address onto a physical memory address. (see next slide)
Virtual Address
Paging
• The virtual address space is divided into units called pages; the corresponding units in physical memory are called page frames.
• The page table maps virtual pages onto page frames. (see next slide)
• If the related page frame exists in memory, the physical address is found and the address in the instruction is transformed into a physical address.
• If the related page frame does not exist in memory, a page fault occurs. This means the related page exists only on disk. In case of a page fault, the O.S. frees one of the used page frames (writing it to disk if necessary), fetches the referenced page into that page frame, and updates the page table.
Page Table
• In the example on the next slide, each page table entry contains 3 bits that give the page frame number and one bit that shows whether the page frame is present in memory.
• The incoming address consists of a virtual page number, which is translated into the 3-bit page frame number, and an offset, which is copied directly to the output address. (see next slide)
• Each process has its own page table. If the O.S. copies the process's page table from memory into an array of fast hardware registers, no memory references are needed to read the table, but since the page table is large, this is expensive. If the O.S. keeps the process's page table in memory, one or two extra memory references (to read the page table) are needed for each instruction. This is a disadvantage because it slows down execution. We will see different solutions later.
Paging
• In general, knowing the logical address space and the page size, the page table can be designed.
• For example, with 16-bit addressing and a 4 KB page size: since 4 KB = 2^12, we need a 12-bit offset, and the remaining 16 - 12 = 4 higher bits form the virtual page number of the logical address. The structure of the page table can be the same as on the previous slide.
Paging
• Example: for an 8-bit addressing system and a page size of 32 bytes, we need 5 bits for the offset and 3 bits for the higher bits (the virtual page number) of the logical address. If we assume 8-bit physical addressing, the page table can be as follows:

virtual page -> page frame
000 -> 010
001 -> 111
010 -> 011
011 -> 000
100 -> 101
101 -> 001
110 -> 100
111 -> 110

This maps logical address 011 01101 => 000 01101 (physical), and logical address 110 00110 => 100 00110 (physical).
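The sketch below replays this translation in Python; the dictionary is exactly the page table above, so nothing is assumed beyond the slide's numbers.

```python
# The slide's 3-bit page table: virtual page number -> page frame number.
PAGE_TABLE = {0b000: 0b010, 0b001: 0b111, 0b010: 0b011, 0b011: 0b000,
              0b100: 0b101, 0b101: 0b001, 0b110: 0b100, 0b111: 0b110}
OFFSET_BITS = 5                        # 32-byte pages -> 5-bit offset

def translate(vaddr):
    """Split an 8-bit virtual address, map the page, copy the offset."""
    page = vaddr >> OFFSET_BITS                 # top 3 bits
    offset = vaddr & ((1 << OFFSET_BITS) - 1)   # bottom 5 bits, copied through
    return (PAGE_TABLE[page] << OFFSET_BITS) | offset

print(f"{translate(0b01101101):08b}")  # 011 01101 -> 000 01101
```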
Issues with the Page Tables
• With a 4 KB page size and a 32-bit address space, 1 million pages can be addressed, which means the page table must have 1 million entries. One way of dealing with such a large page table is to use a multilevel page table. With a multilevel page table we can keep only the parts of the table that are actually needed in memory (see next slide).
• Another issue is that the mapping must be fast. For example, if an instruction takes 4 nsec, the page table lookup must be done in under 1 nsec.
Two-Level Scheme
• The first 2 bits of the virtual page number are the index into table 1, which remains in main memory.
• The next 2 bits are the index into table 2, which contains the page frame numbers. Table 2 itself may not be in memory (table 2 is subject to paging).
(figure: table 1 entry 00 points to a second-level table whose entry 10 holds page frame 1011; table 1 entry 01 points to another second-level table; and so on)
For example, virtual address 0010 1100 is translated to physical address 1011 1100: bits 00 index table 1, bits 10 index the selected table 2 (yielding frame 1011), and the offset 1100 is copied through.
• Only the second-level tables that are currently in use are kept in memory.
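A minimal sketch of the two-level lookup; the tables encode only the example above (a single resident second-level table), which is an assumption for illustration.

```python
# Two-level page table: table1[i1] yields a second-level table (or is absent).
TABLE1 = {0b00: {0b10: 0b1011}}   # table1[00] -> table2; table2[10] -> frame 1011

def translate(vaddr):
    """8-bit address: 2 bits table-1 index, 2 bits table-2 index, 4-bit offset."""
    i1 = (vaddr >> 6) & 0b11
    i2 = (vaddr >> 4) & 0b11
    offset = vaddr & 0b1111
    table2 = TABLE1[i1]           # a missing key here = second level not resident
    return (table2[i2] << 4) | offset

print(f"{translate(0b00101100):08b}")  # 0010 1100 -> 1011 1100
```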
Structure of a Page Table Entry
• The modified bit, or dirty bit, shows whether the page has been modified while in memory. When a page frame is reclaimed, a page with the dirty bit set must be written back to disk; otherwise it can simply be abandoned.
TLBs (Translation Lookaside Buffers)
• Using a fast hardware lookup cache (typically between 8 and 2048 entries) is the standard solution for speeding up paging (see next slide).
• In general, the logical address is first looked up in the TLB. If the address is there and the access does not violate the protection bits, the page frame is taken directly from the TLB. In case of a miss, the page table is consulted: the missing entry is looked up in the page table and replaces one of the TLB entries for future use.
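A minimal sketch of a TLB in front of a page table; the table contents, the TLB size, and the FIFO eviction choice are illustrative assumptions.

```python
from collections import OrderedDict

PAGE_TABLE = {0: 2, 1: 7, 2: 3, 3: 0, 4: 5}   # virtual page -> page frame
TLB_SIZE = 2
tlb = OrderedDict()                            # insertion order = entry age

def lookup(page):
    """Consult the TLB first; on a miss, walk the page table and cache it."""
    if page in tlb:                    # TLB hit: no page table reference needed
        return tlb[page]
    frame = PAGE_TABLE[page]           # TLB miss: consult the page table
    if len(tlb) >= TLB_SIZE:
        tlb.popitem(last=False)        # evict the oldest TLB entry
    tlb[page] = frame                  # cache the translation for next time
    return frame

print(lookup(1), lookup(1))   # -> 7 7 (a miss, then a hit)
```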
TLBs (Translation Lookaside Buffers)
Inverted Page Tables
• There is only one entry per page frame of real memory.
• Advantage: much less memory is required. For example, with 256 MB of RAM and a 4 KB page size, 2^16 page frames fit in memory, so we need 65,536 (= 2^16) entries instead of the 2^52 entries that a traditional page table would require with 64-bit addressing.
• Disadvantage: the whole table must be searched to find the frame holding the requested page of the requested process, because the table is not indexed by virtual page number (see the next slide).
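A minimal sketch of an inverted page table; the contents are illustrative assumptions. There is one entry per physical frame, recording which (process, virtual page) occupies it.

```python
# index = frame number; value = (process, virtual page) occupying that frame
inverted = [("A", 5), ("B", 2), ("A", 0), ("B", 7)]

def find_frame(process, vpage):
    """Linear search of the whole table (the disadvantage noted above);
    real systems hash on the virtual page to avoid scanning."""
    for frame, entry in enumerate(inverted):
        if entry == (process, vpage):
            return frame
    return None   # not in memory: page fault

print(find_frame("A", 0))   # -> 2
```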
Inverted Page Tables
Page Replacement Algorithm
• When a page fault occurs, the operating system has to choose a page to evict from memory.
• The optimal page replacement algorithm is based on knowledge of the future usage of each page: it evicts the page that will next be used furthest in the future. It cannot be implemented, except when the program is run a second time and the system recorded the page usage from the first run.
• The only use of the optimal algorithm is to compare its performance with that of realizable algorithms.
NRU (Not Recently Used)
This algorithm uses the M (modified) and R (referenced) bits of the page table entry. On each page fault it removes a page at random from the lowest-numbered non-empty class among the following:
• Class 0: not referenced, not modified
• Class 1: not referenced, modified
• Class 2: referenced, not modified
• Class 3: referenced, modified
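A minimal sketch of NRU victim selection; the pages and their R/M bits are illustrative assumptions.

```python
import random

pages = {"p0": (0, 0), "p1": (0, 1), "p2": (1, 0), "p3": (1, 1)}  # page -> (R, M)

def nru_victim():
    """Pick a random page from the lowest-numbered non-empty class."""
    classes = {0: [], 1: [], 2: [], 3: []}
    for page, (r, m) in pages.items():
        classes[2 * r + m].append(page)   # class number = 2*R + M
    for c in range(4):
        if classes[c]:
            return random.choice(classes[c])

print(nru_victim())   # -> "p0" (class 0: not referenced, not modified)
```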
The Second Chance Replacement Algorithm
• FIFO: a list of pages with a head and a tail. On each page fault the oldest page, which is at the head, is removed.
• The problem with FIFO is that it may throw out heavily used pages just because they arrived early.
• The second chance algorithm is an enhancement of FIFO that solves this problem.
• On each page fault it checks the R bit of the oldest page. If it is 0, the page is replaced; if it is 1, the bit is cleared and the page is put at the end of the list. (see the next slide)
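A minimal sketch of second chance; the page list and R bits are illustrative assumptions. Each entry is [page, R bit], with the oldest page at the head.

```python
from collections import deque

pages = deque([["a", 1], ["b", 0], ["c", 1]])   # head = oldest page

def second_chance_victim():
    """Evict the oldest page whose R bit is 0, clearing R bits on the way."""
    while True:
        page, r = pages.popleft()
        if r == 0:
            return page                 # old and not recently referenced
        pages.append([page, 0])         # referenced: clear R, move to the tail

print(second_chance_victim())   # -> "b" ("a" gets a second chance)
```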
The Second Chance Replacement Algorithm
The Clock Page Replacement Algorithm
• The problem with the second chance algorithm is inefficiency, because it keeps moving pages around its list.
• The clock page replacement algorithm solves this problem by keeping the pages in a circular list with a pointer (the hand). On each page fault the pointed-to page is inspected. If its R bit is 0, the page is evicted and the hand is advanced one position. If R is 1, it is cleared and the hand is advanced to the next page (see next slide).
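A minimal sketch of the clock algorithm; the frames and R bits are illustrative assumptions.

```python
frames = [["a", 1], ["b", 1], ["c", 0], ["d", 1]]   # circular list of [page, R]
hand = 0                                            # the clock hand

def clock_victim():
    """Advance the hand, clearing R bits, until a page with R == 0 is found."""
    global hand
    while True:
        page, r = frames[hand]
        if r == 0:
            hand = (hand + 1) % len(frames)   # hand moves past the evicted slot
            return page
        frames[hand][1] = 0                   # referenced: clear R and move on
        hand = (hand + 1) % len(frames)

print(clock_victim())   # -> "c"
```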
The Clock Page Replacement Algorithm
The Least Recently Used (LRU) Page Replacement
Algorithm
LRU: throw out the page that has not been used for the longest time. Implementations:
• A linked list of pages ordered by use. This is expensive, because the list requires an update on every memory reference.
• A 64-bit hardware counter whose value is stored in the page's entry at each reference. On a page fault, the page with the lowest counter value is the LRU page. Each page table entry must be large enough to contain the counter.
• A hardware n x n bit matrix for n page frames: on each reference to page k, first all bits of row k are set to 1 and then all bits of column k are set to 0. The row with the lowest binary value belongs to the LRU page.
LRU with matrix. Pages referenced: 0, 1, 2, 3, 2, 1, 0, 3, 2, 3
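The sketch below replays the matrix method on the reference string above; only the frame count (4) comes from the example.

```python
N = 4                                    # page frames
matrix = [[0] * N for _ in range(N)]     # the n x n hardware bit matrix

def reference(k):
    """On a reference to page k: set row k to 1s, then column k to 0s."""
    matrix[k] = [1] * N
    for row in matrix:
        row[k] = 0

def lru_page():
    """The row with the lowest binary value belongs to the LRU page."""
    values = [int("".join(map(str, row)), 2) for row in matrix]
    return values.index(min(values))

for page in [0, 1, 2, 3, 2, 1, 0, 3, 2, 3]:
    reference(page)
print(lru_page())   # -> 1 (page 1 was used least recently)
```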
Simulating LRU in Software
• The NFU (Not Frequently Used) algorithm is an approximation of LRU that can be done in software.
• There is a software counter for each page. On each clock interrupt, the O.S. adds the value (0 or 1) of each page's R bit to that page's counter.
• The problem with NFU is that it never forgets its counts. This means that if a page received lots of references in the past, it is never evicted from memory, even if it has not been used recently.
Simulating LRU in Software
• To solve this problem, a modified algorithm known as aging works as follows:
• An 8-bit reference byte is kept for each page. At each clock tick the byte is shifted right by one bit, and the R bit is then placed in the leftmost bit. The page with the lowest binary value is treated as the LRU page. (see next slide)
• The difference between aging and LRU is that with aging, if two pages both have the value 0, we cannot say which one was used less recently, because the counter only remembers the last 8 clock ticks.
Aging algorithm simulation of LRU
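A minimal sketch of the aging counters; the per-tick R bits below are illustrative assumptions.

```python
counters = [0, 0, 0]   # 8-bit aging byte for each of three pages

def tick(r_bits):
    """One clock tick: shift each counter right, insert that page's R bit on the left."""
    for p, r in enumerate(r_bits):
        counters[p] = (counters[p] >> 1) | (r << 7)

tick([1, 0, 1])    # pages 0 and 2 were referenced during this tick
tick([0, 1, 1])    # pages 1 and 2 were referenced during this tick
print([f"{c:08b}" for c in counters])   # ['01000000', '10000000', '11000000']
print(counters.index(min(counters)))    # -> 0: lowest value, evicted first
```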
More on Page Replacement Algorithms
• High paging activity is called thrashing. A process is thrashing if it is spending more time on paging than on executing.
• Processes usually exhibit locality of reference, meaning that at any moment they reference only a small fraction of their pages.
• After a process has been bringing in pages for a while, it holds most of the pages it needs. This strategy is called demand paging, meaning pages are loaded on demand, not in advance.
• Strategies that try to load the pages before the process runs are called prepaging.
Working set model
• The set of pages that a process is currently using is called its working set.
• If k is the number of most recent memory references considered, there is a wide range of k over which the working set is unchanged (see next slide). This is what makes prepaging possible.
• Prepaging algorithms guess which pages will be needed when a process is restarted. This prediction is based on the process's working set at the time it was stopped.
• At each page fault, working set page replacement algorithms check whether the page being considered for eviction is part of the current process's working set or not.
• The WSClock algorithm is a modification of the working set algorithm that is based on the clock algorithm.
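A minimal sketch of computing a working set over the most recent k references; the reference string and the values of k are illustrative assumptions.

```python
def working_set(references, k):
    """The set of pages touched among the most recent k memory references."""
    return set(references[-k:])

refs = [1, 2, 1, 3, 2, 1, 2, 4, 2, 1]
print(working_set(refs, 4))   # -> {1, 2, 4}
print(working_set(refs, 6))   # -> {1, 2, 4}: unchanged over a range of k
```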
Working set model
(figure: working set size as a function of k, the number of most recent references)
Summary of Page Replacement Algorithms
Modeling Page Replacement Algorithms
• For some page replacement algorithms, increasing the number of page frames does not necessarily reduce the number of page faults. This strange situation has become known as Belady's anomaly.
• For example, FIFO causes more page faults with four page frames than with three for a program with five virtual pages and the following page reference string (see next slide):
0 1 2 3 0 1 4 0 1 2 3 4
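The sketch below counts FIFO page faults on the reference string above, confirming the anomaly: four frames fault more often than three.

```python
from collections import deque

def fifo_faults(refs, n_frames):
    """Count page faults under FIFO replacement with n_frames frames."""
    frames = deque()
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == n_frames:
                frames.popleft()        # evict the oldest resident page
            frames.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
print(fifo_faults(refs, 3))   # -> 9 page faults
print(fifo_faults(refs, 4))   # -> 10 page faults: more frames, more faults
```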
Stack Algorithms
In this model, paging can be characterized by three items:
1- The reference string of the executing process
2- The page replacement algorithm
3- The number of page frames available in memory
For example, using LRU page replacement, a virtual address space of eight pages, and a physical memory of four page frames, the next slide shows the state of memory for the following reference string:
0 2 1 3 5 4 6 3 7 4 7 3 3 5 5 3 1 1 1 7 2 3 4 1
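A minimal sketch of the LRU stack model on the reference string above: the stack keeps pages ordered by recency, and each reference's depth in the stack is its distance (introduced on the following slides).

```python
refs = [0, 2, 1, 3, 5, 4, 6, 3, 7, 4, 7, 3, 3, 5, 5, 3, 1, 1, 1, 7, 2, 3, 4, 1]
stack, distances = [], []

for page in refs:
    if page in stack:
        d = stack.index(page) + 1   # depth in the stack (1 = top)
        stack.remove(page)
    else:
        d = float("inf")            # first reference: infinite distance
    distances.append(d)
    stack.insert(0, page)           # the referenced page moves to the top

print(distances)
# With k page frames, every reference whose distance exceeds k is a page fault.
```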
The State of Memory with the LRU Stack-Based Algorithm
The Distance String
• The depth of a referenced page, measured from the top of the stack, is called its distance; the sequence of these depths is the distance string.
• The distance string depends on both the reference string and the paging algorithm, and it characterizes the performance of the algorithm. For example, in (a) most of the distances are between 1 and k, which shows that with a memory of k page frames few page faults occur. In (b) the references are so spread out that many page faults are generated.
(figure: distance string density functions for cases (a) and (b))
Segmentation
• Although paging provides a large linear address space without requiring more physical memory, using a single linear address space can cause problems.
• For example, suppose that within a virtual address space from 0 to some maximum we want to allocate space for the different tables of a program that are built up as compilation proceeds. Since we have a one-dimensional memory, all five tables produced by the compiler must be allocated as contiguous chunks. The problem is that, as the tables grow, one table may bump into another. (see the next slide)
Segmentation
• Segmentation is a general solution to this problem: it provides the machine with many completely independent address spaces, called segments.
• Segments have variable sizes, and each of them can grow or shrink independently without affecting the others.
• For example, the next slide shows a segmented memory for the compiler tables.
Advantages of Segmentation
• It simplifies the handling of growing and shrinking data structures.
• Since each segment starts at address 0, if each procedure occupies a separate segment, linking up separately compiled procedures is simplified: a call to the procedure in segment n uses the two-part address (n, 0), so size changes in other procedures do not affect its starting address.
• Sharing procedures or data between several processes is easier with segments (e.g., a shared library).
• Different segments can have different kinds of protection (e.g., read/write but not executable for an array segment).
• The major problem with segmentation is external fragmentation, which can be dealt with by compaction (see next slide).
Segmentation with Paging
• If the segments are large, paging them means only the pages that are actually needed have to be placed in memory.
• MULTICS (the O.S. of the Honeywell 6000) uses segmentation with paging. Each 34-bit MULTICS virtual address consists of a segment number and an address within that segment. The segment number is used to find the segment descriptor. The segment descriptor points to the page table with an 18-bit address and contains the segment's information (see next slide).
Segmentation with Paging in MULTICS
• The segment number is extracted from the virtual address; then, if there is no segment fault, the page frame number can be found. If the page is in memory, its memory address is extracted and added to the offset. (see next slide)
• If the page frame is not in memory, a page fault occurs.
• The problem is that this algorithm is slow, so MULTICS uses a 16-word TLB to speed up the search.
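A minimal sketch of combined segmentation with paging; the segment table, page table contents, and bit widths are illustrative assumptions, not the actual MULTICS layout.

```python
# One segment descriptor: its length in pages and its (resident) page table.
SEGMENTS = {1: {"length": 4, "page_table": {0: 9, 1: 4, 2: 12, 3: 7}}}
OFFSET_BITS = 6                        # assumed 64-byte pages

def translate(seg, addr_in_seg):
    """Segment number -> descriptor -> page table -> frame, plus the offset."""
    desc = SEGMENTS.get(seg)
    if desc is None:
        raise MemoryError("segment fault")
    page = addr_in_seg >> OFFSET_BITS
    offset = addr_in_seg & ((1 << OFFSET_BITS) - 1)
    if page >= desc["length"]:
        raise MemoryError("address beyond the end of the segment")
    frame = desc["page_table"][page]   # a missing page here would be a page fault
    return (frame << OFFSET_BITS) | offset

print(translate(1, 2 * 64 + 5))        # page 2 of segment 1 -> frame 12, offset 5
```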