Buffer Cache

Buffer Caches
Chapter Four
Digital UNIX Internals II
File System I/O Using a Cache
[Figure: two user processes, each with its own buffer, reach on-disk data through the kernel's in-memory cache buffer, one via read/write and one via mmap.]
Process Reading One Byte
[Figure: a user process issues read ( ... ,1); the kernel serves the request from a buffer.]
File System Caches and I/O
• Read-ahead
– When a file system notices a file being read sequentially, it
can order the physical read of the next block(s) before the
application actually requests them.
• Write-behind
– Data blocks do not have to be immediately written to disk.
File systems can cluster together writes to contiguous disk
blocks to improve performance.
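As a rough illustration of the read-ahead idea, the sketch below flags a read as sequential when it asks for the block right after the previous one, and then names the next block as a prefetch candidate. Every name here (`sketch_file_state`, `sketch_readahead_block`) is hypothetical and not taken from Digital UNIX; a real file system would likely track more state and might prefetch several blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state; not a Digital UNIX structure. */
struct sketch_file_state {
    long last_lblkno;   /* last logical block read, -1 if none yet */
};

/*
 * Record a read of logical block lblkno.
 * Returns the block worth prefetching, or -1 if the access
 * pattern does not look sequential.
 */
long sketch_readahead_block(struct sketch_file_state *fs, long lblkno)
{
    bool sequential;

    assert(fs != NULL);
    sequential = (fs->last_lblkno >= 0 && lblkno == fs->last_lblkno + 1);
    fs->last_lblkno = lblkno;
    return sequential ? lblkno + 1 : -1;
}
```

A caller would initialize `last_lblkno` to -1 when the file is opened and call the function once per block read.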
File System Caches in Digital UNIX
• (Traditional BSD UNIX) Buffer Cache
– From BSD
– Fixed pool of physical memory
• Unified Buffer Cache
– Similar to SunOS and SVR4
– Flexible pool of physical memory
– Supports memory mapping
Example: UFS uses both
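[Figure: two UFS vnodes. The directory vnode (v_type = VDIR) holds buf structures on its v_cleanblkhd and v_dirtyblkhd lists. The regular-file vnode (v_type = VREG) points through v_object to a vm_vp_object (vo_vp, vo_cleanpl, vo_cleanwpl, vo_dirtywpl) whose vm_object queues vm_page structures on ob_memq.]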
Traditional Buffer Cache
• Pool of Memory
– Allocated at boot time
– Shared with no other subsystem or allocator
• Buffer Structures
– Links into access hash chain, LRU and same vnode lists
– Device containing buffer
– Pointer to vnode
– Logical block in vnode
– Pointer to routine called when I/O is done
• Linked lists of Buffers
– Hash chain bucket, LOCKED, LRU, AGE and EMPTY lists
struct buf
[Figure: layout of struct buf and the structures it links to.]
• List linkage
– b_forw, b_back: hash list
– av_forw, av_back: queue
– b_blockf, b_blockb: vnode buffer list
• Fields
– b_flags
– b_bcount
– b_bufsize
– b_dev
– b_error
– b_un: points to the buffer
– b_lblkno, b_blkno
– b_resid
– b_proc: points to a proc
– b_hash_chain: points to the head of the hash list
– b_iodone()
– b_pagelist: points to vm_page
– b_vp, b_rvp: point to vnodes
– b_rcred, b_wcred: credentials (ucred)
– b_dirtyoff, b_dirtyend
– driver fields
– b_lock
– b_iocomplete
Buffer Cache Lists
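[Figure: the buffer free lists and the hash table.]
• bfreelist[0]: LOCKED list
• bfreelist[1]: LRU list
• bfreelist[2]: AGE list
• bfreelist[3]: EMPTY list
• bufhash: array of bufhd hash chain heads, each heading a chain of buf structures
• A buf structure points to the memory pages that hold its buffer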
To Find a Buffer
1. Calculate hash index using disk block number (b_blkno) and vnode
(b_vp) (see BUFHASH macro in /sys/include/sys/buf.h).
2. Index into the hash list.
3. Follow hash pointer to buf structure in queue.
4. Identify the correct buf structure using vnode and block numbers.
5. If no match, follow hash pointer (b_forw) to next buf structure in
queue.
6. If you get to the end of the list (wraps back to beginning) without
finding the buf structure, it does not exist; allocate a new one from
the free list.
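The lookup steps above can be sketched in user-space C roughly as follows. This is an illustration only, not the Digital UNIX source: the field names b_forw, b_back, b_vp and b_blkno come from the struct buf slide, while every `sketch_` name, the table size, and the hash function are assumptions (the real hash is the BUFHASH macro). Each bucket is modeled as a circular list with a dummy head, so reaching the head again means the block is not cached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NHASH 64            /* hypothetical size, power of two */

struct sketch_vnode { int id; };   /* stand-in for struct vnode */

/* Only the fields the lookup needs; names follow the struct buf slide. */
struct sketch_buf {
    struct sketch_buf   *b_forw, *b_back;  /* hash chain links */
    struct sketch_vnode *b_vp;             /* owning vnode */
    long                 b_blkno;          /* disk block number */
};

/* Each bucket head is a dummy element of a circular list. */
static struct sketch_buf sketch_hash[SKETCH_NHASH];

void sketch_hash_init(void)
{
    for (int i = 0; i < SKETCH_NHASH; i++)
        sketch_hash[i].b_forw = sketch_hash[i].b_back = &sketch_hash[i];
}

/* Assumed hash function: mixes the vnode address and block number. */
unsigned sketch_bufhash(const struct sketch_vnode *vp, long blkno)
{
    return (unsigned)(((uintptr_t)vp >> 4) + (unsigned long)blkno)
           & (SKETCH_NHASH - 1);
}

/* Link bp at the front of its hash chain. */
void sketch_hash_insert(struct sketch_buf *bp)
{
    struct sketch_buf *head;

    assert(bp != NULL);
    head = &sketch_hash[sketch_bufhash(bp->b_vp, bp->b_blkno)];
    bp->b_forw = head->b_forw;
    bp->b_back = head;
    head->b_forw->b_back = bp;
    head->b_forw = bp;
}

/* Steps 1-6: hash, walk the chain via b_forw, stop when it wraps. */
struct sketch_buf *sketch_buf_lookup(struct sketch_vnode *vp, long blkno)
{
    struct sketch_buf *head = &sketch_hash[sketch_bufhash(vp, blkno)];

    for (struct sketch_buf *bp = head->b_forw; bp != head; bp = bp->b_forw)
        if (bp->b_vp == vp && bp->b_blkno == blkno)
            return bp;
    return NULL;   /* not cached; caller would take one from the free list */
}
```

A NULL result corresponds to step 6: the caller would then allocate a buffer from the free list and insert it into the chain.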
Getting a Buffer
[Figure: call graph involving bread(), VOP_STRATEGY(), getblk(), allocbuf() and getnewbuf().]
UBC - Unified Buffer Cache (1)
• Motivation
– File Systems and Virtual Memory (Process Management)
compete for physical memory.
– UBC unifies previously separate pools of physical memory.
– Available memory can be used by file systems (UBC) or
VM on a first-come, first-served basis.
– VM can memory map a file using same memory object as
UBC.
• Utilizes memory from the available pool
– vm_page_queue_free
– vm_page_array
Unified Buffer Cache (2)
• Uses memory objects of type OT_UBC
– includes a pointer to a vnode
– associates cached pages with a specific file
– accessed by
• a file system looking for cached data
• memory management on pagefault for an mmap’d file
• Utilizes lists:
– vm_page_buckets to find vm_pages belonging to an object
– ubc_lru to time order when pages were cached
UBC Memory Object (OT_UBC)
struct vm_ubc_object
• vu_object (vm_object fields)
– ob_memq: list of vm_page structures
– <lock>
– ob_ops = u_anon_oop: points to vm_object_ops
– ob_ref_count
– ob_res_count
– ob_size
– ob_resident_pages
– ob_flags
– ob_type
• vu_ops: points to vfs_ubcops
• vu_vfp
• vu_cleanpl
• vu_cleanwpl
• vu_dirtywpl
• vu_wirecnt
• vu_nsequential
• vu_loffset
• vu_stamp
• vu_seglock
• vu_seglist
• vu_pshared
• vu_freelists
UBC LRU Page Queue
• Least recently used list of UBC pages
– One per memory affinity domain
• vm_mads[N].md_ubc.ubc_lru
• Each is a struct vm_page
– vm_page -> vm_ubc_object -> vnode
• For each vnode's VM object,
– clean page list
– clean wired page list
– dirty page list
– dirty wired page list
UBC Routines (1)
Routine: Function
• ubc_object_allocate()
– Allocates a vm_ubc_object if the vnode is a regular type and one has not already been allocated.
• ubc_object_free()
– Frees the vm_ubc_object when the vnode is about to be reused.
• ubc_page_lookup()
– Looks up the page at the specified offset and specified vm_vp_object.
• ubc_incore()
– Looks for resident pages in the specified range.
• ubc_page_alloc()
– Allocates a page or returns a found page in the page hash list.
• ubc_page_release()
– Releases a page to the UBC LRU list or system memory if possible.
UBC Routines (2)
Routine: Function
• ubc_lookup()
– Performs a hash search lookup on the page at the specified offset. If found, removes the page from the ubc_lru list and holds it.
• ubc_page_dirty()
– Transitions a page from the vnode's clean page list to its dirty page list.
• ubc_msync()
– Calls for mmap to free all clean pages and writes all dirty pages.
• ubc_invalidate()
– Invalidates some (or all) resident pages for a vnode.
• ubc_flush_dirty()
– Starts I/O on all dirty pages for a vnode. Does not wait for I/O completion if flag B_ASYNC is used.
UBC Routines (3)
Routine: Function
• ubc_dirty_kluster()
– Creates a list of sorted pages for a vnode. Assumes pages are scheduled for writing.
• ubc_bufalloc()
– Allocates a buf structure.
• ubc_sync_iodone()
– Waits for synchronous I/O transfer to complete, then frees buf and pages.
• ubc_async_iodone_lwc()
– Called as LWC when asynchronous I/O transfer completes.
File System and VM Routines
[Figure: layers between a system call and a UBC page.]
• System call: read(), write()
• VFS: VOP_READ, VOP_WRITE
• File system: ufs_read(), ufs_write(), uiomove()
• UBC resident page management: ufs_getpage() returns a VM page; I/O
• mmap: page fault handler
Finding a UBC page from a file system
VOP_READ(vnode, ...)
ufs_read(vnode, ...)
ufs_getpage(vnode, ...)
ufs_getapage(vnode,...)
ubc_lookup(vnode, ...)
vm_page_lookup(mem_obj, ..)
Limiting UBC
• ubc_dirty_thread
– Calls ubc_memory_flushdirty
• Launders excessive dirty pages via calls to FSOP_PUTPAGE()
• vm_pageout thread (pageout daemon)
– Runs vm_pageout_loop()
– When the number of free pages is low and UBC has borrowed too many
pages,
• UBC pages are reclaimed off ubc_lru
• If no free pages, vm_page_alloc() may also come to
ubc_lru.
ubc_memory_purge() Flow
1. Get a page from ubc_lru.
2. Referenced bit on? Yes: turn it off, move the page to the tail of ubc_lru, and return to step 1. No: continue.
3. Dirty? No: free the page. Yes: move the page from the vm_vp_object dirty list to the clean list and write the page out (VOP_PUTPAGE()) asynchronously.
4. Freed enough? No: return to step 1. Yes: stop.
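The purge flow above amounts to a second-chance scan of the LRU queue. The sketch below models it in user-space C under several assumptions: the queue is a simple singly linked list scanned from its head, the `referenced` flag stands in for the hardware referenced bit, `write_started` stands in for an asynchronous VOP_PUTPAGE(), and a page handed to the writer simply leaves the queue (the slide does not show where it goes afterwards). All `sketch_` names and the `max_scan` bound are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_page {
    struct sketch_page *next;   /* toward the tail of the LRU */
    bool referenced;            /* stand-in for the referenced bit */
    bool dirty;
    bool freed;                 /* set when returned to the free pool */
    bool write_started;         /* set when an async write is queued */
};

struct sketch_lru {
    struct sketch_page *head, *tail;
};

/* Add a page at the tail (most recently used end). */
void sketch_lru_append(struct sketch_lru *q, struct sketch_page *pg)
{
    pg->next = NULL;
    if (q->tail)
        q->tail->next = pg;
    else
        q->head = pg;
    q->tail = pg;
}

/* Remove and return the page at the head, or NULL if empty. */
struct sketch_page *sketch_lru_pop(struct sketch_lru *q)
{
    struct sketch_page *pg = q->head;

    if (pg) {
        q->head = pg->next;
        if (!q->head)
            q->tail = NULL;
        pg->next = NULL;
    }
    return pg;
}

/* Scan until `want` pages are freed or max_scan pages were examined. */
int sketch_memory_purge(struct sketch_lru *q, int want, int max_scan)
{
    int freed = 0;

    assert(q != NULL);
    while (freed < want && max_scan-- > 0) {
        struct sketch_page *pg = sketch_lru_pop(q);

        if (pg == NULL)
            break;
        if (pg->referenced) {
            pg->referenced = false;      /* second chance */
            sketch_lru_append(q, pg);
        } else if (pg->dirty) {
            pg->dirty = false;           /* dirty list -> clean list */
            pg->write_started = true;    /* async write-out */
        } else {
            pg->freed = true;
            freed++;
        }
    }
    return freed;
}
```

The `max_scan` bound is there only so the sketch always terminates when every page keeps getting referenced; the slide does not say how the real routine bounds its scan.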
Limiting the Amount of Dirty Data in UBC
• UBC limits the percent of its cached data that is
modified
– improves performance by spreading out IO load
– minimizes loss of data if the system crashes
• Managed by separate kernel daemon thread
ubc_dirty_thread_loop() Flow
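1. Sleep on timer.
2. Too many dirty pages? No: return to step 1. Yes: continue.
3. Get a page from ubc_lru.
4. Dirty? No: return to step 3. Yes: remove the page from ubc_lru, move it from the vm_vp_object dirty list to the clean list, and write the page out (VOP_PUTPAGE()) asynchronously.
5. Too many dirty pages? Yes: return to step 3. No: return to step 1.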
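One wakeup of the dirty thread, as drawn in the flow above, might look like the following sketch. It is a simplified model, not the kernel code: the LRU is an array in LRU order, `write_started` stands in for the asynchronous VOP_PUTPAGE() call, and the `sketch_` names are hypothetical. The "too many dirty pages" test is taken here to mean that the dirty page count exceeds a dirty limit, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_upage {
    bool dirty;
    bool on_lru;
    bool write_started;   /* stand-in for an async VOP_PUTPAGE() */
};

/*
 * Launder dirty pages in LRU order until the dirty count drops to
 * the limit. Returns the number of writes started.
 */
int sketch_dirty_pass(struct sketch_upage *lru, size_t n,
                      long *dirty_pages, long dirty_limit)
{
    int started = 0;

    assert(lru != NULL && dirty_pages != NULL);
    for (size_t i = 0; i < n && *dirty_pages > dirty_limit; i++) {
        if (!lru[i].on_lru || !lru[i].dirty)
            continue;                 /* clean page: try the next one */
        lru[i].on_lru = false;        /* remove from the LRU */
        lru[i].dirty = false;         /* dirty list -> clean list */
        lru[i].write_started = true;  /* start the async write */
        (*dirty_pages)--;
        started++;
    }
    return started;
}
```

After the pass returns, the thread would go back to sleeping on its timer.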
UBC Parameters and Thresholds (1)
Field: Description
• ubc_pages
– Count of UBC pages.
• ubc_minpages
– Smallest number of pages UBC will shrink to. ubc_minpages = (vm_managed_pages * ubc_minpercent)/100 where ubc_minpercent is tunable (Default = 10).
• ubc_maxpages
– Upper limit of size of UBC. ubc_maxpages = (vm_managed_pages * ubc_maxpercent)/100 where ubc_maxpercent is tunable (Default = 100).
• ubc_lru_count
– Number of pages on the UBC LRU queue.
• ubc_dirty_limit
– Determines if UBC should flush and free dirty pages. ubc_dirty_limit = MAX(ubc_min_dirtypages, ((vm_tune_value(ubcdirtypercent) * ubc_pages)/100)) where ubcdirtypercent is tunable (Default = 10).
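The threshold formulas above are plain integer arithmetic, and the sketch below restates them as a C function to make the rounding explicit. The struct and function names are hypothetical, the tunables are passed as ordinary arguments instead of going through vm_tune_value(), and the value used for ubc_min_dirtypages in the usage note is an arbitrary example since the slide gives no default for it.

```c
#include <assert.h>

/* Derived limits, all in pages. Field names follow the slide. */
struct sketch_ubc_limits {
    long ubc_minpages;
    long ubc_maxpages;
    long ubc_dirty_limit;
};

static long sketch_max(long a, long b)
{
    return a > b ? a : b;
}

/* Apply the three formulas; integer division truncates toward zero. */
struct sketch_ubc_limits
sketch_compute_limits(long vm_managed_pages, long ubc_pages,
                      long ubc_minpercent, long ubc_maxpercent,
                      long ubcdirtypercent, long ubc_min_dirtypages)
{
    struct sketch_ubc_limits l;

    assert(vm_managed_pages >= 0 && ubc_pages >= 0);
    l.ubc_minpages = (vm_managed_pages * ubc_minpercent) / 100;
    l.ubc_maxpages = (vm_managed_pages * ubc_maxpercent) / 100;
    l.ubc_dirty_limit = sketch_max(ubc_min_dirtypages,
                                   (ubcdirtypercent * ubc_pages) / 100);
    return l;
}
```

With 100000 managed pages, 20000 UBC pages, the default percentages (10, 100, 10) and an example ubc_min_dirtypages of 500, this yields ubc_minpages = 10000, ubc_maxpages = 100000 and ubc_dirty_limit = 2000.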
UBC Parameters and Thresholds (2)
Field: Description
• ubc_dirty_pages
– Number of UBC pages currently dirty; tracked by system.
• ubc_borrowlimit
– Number of pages UBC can have. If ubc_pages > ubc_borrowlimit then UBC is asked to free pages. ubc_borrowlimit = (ubc_borrowpercent * vm_managed_pages)/100 where ubc_borrowpercent is 10 by default.
• vm_perf.vpf_ubchit
– Rate of UBC pages transitioning to the tail of the UBC LRU list because a pmap_is_referenced returned TRUE.
• vm_perf.vpf_ubcalloc
– Rate of UBC page allocation.
• vm_perf.vpf_ubcpagepushes
– Rate of pages being evicted from the UBC because of memory reclamation activity.
• vm_free_count
– Current count of free pages.
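The borrow limit can be sketched the same way. The first function restates the ubc_borrowlimit formula; the second combines it with the free page count in the manner the "Limiting UBC" slide describes (free pages low and UBC over its borrow limit). The function names and the `free_low_water` threshold are hypothetical; the slides do not say which free-page threshold the pageout daemon actually compares against.

```c
#include <assert.h>
#include <stdbool.h>

/* ubc_borrowlimit = (ubc_borrowpercent * vm_managed_pages) / 100 */
long sketch_ubc_borrowlimit(long vm_managed_pages, long ubc_borrowpercent)
{
    assert(ubc_borrowpercent >= 0 && ubc_borrowpercent <= 100);
    return (ubc_borrowpercent * vm_managed_pages) / 100;
}

/*
 * True when free memory is low and UBC holds more pages than its
 * borrow limit, i.e. when UBC pages would be reclaimed off ubc_lru.
 * free_low_water is an assumed threshold, not a kernel variable.
 */
bool sketch_ubc_should_reclaim(long vm_free_count, long free_low_water,
                               long ubc_pages, long ubc_borrowlimit)
{
    return vm_free_count < free_low_water && ubc_pages > ubc_borrowlimit;
}
```

With 100000 managed pages and the default ubc_borrowpercent of 10, the borrow limit is 10000 pages.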
Source Reference (1 of 4)
Buf Cache
• kernel/sys/buf.h
– definition of struct buf
• kernel/vfs/vfs_bio.c
– bfreelist[], bufhash and buf routines (bread() etc.)
Source Reference (2 of 4)
UBC
• kernel/vm/vm_page.h
– definitions of vm_page, vm_page_array
• kernel/vm/vm_resident.c
– definition of vm_page_bucket hashing array
• kernel/vfs/vfs_ubc.c
– definition of ubc lru list
• kernel/vm/vm_ubc.h
– definition of vm_ubc_object
• kernel/vfs/vfs_ubc.c
– implementation of the UBC interface routines
Source Reference (3 of 4)
Reading Data From a UBC Cached UFS File
• kernel/ufs/ufs_vnops.c
ufs_read()
ufs_getpage()
ufs_getapage()
• kernel/vfs/vfs_ubc.c
ubc_lookup()
• kernel/vm/vm_resident.c
vm_page_lookup()
Source Reference (4 of 4)
Pagefaulting on a UBC MMAPed Page
• kernel/arch/alpha/locore.s
XentMM
• kernel/arch/alpha/trap.c
trap()
• kernel/vm/vm_fault.c
vm_fault()
• kernel/vm/vm_umap.c
u_map_fault()
• kernel/vm/u_mape_vp.c
u_vp_fault()