buffer

Lecture 07: Loading File into Memory
DMA, buffer, replacement, LRU
• Buffer
  • each buffer -- holds one disk block (sector)
  • kernel has N buffers -- shared by all
  [Figure: CPU and memory; a buffer in memory is filled from a disk sector by DMA]
  – OS needs information about each buffer
    • user: Clinton, Bob, ... (who’s using this buffer now)
    • hw: device, sector number
    • state: free/used (empty/waiting/reading/full/locked/writing/dirty)
• “buffer header” (struct)
  • stores all information about each buffer
  • points to actual buffer
• buffer header has link fields (doubly linked)
  – device_list, free_buffer_list, I/O_wait_list
“Buffer Cache”
• Managed like CPU cache
– read ahead (reada)
– delayed write (dwrite)
• dwrite
  – just set the “dirty” bit in the buffer cache (on update)
  – write to disk later (when the buffer is being replaced)
• reada
  – prefetch if the offset moves sequentially
• dirty: the data came from disk, and the memory copy was modified later.
  Now the disk copy and the memory copy are different.
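The delayed-write idea above can be sketched in C as follows. This is a minimal simulation under stated assumptions, not kernel code: the "disk" is just an array, and `cbuf`, `cache_load`, `cache_write`, `cache_flush`, and `disk_writes` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16          /* tiny block size, for illustration only */
#define NBLOCKS    4

/* Hypothetical in-memory stand-in for the disk. */
char disk[NBLOCKS][BLOCK_SIZE];
int  disk_writes;              /* counts simulated disk writes */

struct cbuf {
    int  blkno;                /* which disk block this buffer holds */
    int  dirty;                /* 1 if the memory copy differs from disk */
    char data[BLOCK_SIZE];
};

/* Load a block from "disk" into the buffer; both copies now match. */
void cache_load(struct cbuf *b, int blkno)
{
    assert(blkno >= 0 && blkno < NBLOCKS);
    b->blkno = blkno;
    b->dirty = 0;
    memcpy(b->data, disk[blkno], BLOCK_SIZE);
}

/* Delayed write: update the memory copy only and set the dirty bit. */
void cache_write(struct cbuf *b, int off, char c)
{
    assert(off >= 0 && off < BLOCK_SIZE);
    b->data[off] = c;
    b->dirty = 1;
}

/* Called later (for example on replacement): write back only if dirty. */
void cache_flush(struct cbuf *b)
{
    if (!b->dirty)
        return;
    memcpy(disk[b->blkno], b->data, BLOCK_SIZE);
    disk_writes++;
    b->dirty = 0;
}
```

In this sketch, several `cache_write` calls to the same buffer cost a single disk write at `cache_flush` time, which is where the saved disk traffic comes from.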
Delayed Write ---- Pros & cons
• Good performance
  – much disk traffic can be saved
• Complex reliability
  – logically a single piece of information
  – physically many copies (disk, buffer) -- inconsistency
  – If system crashes ...
[Figure: timeline t, from (1) problem detected & interrupt to (2) computer full stop]
Emergency action during this period
How many disk blocks can you save during this interval?
Crash ...
• Only a few blocks can be saved
• What happens if they cannot be saved…?
  If lost, the following goes wrong:
  – superblock: which block is free/occupied?
  – inode: pointer to file data block
  – data block: if directories -- subtree structure; if regular files -- just the file content
• metadata are more important
  – superblock, directory, inode
[Figure: disk layout with superblock, root directory, inode, and data blocks (occupied and holes)]
Damage --- what if this block becomes a bad block?
Crash ...
• In program, sync(2) system call
  – sync(2) flushes (writes to disk) dirty buffers
    • it doesn’t finish the disk I/O on return (it just queues the writes)
    • So call sync(2) twice … the 2nd return guarantees the flush
• At keyboard
  – updated calls sync(2) every 30 seconds -- periodic
  – halt(8), shutdown(8) call sync(2) -- by super user (before logoff)
    try man 8 intro ….
• Caution
  – Do not power down without sync(2) or halt(8)
  – Otherwise it amounts to a system crash. What if it crashes?
fsck(8)
• file system check -- check & repair file system
  – performed at system bootup time
  – start from root inode -- mark all occupied blocks
  – start from superblock -- mark all free blocks
  – something is wrong if:
    • some block has no incoming arc (unreachable)
    • some block has many incoming arcs (reached many times)
    • lost+found
  – Very time-consuming
    10 ms × (1 GB / 1 KB) ≈ 10 ms × 10^6 blocks = 10^7 ms = 10,000 sec !!!
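The consistency check above can be sketched as a toy in C. This is only an illustration under simplifying assumptions: it walks a flat inode table rather than the directory tree from the root inode, and `toy_inode`, `fsck_result`, and `toy_fsck` are hypothetical names, not part of the real fsck(8).

```c
#include <assert.h>

#define NBLK   8   /* blocks in the toy file system */
#define NINODE 3   /* inodes in the toy file system */
#define NPTR   3   /* block pointers per inode */

struct toy_inode {
    int in_use;
    int ptr[NPTR];          /* block numbers, -1 = unused slot */
};

struct fsck_result {
    int unreachable;        /* neither referenced nor on the free list */
    int multiply_claimed;   /* claimed more than once (inode and/or free list) */
};

struct fsck_result toy_fsck(const struct toy_inode inodes[NINODE],
                            const int is_free[NBLK])
{
    int refs[NBLK] = {0};
    struct fsck_result r = {0, 0};

    /* Pass 1: count how many in-use inodes point at each block. */
    for (int i = 0; i < NINODE; i++) {
        if (!inodes[i].in_use)
            continue;
        for (int j = 0; j < NPTR; j++) {
            int b = inodes[i].ptr[j];
            if (b < 0)
                continue;
            assert(b < NBLK);
            refs[b]++;
        }
    }

    /* Pass 2: add the superblock's free-list claim and classify. */
    for (int b = 0; b < NBLK; b++) {
        int claims = refs[b] + (is_free[b] ? 1 : 0);
        if (claims == 0)
            r.unreachable++;        /* no incoming arc */
        else if (claims > 1)
            r.multiply_claimed++;   /* many incoming arcs */
    }
    return r;
}
```

Every block is visited, which is why the running time grows with the size of the disk.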
Design Goal
• Original UNIX file system design was
  – cheap, good performance
  – adequate reliability for schools and software houses
  – timesharing market (school, sw house)
• on power failure
  – at most 30 seconds’ worth of work is gone
  – most important metadata are saved
  [Figure: timeline with a flush every 30 sec; on power down, some contents written since the last flush are lost]
• UNIX for bank?
  – Need to solve these problems
Modern systems
• System V
  – To reduce boot time (minimize downtime)
    • On successful return from sync(2), make /fastboot file
    • If /fastboot exists, system was shut down cleanly (don’t fsck)
    • After successful boot, remove /fastboot file
    • If /fastboot doesn’t exist, do fsck (only for file systems in /etc/fstab)
• Log Structured File System
  – collect dirty nodes in one big segment (~track size)
  – periodically write this log to disk
    • fast -- no seek/rotational delay
  – recovery is fast & complete
Issues
• Transactional guarantee
  – Write all, or no write at all
  – “Account A → Account B (transfer $100)”
  – Atomic transaction
  – Write both or cancel both
• Ordering guarantee
  – “Delete file A”
    1. Modify parent directory’s data block (file name A)
    2. Release file A’s inode (address of data block sectors, …)
    3. Release file A’s data block
  – Suggested order: 1 → 2 → 3
  – Otherwise, A’s inode still exists and its pointers still exist, but they lead to wrong data …
  – Write the next block to disk only if the previous write is complete → synchronous write
[Figure: “remove b” -- directory (a: 7, b: 9, dev: 11, bin: 45) → inode of b, pointers[ ] → data blocks]
** Reference: Vahalia, 11.7.2
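The ordering issue can be explored with a small C sketch that "crashes" after each synchronous write and checks the on-disk state. It assumes one invariant, which is an assumption of this sketch rather than something stated on the slide: nothing that still exists may point at something already released. The names `toy_state`, `consistent`, `crash_is_safe`, and `order_is_safe` are hypothetical.

```c
#include <assert.h>

/* Step numbers follow the "Delete file A" list. */
enum step { DIRENT = 1, INODE = 2, DATA = 3 };

struct toy_state {
    int dirent;   /* 1 while the directory entry still names the inode */
    int inode;    /* 1 while the inode is allocated and points to the data */
    int data;     /* 1 while the data blocks are allocated */
};

/* Invariant: no live structure points at a released one. */
int consistent(const struct toy_state *s)
{
    if (s->dirent && !s->inode) return 0;  /* name points to a freed inode */
    if (s->inode && !s->data)   return 0;  /* inode points to freed blocks */
    return 1;
}

/* Apply the first `done` synchronous writes of `order`, then "crash".
 * Returns 1 if the state left on disk is still consistent. */
int crash_is_safe(const enum step order[3], int done)
{
    struct toy_state s = {1, 1, 1};
    assert(done >= 0 && done <= 3);
    for (int i = 0; i < done; i++) {
        switch (order[i]) {
        case DIRENT: s.dirent = 0; break;
        case INODE:  s.inode  = 0; break;
        case DATA:   s.data   = 0; break;
        }
    }
    return consistent(&s);
}

/* An order is safe if every possible crash point is consistent. */
int order_is_safe(const enum step order[3])
{
    for (int done = 0; done <= 3; done++)
        if (!crash_is_safe(order, done))
            return 0;
    return 1;
}
```

Under this invariant, the order directory entry, inode, data blocks survives a crash at any point (at worst an orphaned inode or leaked blocks remain for fsck), while releasing the data blocks first leaves an inode whose pointers lead to wrong data.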
Back to buffer cache
Some buffers are linked to free buffer pool
[Figure: free buffers 22, 14, 74, 23, 25, 37, 88, 45, 11, 83, 32, 19 linked on the free list]
Some buffers are allocated to a device
[Figure: buffers 11, 18, 64, 43, 15, 97, 23, 44, 10, 33, 54, 99 linked on the device list of Disk 3]
Allocate buffers to whom?
[Figure labels: Process 1, user, offset, inode, dev, CPU, Linux, buffer cache]
[Figure: buffers 11, 18, 64, 43, 15, 97, 23, 44, 10, 33, 54, 99 on the device list of Disk 3]
Buffer header has a flag
Among the buffers allocated to a device ...
– some are waiting to do DMA
– some are currently doing DMA
– others have done DMA
(the I/O wait queue lies within the device list)
Some buffers are waiting for disk I/O
[Figure: Disk 3 with buffers 11, 18, 43, 15, 23, 44, 33, 54; some are on the I/O wait queue (waiting to do DMA), others have done DMA]
struct buf
{
    int   b_flags;          /* see defines below */
    struct buf *b_forw;     /* headed by devtab of b_dev */
    struct buf *b_back;     /*  "  */
    struct buf *av_forw;    /* position on free list, */
    struct buf *av_back;    /*     if not BUSY */
    int   b_dev;            /* major+minor device name */
    char *b_blkno;          /* block # on device */
    int   b_wcount;         /* transfer count (usu. words) */
    char  b_error;          /* returned after I/O */
    char *b_addr;           /* low order core address */
    char *b_xmem;           /* high order core address */
} buf[NBUF];
struct buf bfreelist;
struct devtab
{
    char  d_active;         /* busy flag */
    char  d_errcnt;         /* error count (for recovery) */
    struct buf *b_forw;     /* first buffer for this dev */
    struct buf *b_back;     /* last buffer for this dev */
    struct buf *d_actf;     /* head of I/O queue */
    struct buf *d_actl;     /* tail of I/O queue */
};
[Figure: a struct devtab (d_active, b_forw, b_back, d_actf, d_actl) heading the device list of buffers 11, 18, 64, 43, 15, 97, 23, 44, 10, 33, 54, 99; d_actf and d_actl delimit the I/O waiting buffers]
Remember: the OS kernel
(a plain C program with variables and functions)
[Figure: the kernel keeps a table (data structure) for each object (hardware or software): a PCB for each of Process 1, 2, 3, and tables for CPU, mem, disk, tty]
Kernel Data Structure
[Figure labels: Process 1 (user, inode, offset), devswtab, disk_read( ), devtab, buffer cache, CPU; on disk: superblock, inode, data; file tree: /, bin, cc, date, etc, sh, getty, passwd]
– Each buffer header has 4 link fields
– buf can belong to two doubly linked lists at a time
– read(fd) system call
  • get offset
  • get inode
    [Figure: user → file → inode → dev; fd, offset]
    – checks access permission (rwx rwx rwx)
    – mapping: offset → sector address
    – get major/minor device number
  • search buffer cache (buffer header has disk & sector #)
    – start from device table, traverse the links
    – compare each buffer with sector address
  • if already in buffer cache, done
  • if miss, then arrange to read from disk
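A minimal sketch of how one buffer header can sit on two doubly linked lists at once, and how a lookup walks the device list. The field names follow the struct buf listing; `xbuf`, `list_init`, `dev_insert`, `free_append`, and `cache_lookup` are hypothetical names for this sketch, and a plain header is used as the list head instead of a devtab.

```c
#include <assert.h>
#include <stddef.h>

struct xbuf {
    int dev, blkno;                  /* which disk and which sector */
    struct xbuf *b_forw, *b_back;    /* device list links */
    struct xbuf *av_forw, *av_back;  /* free list links */
};

/* Make an empty circular list head for both kinds of links. */
void list_init(struct xbuf *head)
{
    head->b_forw = head->b_back = head;
    head->av_forw = head->av_back = head;
}

/* Put a buffer at the front of a device list. */
void dev_insert(struct xbuf *devhead, struct xbuf *bp)
{
    assert(devhead != NULL && bp != NULL);
    bp->b_forw = devhead->b_forw;
    bp->b_back = devhead;
    devhead->b_forw->b_back = bp;
    devhead->b_forw = bp;
}

/* Put a buffer at the tail of the free list (it stays on its device list). */
void free_append(struct xbuf *freehead, struct xbuf *bp)
{
    assert(freehead != NULL && bp != NULL);
    bp->av_back = freehead->av_back;
    bp->av_forw = freehead;
    freehead->av_back->av_forw = bp;
    freehead->av_back = bp;
}

/* Start from the device list head, traverse the links, compare (dev, blkno). */
struct xbuf *cache_lookup(struct xbuf *devhead, int dev, int blkno)
{
    for (struct xbuf *bp = devhead->b_forw; bp != devhead; bp = bp->b_forw)
        if (bp->dev == dev && bp->blkno == blkno)
            return bp;               /* hit: already in the buffer cache */
    return NULL;                     /* miss: must arrange a disk read */
}
```

Because the two pairs of link fields are independent, a buffer on the free list can still be found through its device list, so its cached contents remain usable until the buffer is actually reassigned.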
24
– read() system call
{fd → offset → inode → device → search buffer list}
if (hit)
    done                /* return data from buffer cache */
else                    /* buffer cache miss – must read disk */
    if (free buf available?)
        /* using this free buffer, read disk */
        get buf → read disk → fill buf → done
    else                /* need replacement first */
        {get the least recently used (LRU) buffer
         if (dirty?) {write old content first -- the delayed write}
         read disk → fill buf → done}
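The read path above can be simulated with a small array-based cache in C. This is a sketch under stated assumptions, not the kernel's implementation: the disk is an in-memory array, LRU is tracked with a counter instead of the free list, and `lbuf`, `toy_bread`, `lru_disk`, `n_reads`, and `n_writes` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define NBUF  2                 /* deliberately tiny cache */
#define NBLK  8
#define BSIZE 8

char lru_disk[NBLK][BSIZE];     /* hypothetical disk */
int  n_reads, n_writes;         /* simulated disk I/O counters */

struct lbuf {
    int  valid;                 /* 0 = free buffer */
    int  blkno;
    int  dirty;
    unsigned long last_used;    /* larger = used more recently */
    char data[BSIZE];
};

struct lbuf cache[NBUF];
unsigned long clock_tick;

struct lbuf *toy_bread(int blkno)
{
    struct lbuf *victim = NULL;
    assert(blkno >= 0 && blkno < NBLK);

    /* 1. Search the cache: on a hit, no disk I/O is needed. */
    for (int i = 0; i < NBUF; i++)
        if (cache[i].valid && cache[i].blkno == blkno) {
            cache[i].last_used = ++clock_tick;
            return &cache[i];
        }

    /* 2. Miss: prefer a free buffer, else the least recently used one. */
    for (int i = 0; i < NBUF; i++) {
        if (!cache[i].valid) { victim = &cache[i]; break; }
        if (victim == NULL || cache[i].last_used < victim->last_used)
            victim = &cache[i];
    }

    /* 3. A dirty victim is written back first (the delayed write). */
    if (victim->valid && victim->dirty) {
        memcpy(lru_disk[victim->blkno], victim->data, BSIZE);
        n_writes++;
    }

    /* 4. Read the requested block and fill the buffer. */
    memcpy(victim->data, lru_disk[blkno], BSIZE);
    n_reads++;
    victim->valid = 1;
    victim->blkno = blkno;
    victim->dirty = 0;
    victim->last_used = ++clock_tick;
    return victim;
}
```

A caller that modifies `data` is expected to set `dirty` itself; the write to `lru_disk` then happens only when that buffer is chosen for replacement.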
mounting
System can have many file systems
Compare with Windows {C: D: E: ...}
<Logically>
At bootup time, specify which F.S. to boot as the “root file system”
[Figure: three file systems FS 1, FS 2, FS 3, each laid out as Bootblock, Superblock, Inode list, Data block]
<Logically>
[Figure: FS 1 on dsk1 is the “root file system” (Bootblock, Superblock, Inode list, Data block) with the tree /, bin, date, etc, sh, getty, usr, passwd; FS 2 on dsk2 and FS 3 on dsk3 have the same layout]
Now all files under the root file system can be accessed.
But how do we access files in other file systems?
Windows → C: D: E:
<Logically>
[Figure: FS 1 on dsk1 with the tree /, bin, date, etc, sh, usr, getty, passwd; FS 2 on dsk2; FS 3 on /dev/dsk3 with its own tree /, bin, include, src, banner, yacc, stdio.h, uts. Each file system has Bootblock, Superblock, Inode list, Data block]
Mount it!
System call
mount (path1, path2, option)
– path1 (which): dev special file, example: /dev/dsk3
– path2 (where): mount point, example: /usr
– option (how): example: read-only
After mounting, /dev/dsk3 is accessed as /usr
[Figure: i-numbers in disk-1: superblock and root / with date, etc, bin, sh, getty, usr, passwd; i-numbers in disk-2: superblock and root with bin, banner, include, yacc, stdio.h, src, uts]
Mount Table Entry
Purpose:
- resolve pathname
- locate superblock
[Figure: a mount table entry holds the inode of the mount point (/usr), the root inode of the mounted file system, its superblock, and its device number; the two trees are the same as on the previous slide]
Relationship between Tables
[Figure: Inode table, Mount table, and Buffer Cache. A mount table entry points to the Superblock (held in a buf), the mounted-on inode (inode of /usr), and the root inode (inode of dsk 3 root)]
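How a mount table entry is used during pathname resolution can be sketched in C as below. This is a simplified model under stated assumptions: inodes are identified by a (device, i-number) pair, the superblock pointer is left out, and `inode_ref`, `mount_entry`, `toy_mount`, and `cross_mount` are hypothetical names, not the kernel's own.

```c
#include <assert.h>

struct inode_ref { int dev; int ino; };   /* (device, i-number) pair */

struct mount_entry {
    int in_use;
    int dev;                        /* device number of the mounted FS */
    struct inode_ref mounted_on;    /* inode of the mount point, e.g. /usr */
    struct inode_ref root;          /* root inode of the mounted FS */
};

#define NMOUNT 4
struct mount_entry mount_table[NMOUNT];

/* Record that the file system on `dev` is mounted on inode `point`. */
int toy_mount(int dev, struct inode_ref point, int root_ino)
{
    assert(root_ino > 0);
    for (int i = 0; i < NMOUNT; i++) {
        if (!mount_table[i].in_use) {
            mount_table[i].in_use = 1;
            mount_table[i].dev = dev;
            mount_table[i].mounted_on = point;
            mount_table[i].root.dev = dev;
            mount_table[i].root.ino = root_ino;
            return 0;
        }
    }
    return -1;                      /* mount table full */
}

/* During lookup: if `ip` is a mount point, continue from the root inode
 * of the file system mounted there; otherwise keep `ip` unchanged. */
struct inode_ref cross_mount(struct inode_ref ip)
{
    for (int i = 0; i < NMOUNT; i++)
        if (mount_table[i].in_use &&
            mount_table[i].mounted_on.dev == ip.dev &&
            mount_table[i].mounted_on.ino == ip.ino)
            return mount_table[i].root;
    return ip;
}
```

For example, after mounting device 3 on the inode of /usr, a lookup that reaches /usr continues in device 3's root directory, so /dev/dsk3 is accessed as /usr. The i-numbers used in such a call are arbitrary illustration values.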
Disk File System
• Boot block
• Superblock → pointers to free space in disk
• inode list → pointers to data block
• data block
• mounting file system