CPS 210 Unix and Beyond Jeff Chase Duke University

Duke Systems
http://www.cs.duke.edu/~chase/cps210
“Just make it”
• To get started on the heap manager, download the files and type “make”.
– The files include a script to build the heap manager test programs on Linux or MacOS.
• This lab is just a taste of systems programming in C.
• The classic text is CS:APP (http://csapp.cs.cmu.edu).
• Also see the PDF “What every computer systems student should know about computers” on the course website.
• You may think of it as notes from CS:APP. It covers background from Computer Architecture and also some material for this class.
64 bytes: 3 ways
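[Figure: one 64-byte block starting at address p (p + 0x0), viewed three ways: as an array of ints (int p[] or int* p), as an array of chars (char p[] or char *p), and as an array of pointers (char* p[] or char** p).]
Pointers (addresses) are 8 bytes on a 64-bit machine.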
Alignment
[Figure: the same three views of the block, with X marking values that are not properly aligned.]
The machine requires that an n-byte value be aligned on an n-byte boundary, where n = 2^i.
Heap allocation
• A contiguous chunk of memory is obtained from the OS kernel, e.g., with the Unix sbrk() system call.
• A runtime library obtains the block and manages it as a “heap” for use by the programming language environment, to store dynamic objects, e.g., with the Unix malloc and free library calls.
• The heap is carved into allocated heap blocks for structs or objects. Align!
Variable Partitioning
Variable partitioning is the strategy of parking differently sized cars
along a street with no marked parking space dividers.
[Figure: cars 1, 2, and 3 parked along the street, with gaps between them.]
Wasted space: external fragmentation.
Alternative: block maps
The storage in a heap block is contiguous in the VAS. C and other PL environments require this. That complicates the heap manager because the heap blocks may be different sizes.
Idea: use a level of indirection through a map to assemble a storage object from “scraps” of storage in different locations. The “scraps” can be fixed-size slots: that makes allocation easy because they are interchangeable.
Example: page tables that implement a VAS.
[Figure: a map (indirection) from an object's logical blocks to fixed-size slots scattered through storage.]
Fixed Partitioning
Wasted space: internal fragmentation.
Post-note
• We spent much of the class discussing some general issues for naming, illustrated in Unix.
• Block maps and other indexed maps are a common structure to implement “machine” name spaces:
– sequences of logical blocks, e.g., virtual address spaces, files
– process IDs, etc.
– For sparse block spaces we may use a tree hierarchy of block maps (e.g., inode maps or 2-level page tables, later).
– Storage system software is full of these maps.
• Symbolic name spaces use different kinds of maps.
– They are sparse and require matching, so lookups are more expensive.
– Trees of maps create nested namespaces, e.g., the file tree.
Files: hierarchical name space
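[Figure: the file tree, with the root directory at the top, subtrees for applications etc. and user home directories, and a mount point where a volume on external media or network storage is grafted into the tree.]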
File I/O
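Pathnames are translated through the directory tree, starting at the root directory or the current directory. Every system call should check for errors and handle them appropriately.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSIZE 4096   /* buffer size (any positive value works) */

int main(void) {
    char buf[BUFSIZE];
    int fd;
    ssize_t n;

    /* Parenthesize the assignment: == binds tighter than =. */
    if ((fd = open("../zot", O_TRUNC | O_RDWR)) == -1) {
        perror("open failed");
        exit(1);
    }
    /* read returns 0 at end of file and -1 on error; write exactly the n bytes read. */
    while ((n = read(0, buf, BUFSIZE)) > 0) {
        if (write(fd, buf, n) != n) {
            perror("write failed");
            exit(1);
        }
    }
    if (n == -1) {
        perror("read failed");
        exit(1);
    }
    return 0;
}

The file grows as the process writes to it, so the system must allocate space dynamically.
The system finds the physical disk locations of the file’s logical blocks by indexing a block map (the file’s index node or “inode”).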
A filesystem on disk
[Figure: a toy filesystem on disk. Inode 0 (the allocation bitmap file) and inode 1 (the root directory) sit at fixed locations on disk. Other blocks hold the allocation bitmap (rows of bits such as 11100010 00101101 10111101), directory entries mapping names to inode numbers (wind: 18, snow: 62, rain: 32, hail: 48, with 0 marking a free slot), and the data of a regular file (“once upon a time / in a land far far away, lived th…”) reached through its inode.]
This is a toy example (Nachos).
Names and layers
• User view: notes in notebook file
• Application: notefile fd, byte range*
• File System: (fd, bytes) → block#; passes (device, block #) down
• Disk Subsystem: surface, cylinder, sector
Add more layers as needed.
Directories
A creat operation must scan the directory to ensure that creates are exclusive. There can be no duplicate names: the name mapping is a function.
[Figure: a directory (with its own inode) whose entries are wind: 18, snow: 62, rain: 32, and hail: 48, with 0 marking free slots.]
Note: implementations vary. Large directories are problematic.
Entries or free slots are typically found by a linear scan.
Operations on Directories (UNIX)
• Link - make entry pointing to file
• Unlink - remove entry pointing to file
• Rename
• Mkdir - create a directory
• Rmdir - remove a directory
Links
[Figure: directories /usr/Lynn and /usr/Marty under /usr. Operations shown: creat foo, unlink foo, creat bar, ln /usr/Lynn/foo bar, unlink bar, and ln -s /usr/Marty/bar bar.]
Unix File Naming (Hard Links)
A Unix file may have multiple names. Each directory entry naming the file is called a hard link. Each inode contains a reference count showing how many hard links name it.
[Figure: directories A and B hold entries rain: 32, wind: 18, hail: 48, and sleet: 48 (0 marks a free slot). Both hail and sleet name inode 48, whose link count = 2.]
link system call
link(existing name, new name)
  create a new name for an existing file
  increment inode link count
unlink system call (“remove”)
unlink(name)
  destroy directory entry
  decrement inode link count
  if count == 0 and file is not in active use:
    free blocks (recursively) and on-disk inode
Illustrates: garbage collection by reference counting.
Unix Symbolic (Soft) Links
A soft link is a file containing a pathname of some other file.
symlink system call
symlink(existing name, new name)
  allocate a new file (inode) with type symlink
  initialize file contents with existing name
  create directory entry for new file with new name
[Figure: directories A and B with entries rain: 32, wind: 18, hail: 48, and sleet: 67. Entry sleet names inode 67, a symlink with link count = 1 whose contents are the pathname ../A/hail; hail names inode 48.]
The target of the link may be removed at any time, leaving a dangling reference.
How should the kernel
handle recursive soft links?
Concepts
• Reference counting and reclamation
• Redirection/indirection
• Dangling reference
• Binding time (create time vs. resolve time)
• Referential integrity
Processes and the kernel
• Programs run as independent processes.
• Each process has a private virtual address space and one thread.
• Threads enter the kernel for OS services through protected system calls, and the kernel calls back into processes through upcalls (e.g., signals).
• The protected OS kernel mediates access to shared resources.
The kernel is a separate component/context with enforced modularity.
The kernel syscall interface supports processes, files, pipes, and signals.
GS4. Layered systems
Garlan and Shaw, An Introduction to Software Architecture, 1994.
Processes: A Closer Look
virtual address space + thread (with its stack) + process descriptor (PCB)
• The address space is a private name space for a set of memory segments used by the process. The kernel must initialize the process memory for the program to run.
• Each process has a thread bound to the VAS. The thread has a stack addressable through the VAS. The kernel can suspend/restart the thread wherever and whenever it wants.
• The OS maintains some state for each process in the kernel’s internal data structures: a file descriptor table, links to maintain the process tree, and a place to store the exit status.
[Figure: the process descriptor (PCB) holds the user ID, process ID, parent PID, sibling links, children, and resources.]
VAS example (32-bit)
• An addressable array of bytes…
• Containing every instruction the process thread can execute…
• And every piece of data those instructions can read/write…
– i.e., read/write == load/store
• Partitioned into logical segments with distinct purpose and use.
• Every memory reference by a thread is interpreted in its VAS context.
– It resolves to a location in machine memory.
• A given address in different VASes may resolve to different locations.
[Figure: layout from 0x7fffffff down to 0x0: Reserved, Stack, Dynamic data (heap/BSS), Static data, Text (code).]
A Peek Inside a Running Program
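[Figure: a CPU with registers R0…Rn, PC, and SP, beside an address space (virtual or physical) running from address 0 to high addresses that holds your program, a common runtime code library, your data, the heap, and the stack; values x and y appear both in memory and in registers.]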
Unix File Descriptors Illustrated
[Figure: below the user space/kernel boundary, each process's file descriptor table maps descriptors to entries in a system-wide open file table, which refer to objects such as files, pipes, sockets, and ttys.]
Processes may share open files (“objects”), but the binding of file descriptors to objects is specific to each process, e.g., see the dup system call.
Disclaimer: this drawing is oversimplified.
Networking
[Figure: node A and node B, each with an endpoint bound to a port, joined by a channel (connection). Endpoint operations: advertise (bind), listen, connect (bind), close, write/send, read/receive.]
Some IPC mechanisms allow communication across a network, e.g., sockets using Internet communication protocols (TCP/IP).
Each endpoint on a node (host) has a port number.
Each node has one or more interfaces, each on at most one network.
Each interface may be reachable on its network by one or more names, e.g., an IP address and an (optional) DNS name.
Networking stack
What is a distributed system?
"A distributed system is one in which the
failure of a computer you didn't even know
existed can render your own computer
unusable." -- Leslie Lamport
Example: browser
GS6. Interpreter
Garlan and Shaw, An Introduction to Software Architecture, 1994.
Interpreter: example
An interpreter controls how a program executes and what it sees.
An interpreter can “sandbox” a program for isolation.
Processes in the browser
Threads: a familiar metaphor
1. Page links and the back button navigate a “stack” of pages in each tab.
2. Each tab has its own stack. One tab is active at any given time. You create/destroy tabs as needed. You switch between tabs at your whim.
3. Similarly, each thread has a separate stack. The OS switches between threads at its whim. One thread is active per CPU core at any given time.
Fork
• The child can’t be an exact copy.
• It is distinguished by one variable (the return value of fork).
pid_t pid = fork();
if (pid == 0) {
    /* child: execute new program */
} else if (pid > 0) {
    /* parent: carry on */
} else {
    /* fork failed: handle the error */
}
Memory and fragmentation
An advantage of address spaces
Enforced modularity
Concept: garbage collection
Managing the pointers
Post-note: understand garbage collection
• Garbage collection: the language runtime system calls the
underlying heap manager to free unused heap blocks
automatically; the program itself does not have to do it.
– Java does it for you, but C does not.
• A heap block is “garbage” only when there are no references to
the block, e.g., no pointers to the object that lives in that block.
– A reference is a stored name. The garbage collector counts these
references, and marks a block as garbage when all references to it
are gone. To do that it must find/identify all stored references.
• Java knows the types of all of a program’s data objects, so it
can find stored references and identify their targets.
• A language that supports garbage collection may also move
objects around to compact the heap to reduce fragmentation.
• Weakly typed languages like C cannot do this for you. Q: can a
file system garbage collect or compact stored data on disk?
Post-note
• Next slide gives more detail on fork/exit.
• We will discuss kernel protection and kernel
entry and exit more later.
Mode Changes for Fork/Exit
• Syscall traps and “returns” are not always paired.
• Fork “returns” (to the child) from a trap that “never happened”.
• The exit system call trap never returns.
• The system may switch processes between trap and return.
[Figure: timeline. Parent: Fork call, Fork return, Wait call, Wait return. Child: Fork entry to user space, then Exit call. Legend: transition from user to kernel mode (callsys); transition from kernel to user mode (retsys).]
Exec enters the child by doctoring up a saved user context to “return” through.
Example: System Call Traps
• Programs in C, C++, etc. invoke system calls by linking to a standard library of procedures written in assembly language.
– the library defines a stub or wrapper routine for each syscall
– the stub executes a special trap instruction (e.g., chmk or callsys or int)
– syscall arguments/results are passed in registers or on the user stack
read() in the Unix libc.a library for the Alpha CPU architecture (executes in user mode):
#define SYSCALL_READ 27       # op ID for a read system call
move arg0…argn, a0…an      # syscall args in registers A0..AN
move SYSCALL_READ, v0       # syscall dispatch index in V0
callsys                     # kernel trap
move r1, _errno             # errno = return status
return
Representing a File On Disk
• File attributes may include the owner, access control list, time of create/modify/access, etc.
• The block map is indexed by logical block number.
• Physical block pointers in the block map are sector IDs or physical block numbers.
[Figure: an inode holding the file attributes and a block map whose entries point to the file's data blocks: logical block 0 (“once upon a time\nin a l”), logical block 1 (“and far far away,\nlived t”), and logical block 2 (“he wise and sage wizard.”).]
Post-note
• The following slides were presented in the next class (on
Android) as intro to motivate Android.
• Android keeps the Unix (Linux) kernel, but replaces the entire
application framework.
– Shell is gone. App execution is controlled by a trusted system-wide server process, which is part of the system TCB.
– Pipes are gone. Apps interact through system events (intents) and
service bindings (binder RPC).
– There is only one user, but each app has its own userID.
– Each app has at most one instance, with its private files.
– Terminals are gone: user opens screens (activities) to interact with
apps. The system keeps an activity stack with a “back” button.
• foreground and background activities?
– System launches app components and reclaims them at suitable
times. They don’t “exit”.
Unix, looking backward: UI+IPC
• Conceived around keystrokes and byte streams
– User-visible environment is centered on a text-based
command shell.
• Limited view of how programs interact
– files: byte streams in a shared name space
– pipes: byte streams between pairs of sibling processes
Unix, looking backward: upcalls
• Limited view of how programs interact with the OS.
– The kernel directs control flow into user process at a fixed entry
point: e.g., entry for exec() is _crt0 or “main”.
– A process may also register signal handlers for events relating to the process, generally signaled by the kernel.
– A process lives until it exits voluntarily or fails:
• “receives an unhandled signal that is fatal by default”.
X Windows (1985)
Big change: GUI.
1. Windows
2. Window server
3. App events
4. Widget toolkit
Unix, looking backward: security
• Presumes multiple users sharing a machine.
• Each user has a userID.
– UserID owns all files created by all programs user runs.
– Any program can access any file owned by userID.
• Each user trusts all programs it chooses to run.
– We “deputize” every program.
– Some deputies get confused.
– Result: decades of confused deputy security problems.
• Contrary view: give programs the privileges they
need, and nothing more.
– Principle of Least Privilege