Duke Systems
CPS 210
Unix and All That
Jeff Chase
Duke University
http://www.cs.duke.edu/~chase/cps210
Unix: A lasting achievement?
“Perhaps the most important achievement of Unix is to
demonstrate that a powerful operating system for
interactive use need not be expensive…it can run on
hardware costing as little as $40,000.”
DEC PDP-11/24
The UNIX Time-Sharing System*
D. M. Ritchie and K. Thompson
1974
http://histoire.info.online.fr/pdp11.html
Let’s pause a moment to reflect...
[Figure: Performance (vs. VAX-11/780) by year, 1978 to 2006, on a log scale from 1 to 10000. Core Rate (SPECint) grows at 25%/year, then 52%/year, then ??%/year. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006.]
Today Unix runs embedded in
devices costing < $100.
Small is beautiful?
[RT74]: historical hardware details
• [Ritchie/Thompson74] is the classic reference on Unix.
• In 1974, the advances we take for granted were in the future.
• They had to prove it on the hardware they had at the time.
• Many specific implementation choices have changed.
– 14-character file names
– assembly language → C
– 7 protection bits on files
– i-numbers and i-list
– 512-byte blocks
– ppt is “paper tape”???
– vowel embargo
Some lessons of history
• At the time it was created, Unix was the “simplest
multi-user OS people could imagine.”
– It’s in the name: Unix vs. Multics
• Simple abstractions can deliver a lot of power.
– Many people have been inspired by the power of Unix.
• The community spent four decades making Unix
complex again....but the essence is unchanged.
• Unix is a simple context to study core issues for
classical OS design. “It’s in there.”
• Unix variants continue to be in wide use.
• They serve as a foundation for advances.
Abstraction
Innovation
Simple?
• users
• files
These persist across reboots. They have symbolic names (you choose them) and internal IDs (the system chooses).
• processes
• pipes
– which “look like” files
These exist within a running system, and they are transient: they disappear on a crash or reboot. They have internal IDs.
Unix supports dynamic create/destroy of these objects.
It manages the various name spaces.
It has system calls to access these objects.
It checks permissions.
Unix: some key concepts
• Names and namespaces
– directories and pathnames
– name tree and subtree grafting (mount)
– root directory and current directory
– path prefix list
– resolution
– links (aliases) and reference counting
• Access control by tags and labels
– inheritance of tags and labels
• Context manipulation
– fork vs. exec
Files: hierarchical name space
root directory
applications etc.
mount point
user home
directory
external media
volume or
network
storage
“Everything is a file”
[Figure: the universal set of “Files” contains regular files, special files, and directories.]
File I/O
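Open files are named within the process by an integer file descriptor.
Pathnames may be relative to the process's current directory.
The process does not specify the current file offset: the system remembers it.
The process passes status back to its parent on exit, to report success/failure.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSIZE 4096   /* illustrative buffer size */

int main(void) {
    char buf[BUFSIZE];
    int fd;
    ssize_t n;

    if ((fd = open("../zot", O_TRUNC | O_RDWR)) == -1) {
        perror("open failed");
        exit(1);
    }
    /* Copy stdin (fd 0) to the file, writing only the bytes actually read. */
    while ((n = read(0, buf, BUFSIZE)) > 0) {
        if (write(fd, buf, n) != n) {
            perror("write failed");
            exit(1);
        }
    }
    return 0;
}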
Standard descriptors (0, 1, 2) for input, output,
error messages (stdin, stdout, stderr).
“Components in context”
execute
Program
Context
(Domain)
Thread
A context defines an isolated
sandbox for a running program, so
that it can use only the data and
resources that the OS grants it.
For our purposes, an operating system is a platform that supports
protection and isolation: every component runs within a context.
Program, context and thread are OS abstractions.
Running a program
code
constants
initialized data
imports/exports
symbols
types/interfaces
data
Program
“Unix Classic” simplifications
Context == process == (1 VAS + 1 thread + ...)
Each process runs exactly one program/component instance (at a time).
IPC channels are pipes.
All I/O is based on a simple common abstraction: file / stream.
The theater analogy
script
context
(stage)
Threads
Program
Address space
Running a program is like performing a play.
[lpcox]
Processes and the kernel
Programs run as independent processes.
Each process has a private virtual address space and one thread.
Threads enter the kernel for OS services: protected system calls ...and upcalls (e.g., signals).
Protected OS kernel mediates access to shared resources.
The kernel is a separate component/context with enforced modularity.
The kernel syscall interface supports processes, files, pipes, and signals.
Enforced modularity
pipe
(or other
channel)
An important theme from Monday’s class
By putting each component instance in a separate context, we
can enforce modularity boundaries among components. Each
component runs in a sandbox: they can interact only through
pipes. Neither can access the internals of the other.
Unix defines uniform, modular ways to combine
programs to build up more complex functionality.
[Figure: Hardware, Kernel, and the application programs around them: sh, nroff, who, cpp, a.out, date, comp, cc, wc, as, ld, grep, vi, ed, and other application programs.]
A key idea: Unix pipes
[http://www.bell-labs.com/history/unix/philosophy.html]
Unix programming environment
Standard unix programs read a byte stream from standard input (fd==0).
They write their output to standard output (fd==1).
Stdin or stdout might be bound to a file, pipe, device, or network socket.
If the parent sets it up, the program doesn’t even have to know.
That style makes it easy to combine simple programs using pipes or files.
Unix fork/exec/exit/wait syscalls
int pid = fork();
Create a new process that is a clone of its parent.
exec*("program" [, argvp, envp]);
Overlay the calling process with a new program, and transfer control to it.
exit(status);
Exit with status, destroying the process. Note: this is not the only way for a process to exit!
int pid = wait*(&status);
Wait for exit (or other status change) of a child, and “reap” its exit status. Note: child may have exited before parent calls wait!
[Figure: the parent forks; the fork child initializes its context, calls exec, and later exits; the parent calls wait.]
Unix: users and their namespaces
• A unix system has a set of user accounts.
– identities, principals
– often correspond to real users, but not always
• Each account has a username.
– a human-readable character string: “chase”
– also called a symbolic name
• Each account has a userID
– a number for internal use
• These namespaces are flat.
• The system keeps a bidirectional map:
– f(username) = userID or ∅
Protection Systems 101
Reference monitor
Example: Unix kernel
Isolation boundary
Principles of Computer System Design  Saltzer & Kaashoek 2009
Labels and access control
Every file and every process is labeled/tagged with a user ID.
A privileged process may set its user ID.
A process inherits its userID from its parent process.
A file inherits its owner userID from its creating process.
[Figure: Alice and Bob each log in. For each user, login does fork, setuid(“alice”) or setuid(“bob”), and exec of a shell; the shell runs a tool via fork/exec. Alice's tool (uid=“alice”) does creat(“foo”), write, close, so foo has owner=“alice”. Bob's tool (uid=“bob”) does open(“foo”) and read.]
Labels and access control
Every system defines rules for assigning security labels to subjects (e.g., Bob’s process) and objects (e.g., file foo).
Every system defines rules to compare the security labels to authorize attempted accesses.
[Figure: Alice's login shell runs a tool (uid=“alice”) that does creat(“foo”), write, close, so foo has owner=“alice”. Bob's login shell runs a tool (uid=“bob”) that attempts open(“foo”) and read.]
Should processes running with Bob’s userID be permitted to open file foo?
Post-note
• We talked about access policy in vanilla Unix.
• The owner of a Unix file may tag it with additional status
specifying access rights for subjects.
– Access types = {read, write, execute} [3 bits]
– Subject types = {owner, group, other/anyone} [3 bits]
– If the file is executed, should the system setuid the process to the
userID of the file’s owner? [1 bit]
– 10 bits total: (3x3)+1. Usually given in octal: e.g., “777” means 9
bits set: anyone can r/w/x the file, but no setuid.
– It is a very simple form of an access control list (ACL). Later
systems like AFS have richer ACLs.
• Unix provides a syscall and shell command for owner to set the
permission bits on each file (inode).
• “Group” was added later and is a little more complicated: a
user may belong to multiple groups.
Init and Descendants
Kernel “handcrafts”
initial process to run
“init” program.
Other processes descend from
init, and also run as root,
including user login guards.
Login invokes a setuid
system call to run user
shell in a child process
after user authenticates.
Children of user shell
inherit the user’s
identity (uid).
Processes: A Closer Look
virtual address space
+
The address space is
a private name space
for a set of memory
segments used by the
process.
The kernel must
initialize the process
memory for the
program to run.
thread
stack
process descriptor (PCB)
+
Each process has a thread
bound to the VAS.
The thread has a stack
addressable through the
VAS.
The kernel can
suspend/restart the thread
wherever and whenever it
wants.
user ID
process ID
parent PID
sibling links
children
resources
The OS maintains
some state for each
process in the
kernel’s internal
data structures: a
file descriptor table,
links to maintain the
process tree, and a
place to store the
exit status.
VAS example (32-bit)
• An addressable array of bytes…
• Containing every instruction the process thread can execute…
• And every piece of data those instructions can read/write…
– i.e., read/write == load/store
• Partitioned into logical segments with distinct purpose and use.
• Every memory reference by a thread is interpreted in its VAS context.
– Resolve to a location in machine memory
• A given address in different VAS may resolve to different locations.
[Figure: segments from 0x7fffffff down to 0x0: Reserved, Stack, Dynamic data (heap/BSS), Static data, Text (code).]
64 bytes: 3 ways
[Figure: the same block of memory starting at p + 0x0, with offsets 0x0 and 0x1f marked, viewed three ways: as int p[] / int* p, as char p[] / char *p, and as char* p[] / char** p.]
Pointers (addresses) are 8 bytes on a 64-bit machine.
Alignment
[Figure: the same three views (int p[] / int* p, char p[] / char *p, char* p[] / char** p), with misaligned positions marked X.]
The machine requires that an n-byte value is aligned on an n-byte boundary, where n = 2^i.
Heap allocation
A contiguous chunk of
memory obtained from
OS kernel.
E.g., with Unix sbrk()
system call.
A runtime library obtains the
block and manages it as a
“heap” for use by the
programming language
environment, to store
dynamic objects.
E.g., with Unix malloc and
free library calls.
Allocated heap blocks
for structs or objects.
Align!
Alternative: block maps
The storage in a heap block is
contiguous in the VAS. C and
other PL environments require this.
That complicates the heap
manager because the heap
blocks may be different sizes.
Idea: use a level of indirection
through a map to assemble a
storage object from “scraps” of
storage in different locations.
The “scraps” can be fixed-size
slots: that makes allocation
easy because they are
interchangeable.
map
Example: page tables that
implement a VAS.
Indirection
Variable Partitioning
Variable partitioning is the strategy of parking differently sized cars
along a street with no marked parking space dividers.
Wasted space: external fragmentation
Fixed Partitioning
Wasted space: internal fragmentation
“Classic Linux Address Space”
http://duartes.org/gustavo/blog/category/linux
What’s in an Object File or Executable?
Header: “magic number” indicates type of image.
Section table: an array of (offset, len, startVA) describing the program sections.
Symbol table and relocation records: used by linker; may be removed after final link step and strip.

header
text: program instructions (p)
data:
idata: immutable data (constants) ("hello\n")
wdata: writable global/static data (j, s)
symbol table: j, s, p, sbuf
relocation records

#include <unistd.h>

int j = 327;
char* s = "hello\n";
char sbuf[512];

int p() {
    int k = 0;
    j = write(1, s, 6);
    return(j);
}
A Peek Inside a Running Program
0
CPU
common runtime
x
your program
code library
your data
R0
heap
Rn
PC
SP
x
y
registers
y
stack
high
“memory”
address space
(virtual or physical)
Process Creation in Unix
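#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int pid;
int status = 0;
if ((pid = fork()) == -1) {
    /* fork failed: no child was created */
    perror("fork failed");
    exit(1);
} else if (pid > 0) {
    /* parent */
    …..
    pid = wait(&status);
} else {
    /* child */
    …..
    exit(status);
}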
The fork syscall returns
twice: it returns a zero to
the child and the child
process ID (pid) to the
parent.
Parent uses wait to sleep
until the child exits; wait
returns child pid and status.
Wait variants allow wait on a
specific child, or notification
of stops and other signals.
The Shell
• Users may select from a range of interpreter
programs available
– or even write their own (to add to the confusion)
– csh, sh, ksh, tcsh, bash: choose your flavor…
• Shells execute commands composed of program
filenames, args, and I/O redirection symbols.
– Shells can run files of commands (scripts) for more complex
tasks, e.g., by redirecting shell’s stdin.
– Shell’s behavior is guided by environment variables.
– E.g., $PATH
Using the shell
• Commands: ls, cat, and all that
• Current directory: cd and pwd
• Arguments: echo
• Signals: ctrl-c
• Job control, foreground, and background: &, ctrl-z, bg, fg
• Environment variables: printenv and setenv
• Most commands are programs: which, $PATH, and /bin
• Shells are commands: sh, csh, ksh, tcsh, bash
• Pipes and redirection: ls | grep a
• Files and I/O: open, read, write, lseek, close
• stdin, stdout, stderr
• Users and groups: whoami, sudo, groups