Kernel Services
CIS 657

System Processes in Traditional Unix
  – Three processes created at startup time
  – swapper
      – process 0
      – kernel process
      – moves entire processes between main memory and secondary storage
  – init
      – process 1
      – user-mode
      – administrative tasks (keeps getty running; shutdown)
      – ancestor of all of your processes
      – Still in FreeBSD
  – pagedaemon
      – process 2
      – kernel process
      – moves parts of processes in and out

Kernel Processes in FreeBSD 5.x
  – idle
      – Runs when there are no other ready procs
  – pagedaemon
      – Writes portions of a process’s address space to storage
  – swapper
      – Moves processes from secondary storage -> main memory
  – vmdaemon
      – Moves processes from main memory -> secondary storage
  – pagezero
      – Supplies zero-filled pages
  – bufdaemon
      – Supplies clean buffers (by writing out dirty buffers)

Kernel Processes in FreeBSD 5.x (II)
  – syncer
      – Ensures dirty file data written within 30 seconds
  – ktrace
      – Logs system call trace records to a log file
  – vnlru
      – Maintains supply of free vnodes (LRU)
  – random
      – Seeds kernel random numbers and /dev/random
  – g_event
      – Handles dynamic devices
  – g_up
      – Data from device drivers -> processes
  – g_down
      – Data from processes -> device drivers

Note on Init and Logging In
  – init keeps a getty process running on each terminal port (tty)
  – getty initializes the port and waits for a login name (the “login:” prompt)
  – getty reads a string and execs login
  – login prompts for the password, performs one-way encryption, and compares values
  – if successful, login sets the user id and execs a shell

Run-Time Organization
  – Top Half
      – per-process stack
      – library of shared code
      – maintains process structure (always resident)
      – maintains user structure (can be swapped out)
      – division between process/user dependent on memory
      – never preempted for another process (but can yield the processor)
      – can block interrupts by setting processor priority level (see discussion on bottom half)

Run-Time Organization II
  – Bottom Half
      – handles hardware interrupts
      – asynchronous activities (wrt Top Half)
      – special kernel stack (might not be a process running)
  – Top and bottom half coordinate around work queues using mutexes
      – top half starts I/O requests, waits for bottom half to finish

Entry Into the Kernel
  – Hardware interrupt
      – I/O device (disks, network cards, etc.)
      – clock (used for scheduling, time of day)
  – Hardware trap
      – divide by 0, illegal memory reference
  – Software-initiated trap
      – system call

Entry Into the Kernel II
  – First, kernel must save machine state
      – Why?
  – Example sequence
      – hardware switches to kernel mode
      – hardware pushes onto per-process kernel stack the PC, PSW, trap info
      – additional asm routine saves all other state that the hardware doesn’t
      – kernel calls a C routine: the handler.

Entry Into the Kernel III
  – Handlers for each kind of entry:
      – syscall() for a system call
      – trap() for hardware traps
      – interrupt handlers for devices
  – Each kind of handler takes specific parameters (e.g., syscall number, exception frame; or the unit number for an interrupt).

Return From Kernel
  – asm routine restores registers and user stack pointer (it undoes what the companion asm routine did)
  – hardware restores the stored PC, PSW, etc. (undoes what it did on the way in)
  – execution returns at the next instruction in the user process

Software Interrupts
  – Used as low-priority processing mechanism in the kernel
  – Hardware interrupts have high priority
      – Can put work in work queues (cf. network)
  – When high-priority work is done, low-priority software interrupt does the rest
      – might be real interrupt, might be flag checked in kernel (architecture-dependent)
      – can be preempted by another hardware interrupt

Priority Levels in FreeBSD 5.2
  – High-priority hardware interrupt creates work for lower-priority software interrupt
      – Work queues
  – Software interrupt routines lower priority than device drivers; higher than user processes
      – Hardware interrupt > software interrupt > user

Example: Network Packets
  – Device driver (hardware interrupt) takes packets from network, puts them in a work queue; controller re-enabled
  – Software interrupt handler moves packets from work queues to destination processes
  – Thus, delivery to processes doesn’t block packets coming in from the network

What FreeBSD 4.4 Did for x86
  – The cpl variable holds the current priority level
  – Various macros are defined to set the cpl to a new level (e.g. spl0(), splx(), spltty())
  – Interrupts at lower levels are masked (but not really: a pending bit is set and the majority of the work handling the interrupt is deferred)
  – See /usr/src/sys/i386/isa/ipl_funcs.c
  – Homework: find out (and submit) what FreeBSD 5.2.1 does (you may work with your lab partner).

Clock Interrupts
  – The system clock interrupts, or ticks, at regular intervals (usually 100 Hz).
  – Interrupt handler calls the hardclock() routine, which must run quickly
      – running for more than one tick will miss the next interrupt, causing the time-of-day clock to skew
      – lower-priority devices (network devices, disk controllers) cannot be serviced while hardclock() is running.
  – Non-critical clock functions handled by softclock()

The Four Clocks
  – Hardclock: hardware timer, 100 Hz.
  – Softclock: handles non-critical timing work
  – Profclock: profiling clock (collect process performance information), 1024 Hz.
  – Statclock: collects system statistics, 128 Hz.
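
As a rough illustration of the cpl and pending-bit idea from the "What FreeBSD 4.4 Did for x86" slide, here is a small user-space simulation in C. It is a sketch only and is not the code in ipl_funcs.c: the level names and values, the `handled` counters, and every function with a `_sim` suffix are hypothetical names invented for this example. An interrupt at or below the current level is not run; it only sets a pending bit, and the deferred handler runs when the level is lowered again.

```c
#include <assert.h>

/* Hypothetical priority levels, lowest to highest (not the real FreeBSD values). */
enum { LVL_NONE = 0, LVL_SOFTNET = 1, LVL_TTY = 2, LVL_CLOCK = 3, NLEVELS = 4 };

static int cpl = LVL_NONE;        /* current priority level */
static unsigned pending = 0;      /* one bit per level with deferred work */
static int handled[NLEVELS];      /* how many times each level's handler ran */

/* Stand-in for a real interrupt handler: just count the call. */
void run_handler(int level)
{
    assert(level > LVL_NONE && level < NLEVELS);
    handled[level]++;
}

/* Run deferred handlers whose level is now above cpl, highest level first. */
void run_pending(void)
{
    for (int lvl = NLEVELS - 1; lvl > cpl; lvl--) {
        if (pending & (1u << lvl)) {
            int saved = cpl;
            pending &= ~(1u << lvl);
            cpl = lvl;            /* the handler runs at its own level */
            run_handler(lvl);
            cpl = saved;
        }
    }
}

/* Raise cpl (never lowers it); returns the old level for splx_sim(). */
int splraise_sim(int level)
{
    int old = cpl;
    if (level > cpl)
        cpl = level;
    return old;
}

/* Restore a saved level, then run whatever was deferred in the meantime. */
void splx_sim(int level)
{
    cpl = level;
    run_pending();
}

/* An interrupt arrives: run it now if it outranks cpl, else mark it pending. */
void interrupt_sim(int level)
{
    if (level > cpl) {
        int saved = cpl;
        cpl = level;
        run_handler(level);
        cpl = saved;
    } else {
        pending |= 1u << level;
    }
}
```

A typical sequence would be `int s = splraise_sim(LVL_TTY);`, then a call to `interrupt_sim(LVL_SOFTNET)` that is only recorded as pending, then `splx_sim(s)`, at which point the deferred handler runs. An `interrupt_sim(LVL_CLOCK)` in the middle would run at once, because it outranks the raised level.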

Clock Interrupts: What hardclock() does
  – check for an interval timer on the currently running process
  – increment the time of day
  – do the job of profclock() if there is no separate profiling clock
  – do the job of statclock() if there is no separate clock for statistics gathering
  – call softclock() directly if the cpl is low (saves overhead of a software interrupt that would just do that when hardclock() returns)

Statistics
  – Historically, hardclock() collected resource utilization statistics, and forced context switches
  – Problems (see McCanne & Torek)
      – potential for inaccurate measurement of CPU utilization
      – inaccurate profiling
  – Use semi-randomized sampling with a second clock (the stat clock, see statclock())
      – charge the current process with a tick; if it has four, recalculate priority
      – record what the system was doing at time of tick

Softclock()
  – Handles events in the callout queue, such as:
      – timeouts (real-time timer)
      – retransmits dropped network packets
      – monitors some peripherals that require polling
      – process scheduling
  – Scheduler would/should be called every second in a “perfect world;” uses interval timer to run 1 second after it last finished
  – Discussion: why not do scheduling in hardclock()?

Callout Queue
  – A circular list of n queues; sorted by time of event (in ticks)
  – Each queue sorted in time order (soonest first)
  – Pointer to current queue (now) moves around circular list
  – hardclock() advances pointer, and softclock runs when the lead item in the new queue has time 0.

Example Callout Queue (200 queues in example)
  [Figure: circular list of queues labeled now, now + 1, now + 2, now + 3, ..., now + 199, holding entries f(x), g(y), f(z), and h(x) with event times such as now, now + 2, and now + 200.]
  – Question: does f(z) happen this cycle, or next?

Memory Management
  – Two kinds of executable files in BSD Unix
      – interpreted
      – compiled (directly executed)
  – First 16 bits in a file contain a “magic number” telling what kind of file it is.
      – “#!” indicates an interpreted file; interpreter must be directly executable (#!/bin/sh is the most common)
      – Other magic numbers indicate whether the file can be paged and whether the text is sharable.

FreeBSD Process Layout
  [Figure: process address space from 0xfff00000 (top) down to 0x00000000: special stuff, user stack, shared libs, heap, bss, initialized data, text. Executable file alongside: symbol table, initialized data, text, ELF header with ELF magic number.]
  – 0-filled: bss, stack
  – most of process is demand paged into memory (Ch. 5)

What Was That Special Stuff?
  [Figure: top of the address space, in order: per-process kernel stack, red zone, user area, ps_strings struct, signal code, env strings, argv strings, env pointers, argv pointers, argc.]
  – argv, argc, envp contain arguments and environment
  – signal code used by kernel to deliver signals
  – ps_strings used by ps to locate argv of process

Timing Services
  – Real Time: gettimeofday() returns the time since 1 Jan 1970 in UTC (the Epoch)
  – adjtime() allows one to tweak the clock
      – keeps multiple machines “close enough”
      – response to normal clock skew
      – give a delta argument; speed up or slow down the counted microseconds per clock tick by 10% until delta is reached
  – Time is reported in microseconds

Interval Timers
  – Each process gets three interval timers
      – real: decrements in real time; SIGALRM; run from timeout queue maintained by softclock()
      – profiling: decrements only when process runs, but tracks both user and kernel-mode execution; SIGPROF; checked by profclock()
      – process virtual: decrements only when the process is running in user mode; SIGVTALRM; checked by profclock()

User, Group, Other Identifiers
  – User ID (uid): 32-bit identifier for all processes of each user, set by administrator
  – Group ID (gid): 32-bit identifier. Many users in one group; many groups for each user
  – Root: uid 0, gid 0.
  – These bits are checked on file access

Permission Checks
  – Checked in order
      – If the UIDF == UIDP use owner permission bits
      – If UIDF != UIDP, but GIDF ∈ GIDP then use group permission bits
      – If UIDF != UIDP, and GIDF ∉ GIDP then use the other permission bits
  – Recall discussion last time of how uid, gid set on login
  – Can I own a file that I can’t read?

Rights Amplification
  – Users may need temporary write access on files (e.g. passwd)
  – setuid() does this
      – changes effective user id
      – real user id stays the same
      – effective uid also saved
  – seteuid() changes only the effective user id
  – setgid() used to work like setuid()
      – now just put “effective” gid into 0th element of array

Effects of Syscalls on UIDs

  Action        Real  Effective  Saved
  Exec-normal   R     R          R
  Exec-setuid   R     S          S
  Seteuid(R)    R     R          S
  Seteuid(S)    R     S          S
  Seteuid(R)    R     R          S
  Exec-normal   R     R          R

  R = Real UID, S = Special-privilege UID
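
The ordered test on the Permission Checks slide can be sketched in C as follows. This is a minimal illustration, not the kernel's own routine. It assumes that UIDF and GIDF are the file's owner and group, and that UIDP and GIDP are the process's effective uid and group array. The struct names, the `WANT_*` constants, and the functions `in_groups()` and `access_ok()` are hypothetical, and the superuser bypass for uid 0 is deliberately left out. Exactly one class of permission bits is consulted, with no fall-through to a later class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>   /* uid_t, gid_t, mode_t */

/* Hypothetical process credentials: effective uid plus the group array. */
struct cred_sim {
    uid_t        uid;
    const gid_t *groups;
    size_t       ngroups;
};

/* Hypothetical file attributes: owner, group, and the nine rwx mode bits. */
struct file_sim {
    uid_t  uid;
    gid_t  gid;
    mode_t mode;          /* for example 0640 */
};

/* Requested access, using the same bit values as one rwx triple. */
enum { WANT_X = 1, WANT_W = 2, WANT_R = 4 };

/* True if gid g appears anywhere in the process's group array. */
bool in_groups(const struct cred_sim *p, gid_t g)
{
    assert(p != NULL);
    for (size_t i = 0; i < p->ngroups; i++)
        if (p->groups[i] == g)
            return true;
    return false;
}

/* Pick exactly one rwx triple in the slide's order, then test it. */
bool access_ok(const struct cred_sim *p, const struct file_sim *f, unsigned want)
{
    unsigned bits;

    if (p->uid == f->uid)
        bits = (f->mode >> 6) & 7;    /* owner bits */
    else if (in_groups(p, f->gid))
        bits = (f->mode >> 3) & 7;    /* group bits */
    else
        bits = f->mode & 7;           /* other bits */

    return (bits & want) == want;
}
```

Under these assumptions, a file with mode 0044 is readable by group members and by everyone else but not by its owner, because the owner test matches first and the owner bits grant nothing.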