CPS110 / EE 153:
Intro to Operating Systems
Jeff Chase
August 25, 2008
Background
BS Math/CS: Duke, ’99
PhD EECS: Michigan, ’05
Research interests
OS, p2p, economics, security, mobility
Why am I a professor?
Research and teaching are a lot of fun
Explaining things improves my understanding
Background
BS Math/CS: Dartmouth, back in the 1980s sometime
PhD CS: University of Washington (Seattle), ’95
Research interests
OS, networked systems, Internet service infrastructure, utility computing, energy/green, cool new stuff
Why am I a professor?
Research and teaching are a lot of fun, etc.
Explaining things improves my understanding
Office with view of tower
CPS 100
Basic data structures
Allocating memory on the stack versus from the heap
CPS 104
Basic computer architecture, ISAs
Registers: stack pointer, PC, general-purpose
Virtual memory translation
Page tables
TLB, caching
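As a quick refresher on the stack-versus-heap prerequisite, here is a minimal C++ sketch. The `Node` type and both function names are hypothetical, chosen only for illustration; they are not part of any course handout.

```cpp
#include <memory>

// Hypothetical record type used only for illustration.
struct Node {
    int value;
    Node* next;
};

// Stack allocation: `local` lives in this function's stack frame and is
// reclaimed automatically when the function returns.
int sum_on_stack() {
    int local[4] = {1, 2, 3, 4};
    int total = 0;
    for (int x : local) total += x;
    return total;
}

// Heap allocation: the node outlives the function that created it.
// std::unique_ptr frees the memory when its owner goes out of scope.
std::unique_ptr<Node> make_node(int value) {
    return std::make_unique<Node>(Node{value, nullptr});
}
```

Calling `sum_on_stack()` needs no cleanup; the caller of `make_node(7)` owns the returned node until the `unique_ptr` is destroyed.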
Lecture notes on the web (125 pages)
Exams based on content of lectures
Textbooks
Not required
On-line: Saltzer and Kaashoek
“Modern Operating Systems” is OK
Useful: Storage, data, and information systems
($15 on Amazon)
Two sections, starting next week
MW 2:50 – 4:05
F 2:50 – 4:05 (sometimes)
Teaching Assistant
Amre Shakimov (shan@cs.duke.edu)
Seasoned and energetic
Undergraduate Teaching Assistant
Matt Jacobson
Where you will learn the most
4 projects
0: very simple intro to C++
1: building a user-level thread package
2: building a virtual memory manager
3: hack into a vulnerable system
Projects aren’t long, but are difficult
Only 100-1,000 lines of code, but many hours
Everything is in C++
Project 0 has been posted today
Posted on web by Friday
Should be done before discussion section
Not graded, but count toward participation
All projects done in groups of 2 or 3
Email groups to chase@cs.duke.edu
By Friday (August 29)!
Group members will rate each other
Procedure for firing, quitting in syllabus
All projects are auto-graded
Allows groups to get immediate feedback
Use submit110 script on cs machines
One submission/group/day gets feedback
Can’t use to debug your project
Any group member’s submission counts
Very limited feedback: correct or incorrect
Doesn’t say what is wrong
Still have to write a test suite (except P0)
Don’t rely on auto-grader feedback alone
To get more useful feedback
Come talk to us!
We will provide many office hours every week
(double office hours week before a deadline)
Due at 6pm, accepted until 11:59:59pm
Auto-grader clock is the one that counts
Last submission to auto-grader is final
3 late days/group/semester
Intended for unexpected problems
No extensions
Start early!
Ok, among groups
C++ syntax, course concepts
“What does this part of the handout mean?”
Not ok, among groups
Design/writing of another’s program
Includes prior class solutions
“How do I do this part of the handout?”
We use automated similarity-detection software
Just changing the variable names won’t save you
If in doubt, ask me
Projects: 35%
Midterm: 30%
mid-October
Final: 30%
December
Participation: 5%
The two are not independent
Familiarity with projects is critical to doing well on exams
I like to ask questions about projects on exams
“Extend Project X to include this functionality”
Know your project!
You can assign roles to different people
But each member must understand all aspects
Linux/GNU environment
Need to sign up for a term CS account
Use the form on the cs.duke.edu/csl page
Send CS login name to chase@cs.duke.edu
Can log in to linux.cs.duke.edu
Use this account for all auto-grading
Newsgroup
http://courses.duke.edu
Office hours
With me:
With Amre:
Don’t email Amre or me directly
Post to the newsgroup, which we monitor
[Chart: y-axis 0 to 1, x-axis 1 to 16; regions labeled "Some kind of A", "Some kind of B", "sko D"]
First part: demystify the operating system
How does my computer start running?
How does a program load into memory?
Second part: demystify Internet systems
How does my email know where to go?
Why is Google so fast?
How is everything “virtualized”?
CPS 1, 6, 100, 108
CPS 104
Applications
Ideas high-level programming languages
compiling, reading programs off disk, getting program into memory, reading keyboard, starting the computer, saving files, filenames, networking
Hardware
Assembly language program gates
Consider the Java language and its key word “interface”
What is a Java object?
List of methods and collection of internal state
What is a Java interface?
Set of methods associated with an object that a programmer can call
What do those methods do?
Invoke code (let the object do work on the caller’s behalf)
Mutate the object’s public/private state
Why are interfaces useful?
They provide an “abstraction” or simplification
Callers don’t have to know an object’s exact type
Key terms: interface, resource (cpu, mem, etc), abstraction, virtual
Define an interface in terms of resources
An interface is a set of primitives or operations
Interfaces provide access to resources
What do we mean by abstraction?
How resources are presented to a client
Can think of as an illusion that makes resources easier to program
What does it mean to virtualize something?
Provides an abstraction (simple way to manipulate resources)
(mostly) disallow direct access to reality/resources
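The key terms above can be sketched in C++, where an abstract class plays the role of a Java interface. Everything here (`KeyValueStore`, `InMemoryStore`, `store_and_fetch`) is a hypothetical name made up for illustration. The interface is the set of operations, the private map is the resource, and the client never touches the resource directly.

```cpp
#include <map>
#include <string>

// Interface: the set of operations a client may call.
class KeyValueStore {
public:
    virtual ~KeyValueStore() = default;
    virtual void put(const std::string& key, const std::string& value) = 0;
    virtual std::string get(const std::string& key) const = 0;
};

// One possible implementation. The underlying resource (the map) is
// private, so clients cannot reach it except through the interface.
class InMemoryStore : public KeyValueStore {
public:
    void put(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    // Returns an empty string when the key is absent.
    std::string get(const std::string& key) const override {
        auto it = data_.find(key);
        return it == data_.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> data_;
};

// Client code depends only on the interface, not on the exact type.
std::string store_and_fetch(KeyValueStore& store) {
    store.put("course", "CPS110");
    return store.get("course");
}
```

Swapping `InMemoryStore` for some other implementation would not change `store_and_fetch`, which is the point of the abstraction.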
Program that runs on CPU, (mostly) like any other
“Virtual machine”
Interface
“Physical machine”
Interface
Virtual interface should be simpler than physical
What interface does the OS present?
“Virtual machine”
Interface
“Physical machine”
Interface
What interface does the hardware present?
Instruction set:
Load/store, mem, regs
Applications
Threads
Virtual Memory
System Calls
OS
Page Tables
Traps
Hardware
Atomic Test/Set
Socket
Ether, 802.11
[Figure: two views of the same system. Familiar view: user program and OS. Alternate view: OS and several user programs.]
How do programs start?
Tasks outside program? (net recv)
How to prevent CPU hogging?
OS runs first, calls program
Programs run until they return control to OS
(by themselves or forced by hardware)
Then OS calls another program
Key question: who calls whom?
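The "OS calls the program" view can be sketched as a toy loop in C++. This is only an illustration under simplifying assumptions: `Program`, `run_slice`, and `run_all` are hypothetical names, each program starts with at least one slice of work, and a "slice" stands in for whatever runs until control returns to the OS (voluntarily or forced by hardware).

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical "program": each call to run_slice() does one unit of work
// and returns true if more work remains, false when the program is done.
struct Program {
    std::string name;
    int slices_left;  // assumed to start at 1 or more
    bool run_slice() { return --slices_left > 0; }
};

// Toy "OS": it runs first, calls a program, regains control when the
// slice ends, then calls the next program. Returns the order of calls.
std::vector<std::string> run_all(std::deque<Program> ready) {
    std::vector<std::string> trace;
    while (!ready.empty()) {
        Program p = ready.front();
        ready.pop_front();
        trace.push_back(p.name);
        if (p.run_slice()) ready.push_back(p);  // unfinished: back of queue
    }
    return trace;
}
```

With programs A (2 slices) and B (1 slice), the OS calls A, then B, then A again.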
1. Illusionist
Makes computer seem nicer than it really is
Examples?
Programs seem to have their own CPU
AFS: single, unified file system
Name data with human-readable names
Directories
Packets get lost; OS makes net look reliable
Disk is slow; OS makes it look fast via caching
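The caching illusion can be sketched as a wrapper that remembers blocks it has already read. `SlowDisk` and `CachedDisk` are hypothetical classes for illustration; a real OS buffer cache also handles eviction, writes, and consistency, all of which are left out here.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical slow device: counts how often it is actually accessed.
class SlowDisk {
public:
    std::string read_block(int block) {
        ++reads_;
        return "data" + std::to_string(block);
    }
    int reads() const { return reads_; }
private:
    int reads_ = 0;
};

// Same read_block() interface, but repeated reads are served from memory.
class CachedDisk {
public:
    explicit CachedDisk(SlowDisk& disk) : disk_(disk) {}
    std::string read_block(int block) {
        auto it = cache_.find(block);
        if (it != cache_.end()) return it->second;   // hit: no disk access
        std::string data = disk_.read_block(block);  // miss: go to the disk
        cache_[block] = data;
        return data;
    }
private:
    SlowDisk& disk_;
    std::unordered_map<int, std::string> cache_;
};
```

Reading the same block twice through `CachedDisk` touches the underlying disk only once.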
1. Illusionist
Makes computer seem nicer than it really is
2. Government
Divides hardware resources among competing programs
What hardware resources does the OS manage?
Processor
Memory
Network
Disk
1. Illusionist
Makes computer seem nicer than it really is
2. Government
Divides hardware resources among competing programs
Taxes programs (OS needs CPU, memory to run)
Taken for granted when it works, cursed when it breaks
Very few of you will ever write one …
Illusionist and government functions appear in many domains
Google provides the illusion of a single web server
Word does background spell checking
Design principles
Proper abstractions, caching, indirection
Concurrency, naming, atomicity, authentication
Protection, resource multiplexing (fairness)
How does OS create the illusions we know/love?
What is a system?
Components, interconnections
Interfaces, environment
Systems do something for their environs
Exhibit this behavior via interface
Cleanly divides the world in two
Parts of the system + the environment
Component
Component
Component
System
Environment aka “the client”
1. Emergent properties
Can’t predict all component interactions
Millennium bridge
Synchronized stepping leads to swaying
Swaying leads to more forceful synchronized stepping
Leads to more swaying …
2. Propagation of effects
3. Incommensurate scaling
4. Trade-offs
1. Emergent properties
2. Propagation of effects
Want a better ride so increase the tire size
Need a larger trunk for the larger spare
Need to move the back seat forward
Need to make front seats thinner
Leads to worse driver comfort than before
3. Incommensurate scaling
4. Trade-offs
1. Emergent properties
2. Propagation of effects
3. Incommensurate scaling
Consider the giant mouse
Weight ~ size^3 (volume)
Bone strength ~ size^2 (cross section area)
An elephant sized mouse is not sustainable
4. Trade-offs
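The giant-mouse arithmetic under incommensurate scaling can be checked with a few lines of C++. The function names and the scale factor `k` are assumptions for illustration: `k` is how many times longer the scaled-up animal is in each linear dimension.

```cpp
// Weight grows with volume: k^3.
double weight_factor(double k) { return k * k * k; }

// Bone strength grows with cross-sectional area: k^2.
double strength_factor(double k) { return k * k; }

// Load carried per unit of bone strength therefore grows linearly with k.
double stress_factor(double k) {
    return weight_factor(k) / strength_factor(k);
}
```

For a hypothetical `k` of 10, weight rises 1000x while bone strength rises only 100x, so each unit of bone carries 10x the load.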
1. Emergent properties
2. Propagation of effects
3. Incommensurate scaling
4. Trade-offs
“Waterbed effect”
Push on one end, and the other goes up
Spam filters and smoke detectors
False positives vs false negatives
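The waterbed effect for a spam filter can be sketched by counting both kinds of error at a given threshold. `Sample`, `Errors`, and `count_errors` are hypothetical names, and the scoring model (flag a message when its score is at or above the threshold) is an assumption made for illustration.

```cpp
#include <vector>

// Hypothetical labeled message: a filter score plus the ground truth.
struct Sample {
    double score;
    bool is_spam;
};

struct Errors {
    int false_positives;  // legitimate mail flagged as spam
    int false_negatives;  // spam that was let through
};

// Flag every message whose score is at or above the threshold and count
// the mistakes of each kind.
Errors count_errors(const std::vector<Sample>& samples, double threshold) {
    Errors e{0, 0};
    for (const Sample& s : samples) {
        bool flagged = s.score >= threshold;
        if (flagged && !s.is_spam) ++e.false_positives;
        if (!flagged && s.is_spam) ++e.false_negatives;
    }
    return e;
}
```

Lowering the threshold pushes false negatives down and false positives up; raising it does the reverse.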
1. Emergent properties
2. Propagation of effects
3. Incommensurate scaling
4. Trade-offs
In the immortal words of HT Kung
“Systems hard. Must work harder.”
History dominated by two trends
Increasingly inexpensive hardware
Increased software complexity
Microsoft embodies tension between these trends
MS gained 90% market share by running on cheap hw
Supporting all that hardware complicates the OS
(3rd-party drivers responsible for vast majority of crashes)
How is Apple’s strategy different?
Jobs chooses the hardware you will run
HW-to-app control reduces complexity, choice, discount
One goal: make it work
Interactive (user has entire machine to herself)
Users sign up, get room for two hours at a time
“OS” is really just a library compiled into program
What is wrong with this timeline?
CPU utilization is awful
Since CPUs were expensive, this mattered
Goal: improve CPU, I/O utilization
Machine is no longer interactive
Users submit program (stack of cards) to queue
One job at a time, CPU idle during I/O, I/O idle during CPU
OS is a batch monitor + library of services
Loads program, runs program, prints results
Loads next program …
Goal: improve CPU, I/O utilization
Machine is no longer interactive
Users submit program (stack of cards) to queue
One job at a time, CPU idle during I/O, I/O idle during CPU
What key OS function starts to matter now?
Protection: programs must not corrupt monitor
Programs must relinquish CPU to monitor
Goal: improve CPU, I/O utilization
Machine is no longer interactive
Users submit program (stack of cards) to queue
One job at a time, CPU idle during I/O, I/O idle during CPU
Why wasn’t protection an issue before?
No batch monitor to corrupt
Person in lab coat took CPU back from program
Third phase: multi-program batch
Goal: overlap CPU, I/O
When one job is reading from disk, run another job on CPU
Use DMA + interrupts to allow background I/O
DMA: devices write to program memory
Interrupts: devices can tell CPU the I/O is done
Job 1
Job 2
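The benefit of overlapping CPU and I/O can be estimated with a small model in C++. The assumptions are deliberately crude and hypothetical: each job does one CPU burst followed by one I/O burst, there is a single CPU, and I/O for different jobs proceeds in the background (via DMA and interrupts) without contending for a device.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical job: one CPU burst followed by one I/O burst.
struct Job {
    int cpu_ms;
    int io_ms;
};

// One job at a time: the CPU sits idle during every I/O burst.
int uniprogrammed_time(const std::vector<Job>& jobs) {
    int total = 0;
    for (const Job& j : jobs) total += j.cpu_ms + j.io_ms;
    return total;
}

// Multi-programmed: as soon as a job starts its I/O, the CPU runs the
// next job. The batch finishes when the last I/O completes.
int multiprogrammed_time(const std::vector<Job>& jobs) {
    int cpu_free_at = 0;
    int finish = 0;
    for (const Job& j : jobs) {
        cpu_free_at += j.cpu_ms;  // CPU bursts are serialized
        finish = std::max(finish, cpu_free_at + j.io_ms);
    }
    return finish;
}
```

For two jobs that each need 10 ms of CPU and 30 ms of I/O, the model gives 80 ms without overlap and 50 ms with it.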
Third phase: multi-program batch
Goal: overlap CPU, I/O
What are the OS’s new responsibilities?
Switch between processes
Manage multiple I/Os across devices
Protect processes from each other
Job 1
Job 2
Fourth phase: time-sharing
Goal: keep efficiency, restore interactivity
Key insight: humans are really just slow I/O devices
Switch between programs during think-time
Job 1
Job 2
Increased complexity:
• Many jobs
• Outstanding reqs
• Many job sources
Job 3
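Switching between jobs on a timer can be sketched as round-robin scheduling. This is a simplified illustration, not a description of any particular system: `round_robin` is a hypothetical function, all jobs are assumed to arrive at time 0, the quantum must be positive, and context-switch cost is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Each job runs for at most `quantum` time units, then goes to the back
// of the ready queue. Returns the completion time of every job.
std::vector<int> round_robin(const std::vector<int>& cpu_needed, int quantum) {
    std::vector<int> remaining = cpu_needed;
    std::vector<int> finish(cpu_needed.size(), 0);
    std::deque<std::size_t> ready;
    for (std::size_t i = 0; i < remaining.size(); ++i) {
        if (remaining[i] > 0) ready.push_back(i);
    }
    int clock = 0;
    while (!ready.empty()) {
        std::size_t j = ready.front();
        ready.pop_front();
        int slice = std::min(quantum, remaining[j]);
        clock += slice;
        remaining[j] -= slice;
        if (remaining[j] > 0) {
            ready.push_back(j);  // preempted: wait for another turn
        } else {
            finish[j] = clock;   // done
        }
    }
    return finish;
}
```

With jobs needing 3, 1, and 2 units and a quantum of 1, the short job finishes at time 2 instead of waiting behind the long one.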
Fifth phase: personal computing
What are PC operating systems most like?
As PC prices dropped, single-operator became feasible
OS was again just a library of services (MS-DOS)
With one user, do jobs need to time-share?
Early PC OSes could only do one thing at a time
Everything waited while printing/loading a program (Mac < X)
Need protection if I’m the only one using the PC?
Protect me from myself (or my buggy software)
Early PCs provided no protection
(why Windows before XP, Mac before X were awful)
PC operating systems are basically time-sharing OSes now
Windows XP
> 40 million lines of code
Most of this code is device drivers (not written by MS)
Windows NT took 7 years to develop
Only worked well years after it shipped
Windows 2000
Shipped with 63,000 “potential known defects”
Hot research area
Simplify, automatically find OS bugs