Intro to CompSci 510

CompSci 510: Graduate OS
Landon Cox
January 15, 2016
About me
• Background
• BS in Math, CS: Duke ’99
• PhD in CSE: Michigan ’05
• Research interests
• OS, distributed systems, privacy, and security
• Why am I a professor?
• Research and teaching are a lot of fun
• It’s the family business (dad is a law professor)
About the TA
• Teaching Assistant
• Animesh Srivastava (animesha@cs.duke.edu)
• Office hours to be announced
About CompSci 510
• CompSci 510 is about operating-systems research
• You will read a lot of old and new papers
• You will perform a semester-long research project
• What CompSci 510 is not about
• Learning basic operating systems concepts
• Will do some review, but you should know this material already
• Who should take it?
• Graduate students and undergrads who enjoyed CompSci 310
First, a little philosophy
• What is computer science?
• Structure and Interpretation of Computer Programs
• Harold Abelson and Gerald Jay Sussman
• Longtime book for MIT’s first course in CS
“Underlying our approach to this subject is our conviction that ‘computer science’ is not a science and that its significance has little to do with computers… Mathematics provides a framework for dealing precisely with notions of ‘what is.’ Computation provides a framework for dealing precisely with notions of ‘how to.’”
How we got here
1. “constructs for describing computation”
• i.e., learn how to program
2. “physical constructs for realizing computation”
• i.e., learn about hardware design
3. Then we branch
• Theory of computation (the ‘what is’ of ‘how to’?)
• Artificial intelligence
• Design of computer systems
4. And branch again (within systems)
• OS, databases, architecture, software engineering, compilers, networks, security, reliability
Across systems categories
• Many common, important problems
• Fault tolerance
• Coordination of concurrent activities
• Geographically separated but linked data
• Large-scale data sets
• Protection from mistakes and attacks
• Interactions with many entities
• All of these problems lead to complexity
Complexity
• How do we control complexity in a system?
• Build abstractions that hide unimportant details
• Establish conventional interfaces
• Enable composition of simple, well-defined components
• None of this is specific to computer systems
• Just principles of good engineering
• Civil engineering, urban planning, mechanical engineering, aviation and space flight, electrical engineering, ecology and political science
Two roles of the OS
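OS as illusionist
Abstractions, hardware reality
Figure: programs and applications run on OS abstractions, which the OS builds from hardware mechanisms
• Threads (hardware: atomic test/set)
• Virtual memory (hardware: page tables)
• Files, web (hardware: disk, NIC)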
OS as government
Main government functions
• Resource manager (who gets what and when)
• Lock acquisition
• Processes
• Disk requests
• Page eviction
• Isolation and security (law and order)
• Access control
• Kernel bit
• Authentication
Two roles of the OS
Abstractions: modularity; simplicity; hide messy reality
Government: law and order; fair, efficient allocation; source of trust
Goals for each role?
How does OS enforce modularity?
How does OS ensure fair allocation?
What is the basis for trust?
Why do we trust the government?
Key questions for semester
• What are the right abstractions?
• How should we enforce modularity?
• How do we ensure fair, efficient resource allocation?
• Is there a reasonable basis for trust?
• We will read a lot of papers this semester
• Useful to think about them in terms of these questions
• Sometimes goals are in tension (e.g., modularity vs. efficiency)
• Good papers explain the trade-offs
Course administration
• Syllabus is online
• Reading list/schedule is subject to change
• In general, two papers per lecture
• Grade composition
• Paper presentations and summaries (5%)
• Programming projects (20%)
• Research project (25%)
• In-class midterm (25%)
• In-class final exam (25%)
Paper presentations & summaries
• Post summaries to Piazza
• Summaries will be available to all
• Due before class
• Summaries must include
• Two positives
• Two negatives
• Two questions
• Presentations: 2nd half of semester
Programming Projects
• Done in groups of two or three
• Registration form will be up next week
• Register on course website by January 22
• Two small programming projects
1. Concurrency and synchronization (50%)
2. File systems and storage (50%)
Research Projects
• Done in groups of two or three
• Five phases
1. Form groups (due after concurrency project)
2. Write proposal (20% of project grade)
3. Write status report (10% of project grade)
4. Write final report (60% of project grade)
5. Give presentation (10% of project grade)
Exams
• Both will be in-class
• Will cover topics covered to that point
• Often will ask about composing systems
• “How would SimOS run on top of LFS?”
• Requires a deep understanding of both
Syllabus: project collaboration
• Okay between groups
• Programming syntax, course concepts
• “What does this part of the project specification mean?”
• Not okay between groups
• Design/writing of another’s program
• Includes prior class solutions and Piazza
• “How do I do this part of the handout?”
• Don’t post details of your solution to Piazza
• If in doubt, ask me
Thoughts on cheating
Cheating is a form of laziness.
I like to think that cheating happens elsewhere.
Duke students work hard and don’t cut corners.
Quick review of OS
• Common themes in computer systems
• Atomicity
• Fault tolerance
• Protection and trust
• Should be familiar with each concept
• Will do a quick review of each
Atomicity
• What does it mean for an operation to be atomic?
• The operation occurs without interruption
• No interleaving between atomic operations
• Goal: high-level atomic ops from low-level
• Which CPU operations are atomic?
• Load, store, test-and-set, interrupt enable/disable
• Used these to implement locks, CVs, and semaphores
Synchronization layers
Concurrent program
Higher-level synchronization (reader-writer functions)
High-level synchronization (locks, monitors, semaphores)
Hardware (load/store, interrupt enable/disable, test&set)
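The layering above can be sketched in a few lines: a lock (high-level atomic operation) built from test-and-set (low-level atomic operation). This is only an illustration under stated assumptions. Python exposes no hardware test-and-set, so the hypothetical `TestAndSetCell` class emulates the instruction with an internal guard lock; `SpinLock` and `run_counter_demo` are likewise names invented for this sketch.

```python
import threading
import time


class TestAndSetCell:
    """Emulated test-and-set word (assumption: the guard lock stands in
    for the atomicity that real hardware provides)."""

    def __init__(self):
        self._value = False
        self._guard = threading.Lock()

    def test_and_set(self):
        # Atomically set the word to True and return its previous value.
        with self._guard:
            old = self._value
            self._value = True
            return old

    def clear(self):
        with self._guard:
            self._value = False


class SpinLock:
    """A lock built only from test_and_set() and clear()."""

    def __init__(self):
        self._cell = TestAndSetCell()

    def acquire(self):
        # Spin until the previous value was False, meaning we took the lock.
        while self._cell.test_and_set():
            time.sleep(0)  # yield so the current holder can make progress

    def release(self):
        self._cell.clear()


def run_counter_demo(num_threads=4, increments=200):
    """Increment a shared counter under the spin lock; return the final count."""
    lock = SpinLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            counter[0] += 1  # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling `run_counter_demo(4, 200)` should return 800 because the increments never interleave. Condition variables and semaphores would sit one layer above a lock like this one.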
Atomicity
• Which network operations are atomic?
• Send/receive Ethernet frame
• Used these to implement byte streams
Protocol layers
NFS (files), HTTP (web), SMTP (email), SSH (login)
RPC
UDP, TCP
IP
Ethernet, ATM, PPP
Atomicity
• Which storage operations are atomic?
• Read/write a disk block
• Used these to implement transactions
Storage layers
User program
Database (transaction begin, commit)
File system (open, close, read, write)
Hardware (block read/write)
Fault tolerance
Dealing with failure
• In what ways can the network fail?
• Messages can be reordered, dropped, and corrupted
• How do we deal with re-ordered messages?
• Assign a sequence number to each message
• What is a “connection”?
• A sequence of related messages
• Applications determine which messages are related
Dealing with failure
• How do we deal with dropped messages?
• Send the message again
• How to detect that a message was dropped?
• Require the receiver to send an acknowledgement (ACK)
Dealing with failure
• Possible reasons we didn’t receive an ACK?
• Message was delayed or dropped
• ACK was delayed or dropped
• What if we assume a delay when there was a drop?
• We’ll wait forever, kind of like a deadlock
• What if we assume a drop when there was a delay?
• We’ll send duplicate messages
Dealing with failure
• How can we handle duplicate messages?
• Just drop duplicates using sequence number
• What can happen if we have too many duplicates?
• Can create crippling network congestion
• If network congestion was causing the delays, retransmissions add more load and create a positive feedback loop
• How can we limit or eliminate positive feedback loop?
• If you start to see dropped messages, send at a slower rate (TCP)
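The pieces from the last few slides (sequence numbers, ACKs, retransmission, dropping duplicates) fit together roughly as in the sketch below. It is a simplified stop-and-wait simulation, not TCP: the function name, the `loss_rate` and `seed` parameters, and the lossy channel modeled with a seeded random number generator are all assumptions made for illustration, and a "timeout" is modeled simply as the next loop iteration. Slowing down under congestion is not modeled.

```python
import random


def simulate_stop_and_wait(messages, loss_rate=0.3, seed=0, max_tries=50):
    """Deliver messages in order over a channel that drops data and ACKs.

    Returns (delivered_messages, number_of_transmissions).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    delivered = []             # what the receiver hands to the application
    expected_seq = 0           # next sequence number the receiver will accept
    transmissions = 0

    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1

            # The data message may be dropped; the sender times out and resends.
            if rng.random() < loss_rate:
                continue

            # Receiver: accept a new message, silently drop a duplicate.
            if seq == expected_seq:
                delivered.append(payload)
                expected_seq += 1

            # The receiver ACKs either way, but the ACK may be dropped too.
            # The sender cannot tell which was lost, so it resends.
            if rng.random() < loss_rate:
                continue

            break  # ACK received; move on to the next message
        else:
            raise RuntimeError("gave up after too many retransmissions")

    return delivered, transmissions
```

With `loss_rate=0.0` every message is sent exactly once. With losses, the transmission count grows, but the receiver still delivers each message once and in order because duplicates carry a sequence number it has already seen.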
Dealing with failure
• Processes in a distributed system can fail too
• Bugs can crash processes
• Hardware failures can bring down machines
• It is easiest to think about fail-stop failures
• Implicit assumption was that running == correct
• Can you think of scenarios in which this isn’t the case?
• If a process has a bug that causes incorrect behavior
• If a process becomes compromised
• This larger class of failures is called Byzantine faults
• Famous Byzantine Fault Tolerance result
• Can only ensure correctness if fewer than 1/3 of processes are faulty
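As a worked bit of arithmetic for the one-third bound: "fewer than 1/3 faulty" means 3f < n, so n processes tolerate at most floor((n - 1) / 3) Byzantine faults, and tolerating f faults needs at least 3f + 1 processes. The helper names below are made up for this sketch.

```python
def max_byzantine_faults(n):
    """Largest f such that 3 * f < n (fewer than one third of n are faulty)."""
    return (n - 1) // 3


def min_processes(f):
    """Smallest n such that f is fewer than one third of n."""
    return 3 * f + 1
```

For example, three processes cannot tolerate even one Byzantine fault, while four can tolerate one.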
Figure: clients accessing a single server over the network
Problems with this model?
1. Performance of accessing over the network
2. Single point of failure (availability)
3. Performance bottleneck of server (scalability)
1. Performance of accessing over the network
How can we make this faster?
Caching!
Figure: the server and the client caches all hold S=v
What should happen if I modify my copy?
Figure: the modified copies hold S=v’ while another copy still holds the stale S=v
Could update other copies (every copy becomes S=v’)
Could invalidate other copies (the stale copy is discarded)
What should happen if two people modify concurrently?
Figure: the copies now disagree (S=w, S=v, S=x)
Let server pick a winner (e.g., last writer wins)
Server “serializes” updates (assigns a canonical order)
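A minimal sketch of the invalidation option with a serializing server follows. The class and method names are hypothetical, everything runs in one process, and "concurrent" writes are simply applied in the order the server receives them, which is what last-writer-wins amounts to here.

```python
class Server:
    """Holds the authoritative copy and serializes all writes."""

    def __init__(self, value):
        self.value = value
        self.version = 0   # position in the canonical order of updates
        self.caches = []

    def write(self, writer, value):
        # The server picks the order: each write gets the next version.
        self.version += 1
        self.value = value
        # Invalidate every cached copy except the writer's own.
        for cache in self.caches:
            if cache is not writer:
                cache.invalidate()
        return self.version


class ClientCache:
    """A client-side cached copy of the server's value."""

    def __init__(self, server):
        self.server = server
        self.value = None
        self.valid = False
        server.caches.append(self)

    def read(self):
        # Hit: serve locally. Miss: fetch from the server over the "network".
        if not self.valid:
            self.value = self.server.value
            self.valid = True
        return self.value

    def write(self, value):
        self.value = value
        self.valid = True
        return self.server.write(self, value)

    def invalidate(self):
        self.valid = False
```

If client A writes "w" and client B then writes "x", the server orders them as versions 1 and 2, A's copy is invalidated, and A's next read returns "x". Updating other copies instead of invalidating them would push the new value in `Server.write` rather than clearing the `valid` flag.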
What can we do about availability and scalability?
Add more servers
Now we have to keep servers consistent too!
Introduces lots of issues for large-scale web services
Figure: clients access three server replicas, each holding S=v
Where should writes go? (defines write set)
Where should reads go? (defines read set)
Say reads and writes can go to 1 of 3 servers.
What can happen? Good and bad?
Figure: the writer updates one replica to S=v’; the reader reads S=v from a replica that has not seen the write
Good: fast reads and writes
Bad: readers can get stale data
(copies eventually converge via async gossiping)
What about availability? How many failures can we tolerate?
Reads/writes can tolerate 1 or 2 failures
Say reads come from 2 and writes go to 1.
What can happen? Good and bad?
Figure: the writer updates one replica to S=v’; the reader reads S=v from the other two
Writes are still fast, reads are slower
Readers can still get stale data
What about availability? How many failures can we tolerate?
Reads can tolerate one failure, but not two
Writes can tolerate one or two failures
Say reads come from 2 and writes go to 2.
What can happen? Good and bad?
Figure: the writer updates two replicas to S=v’; any two replicas the reader contacts include at least one with S=v’
Writes are slower, reads are slower
Readers always get latest copy
Why are readers guaranteed to get latest copy?
Size of read set + size of write set > # replicas
Guarantees overlap between two sets
What about availability? How many failures can we tolerate?
Availability suffers (can tolerate 1 failure, not 2)
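The three configurations above can be checked mechanically. The sketch below uses hypothetical function names and assumes that a read or write succeeds as long as enough replicas are up to form its set. It enumerates every possible read set and write set to see whether a reader can miss the latest write, and compares the result with the rule "read set + write set > # replicas".

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """The rule from the slide: read set size + write set size > # replicas."""
    return r + w > n


def can_read_stale(n, r, w):
    """True if some choice of r readers is disjoint from some choice of w writers."""
    replicas = range(n)
    return any(
        set(write_set).isdisjoint(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def tolerated_failures(n, set_size):
    """Replica failures that still leave enough replicas to form the set."""
    return n - set_size
```

For three replicas: reading 1 and writing 1 can return stale data, and each operation survives 2 failures; reading 2 and writing 1 can still be stale, with reads surviving 1 failure and writes 2; reading 2 and writing 2 is never stale, and each operation survives only 1 failure.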
Protection and trust
• Define trust
• Expectation of correct behavior
• For anything to be useful, you have to trust something
• Need to protect yourself from components you don’t trust
• How are processes on same machine protected from each other?
• Separate address spaces, managed by the kernel
• Controlled transitions to/from the kernel via system calls
• Everyone trusts the kernel
• How do processes on separate machines protect themselves?
• Have to rely on secure communication, correct protocols
• Confidentiality
• Authentication
• Freshness
Symmetric key encryption
• Keys
• E-key = d-key (hence symmetric)
• Sender and receiver know the key
• Nobody else knows it
• Sometimes called the “secret key”
• Symmetric key algorithms are fast
Figure: encrypt (E) and decrypt (D) use the same key
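To make "E-key = d-key" concrete, here is a toy illustration, not a cipher to use in practice. It assumes a keystream derived by hashing the shared key with a counter (SHA-256 from Python's `hashlib`) and XORs it with the data; the function names are invented for this sketch. A real system would use a vetted cipher such as AES through a maintained library.

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from the shared secret key."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def symmetric_crypt(key, data):
    """XOR data with the keystream. The same call encrypts and decrypts."""
    stream = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))
```

Applying `symmetric_crypt` twice with the same key returns the original bytes, while a different key does not, which is why both sender and receiver must know the key and nobody else may.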
Public key encryption
• Keys
• E-key ≠ d-key
• Typically, encrypt() = decrypt() = crypt()
Figure: the same crypt operation encrypts with the E-key and decrypts with the D-key
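The point that encrypt() and decrypt() are the same operation with different keys can be seen in textbook RSA, sketched below with deliberately tiny primes. The primes, exponent, and function names are assumptions chosen for illustration; real deployments use much larger keys and padding, and this sketch is not secure. It relies on `pow(e, -1, phi)` for the modular inverse, which needs Python 3.8 or later.

```python
def make_toy_keypair(p=61, q=53, e=17):
    """Return ((e, n), (d, n)): a public key and a private key.

    Toy parameters only; p and q are far too small for real use.
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # d is the inverse of e modulo phi
    return (e, n), (d, n)


def rsa_crypt(message, key):
    """One operation for both directions: message ** exponent mod modulus."""
    exponent, modulus = key
    return pow(message, exponent, modulus)
```

Encrypting with the public key and then applying `rsa_crypt` with the private key recovers the message. Running it in the other order (private key first, then public key) is the idea behind a digital signature that anyone holding the public key can check.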
Authenticating SSL public keys
• I want to send my credit card number (CCN) to e-trade
• No one but e-trade should see my message
• E-trade wants to know it’s really me
• Use Secure Sockets Layer (SSL)
Authenticating e-trade
• E-trade has a public key
• How do you learn this public key?
• Web solution: someone else vouches for key
• Often called a certification authority (CA)
• E.g., Verisign
• E-trade sends you their public key
• Public key is digitally signed by Verisign
{“e-trade’s public key is Etrade-public”}verisign-private
Authenticating e-trade
• E-trade has a public key
{“e-trade’s public key is Etrade-public”}verisign-private
• Decrypt using Verisign’s public key (this verifies Verisign’s signature)
• I see that Verisign endorses Etrade-public
• Once talking to e-trade, establish session key
{“use session key K-sec”}Etrade-public
Authenticating e-trade
• Once talking to e-trade, establish session key
• How do you know Verisign’s public key?
• Hard-coded into Firefox/IE binary
• How to trust Firefox binary?
• Downloaded from firefox.com (possibly over SSL)
• Without SSL, maybe downloaded with included cryptographic hash
• Why trust this?
• Went out and verified the hash, got the hash 3 places, …
Certificate authorities (CAs)
• Say we get the right key
• Why do we trust the CAs?
• Because we have to trust something …
• Verisign in 2001
• Issued cert to “someone” pretending to be Microsoft
• Mozilla has list of 36 root CAs
• Indirectly trusts Etisalat (UAE) via Verizon
• Etisalat installed spyware on 100k BlackBerries
• Who controls the CAs? What if they are compromised?
• By government?
• By hackers?
• Lots of interesting questions
Next week
• Review the basics (things you should know)
• Concurrency/synchronization
• Address spaces
• Storage
• After that we’ll start reading papers
• Any questions?