Spring 2006 - Computer Science

COSC 5302-01
ADVANCED OPERATING SYSTEMS
Spring 2006
INSTRUCTOR: DR. LAWRENCE OSBORNE
OFFICE: 201 MAES
OFFICE HOURS: 8:00 a.m. to 9:30 a.m. MW, and by appointment
CLASS MAILING LIST: cs5302@cs.lamar.edu
CLASS WEBPAGE: http://cs.lamar.edu
TEXTBOOKS: Andrew S. Tanenbaum & Maarten van Steen, Distributed
Systems: Principles and Paradigms, Prentice-Hall (2002), ISBN 0-13-088893-1.
Robbins, Kay A., and Robbins, Steven, UNIX Systems Programming:
Communication, Concurrency, and Threads. Prentice-Hall Publishers,
(2003), ISBN 0-13-042411-0.
OTHER SOURCES: Coulouris, Dollimore, and Kindberg, Distributed
Systems: Concepts and Design, Third Edition,
Addison-Wesley, 2001.
Singhal and Shivaratri, Advanced Concepts in
Operating Systems, McGraw-Hill, Inc., 1994.
Tanenbaum, Modern Operating Systems, Second
Edition, Prentice-Hall, 2001.
Distributed Systems, ed. Mullender, Second Edition,
Addison-Wesley, 1993.
M.L. Liu, Distributed Computing, Principles and
Applications, Addison-Wesley, 2004, ISBN 0-201-79644-9.
GRADING: Assigned Readings & Participation in Class: 10 %
Midterm Exam: 15 %
Final Exam: 25 %
Homework and Quizzes: 25 %
Project: 25 %
A weighted average of your total points based on this grading scheme will determine
your final grade.
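As an illustration of the weighted average, here is a minimal sketch in Python. The weights come from the grading scheme above; the component names and the example scores are hypothetical, and the mapping from the resulting average to a letter grade is not specified here.

```python
# Weights from the GRADING section, as percentages of the final grade.
WEIGHTS = {
    "readings_participation": 10,
    "midterm": 15,
    "final": 25,
    "homework_quizzes": 25,
    "project": 25,
}


def weighted_average(scores):
    """Return the course average on a 0-100 scale.

    `scores` maps each component name to a percentage score (0-100).
    """
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Hypothetical scores, for illustration only.
example_scores = {
    "readings_participation": 90,
    "midterm": 80,
    "final": 70,
    "homework_quizzes": 85,
    "project": 95,
}

print(weighted_average(example_scores))  # prints 83.5
```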
PREREQUISITES: Data Structures (COSC 2371), Operating Systems
(COSC 4302), Computer Architecture (COSC 4310), and a course in
probability and statistics. Students should be familiar with C and socket
programming with UNIX system calls.
GOALS
This course is intended to be a second course in operating systems for graduate students
in computer science. The course is meant to provide a basic foundation in the design of
advanced operating systems. Therefore, instead of discussing the design and structure of
a specific operating system, the course emphasizes the fundamental concepts and
mechanisms which form the basis of the design of advanced operating systems. This
course provides an in-depth examination of the principles of distributed systems in
general, and distributed operating systems in particular. Covered topics include processes
and threads, concurrent programming, distributed interprocess communication, deadlock
protection, multiprocessor scheduling, distributed process scheduling, shared virtual
memory, distributed file systems, fault tolerance in distributed systems, distributed
middleware and applications such as the web and peer-to-peer systems. Some coverage
of operating system principles for multiprocessors will also be included. A brief overview
of advanced topics such as multimedia operating systems, real-time operating systems
and mobile computing will be provided, time permitting. The main emphasis is on the
various alternative approaches to the solution of problems encountered in the design of
distributed operating systems.
This course builds upon the topics covered in an undergraduate operating systems course,
such as process synchronization, interprocess communication, and file system
organization.
Course Policies
Late Assignments: Late assignments will be allowed only by prior arrangement with the
instructor. Such assignments will be penalized 10 % for each 24-hour period or fraction
thereof (including weekends) that they are late.
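The late penalty can be sketched as follows. This is only one reading of the policy: it assumes the 10 % is deducted from the score the assignment would otherwise earn, that the deductions add up per period rather than compound, and that the total penalty is capped at 100 %. The function name `late_score` is hypothetical.

```python
import math

PENALTY_PER_PERIOD = 10  # percent per 24-hour period or fraction thereof


def late_score(raw_score, hours_late):
    """Return the score after the late penalty (assumed additive, capped at 100 %)."""
    if hours_late <= 0:
        return raw_score
    # Any started 24-hour period counts as a full period.
    periods = math.ceil(hours_late / 24)
    penalty = min(100, periods * PENALTY_PER_PERIOD)
    return raw_score * (100 - penalty) / 100


print(late_score(90, 0.5))  # 30 minutes late: one period, prints 81.0
print(late_score(90, 25))   # 25 hours late: two periods, prints 72.0
```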
Cheating and Plagiarism: While I do not think that students need to be reminded of these
issues, there is a long and ugly history in computer science of violations of the
Department Honesty Policy, especially in courses with projects. Academic dishonesty is
an egregious offense against the entire class and will not be tolerated.
The Computer Science Department Honesty Policy will be strictly enforced in this
course. Cheating on an examination or quiz will result in a zero on the examination or
quiz. Since your grade will be based on total points, a zero on either the midterm or the
final will reduce your final letter grade considerably. On projects, students who
plagiarize code from the Internet or from any other sources, those who copy source code
from other students, and those students who knowingly or unknowingly allow other
students to copy from them will be penalized with a zero on any homework assignments
in which this occurs.
We expect and encourage students to discuss design strategies with one another, but there
should be NO sharing of code or header files, and all assistance must be cited. You may
work on any Unix machine or Linux machine with a modern C++ compiler. However,
your assignments will be evaluated on one of the Solaris machines in Maes 214. Thus,
we strongly recommend that you develop and test your code on one of these machines.
Violations of the Honesty Policy can ruin your academic career if you are found guilty of
one. If you ever find that you are uncertain about how the Policy applies to your
situation, ask me. There is no reason to take risks on such an important matter as this.
Midterm and Final Examinations: The final examination will be comprehensive. The
final exam will be given May 4, 2006 in Maes 109 from 8 a.m. to 10:30 a.m. The
midterm exam will be given on March 9.
Missed Examinations or Quizzes: If quizzes, the midterm or the final exam are missed, a
makeup test will only be given in the case of a documented illness or death in the family.
The fact that your car does not run or that you wish to be with your girlfriend/boyfriend
at the hospital are examples of excuses that will not be accepted. It is your responsibility
to be in class and on time each class period.
It is the responsibility of the student to find out what assignments have been missed after
returning from an illness or other emergency. I feel no obligation to rescue students who
do not turn in assignments on schedule. In fact, I could not do that even if I were so
inclined.
Incomplete Grades: No incompletes will be given in this course. Make sure that you
determine before the drop deadline whether you can complete it satisfactorily.
Required Readings: One paper from the published literature is assigned for each class
lecture. Papers should be read in advance of the lecture so that students are prepared to
participate in class discussions. The lectures will not simply repeat the material in the
textbook or in the papers. For exams, students are responsible for material covered in
assigned papers, in assigned readings from the textbooks, and in lectures.
Suggested Readings
General Readings

Overview Papers
1. Andrew S. Tanenbaum and Robbert van Renesse, “Distributed
Operating Systems”, Computing Surveys, Vol. 17, No. 4, Pages 419-470,
December 1985
2. E. Levy and A. Silberschatz, “Distributed File Systems: Concepts and
Examples”, ACM Computing Surveys, Vol. 22, No. 4, Pages 321-374,
December 1990

Distributed Computing
1. Jim Basney and Miron Livny, “Deploying a High Throughput Computing
Cluster”, High Performance Cluster Computing, Rajkumar Buyya, Editor,
Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.

2. The Worldwide Computer. An operating system spanning the Internet
would harness the power of millions of the world’s networked PCs.
Scientific American, February 2002
Readings for Chapter 2 on Communication

Remote Procedure Call
1. Andrew Birrell and Bruce Nelson, “Implementing RPCs”, ACM
Transactions on Computer Systems, Vol. 2, No. 1, Pages 39-59, February
1984.
2. B. Bershad, T. Anderson, E. Lazowska, and H. Levy, “Lightweight
Remote Procedure Call”, Proceedings of the 12th ACM Symposium on
Operating Systems Principles, Operating Systems Review, Vol. 23, No. 5,
Pages 102-113, December 1989
3. Tutorials on RPC programming in UNIX and Linux and rpcgen
 Remote Procedure Calls I., Sun RPC, by Francisco Moya
Fernandez

Remote Procedure Calls (RPC) at and
http://www.cs.cf.ac.uk/Dave/C/node33.html#SECTION003300000000000000000
 Protocol Compiling and Lower Level RPC Programming tutorials
on RPC programming at
http://www.cs.cf.ac.uk/Dave/C/node34.html#SECTION003400000000000000000
4. Waldo, “Remote Procedure Calls and Java Remote Method Invocation.”
IEEE Concurrency, vol. 6, no. 8, pp. 5-7, July 1998. Available at
http://www.mcs.vuw.ac.nz/courses/COMP413/2003T1/Handouts/waldo98.pdf
5. Open Grid Architecture, at
http://www.globus.org/research/papers/ogsa.pdf
6. Foster, I., Kesselman, C., Nick, J., and S. Tuecke, “Grid Services for
Distributed System Integration”, IEEE Computer, June 2002, p. 37-46, found at
http://www.globus.org/research/papers/ieee-cs-2.pdf
Readings for Chapter 3 on Processes

Process and Thread Management
1. Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy, “The
Performance Implications of Thread Management Alternatives for
Shared-Memory Multiprocessors”, IEEE Transactions on Computers, Vol. 38,
No. 12, Pages 1631-1644, December 1989
2. Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry
M. Levy, “Scheduler Activations: Effective Kernel Support for the
User-Level Management of Parallelism”, ACM Transactions on Computer
Systems, 10(1), February 1992, pp. 53-79, available at
http://citeseer.nj.nec.com/anderson92scheduler.html

Scheduling
1. D. L. Black, “Scheduling Support for Concurrency and Parallelism in the
Mach Operating System,” IEEE Computer, 23, 5, Pages 35-43, May 1990.

Process Migration
1. F. Douglis and J. Ousterhout, “Process Migration in the Sprite Operating
System”, In Proceedings of the IEEE International Conference on
Distributed Computing Systems, Berlin, Germany, Pages 18-25,
September 1987
2. M. Theimer, K. Lantz, D. Cheriton, “Preemptable Remote Execution”,
Proceedings of the 10th SOSP, Operating Systems Review, Vol. 19, No. 5,
Pages 2-12, December 1985

Readings for Chapter 10: Distributed File Systems

Internet File Systems
1. S. Ghemawat, H. Gobioff, and S-T Leung, “The Google File System,”
SOSP 2003, New York, October 19-22, 2003. Available at
http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf
Readings for Consistency and Replication

Shared Memory Computing
1. Cristiana Amza, Alan L. Cox, Sandhya Dwarkadas, Pete Keleher,
Honghui Lu, Ramakrishnan Rajamony, Weimin Yu, Willy Zwaenepoel,
“TreadMarks: Shared Memory Computing on Networks of
Workstations”, IEEE Computer, 29(2), 1996, p. 18-28.
Readings for Distributed Resource Management and Scheduling

Global Memory Management
1. M. Feeley, W. Morgan, F. Pighin, A. Karlin, H. Levy, and C. Thekkath,
“Implementing Global Memory Management in a Workstation Cluster”,
Proceedings of the 15th ACM Symposium on Operating Systems
Principles, December 1995.
2. A. Arpaci-Dusseau, D. Culler, A. Mainwaring, “Scheduling with Implicit
Information in Distributed Systems”, Sigmetrics’98 Conference on the
Measurement and Modeling of Computer Systems.
3. K. M. Chandy, J. Misra, and L. M. Haas, “Distributed Deadlock
Detection,” ACM Transactions on Computer Systems 1, 2(May 1983), pp.
144-156. Available in the ACM Digital Library.
4. Mamoru Maekawa, “A SQRT(N) Algorithm for Mutual Exclusion in
Decentralized Systems,” ACM Transactions on Computer Systems, Vol.
3, No. 2, May 1985, p. 145-159.
Readings for Mobility

Supporting Mobility in an Operating System
1. M. Baker, X. Zhao, S. Cheshire, J. Stone, “Supporting Mobility in
MosquitoNet”, Proceedings of the 1996 USENIX Conference, San Diego,
CA, January 1996.
2. E. Jul, H. Levy, N. Hutchinson, A. Black, “Fine-Grained Mobility in the
Emerald System”, ACM Transactions on Computer Systems 6(1),
February 1988, pp. 109-133.
Readings in Distributed Systems

General Concepts
1. B. Walker, G. Popek, R. English, C. Kline, and G. Thiel, “The LOCUS
Distributed Operating System”, Proceedings of the 9th ACM Symposium
on Operating Systems Principles, October 1983.
2. Leslie Lamport, “Time, Clocks, and the Ordering of Events in a
Distributed System”, Communications of the ACM, July 1978, pp. 558-565.
Readings in Naming

General Concepts
1. Lecture Notes in Computer Science, 60, Operating Systems – An
Advanced Course, R. Bayer, R.M. Graham, and G. Seegmüller (eds.),
Springer-Verlag, 1978, pp. 99-208, J.H. Saltzer, Chapter 3.A., “Naming
and Binding Objects.”

Names in Distributed Systems
1. Needham, Roger, “Names”, Distributed Systems, second edition, Sape
Mullender, editor, Addison-Wesley, ACM Press, 1993, Chapter 12, pp.
315-327.
2. Hudson, Richard, Morrison, Ron, Moss, J. Eliot B., and Munro, David,
“Garbage Collecting the World: One Car at a Time”.
Readings in Fault Tolerance and Reliable Systems

High-Availability Systems
1. J. Gray, D. Siewiorek, “High-Availability Computer Systems”, Computer
24, 9 (September 1991), pp. 39-48.
2. L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals
Problem”, ACM Transactions on Programming Languages and Systems,
July 1982, pp. 382-401.
3. S.C. Wang, and K.Q. Yan, “Revisiting Fault Diagnosis Agreement in a
New Territory,” Operating Systems Review, April 2004, pp. 41-61.
Readings in Security

Cryptosystems
1. R. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining Digital
Signatures and Public-Key Cryptosystems.” Communications of the ACM, 21:
120-126. February 1978.

Authentication
1. B. Clifford Neuman and Theodore Ts’o, “Kerberos: An Authentication
Service for Computer Networks”, IEEE Communications Magazine,
Volume 32, Number 9, pages 33-38, September 1994.
Readings in Consensus

Asynchronous Systems
1. M. Fischer, N. Lynch, M. Paterson, “Impossibility of Distributed
Consensus with One Faulty Process”, in: Journal of the ACM, April 1985,
vol. 32, no 2, p. 374-382.
2. Ada Waichee Fu, “Delay-Optimal Quorum Consensus for Distributed
Systems”. IEEE Transactions on Parallel and Distributed Systems,
Volume 8 , Issue 1 (January 1997), pages: 59 – 69, 1997.
3. Turek, John, and Shasha, Dennis, “The Many Faces of Consensus in
Distributed Systems.” IEEE Computer Science Press, Volume 25, Issue 6,
pages 8-17, 1992.
Lecture Schedule and Assigned Readings from Papers
January 12: Introduction
January 17: Specifying Distributed Operating Systems
Paper: L. Kleinrock. “Distributed Systems”, Communications of the
ACM, November 1985.
January 19: Clocks and Distributed Snapshots
Paper: L. Lamport. “Time, clocks, and the ordering of events in a
distributed system”, Communications of the ACM, July 1978.
January 24: Synchronization and Agreement
Paper: L. Lamport, R. Shostak, and M. Pease. “The Byzantine generals
Problem”, ACM Transactions on Programming Languages and Systems, July 1982.
January 26: RPC and objects
Paper: A.D. Birrell and B.J. Nelson. “Implementing remote procedure
calls,” ACM Transactions on Computer Systems, February 1984.
January 31: Group Communication
Paper: K.P. Birman. “The process group approach to reliable distributed
computing,” Communications of the ACM, December 1993.
February 2: Distributed Shared Memory
Paper: K. Li and P. Hudak. “Memory coherence in shared virtual
memory systems”, ACM Transactions on Computer Systems, November
1989.
February 7: Naming and Resource Location
Paper: D.C. Oppen and Y.K. Dalal. “The Clearinghouse: A
decentralized agent for locating named objects in a distributed
environment”, ACM Transactions on Office Information Systems, July
1983.
February 9: Distributed name servers
Paper: D.R. Cheriton and T.P. Mann. “Decentralizing a global naming
service for improved performance and fault tolerance”, ACM Transactions
on Computer Systems, May 1989.
February 14: Distributed file systems
Paper: J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan,
R. Sidebotham, and M. West. “Scale and performance in a distributed file
system.” ACM Transactions on Computer Systems, February 1988.
February 16: Encryption
Paper: T. Lomas, L. Gong, J. Saltzer, and R. Needham. “Reducing risks
from poorly chosen keys”, Proceedings Twelfth ACM Symposium on
Operating Systems Principles, Litchfield Park, Arizona, December 1989,
pages 14-18.
February 21: Authentication
Paper: B. Lampson, M. Abadi, M. Burrows, and E. Wobber.
“Authentication in distributed systems: Theory and practice”, Proceedings
Thirteenth ACM Symposium on Operating Systems Principles, Pacific
Grove, California, October 1991, pages 165-182.
February 23: Replicated state machines
Paper: F.B. Schneider. “Implementing fault-tolerant services using the
state machine approach: A tutorial”, ACM Computing Surveys,
December 1990.
February 28: Transactions
Paper: R. Haskin, Y. Malachi, W. Sawdon, and G. Chan. “Recovery
management in QuickSilver”, ACM Transactions on Computer Systems,
February 1988.
March 2: Replicated data
Paper: D.K. Gifford. “Weighted voting for replicated data”, Proceedings
of the Seventh ACM Symposium on Operating Systems Principles, Pacific
Grove, California, December 1979, pages 150-162.
March 9: Midterm Exam
March 13-March 17: Spring Break
March 21: Mobility and disconnected operation
Paper: D.B. Terry, M.M. Theimer, K. Petersen, A.J. Demers, M.J.
Spreitzer, and C.H. Hauser. “Managing update conflicts in Bayou, a
weakly connected replicated storage system”, Proceedings of the Fifteenth
ACM Symposium on Operating Systems Principles, Copper Mountain,
Colorado, December 1995, pages 172-183.
March 23: Peer-to-Peer systems
Paper: I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H.
Balakrishnan. “Chord: A Scalable Peer-to-Peer lookup service for Internet
Applications”, Proceedings of ACM SIGCOMM Conference, 2001, pages
149-160.
March 28: Web and content distribution networks
Paper: S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, and H. Levy. “An
analysis of internet content delivery systems”, Proceedings of the Fifth
Symposium on Operating Systems Design and Implementation, 2002.
March 30: Distributed Systems Growth and Evolution
Paper: M. Schroeder, A. Birrell, and R. Needham. “Experience with
Grapevine: The Growth of a Distributed System”, ACM Transactions on
Computer Systems, February 1984.
April 4: Ubiquitous Computing
Paper: M. Weiser. “Some computer science issues in ubiquitous
computing”, Communications of the ACM, July 1993.
April 6: Failure Detection
Paper: Michel Reynal. “A short introduction to failure detectors for
Asynchronous Distributed Systems”, ACM SIGACT News, 2005, pages
53-70.
April 11: Clusters
Paper: Fox, Gribble, Chawathe, Brewer, and Gauthier. “Cluster-Based
Scalable Network Services”, Proceedings of the Sixteenth ACM
Symposium on Operating Systems Principles, 1997, pages 78-91.
April 13: Multicast Group Communications
Paper: Moser, Melliar-Smith, Agarwal, Budhia, and Lingley-Papadopoulos.
“Totem: A fault-tolerant multicast group communication system”,
Communications of the ACM, April 1996, pages 54-63.
April 18: Review for Final Examination
April 20: Demonstrations of Student Projects
April 27: Demonstrations of Student Projects
May 2: Dead Day
May 4: Final Examination