COSC 5302-01 ADVANCED OPERATING SYSTEMS
Spring 2006

INSTRUCTOR: DR. LAWRENCE OSBORNE
OFFICE: 201 MAES
OFFICE HOURS: 8:00 a.m. to 9:30 a.m. MW, and by appointment
CLASS MAILING LIST: cs5302@cs.lamar.edu
CLASS WEBPAGE: http://cs.lamar.edu

TEXTBOOKS:
Andrew S. Tanenbaum & Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice-Hall (2002), ISBN 0-13-088893-1.
Robbins, Kay A., and Robbins, Steven, UNIX Systems Programming: Communication, Concurrency, and Threads, Prentice-Hall (2003), ISBN 0-13-042411-0.

OTHER SOURCES:
Coulouris, Dollimore, and Kindberg, Distributed Systems: Concepts and Design, Third Edition, Addison-Wesley, 2001.
Singhal and Shivaratri, Advanced Concepts in Operating Systems, McGraw-Hill, Inc., 1994.
Tanenbaum, Modern Operating Systems, Second Edition, Prentice-Hall, 2001.
Distributed Systems, ed. Mullender, Second Edition, Addison-Wesley, 1993.
M.L. Liu, Distributed Computing: Principles and Applications, Addison-Wesley, 2004, ISBN 0-201-79644-9.

GRADING:
Assigned Readings & Participation in Class: 10 %
Midterm Exam: 15 %
Final Exam: 25 %
Homework and Quizzes: 25 %
Project: 25 %
A weighted average of your total points based on this grading scheme will determine your final grade.

PREREQUISITES: Data Structures (COSC 2371), Operating Systems (COSC 4302), Computer Architecture (COSC 4310), and a course in probability and statistics. Students should be familiar with C and socket programming with UNIX system calls.

GOALS
This course is intended to be a second course in operating systems for graduate students in computer science. The course is meant to provide a basic foundation in the design of advanced operating systems. Therefore, instead of discussing the design and structure of a specific operating system, the course emphasizes the fundamental concepts and mechanisms which form the basis of the design of advanced operating systems.
This course provides an in-depth examination of the principles of distributed systems in general, and distributed operating systems in particular. Covered topics include processes and threads, concurrent programming, distributed interprocess communication, deadlock protection, multiprocessor scheduling, distributed process scheduling, shared virtual memory, distributed file systems, fault tolerance in distributed systems, distributed middleware, and applications such as the web and peer-to-peer systems. Some coverage of operating system principles for multiprocessors will also be included. A brief overview of advanced topics such as multimedia operating systems, real-time operating systems, and mobile computing will be provided, time permitting. The main emphasis is on the various alternative approaches to the solution of problems encountered in the design of distributed operating systems. This course builds upon the topics covered in an undergraduate operating systems course, such as process synchronization, interprocess communication, and file system organization.

Course Policies

Late Assignments: Late assignments will be allowed only by prior arrangement with the instructor. Such assignments will be penalized 10 % for each 24-hour period or fraction thereof (including weekends) that they are late.

Cheating and Plagiarism: While I do not think that students need to be reminded of these issues, there is a long and ugly history in computer science of violations of the Department Honesty Policy, especially in courses with projects. Academic dishonesty is an egregious offense against the entire class and will not be tolerated. The Computer Science Department Honesty Policy will be strictly enforced in this course. Cheating on an examination or quiz will result in a zero on the examination or quiz. Since your grade will be based on total points, a zero on either the midterm or the final will reduce your final letter grade considerably.
On projects, students who plagiarize code from the Internet or from any other sources, those who copy source code from other students, and those students who knowingly or unknowingly allow other students to copy from them will be penalized with a zero on any homework assignments in which this occurs. We expect and encourage students to discuss design strategies with one another, but there should be NO sharing of code or header files, and all assistance must be cited. You may work on any Unix machine or Linux machine with a modern C++ compiler. However, your assignments will be evaluated on one of the Solaris machines in Maes 214. Thus, we strongly recommend that you develop and test your code on one of these machines. A violation of the Honesty Policy can ruin your academic career if you are found guilty of it. If you ever find that you are uncertain about how the Policy applies to your situation, ask me. There is no reason to take risks on such an important matter as this.

Midterm and Final Examinations: The final examination will be comprehensive. The final exam will be given May 4, 2006 in Maes 109 from 8 a.m. to 10:30 a.m. The midterm exam will be given on March 9.

Missed Examinations or Quizzes: If quizzes, the midterm, or the final exam are missed, a makeup test will be given only in the case of a documented illness or death in the family. The fact that your car does not run or that you wish to be with your girlfriend/boyfriend at the hospital are examples of excuses that will not be accepted. It is your responsibility to be in class and on time each class period. It is the responsibility of the student to find out what assignments have been missed after returning from an illness or other emergency. I feel no obligation to rescue students who do not turn in assignments on schedule. In fact, I could not do that even if I were so inclined.

Incomplete Grades: No incompletes will be given in this course.
Make sure that you determine before the drop deadline whether you can complete the course satisfactorily.

Required Readings: One paper from the published literature is assigned for each class lecture. Papers should be read in advance of the lecture so that students are prepared to participate in class discussions. The lectures will not simply repeat the material in the textbook or in the papers. For exams, students are responsible for material covered in assigned papers, in assigned readings from the textbooks, and in lectures.

Suggested Readings

General Readings

Overview Papers
1. Andrew S. Tanenbaum and Robbert van Renesse, “Distributed Operating Systems”, Computing Surveys, Vol. 17, No. 4, Pages 419-470, December 1985.
2. E. Levy and A. Silberschatz, “Distributed File Systems: Concepts and Examples”, ACM Computing Surveys, Vol. 22, No. 4, Pages 321-374, December 1990.

Distributed Computing
1. Jim Basney and Miron Livny, “Deploying a High Throughput Computing Cluster”, High Performance Cluster Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
2. The Worldwide Computer. An operating system spanning the Internet would harness the power of millions of the world’s networked PCs. Scientific American, February 2002.

Readings for Chapter 2 on Communication

Remote Procedure Call
1. Andrew Birrell and Bruce Nelson, “Implementing RPCs”, ACM Transactions on Computer Systems, Vol. 2, No. 1, Pages 39-59, February 1984.
2. B. Bershad, T. Anderson, E. Lazowska, and H. Levy, “Lightweight Remote Procedure Call”, Proceedings of the 12th ACM Symposium on Operating Systems Principles, Operating Systems Review, Vol. 23, No. 5, Pages 102-113, December 1989.
3.
Tutorials on RPC programming in UNIX and Linux and rpcgen: Remote Procedure Calls I., Sun RPC, by Francisco Moya Fernandez; Remote Procedure Calls (RPC), at http://www.cs.cf.ac.uk/Dave/C/node33.html#SECTION003300000000000000000; and Protocol Compiling and Lower Level RPC Programming, at http://www.cs.cf.ac.uk/Dave/C/node34.html#SECTION003400000000000000000
4. Waldo, “Remote Procedure Calls and Java Remote Method Invocation”, IEEE Concurrency, vol. 6, no. 8, pp. 5-7, July 1998. Available at http://www.mcs.vuw.ac.nz/courses/COMP413/2003T1/Handouts/waldo98.pdf
5. Open Grid Architecture, at http://www.globus.org/research/papers/ogsa.pdf
6. Foster, I., Kesselman, C., Nick, J., and S. Tuecke, “Grid Services for Distributed Integration”, IEEE Computer, June 2002, p. 37-46, found at http://www.globus.org/research/papers/ieee-cs-2.pdf

Readings for Chapter 3 on Processes

Process and Thread Management
1. Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy, “The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors”, IEEE Transactions on Computers, Vol. 38, No. 12, Pages 1631-1644, December 1989.
2. Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy, “Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism”, ACM Transactions on Computer Systems, 10(1), February 1992, pp. 53-79, available at http://citeseer.nj.nec.com/anderson92scheduler.html

Scheduling
1. D. L. Black, “Scheduling Support for Concurrency and Parallelism in the Mach Operating System”, IEEE Computer, 23, 5, Pages 35-43, May 1990.

Process Migration
1. F. Douglis and J. Ousterhout, “Process Migration in the Sprite Operating System”, In Proceedings of the IEEE International Conference on Distributed Computing Systems, Berlin, Germany, Pages 18-25, September 1987.
2.
M. Theimer, K. Lantz, D. Cheriton, “Preemptable Remote Execution”, Proceedings of the 10th SOSP, Operating Systems Review, Vol. 19, No. 5, Pages 2-12, December 1985.

Readings for Chapter 10: Distributed File Systems

Internet File Systems
1. S. Ghemawat, H. Gobioff, and S-T Leung, “The Google File System”, SOSP 2003, New York, October 19-22, 2003. Available at http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf
2.

Readings for Consistency and Replication

Shared Memory Computing
1. Christiana Amza, Alan L. Cox, Sandhya Dwarkadas, Pete Keleher, Honghui Lu, Ramakrishnan Rajamony, Weimin Yu, Willy Zwaenepoel, “TreadMarks: Shared Memory Computing on Networks of Workstations”, IEEE Computer, 29(2), 1996, p. 18-28.

Readings for Distributed Resource Management and Scheduling

Global Memory Management
1. M. Feeley, W. Morgan, F. Pighin, A. Karlin, H. Levy, and C. Thekkath, “Implementing Global Memory Management in a Workstation Cluster”, Proceedings of the 15th ACM Symposium on Operating Systems Principles, December 1995.
2. A. Arpaci-Dusseau, D. Culler, A. Mainwaring, “Scheduling with Implicit Information in Distributed Systems”, Sigmetrics’98 Conference on the Measurement and Modeling of Computer Systems.
3. K. M. Chandy, J. Misra, and L. M. Haas, “Distributed Deadlock Detection”, ACM Transactions on Computer Systems 1, 2 (May 1983), pp. 144-156. Available in the ACM Digital Library.
4. Mamoru Maekawa, “A SQRT(N) Algorithm for Mutual Exclusion in Decentralized Systems”, ACM Transactions on Computer Systems, Vol. 3, No. 2, May 1985, p. 145-159.

Readings for Mobility

Supporting Mobility in an Operating System
1. M. Baker, X. Zhao, S. Cheshire, J. Stone, “Supporting Mobility in MosquitoNet”, Proceedings of the 1996 USENIX Conference, San Diego, CA, January 1996.
2. E. Jul, H. Levy, N. Hutchinson, A. Black, “Fine-Grained Mobility in the Emerald System”, ACM Transactions on Computer Systems 6(1), February 1988, pp. 109-133.

Readings in Distributed Systems

General Concepts
1. B. Walker, G. Popek, R. English, C. Kline, and G. Thiel, “The LOCUS Distributed Operating System”, Proceedings of the 9th ACM Symposium on Operating Systems Principles, October 1983.
2. Leslie Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System”, Communications of the ACM, July 1978, pp. 558-564.

Readings in Naming

General Concepts
1. Lecture Notes in Computer Science, 60, Operating Systems – An Advanced Course, R. Bayer, R.M. Graham, and G. Seegmüller (eds.), Springer-Verlag, 1978, pp. 99-208, J.H. Saltzer, Chapter 3.A., “Naming and Binding Objects.”

Names in Distributed Systems
1. Needham, Roger, “Names”, Distributed Systems, second edition, Sape Mullender, editor, Addison-Wesley, ACM Press, 1993, Chapter 12, pp. 315-327.
2. Hudson, Richard, Morrison, Ron, Moss, J. Eliot B., and Munro, David, “Garbage Collecting the World: One Car at a Time”.

Readings in Fault Tolerance and Reliable Systems

High-Availability Systems
1. J. Gray, D. Siewiorek, “High-Availability Computer Systems”, Computer 24, 9 (September 1991), pp. 39-48.
2. L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals Problem”, ACM Transactions on Programming Languages and Systems, July 1982, pp. 382-401.
3. S.C. Wang and K.Q. Yan, “Revisiting Fault Diagnosis Agreement in a New Territory”, Operating Systems Review, April 2004, pp. 41-61.

Readings in Security

Cryptosystems
1. R. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, 21: 120-126, February 1978.

Authentication
1. B. Clifford Neuman and Theodore Ts’o, “Kerberos: An Authentication Service for Computer Networks”, IEEE Communications Magazine, Volume 32, Number 9, pages 33-38, September 1994.

Readings in Consensus

Asynchronous Systems
1. M. Fischer, N. Lynch, M. Paterson, “Impossibility of Distributed Consensus with One Faulty Process”, in: Journal of the ACM, April 1985, vol.
32, no. 2, p. 374-382.
2. Ada Waichee Fu, “Delay-Optimal Quorum Consensus for Distributed Systems”, IEEE Transactions on Parallel and Distributed Systems, Volume 8, Issue 1 (January 1997), pages 59-69.
3. Turek, John, and Shasha, Dennis, “The Many Faces of Consensus in Distributed Systems”, IEEE Computer Science Press, Volume 25, Issue 6, pages 8-17, 1992.

Lecture Schedule and Assigned Readings from Papers

January 12: Introduction

January 17: Specifying Distributed Operating Systems
Paper: L. Kleinrock. “Distributed Systems”, Communications of the ACM, November 1985.

January 19: Clocks and Distributed Snapshots
Paper: L. Lamport. “Time, clocks, and the ordering of events in a distributed system”, Communications of the ACM, July 1978.

January 24: Synchronization and Agreement
Paper: L. Lamport, R. Shostak, and M. Pease. “The Byzantine generals problem”, ACM Transactions on Programming Languages and Systems, July 1982.

January 26: RPC and objects
Paper: A.D. Birrell and B.J. Nelson. “Implementing remote procedure calls”, ACM Transactions on Computer Systems, February 1984.

January 31: Group Communication
Paper: K.P. Birman. “The process group approach to reliable distributed computing”, Communications of the ACM, December 1993.

February 2: Distributed Shared Memory
Paper: K. Li and P. Hudak. “Memory coherence in shared virtual memory systems”, ACM Transactions on Computer Systems, November 1989.

February 7: Naming and Resource Location
Paper: D.C. Oppen and Y.K. Dalal. “The Clearinghouse: A decentralized agent for locating named objects in a distributed environment”, ACM Transactions on Office Information Systems, July 1983.

February 9: Distributed name servers
Paper: D.R. Cheriton and T.P. Mann. “Decentralizing a global naming service for improved performance and fault tolerance”, ACM Transactions on Computer Systems, May 1989.

February 14: Distributed file systems
Paper: J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, and M. West.
“Scale and performance in a distributed file system”, ACM Transactions on Computer Systems, February 1988.

February 16: Encryption
Paper: T. Lomas, L. Gong, J. Saltzer, and R. Needham. “Reducing risks from poorly chosen keys”, Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, Litchfield Park, Arizona, December 1989, pages 14-18.

February 21: Authentication
Paper: B. Lampson, M. Abadi, M. Burrows, and E. Wobber. “Authentication in distributed systems: Theory and practice”, Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, Pacific Grove, California, October 1991, pages 165-182.

February 23: Replicated state machines
Paper: F.B. Schneider. “Implementing fault-tolerant services using the state machine approach: A tutorial”, ACM Computing Surveys, December 1990.

February 28: Transactions
Paper: R. Haskin, Y. Malachi, W. Sawdon, and G. Chan. “Recovery management in QuickSilver”, ACM Transactions on Computer Systems, February 1988.

March 2: Replicated data
Paper: D.K. Gifford. “Weighted voting for replicated data”, Proceedings of the Seventh ACM Symposium on Operating Systems Principles, Pacific Grove, California, December 1979, pages 150-162.

March 9: Midterm Exam

March 13-March 17: Spring Break

March 21: Mobility and disconnected operation
Paper: D.B. Terry, M.M. Theimer, K. Petersen, A.J. Demers, M.J. Spreitzer, and C.H. Hauser. “Managing update conflicts in Bayou, a weakly connected replicated storage system”, Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, Copper Mountain, Colorado, December 1995, pages 172-183.

March 23: Peer-to-Peer systems
Paper: I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan. “Chord: A Scalable Peer-to-Peer lookup service for Internet Applications”, Proceedings of ACM SIGCOMM Conference, 2001, pages 149-160.

March 28: WEB and content distribution networks
Paper: S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, and H. Levy.
“An analysis of internet content delivery systems”, Proceedings of the Fifth Symposium on Operating Systems Design and Implementation, 2002.

March 30: Distributed Systems Growth and Evolution
Paper: M. Schroeder, A. Birrell, and R. Needham. “Experience with Grapevine: The Growth of a Distributed System”, ACM Transactions on Computer Systems, February 1984.

April 4: Ubiquitous Computing
Paper: M. Weiser. “Some computer science issues in ubiquitous computing”, Communications of the ACM, July 1993.

April 6: Failure Detection
Paper: Michel Reynal. “A short introduction to failure detectors for Asynchronous Distributed Systems”, ACM SIGACT News, 2005, pages 53-70.

April 11: Clusters
Paper: Fox, Gribble, Chawathe, Brewer, and Gauthier. “Cluster-Based Scalable Network Services”, Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, 1997, pages 78-91.

April 13: Multicast Group Communications
Paper: Moser, Melliar-Smith, Agarwal, Budhia, and Lingley-Papadopoulos. “Totem: A fault-tolerant multicast group communication system”, Communications of the ACM, April 1996, pages 54-63.

April 18: Review for Final Examination

April 20: Demonstrations of Student Projects

April 27: Demonstrations of Student Projects

May 2: Dead Day

May 4: Final Examination