CS194-3/CS16x Introduction to Systems Lecture 1 What is a “System”? August 27, 2007 Prof. Anthony D. Joseph http://www.cs.berkeley.edu/~adj/cs16x Who am I? Professor Anthony D. Joseph • 465 Soda Hall (RAD Lab) • adj AT cs.berkeley.edu • Office hours Mon/Tue 1-2pm in 413 Soda • Background: – MIT undergrad and grad student • Research areas: – Current: Network security, OS security, very large security testbeds – Other: Mobile computing, wireless networking, cellular telephony 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.2 Goals for Today • Motivation for a new course • Topics: – Operating systems, Databases, Networking, Security, Software engineering, Distributed systems • Complexity Interactive is important! Ask Questions! Note: Some slides and/or pictures in the following are adapted from slides ©2005 Silberschatz, Galvin, and Gagne. Slides courtesy of Kubiatowicz, AJ Shankar, George Necula, Alex Aiken, Eric Brewer, Ras Bodik, Ion Stoica, Doug Tygar, and David Wagner. 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.3 Why Change CS 162? • Only minor changes since early 1990’s… – Slides! – Java version of Nachos – Content: More crypto/security, less databases and distributed filesystems – Time to update again!! • Most CS students take CS 162 and 186 – But, not all take EE 122, CS 169/161 – We’d like all students to have a basic understanding of key concepts from these classes • Each class introduces the same topics with classspecific biases 8/27/07 – Concurrency in an Operating System versus in a Database – Introduce concepts with a common framework Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.4 Rapid Underlying Technology Change • “Cramming More Components onto Integrated Circuits” – Gordon Moore, Electronics, 1965 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.5 Computing Devices Everywhere 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.6 People-to-Computer Ratio Over Time From David Culler 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.7 Increasing Software Complexity From MIT’s 6.033 course 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.8 But, Latency Improves Slowly… From MIT’s 6.033 course 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.9 Heat is a Major Problem! From MIT’s 6.033 course 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.10 The Internet 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.11 The Dark Side of the Internet… 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.12 Zombie Networks • Click on the link and you join the STORM zombie network of 250K-10M “0wned” PCs • Zombies used by malicious hackers (crackers) for phishing, spamming, identity theft, extortion • Crackers build zombie networks of 10K-1M compromised machines & sell services – Ex: Take down competitor's website for $1K • Hugely profitable! – Massive spamming, ID fraud through phishing – Roughly half of all spam is sent by zombies • How can we secure our machines against folks like this? 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.13 Complexity • How to manage complexity at all levels? • Many issues and many tradeoffs • Need a global view of systems – Decompose into components • Need a global understanding of systems – Applications, networks, databases, operating systems, security, software engineering… 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.14 Course Administration • Instructor: Anthony D. Joseph (adj AT cs.berkeley.edu) 465 Soda Hall Office Hours: M/Tu 1-2 in 413 Soda • TAs: Kai Xia (dk7x@berkeley.edu) • Website: http://www.cs.berkeley.edu/~adj/cs16x • Reader and book: TBA – Most likely: Silberschatz, Galvin, and Gagne, Operating Systems Concepts, 7th Ed., 2005 • Projects: First project will likely be Nachos-based • Grading: TBA 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.15 Topic Coverage • • • • • • • • • • • • 8/27/07 Managing complexity (abstractions, layering, modularity) Team programming, IDEs, documentation style OS, memory, database, and network security Kernel and address spaces, Address translation, Caching, TLBs, demand paging I/O Systems, File systems, directories, database buffer pools, tuple layouts, files of tuples Internet evolution, architectures, protocols, routing, P2P and overlay networks, Concurrency, processes, threads, ACID Enforcing mutual exclusion, serializability, 2PL, logging, recovery, deadlock Viruses, worms, and botnets, DDoS, Cryptographic algorithms: RSA, MD5, DES Simple authentication protocols, PKI Query (dataflow) operators, map-reduce Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.16 Creating Software Is Awesome • It’s like art: – There’s a vision, a realization, an aesthetic appeal, a sense of ownership and satisfaction • It’s not like art: – The end result is useful » To you, and anyone else • It’s immensely satisfying to do – Your project is your baby » It’ll keep you up at night, make you proud… » But won’t disown you when it’s 14 (though you might disown it) • Good software engineering can be learned 8/27/07 – But it is hard to teach – Most people only learn through experience (making mistakes) Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.17 Group Project Simulates Industrial Environment • Project teams have 4 or 5 members in same discussion section – Must work in groups in “the real world” • Communicate with colleagues (team members) – – – – – Communication problems are natural What have you done? What answers you need from others? You must document your work!!! Everyone must keep an on-line notebook • Communicate with supervisor (TAs) – How is the team’s plan? – Short progress reports are required: 8/27/07 » What is the team’s game plan? » What is each member’s responsibility? Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.18 Typical Lecture Format Attention 20 min. Break 25 min. Break 25 min. “In Conclusion, ...” Time • • • • • • • 1-Minute Review 20-Minute Lecture 5- Minute Administrative Matters 25-Minute Lecture 5-Minute Break (water, stretch) 25-Minute Lecture Instructor will come to class early & stay after to answer questions 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.19 Academic Dishonesty Policy • Copying all or part of another person's work, or using reference material not specifically allowed, are forms of cheating and will not be tolerated. A student involved in an incident of cheating will be notified by the instructor and the following policy will apply: • • • • http://www.eecs.berkeley.edu/Policies/acad.dis.shtml The instructor may take actions such as: – require repetition of the subject work, – assign an F grade or a 'zero' grade to the subject work, – for serious offenses, assign an F grade for the course. The instructor must inform the student and the Department Chair in writing of the incident, the action taken, if any, and the student's right to appeal to the Chair of the Department Grievance Committee or to the Director of the Office of Student Conduct. The Office of Student Conduct may choose to conduct a formal hearing on the incident and to assess a penalty for misconduct. The Department will recommend that students involved in a second incident of cheating be dismissed from the University. 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.20 Computer System Organization • Computer-system operation – One or more CPUs, device controllers connect through common bus providing access to shared memory – Concurrent execution of CPUs and devices competing for memory cycles 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.21 Example: Some Mars Rover Requirements • Serious hardware limitations/complexity: – 20Mhz powerPC processor, 128MB of RAM – cameras, scientific instruments, batteries, solar panels, and locomotion equipment – Many independent processes work together • Can’t hit reset button very easily! – Must reboot itself if necessary – Always able to receive commands from Earth • Individual Programs must not interfere – Suppose the MUT (Martian Universal Translator Module) buggy – Better not crash antenna positioning software! • Further, all software may crash occasionally – Automatic restart with diagnostics sent to Earth – Periodic checkpoint of results saved? • Certain functions time critical: – Need to stop before hitting something – Must track orbit of Earth for communication 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.22 How do we tame complexity? • Every piece of computer hardware different – Different CPU » Pentium, PowerPC, ColdFire, ARM, MIPS – Different amounts of memory, disk, … – Different types of devices » Mice, Keyboards, Sensors, Cameras, Fingerprint readers – Different networking environment » Cable, DSL, Wireless, Firewalls,… • Questions: – Does the programmer need to write a single program that performs many independent activities? – Does every program have to be altered for every piece of hardware? – Does a faulty program crash everything? – Does every program have access to all hardware? 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.23 OS Tool: Virtual Machine Abstraction Application Operating System Hardware Virtual Machine Interface Physical Machine Interface • Software Engineering Problem: – Turn hardware/software quirks what programmers want/need – Optimize for convenience, utilization, security, reliability, etc… • For Any OS area (e.g. file systems, virtual memory, networking, scheduling): – What’s the hardware interface? (physical reality) – What’s the application interface? (nicer abstraction) 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.24 Interfaces Provide Important Boundaries software instruction set hardware • Why do interfaces look the way that they do? – History, Functionality, Stupidity, Bugs, Management – CS152 Machine interface – CS160 Human interface – EE122 Protocol stack – CS169 Software engineering/management • Should responsibilities be pushed across boundaries? – RISC architectures, Graphical Pipeline Architectures 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.25 Virtual Machines • Software emulation of an abstract machine – Make it look like hardware has features you want – Programs from one hardware & OS on another one • Programming simplicity – – – – Each process thinks it has all memory/CPU time Each process thinks it owns all devices Different Devices appear to have same interface Device Interfaces more powerful than raw hardware » Bitmapped display windowing system » Ethernet card reliable, ordered, networking (TCP/IP) • Fault Isolation – Processes unable to directly impact other processes – Bugs cannot crash whole machine • Protection and Portability – Java interface safe and stable across many platforms 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.26 Four Components of a Computer System Definition: An operating system implements a virtual machine that is (hopefully) easier and safer to program and use than the raw hardware. 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.27 Virtual Machines (con’t): Layers of OSs • Useful for OS development – When OS crashes, restricted to one VM – Can aid testing programs on other OSs 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.28 What does an Operating System do? • Silerschatz and Gavin: “An OS is Similar to a government” – Begs the question: does a government do anything useful by itself? • Coordinator and Traffic Cop: – Manages all resources – Settles conflicting requests for resources – Prevent errors and improper use of the computer • Facilitator: – Provides facilities that everyone needs – Standard Libraries, Windowing systems – Make application programming easier, faster, less error-prone • Some features reflect both tasks: – E.g. File system is needed by everyone (Facilitator) – But File system must be Protected (Traffic Cop) 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.29 OS Systems Principles • OS as illusionist: – Make hardware limitations go away – Provide illusion of dedicated machine with infinite memory and infinite processors • OS as government: – Protect users from each other – Allocate resources efficiently and fairly • OS as complex system: – Constant tension between simplicity and functionality or performance • OS as history teacher – Learn from past – Adapt as hardware tradeoffs change 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.30 BREAK Data Complexity • Need to store information? • Can put it in a file • Too big or too complex? – Use a database • How big is the web? – 400 million hosts – 15-30 billion pages (http://www.pandia.com/sew/383-web-size.html)… • With a billion users looking for information 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.32 What is a Database System Today? 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.33 More Complex Database Systems 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.34 So… What is a Database? • We will be broad in our interpretation • A Database: – A very large, integrated collection of data. • Typically models a real-world “enterprise” – Entities (e.g., teams, games) – Relationships (e.g. The A’s are playing in the World Series) • Might surprise you how flexible this is – Web search: » Entities: words, documents » Relationships: word in document, document links to document. – P2P filesharing: » Entities: words, filenames, hosts » Relationships: word in filename, file available at host 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.35 What is a Database Management System? • A Database Management System (DBMS) is: – A software system designed to store, manage, and facilitate access to databases. • Typically this term used narrowly – Relational databases with transactions » E.g. Oracle, DB2, SQL Server – Mostly because they predate other large repositories » Also because of technical richness – When we say DBMS in this class we will usually follow this convention » But keep an open mind about applying the ideas! 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.36 Is the WWW a DBMS? • Fairly sophisticated search available – Crawler indexes pages on the web – Keyword-based search for pages • But, currently – data is mostly unstructured and untyped – search only: » can’t modify the data » can’t get summaries, complex combinations of data – few guarantees provided for freshness of data, consistency across data items, fault tolerance, … – Web sites typically have a (relational) DBMS in the background to provide these functions. • The picture is changing quickly – Information Extraction to get structure from unstructured – New standards e.g., XML, Semantic Web can help data modeling 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.37 “Search” versus Query • What if you wanted to find out which actors donated to John Kerry’s presidential campaign? • Try “actors donated to john kerry” in your favorite search engine. • If it isn’t “published”, it can’t be searched! 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.38 A “Database Query” Approach 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.39 “Yahoo Actors” JOIN “FECInfo” 8/27/07 (Courtesy of the Telegraph research group Joseph CS194-3/16x ©UCB Fall @Berkeley) 2007 Lec 1.40 Why Study Systems – OS/Net/DB/Sec/SE? • Learn how to build complex systems: – How can you manage complexity for future projects? • Engineering issues: – Why is the web so slow sometimes? Can you fix it? – What features should be in the next mars Rover? – How do large distributed systems work? (e.g. Skype) • Business issues: – Will my web services application scale to 1M users? • Buying and using a personal computer: – Why different PCs with same CPU behave differently? – Should you upgrade to Vista or wait? – Why does Microsoft have such a bad name (and Apple a good name)? • Security, viruses, and worms – What exposure do you have to worry about? 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007 Lec 1.41