COS 109
Monday October 26

• Housekeeping
  – Midterm this week
      90 minutes to take it
      No class next Wednesday (can do midterm here then). I will bring copies.
      Will be available from Wednesday 12:01 AM to Friday 11:59 PM on Blackboard
  – Midterm review today (10/26), 7:00-8:30 PM in Aaron Burr 219
      See website for a list of concepts and details of exam
      Open book and notes
      Can use computer only to read lecture notes and as a calculator
      Cannot use computer, phone, tablet, … for anything else
      Cannot discuss the exam with others until after midnight on Friday
  – Problem Set 4 returned today
      Complications about state machines giving output
• Today’s class
  – Software we depend on
      File systems (continued)
      The cloud
      Applications – APIs and SDKs

State machines – what if peanuts are purchased
• Incorrect answer
[State diagram not reproduced; visible labels: states 0, 20, 25, 30, 35; transitions p]

State machines – what if peanuts are purchased
• Correct answer
[State diagram not reproduced; visible labels: states 0, 5, 10, 15, 20, 25, 30, 35; transitions p/p]

State machines
• Let’s look at the state where there are 30 cents input
[State diagram not reproduced; visible labels: states 0, 10, 30; transitions d/r, n/r, p/p, q/r, c/c, r/r]

Levels of network file systems
• Your PC
  – all files are local
  – a reference to foo.jpg shows the image if it is in the same folder
• cpanel
  – what is there is what you have uploaded
  – a reference to foo.jpg shows the image if it is in the same folder
  – foo.jpg will not display if it is on your laptop and you are looking at your webpage on cpanel
• The H: drive (mounting from Princeton)
  – files can be made to appear to be local to your PC
  – foo.jpg on the H: drive will display if it is properly referenced as H:\foo.jpg

Grades on Problem Set 4
[Bar chart not reproduced; vertical axis 0 to 20, horizontal axis 1 to 49]
Graded out of 20, average score 18.1

How the file system converts logical to physical
• disk is physically organized into sectors, or blocks of bytes
  – each sector is a fixed number of bytes (like 512 or 1024 or …)
  – reading and writing always happens in
sector-sized blocks
• each file occupies an integral number of blocks
  – files never share a block
  – some space is wasted: a 1-byte file wastes all but 1 byte of the block
• if a file is bigger than one block, it occupies several blocks
  – the blocks are not necessarily adjacent on the disk
• need a way to keep track of the blocks that make up the file
• this is usually done by a separate "file allocation table" that lists the blocks that make up each file
  – this table is stored on disk too so it persists when machine is turned off
  – lots of ways to implement this

Converting logical to physical, continued
• every block is part of some file, or reserved by operating system, or unused
• "file allocation table" keeps track of blocks
  – by chaining/linking them together
      first block of a file points to second, second points to third, etc.
      last block doesn't point to a successor (because it doesn't have one)
  – or (much more common) by some kind of table or array that keeps track of related blocks
• also keeps track of unused blocks
  – disk starts out with most blocks unused ("free")
      some are reserved for file allocation table, etc.
  – as a file grows, blocks are removed from the unused list and attached to the list for the file:
      to grow a file, remove a block from the list of unused blocks and add it to the blocks for the file

Converting logical to physical: directories
• a directory / folder is a file
  – stored in the same file system
  – uses the same mechanisms
• but it contains information about other files and directories
• the directory entry for a file tells where to find the blocks
• the directory entry also contains other info about the file
  – name (e.g., midterm.doc)
  – size in bytes, date/time of changes, access permissions
  – whether it's an ordinary file or a directory
• the file system maintains the info in a directory
  – very important to keep directory info consistent
  – application programs can change it only indirectly / implicitly

What happens when you say "Open"?
• search for file in sequence of directories as given by components of its name
  – report an error if any component can't be found
• read blocks of the file as needed
  – using the location information in the file allocation table to find the blocks
  – store (some of) them in RAM

What happens when you say "Save"?
• make sure there's enough space (enough unused blocks)
  – don't want to run out while copying from RAM to disk
• create a temporary file with no bytes in it
• copy the bytes from RAM and/or existing file to temporary file:
      while (there are still bytes to be copied) {
          get a free block from the unused list
          copy bytes to it until it's full or there are no more bytes to copy
          link it in to the temporary file
      }
• update the directory entry to point to the new file
• move the previous blocks (of old version) to the unused list
  – or to recycle bin / trash

What happens when you remove a file?
• move the blocks of the file to the unused list
• set the directory entry so it doesn't refer to any block
  – set it to zero, maybe
• recycle bin / trash
  – recycle bin or trash is just another directory
  – removing a file just puts the name, location info, etc., in that directory instead
• "emptying the trash" moves blocks into unused list
  – removes entry from Recycle / Trash directory
• why "removing" a file isn't enough
  – usually only changes a directory entry
  – often recoverable by simple guesses about directory entry contents
  – file contents are often still there even if directory entry is cleared

Forgotten, but not gone…
Q. Are files really deleted when I empty my Recycle Bin, or are they lurking somewhere in the depths of the computer?
A. When you delete the items in your Recycle Bin, you are not actually erasing the data from your computer but rather telling Windows (which tracks such matters in its file allocation table) that it is all right to write over those files with new ones.
Until you save new files and data to your computer that can take the space previously allotted to the old ones, the files deleted from your Recycle Bin are still present and could possibly be recovered with a recovery or "unerase" utility program.
New York Times, 10/25/02

Removing files in the 7th Circuit

The decision
• The court ruled that Citrin’s authorization terminated with his breach of his duty of loyalty in quitting, and that his actions were “exceeding authorized access”, as defined by the Computer Fraud and Abuse Act (CFAA) to be “access[ing] a computer with authorization and…us[ing] such access to obtain or alter information in the computer that the accesser is not entitled so to obtain or alter.” While Citrin argued that his employment contract authorized him to “return or destroy” data in the laptop, it was unlikely that this was intended to authorize him to irreversibly destroy data that the company had no copies of, or data that incriminated him in misconduct. Therefore, the judgment was reversed with directions to reinstate the suit.
• This case was one of a series of cases that applied the CFAA to employee misconduct. Originally, the CFAA had been crafted to prevent criminal hacking in government interest computers. The definition of “authorization” used in Citrin would later be rejected by the Ninth Circuit court in LVRC Holdings v. Brekka in favor of a narrower “active authorization” that is granted by an authority.

Electronic discovery
• Federal Rules of Civil Procedure (FRCP) updated Dec 2006; the definition of electronic items that may be subject to discovery includes all electronically stored information.
• "everything from standard Word documents and emails to voicemail messages, instant messages, blogs, backup tapes, and database files."
(from findlaw.com)

The user has control -- File permissions

Network file systems
• software system for accessing remote files across networks
• user programs access files and folders as if they are on the local machine
• operating system converts these into requests to ship information to/from another machine across a network
• there has to be a program on the other end to respond to requests
• "mapping a network drive" or "mounting your H: drive" sets up the connections
• subsequent reads and writes go through the network instead of the local disk

Levels of network file systems
• Your PC
• The H: drive (mounting from Princeton)
• The cloud
  – Google docs/Office 365 vs. Microsoft Office
      Run software that is not on your machine
      Store files remotely
      + Software updates are easy, storage is robust, sharing is facilitated
      - If you are not connected, you may not be able to edit your files
  – Google Drive/Dropbox/Amazon Web Services/…
      Store files remotely
      Mount stores on your desktop or view through web browser
      + facilitates sharing, files in a folder on your desktop as though they were local
      - Requires an internet connection
  – Chromebook
      The operating system is the browser
      Everything (except browser) is remote
      + less expensive, storage is robust
      - needs a connection (but this is being fixed by downloadable apps)

One step further into the cloud
• SaaS (Software as a Service), e.g. from Amazon Web Services
• Levels of service
  – I need a bunch of machines for some time
      Could be for a single computation
      Could be for the long term
      many startups begin (and continue) this way
  – I need machines but I want you to do my processing
      Very attractive to small businesses
      Amazon manages inventory, transactions, …
  – I need storage to back up my systems
      Typically slow speed of retrieval, retrieval rare

How big is the cloud?
• Amazon has more than 1 million users of its cloud
• Amazon’s cloud has more than 2 million servers in 11 cloud regions around the world
• Microsoft and Google each have around 1 million servers
• Many other companies have hundreds of thousands of servers
  – Facebook, Akamai, Rackspace, GoDaddy, HP/EDS, IBM

Web site of the day
• Show me the love
• When are my bits coming?

How applications use the operating system
• operating system provides services to be accessed by application programs
  – Unix "system calls", Windows Application Programming Interface ("API")
      "what is the exact time?"
      "allocate more memory to me"
      "read N bytes from file F into memory location M"
      "write N bytes from memory location M into file F"
      "establish a network connection to www.princeton.edu"
      "write N bytes to the network connection"
      "I'm all done; get rid of me"
• operating system provides an interface for applications to use
  – programs access machine capabilities only through this interface
  – different physical hardware can provide the same interface
  – programs can be moved to any system that provides the same interface
  – different operating systems can provide the same interface
  – one operating system can simulate the interface provided by another
• operating system hides details of specific hardware

Example of system-call level coding
• C program to copy input to output ("copy" command)
• read, write, exit are system calls

      #include <stdlib.h>   /* exit */
      #include <unistd.h>   /* read, write */

      int main(void) {
          char buf[8192];
          ssize_t n;

          /* read from standard input (0) until there is nothing left */
          while ((n = read(0, buf, sizeof(buf))) > 0)
              write(1, buf, n);   /* copy what was read to standard output (1) */
          exit(0);
      }

Software is organized into "layers"
• each layer presents an interface that higher layers can use
  – defines a "platform" for putting more on top
  – insulates the higher layer from how the lower layer is implemented
  – often called "Application Programming Interface" or API
• operating system ("kernel")
  – lowest software layer, on top of hardware
  – presents its capabilities as system calls
• libraries
  – code to be used as building blocks in programs
  – present their
capabilities as APIs
• applications
  – e.g., browser, word processor, mailer, compiler, directory lister, ...
  – use libraries and system calls through APIs

Operating system kernel
• The kernel is the most fundamental part of an operating system. It can be thought of as the program which controls all other programs on the computer. It is responsible for the creation and destruction of memory space which allows software to run. It provides services so that programs can request the use of the network card, the disk or any other piece of hardware (the kernel forwards that request to special programs called drivers which control the hardware), manages the file system and sets interrupts for the CPU to enable multitasking.

Layering
• an application generally calls multiple libraries
  – might not make direct system calls
• a library generally calls other libraries
• library and system call levels define interfaces (APIs)
• programmers may not know what is "library" and what is "system call"
[Layer diagram, top to bottom: applications, library calls, libraries, system calls, operating system, hardware]

Interface issues
• application/kernel boundary
• application programming interfaces
• interface ownership
• independent implementations
• platforms
• middleware
• virtual machines
[Layer diagram, top to bottom: application, library calls, library, system calls, operating system, hardware]

Where's the line between OS and applications?
• there are lots of ways to create layers and glue them together
• many choices of what to include in kernel or put in library
• “operating system” and “kernel” are not well defined
  – “Windows” might mean everything (OS, applications, etc)
  – “Windows OS” usually means the part that controls the rest
  – "Linux" may mean "kernel" or may mean "kernel + applications"
  – dividing line is not always clear
• "kernel"
  – minimal part that runs regardless of what else the system is being used for or is doing
  – provides essential, central services
  – controls shared resources
  – protects information, enforces privacy and security
  – user programs can only use it through its defined interfaces
  – usually runs in hardware-supported protected mode

Microsoft antitrust case (1994-2011)
• “operating system” and “kernel” are not well defined
  – “Windows” might mean everything (OS, applications, etc)
  – “Windows OS” usually means the part that controls the rest
• what is operating system and what is application?
• Dept of Justice v Microsoft was partly about this question
  – is Internet Explorer part of the operating system?
  – will the system be damaged or restricted if IE is removed or replaced?
• Microsoft said Yes, DoJ said No
  – http://www.usdoj.gov/atr/cases/ms_index.htm

What's an API?
Operating systems perform many functions, including allocating computer memory and controlling peripherals such as printers and keyboards. Operating systems also function as platforms for software applications. They do this by "exposing" — i.e., making available to software developers — routines or protocols that perform certain widely-used functions. These are known as Application Programming Interfaces, or "APIs."
Excerpted from Final Judgment, State of New York, et al v. Microsoft Corporation, US District Court, District of Columbia, Nov 1, 2002

API fragment

Sample Java API (tiny excerpt)

Independent implementations of an interface
• who owns an interface?
• can interfaces be owned?
• company A sells something (hardware or software)
• company A publishes (widely) the API for programming it
  – with the intent that third parties will develop applications for the thing
  – and thus make it more attractive so company A will sell more
• company B uses A's interface definition to make a cheaper version of the thing that works the same
  – so all the third-party applications will run on B's cheaper version
  – thus cutting into A's market
• company A sues company B
• who should win?

Android phone organization
[Layer diagram, top to bottom: apps written in Java, Java APIs, library calls, libraries, virtual machine, system calls, operating system, hardware]

Oracle v Google (from the decision in May, 2012)
The Java language, like C and C++, is a human-readable language. Code written in a human-readable language — “source code” — is not readable by computer hardware. Only “object code,” which is not human-readable, can be used by computers. Most object code is in a binary language, meaning it consists entirely of 0s and 1s. Thus, a computer program has to be converted, that is, compiled, from source code into object code before it can run, or “execute.” In the Java system, source code is first converted into “bytecode,” an intermediate form, before it is then converted into binary machine code by the Java virtual machine.

Oracle v Google (from the 2012 decision)
“So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification of any methods used in the Java API. It does not matter that the declaration or method header lines are identical. Under the rules of Java, they must be identical to declare a method specifying the same functionality — even when the implementation is different. When there is only one way to express an idea or function, then everyone is free to do so and no one can monopolize that expression.”
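The distinction the decision draws between a method's declaration and its implementation can be sketched in Java. This is only an illustration: the interface IntMax and the classes LoopMax, SortMax and IndependentImplementations are hypothetical names invented here, not part of the Java API or of the code at issue in the case. Both classes are forced by the rules of Java to repeat the identical method header, yet the code that carries out the function is written independently.

```java
import java.util.Arrays;

// Hypothetical demo class: a caller that only knows the declaration.
class IndependentImplementations {
    public static void main(String[] args) {
        int[] data = {3, 9, 4};
        IntMax first = new LoopMax();
        IntMax second = new SortMax();
        // Both implementations satisfy the same specification.
        System.out.println(first.max(data) + " " + second.max(data)); // prints "9 9"
    }
}

// The shared declaration (hypothetical): name, parameter type, return type.
interface IntMax {
    int max(int[] values); // assumes values is non-empty
}

// One implementation: scan the array and remember the largest value seen.
class LoopMax implements IntMax {
    public int max(int[] values) {
        int best = values[0];
        for (int v : values) {
            if (v > best) best = v;
        }
        return best;
    }
}

// An independent implementation: sort a copy and take the last element.
class SortMax implements IntMax {
    public int max(int[] values) {
        int[] copy = Arrays.copyOf(values, values.length);
        Arrays.sort(copy);
        return copy[copy.length - 1];
    }
}
```

A program written against IntMax runs unchanged with either class, which is the position of the third-party applications in the company A / company B scenario.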
RangeCheck

      private static void rangeCheck(int arrayLen, int fromIndex, int toIndex) {
          if (fromIndex > toIndex)
              throw new IllegalArgumentException("fromIndex(" + fromIndex +
                         ") > toIndex(" + toIndex + ")");
          if (fromIndex < 0)
              throw new ArrayIndexOutOfBoundsException(fromIndex);
          if (toIndex > arrayLen)
              throw new ArrayIndexOutOfBoundsException(toIndex);
      }

RangeCheck (slightly simpler version, in Javascript)

      function rangeCheck(len, from, to) {
          if (from > to || from < 0 || to > len)
              return 0;
          else
              return 1;
      }

Oracle v Google
“I have done, and still do, a significant amount of programming in other languages. I've written blocks of code like rangeCheck a hundred times before. I could do it, you could do it. The idea that someone would copy that when they could do it themselves just as fast, it was an accident. There's no way you could say that was speeding them along to the marketplace. You're one of the best lawyers in America, how could you even make that kind of argument?”
Judge William Alsup, U.S. District Court, Northern District of California, to David Boies, attorney for Oracle

Quotes from brief
• "Major modern operating systems reimplement the groundbreaking UNIX API"
• "The C programming language became universal because of its uncopyrightable interface"
• "Computers rely on the uncopyrightable nature of APIs and network protocols to communicate over the Internet"
• "Treating API as copyrightable would undermine the industry standards for cloud computing"
• "Uncopyrightable interfaces allow software that makes different systems compatible"
• "Uncopyrightable interfaces help programmers develop completely new capabilities for software"

Platforms, middleware, virtual machines
• platform: hardware or software on which applications can run
• middleware: uses OS interface but exposes its own APIs to developers, so applications using it can move to any OS where the middleware has been moved (e.g., browser-based software)
• virtual machine: software that mimics behavior of hardware so other software can run on it (can be above the operating system too, as in VMware)
[Layer diagram, top to bottom: application, library calls, library, middleware, system calls, operating system, virtual machine, hardware]
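The idea of a virtual machine as software that mimics hardware can be sketched with a toy interpreter. This is a made-up example, not how the Java virtual machine or VMware works: the class TinyVM and its four instructions (PUSH, ADD, MUL, HALT) are invented for illustration. The "program" is just an array of numbers, and the loop plays the role a processor would play, fetching and carrying out one instruction at a time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy virtual machine with an invented instruction set.
class TinyVM {
    static final int PUSH = 0; // next number in the program is pushed on the stack
    static final int ADD = 1;  // pop two values, push their sum
    static final int MUL = 2;  // pop two values, push their product
    static final int HALT = 3; // stop and return the value on top of the stack

    // Fetch and execute instructions until HALT is reached.
    static int run(int[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // index of the next instruction
        while (true) {
            int op = program[pc++];
            switch (op) {
                case PUSH: stack.push(program[pc++]); break;
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();
                default:   throw new IllegalStateException("unknown instruction " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4 on the simulated machine.
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

The same array of numbers gives the same answer on any computer that can run TinyVM, which is the sense in which a virtual machine lets other software move between systems.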