10/26

COS 109 Monday October 26
• Housekeeping
– Midterm this week
90 minutes to take it
No class next Wednesday (can do midterm here then). I will bring copies.
Will be available from Wednesday 12:01 AM to Friday 11:59 PM on Blackboard
– Midterm review today (10/26) 7-8:30 PM in Aaron Burr 219
See website for a list of concepts and details of exam
Open book and notes
Can use computer only to read lecture notes and as a calculator
Cannot use computer, phone, tablet, … for anything else
Cannot discuss the exam with others until after midnight on Friday
– Problem Set 4 returned today
Complications about state machines giving output
• Today’s class
– Software we depend on
File systems (continued)
The cloud
Applications – APIs and SDKs
State machines – what if peanuts are purchased
• Incorrect answer
[State diagram: states 0, 20, 25, 30, 35; transitions labeled p]
State machines – what if peanuts are purchased
• correct answer
[State diagram: states 0, 5, 10, 15, 20, 25, 30, 35; transitions labeled p/p]
State machines
• Let’s look at the state where there are 30 cents input
[State diagram: state 30 with transitions labeled d/r, n/r, q/r, r/r, p/p, c/c; neighboring states 10 and 0]
Levels of network file systems
• Your PC
– all files are local
– a reference to foo.jpg shows the image if it is in the same folder
• cpanel
– what is there is what you have uploaded
– a reference to foo.jpg shows the image if it is in the same folder
– foo.jpg will not display if it is on your laptop and you are looking at your
webpage on cpanel
• The H: drive (mounting from Princeton)
– files can be made to appear to be local to your PC
– foo.jpg on the H: drive will display if it is properly referenced as H:\foo.jpg
Grades on Problem Set 4
[Bar chart: vertical axis 0 to 20, horizontal axis numbered 1 to 49]
Graded out of 20, average score 18.1
How the file system converts logical to physical
• disk is physically organized into sectors, or blocks of bytes
– each sector is a fixed number of bytes (like 512 or 1024 or …)
– reading and writing always happens in sector-sized blocks
• each file occupies an integral number of blocks
– files never share a block
– some space is wasted: a 1-byte file wastes all but 1 byte of the block
• if a file is bigger than one block, it occupies several blocks
– the blocks are not necessarily adjacent on the disk
• need a way to keep track of the blocks that make up the file
• this is usually done by a separate "file allocation table" that lists
the blocks that make up each file
– this table is stored on disk too so it persists when machine is turned off
– lots of ways to implement this
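The arithmetic behind "an integral number of blocks" can be sketched in a few lines. This is only an illustration: the 512-byte block size is one of the example values above, and the function names are made up for this sketch.

```python
BLOCK_SIZE = 512  # assumed block size in bytes; real disks vary


def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of whole blocks occupied by a file of file_size bytes."""
    # Round up with integer arithmetic: a partly filled block still counts.
    return (file_size + block_size - 1) // block_size


def wasted_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes allocated to the file that hold no file data."""
    return blocks_needed(file_size, block_size) * block_size - file_size


# A 1-byte file wastes all but 1 byte of its block.
print(blocks_needed(1), wasted_bytes(1))
# A 1300-byte file spills into a third block.
print(blocks_needed(1300), wasted_bytes(1300))
```

Larger blocks mean fewer blocks to track per file but more wasted space in the last block of each file.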
Converting logical to physical, continued
• every block is part of some file, or reserved by operating system,
or unused
• "file allocation table" keeps track of blocks
– by chaining/linking them together
first block of a file points to second, second points to third, etc.
last block doesn't point to a successor (because it doesn't have one)
– or (much more common) by some kind of table or array
that keeps track of related blocks
• also keeps track of unused blocks
– disk starts out with most blocks unused ("free")
some are reserved for file allocation table, etc.
– as a file grows, blocks are removed from the unused list and attached to
the list for the file:
to grow a file, remove a block from the list of unused blocks
and add it to the blocks for the file
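The chaining scheme and the unused list can be modeled with a toy disk. This is a simplified sketch, not how any particular file system is implemented: the class name `Disk`, the `END` marker, and the choice to hand out the lowest-numbered free block are all assumptions, and reserved blocks are left out.

```python
END = -1  # marker meaning "this block has no successor"


class Disk:
    """Toy disk: next_block[i] is the block that follows block i in its file."""

    def __init__(self, num_blocks):
        self.next_block = [END] * num_blocks
        self.free = list(range(num_blocks))  # every block starts out unused
        self.first = {}                      # file name -> first block of file

    def blocks(self, name):
        """Follow the chain from the first block of a file to its last."""
        chain = []
        block = self.first.get(name, END)
        while block != END:
            chain.append(block)
            block = self.next_block[block]
        return chain

    def grow(self, name):
        """Remove a block from the unused list and attach it to the file."""
        if not self.free:
            raise OSError("disk full")
        new = self.free.pop(0)
        self.next_block[new] = END
        if name not in self.first:
            self.first[name] = new
        else:
            last = self.blocks(name)[-1]
            self.next_block[last] = new  # old last block now points to new one
        return new


disk = Disk(8)
disk.grow("a.txt")
disk.grow("b.txt")
disk.grow("a.txt")  # a.txt's second block is not adjacent to its first
print(disk.blocks("a.txt"), disk.blocks("b.txt"), disk.free)
```

Interleaving growth of two files shows why the blocks of one file are not necessarily adjacent on the disk.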
Converting logical to physical: directories
• a directory / folder is a file
– stored in the same file system
– uses the same mechanisms
• but it contains information about other files and directories
• the directory entry for a file tells where to find the blocks
• the directory entry also contains other info about the file
– name (e.g., midterm.doc)
– size in bytes, date/time of changes, access permissions
– whether it's an ordinary file or a directory
• the file system maintains the info in a directory
– very important to keep directory info consistent
– application programs can change it only indirectly / implicitly
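One way to picture a directory entry is as a small record. The sketch below is illustrative only: the field names, the Unix-style permission string, and the sample values are assumptions, and real file systems lay this information out differently.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DirEntry:
    name: str            # e.g. "midterm.doc"
    size: int            # size in bytes
    modified: datetime   # date/time of last change
    permissions: str     # access permissions (Unix-style string, an assumption)
    is_directory: bool   # ordinary file or directory
    first_block: int     # where to start looking for the file's blocks


def describe(entry):
    """Format an entry roughly the way a directory lister might."""
    kind = "dir " if entry.is_directory else "file"
    return f"{kind} {entry.permissions} {entry.size:>8} {entry.name}"


# Hypothetical values, made up for the example.
entry = DirEntry("midterm.doc", 24576, datetime(2015, 10, 26, 9, 0),
                 "rw-r--r--", False, 117)
print(describe(entry))
```

A directory is then just a file whose contents are a collection of such entries.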
What happens when you say "Open"?
• search for file in sequence of directories
as given by components of its name
– report an error if any component can't be found
• read blocks of the file as needed
– using the location information in the file allocation table
to find the blocks
– store (some of) them in RAM
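The two steps of "Open" can be sketched with a toy model. Everything here is an assumption made for illustration: a directory is a Python dict, a file is a list of block numbers, the disk is a dict from block number to bytes, and the path and contents are invented.

```python
def find_entry(root, path):
    """Search the sequence of directories named by the components of path."""
    current = root
    for component in path.strip("/").split("/"):
        if not isinstance(current, dict):
            raise NotADirectoryError(component)
        if component not in current:
            # Report an error if any component can't be found.
            raise FileNotFoundError(f"{component!r} not found in {path!r}")
        current = current[component]
    return current


def open_file(root, path, disk):
    """Find the file, then read its blocks into memory."""
    blocks = find_entry(root, path)
    if isinstance(blocks, dict):
        raise IsADirectoryError(path)
    return b"".join(disk[b] for b in blocks)


# Toy directory tree and disk contents (hypothetical).
root = {"home": {"cos109": {"midterm.doc": [12, 40, 7]}}}
disk = {12: b"Open ", 40: b"book ", 7: b"and notes"}

print(open_file(root, "/home/cos109/midterm.doc", disk))
```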
What happens when you say "Save"?
• make sure there's enough space (enough unused blocks)
– don't want to run out while copying from RAM to disk
• create a temporary file with no bytes in it
• copy the bytes from RAM and/or existing file to temporary file:
while (there are still bytes to be copied) {
get a free block from the unused list
copy bytes to it until it's full or there are no more bytes to copy
link it in to the temporary file
}
• update the directory entry to point to the new file
• move the previous blocks (of old version) to the unused list
– or to recycle bin / trash
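The "Save" steps above can be sketched as follows. This is a toy model under stated assumptions: the directory maps a name to a list of block numbers, the disk is a dict, the block size is 4 bytes so the example stays small, and the old blocks go straight to the unused list rather than to a trash folder.

```python
def save(data, directory, name, free, disk, block_size=4):
    """Write data to fresh blocks, then switch the directory entry over."""
    # Make sure there is enough space before copying anything.
    needed = (len(data) + block_size - 1) // block_size
    if needed > len(free):
        raise OSError("not enough unused blocks")

    temp = []  # the temporary file: starts with no blocks
    for start in range(0, len(data), block_size):
        block = free.pop(0)                          # get a free block
        disk[block] = data[start:start + block_size]  # copy bytes into it
        temp.append(block)                           # link it in

    old = directory.get(name, [])
    directory[name] = temp  # directory entry now points to the new version
    free.extend(old)        # previous blocks go back to the unused list


disk, directory, free = {}, {}, [0, 1, 2, 3, 4, 5]
save(b"hello world", directory, "notes.txt", free, disk)
save(b"hi", directory, "notes.txt", free, disk)
print(directory, free)
```

Writing the new version to new blocks first means a failure partway through leaves the old version intact.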
What happens when you remove a file?
• move the blocks of the file to the unused list
• set the directory entry so it doesn't refer to any block
– set it to zero, maybe
• recycle bin / trash
– recycle bin or trash is just another directory
– removing a file just puts the name, location info, etc., in that directory
instead
• "emptying the trash" moves blocks into unused list
– removes entry from Recycle / Trash directory
• why "removing" a file isn't enough
– usually only changes a directory entry
– often recoverable by simple guesses about directory entry contents
– file contents are often still there even if directory entry is cleared
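A toy model makes it clear why removed files are often recoverable. As before, this is a sketch under assumptions: directories are dicts from names to block lists, the disk is a dict, and the function names and file contents are invented.

```python
def move_to_trash(directory, trash, name):
    """'Removing' a file: only the directory entry moves."""
    trash[name] = directory.pop(name)


def empty_trash(trash, free):
    """Blocks go onto the unused list; their contents are not touched."""
    for name in list(trash):
        free.extend(trash.pop(name))


def read_blocks(disk, blocks):
    """What an 'unerase' utility does once it has guessed the block list."""
    return b"".join(disk[b] for b in blocks)


disk = {0: b"secr", 1: b"et", 2: b"", 3: b""}
directory, trash, free = {"plan.txt": [0, 1]}, {}, [2, 3]

move_to_trash(directory, trash, "plan.txt")
empty_trash(trash, free)
print(directory, trash, free)
print(read_blocks(disk, [0, 1]))  # the bytes are still on the disk
```

The contents disappear only when blocks 0 and 1 are handed out again and overwritten by a later save.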
Forgotten, but not gone…
Q. Are files really deleted when I empty my Recycle Bin,
or are they lurking somewhere in the depths of the
computer?
A. When you delete the items in your Recycle Bin, you
are not actually erasing the data from your computer
but rather telling Windows (which tracks such matters
in its file allocation table) that it is all right to write
over those files with new ones.
Until you save new files and data to your computer that
can take the space previously allotted to the old ones,
the files deleted from your Recycle Bin are still
present and could possibly be recovered with a
recovery or "unerase" utility program.
New York Times, 10/25/02
Removing files
in the 7th Circuit
The decision
• The court ruled that Citrin’s authorization terminated with his
breach of his duty of loyalty in quitting, and that his actions were
“exceeding authorized access”, as defined by the Computer Fraud and
Abuse Act (CFAA) to be “access[ing] a computer with authorization
and…us[ing] such access to obtain or alter information in the
computer that the accesser is not entitled so to obtain or alter.”
While Citrin argued that his employment contract authorized him to
“return or destroy” data in the laptop, it was unlikely that this was
intended to authorize him to irreversibly destroy data that the
company had no copies of, or data that incriminated him in
misconduct. Therefore, the judgment was reversed with directions to
reinstate the suit.
• This case was one of a series of cases that applied the CFAA to
employee misconduct. Originally, the CFAA had been crafted to
prevent criminal hacking in government interest computers. The
definition of “authorization” used in Citrin would be later rejected by
the Ninth Circuit court in LVRC Holdings v. Brekka in favor of a more
narrow “active authorization” that is granted by an authority.
Electronic discovery
• Federal Rules of Civil Procedure (FRCP) updated Dec 2006,
definition of electronic items that may be subject to discovery
includes all electronically stored information.
• "everything from standard Word documents and emails to
voicemail messages, instant messages, blogs, backup tapes, and
database files." (from findlaw.com)
The user has control -- File permissions
Network file systems
• software system for accessing remote files across networks
• user programs access files and folders as if they are on the
local machine
• operating system converts these into requests to ship
information to/from another machine across a network
• there has to be a program on the other end to respond to
requests
• "mapping a network drive" or "mounting your H: drive" sets up
the connections
• subsequent reads and writes go through the network instead of
the local disk
Levels of network file systems
• Your PC
• The H: drive (mounting from Princeton)
• The cloud
– Google docs/Office 365 vs. Microsoft Office
Run software that is not on your machine
Store files remotely
+ Software updates are easy, storage is robust, sharing is facilitated
- If you are not connected, you may not be able to edit your files
– Google Drive/Dropbox/Amazon Web Services/…
Store files remotely
Mount stores on your desktop or view through web browser
+ facilitates sharing, files in a folder on your desktop as though they were local
- Requires an internet connection
– Chromebook
The operating system is the browser
Everything (except browser) is remote
+ less expensive, storage is robust
- needs a connection (but this is being fixed by downloadable apps)
One step further into the cloud
• SaaS (Software as a Service), e.g., from Amazon Web Services
• Levels of service
– I need a bunch of machines for some time
Could be for a single computation
Could be for the long term
many startups begin (and continue) this way
– I need machines but I want you to do my processing
Very attractive to small businesses
Amazon manages inventory, transactions, …
– I need storage to back up my systems
Typically slow speed of retrieval, retrieval rare
How big is the cloud?
• Amazon has more than 1 million users of its cloud
• Amazon’s cloud has more than 2 million servers in 11 cloud regions
around the world
• Microsoft and Google each have around 1 million servers
• Many other companies have hundreds of thousands of servers
– Facebook, Akamai, Rackspace, GoDaddy, HP/EDS, IBM
Web site of the day
• Show me the love
• When are my bits coming?
How applications use the operating system
• operating system provides services to be accessed by application
programs
– Unix "system calls", Windows Application Programming Interface ("API")
"what is the exact time?"
"allocate more memory to me"
"read N bytes from file F into memory location M"
"write N bytes from memory location M into file F"
"establish a network connection to www.princeton.edu"
"write N bytes to the network connection"
“I’m all done; get rid of me”
• operating system provides an interface for applications to use
– programs access machine capabilities only through this interface
– different physical hardware can provide the same interface
– programs can be moved to any system that provides the same interface
– different operating systems can provide the same interface
– one operating system can simulate the interface provided by another
• operating system hides details of specific hardware
Example of system-call level coding
• C program to copy input to output ("copy" command)
• read, write, exit are system calls
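#include <stdlib.h>   /* exit */
#include <unistd.h>   /* read, write */

int main(void) {
    char buf[8192];
    ssize_t n;   /* bytes read; 0 at end of input, -1 on error */
    /* file descriptor 0 is standard input, 1 is standard output */
    while ((n = read(0, buf, sizeof(buf))) > 0)
        write(1, buf, n);
    exit(0);
}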
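The same loop can be written in Python, whose `os.read` and `os.write` functions are thin wrappers around the corresponding system calls. This is a sketch for comparison, not part of the original example; the function name `copy_fd` is made up, and the pipe at the end exists only so the example has something to copy.

```python
import os


def copy_fd(src_fd, dst_fd, bufsize=8192):
    """Copy bytes from one file descriptor to another; return the byte count."""
    total = 0
    while True:
        chunk = os.read(src_fd, bufsize)  # returns b"" at end of input
        if not chunk:
            break
        total += len(chunk)
        while chunk:
            # os.write may accept fewer bytes than offered, so loop.
            written = os.write(dst_fd, chunk)
            chunk = chunk[written:]
    return total


# Demonstration with pipes instead of the terminal.
src_read, src_write = os.pipe()
dst_read, dst_write = os.pipe()
os.write(src_write, b"copy me")
os.close(src_write)               # closing the write end signals end of input
count = copy_fd(src_read, dst_write)
os.close(dst_write)
print(count, os.read(dst_read, 100))
```

Calling `copy_fd(0, 1)` would copy standard input to standard output, like the C version.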
Software is organized into "layers"
• each layer presents an interface that higher layers can use
– defines a "platform" for putting more on top
– insulates the higher layer from how the lower layer is implemented
– often called "Application Programming Interface" or API
• operating system ("kernel")
– lowest software layer, on top of hardware
– presents its capabilities as system calls
• libraries
– code to be used as building blocks in programs
– present their capabilities as APIs
• applications
– e.g., browser, word processor, mailer, compiler, directory lister, ...
– use libraries and system calls through APIs
Operating system kernel
• The kernel is the most fundamental part of an operating system. It
can be thought of as the program which controls all other programs
on the computer. It is responsible for the creation and destruction of
memory space which allows software to run. It provides services so
that programs can request the use of the network card, the disk or
any other piece of hardware (the kernel forwards that request to
special programs called drivers which control the hardware), manages
the file system and sets interrupts for the CPU to
enable multitasking.
Layering
• an application generally calls multiple libraries
– might not make direct system calls
• a library generally calls other libraries
• library and system call levels define interfaces (APIs)
• programmers may not know what is "library" and what is "system call"
[Diagram, top to bottom: applications, library calls, libraries, system calls, operating system, hardware]
Interface issues
• application/kernel boundary
• application programming interfaces
• interface ownership
• independent implementations
• platforms
• middleware
• virtual machines
[Diagram, top to bottom: application, library calls, library, system calls, operating system, hardware]
Where's the line between OS and applications?
• there are lots of ways to create layers and glue them together
• many choices of what to include in kernel or put in library
• “operating system” and “kernel” are not well defined
– “Windows” might mean everything (OS, applications, etc)
– “Windows OS” usually means the part that controls the rest
– "Linux" may mean "kernel" or may mean "kernel + applications"
– dividing line is not always clear
• "kernel"
– minimal part that runs regardless of what else the system is being used
for or is doing
– provides essential, central services
– controls shared resources
– protects information, enforces privacy and security
– user programs can only use it through its defined interfaces
– usually runs in hardware-supported protected mode
Microsoft antitrust case
(1994-2011)
• “operating system” and “kernel” are not well defined
– “Windows” might mean everything (OS, applications, etc)
– “Windows OS” usually means the part that controls the rest
• what is operating system and what is application?
• Dept of Justice v Microsoft was partly about this question
– is Internet Explorer part of the operating system?
– will the system be damaged or restricted if IE is removed or replaced?
• Microsoft said Yes, DoJ said No
– http://www.usdoj.gov/atr/cases/ms_index.htm
What's an API?
Operating systems perform many functions, including
allocating computer memory and controlling peripherals
such as printers and keyboards. Operating systems also
function as platforms for software applications. They do
this by "exposing" — i.e., making available to software
developers — routines or protocols that perform certain
widely-used functions. These are known as Application
Programming Interfaces, or "APIs."
Excerpted from Final Judgment
State of New York, et al v. Microsoft Corporation
US District Court, District of Columbia, Nov 1, 2002
API fragment
Sample Java API (tiny excerpt)
Independent implementations of an interface
• who owns an interface?
• can interfaces be owned?
• company A sells something (hardware or software)
• company A publishes (widely) the API for programming it
– with the intent that third parties will develop applications for the thing
– and thus make it more attractive so company A will sell more
• company B uses A's interface definition to make a cheaper version
of the thing that works the same
– so all the third-party applications will run on B's cheaper version
– thus cutting into A's market
• company A sues company B
• who should win?
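In code, the situation looks roughly like this. The sketch is hypothetical throughout: the `Stack` interface, the two implementations standing in for companies A and B, and the `reverse` application are invented for illustration and have nothing to do with any real product.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """The published interface: all that applications are told to rely on."""

    @abstractmethod
    def push(self, item): ...

    @abstractmethod
    def pop(self): ...


class ListStack(Stack):
    """'Company A': the original implementation, built on a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


class LinkedStack(Stack):
    """'Company B': same interface, entirely different code underneath."""

    def __init__(self):
        self._top = None  # chain of (item, rest) pairs

    def push(self, item):
        self._top = (item, self._top)

    def pop(self):
        if self._top is None:
            raise IndexError("pop from empty stack")
        item, self._top = self._top
        return item


def reverse(stack, items):
    """A third-party application, written only against the interface."""
    for item in items:
        stack.push(item)
    return [stack.pop() for _ in items]


print(reverse(ListStack(), [1, 2, 3]), reverse(LinkedStack(), [1, 2, 3]))
```

The application cannot tell the two implementations apart, which is exactly what makes B's version a drop-in replacement for A's.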
Android phone organization
apps written in Java
Java APIs
library calls
libraries
virtual machine
system calls
operating system
hardware
Oracle v Google
(from the decision in May, 2012)
The Java language, like C and C++, is a human-readable
language. Code written in a human-readable
language — “source code” — is not readable
by computer hardware.
Only “object code,” which is not human-readable,
can be used by computers. Most object code is in
a binary language, meaning it consists entirely of
0s and 1s. Thus, a computer program has to be
converted, that is, compiled, from source code
into object code before it can run, or “execute.” In
the Java system, source code is first converted
into “bytecode,” an intermediate form, before it is
then converted into binary machine code by the
Java virtual machine.
Oracle v Google
(from the 2012 decision)
“So long as the specific code used to implement a
method is different, anyone is free under the
Copyright Act to write his or her own code to carry
out exactly the same function or specification of
any methods used in the Java API. It does not
matter that the declaration or method header lines
are identical. Under the rules of Java, they must
be identical to declare a method specifying the
same functionality — even when the implementation
is different. When there is only one way to express
an idea or function, then everyone is free to do so
and no one can monopolize that expression.”
RangeCheck
private static void rangeCheck(int arrayLen,
        int fromIndex, int toIndex) {
    if (fromIndex > toIndex)
        throw new IllegalArgumentException("fromIndex("
            + fromIndex + ") > toIndex(" + toIndex + ")");
    if (fromIndex < 0)
        throw new ArrayIndexOutOfBoundsException(fromIndex);
    if (toIndex > arrayLen)
        throw new ArrayIndexOutOfBoundsException(toIndex);
}
RangeCheck
(slightly simpler version, in JavaScript)
function rangeCheck(len, from, to) {
    if (from > to || from < 0 || to > len)
        return 0;
    else
        return 1;
}
Oracle v Google
“I have done, and still do, a significant amount of
programming in other languages. I've written blocks
of code like rangeCheck a hundred times before. I
could do it, you could do it. The idea that someone
would copy that when they could do it themselves
just as fast, it was an accident. There's no way
you could say that was speeding them along to the
marketplace. You're one of the best lawyers in
America, how could you even make that kind of
argument?”
Judge William Alsup, US District Court, Northern District of California,
to David Boies, attorney for Oracle
Quotes from brief
• "Major modern operating systems reimplement the groundbreaking
UNIX API"
• "The C programming language became universal because of its
uncopyrightable interface"
• "Computers rely on the uncopyrightable nature of APIs and
network protocols to communicate over the Internet"
• "Treating API as copyrightable would undermine the industry
standards for cloud computing"
• "Uncopyrightable interfaces allow software that makes different
systems compatible"
• "Uncopyrightable interfaces help programmers develop completely
new capabilities for software"
Platforms, middleware, virtual machines
• platform: hardware or software on which applications can run
• middleware: uses OS interface but exposes its own APIs to
developers, so applications using it can move to any OS where
the middleware has been moved (e.g., browser-based software)
• virtual machine: software that mimics behavior of hardware so
other software can run on it (can be above the operating
system too, as in VMware)
[Diagram, top to bottom: application, library calls, library, middleware, system calls, operating system, virtual machine, hardware]