11/23

COS 109 Monday November 23
• Housekeeping
– Lab 6 and Problem Set 7 due dates
Lab 6 is due by midnight on Friday November 27
Problem Set 7 is due by 5 PM on Monday November 30
– Because these deadlines have been extended, there will be no further extensions
– Final exam – January 18 (Monday) at 7:30PM
• Today’s class
– A few more words about the internet
– The World Wide Web
Grades on Problem set 6
[Chart of Problem Set 6 grades]
Average score 35.8; a few people did not complete the assignment
The geography of the internet
• In 2012, there were 903.9 million Internet hosts
– USA 505M (498M in 2011)
– Japan 64.5M
– Brazil 26.6M
– Italy 25.7M
– China 20.6M
– Germany 20.0M
– …
– Iraq 26
– Guam 23
– North Korea 8
– Chad 6
Source: CIA Factbook
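A little arithmetic turns the host counts above into shares of the 903.9 million total. This Python sketch uses only the 2012 figures on this slide. Rounding to one decimal place is just a presentation choice.

```python
# 2012 Internet host counts from the slide (CIA Factbook), in millions.
TOTAL_HOSTS_M = 903.9
hosts_m = {
    "USA": 505.0,
    "Japan": 64.5,
    "Brazil": 26.6,
    "Italy": 25.7,
    "China": 20.6,
    "Germany": 20.0,
}


def share_percent(count_m: float, total_m: float = TOTAL_HOSTS_M) -> float:
    """Return count_m as a percentage of the world total, to 0.1%."""
    return round(100 * count_m / total_m, 1)


shares = {country: share_percent(n) for country, n in hosts_m.items()}
top_six = share_percent(sum(hosts_m.values()))  # the six listed countries together

for country, pct in shares.items():
    print(f"{country:8s} {pct:5.1f}%")
print(f"Top six combined: {top_six}%")
```

By this arithmetic, the USA alone held over half of all hosts in 2012. The six listed countries held roughly three quarters.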
Internet Users Worldwide
• Internet Users (2014 Est.)
– China 626M
– European Union 398M
– USA 276.6M
– India 237.3M
– Japan 109.3M
– Brazil 108.2M
– Russia 84.4M
– Germany 70.3M
– Nigeria 66.6M
– Total Worldwide 3.2B
The backbone of the internet
• http://upload.wikimedia.org/wikipedia/commons/d/d2/Internet_map_1024.jpg
• http://internet-map.net/
Let's register an internet domain
• http://www.directnic.com
Who manages this?
• Internet Corp. for Assigned Names and Numbers (ICANN)
– Formed in October 1998
– non-profit, private-sector corporation
– broad coalition of the Internet's business, technical, academic, and user communities
– recognized by the U.S. and other governments as the global consensus entity to coordinate the technical management of the Internet's domain name system, the allocation of IP address space, the assignment of protocol parameters, and the management of the root server system
– funded through the many registries and registrars that comprise the global domain name and Internet addressing systems
ICANN was formed in 1998. It is a not-for-profit public-benefit corporation with participants from all over the world dedicated to keeping the Internet secure, stable and interoperable. It promotes competition and develops policy on the Internet’s unique identifiers.*
ICANN doesn’t control content on the Internet. It cannot stop spam and it doesn’t deal with access to the Internet. But through its coordination role of the Internet’s naming system, it does have an important impact on the expansion and evolution of the Internet.*
* From http://www.icann.org/en/about/
What does ICANN govern?
• DNS – domain name system
– Relates names to numbers (IP addresses)
• TLD – top level domains
– Originally there were 7
.com, .edu, .gov, .int, .mil, .net, .org
– 200+ country code top level domains
– 1000+ gTLDs (generic top level domains)
– .academy, .accountant, .apartments, .biz, .black, .cool, .dad, .money, .ooo, .sucks, .vodka, .xxx, .zone
– More are here
• Management
– One company (called a registry) is in charge of each TLD.
– A large number of companies (called registrars) can sell (and
manage) names within a TLD
How does ICANN govern?
• Draws up contracts with each registry
• Runs an accreditation system for registrars
• Oversees IP addresses (through companies)
• Oversees root servers
– Root servers are 13 addresses on the Internet where every name lookup can start: rather than complete address tables, they hold the list of servers responsible for each top level domain
What about the root servers?
• What do they do?
– Ultimately resolve addresses, with help from top level domains
e.g., to look up cs.princeton.edu:
- ask a root server to find the .edu TLD servers
- ask the .edu TLD servers to find princeton.edu
- ask princeton.edu to find cs.princeton.edu
– But things change slowly, so
- there are intermediate name servers which cache addresses
- very few address queries actually come to a root server
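The lookup chain and the caching described above can be imitated in a few lines. This is a toy Python sketch, not real DNS. The zone table, the server labels ("root", "edu-tld", "princeton") and the address 192.0.2.10 (a documentation address) are all hypothetical. Real resolvers also cache the referrals to TLD servers, while this one caches only final answers.

```python
from typing import Dict, List

# Hypothetical zone data standing in for real name servers. Each server maps
# a name (or the suffix it is responsible for) either to a referral
# ("ns:<next server>") or to a final answer ("ip:<address>").
ZONES: Dict[str, Dict[str, str]] = {
    "root": {"edu": "ns:edu-tld"},
    "edu-tld": {"princeton.edu": "ns:princeton"},
    "princeton": {"cs.princeton.edu": "ip:192.0.2.10"},
}


class CachingResolver:
    """Toy iterative resolver: root -> TLD -> domain server, with a cache."""

    def __init__(self, zones: Dict[str, Dict[str, str]]) -> None:
        self.zones = zones
        self.cache: Dict[str, str] = {}   # name -> address already resolved
        self.root_queries = 0             # how often the root was consulted
        self.last_path: List[str] = []    # servers asked on the last lookup

    def _ask(self, server: str, name: str) -> str:
        """Return the server's entry for the longest suffix of name it knows."""
        zone = self.zones[server]
        labels = name.split(".")
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in zone:
                return zone[suffix]
        raise KeyError(f"{server} has no entry for {name}")

    def resolve(self, name: str) -> str:
        if name in self.cache:            # answered locally, no root query
            self.last_path = ["cache"]
            return self.cache[name]
        server, self.last_path = "root", []
        self.root_queries += 1
        while True:
            self.last_path.append(server)
            kind, value = self._ask(server, name).split(":", 1)
            if kind == "ip":
                self.cache[name] = value
                return value
            server = value                # referral to the next server down


resolver = CachingResolver(ZONES)
print(resolver.resolve("cs.princeton.edu"), resolver.last_path)  # walks the chain
print(resolver.resolve("cs.princeton.edu"), resolver.last_path)  # served from cache
```

The second lookup never touches the root. That is why so few queries reach the root servers.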
List of root servers
Hostname | IP addresses | Manager
a.root-servers.net | 198.41.0.4, 2001:503:ba3e::2:30 | VeriSign, Inc.
b.root-servers.net | 192.228.79.201, 2001:500:84::b | University of Southern California (ISI)
c.root-servers.net | 192.33.4.12, 2001:500:2::c | Cogent Communications
d.root-servers.net | 199.7.91.13, 2001:500:2d::d | University of Maryland
e.root-servers.net | 192.203.230.10 | NASA (Ames Research Center)
f.root-servers.net | 192.5.5.241, 2001:500:2f::f | Internet Systems Consortium, Inc.
g.root-servers.net | 192.112.36.4 | US Department of Defense (NIC)
h.root-servers.net | 128.63.2.53, 2001:500:1::803f:235 | US Army (Research Lab)
i.root-servers.net | 192.36.148.17, 2001:7fe::53 | Netnod
j.root-servers.net | 192.58.128.30, 2001:503:c27::2:30 | VeriSign, Inc.
k.root-servers.net | 193.0.14.129, 2001:7fd::1 | RIPE NCC
l.root-servers.net | 199.7.83.42, 2001:500:3::42 | ICANN
m.root-servers.net | 202.12.27.33, 2001:dc3::35 | WIDE Project
Root servers
• Some are fixed in location (unicast)
• Others are distributed (anycast)
– Queries are routed to the topologically closest of a group of receivers, all identified by the same destination address.
– So, a decentralized service is provided.
– Anycast servers can be used to spread the load of a distributed denial-of-service (DDoS) attack across many sites and so reduce its impact.
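The anycast idea can be sketched as "deliver to the nearest reachable copy of the same address." This Python toy uses hop count as a stand-in for topological distance. Real Internet routing (BGP) weighs paths by policy as well as length. The site names, hop counts and the address 192.0.2.53 are hypothetical.

```python
from typing import Dict, Set

# Hypothetical anycast group: every site announces the same address.
ANYCAST_ADDRESS = "192.0.2.53"   # documentation address, not a real root server


def route_anycast(hops_to_site: Dict[str, int], reachable: Set[str]) -> str:
    """Deliver to the reachable site with the fewest hops from this client.

    Hop count stands in for 'topologically closest'.
    """
    candidates = {s: h for s, h in hops_to_site.items() if s in reachable}
    if not candidates:
        raise RuntimeError(f"no instance of {ANYCAST_ADDRESS} is reachable")
    return min(candidates, key=lambda s: candidates[s])


# Two clients in different places see different distances to the same sites.
client_east = {"site-A": 3, "site-B": 7, "site-C": 12}
client_west = {"site-A": 11, "site-B": 9, "site-C": 2}
all_up = {"site-A", "site-B", "site-C"}

print(route_anycast(client_east, all_up))              # nearest for east
print(route_anycast(client_west, all_up))              # nearest for west
# If site-A is knocked out (say by a DDoS flood), east's traffic shifts.
print(route_anycast(client_east, all_up - {"site-A"}))
```

Because different clients land on different sites, an attack from many places is split across the group rather than aimed at one machine.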
And where are they?
Details at http://www.root-servers.org/
Peering points
• There are several hundred such points, where networks meet to exchange traffic
• Largest is the Deutscher Commercial Internet Exchange (DE-CIX), with 650+ members, a peak speed of 5000 Gbit/sec (average speed 3000 Gbit/sec) of connected capacity, and an average throughput of 1061 Gbit/sec (100% uptime since 1997)
Summarizing internet ideas
• packets versus circuits
– different models (mail vs phone)
• names and addresses
– what is a computer called, how to find it
• routing
– how to get from here to there
• protocols and standards
– Internet works because of IP as common mechanism
higher level protocols all use IP
specific hardware technologies carry IP packets
• layering
– divide system into layers
each of which provides services to the next higher level
while calling on services of the next lower level
– a way to organize and control complexity, hide details
Summarizing internet technical issues:
• privacy & security are hard
– data passes through shared unregulated dispersed media and sites
scattered over the whole world
– it's hard to control access & protect information along the way
– many network technologies (e.g., Ethernet, wireless) use broadcast, so encryption is necessary to maintain privacy
– many mechanisms are not robust against intentional misuse
– it's easy to lie about who you are
• service guarantees are hard
– no assurance of reliable delivery, let alone of bandwidth, delay or jitter
• some resources are running low
– IPv4 addresses are pretty much all assigned
– IPv6 (the next generation) uses 128-bit addresses
acceptance growing, by necessity
• but it has handled exponential growth amazingly well
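Python's standard `ipaddress` module makes the gap between 32-bit and 128-bit addresses concrete. The two sample addresses are a.root-servers.net's, taken from the root server list earlier in these slides.

```python
import ipaddress

# Size of each address space: the whole space is the network with prefix /0.
ipv4_total = ipaddress.IPv4Network("0.0.0.0/0").num_addresses   # 2**32
ipv6_total = ipaddress.IPv6Network("::/0").num_addresses         # 2**128

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:,} addresses")
ratio_exponent = (ipv6_total // ipv4_total).bit_length() - 1
print(f"IPv6 has 2**{ratio_exponent} times as many")

# Addresses from the root server table parse as one version or the other.
for text in ["198.41.0.4", "2001:503:ba3e::2:30"]:
    addr = ipaddress.ip_address(text)
    print(f"{text} is IPv{addr.version}")
```

IPv4 offers only about 4.3 billion addresses in total. This is why they are "pretty much all assigned".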
To summarize
• How the internet works
• And now that we’ve reached the end of the internet
Website of the day
• google trends
Moving above internet pipes -- information flows to apps
Higher level protocols
• SSH: secure login
• SMTP: mail transfer
• HTTP: hypertext transfer -> Web
• protocol layering:
– a single protocol can't do everything
– higher-level protocols build elaborate operations out of simpler ones
– each layer uses only the services of the one directly below and provides the services expected by the layer above
– all communication is between peer levels: layer N destination receives exactly the object sent by layer N source
[Diagram: protocol stack, top to bottom: application / reliable transport service / connectionless packet delivery service / physical layer]
Encapsulation
• each piece of data at one level is wrapped up with a header
and sent as a packet at the next lower level
• lowest level is what moves across specific network
[Diagram: data wrapped in an HTTP header, then a TCP header, then an IP header, then an ether (Ethernet) header]
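Encapsulation can be mimicked with strings: going down the stack, each layer adds its own header. Going up, each layer strips only its own header. This is a Python sketch with made-up, human-readable headers. Real TCP, IP and Ethernet headers are binary structures with many more fields.

```python
# Hypothetical, human-readable headers; the order runs top of stack first.
LAYERS = ["HTTP", "TCP", "IP", "Ethernet"]


def encapsulate(data: str) -> str:
    """Wrap data in one header per layer, going down the stack."""
    packet = data
    for layer in LAYERS:
        packet = f"[{layer} hdr]{packet}"
    return packet


def decapsulate(frame: str) -> str:
    """Strip headers going up the stack; each layer removes only its own."""
    packet = frame
    for layer in reversed(LAYERS):
        prefix = f"[{layer} hdr]"
        if not packet.startswith(prefix):
            raise ValueError(f"expected a {layer} header")
        packet = packet[len(prefix):]
    return packet


frame = encapsulate("data")
print(frame)               # outermost header is the one the network sees
print(decapsulate(frame))  # the receiving application gets back "data"
```

The peer-level rule shows up here: what comes out of `decapsulate` at the top is exactly what went into `encapsulate`.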
One particular app – the (World Wide) Web
• a way to connect computers that provide information (servers)
with computers that ask for it (clients like you and me)
– uses the Internet, but it's not the same as the Internet
• URL (uniform resource locator, e.g., http://www.amazon.com)
– a way to specify what information to find, and where
• HTTP (hypertext transfer protocol)
– a way to request specific information from a server and get it back
• HTML (hypertext markup language)
– a language for describing information for display
• browser (Firefox, Safari, Internet Explorer, Opera, Chrome, …)
– a program for making requests, and displaying results
• embellishments
– pictures, sounds, movies, ...
– loadable software
• the Web: the set of everything this provides
Web history
• 1989: Tim Berners-Lee at CERN
– a way to make physics literature and
research results accessible on the Internet
• 1991: first software distributions
• Feb 1993: Mosaic browser
– Marc Andreessen at NCSA (Univ of Illinois)
• Mar 1994: Netscape
– first commercial browser
• technical evolution managed by World Wide Web Consortium
– non-profit organization at MIT, Berners-Lee is director
– official definition of HTML and other web specifications
– see www.w3.org
HTTP: Hypertext transfer protocol
• What happens when you click on a URL?
• client opens TCP/IP connection to host, sends request
GET /filename HTTP/1.0
• server returns
– header info
– HTML
[Diagram: client sends "GET url" to server; server sends HTML back to client]
• since server returns the text, it can be created as needed
– can contain encoded material of many different types (MIME)
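The request/response exchange is plain text. This Python sketch builds the request a client would send and takes apart a canned response. It does not touch the network. The host, path and sample response are hypothetical. The `Host` line is optional in HTTP/1.0 but sent by modern clients.

```python
from typing import Dict, Tuple


def build_request(host: str, path: str) -> str:
    """Text the client sends after opening a TCP connection to host."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def parse_response(raw: str) -> Tuple[str, Dict[str, str], str]:
    """Split a raw HTTP response into status line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")   # blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# A canned response standing in for what a server might send back.
SAMPLE_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"      # MIME type of the body
    "\r\n"
    "<html><body>Hello</body></html>"
)

print(repr(build_request("www.example.com", "/index.html")))
status, headers, body = parse_response(SAMPLE_RESPONSE)
print(status, headers, body)
```

Sending this for real would mean writing the request bytes to a socket, for example one opened with `socket.create_connection((host, 80))`.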
• URL format
service://hostname/filename?other_stuff
• filename?other_stuff part can encode
– data values from client (forms)
– request to run a program on server (cgi-bin)
– anything else
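Python's `urllib.parse` splits a URL into the pieces named in the format above. The example URL, including its `/cgi-bin/search` program and its form fields, is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url: str) -> dict:
    """Break service://hostname/filename?other_stuff into its parts."""
    parts = urlsplit(url)
    return {
        "service": parts.scheme,
        "hostname": parts.netloc,              # would also include a :port
        "filename": parts.path,
        "other_stuff": parse_qs(parts.query),  # form values: name -> list
    }


# Hypothetical URL: a form submission handled by a program in /cgi-bin.
example = "http://www.example.com/cgi-bin/search?query=cookies&page=2"
for key, value in describe_url(example).items():
    print(f"{key:12s} {value}")
```

`parse_qs` returns lists because a form may send the same field name more than once.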
Embellishments
• original design of HTTP just returns text to be displayed
• now includes pictures, sound, video, ...
– need helpers or plug-ins to display non-text content
e.g., GIF, JPEG graphics; sound; movies
• forms filled in by user
– need a program on the server to interpret the information (cgi-bin)
• cookies to remember information on client
– HTTP is stateless: server doesn't save anything from one request to the next
– cookies are a way to remember information at the client
• active content: download code to run on the client
– JavaScript
– Java applets
– plug-ins
– ActiveX
Forms and CGI programs
• "common gateway interface"
– standard way to request the server to run a program
– using information provided by the client via a form
• if the target file on server is an executable program
• and it has the right properties and permissions
– e.g., in /cgi-bin directory and executable
• then run it on server to produce HTML to send back to client
– using the contents of the form as input
– output depends on client request: created on the fly, not just a file
• CGI programs can be written in any programming language
– Perl, Python, PHP, Java, Ruby, …
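Here is a minimal CGI-style program in Python, sketched under the usual CGI convention. The web server passes the request method and query string in environment variables and POST data on standard input. The program writes a header, a blank line, then HTML to standard output. The field names `name` and `rate` and the reply wording are hypothetical.

```python
import html
import os
import sys
from typing import Dict, List, Mapping
from urllib.parse import parse_qs


def form_from_request(environ: Mapping[str, str], body: str = "") -> Dict[str, List[str]]:
    """Decode form fields: POST data arrives in the request body (on stdin),
    GET data in the QUERY_STRING environment variable."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        raw = body
    else:
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


def make_page(form: Dict[str, List[str]]) -> str:
    """Build the reply; html.escape stops user input from injecting markup."""
    name = html.escape(form.get("name", ["stranger"])[0])
    rating = html.escape(form.get("rate", ["no rating"])[0])
    return (f"<html><body><p>Thanks, {name}. "
            f"You rated this page: {rating}.</p></body></html>")


def main() -> None:
    body = ""
    if os.environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(os.environ.get("CONTENT_LENGTH") or 0)
        body = sys.stdin.read(length)
    form = form_from_request(os.environ, body)
    # CGI output: header lines, a blank line, then the HTML document.
    sys.stdout.write("Content-Type: text/html\r\n\r\n")
    sys.stdout.write(make_page(form))


if __name__ == "__main__":
    main()
```

The page is generated on the fly from the request. No file with this content exists on the server.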
Example form in HTML
(dpd.mycpanel2.princeton.edu/mailform.html)
<html>
<body>
<!-- POST the fields below to a CGI program on the server -->
<form METHOD="post"
ACTION="http://dpd.mycpanel2.princeton.edu/zcgi-bin/mailform.cgi">
<!-- hidden field: sent with the form but not shown to the user -->
<input type="hidden" name="email" value="cos109@princeton.edu">
Your name: <input type="text" name="name"><p>
Your email: <input type="text" name="address"><p>
Please rate this page:<p>
<!-- radio buttons share name=rate, so only the checked value is sent -->
<input type=radio name=rate value=poor>
Poor
<input type=radio name=rate value=ok>
OK
<input type=radio name=rate value=good>
Good <p>
<input type="submit">
<input type="reset">
</form>
</body>
</html>
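When the form above is submitted, the browser encodes its named fields into one string. With METHOD="post", that string becomes the body of the HTTP request (Content-Type application/x-www-form-urlencoded). This Python sketch reproduces that encoding. The field names come from the form, but the typed-in values are made up.

```python
from urllib.parse import urlencode

# Field names match the form's inputs; the values are hypothetical.
# The submit and reset buttons have no name attribute, so they send nothing.
fields = {
    "email": "cos109@princeton.edu",   # hidden field, sent automatically
    "name": "Ada Lovelace",            # "Your name" text box
    "address": "ada@example.com",      # "Your email" text box
    "rate": "good",                    # the one radio button that was checked
}

# urlencode escapes "@" as %40 and turns spaces into "+".
body = urlencode(fields)
print(body)
```

The CGI program named in ACTION receives exactly this string and must decode it.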
Cookies
• HTTP is stateless: the server doesn't remember anything from one request to the next
• cookies intended to deal with stateless nature of HTTP
– remember preferences, manage "shopping cart", etc.
• cookie: one chunk of text sent by server to be stored on client
– stored in browser while it is running (transient)
– stored in client file system when browser terminates (persistent)
• when client reconnects to same domain,
browser sends the cookie back to the server
– sent back verbatim; nothing added
– sent back only to the same domain that sent it originally
– contains no information that didn't originate with the server
• in principle, pretty benign
• but heavily used to monitor browsing habits, for commercial
purposes
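The cookie round trip can be sketched with Python's standard `http.cookies.SimpleCookie` on the server side and a toy browser that files cookies by domain. `ToyBrowser`, the cookie name `visitor` and the domains are hypothetical. Real browsers also match subdomains and paths and honor expiry dates, which this sketch ignores.

```python
from http.cookies import SimpleCookie
from typing import Dict


def server_set_cookie(name: str, value: str) -> str:
    """Header a server adds to its response to store a cookie on the client."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()          # e.g. "Set-Cookie: visitor=42"


class ToyBrowser:
    """Keeps cookies per domain and returns them only to that domain."""

    def __init__(self) -> None:
        self.jar: Dict[str, Dict[str, str]] = {}   # domain -> {name: value}

    def receive(self, domain: str, header: str) -> None:
        """Store the cookie from a 'Set-Cookie: name=value' header."""
        _, _, text = header.partition(":")
        parsed = SimpleCookie()
        parsed.load(text.strip())
        stored = self.jar.setdefault(domain, {})
        for name, morsel in parsed.items():
            stored[name] = morsel.value             # kept verbatim

    def cookie_header_for(self, domain: str) -> str:
        """Cookie text sent on the next request to `domain` (may be empty)."""
        stored = self.jar.get(domain, {})
        return "; ".join(f"{k}={v}" for k, v in stored.items())


browser = ToyBrowser()
browser.receive("xyz.com", server_set_cookie("visitor", "42"))
print(browser.cookie_header_for("xyz.com"))    # sent back to xyz.com
print(browser.cookie_header_for("other.com"))  # nothing for other domains
```

Nothing the browser returns originated anywhere but the server that set it. The privacy issue comes from who sets cookies, as the next slide shows.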
Cookie crumbs
• fetch a page from xyz.com
– it contains <img src=http://doubleclick.com/advt.gif>
– this causes the image to be fetched from DoubleClick.com
– which now knows your IP address and what page you were looking at
• DoubleClick sends back a suitable advertisement
– with a cookie that identifies "you" at DoubleClick
• next time you fetch any page that contains a DoubleClick.com image
– the last DoubleClick cookie is sent back to DoubleClick
– the set of sites and images that you are viewing is used to
- update the record of where you have been and what you have looked at
- send back targeted advertising (and a new cookie)
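A few lines of Python show how a third-party image turns into a browsing record. `ToyAdServer` plays DoubleClick's role. It is hypothetical, as are the page names. The ad server learns the embedding page from the request itself, for example through the browser's Referer header.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ToyAdServer:
    """Hypothetical third-party ad server playing DoubleClick's role."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.history: Dict[str, List[str]] = {}   # tracking cookie -> pages

    def serve_ad(self, cookie: Optional[str], page: str) -> Tuple[str, str]:
        """Handle the browser's request for the ad image embedded in `page`.

        `cookie` is whatever this server set last time (None on first
        contact). Returns the ad to show and the cookie to store.
        """
        if cookie is None:
            cookie = f"user{next(self._next_id)}"    # new identity for "you"
        visited = self.history.setdefault(cookie, [])
        visited.append(page)                          # update the record
        ad = f"ad chosen from {len(visited)} recorded page(s), latest {page}"
        return ad, cookie


# One browser visiting three unrelated sites that all embed the same ad image.
tracker = ToyAdServer()
browser_cookie: Optional[str] = None
for site in ["xyz.com/shoes", "news.example/sports", "shop.example/cart"]:
    ad, browser_cookie = tracker.serve_ad(browser_cookie, site)
    print(ad)

print(browser_cookie, tracker.history[browser_cookie])
```

None of the three sites shares data with the others. The ad server links them anyway, because the same cookie comes back with every image request.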
Advertising marketplace
• advertising exchanges
– Yahoo Right Media, DoubleClick Ad Exchange, Facebook Atlas ...
• a person uses a browser to request a web page
• web page "publisher" notifies exchange that advertising space
on that page is available
– publishers are typically portals or entertainment and news sites
– publisher provides information about the person: past online activity,
viewing and shopping habits, geographic location, demographics
probably not actual identity (?)
• advertisers bid on the ad space
– amount depends on person's attributes and location, advertiser's
budget, etc.
• winner's advertisement is inserted into the page
• elapsed time: 10-100 milliseconds
• this happens for multiple advertisements on one page
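The bidding step can be sketched as a simple sealed-bid auction. Each advertiser prices the impression from the viewer's attributes and its own budget, and the highest affordable bid wins. This "highest bid wins" rule is a simplification assumed here. Real exchanges use more elaborate auction rules. The advertisers, interests and prices (integer cents) are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set, Tuple


@dataclass
class Advertiser:
    """Hypothetical advertiser; all prices are integer cents per impression."""
    name: str
    base_bid: int            # offered for any viewer
    bonus_per_match: int     # extra per interest the viewer shares
    interests: FrozenSet[str]
    budget: int              # most it can spend on this impression

    def bid(self, viewer_interests: Set[str]) -> int:
        matches = len(self.interests & viewer_interests)
        return self.base_bid + matches * self.bonus_per_match


def run_auction(advertisers: Iterable[Advertiser],
                viewer_interests: Set[str]) -> Optional[Tuple[str, int]]:
    """Collect bids and return (winner, price): highest affordable bid wins."""
    best: Optional[Tuple[str, int]] = None
    for adv in advertisers:
        amount = adv.bid(viewer_interests)
        if amount > adv.budget:
            continue                      # over budget: this bid is dropped
        if best is None or amount > best[1]:
            best = (adv.name, amount)
    return best


shoe_co = Advertiser("ShoeCo", 2, 3, frozenset({"running"}), 100)
trip_co = Advertiser("TripCo", 4, 2, frozenset({"travel", "hotels"}), 100)

print(run_auction([shoe_co, trip_co], {"running"}))           # a runner
print(run_auction([shoe_co, trip_co], {"travel", "hotels"}))  # a traveler
```

The same page slot goes to different advertisers at different prices depending on who is looking. That is why the publisher's information about the person is valuable.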