Overcoming the Internet Impasse through Virtualization

Rethinking the Internet Architecture
Process, Architecture, and Troubleshooting
Scott Shenker
(joint work with many people, including Katerina Argyraki, Hari
Balakrishnan, David Cheriton, Petros Maniatis, Ion Stoica, Mike
Walfish)
1
Process
Why are we doing this, anyway?
2
Why the Clean Slate Mania?
• Internet in crisis?
- lack of functionality not a crucial problem
- lack of reliability is most important problem
• Research community in crisis?
- little practical impact on architecture
- narrowed focus, stopped asking the big questions
• NSF’s response: FIND and GENI
- but not enough by itself....
3
You Can Lead an Academic to
Architecture, but....
• Normal academic behavior won’t produce architecture
- Publication requires differentiation and/or indifference
- Architecture comes from critique and synthesis
• work on ideas other than your own.....
• Can’t just design, simulate and abandon
- must also experiment and deploy.....
- .....then discuss and synthesize
• Process change harder than technical issues
- adoption is much harder than both!
4
Some Thoughts on Architecture
material covered in several papers
(apologies to those who have heard all this before)
not comprehensive architecture, many issues ignored
5
What’s Wrong with the Internet?
• Internet is everywhere, used for (almost) everything
• Main limiting factor seems to be lack of reliability
- can’t do telesurgery, air traffic control, etc.
• Hard to improve reliability of packet delivery within
current architecture
• Vulnerable to attacks, misconfigurations and failures
6
Packet Delivery Problems
• Access link failures
- multihome
• Routing failures
- security, policy, configuration, convergence, multipath,...
• Congestion control failures
- FQ, XCP, RCP, ....
• DoS
- default-off, capabilities, filters,...
7
Packet Delivery Problems
• Technical solutions are largely at hand
- not perfect, but huge improvement over status quo
• No overarching synthetic architecture has emerged
- symptom of process failure, or just too early?
• But packet delivery won’t be the focus of this talk....
- because only experts see it as the major problem
8
Normal User’s Perspective
Other forms of failure dominate:
• out-of-date email addresses
• broken links
• misleading urls and/or inauthentic data
• applications blocked by NATs, etc.
• email unusable or unreliable due to spam
• ......
9
Why? Three Important Changes...
1. Host-to-host  accessing data and services
2. End-to-end  middleboxes
3. Appropriate communication  spam
10
Three Important Changes
1. Host-to-host  accessing data and services
2. End-to-end  middleboxes
3. Appropriate communication  spam
11
Not just host-oriented apps....
• Of course, packets always flow from host to host
- modulo middleboxes....
• But which host are the packets sent to?
• This is controlled by what hostname is used
• So adjusting to data-oriented apps involves reevaluating the Internet naming system
- data, service specified by host/path pair
12
Problems with host/path names
• Data movement causes broken links
- names should be persistent
• Replication unnecessarily difficult
- Akamai expensive, and can’t replicate at object granularity
- Google, P2P, etc. do this now....
• DNS names lead to legal/political battles
- increasingly important, witness ICANN debacle
• Names don’t facilitate authentication
- can’t easily verify that data originated with intended source
13
Fix #1: Name Data/Services Directly
• Network locations: IP addresses
• Hosts: endpoint identifiers (EIDs)
• Data/Services: service identifiers (SIDs)
- direct naming supports fine-grained migration/replication
• User-level descriptors:
- search terms
- canonical names (AOL keywords)
- .......
14
Fix #2: Use Names in Appropriate Layer
• User-level descriptors (e.g., search): app-specific search/lookup returns SID
• Application (app session): uses SIDs
- resolves SID to EID, opens transport conns
• Transport: binds to EID (HIP)
- resolves EID to IP
• IP: delivers to network locations (IP addresses)
• Packet header: IP hdr | EID | TCP | SID | …
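Here is a minimal Python sketch of how each layer binds to its own kind of name. The table contents, the string formats such as `sid:report-v1`, and the `Packet` fields are hypothetical. Real SIDs and EIDs would be flat identifiers served by a dedicated resolver.

```python
from dataclasses import dataclass

# Hypothetical resolver tables; the talk does not specify their format.
SID_TO_EID = {"sid:report-v1": "eid:host-a"}  # service/data -> host
EID_TO_IP = {"eid:host-a": "192.0.2.10"}      # host -> network location

@dataclass
class Packet:
    ip_dst: str     # used by routers (IP layer)
    eid_dst: str    # bound by the transport layer (as in HIP)
    tcp_dport: int  # ordinary transport header field
    sid: str        # interpreted by the application

def send_request(sid: str, port: int = 80) -> Packet:
    """Resolve a SID down the stack and build the layered header."""
    eid = SID_TO_EID[sid]  # application layer: SID -> EID
    ip = EID_TO_IP[eid]    # transport layer: EID -> IP
    return Packet(ip_dst=ip, eid_dst=eid, tcp_dport=port, sid=sid)

before = send_request("sid:report-v1")
# The host migrates: only the EID -> IP binding changes.
EID_TO_IP["eid:host-a"] = "198.51.100.7"
after = send_request("sid:report-v1")
print(before)
print(after)
```

When the host moves, the application keeps using the same SID and the transport keeps the same EID. Only the lowest binding is updated.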
15
Fix #3: Names Should be Flat!
0xf436f0ab527bac9e8b100afeff394300
• A name can be persistent if and only if it doesn’t
embed any mutable information about its referent
• Flat names embed no information, so they can be
used to persistently name anything
- Enables inter-domain migration, etc.
• Once you have a large flat namespace, you never
need other global handles
- no distinction between EIDs, SIDs, etc.
16
Disadvantages of Flat Names
• Hard to resolve
• No local control
• No locality
• Not human friendly
all can be handled, but flat names do require
new resolution infrastructure
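The slides do not say what the new resolution infrastructure looks like. One common way to resolve a large flat namespace is a distributed hash table. The Python sketch below assumes consistent hashing and uses hypothetical resolver node names. All "nodes" are simulated in one process.

```python
import bisect
import hashlib

def ring_point(key: str) -> int:
    """Place a string on a 2**32 identifier ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

class FlatNameResolver:
    """Toy consistent-hashing resolver (all nodes simulated locally)."""

    def __init__(self, nodes):
        self._ring = sorted((ring_point(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]
        self._store = {n: {} for n in nodes}

    def owner(self, name: str) -> str:
        """The first node clockwise from the name's point stores its record."""
        i = bisect.bisect_right(self._points, ring_point(name)) % len(self._ring)
        return self._ring[i][1]

    def register(self, name: str, location: str) -> None:
        self._store[self.owner(name)][name] = location

    def resolve(self, name: str):
        return self._store[self.owner(name)].get(name)

resolver = FlatNameResolver(["resolver-1", "resolver-2", "resolver-3"])
name = "0xf436f0ab527bac9e8b100afeff394300"
resolver.register(name, "192.0.2.7")
print(resolver.owner(name), resolver.resolve(name))
```

A name's owner depends only on its hash, so the name itself carries no location information. Adding a node moves only the names that fall between it and its predecessor on the ring. This sketch does not address the other listed disadvantages (local control, locality).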
17
Fix #4: Make Names Self-certifying
• Name = Hash(pubkey, salt)
• Value = <pubkey, salt, data, signature>
- can verify that the name derives from pubkey, and that pubkey signed the data
• Can receive data from caches or other 3rd parties
without worry
- much more opportunistic data transfer
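Below is a minimal Python sketch of creating and checking a self-certifying name. To stay within the standard library it uses textbook RSA with a deliberately tiny key built from two Mersenne primes. That key is insecure and purely illustrative. The key encoding and the choice of SHA-256 are assumptions, not part of the talk.

```python
import hashlib
import os

# Toy, INSECURE textbook RSA key from the Mersenne primes 2**61-1 and 2**89-1.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)
PUBKEY = f"{N}:{E}".encode()       # assumed public-key encoding

def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def make_name(pubkey: bytes, salt: bytes) -> str:
    """Name = Hash(pubkey, salt)."""
    return hashlib.sha256(pubkey + salt).hexdigest()

def publish(data: bytes):
    """Owner side: returns (name, value), value = <pubkey, salt, data, signature>."""
    salt = os.urandom(16)
    signature = pow(digest_int(data, N), D, N)
    return make_name(PUBKEY, salt), (PUBKEY, salt, data, signature)

def verify(name: str, value) -> bool:
    """Anyone (e.g., a cache) can check a value against the name alone."""
    pubkey, salt, data, signature = value
    if make_name(pubkey, salt) != name:  # name is bound to this pubkey
        return False
    n, e = (int(x) for x in pubkey.decode().split(":"))
    return pow(signature, e, n) == digest_int(data, n)  # pubkey signed data

name, value = publish(b"quarterly report")
forged = (value[0], value[1], b"tampered", value[3])
print(verify(name, value), verify(name, forged))
```

The receiver needs nothing except the name to decide whether to trust the data. This is why a copy fetched from a cache or another third party is as good as one fetched from the origin.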
18
Proposed Naming System
• Flat, self-certifying identifiers for all entities
• Used in “layered” fashion so that each protocol binds
to the correct level of abstraction
• Names are persistent, verifiable, and support easy
replication and migration
• Requirement: industrial-strength flat name resolver
- names, key revocation (later, another use)
19
Three Important Changes
1. Host-to-host  accessing data and services
2. End-to-end  middleboxes
3. Appropriate communication  spam
20
Not just end-to-end....
• Middleboxes provide important functionality
- NATs, firewalls, proxies, caches, app accelerators, etc.
• But processing between endpoints violates pure end-to-end religion, and causes many practical problems
- e.g., NATs interfere with many applications
• How can architecture support middleboxes better?
- eliminate problems and make them architecturally sound
21
Delegation via Resolution
• Names usually resolve to “location” of entity
• Delegation principle: A network entity should be able
to direct resolutions of its name not only to its own
location, but also to chosen delegates
• Semantics:
- where am I  where should packets be sent to reach me
• This allows packets to be directed towards
middleboxes in a clean and coherent manner
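The delegation principle can be sketched in Python as a resolver whose answer is chosen by the name's owner. The class name, its methods, and the rule of returning the first delegate are hypothetical illustrations.

```python
class DelegatingResolver:
    """Hypothetical resolver honoring the delegation principle: the owner
    of a name chooses what the name resolves to."""

    def __init__(self):
        self._records = {}

    def set_location(self, eid, location, delegates=()):
        # Only the name's owner should be allowed to call this. In a real
        # system the update would be signed with the key behind the
        # self-certifying name; that check is omitted here.
        self._records[eid] = (location, list(delegates))

    def resolve(self, eid):
        """Answer 'where should packets be sent to reach eid?'"""
        location, delegates = self._records[eid]
        return delegates[0] if delegates else location

r = DelegatingResolver()
r.set_location("eid:d", "ipd", delegates=["ipf"])  # d delegates to a firewall at ipf
r.set_location("eid:e", "ipe")                     # e has no delegate
print(r.resolve("eid:d"), r.resolve("eid:e"))
```

Keeping a list of delegates leaves room for chains or fallbacks. This sketch ignores both.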
22
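Architecturally-Sound vs. Current (Bad) Middleboxes
Example: host EID s sends to host EID d (at IP ipd), which is protected by a firewall (at IP ipf)
• Mapping for dest EID d
- current (bad): d → ipd, so the firewall must sit on the path
- architecturally sound: d → ipf, the firewall acting as d's delegate
• Packet structure
- current (bad): [IP ipd | TCP hdr]
- architecturally sound: [IP ipf | EID d | TCP hdr]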
• Delegate can be anywhere, not necessarily on path
• Can apply to app-layer middleboxes
• Including SID, EID in packet is crucial
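This Python sketch shows why carrying the EID in the packet matters. The firewall receives a packet addressed to its own IP (ipf), applies a policy, and uses EID d to forward the packet onward. The port-based policy and the `WHERE_IS` table are hypothetical.

```python
from dataclasses import dataclass, replace

ALLOWED_PORTS = {80, 443}    # hypothetical firewall policy
WHERE_IS = {"eid:d": "ipd"}  # hypothetical: d's real location, known to the firewall

@dataclass(frozen=True)
class Packet:
    ip_dst: str    # where the network delivers the packet
    eid_src: str
    eid_dst: str   # who the packet is ultimately for
    tcp_dport: int

def firewall_at_ipf(pkt: Packet):
    """Off-path delegate: filter, then re-address toward the destination EID."""
    if pkt.tcp_dport not in ALLOWED_PORTS:
        return None  # dropped by policy
    return replace(pkt, ip_dst=WHERE_IS[pkt.eid_dst])

inbound = Packet(ip_dst="ipf", eid_src="eid:s", eid_dst="eid:d", tcp_dport=443)
print(firewall_at_ipf(inbound))
print(firewall_at_ipf(replace(inbound, tcp_dport=23)))
```

Without the EID in the packet, a firewall that is not on the path would have no way to tell where to send an accepted packet.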
23
Possible Impacts
• More general services: more complex services (like
Riverbed, transcoding, etc.) can fit within framework
• Remote services, not boxes: since middleboxes need
not be on-path, services like firewalls, virus-scanners,
etc. can be provided as remote services
• Rethinking transport: with intermediaries between
endpoints, basic notion of the transport layer should
be rethought, combining ideas from DTN, DOT, etc.
24
Three Important Changes
1. Host-to-host  accessing data and services
2. End-to-end  middleboxes
3. Appropriate communication  spam
25
Restraining Usage
• Can’t be at packet level, must be app-dependent
• But don’t want separate mechanism for each app
- Email, IM, wiki, etc.
• Proposal: quota system
- quotas allocated in application-dependent manner
- quotas enforced through single mechanism
• stamp for each usage, canceled through mechanism
• see NSDI 06 paper for details....
• Uses flat name resolution
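The slides point to the NSDI '06 paper for the actual design. The Python sketch below only illustrates the division of labor described above: application-dependent allocation plus one shared cancellation mechanism. The class names, the fixed per-sender quota, and random stamp IDs are assumptions. In the proposal, cancellation would rely on flat name resolution rather than a local set.

```python
import secrets

class QuotaAllocator:
    """Hands each sender at most `quota` stamps. Allocation is
    application-dependent; a fixed per-sender number is assumed here."""

    def __init__(self, quota: int):
        self.quota = quota
        self.used = {}      # sender -> stamps issued so far
        self.valid = set()  # every stamp ever issued

    def issue(self, sender: str):
        if self.used.get(sender, 0) >= self.quota:
            return None     # quota exhausted
        self.used[sender] = self.used.get(sender, 0) + 1
        stamp = secrets.token_hex(16)
        self.valid.add(stamp)
        return stamp

class Canceller:
    """Single, application-independent enforcement: each stamp is
    accepted at most once."""

    def __init__(self, allocator: QuotaAllocator):
        self.allocator = allocator
        self.canceled = set()

    def cancel(self, stamp) -> bool:
        if stamp not in self.allocator.valid or stamp in self.canceled:
            return False    # forged or reused stamp
        self.canceled.add(stamp)
        return True

alloc = QuotaAllocator(quota=2)
canceller = Canceller(alloc)
s1, s2 = alloc.issue("alice"), alloc.issue("alice")
s3 = alloc.issue("alice")
print(s3, canceller.cancel(s1), canceller.cancel(s1), canceller.cancel("forged"))
```

Email, IM, and wiki front ends would each decide how many stamps to hand out. All of them would check stamps through the same `cancel` call.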
26
Summary: Other Forms of Failure.....
• broken links and pointers: persistent names
• inauthentic data: self-certifying names
• applications blocked by NATs, etc.: delegation
• spam and other clutter: quota enforcement
No change to IP or routers!
27
Troubleshooting and Debugging
because things inevitably fail.....
28
User’s Perspective
• Want to know who to yell at
- identify responsible entity (at appropriate granularity)
• Want their complaints to be taken seriously
- provide credible and actionable report
• Want the problem fixed, now
- detailed diagnostic tools
- this is traditional focus of troubleshooting
29
User’s Perspective
• Want to know who to yell at
- identify responsible entity (at appropriate granularity)
• Want their complaints to be taken seriously
- provide credible and actionable reports
• Want the problem fixed
- detailed debugging tools
- this is traditional focus of work in this area
30
Vision
• Incorporate coherent set of monitoring tools into
architecture that:
- record necessary information
- process information to answer relevant questions
• Key points:
- not just statistics (e.g., NetFlow), but answers
- focus broader than just detailed diagnostics
• Three examples
31
Ex. #1: Monitoring ISPs
• Monitor boxes on peering links record packet digests
- no internal information revealed
• Boxes exchange information to determine where
packets are dropped and/or delayed
• Information ends up at source ISP or end user
• Overhead: ~2-4% of packet bandwidth
• Can be applied within enterprises, etc.
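The digest exchange can be sketched in Python: two monitor boxes record a digest and timestamp per packet, and comparing their records shows which packets were lost or delayed between them. The digest length, the delay threshold, and the assumption of synchronized clocks are illustrative choices. The ~2-4% overhead figure above is not modeled.

```python
import hashlib

def packet_digest(invariant_bytes: bytes) -> str:
    """Short digest of a packet's invariant content. The caller is assumed
    to have masked fields that change hop by hop (TTL, checksum)."""
    return hashlib.sha256(invariant_bytes).hexdigest()[:16]

def compare_monitors(upstream: dict, downstream: dict, max_delay: float):
    """Each dict maps digest -> timestamp recorded by one monitor box.
    Returns digests lost between the boxes and those delayed too long."""
    dropped = [d for d in upstream if d not in downstream]
    delayed = [d for d, t in upstream.items()
               if d in downstream and downstream[d] - t > max_delay]
    return dropped, delayed

pkts = [b"pkt-a", b"pkt-b", b"pkt-c"]
ingress = {packet_digest(p): t for p, t in zip(pkts, (0.0, 0.1, 0.2))}
egress = {packet_digest(b"pkt-a"): 0.01, packet_digest(b"pkt-c"): 0.9}
dropped, delayed = compare_monitors(ingress, egress, max_delay=0.5)
print(dropped, delayed)
```

Only digests and timestamps leave each box, which is what lets ISPs share this information without revealing their internals.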
32
Ex. #2: Multilayer Tracing
• Traceroute is useful, but limited to IP
• X-Trace (just started) is a generalized version:
- operates at multiple layers
- follows recursive packet generation (DNS queries, etc.)
- can implement policies about when to respond
• Requirements:
- layer must be able to handle and propagate metadata
- module on box to intercept and report on packets
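The metadata requirement can be sketched in Python. Each layer records an operation and passes a (task ID, operation ID) pair to any request it triggers. A collector can then rebuild the causal tree across layers. The `Tracer` class and its methods are hypothetical and are not X-Trace's API.

```python
import itertools

class Tracer:
    def __init__(self):
        self._next = itertools.count(1)
        self.reports = []  # (task_id, op_id, parent_op_id, layer, event)

    def record(self, task_id, layer, event, parent=None):
        """Log one operation; the returned (task_id, op_id) pair is the
        metadata a layer must carry along with each request it issues."""
        op_id = next(self._next)
        self.reports.append((task_id, op_id, parent, layer, event))
        return task_id, op_id

    def children(self, task_id):
        """Rebuild the causal tree: parent op id -> list of child events."""
        tree = {}
        for t, op, parent, layer, event in self.reports:
            if t == task_id:
                tree.setdefault(parent, []).append(f"{layer}:{event}")
        return tree

tr = Tracer()
task, http_op = tr.record("t1", "http", "GET /index.html")
_, dns_op = tr.record(task, "dns", "lookup example.com", parent=http_op)
tr.record(task, "udp", "query to resolver", parent=dns_op)
tr.record(task, "tcp", "connect", parent=http_op)
tree = tr.children("t1")
print(tree)
```

The DNS lookup shows the recursive case: a request generated on behalf of the HTTP fetch still carries the original task ID.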
33
Ex. #3: Distributed Debugging
• When bugs occur in operation, they can be extremely
difficult to locate and reproduce
• We are developing liblog, an always-on log-and-replay
debugging tool (still at an early stage)
• There are many log-and-replay debuggers; ours meets a
special set of requirements....(not described here)
34
Logging and Replay
1. Each process logs its execution to a local file
2. Logs are collected at central location and replayed
[Figure: on each of Nodes 1-3, the app runs over liblog and writes a local log (Logs 1-3); the logs are collected at a Replay Node, where each app/liblog instance is replayed under GDB, driven from a single console.]
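The core log-and-replay idea can be sketched in Python. Nondeterministic inputs are recorded during the live run and fed back in the same order during replay, so execution repeats exactly. The `Recorder` class is hypothetical. It wraps Python callables and is not how liblog itself is built.

```python
import random
import time

class Recorder:
    """Record mode logs results of nondeterministic calls; replay mode
    returns them in order instead of calling the real functions."""

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if log else []
        self._pos = 0

    def call(self, fn, *args):
        if self.replaying:
            value = self.log[self._pos]  # IndexError if replay diverges past the log
            self._pos += 1
            return value
        value = fn(*args)
        self.log.append(value)
        return value

def app(rec: Recorder) -> str:
    t = rec.call(time.time)
    roll = rec.call(random.randint, 1, 6)
    return f"{t:.3f} rolled {roll}"

live = Recorder()
first = app(live)                  # the "always on" production run
replayed = app(Recorder(live.log)) # later, at the replay node
print(first, replayed)
```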
35
Extensions
• liblog generates too much data
- hard to sift through for large systems
• Next step: setting global watchpoints and breakpoints
• Can be specified as general expressions (in Python)
- routing loops, state inconsistencies, etc.
• No operational experience yet
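A global watchpoint can be sketched as a Python predicate evaluated over successive global states. Here the predicate detects a routing loop in next-hop tables gathered from all nodes. The assumption that replay yields consistent global snapshots, and the function names, are hypothetical.

```python
def routing_loop(next_hop: dict, src: str, dst: str):
    """Follow per-node forwarding tables toward dst; return the loop as a
    list of nodes, or None if the path reaches dst or dead-ends."""
    path, seen, node = [], set(), src
    while node != dst:
        if node in seen:
            return path[path.index(node):]
        seen.add(node)
        path.append(node)
        node = next_hop.get(node, {}).get(dst)
        if node is None:
            return None
    return None

def first_violation(snapshots, predicate):
    """Global watchpoint: index of the first global state where predicate holds."""
    for i, state in enumerate(snapshots):
        if predicate(state):
            return i
    return None

healthy = {"A": {"D": "B"}, "B": {"D": "D"}}
looping = {"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "A"}}
hit = first_violation([healthy, looping],
                      lambda s: routing_loop(s, "A", "D") is not None)
print(hit, routing_loop(looping, "A", "D"))
```

A state-inconsistency check would follow the same pattern. Only the predicate changes.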
36
Troubleshooting and Debugging
• Automated end-user reporting tools would be useful
to both users and ISPs
- lots of low-hanging fruit
• Not clear ISPs will take the lead on troubleshooting
- ISPs may not be eager to admit fault
- but they should be eager to reduce call-center expenses
• Experience needed with distributed debugger in
networking context
37
Summary
• Biggest challenge is to get community talking to each
other rather than past each other
• Reliability more pressing than functionality
- have tools to provide better packet delivery
- then considered a wider set of failure modes
- can handle without IP/router involvement
• Troubleshooting should be part of “architecture”
- nowhere near coherent yet
- looking for basic building blocks
38