
Research Proposal Flexible Routers

A RESEARCH PROPOSAL
Flexible Routers
Keshava Bharadwaj
Infosys Technologies Limited
Networking and Administration
Infosys Technologies Limited
India
CONTENTS
Background
Objective
Scope
Methodology and Approach
Facilities
References
BACKGROUND
Autonomous services need a flexible router platform that provides the mechanisms to install, modify and remove
services at run-time of the node without interfering with others. Instantiated services must have the ability to reconfigure themselves and to exchange service functionality on demand.
Envisioned router platforms must be able to run multiple services in parallel and are required to scale with the
number of network-interfaces while they need to provide a straightforward to use service programming interface.
In this paper, we present the PromethOS NP router platform together with a
service architecture to counteract distributed denial of service attacks in an autonomous, policy-based way.
PromethOS NP manages and controls a processor hierarchy composed of host processors and network processors
embedded in network interface cards. It provides a dynamically code-extensible router platform in which all
processor tiers are programmable at run-time following a unified component programming model.
The service architecture illustrates the capabilities of the router platform and its applicability to autonomous
network services.
1 Introduction and Motivation
Routers in the not-too-distant future need to provide extensible mechanisms to process packets in addition to
legacy packet routing and forwarding. Extended packet processing may range from in-depth packet inspection to
service-specific packet transcoding for, e.g. content-dependent filtering or advanced media adaptation,
respectively.
While today's software routers can serve as a proof of concept, they fall short in performance and scalability.
Recently developed network processors provide a suitable processing element to be embedded at the link
interface.
Together with managing host processors, they offer a promising means of increasing processing capacity in a
scalable way for high-performance routers. However, while a hierarchical network node (built of several host and network
processor based link interfaces) overcomes limitations in performance and scalability, it increases the
complexity of managing and controlling a router platform.
Network services that can be deployed at critical locations in the network are urgently required to protect today's vital
communication infrastructure. This infrastructure has been seriously threatened by
large-scale distributed denial of service (DDoS) attacks in the Internet. These attacks destroy information or
hinder customers from accessing specific services.
Services provided in the Internet, such as on-line stock trading, virtual travel agencies or book-stores, are already very
important to the economy. The Economist reported in May 2004: “The 200m Americans who now have
web access are likely to spend more than US$120 billion online this year.”
But in eCommerce, brief inaccessibility of services results in loss of business. Since the impact of eCommerce on
the economy is expected to grow further, the risk of economic damage resulting from a large-scale Internet attack
increases. The situation becomes more dramatic because the number of attacks increases at least at the same pace
as the impact of eCommerce does. A further threat is that newly discovered errors in
software or hardware are exploited more rapidly for fresh attacks.
Fighting DDoS attacks requires in-depth packet inspection to identify malicious streams in the flood of
traffic. With today’s commercial high-performance routers, however, payload analysis is usually not possible, or,
if it is, the functionality is either coded in firmware or hard-wired in the box. Attack schemes vary a lot over time.
In addition, the period between the first detection of an exploit and the widespread launch of the
attack is becoming shorter. So, it is crucial that large-scale DDoS attacks are defeated on routers as close to the core of the Internet as
possible.
Specific Anti-DDoS components must be installed, configured and removed on request. For obvious reasons, the
deployment of the specific detection and countermeasure components must not interfere with other services.
Further, they must be able to tackle the problem of known as well as unknown attacks semi-automatically
according to predefined policies.
We propose PromethOS NP as the dynamically code-extensible router platform for the envisioned Anti-DDoS
service. It provides the abstractions required for node-internal communication among service components by
which services are allowed to span arbitrary processors. Further, it provides the mechanisms to install, configure,
instantiate and remove service components on any code-extensible processor of the processor hierarchy.
Hence, the goal in this paper is to briefly introduce the architecture of an Internet backbone Anti-DDoS service for
our powerful PromethOS NP router architecture.
2 Router Platform
Fig. 1 depicts the architecture of a PromethOS NP node using a three-tier processor hierarchy and a node control
layer. On all tiers, PromethOS NP provides dynamically code-extensible processing environments (PEs).
PromethOS NP creates a hierarchical execution environment (EE) such that an interface to the hierarchical EE is
provided only via the control layer.
Internally, PromethOS NP manages two different types of code-extensible PEs, in which service components can
be installed and instantiated. On the general purpose processor cores, the PE is implemented as an extended
PromethOS EE (cf. Host Processor Processing Environment in Fig. 1).
This PE provides a binary compatible interface to the PromethOS EE. In contrast to the PromethOS EE, that runs
on a single processor node only, the other PE (cf. Network Processor Processing Environments in Fig. 1) is
embedded in the hierarchical router platform and provides the abstractions to build a service of distributed service
components residing in other PEs. On the packet processors (PPs), a PE is instantiated that provides the
mechanisms to install and execute service components without stopping the PP.
The control layer contains components which are responsible for the whole node. The Node Manager provides the
interface to create a service at node run-time and instructs the other components on the node to act according to its
decision. The Service Mapper creates the required map specification that provides the information to install and
instantiate service components on specific processors such that, first, a service can be created and, second, the
resources available are not overbooked. It instructs the PE-specific Component Loaders to load, instantiate,
configure and unload service components.
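The Service Mapper's task of placing service components without overbooking resources can be sketched as follows. This is a minimal illustration only, assuming a simple first-fit strategy and a single abstract resource unit per processing environment. The class and function names (`ProcessingEnvironment`, `map_service`) are hypothetical and are not part of PromethOS NP.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingEnvironment:
    """A code-extensible PE with a fixed resource budget (hypothetical units)."""
    name: str
    capacity: int
    used: int = 0
    components: list = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - self.used


def map_service(components, environments):
    """Place each (name, cost) component on the first PE with enough room.

    Returns {component name: PE name}. Raises ValueError, leaving every PE
    unchanged, if the whole service cannot be placed without overbooking.
    """
    remaining = {pe.name: pe.free() for pe in environments}
    placement = {}
    for comp_name, cost in components:
        for pe in environments:
            if remaining[pe.name] >= cost:
                remaining[pe.name] -= cost
                placement[comp_name] = pe.name
                break
        else:
            raise ValueError(f"no processing environment can host {comp_name}")
    # Commit only after the complete service fits (assumed all-or-nothing policy).
    by_name = {pe.name: pe for pe in environments}
    for comp_name, cost in components:
        pe = by_name[placement[comp_name]]
        pe.used += cost
        pe.components.append(comp_name)
    return placement
```

The all-or-nothing commit is a design choice of this sketch: a service that only partly fits leaves no stray components behind on the node.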
3 Service Architecture
Fig. 2 visualizes our Anti-DDoS service architecture in a particular configuration that consists of a basic service
infrastructure and an attack-specific Service Handler. While the Service Handler must make the required
functionalities available to detect and counteract DDoS attacks, the other components are generic in the sense that
they provide the fundamental service architecture.
Since the path via the Service Handler creates the needed countermeasure functionality, we refer to this path as the
service path. Irrespective of the functionality provided, for the PromethOS NP router platform service components
are black boxes. As such not only the service path but also the service infrastructure are built of service
components that provide the appropriate functionalities. The service specification is used by the Node Manager
that triggers the installation and instantiation of the service as mentioned above. The service logic, however, is
service specific. As such, the service logic may contain mechanisms to request the installation or removal of
service components depending on service-internal policies. Due to this autonomous, policy based service-internal
management, our service architecture provides the basis of a node-local autonomous service.
4 Brief Description
The PromethOS NP router platform provides the execution environment to dynamically install and run services on
hierarchical network nodes that are built of network and host processors. Services are composed of service
components.
Components of a service reside in processing environments that provide the required functionality on processing
elements with sufficient resources. Channels that inter-connect service components abstract from the underlying
communication-complexity. By these mechanisms, a service is created as an arbitrary graph of service
components. The service graph is mapped on the processor hierarchy by the control layer to exploit the available
resources best.
Our service architecture creates the basic infrastructure for the deployment of Anti-DDoS service components on
demand. Based on policy mechanisms, the policy handler installs appropriate countermeasures, and reconfigures
the architecture accordingly.
The deployed functionality is intended to detect specific DDoS attacks and allow for appropriate counteraction. We
envision slowdown and intelligently blocking or capturing packet filters as suitable countermeasures. They are
installed on demand in the specific processing environments as service components.
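The policy-driven installation of countermeasures can be sketched roughly as below. This is an illustration under stated assumptions, not the proposal's implementation: the class `PolicyHandler`, the attack labels, and the idea of a default countermeasure for unknown attacks are all hypothetical. Only the three countermeasure kinds (slowdown, blocking filter, capturing filter) come from the text.

```python
class PolicyHandler:
    """Maps detected attack types to countermeasure components (names hypothetical)."""

    def __init__(self, policies, default=None):
        self.policies = dict(policies)  # attack type -> countermeasure name
        self.default = default          # assumed fallback for unknown attacks
        self.installed = {}             # processing environment -> set of countermeasures

    def on_detection(self, attack_type, pe):
        """Install the countermeasure chosen by policy; return its name or None."""
        measure = self.policies.get(attack_type, self.default)
        if measure is not None:
            self.installed.setdefault(pe, set()).add(measure)
        return measure

    def on_clear(self, attack_type, pe):
        """Remove the countermeasure again once the attack is over."""
        measure = self.policies.get(attack_type, self.default)
        self.installed.get(pe, set()).discard(measure)


handler = PolicyHandler(
    {"syn_flood": "slowdown", "worm": "blocking_filter"},
    default="capturing_filter",
)
handler.on_detection("syn_flood", "np0")
```

Keeping the policy table as plain data means the reaction to a new attack type can be changed without touching the handler itself.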
OBJECTIVE
GENI and Router Vendors
GENI (Global Environment for Network Innovation) seeks to allow large scale experimentation with routers and
perhaps encourage whole new protocols to emerge.
Virtual routers/metarouters allow isolation of each experimenter's protocols in a protected “slice”.
Fine from a research standpoint, but what about router vendors?
“Virtual routers” are already popular, but these are several customers sharing the same fixed functions in a router.
Beyond smaller, faster, cheaper
Routers have historically been compared (LightReading tests) by cost-performance and provenance. But, per IBM
(Autonomic Computing): “In fact, a continued obsession with smaller, faster, cheaper is really a distraction ... the
real obstacle is complexity.”
Two supporting trends:
Complexity of running networks (OpEx) may be a serious obstacle to Web Services/e-commerce
Commoditization of routers using merchant silicon
SCOPE
Flexible Routers
Economic incentive for router vendors to allow more flexible functions that allow customers to manage complexity.
Some Cisco Examples (market pull):
NetFlow -> Flexible NetFlow
Packet Classification -> Flexible Packet Matching
Fixed Packet Parsing -> Flexible Parsers
Perhaps not complete flexibility at lower speeds (GENI) but limited flexibility at the highest speeds
What functions?
Functions that address complexity:
Flexible Measurement: Allow managers to ask flexible queries. Hardcoding / NetFlow insufficient.
Flexible Security: Identify attack patterns to mitigate attacks. Detection heuristics change.
Flexible fault detection: Identify/localize/fix faults. Need flexible measures as new faults emerge.
Motivated by market pull and technology push
Market Pull 1: Better ROI for Networks
[Figure: an ISP network serving Customer Sites 1, 2 and 3, annotated "reroute or add B/W"]
Better ROI: Optimize resources (OSPF weights, light up fibers) based on resource usage patterns.
P2P Traffic: Identify and rate control P2P traffic
Competitive Edge: As banks use data mining to optimize loan portfolios, can ISPs optimize “bandwidth
portfolio”?
Why flexible, high speed measurement?
Cisco today has SNMP counters and NetFlow logs.
NetFlow Issues
Tool: need a tool to process; front end tools do not support flexible queries
Export: large B/W needed to export to tool; loss
Limited flexibility (partially addressed by Flexible NetFlow)
Poor at counting flows
Not real-time: Several minutes to receive and post-process.
SNMP Issues:
Hardwired support for a few low granularity counters (total packets, bytes, errors on each interface)
Large time scales (e.g., 1 minute) good for provisioning but bad for performance anomalies at small time scales
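A small worked example shows why one-minute counters can hide short performance anomalies. The traffic numbers below are invented for illustration, and the helper `average_and_peak_bps` is hypothetical.

```python
def average_and_peak_bps(bytes_per_second):
    """Return (average bit rate, peak one-second bit rate) for per-second byte counts."""
    total_bytes = sum(bytes_per_second)
    average_bps = 8 * total_bytes / len(bytes_per_second)
    peak_bps = 8 * max(bytes_per_second)
    return average_bps, peak_bps


# Assumed trace: 58 quiet seconds at 1 MB/s and a 2-second burst at 100 MB/s.
samples = [1_000_000] * 58 + [100_000_000] * 2
average_bps, peak_bps = average_and_peak_bps(samples)
print(f"1-minute average: {average_bps / 1e6:.1f} Mbit/s, peak second: {peak_bps / 1e6:.1f} Mbit/s")
```

With these assumed numbers the one-minute counter reports about 34 Mbit/s, while the burst itself ran at 800 Mbit/s. A counter polled once per minute cannot distinguish this trace from a steady 34 Mbit/s load.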
Market Pull 2: Costs of (In)Security
[Figure: attack scenario with labels Attacker, Zombie 1 ... Zombie N, Victim (patches), Firewall, IDS, ISP, traceback]
Cost: Too many isolated perimeter solutions (firewalls, IDS devices). Total cost of ownership (TCO) very high.
Delay: When perimeter detects, damage is already done.
Complexity: End users finding and installing patches; or manual procedures for traceback etc.
Gartner Research: Security solutions deployed within
enterprises and ISPs by 2006
Example: Too many flavors of Anomaly Detection
Anomaly detection used to detect new attacks/P2P traffic etc
Several flavors as examples:
Riverhead: Anomalies based on large number of spoofed sources sending to a server. (Does more)
HP: Anomalous if a source sends more than K connections/second.
Mazu, Arbor: Anomalous if a source sends more than K new connections per second compared to baseline connection matrix.
NetSift: Anomalous if content repeats K times
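Two of the flavors listed above, a per-source connection-rate threshold and a content-repetition threshold, can be sketched in a few lines. These are deliberately simplified illustrations of the stated rules, not the vendors' actual algorithms. The function names and the fixed, aligned time window are assumptions.

```python
from collections import Counter


def sources_over_rate(connections, k, window_seconds=1.0):
    """connections: iterable of (timestamp, source).

    Returns the sources that open more than k new connections within
    any aligned window of window_seconds.
    """
    counts = Counter()
    for timestamp, source in connections:
        counts[(source, int(timestamp // window_seconds))] += 1
    return {source for (source, _), n in counts.items() if n > k}


def repeated_content(payloads, k):
    """Returns the payloads that occur at least k times."""
    counts = Counter(payloads)
    return {payload for payload, n in counts.items() if n >= k}
```

Because both the threshold K and the definition of the counted event are parameters, swapping one flavor for another is a configuration change rather than a new device.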
Flexible AD as an example
Changing world requires changing AD because definition of anomalous changes:
New good behaviors (e.g. Skype, BitTorrent) look like old bad behaviors
Attackers are constantly inventing new bad behaviors (e.g., encrypted attacks)
Latency
Theoretically, SIMs that take input from various feeds and can write flexible rules can do Flexible AD.
Disadvantage is latency for fast attacks.
Useful to build somewhat flexible but high speed AD into routers. More general flexible security as well.
Market Pull 3: Costs of Fault Tolerance
Cost: Anecdotal evidence from our friends at AT&T (Albert Greenberg, Jennifer Yates) suggests that network operators
spend a large amount of time diagnosing and dealing with faults
Some Causes: Ephemeral identifiers (VCIs, VPNs, MPLS labels), non-determinism (e.g., hidden hash functions),
cross-layer interactions (IP and optical layer), hidden dependencies (several IP circuits over a single Optical
Amplifier)
Technology Push: Streaming Algorithms and Hardware Gates
Algorithms: Recent major thrust in streaming algorithms in database, web analysis, theory, networks
Hardware: Memory accesses expensive (< 100), SRAM not scaling with connections (< 32 Mbits), but gates
are plentiful.
Mapping: Randomized streaming algorithms (e.g., Bloom Filters) map well to network ASICs.
Opportunity: Invent or adapt streaming algorithms for networking patterns to provide limited flexibility but at
very high speeds.
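As a concrete instance of the randomized streaming structures mentioned above, here is a minimal Bloom filter sketch. It is a software illustration only: an ASIC would use simple hardware hash functions, whereas this sketch uses salted BLAKE2b from the Python standard library for convenience, and the sizes chosen are arbitrary.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # One salted hash per hash function; each selects one bit position.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen_sources = BloomFilter(num_bits=1024, num_hashes=3)
seen_sources.add(b"10.0.0.1")
```

Each insertion or lookup touches a fixed number of bits regardless of how many items were stored, which is why such structures fit a small, fixed memory budget per packet.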
Approaches to Flexibility
FPGAs and Network Processors: Hard to meet cost/performance goals.
ASICs and Primitives: Embed high level primitives into ASICs on every line card that can then be composed at
will.
Appears to be able to get performance with fair amount of flexibility.
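The idea of composing a small set of primitives into different functions can be illustrated in software. This is only a sketch of the composition idea: the primitives `match` and `count_by`, the `compose` helper and the packet fields are hypothetical, and say nothing about how such primitives would be realized in an ASIC.

```python
from collections import Counter


def match(predicate):
    """Primitive: keep only packets for which predicate(packet) is true."""
    return lambda packets: (p for p in packets if predicate(p))


def count_by(key):
    """Primitive: count packets per key(packet)."""
    return lambda packets: Counter(key(p) for p in packets)


def compose(*stages):
    """Chain primitives left to right into one query."""
    def run(packets):
        result = packets
        for stage in stages:
            result = stage(result)
        return result
    return run


# Example query: number of packets per source sent to destination port 80.
web_packets_per_source = compose(
    match(lambda p: p["dport"] == 80),
    count_by(lambda p: p["src"]),
)
```

The same two primitives yield a different measurement simply by changing the predicate or the key, which is the kind of limited flexibility the approach aims for.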
Key Issue: User Model
Many routers are programmable internally to allow new lookup algorithms, QoS etc., but this often requires
microcoders.
For flexibility to be a market force, ordinary users must be able to change router function.
It would be a good by-product of GENI research if router programming could be done without always needing to
program FPGAs.
What is a good API/good user model? StreamSQL and BPF are two extremes.
Description
GENI metarouter/virtual router proposal:
Allows routers to be arbitrarily programmed by knowledgeable researchers
Based on current plans, cost-performance (NPUs, FPGAs) may lag ASIC based router/switches at high speeds.
May not have a clear market case
Limited flexibility at high speeds:
Allows routers to change function in a limited sense based on simple programming by operators
May have good cost-performance to compete with fixed function routers based on ASICs
May have a market case to address the complexity of networks, especially with respect to measurement, security and fault-tolerance
Nevertheless, these two approaches can learn a great deal from each other.
METHODOLOGY AND APPROACH (An Example of a Flexible Router)
Flexible SpaceWire Router FSR-RG408
EtherSpaceLink test and monitoring equipment for aerospace
Product Outline
Flexible, monitoring, SpaceWire routing switches
400Mbits/s SpaceWire ports monitored and controlled via Gbit Ethernet and IP
In a SpaceWire network, the only way to monitor traffic from within the network is inside the routing switches.
The FSR-RG408 provides a means to monitor traffic statistics in the routing switch and hence in the network,
together with the Network layer routing switch capabilities specified in the ECSS SpaceWire standard. Each unit
provides eight ports, which may be used as a single 8-port router or may be split into several completely
independent routers, for example 4+4 or 3+5. The units can be used standalone, with the routing tables held on a
plug-in memory card, or they can be connected via Gbit Ethernet to a computer for monitoring and control. A
front-panel display shows the routing switch label of each port of each separate switch, together with the activity
and status of each link.
The EtherSpaceLink products can be used for testing, monitoring, analyzing, validating, modelling and emulating
any or all the chips, boards, subsystems, and instruments in a SpaceWire network.
Control SpaceWire networks from any computer, any operating system: Because almost every computer and every operating system is able to connect to Ethernet and to the Internet Protocol, the FSR-RG408 can control and monitor SpaceWire networks from the computer and operating system of the user’s choice.
Any year: While PCs need to be replaced every few years, projects can last a decade or more. Ethernet and IP allow the use of the test equipment throughout the project, even as the computers and OS are changed.
Model the routing switches you want to use: The FSR-RG408 enables the user to share the ports of one unit between more than one routing switch. Flexible network management permits modelling the FDIR system appropriate for the application. Flexible link speeds are settable in 1MHz increments (or smaller) up to 400Mbits/s and 2Mbits/s increments beyond 400Mbits/s.
Routing compliant with ECSS SpaceWire Standard: The FSR-RG408 provides routing functions as defined in the ECSS SpaceWire standard. Path addressing and logical addressing are provided as standard; grouping and time-code distribution are available as options.
Gather statistics: Each port is monitored for how many packets with each header value have come to that port, and how many packets from each input port leave each output port, and statistics can be displayed on the user’s computer.
Tunnel SpaceWire: Any SpaceWire link can be configured to tunnel traffic from other links to a second unit which fans the traffic out to the appropriate output port.
Protect: Test and simulation equipment must protect flight equipment from any damage caused by the test equipment. The FSR-RG408 protects flight equipment with five layers of current and voltage protection, while also offering optional galvanic isolation for ultimate protection.
Choose the platform and options required: Platform: RG408-l, RG408-m, RG408-ls or RG408-ms (platforms above RG408-l are not required for FSR-RG408 but are useful for other functions and synchronized time and triggers). Firmware options: EW: Error Waveforms, GR: Grouped Routing, TC: Time Code distribution, PS: Packet Statistics.
Update, Reconfigure and Re-use throughout the project life cycle: The function of the FSR-RG408 is defined by a plug-in memory card which can be updated to provide extra firmware enhancements and options. A different memory card can be used to provide an alternative function such as an EtherSpaceLink/diagnostic interface, monitor/analyzer, or link or network validation, or other required function.
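The routing and statistics behaviour described for the FSR-RG408 can be modelled in a few lines. This is a toy sketch under stated assumptions, not the product's firmware or API. It assumes the usual SpaceWire convention that a leading byte in the path-address range names an output port and is stripped, while a leading byte of 32 to 254 is a logical address looked up in a routing table. The class `RouterModel` is hypothetical, logical-address headers are kept, and the configuration port (address 0) is not modelled.

```python
from collections import Counter


class RouterModel:
    """Toy model of an 8-port routing switch with per-port statistics."""

    def __init__(self, num_ports=8, routing_table=None):
        self.num_ports = num_ports
        self.routing_table = dict(routing_table or {})  # logical address -> output port
        self.header_counts = Counter()     # (input port, header value) -> packets
        self.port_pair_counts = Counter()  # (input port, output port) -> packets

    def route(self, in_port, packet: bytes):
        """Return (output port, forwarded bytes) or raise ValueError."""
        if not packet:
            raise ValueError("empty packet")
        header = packet[0]
        self.header_counts[(in_port, header)] += 1
        if 1 <= header <= self.num_ports:
            # Path addressing: the header names the output port and is stripped.
            out_port, forwarded = header, packet[1:]
        elif 32 <= header <= 254 and header in self.routing_table:
            # Logical addressing: table lookup; the header is kept in this sketch.
            out_port, forwarded = self.routing_table[header], packet
        else:
            raise ValueError(f"no route for header value {header}")
        self.port_pair_counts[(in_port, out_port)] += 1
        return out_port, forwarded


router = RouterModel(routing_table={40: 5})
router.route(1, bytes([3, 0xAA]))   # path address: leaves on port 3
router.route(1, bytes([40, 0xBB]))  # logical address 40: leaves on port 5
```

The two counters correspond to the two statistics named above: packets per header value arriving at each port, and packets per input-port/output-port pair.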
FACILITIES TO BE USED
The Internet’s increasing ubiquity and centrality have brought a number of challenges for which the current architecture
is ill-suited
Increasing interest in new architectures
Adopting a new architecture requires
Not only changes in routers and host software
But also, given the multi-provider nature of the Internet, that ISPs jointly agree on any architectural change
Goal of this paper is to issue a call to action
Status quo is not acceptable
Unable to deploy, evaluate new architectures
Return to roots of applied architectural research with the intention of once again changing the world
Live experimentation
Currently through testbeds
But these have severe limitations
We are not able to deploy, or even evaluate, new architectures
Overcoming the Internet impasse will require:
Researchers must be able to easily experiment with new architectures on live traffic
There must be a plausible deployment path by which architectural ideas, once validated, can come into
practice
Proposed architectural solutions should be comprehensive, capable of addressing the broad range of
current architectural problems
Constructing a virtual testbed to meet these three requirements
Supports multiple simultaneous architectures
Provides a clean path for radical new architectures to be unilaterally and globally deployed
Facilities
User Services
GMC
- name space for users, slices, & components
- set of interfaces (“plug in” new components)
- support for federation (“plug in” new partners)
Physical Substrate
Players
Owners of parts of the substrate
Administrators of parts of GENI
Developers of infrastructure services
Researchers employing GENI
End users not affiliated with GENI
Third parties
Actions
Allow owners to declare resource allocation and usage policies for substrate facilities under their control,
and provide mechanisms for enforcing those policies
Allow administrators to manage the GENI substrate
Allow researchers to create and populate experiments
Expose low-level information about the state of the GENI substrate to developers
Naming
GMC defines GENI Global Identifiers (GGID)
The object identified by GGID holds the private key forming the basis for authentication
Name repository maps string to GGIDs
GMCI (GENI management core implementation)
Three major abstractions that the GMC defines
1. Components
2. Slices
3. Aggregates
1. Components
A collection of resources
Physical resources (e.g., CPU, memory, disk, bandwidth)
Logical resources (e.g., file descriptors, port numbers)
E.g., Programmable edge node (PEN) (i.e., a conventional compute server), Programmable core node
(PCN) (a customizable router, i.e., a backbone router), Programmable access point (PAP) (e.g., for wireless
connectivity)
Uniquely identified using GGIDs (GENI global identifiers)
E.g., geni.us.backbone.nyc
Each component is controlled via a component manager (CM), the entity responsible for allocating
resources at a component
Sliver
A distinct partition of the component’s resources
Each component must include HW or SW mechanisms that isolate slivers from each other
E.g., virtual server, virtual router, virtual switch, virtual access point
2. Slices
A distributed, named collection of slivers that collectively provides the execution context for an
experiment, service, or network architecture
Slices are uniquely identified by GGIDs (GENI global identifiers)
E.g., geni.us.princeton.codeen
3. Aggregates
A GMC object representing a group of components, where a given component can belong to zero, one, or
more aggregates
Example aggregate might correspond to a physical location (components co-located at the same site), a
cluster (components that share a physical interconnect), an authority (a group of components managed by a single
authority), a network (a group of components that collectively implement a backbone network or a wireless
subnet)
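The relationship between the three GMC abstractions can be made concrete with a small data model. This is a sketch under stated assumptions, not the GMC interface: human-readable names stand in for GGIDs, resources are plain counters, and the `create_sliver` method plays the component manager's role. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sliver:
    """A partition of one component's resources, owned by one slice."""
    slice_name: str
    resources: dict


@dataclass
class Component:
    name: str       # stand-in for a GGID, e.g. "geni.us.backbone.nyc"
    capacity: dict  # resource name -> total units
    slivers: list = field(default_factory=list)

    def available(self, resource):
        used = sum(s.resources.get(resource, 0) for s in self.slivers)
        return self.capacity.get(resource, 0) - used

    def create_sliver(self, slice_name, resources):
        """Component-manager role: allocate a partition or refuse."""
        for resource, amount in resources.items():
            if amount > self.available(resource):
                raise ValueError(f"{self.name}: not enough {resource}")
        sliver = Sliver(slice_name, dict(resources))
        self.slivers.append(sliver)
        return sliver


@dataclass
class Slice:
    """A named collection of slivers spread over several components."""
    name: str       # e.g. "geni.us.princeton.codeen"
    slivers: list = field(default_factory=list)


@dataclass
class Aggregate:
    """A group of components; a component may appear in several aggregates."""
    name: str
    components: list = field(default_factory=list)
```

A slice is then built by asking each chosen component for a sliver and collecting the results, while aggregates merely reference components without owning them.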
Researcher portal
Coordinate resource allocation
Manage set of components
Users
GENI users create and manipulate slices
Each user is identified by a certificate and key pair issued by one of the GENI authorities
Authorization
1. Authorization model for slice creation
Slices are represented in the GMC by names
Based on controlling access to the slice name space
Slice Authority (SA): naming hierarchy rooted at a top-level
2. Authorization model for Physical resources
GENI physical resources are encapsulated as components
Owner is responsible for defining and implementing the authorization policy
Ticket: GENI-specific authorization mechanism used to implement this model
A ticket represents a principal’s right to (1) create a sliver, (2) bind component resources to an existing slice
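A ticket of this kind can be sketched as a small signed record. The sketch below is an assumption-laden illustration, not the GENI mechanism: it uses an HMAC with a shared owner key as a stand-in for whatever signature scheme GENI actually specifies, and the field names, right names and functions (`issue_ticket`, `ticket_allows`) are hypothetical.

```python
import hashlib
import hmac
import json


def issue_ticket(owner_key: bytes, principal: str, component: str, rights):
    """Component owner grants `rights` on `component` to `principal`."""
    body = {"principal": principal, "component": component, "rights": sorted(rights)}
    payload = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(owner_key, payload, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def ticket_allows(owner_key: bytes, ticket, principal: str, component: str, right: str):
    """Check integrity first, then whether the ticket covers this request."""
    payload = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(owner_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        return False
    body = ticket["body"]
    return (
        body["principal"] == principal
        and body["component"] == component
        and right in body["rights"]
    )
```

The point of the sketch is the separation of concerns: the owner decides policy when issuing the ticket, and the component only has to verify the ticket when a sliver is requested.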
[Figure: GENI layers, built up in stages: Substrate Hardware; Virtualization Software; Components (each with a CM); Aggregates (Aggregate, Slice Coordination); User Portals (Researcher Portal, Service Front-End)]
REFERENCES
1. Jonathan Turner, “GENI Global Environment for Network Innovations”, GENI document, GDD-06-09,
March 2006
2. Flexible Routers, “George Varghese Proposal Of Flexible Routers, San Diego University”
3. Google Document, “Flexible Space Wire Routers”, January 2007