Security services and the IXP
Wu-chang Feng
wuchang@cse.ogi.edu
Systems Software Laboratory
Dept. of Computer Science and Engineering

About the project
• 6 months old
  – Just started, pardon the vapor
• Supported by Intel (12/2001) and ETIC (4/2002)
  – Graduate students
    • Francis Chang: francis@cse.ogi.edu
    • Deepa Srinivasan: deepsrini@hotmail.com
    • Jin Choi (1/2003): j3choi@student.math.uwaterloo.ca
  – Undergraduate interns from Charles Consel's group
    • Ludovic Martorel
    • Damien Berger

Talk outline
• IXP and network security research
• Packet classification
• Packet classification caching strategies
• Curriculum

The IXP and network security research

A research opportunity
• IXP
  – Provides an open, high-speed networking platform
  – Research enabler
    • Analyzing packet classification/routing algorithms
    • Analyzing packet classification/routing lookup caching algorithms
    • Security functions
  – Sandbox to test and compare algorithms on a real platform

IXP and research
• Quickly becoming the ns of experimental networking systems
  – Open hardware
  – Open software
• What's needed?
  – A library of reference implementations and benchmarks
    • IP route lookup (longest-prefix match) algorithms
    • General packet classification algorithms
    • Route and classification lookup caching algorithms
    • Security functions

Our focus: Security
• Borrow and use liberally…
  – Princeton (VERA)
  – Columbia (NetBind)
  – Georgia Tech (IDS)
  – Utah (Emulab)
  – Others…
• Build what's missing
  – Range of full packet classifiers
  – Range of lookup caching algorithms
  – Merging the goals of research and education
    • A security-focused IXP laboratory course
• Eventually, examine additional security services
  – Anomaly detection
  – Content filtering
  – etc.
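IP route lookup (longest-prefix match) is the one-dimensional base case for the classifiers that follow. The code below is a minimal sketch of what one reference implementation in such a library might look like. It assumes a plain unibit binary trie over IPv4 prefixes, written in Python. The class and method names are hypothetical and do not come from any IXP toolkit. A production version would use multibit strides and node compression.

```python
import ipaddress
from typing import Optional


class _Node:
    """One trie node: two children (bit 0 / bit 1) and an optional next hop."""

    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children = [None, None]
        self.next_hop: Optional[str] = None


class BinaryTrieLPM:
    """Hypothetical unibit trie: one address bit consumed per level."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        # Walk the prefix bits from the most significant bit downward.
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = _Node()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str) -> Optional[str]:
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop  # a 0.0.0.0/0 default route is stored at the root
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # longest match seen so far
        return best
```

For example, after inserting 10.0.0.0/8 → "A" and 10.1.0.0/16 → "B", a lookup of 10.1.2.3 returns "B" and a lookup of 10.2.0.1 falls back to "A". Worst-case lookup time is W = 32 steps, which is the W term in the complexity tables later in the talk.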
Packet classification
Student: Deepa Srinivasan

Packet classification
• Use the IXP and open-source tools to
  – Compare full packet classification algorithms
  – Benchmark algorithms via real rule sets and real traffic traces
  – Explore adaptive packet classifiers

A hard, but well-studied problem
• What are the key issues?
  – Storage
  – Search time
  – Update time
• General filter matching problem ~ problems in computational geometry
  – N = number of filters or rules, d = number of dimensions
  – Best known bounds require either
    • $O(\log N)$ time with $O(N^d)$ space, OR
    • $O((\log N)^{d-1})$ time with $O(N)$ space
• Classic space-time tradeoff problem

A space-time tradeoff example
• Hierarchical tries: slow and compact
• Set-pruning tries: fast and large
[Figure: hierarchical trie (figure should terminate at R2)]
[Figure: set-pruning trie]

A space-time tradeoff example
• Hierarchical tries vs. set-pruning tries (worst case)

Algorithm         | Time   | Storage | Updates | Notes
Linear search     | $N$    | $N$     | $1$     | Simple, poor scaling (e.g., iptables)
Hierarchical trie | $W^d$  | $NdW$   | $d^2 W$ | Backtracking search
Set-pruning trie  | $dW$   | $N^d$   | $N^d$   | Fast retrieval at the cost of storage. Good for relatively static classifiers.

N = number of rules, W = width of a dimension, d = number of dimensions

Packet classification
• Approaches
  – Generic classifiers
    • Optimized for best worst-case performance
  – Heuristic classifiers
    • Take advantage of structure in rule sets (as done with IP route lookups)
    • Trade off speed, storage, and update time in the worst case for speed and storage in the common case
  – Hardware classifiers
    • Throw hardware and parallel processing at the problem
    • Serve as a wish list for the IXP
      – Is a hardware-based packet classification engine in the works?
      – Can I go home?
      – Will I need to shoot myself when the IXP4xxx comes out?

So many algorithms, so little time…
• Which one to choose?
  – Hierarchical tries with backtracking search
  – Set-pruning tries
  – Bit vector, fractional cascading [Lakshman98]
  – Aggregated bit vector [Baboescu00]
  – Grid of tries, cross-producting [Srinivasan98]
  – Area-based quadtrees [Buddhikot99]
  – Fat inverted segment tree [Feldman00]
  – Tuple-space search [Srinivasan99]
  – Recursive flow classification [Gupta99]
  – Hierarchical intelligent cuttings [Gupta00]
• Performance and cost a function of
  – d = number of dimensions
  – W = width of dimensions
  – N = number of rules
  – l = number of levels in tree (FIS-tree only)

Summary of schemes [Gupta00]

Algorithm                         | Time                     | Storage              | Updates          | Notes
Linear search                     | $N$                      | $N$                  | $1$              | Simple, poor scaling
Hierarchical trie                 | $W^d$                    | $NdW$                | $d^2 W$          |
Set-pruning trie                  | $dW$                     | $N^d$                | $N^d$            | Fast retrieval at the cost of storage. Good for relatively static classifiers.
Grid-of-tries                     | $W^{d-1}$                | $NdW$                | $NdW$            | Rebuild for each update; could be used for the last 2 dimensions of a multi-dimensional hierarchical trie.
Cross-producting                  | $dW$                     | $N^d$                | $N^d$            |
AQT                               | $aW$                     | $NW$                 | $a\sqrt[a]{N}$   | a is a tunable integer parameter
FIS-tree                          | $(l+1)W$                 | $l \times N^{1+1/l}$ | --               | Tree must be recomputed on update
RFC                               | $d$                      | $N^d$                | --               | Not suitable for large rule sets (> 6000 rules); requires preprocessing and large storage space. 10 Gbps line rates in hardware and 2.5 Gbps in software.
Hierarchical intelligent cuttings | $d$                      | $N^d$                | --               | Parameters can be tuned to trade off query time against storage requirements.
Tuple-space search                | $M$                      | $N$                  | $1$              | Performs well for multiple dimensions if the number of tuples (i.e. hash entries) is small. Only supports prefixes; generic rules increase storage complexity.
Ternary CAM                       | $1$                      | $N$                  | $1$*             | Simple; good for small classifiers; costly
Bit vector                        | $dW + N/\mathit{memwidth}$ | $dN^2$             | --               | Incremental updates not supported; good for multiple dimensions and a small number of rules

N = number of rules, W = width of dimensions, d = number of dimensions, l = levels of tree, M = number of tuples

Is there a winner?
• Not really, it depends on…
  – Rule sets
  – Incoming traffic characteristics
  – Metric desired (average vs.
    worst-case lookup time)
  – Hardware cost (memory, ternary CAM)
    • How much chip area did that 16-entry CAM on the IXP2xxx take?

Adaptive packet classifiers
• Hypothesis
  – Value in adaptation
  – Reconfigure for high speed based on the available memory and the rule set, given a fixed hardware configuration and performance metric
• Approach
  – Implement a small set of classifiers
  – Build modules that translate ipchains/iptables/netfilter rule sets into the data structures of individual classifiers
  – Study adaptation policies for classifiers based on rule analysis
  – Implement seamless switching between implementations (i.e. double buffering [Partridge98])
  – Performance evaluation using
    • Library of publicly available rule sets
    • Public traffic traces
    • An Emulab with loadable IXPs

Classification lookup caching
Student: Francis Chang

Caching and IP route lookups
• IP destination-based routing
  – A one-dimensional packet classifier
• Caching instrumental in building gigabit IP routers
  – Full lookup extremely expensive to support at high rates
  – Cache of 12,000 entries gives 95% hit rate [Jain86, Feldmeier88, Heimlich90, Jain90, Newman97, Partridge98]
  – "A 50 Gb/s IP Router" [Partridge98]
    • Switched interconnection fabric
    • Alpha 21164-based forwarding cards (separate from line cards)
    • First-level on-chip caches: Icache = 8 kB (2048 instructions), Dcache = 8 kB
    • Secondary on-chip cache = 96 kB
      – Fits a 12,000-entry route cache
      – 96 kB / 12,000 entries ≈ 8 bytes (64 bits) per entry; 64-byte entries would fit only about 1,500
    • Tertiary cache = 16 MB (full, double-buffered route table)

Caching and multi-dimensional lookups
• Flow-based firewalls
  – A five-dimensional packet classifier
• Caching even more important
  – Full classification algorithms will not run anywhere near line speed on the current incarnation of the IXP
  – Inherently harder to do
  – Much lower hit rates [Xu00]
  – Rule and traffic dependent

Current approaches
• Direct-mapped hashing with LRU replacement
  – Typical for IP route caches [Partridge98]
• Parallel hashing and searching
  with set-associative hardware [Xu00]
  – ASIC solution with parallel processing and a fixed, LRU replacement scheme
• Proprietary vendor solutions
  – ?

Class-based caching
• Structure of application traffic can provide useful information
• W. Feng, F. Chang, W. Feng, J. Walpole, "Provisioning On-line Games: A Traffic Analysis of a Busy Counter-Strike Server"
  – [Figure: packet load of an on-line game server over 10 ms intervals]

Observations
• Game traffic
  – Large number of periodic packets
  – Extremely small packet sizes
  – Persistent flows
  – Small number of clients per server
  – Without caching, a packet classification disaster
  – With caching, a poster child for LFU replacement?
• Web traffic
  – Bursty, heavy-tailed packet arrival
  – Many more clients per server
  – Small number of packets per flow

Goal of study
• Attack the packet classification caching problem
• Resource requirements and data structures for high-performance packet classification caches
• "Segregate, Hash, and Cache"
  – Understand traffic characteristics
  – Examine hierarchical class-based partitioning of the cache
  – Examine class-based partitioning of the classification function (i.e. MEv2)
  – Examine alternative replacement algorithms per class, such as LFU

Curriculum
Student: Jin Choi

An IXP course for OGI/OHSU
• Goal
  – Spread the IXP gospel
  – Provide students with experience on a modern networking platform
    • Train (and test drive) potential Ph.D.
      students
    • Train future Intel employees
      – 171 OGI/OHSU alums @ Intel
      – Intel is the single largest employer of OGI/OHSU graduates

Approach
• Ask for help
  – Dirk & Raj (PCs, IXP boards, and support)
  – Ken Mackenzie (course material and advice)
• Keep it simple
• Align with security research project
• Ask for feedback
  – Curriculum completed
  – Guide and slide presentation available at http://www.cse.ogi.edu/~wuchang/ixp/
  – Course will be offered as CSE58?: Networking Practicum
  – Scheduled for Spring 2003

The course itself
• Errata
  – Weekly 3-hour sessions
  – Dedicated laboratory of 10 IXP workstations
    • Cloned via Norton Ghost
• Week #1
  – Conceptual framework
  – IXP architecture
    • Hardware: StrongARM, memory resources, microengines
    • Software: ACEs, microACEs
• Week #2
  – Introduce Linux/Windows 2000/VMware and the IXP platform
  – Remedial Linux network administration material
    • ifconfig, route, netstat, ipchains, ping, traceroute, arp, etc.
  – Learn the IXP environment setup/configuration
    • Building core components on Linux using the standard GNU toolchain
    • Building microcode using the microengine toolchain on Windows 2000

The course itself (cont.)
• Week #3
  – Build and run the L3 forwarder application
    • Test with external sources and sinks
• Week #4
  – Add a packet counter to the L3 forwarder
    • Makes sure that everyone with a CS degree from OGI/OHSU has programmed in assembly code at some point
• Week #5
  – In-line port filter
    • Add microcode to block TCP segments based on destination port
  – Code review of L3 forwarder to design full port filter

The course itself (cont.)
• Week #6: continued

The course itself (cont.)
• Week #6
  – Full port filtering functionality
    • Pass port numbers to be blocked as arguments
    • SRAM management (allocation and initialization of a multi-stride trie in the core component; access to the data structure from the microengine)
    • Add logic in the core component to handle port filtering of exceptional packets

The course itself (cont.)
• Week #7-#10
  – Propose and implement functions of their own for a final project
    • Packet classifiers
    • Classification lookup caching

Questions

Future work
• Support for high-speed intrusion and anomaly detection (E-boxes and A-boxes)
  – Content-based filters
    • Basic network-level filters (Snort)
    • Application-specific filters (Bro)
  – Usage-based filters
    • Accounting
    • Logging

What makes sense on an IXP?
• Function-based decomposition used in security
  – Common Intrusion Detection Framework (CIDF) [Porras01]
    • Event generators (E-boxes): produce events based on filtered activity
    • Event databases (D-boxes): store events in a persistent manner
    • Event analyzers (A-boxes): synthesize higher-level activity from a range of individual events
    • Response units (R-boxes): perform actions based on events
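One way to picture the CIDF decomposition is as a four-stage pipeline. The sketch below is hypothetical and assumes a toy port-scan heuristic: packets are reduced to (source, destination port) pairs, and any source that touches at least `threshold` distinct watched ports is flagged. None of these function names come from CIDF itself. The open question on the slide is which of these stages belong on the microengines and which belong on the core or the host.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    src: str
    dst_port: int


def e_box(packets, watched_ports):
    """Event generator: emit one event per packet that matches a filter."""
    return [Event(src, port) for src, port in packets if port in watched_ports]


class DBox:
    """Event database: keeps every event (in memory here, for illustration)."""

    def __init__(self):
        self.events = []

    def store(self, events):
        self.events.extend(events)


def a_box(events, threshold):
    """Event analyzer: flag sources that touched many distinct watched ports."""
    ports_by_src = defaultdict(set)
    for ev in events:
        ports_by_src[ev.src].add(ev.dst_port)
    return {src for src, ports in ports_by_src.items() if len(ports) >= threshold}


def r_box(suspects, blocklist):
    """Response unit: act on the analyzer's output, e.g. extend a block list."""
    blocklist.update(suspects)
    return blocklist
```

As a usage example, suppose 10.0.0.5 sends to ports 22, 23, and 80 while 10.0.0.9 sends only to port 80. With `threshold=3`, only 10.0.0.5 ends up in the block list. Only the E-box touches every packet, so it is the stage under the tightest per-packet time budget.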
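The caching approach described under "Current approaches" places a direct-mapped hash cache in front of an expensive full classifier. The sketch below assumes a few things for illustration. Python's built-in `hash` stands in for a hardware hash unit. Rules are arbitrary predicates over a 5-tuple. Rules are applied first-match, in the style of an iptables chain. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

FiveTuple = Tuple[str, str, int, int, int]  # src, dst, sport, dport, proto


@dataclass
class Rule:
    match: Callable[[FiveTuple], bool]
    action: str


def linear_classify(rules, pkt: FiveTuple, default: str = "ACCEPT") -> str:
    """Full classification: first matching rule wins, O(N) per packet."""
    for rule in rules:
        if rule.match(pkt):
            return rule.action
    return default


class DirectMappedFlowCache:
    """Each flow hashes to exactly one slot; a colliding flow evicts it."""

    def __init__(self, rules, num_slots: int = 1024):
        self.rules = rules
        self.slots: list = [None] * num_slots
        self.hits = 0
        self.misses = 0

    def classify(self, pkt: FiveTuple) -> str:
        idx = hash(pkt) % len(self.slots)
        entry = self.slots[idx]
        if entry is not None and entry[0] == pkt:
            self.hits += 1
            return entry[1]
        self.misses += 1
        action = linear_classify(self.rules, pkt)  # slow path
        self.slots[idx] = (pkt, action)  # overwrite whatever occupied the slot
        return action
```

In a direct-mapped cache, each set holds a single entry, so there is no choice about which entry to evict. Replacement policies such as LRU or the LFU idea raised for game traffic only come into play once slots are grouped into sets (set-associative) or the cache is partitioned per traffic class.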