SPIN 2020
FastFE: Accelerating ML-based Traffic Analysis with Programmable Switches
Jiasong Bai¹, Menghao Zhang¹, Guanyu Li¹, Chang Liu¹, Mingwei Xu¹, Hongxin Hu²

Traffic Analysis
• Definition
  – The class of applications that infer sensitive information from network communication patterns to identify malicious behaviors
• Applications
  – Botnet detection
  – Website fingerprinting
  – Covert channel detection
  – Intrusion detection
• Trend
  – Use machine learning, especially deep learning, to achieve better accuracy

Similarity of Traffic Analysis Applications
• Two typical components
  – Feature extractor
    • Extract necessary traffic features from raw network traffic
  – Behavior detector
    • Leverage machine learning algorithms to detect the desired network behaviors
• Example
  – Kitsune [NDSS '18]
  – Pipeline: Packets → Feature Extractor → Feature Vectors → Behavior Detector (ML-based algorithm) → Results

Challenge for Traffic Analysis Applications
• Scalability issue
  – They cannot scale to
    today's high-speed, large-volume network traffic
    • The performance of the feature extractor (FE) has not caught up
  – Existing FE: port mirroring, large number of servers
    • High communication, storage, and computation overheads
[Figure: 100 Gbps traffic → Feature Extractor → Behavior Detector (ML-based algorithm) → Results, with two points labeled "Bottleneck"]
[Figure: Resource consumption of different components in Kitsune]

New Opportunities: Programmable Switches
[Figure: switch architecture with a Programmable Parser and a Programmable Match-Action Pipeline (Memory, ALU). Source: www.barefootnetworks.com]
Parser Program
    parser parse_ethernet {
        extract(ethernet);
        return switch(ethernet.ethertype) {
            0x8100 : parse_vlan_tag;
            0x0800 : parse_ipv4;
            0x8847 : parse_mpls;
            default: ingress;
        }
    }

Header and Data Declarations
    header_type ethernet_t { … }
    header_type l2_metadata_t { … }
    header ethernet_t ethernet;
    header vlan_tag_t vlan_tag[2];
    metadata l2_metadata_t l2_meta;

Tables and Control Flow
    table port_table { … }
    control ingress {
        apply(port_table);
        if (l2_meta.vlan_tags == 0) {
            process_assign_vlan();
        }
    }

➢ Programmed using P4
  • Flexibility to extract traffic features as desired
➢ Same power and cost as fixed-function switches
  • Lower unit capital cost
➢ Programs always run at line-rate
  • Enable in-network, online and performant detection

FastFE: Bring these benefits to the feature extractor

FastFE System Overview
• Policy interface
  – Help operators express which traffic features they desire
• Policy enforcement engine
  – Translate high-level policies into the underlying primitives
• Infrastructure
  – Run the primitives in programmable switches and commodity servers

Policy Interface
• Goal
  – Give operators the flexibility to specify traffic features as desired
• Observation
  – Heterogeneous traffic analysis applications share a common feature extraction procedure
    • Select the traffic of interest
    • Group traffic into multiple subsets
    • Update stateful variables
    • Produce and emit feature vectors
• Interface
  – filter, groupby, update, produce

Policy Example
• Feature extractor of Kitsune (part) in FastFE
  – Extract the mean and standard deviation of packet size from network traffic aggregated by SrcIP
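The policy text for this example did not survive extraction, so the following is only a minimal Python sketch of how the four primitives could compose for the per-SrcIP mean and standard deviation. The `Packet` record, the function signatures, the placeholder predicate, and the use of Welford's incremental algorithm are all assumptions made for illustration; this is not FastFE's policy syntax.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record; field names are illustrative only."""
    src_ip: str
    size: int


@dataclass
class RunningStats:
    """Per-group state, updated incrementally with Welford's algorithm."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def add(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 0.0 for an empty group.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


def filter_(packets: Iterable[Packet], pred: Callable[[Packet], bool]) -> Iterator[Packet]:
    """Select the traffic of interest."""
    return (p for p in packets if pred(p))


def groupby(packets: Iterable[Packet], key: Callable[[Packet], Hashable]):
    """Tag each packet with the key of the subset it belongs to."""
    return ((key(p), p) for p in packets)


def update(grouped, state: Dict[Hashable, RunningStats], func) -> Iterator[Hashable]:
    """Apply a stateful update per group and yield the key that was touched."""
    for k, p in grouped:
        func(state[k], p)
        yield k


def produce(keys: Iterable[Hashable], state: Dict[Hashable, RunningStats], func) -> List:
    """Emit one feature vector per processed packet."""
    return [func(k, state[k]) for k in keys]


def extract_features(packets: Iterable[Packet]) -> List[Tuple[str, float, float]]:
    state: Dict[Hashable, RunningStats] = defaultdict(RunningStats)
    selected = filter_(packets, lambda p: p.size > 0)  # placeholder predicate
    grouped = groupby(selected, lambda p: p.src_ip)
    touched = update(grouped, state, lambda s, p: s.add(p.size))
    return produce(touched, state, lambda k, s: (k, s.mean, s.std))
```

Because the stages are chained lazily, each emitted vector reflects the group state right after the packet that triggered it, which mirrors a per-packet feature extractor.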
Policy Enforcement Engine
• Goal
  – Translate and deploy the FastFE policy into the underlying infrastructure
• Principle (for high scalability)
  – High-speed programmable switches are first-class citizens for deploying the policy
• Interfaces in the programmable switches
  – filter(R, pred): match-action table
  – groupby(R, [fields]): match-action table, stateful ALUs
  – update(R, func): register read/write, stateful ALUs, etc.
  – produce(R, func): multicast, register read/write, etc.
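To make the mapping above more concrete, here is a small Python emulation of how groupby and update might use switch-style state: a fixed-size register array indexed by a hash of the group key, with integer-only read-modify-write updates. The array size, the CRC32 hash, and the decision to keep only a packet count and a byte sum on the emulated switch (leaving the division to the server) are assumptions for illustration, not FastFE's P4 implementation.

```python
import zlib
from typing import List, Tuple

REGISTER_SLOTS = 1024  # assumed array size; real switch memory budgets differ


class RegisterArray:
    """Emulates a switch register array: fixed size, integer cells."""

    def __init__(self, slots: int) -> None:
        self.cells: List[int] = [0] * slots

    def add(self, index: int, amount: int) -> int:
        # One read-modify-write per packet, in the spirit of a stateful ALU.
        self.cells[index] += amount
        return self.cells[index]


def slot_for(src_ip: str, slots: int = REGISTER_SLOTS) -> int:
    """groupby: hash the group key into a register index (collisions are possible)."""
    return zlib.crc32(src_ip.encode("ascii")) % slots


pkt_count = RegisterArray(REGISTER_SLOTS)
byte_sum = RegisterArray(REGISTER_SLOTS)


def switch_update(src_ip: str, size: int) -> Tuple[int, int]:
    """update: integer additions that stay on the emulated switch."""
    idx = slot_for(src_ip)
    return pkt_count.add(idx, 1), byte_sum.add(idx, size)


def server_mean(count: int, total: int) -> float:
    """Division is performed off-switch in this sketch."""
    return total / count
```

Two source addresses that hash to the same slot would share counters here; how a real deployment handles such collisions is outside what the slides describe.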
Policy Enforcement Engine (continued)
• Orchestration
  – When the given func contains computation unsupported by switches
    • Put the previous operations on the switch, and the remaining operations on the server
  – When an interface gets deployed on the server
    • Put the subsequent interfaces that depend on this interface on the server
• Other considerations
  – Multi-switch scalability
    • For higher bandwidth, employ multiple switches in parallel
    • For more stages, employ multiple switches in a pipeline
  – Order preserving
    • FastFE may cause packet reordering
    • Solution: set a timestamp for each packet group on switches
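The two orchestration rules above can be sketched as a simple placement pass. This is one possible reading, under stated assumptions: the `Interface` and `Placement` types, the `SWITCH_SUPPORTED` operation set, and the operation names are hypothetical, and the policy is assumed to be listed in dependency order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumed set of operations a switch can execute; purely illustrative.
SWITCH_SUPPORTED: Set[str] = {"match", "hash", "add", "compare", "register_rw"}


@dataclass
class Interface:
    """One policy step (e.g. filter, groupby, update, produce)."""
    name: str
    ops: List[str]  # computations its func needs, in order
    depends_on: List[str] = field(default_factory=list)


@dataclass
class Placement:
    switch_ops: List[str]
    server_ops: List[str]


def place(policy: List[Interface]) -> Dict[str, Placement]:
    """Split each interface's operations between the switch and the server."""
    plan: Dict[str, Placement] = {}
    for step in policy:
        # Rule 2: if anything upstream spilled to the server, follow it there.
        if any(plan[d].server_ops for d in step.depends_on):
            plan[step.name] = Placement([], list(step.ops))
            continue
        # Rule 1: keep operations before the first unsupported one on the switch.
        split = next(
            (i for i, op in enumerate(step.ops) if op not in SWITCH_SUPPORTED),
            len(step.ops),
        )
        plan[step.name] = Placement(step.ops[:split], step.ops[split:])
    return plan
```

For example, an update whose func needs `["add", "sqrt", "add"]` would keep the first `add` on the switch and move `sqrt` and the second `add` to the server, and a produce step depending on it would then run entirely on the server.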
Evaluation Overview
• Case study
  – Kitsune [NDSS '18], a neural-network-based network intrusion detection system
  – The feature extractor can be expressed with FastFE (157 LOC)
• Implementation
  – FE-ASIC: ∼3K LOC in P4 (for the ASIC) and ∼1K LOC in Python (for the controller)
  – FE-Server: an additional ∼200 LOC in Python, compatible with the native Kitsune
• Experimental setup
  – One 3.3 Tb/s Barefoot Tofino switch
  – Two Dell R730 servers: a traffic generator and a back-end server of FastFE

Evaluation
• Soundness
  – Compare the feature vectors generated by Kitsune-with-FastFE and the native
    Kitsune
  – Cosine similarity: 0.989
• Scalability
  – Measure the bandwidth and CPU utilization of the back-end server
• Overhead
  – Assess the resource usage of Kitsune with FastFE in our Tofino switch
[Figure: Communication reduction by FastFE]
[Figure: CPU alleviation by FastFE]

Conclusion & Future Work
• Conclusion
  – Identify the bottleneck component of traffic analysis applications
  – Propose a high-speed feature extractor, FastFE, using programmable switches
  – Implement the initial prototype based on the case study
• Future work
  – Redesign the policy interfaces to fit traffic analysis tasks better
  – Build a full prototype of the policy enforcement engine
  – Apply FastFE to more traffic analysis applications
  – Conduct larger-scale
    real-world experiments
  – Consider more complex scenarios

Thanks! Q&A
ligy18@mails.tsinghua.edu.cn
zhangmh16@mails.tsinghua.edu.cn

Traffic Analysis vs. Network Monitoring
• Network monitoring
  – Packet/network behavior
    • E.g., congestion, packet loss, switch queue, etc.
  – Network-layer semantics
• ML-based traffic analysis
  – Host behavior
    • E.g., whether hosts are compromised or are accessing a certain website
  – Application-level semantics