CloudPolice: Taking Access Control Out of the Network
Lucian Popa (UC Berkeley/ICSI), Minlan Yu (Princeton Univ.), Steven Y. Ko (Princeton Univ.), Sylvia Ratnasamy (Intel Labs Berkeley), Ion Stoica (UC Berkeley)

Context
• Infrastructure-as-a-Service virtualized clouds: tenant VMs run on shared hypervisors, and much of the traffic is internal to the cloud
• Cloud computing requires network access control
• Access control policy of tenant X = what network traffic tenant X is willing to accept (e.g., tenant X states "Y can talk to me" for tenant Y)

Why Access Control in Clouds?

(1) For isolation
• Policy: deny incoming traffic from any other tenant

(2) For inter-tenant and tenant-provider communication
• Policy: allow/deny traffic from specific tenants
• Increasingly common in cloud environments, because of low latency, high bandwidth, and ease of service composition
• Example: real-time bidding advertising between tenant Exbay and several ad networks:
  1. Send information about the client to the ad networks
  2. Receive ad bids
  3. Return the ad of the highest bidder
• Policy of Exbay: allow traffic from the ad networks, deny all other traffic
• Other service examples: database (SimpleDB), desktop, communication (SQS), map-reduce++, Facebook, host management, locking, etc.

(3) For inter-tenant and tenant-provider communication
• Policy: weighted bandwidth allocation between tenants, i.e., share bandwidth fairly among tenants regardless of how many source VMs each tenant uses
• Other example policies:
  - rate-limited access
  - allow only locally initiated connections
  - nighttime access only
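Policy (3) can be read as a two-level split of a destination's link: first among source tenants by weight, then evenly among each tenant's active VMs, so a tenant gains nothing by sending from more VMs. The sketch below is one hypothetical way to compute the resulting per-VM rate limits; the function name, the tenant names, and the choice to count only tenants with active VMs are assumptions for illustration, not part of CloudPolice's specification.

```python
from __future__ import annotations


def per_vm_limits(link_mbps: float,
                  weights: dict[str, float],
                  active_vms: dict[str, int]) -> dict[str, float]:
    """Hypothetical per-source-VM rate limits (Mbps) toward one destination link.

    The link is split among tenants in proportion to their weights, and each
    tenant's share is then divided evenly among its active source VMs.
    """
    # Only tenants that currently have VMs sending take part in the split (assumption).
    active = {t: w for t, w in weights.items() if active_vms.get(t, 0) > 0}
    total_weight = sum(active.values())
    limits: dict[str, float] = {}
    for tenant, weight in active.items():
        tenant_share = link_mbps * weight / total_weight
        limits[tenant] = tenant_share / active_vms[tenant]
    return limits


if __name__ == "__main__":
    # Hypothetical: an ad network with 2 VMs and Nextbay with 8 VMs, equal weights.
    print(per_vm_limits(1000.0, {"adnetwork1": 1.0, "nextbay": 1.0},
                        {"adnetwork1": 2, "nextbay": 8}))
    # {'adnetwork1': 250.0, 'nextbay': 62.5}
```

With equal weights each tenant receives 500 Mbps in total, even though Nextbay sends from four times as many VMs; unequal weights (e.g., 3 and 1) shift the tenant totals to 750 and 250 Mbps.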
(4) DoS protection
• One tenant can attack another tenant: reduce its bandwidth and slow down its machines
• Attackers are more powerful: higher bandwidths
• The barrier is lower: attackers can pay for attacking hosts (compromising credit cards instead of hosts)

Hence, the Problem
We want access control in clouds that:
• Is resilient to DoS
• Supports rich inter-tenant policies
• Scales: 100k servers, 10k tenants
• Tolerates high dynamicity: 100k VMs started per day, more than one per second
Traditional access control mechanisms are not well suited to meeting these requirements.

Existing Access Control
• Cloud APIs are narrow: on/off only; no locally initiated connections, no rate limiting, no weighted allocation
• Mechanisms inherited from enterprises: VLANs, firewalls
• But clouds != enterprises

Clouds != Enterprises
• Enterprises are not multi-tenant: few DoS concerns between departments, and typically simpler policies
• Enterprises don't have the same dynamicity and scale: 10k tenants vs. tens of departments; about one VM started per second vs. mostly static hosts
• Clouds have different network designs: high bisection bandwidth, multiple paths, different L2/L3 mixes, and many new topologies (FatTree, VL2, BCube, DCell, etc.)
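The dynamicity figure quoted above (100k VMs started per day) can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, not part of the CloudPolice design: it converts the daily figure into a per-second rate and shows the message rate that would result if every VM start had to be announced to all 100k servers, which is the cost the deck later attributes to distributing all policies everywhere. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def vm_starts_per_second(vms_per_day: float) -> float:
    """Average VM start rate implied by a daily figure."""
    return vms_per_day / SECONDS_PER_DAY


def broadcast_rate(vms_per_day: float, servers: int) -> float:
    """Messages/s if every VM start were pushed to every server (naive fan-out)."""
    return vm_starts_per_second(vms_per_day) * servers


if __name__ == "__main__":
    starts = vm_starts_per_second(100_000)
    updates = broadcast_rate(100_000, 100_000)
    print(f"{starts:.2f} VM starts/s")   # 1.16 VM starts/s
    print(f"{updates:,.0f} updates/s")   # 115,741 updates/s
```

Both results match the slides' rounding: more than one VM start per second, and on the order of 100k updates/s if membership changes had to reach every server.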
VLANs Not Well Suited for Clouds
• Inflexible policies
• Difficult to scale (cloud size and dynamicity): limited number of VLANs, spanning tree
• Limited network designs: no L3 networks, no multiple paths, inter-VLAN traffic goes through a router

Firewalls Not Well Suited for Clouds
• Offering DoS protection is difficult: filtering must be applied at the source, where rules are hard to update
• Inflexible policies
• Scale through prefix aggregation: difficult to manage 10k tenants with multiple prefixes and different scaling requirements
• No L3 networks

Recap
• Traditional access control is not well suited for clouds
• It couples access control with network operation: with switching (VLANs), with address assignment (firewalls)
• CloudPolice takes access control out of the network

Outline
Part 1 – Context and Motivation
• Access control for clouds: why and what?
• Limitations of traditional mechanisms
Part 2 – CloudPolice
• Approach
• Operation

Goal
Network access control for clouds that is:
1. Independent of network topology and addressing
2. Scalable (millions of hosts, high churn)
3. Flexible (on/off access, rate-limited access, fair access)
4. Robust to (internal) DDoS attacks

CloudPolice
It is sufficient and advantageous to implement access control only within hypervisors:
• Trusted
• Network independent
• Full software programmability, hence flexible
• Close to VMs: unwanted traffic is blocked before it enters the network, which helps against DoS
• Easy deployability

CloudPolice Policy Model
• Group = set of a tenant's VMs with the same access control policy
• Policy = set of rules
• Rule = IF Condition THEN Action
• Condition = logical expression with predicates based on:
  - group of the sender
  - packet header
  - current time
  - history of traffic
• Action:
  - Allow
  - Block
  - Rate-limit (token bucket)
• Actions are applied per flow, per source VM, or per source group

CloudPolice: Hypervisor-Based
[Figure: source VM and destination VM hosted on different hypervisors]
• To avoid DoS and wasted resources, apply the policy at the source
• But how can the destination's policy be applied at the source hypervisor?
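The policy model can be made concrete with a small sketch. It assumes first-match rule evaluation with a default of block, represents conditions as Python predicates over a hypothetical Flow record (sender group, destination port, hour of day), and omits traffic-history predicates; none of these choices come from the slides. The rules reuse X's example policy from the operation slides plus a hypothetical nighttime-only rule for a group D, and the TokenBucket class illustrates the rate-limit action.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Flow:
    """Hypothetical view of a new flow at the destination's hypervisor."""
    src_group: str  # group of the sender
    dst_port: int   # taken from the packet header
    hour: int       # current time, 0-23


@dataclass(frozen=True)
class Rule:
    """IF condition THEN action."""
    condition: Callable[[Flow], bool]
    action: str                        # "allow", "block" or "rate-limit"
    rate_mbps: Optional[float] = None  # only used by "rate-limit"


def decide(policy: list[Rule], flow: Flow,
           default: str = "block") -> tuple[str, Optional[float]]:
    """Return (action, rate) of the first matching rule (assumed semantics)."""
    for rule in policy:
        if rule.condition(flow):
            return rule.action, rule.rate_mbps
    return default, None


class TokenBucket:
    """Token bucket for the rate-limit action (rate in bits/s, burst in bits)."""

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate = rate_bps
        self.capacity = burst_bits
        self.tokens = burst_bits
        self.last: Optional[float] = None

    def consume(self, size_bits: float, now: float) -> bool:
        """Refill for the elapsed time, then admit the packet if tokens suffice."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= size_bits:
            self.tokens -= size_bits
            return True
        return False


# X's group policy from the operation slides, plus a hypothetical
# "nighttime access only" rule for a group D.
policy_x = [
    Rule(lambda f: f.src_group == "A", "allow"),
    Rule(lambda f: f.src_group == "B", "block"),
    Rule(lambda f: f.src_group == "C" and f.dst_port == 80, "rate-limit", 100.0),
    Rule(lambda f: f.src_group == "D" and (f.hour >= 22 or f.hour < 6), "allow"),
]

if __name__ == "__main__":
    print(decide(policy_x, Flow("C", 80, 14)))   # ('rate-limit', 100.0)
    print(decide(policy_x, Flow("D", 443, 14)))  # ('block', None)
```

Keeping rules as ordered predicates mirrors the slide's IF-THEN form; a real hypervisor would compile them into its filtering datapath rather than evaluate Python lambdas.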
Option 1: Centralized Policy Repository?
[Figure: the source hypervisor asks a central repository "Allow?" before the flow reaches the destination]
• A centralized service requires high availability and throughput: 100k servers and 10 new flows/VM/s give 1M decisions/s on average
• Caching can be ineffective (random traffic patterns, malicious cache pollution)
• A centralized service can itself be a DoS target

Option 2: Decentralized, Distribute All Policies to All Hypervisors?
[Figure: the source hypervisor decides "Allow?" locally]
• Too heavyweight if access control is network independent: full group membership is required at every hypervisor, and group updates must be propagated everywhere
• 100k new VMs/day and 100k servers give 100k updates/s on average

CloudPolice: Decentralized, Apply at Destination and Enforce at Source
• The destination hypervisor applies the destination's policy
• The source hypervisor enforces the policy's action
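A minimal sketch of "apply at destination, enforce at source", assuming the flow setup described in the operation slides: the source hypervisor tags a flow's first packet with the sender's group, the destination hypervisor evaluates its hosted VM's policy, and a block or rate-limit verdict is returned and cached at the source as soft state with a timeout. The class names, flow key, and TTL are hypothetical; verdicts are kept per flow only (the slides also allow per-VM enforcement), and there is no real networking or authentication.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

FlowKey = Tuple[str, str, int]      # (src_vm, dst_vm, dst_port), hypothetical
Policy = Callable[[str, int], str]  # (src_group, dst_port) -> action


@dataclass(frozen=True)
class ControlPacket:
    """Hypothetical control message; in CloudPolice only hypervisors send these."""
    flow: FlowKey
    src_group: str
    verdict: Optional[str] = None   # set by the destination hypervisor


class DestinationHypervisor:
    """Applies the destination VM's policy to the first packet of each flow."""

    def __init__(self, policies: Dict[str, Policy]) -> None:
        self.policies = policies    # hosted VM -> its group's policy

    def on_first_packet(self, ctrl: ControlPacket) -> Optional[ControlPacket]:
        _, dst_vm, port = ctrl.flow
        action = self.policies[dst_vm](ctrl.src_group, port)
        if action == "allow":
            return None             # forward to the VM; no reply needed
        return ControlPacket(ctrl.flow, ctrl.src_group, verdict=action)


class SourceHypervisor:
    """Tags new flows with the sender's group and enforces returned verdicts."""

    def __init__(self, groups: Dict[str, str], ttl: float) -> None:
        self.groups = groups        # hosted VM -> group
        self.ttl = ttl              # soft-state lifetime in seconds
        self.state: Dict[FlowKey, Tuple[str, float]] = {}

    def first_packet(self, flow: FlowKey) -> ControlPacket:
        return ControlPacket(flow, self.groups[flow[0]])

    def on_control(self, ctrl: ControlPacket, now: float) -> None:
        if ctrl.verdict is not None:
            self.state[ctrl.flow] = (ctrl.verdict, now + self.ttl)

    def action_for(self, flow: FlowKey, now: float) -> str:
        """Return a fresh 'block'/'rate-limit' verdict, or 'send' if none exists."""
        entry = self.state.get(flow)
        if entry is not None and entry[1] > now:
            return entry[0]
        self.state.pop(flow, None)  # expired soft state is dropped; flow is re-checked
        return "send"


if __name__ == "__main__":
    dst = DestinationHypervisor({"C": lambda g, p: "allow" if g == "A" else "block"})
    src = SourceHypervisor({"Z": "B"}, ttl=10.0)
    reply = dst.on_first_packet(src.first_packet(("Z", "C", 80)))
    if reply is not None:
        src.on_control(reply, now=0.0)
    print(src.action_for(("Z", "C", 80), now=5.0))   # block
    print(src.action_for(("Z", "C", 80), now=12.0))  # send
```

Because the verdict expires, a policy change at the destination takes effect once the cached entry times out, and a lost control packet simply means the next packet is re-checked.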
Inspired by Internet Research
• Internet solutions to DDoS:
  - push-back filters [AIP, Pushback, AITF, StopIt]
  - network capabilities [SIFF, TVA]
• They handle large and dynamic networks with millions of users
• More easily deployed in clouds, because clouds != Internet:
  - clouds are controlled environments: both communication endpoints can be controlled
  - single administrative domain
  - new tool: a trusted software layer, the hypervisor

CloudPolice Operation
[Figure: VMs X, Y, Z on one hypervisor; VMs A, B, C on another]
Each hypervisor needs to know the group and policy of every VM it hosts, e.g.:
• X's group policy:
  IF group = A allow
  IF group = B block
  IF group = C & port = 80 rate-limit to 100Mbps
• Y's group policy: IF …
• Z's group policy: IF …
• Policies are installed by the provider service that starts the VMs; a policy could also be specified or updated by the VM
• The hypervisor filters incoming and outgoing flows

Starting a flow from Z to C:
1. CloudPolice at Z's hypervisor inserts a control packet containing Z's group and the header of the first packet
2. CloudPolice at C's hypervisor verifies the policy of the destination VM
3. If allowed, packets are forwarded to the destination VM
4. If blocked or rate-limited, C's hypervisor sends a control packet to the source hypervisor, which blocks or rate-limits the source (per flow or per VM)
• Soft state and timeouts handle policy invalidations and packet losses

Scalability
CloudPolice takes the best of both worlds (a centralized repository vs. every server storing all policies):
• Load is spread across all servers, for both maintaining and enforcing policies
• Update propagation is contained: group membership updates are not propagated; policy updates are propagated only to the group

Security Analysis Sketch
• Attackers: VMs that are compromised or paid for by malicious tenants
• Assumption: hypervisors are not compromised
• Attacks considered:
  1. Violating access control policies to reach a destination
  2. DoS with unauthorized traffic
  3. DoS with authorized traffic

1. Violating access control policies to reach a destination
• Policies are distributed securely to hypervisors
• Control packets cannot be spoofed; only hypervisors send them, so a VM cannot claim a fake group

2. DoS with unauthorized traffic
• Control packets block unauthorized traffic at the source
• Attackers may attempt to cause drops of control packets; countermeasure: retry or prioritize control packets

3. DoS with authorized traffic (congestion)
• Full protection also requires performance isolation
• CloudPolice can implement some performance isolation: rate-limit sources to a fair share of the destination's link, sharing the access link evenly between destination VMs

Future Work
• Implement a CloudPolice prototype
• Extend CloudPolice:
  - policies with application-level semantics (dynamic policies)
  - policies based on group-wide state
• Beyond access control?
  - more flexible actions, e.g., send to a middlebox
  - a performance isolation framework

Summary
• Access control in cloud computing requires new mechanisms and extended policies
• CloudPolice:
  - takes advantage of trusted hypervisors
  - is inspired by past work on Internet DDoS protection
• Properties: network independent, scalable, flexible, robust to (internal) DDoS attacks

Backup Slides

Related Work
• OpenFlow & (Onix | Difane) & Open vSwitch:
  - decisions are not based on a logical identifier (group/tenant)
  - Onix is only an isolation framework
  - OpenFlow actions are designed for switches (e.g., currently can't rate-limit)
  - they require scaling a central controller, vs. only a software update for CloudPolice

Contributions
• Identify that a new access control mechanism is needed in clouds
• Pinpoint the challenges and requirements
• Identify that access control should be done in hypervisors
• Propose CloudPolice, a mechanism that satisfies the requirements

Compromise of a Single Hypervisor
Compromised hypervisors can be prevented from violating security policies:
1. Security credentials are associated with each group identifier, so a hypervisor cannot send a group identifier it does not know (it knows only those of its hosted VMs); e.g., the group ID has a key in its name
2. Spoofed control packets are prevented in the network, like IP anti-spoofing in switches/routers

Today's Cloud Mechanisms?
• Solutions are not public: they could be similar to ours, or could provide fewer properties
• The API is narrow: on/off between groups; no locally initiated connections, no rate limiting, no weighted allocation

Feasibility
• Working on implementing a CloudPolice prototype
• Fast path: act on per-flow state; Open vSwitch and software routers [RouteBricks, PacketShader] suggest this is feasible
• Slow path: execute the policy and install flow state
  - 1/N of the requirements of a centralized repository
  - each server hosts few VMs, so the cost is dominated by policy complexity
  - software router applications suggest that if-then-else structures can be parsed fast [RBF]

Other Related Work
• VL2's approach, if it were applied to hypervisors:
  - centralized repository
  - policies can be violated if the destination's IP address is known

Firewalls Not Suited for Clouds
• Not well suited against DoS: filtering must be applied at the source, where rules are hard to update
• Inflexible policies for clouds
• Scaling and network designs:
  - with no prefix aggregation: difficult to scale (100k+ entries), and rules need updating on every VM start (more than once per second)
  - with prefix aggregation: complex to manage 10k tenants with multiple prefixes and different scaling requirements
• No L3 networks
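The fast-path/slow-path split under Feasibility can be illustrated as a per-flow verdict cache: packets of known flows hit the per-flow table, and only new flows invoke the more expensive policy evaluation and install state. This is a hypothetical Python sketch; the class name, the LRU eviction policy, and the capacity parameter are assumptions, and a real fast path would run in the hypervisor's datapath (e.g., Open vSwitch) rather than in Python.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable, Hashable


class FlowCache:
    """Per-flow verdict table (fast path) in front of policy evaluation (slow path)."""

    def __init__(self, slow_path: Callable[[Hashable], str], capacity: int = 10_000) -> None:
        self.slow_path = slow_path   # e.g., evaluate the destination VM's rules
        self.capacity = capacity     # assumed bound on per-flow state
        self.table: OrderedDict[Hashable, str] = OrderedDict()
        self.slow_calls = 0          # how often the slow path ran

    def lookup(self, flow: Hashable) -> str:
        verdict = self.table.get(flow)
        if verdict is not None:      # fast path: existing per-flow state
            self.table.move_to_end(flow)
            return verdict
        self.slow_calls += 1         # slow path: execute policy, install state
        verdict = self.slow_path(flow)
        self.table[flow] = verdict
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict the least recently used flow
        return verdict


if __name__ == "__main__":
    cache = FlowCache(lambda f: "allow" if f[1] == 80 else "block", capacity=2)
    for _ in range(5):
        cache.lookup(("A", 80))
    print(cache.slow_calls)  # 1
```

Repeated packets of the same flow cost only a table lookup, which is why per-server slow-path load stays proportional to new flows of the few VMs that server hosts.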