Structure Preserving Anonymization of Router Configuration Data

David A. Maltz, Jibin Zhan, Geoffrey Xie, Hui Zhang
Carnegie Mellon University
Gisli Hjalmtysson, Albert Greenberg, Jennifer Rexford
AT&T Labs Research

Why Configuration Files are Valuable

Configuration file = program loaded on each router
• Controls operation of router
• Controls interactions between routers
Configuration files allow researchers to study the details of real networks
• The problem is getting access to them
• We have developed a technique for anonymizing configuration files
• We have a proposal for how configs could be made accessible to the research community

Why Configuration Files are Valuable - 2

The set of configurations defines the network
• Captures many of the network’s properties
  – Topology (node degree, interconnectivity)
  – Policies (CoS, QoS, packet filters, reachability)
  – Routing (neighbors, OSPF weights, BGP policies)
  – Security (vulnerabilities, mitigations)
Only source of insight for Enterprise networks
• 10K+ networks that are currently a mystery
• Interesting!
  10 – 1200 routers, global scale
• Configs are the only way to look at them
  – Networks firewalled, external probes dropped

Topology

[Figure: Internet; Router 1 and Router 2]
Router 1 Config
    interface Serial1/0.5
     ip address 1.1.1.1/30
Router 2 Config
    interface Serial2/1.5
     ip address 1.1.1.2/30

Quality of Service

    class-map GoodCustomer
     match access-group 136
    policy-map GoldService
     class GoodCustomer
      bandwidth 2000
      queue-limit 40
     class class-default
      fair-queue 16
      queue-limit 20
    interface Serial0/0
     service-policy output GoldService
Callouts on the slide: Class definition; CB-WFQ parameters; CB-WFQ policy name

Routing

    router bgp 65501
     neighbor EdgeSwitch peer-group
     neighbor EdgeSwitch remote-as 64740
     neighbor EdgeSwitch distribute-list 11 in
     neighbor EdgeSwitch route-map exportRoutes out
     neighbor 192.168.96.8 peer-group EdgeSwitch
     neighbor 192.168.96.9 peer-group EdgeSwitch
     neighbor 10.217.248.14 remote-as 65500
     neighbor 10.217.248.14 ebgp-multihop 5
Callouts on the slide: AS Numbers; Policies; Peers

Security Issues

    access-list 143 deny 53 any any
    access-list 143 deny 55 any any
    access-list 143 deny 77 any any
    access-list 143 permit ip any any
Access list 143: Drops packets that can attack Cisco interfaces
    interface Serial0.2 multipoint
     ip access-group 143 in
     ip address 66.248.162.13 255.255.255.224
This interface is safe
    interface Ethernet0
     ip address 144.201.41.59 255.255.255.0
This interface is not

How to Get Configuration Files?

Considered proprietary secrets of network owners
• Discloses business strategy
• Discloses vulnerabilities
Anonymization breaks tie between data and owner
• Anonymized configs will show some network is vulnerable, but which/where to attack?
We developed a method for anonymizing configuration files
• Approach convinced some customers of AT&T to disclose their configs to CMU researchers

Anonymization Challenges

We don’t know the intended use of the data
• Must anonymize entire configuration file
• A customized data set is easier to anonymize
Must preserve structure of information in files
• Relationships of identifiers inside/between files
• IP address subnet relationships
Traditional parsing tools are of no use
• No published grammar for Cisco IOS
• 200+ different versions seen in 31 networks

Anonymize Non-numeric Tokens

Created “pass list” of words by string-scraping Cisco’s web pages
• Contains most IOS commands
• Other words are generic networking terms (“IETF”)
All tokens not in pass list are hashed with salted SHA1
Before:
    router bgp 64780
     redistribute ospf 64 match route-map NYOffice
     neighbor 1.2.3.4 remote-as 701
    route-map NYOffice deny 10
     match ip address 4
After:
    router bgp 64780
     redistribute ospf 64 match route-map 8aTzlvBrbaW
     neighbor 66.253.160.68 remote-as 701
    route-map 8aTzlvBrbaW deny 10
     match ip address 4

Anonymize Specific Numbers

Most numbers are harmless, some reveal identity
• Public AS numbers
• Phone numbers (NOCs, backup modems)
26 rules used to find and anonymize context-dependent items
• "neighbor\\s+$ipAddrPatt\\s+remote-as"
• "neighbor\s+\w+\s+remote-as"
Before:
    router bgp 64780
     redistribute ospf 64 match route-map NYOffice
     neighbor 1.2.3.4 remote-as 701
    route-map NYOffice deny 10
     match ip address 4
After:
    router bgp 64780
     redistribute ospf 64 match route-map 8aTzlvBrbaW
     neighbor 66.253.160.68 remote-as 1237
    route-map 8aTzlvBrbaW deny 10
     match ip address 4

Limits of Anonymization

Anonymization is a lossy process
• Comments & meaningful identifiers removed
• (Were they right anyway???)
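The pass-list hashing from the "Anonymize Non-numeric Tokens" slide can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' tool: the tiny `PASS_LIST`, the `SALT` value, the 11-character base32 truncation, and the function names are all hypothetical. Numbers and IP addresses are passed through untouched here, since the slides handle them with separate rules.

```python
import base64
import hashlib
import re

# Hypothetical pass list; the slides describe scraping it from Cisco's web pages.
PASS_LIST = {"router", "bgp", "redistribute", "ospf", "match", "route-map",
             "neighbor", "remote-as", "deny", "ip", "address"}
SALT = b"example-secret-salt"  # assumption: any secret byte string

# Dotted-quad tokens are skipped; IP anonymization is a separate step.
IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def hash_token(token: str) -> str:
    """Map a token to a short, stable, salted SHA-1 based identifier."""
    digest = hashlib.sha1(SALT + token.encode("utf-8")).digest()
    # Base32 keeps the result alphanumeric; truncation length is arbitrary.
    return base64.b32encode(digest).decode("ascii")[:11]


def anonymize_line(line: str) -> str:
    """Hash every token that is not numeric, an IP address, or on the pass list."""
    indent = line[:len(line) - len(line.lstrip())]
    out = []
    for tok in line.split():
        if tok.isdigit() or IP_RE.fullmatch(tok) or tok.lower() in PASS_LIST:
            out.append(tok)
        else:
            out.append(hash_token(tok))
    return indent + " ".join(out)


if __name__ == "__main__":
    for cfg_line in ["router bgp 64780",
                     " redistribute ospf 64 match route-map NYOffice",
                     "route-map NYOffice deny 10"]:
        print(anonymize_line(cfg_line))
```

Because the hash is deterministic for a given salt, both occurrences of `NYOffice` map to the same replacement, which is what keeps the reference from the `redistribute` line to the `route-map` definition intact.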
Anonymizer preserves relationships it knows about
• Doesn’t know about IP addr <-> ASN mapping
• A packet filter, based on IP address, and route policy, based on ASN, could target same AS
• Post-anonymization: both mechanisms preserved, but won’t show them targeting same AS
• (Router didn’t have that external information either)

Potential Vulnerabilities: Textual Attacks

Identifying information left in configs
Heuristics used as double-check
• Rules that anonymize public AS numbers record the public AS numbers they find
• Search post-anonymization file for any remaining occurrences

Potential Vulnerabilities: Fingerprinting Attacks

Network characteristics (fingerprint) extracted from anonymized configs matched against public data
Potential fingerprints
• BGP community strings
• Number of POPs, number of BGP peers
• Structure of address space utilization
• Others…
Evaluation still in progress
• Seems like backbone networks are identifiable
• Seems like enterprise networks are not

A Clearinghouse for Configuration Data

[Diagram: a website enforcing single-blind methodology sits between network owners and researchers]
Network owners: Retrieve anonymizer; Anonymize & test configs; Upload configs; Questions / Results; Blinded email
Researchers: Register with site; Retrieve configs; Analyze data; Questions / Results; Blinded email
Run tools on site: Scalable, pictures
Boot-strap with configs from academic/research institutions?

Questions?

Fingerprinting Attacks

[Figure: BGP Peers per POP against POPs (sorted by peers/POP); data from networks in repository of anonymized configs]
1. For each anonymized network, compute fingerprint from anonymized config files
   • Will be 100% accurate
2.
   Experimentally measure real networks

Fingerprinting Attacks

[Figure: BGP Peers per POP against POPs (sorted by peers/POP); measured network characteristics]
Evaluation still in progress
• Seems like backbone networks are identifiable
• Seems like enterprise networks are not

Anonymize Regular Expressions

Some AS numbers appear in regular expressions
• Expressions w/ only private AS numbers → no change
    ip as-path access-list 99 permit _6451[2-9]_
    64512, 64513, … 64519
    ip as-path access-list 99 permit _6451[2-9]_
• Expressions w/ public AS numbers → expand and anonymize
    ip as-path access-list 101 permit _70[1-3]_
    701, 702, 703
    Anonymize
    1234, 543, 21
    ip as-path access-list 101 permit _(1234|543|21)_

Anonymize IP Addresses

Extended Minshall’s prefix-preserving algorithm
Made it class preserving
• Class A to Class A, etc.
  – RIP and older protocols are classful
Made it “subnet address” preserving
• Assume 128.2.0.0/16 is subnet
• We want 128.2.0.0 → 150.7.0.0
• Before extension, 128.2.0.0 → 150.7.43.66

Anonymize IP Addresses - 2

Made it “special address” preserving
• Multicast, private address space
• Must fix collisions in mapping function
[Flowchart: IP Addr → Special? (Y/N) → Anonymize → Special? (Y/N)]

Anonymization Overview

Minimize dependence on context
• If in doubt, hash it out
1. Remove all comments
2. Find all IP addresses and hash using specialized prefix-preserving anonymization
3. Hash all non-numeric tokens not known to be safe
4. Anonymize specific numeric tokens using regular expressions
5. Anonymize regular expressions appearing in configs
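Step 5 (anonymizing regular expressions) follows the expand-and-anonymize idea from the "Anonymize Regular Expressions" slide, and can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not from the talk: AS numbers are 16-bit, private AS numbers are 64512 through 65535, the pattern is delimited by Cisco's `_` on both sides, the pattern body is simple enough to be valid Python regex syntax, and `asn_map` is a made-up lookup standing in for whatever mapping the anonymizer applies to public AS numbers.

```python
import re

# Assumption: 16-bit AS numbers, with 64512-65535 reserved for private use.
PRIVATE_ASNS = range(64512, 65536)


def expand_as_regex(pattern: str) -> list[int]:
    """Enumerate the 16-bit AS numbers matched by a pattern of the form _<body>_."""
    body = re.compile(pattern.strip("_"))
    return [asn for asn in range(1, 65536) if body.fullmatch(str(asn))]


def anonymize_as_regex(pattern: str, asn_map: dict[int, int]) -> str:
    """Leave private-only patterns alone; otherwise rewrite as an alternation."""
    matched = expand_as_regex(pattern)
    if all(asn in PRIVATE_ASNS for asn in matched):
        return pattern  # only private AS numbers: no change
    # Public AS numbers are replaced via the (hypothetical) mapping;
    # private ones in the same pattern are kept as they are.
    mapped = [asn if asn in PRIVATE_ASNS else asn_map[asn] for asn in matched]
    return "_(" + "|".join(str(asn) for asn in mapped) + ")_"


if __name__ == "__main__":
    example_map = {701: 1234, 702: 543, 703: 21}  # values taken from the slide
    print(anonymize_as_regex("_6451[2-9]_", example_map))
    print(anonymize_as_regex("_70[1-3]_", example_map))
```

Brute-force enumeration over all 65,535 candidates avoids having to parse the regular expression itself, which fits the talk's theme of minimizing dependence on context.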