Using Economics to Quantify the Security of the Internet
Jason Franklin

Internet Security (Availability)
• Claim 1: The security of the Internet depends directly on the number of compromised end-hosts
  – As the total number of compromised machines grows, the potential for larger DDoS attacks grows
  – More compromised machines implies more resources available to attackers
  – Security of the Internet is directly tied to the security of end-hosts in aggregate

Internet Security (Availability)
• Claim 2: Given a sufficiently powerful adversary, any networked resource can be DoSed successfully
  – Defenders are fundamentally more resource constrained than attackers
  – Defenders are restricted to play/pay by the rules
    • Over-provisioning and DoS defenses cost money

Measuring Internet Security
• Two basic research questions:
  – (Number): How many of the Internet’s end-hosts are compromised at any one time?
    • 100 million, 200 million, more?
  – (Cost): What is the effort required to compromise the security (availability) of a networked resource?
    • A security metric for Internet availability
    • Prefer a quantity directly related to how much work or effort must be spent

Estimating Number of Compromised End-hosts
• Approach 1 (Scanning):
  – Scan entire IP address space with vulnerability scanner
• Pros:
  – Would give reasonable estimate of number of hosts with well-known easy-to-exploit vulnerabilities
• Cons:
  – Scanning won’t reach Internet’s edge (NATs etc.)
  – Vulnerability scanning is slow and noisy
  – Hosts that are compromised then patched would be missed

Estimating Number of Compromised End-hosts
• Approach 2 (Economics):
  – Establish market for compromised hosts
  – Monitor supply and demand
• Pros:
  – Inexpensive to monitor market
  – Learn more than just quantity supplied
• Cons:
  – Difficult to establish public market for stolen goods
  – Hard to entice buyers and sellers to participate

Hard, but not impossible
• Introducing #ccpower
  – Active underground market for cyber contraband
    • Includes buyers and sellers specializing in spam, phishing, scamming, hacking, credit card fraud, and identity theft
    • Global market with thousands of active buyers and sellers
    • Responsible for ~$100 million in credit card fraud each year, numerous phishing scams, and hordes of other illegal activity

Collecting Economic Data
• Passive monitoring and archival of Internet Relay Chat (IRC) channels
  – 50+ monitored servers
  – Over 7 months of data
  – Over 12 million individual messages from as many as 50k individuals
• Limitations and Complexities
  – No private IRC messages
  – Complex underground dialect (slang)
  – Difficult to establish reputation
[Figure: IRC network of servers and clients. Key: S = Server, C = Client]

Market at a Glance
[Figure: Percentage of Monitored Messages vs. Number of Days Monitored]

Identifying Useful Data
• Text classification problem:
  – Given 13+ million IRC messages
    • Including millions of useful messages
      – “I’ve got hacked hosts for $2, pm me for deal”
    • And millions of useless messages
      – “Screw you guys I’m out of here”
• Built binary text classifiers to identify interesting classes of data
  – Hacked hosts sale ads
  – Hacked hosts want ads
  – Phishing and spam related ads
• Used SVMs with a 3k-line training set and a 1k-line test set
  – Bag of words feature vectors with TFIDF feature representation
  – SVMs correctly recall over 85% of true positives with precision of around 50%
  – For each true positive, SVMs identify one false positive

Economic Measurements
• Law of Demand
  – All other factors being equal, the higher the price of a good, the smaller the quantity demanded
• Law of Supply
  – All other factors being equal, the higher the price of a good, the greater the quantity supplied
[Figure: Price of Hacked Hosts over Time. Axes: Price vs. Time Period (Days)]

Number of Compromised End-hosts
• Methodology:
  – Market equilibrium price for compromised hosts at time t=1 is $10
  – Market equilibrium price for compromised hosts at time t=2 is $5
  – More compromised hosts are available at a lower price
  – But how do we know that supply shifted rather than demand?
[Figure: price labels $10 and $5, with quantities marked “?”]

Ceteris Paribus Assumption
• Laws of Supply and Demand only hold under the ceteris paribus assumption
  – “All other factors being equal”
• Law of Demand’s Other Factors
  – Size of market (population)
    • Measurements show this is fixed
  – Consumer preferences
  – Income
  – Price of related goods
• Law of Supply’s Other Factors
  – Cost of required resources (inputs)
    • Search cost for time spent searching for vulnerable hosts
    • Cost of exploits (free)
  – Technology
    • Scripts and tools mainly
  – Population
  – Price of substitutes and complements
    • Bulletproof hosting services for spammers
    • Substitutes for bots?
[Figure: axis label “Days”]

Cost to Buy as a Security Metric
• Each networked server S has a fixed amount of available resources R
  – S has sufficient resources to service k hosts per time period
  – In our simple model, S is vulnerable to a complete DoS attack by >= k hosts
• Natural question to ask is “How much effort is required of an attacker to compromise k hosts?”
  – Before markets, effort required was dependent on skills of attacker and level of tools available
  – After markets, effort required at time t can be measured by the Cost to Buy k hosts at time t

Cost to Buy Metric
• A simple example:
  – Server S has sufficient resources to service 30 hosts per time period
  – Security w.r.t. an adversary:
    • S is under-provisioned by 20 (50 - 30) against a $100 adversary at time t
    • S is over-provisioned by 5 (30 - 25) against a $100 adversary at time t+1

    Time   Cost to buy(1)   Adversary budget   Adversary resources (hosts)
    t      $2               $100               50
    t+1    $4               $100               25

  – Independent of adversary:
    • S is $60 (30 * $2) secure at time t and $120 (30 * $4) secure at time t+1
    • Measures resources required by adversary / measures risk

Conclusion
• We looked at how economics can be used to quantify the security of the Internet in a natural way
• Asked how many of the Internet’s end-hosts are compromised
• Established trend suggesting that the number of compromised hosts is increasing rather than decreasing
• Developed the cost to buy security metric to quantify the adversary resources necessary to affect the availability of a resource
• Price provides a natural way to quantify resources

Remaining Work
• Use simultaneous equation models from econometrics to empirically estimate supply and demand curves
  – Allows for estimate of quantity supplied at a price
• Use event study methodology to correlate Internet security “events” with the price of compromised hosts
  – New form of validation for security metrics

Questions?
• Acknowledgements:
  – Paul Bennett, John Bethencourt, Gaurav Kataria, Leonid Kontorovich, Pratyusa K. Manadhata, Vern Paxson, Adrian Perrig, Srini Seshan, Stefan Savage
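
Backup: the arithmetic behind the Cost to Buy Metric example can be sketched in a few lines of Python. This is only an illustrative sketch under the simple model from the slides (server S services k hosts per time period, every purchased host counts equally, and the adversary spends a fixed budget at the current market price). The names `Server`, `hosts_affordable`, `provisioning_margin`, and `cost_to_buy` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # k in the slides: hosts the server can service per time period
    capacity: int


def hosts_affordable(budget: float, price_per_host: float) -> int:
    """Hosts an adversary can buy with a fixed budget at the market price."""
    if price_per_host <= 0:
        raise ValueError("price_per_host must be positive")
    return int(budget // price_per_host)


def provisioning_margin(server: Server, budget: float, price_per_host: float) -> int:
    """Capacity minus purchasable hosts.

    Negative: under-provisioned by that many hosts.
    Positive: over-provisioned by that many hosts.
    """
    return server.capacity - hosts_affordable(budget, price_per_host)


def cost_to_buy(server: Server, price_per_host: float) -> float:
    """Adversary-independent metric: price of k hosts at the current price."""
    return server.capacity * price_per_host


if __name__ == "__main__":
    s = Server(capacity=30)  # S services 30 hosts per time period
    # Prices per host at time t and t+1, taken from the example table
    for label, price in (("t", 2.0), ("t+1", 4.0)):
        margin = provisioning_margin(s, budget=100.0, price_per_host=price)
        print(label, "margin:", margin, "cost to buy:", cost_to_buy(s, price))
```

With the numbers from the example, the margin is -20 at time t and +5 at time t+1, and the cost to buy is $60 and $120 respectively, matching the slide. Integer division is used for the purchasable host count on the assumption that hosts are bought whole.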