Against Internet Intrusions (paper)
J. Scott Miller, Spring 2005
CS-495 Advanced Networking

Plan of Attack
Introduction
Data Collection
Data Analysis
– Data as over-generalized
– Response to data as flimsy
– Projections as too simplistic
– Final analysis shoddy

Introduction
As we’ve already heard, this paper has a lot of data!
Unfortunately, the analysis provided does not match the depth of information collected
– Few meaningful conclusions are drawn
– Analysis is very simplistic and preliminary
– Future work is suggested once

Data Collection
Firewall logs from across the world
– 1600 different locations
– Collected over four months
Sounds good, but…
– No information given regarding the subnets these firewalls protect, such as size and composition (this becomes important later)
– Logs lack IP header and connection information

Data as Over-Generalized
Data is placed into two very large groups
– Worms
– Non-worms
But the behavior of each intruder is specialized
– Code Red I exhibits strong day-of-the-month characteristics whereas Code Red II does not
– Global characteristics inferred are therefore very dependent on the worm in question
– Same for non-worms

Data as Over-Generalized (cont.)
What does this mean?
– Analysis of persistence is biased toward the worms considered
  • Code Red I is memory resident while Code Red II is not
– Periodicity is skewed by varied behavior
  • While it’s neat to see traffic spike during the Code Red I spread phase, it is not necessarily telling
One more thing…
– Not clear if the firewalls catch intra-subnet traffic, which is important for some worms

Response to Data
Analysis of top sources
– Focus limited to non-worm sources
– Authors find a very Zipf-like distribution

Response to Data (cont.)
So the authors suggest…
– “… blacklisting worst offenders would be an effective mechanism defending against non-port 80 intrusions.”
Unfortunately, this is ineffective because of the long-tail distribution
– A few nodes are making a large number of attacks
– Many are making a small number of attacks, and not all IPs in that group can be banned
– No information is given on how many intrusions would still remain

Projections
The limited data set is extrapolated to give an idea of the number of intrusions Internet-wide
– Calculated by taking the average intrusions per IP and multiplying that by the IP space
– “We assume uniformity, but do not test for it. That is, we assume that since our set of provider networks are reasonably well distributed … our perspective reflects what is seen over the general internet.”
Sound naïve?

Projections (cont.)
It is!
– Simply stating that you did not test for uniformity does not make it OK to ignore it!
A number of other factors are ignored in this assumption:
– Intra-subnet traffic missed by the router
– Traffic behind a NAT
– Unassigned IP addresses
Without regard to these factors, 25 billion scans a day is arbitrary

Final Analysis
Finally, the authors take a look at how many subnets are adequate to determine “worst offenders” (top) and target ports (bottom)
Data still appears erratic; is it possible that this data does not fit that model?
The paper only mentions that the data should be “relatively stable”

Moving on to My Opponent…
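
Backup: Sketching Two of the Critiques
The blacklisting and projection objections can be made concrete with a small sketch. It is not the paper's method and uses no data from the paper. The function names, the Zipf exponent, the scale, and the default address-space size are all assumptions made for illustration. The first part asks how much traffic survives a blacklist of the top sources under a Zipf-like distribution. The second reproduces the naive "average per IP times IP space" projection so the uniformity assumption is visible in the code.

```python
"""Sketch only: synthetic numbers, not data from the paper."""


def zipf_counts(n_sources: int, exponent: float = 1.0,
                scale: float = 10_000.0) -> list[float]:
    """Hypothetical per-source counts: count(rank) = scale / rank**exponent."""
    return [scale / rank ** exponent for rank in range(1, n_sources + 1)]


def remaining_fraction(counts: list[float], blacklist_size: int) -> float:
    """Fraction of intrusions that still arrive after banning the top sources."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    # Everything past the first `blacklist_size` sources is the long tail.
    return sum(ordered[max(blacklist_size, 0):]) / total


def project_internet_wide(observed_intrusions: float, monitored_ips: int,
                          ip_space: int = 2 ** 32) -> float:
    """Naive projection: average intrusions per monitored IP times the IP space.

    This bakes in the uniformity assumption the slides criticize. The default
    ip_space (all of IPv4) is an assumption of this sketch; it ignores
    unassigned addresses and hosts behind NAT.
    """
    if monitored_ips <= 0:
        raise ValueError("monitored_ips must be positive")
    return observed_intrusions / monitored_ips * ip_space


if __name__ == "__main__":
    counts = zipf_counts(1_000)  # 1,000 synthetic sources
    for k in (1, 10, 100):
        share = remaining_fraction(counts, k)
        print(f"ban top {k:>3}: {share:.1%} of intrusions remain")
    # Made-up inputs, only to show how the projection scales.
    print(project_internet_wide(observed_intrusions=500.0, monitored_ips=1_000))
```

With these synthetic settings (exponent 1, 1,000 sources), banning the top 10 sources still leaves roughly 61% of intrusions. That is the kind of residual figure the paper would need to report before calling blacklisting effective. The actual value depends entirely on the real distribution, which this sketch does not know.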