Joel Cassman
CPSC 6157 – Network Management
Columbus State University
April 30, 2010

Modeling and Detecting Stepping Stone Intrusion: A Critique and Proposed Alternatives

Abstract: Unauthorized access to computer networks is called an “attack”. Detecting such attacks and identifying the intruders responsible is difficult because intruders use “stepping stones”, that is, multiple intermediate host computers through which they break in to another computer. Intruders first log into a stepping-stone host and then repeat this step to form a “connection chain” of host computers. A pair of network connections is called a “stepping-stone connection pair” if both connections are part of a connection chain. Intruders typically erase logs on these stepping stones, so it is difficult to trace an attack back to its origin. Intruders may also use encryption and other techniques to hide the source of network attacks.[1]

This paper analyzes recently published studies on the stepping stone intrusion problem. There is a growing body of research on this problem: a number of researchers have proposed algorithms to detect stepping stone intrusion, which generally focus on different ways of measuring the correlation among streams of packets to find similarities. This paper focuses on the most recently published work on the subject, by Zhang, Yang and Ye (2009).[2] They proposed four models to describe stepping-stone intrusion and applied statistical methods using signal processing technology to run experiments to detect such intrusions. The detection methods use correlation coefficients to compare two pairs of signals received at a host computer: the incoming and outgoing Send packet stream and the incoming and outgoing Echo (acknowledgements) packet stream.
By conducting experiments that monitored incoming and outgoing connections simultaneously and collected the TCP/IP packets flowing through them, the authors intended to gather enough data to compare the four methods. However, the paper reported only very preliminary results, which the authors admitted were limited.

The problem with the algorithms proposed in the Zhang, Yang and Ye paper is that, in practice, they generate either too much data or too little, making meaningful statistical analysis challenging. They are also difficult to apply in real time, which makes it harder to achieve the overall purpose, which is, after all, identifying and catching the perpetrators of cybercrimes. Ideas to improve the techniques include combining stepping-stone intrusion detection with older methods of packet tracing using TCPdump, Wireshark or other sniffer programs, and looking for suspicious packets accessing unusual ports on the source and destination computers in the connection chain.

1. Introduction

A “network attack” refers to any operation to disrupt, deny, degrade, or destroy information resident in computers and computer networks, or the computers and networks themselves. “Stepping stone” refers to a computer intrusion in which an intruder connects through one host computer in order to gain access to one or more other computers. Hackers use stepping stone intrusion for various illegal activities, including identity theft, theft of classified information, fraud, blackmail, extortion, and denial of service attacks. As described in a recent book by Joseph Menn,[3] cybercriminals are increasingly sophisticated and often partner with organized crime. Hackers use spoofed source addresses in order to capture a large number of “zombie” computers in forming a connection chain to launch their attack on the unsuspecting victim. There is a growing body of research on the problem of identifying network intruders using stepping stones.
If intruders could be identified using computer algorithms, the perpetrators could be caught and prosecuted for computer crimes. Identification of intruders could also be important in information warfare applications and computer fraud. The problem of detecting stepping stone intrusion has been examined in a variety of research papers that develop and test algorithms to identify packets passing through the network as coming from the same intruder source. To match packets, these algorithms generally start by identifying characteristics that would indicate a stepping-stone type of attack. This paper reviews some recently published papers that proposed algorithms to model and detect stepping stone intrusion, focusing on the methods discussed in the paper by Zhang, Yang and Ye (2009).[4]

2. Related Work

The literature on detecting stepping stone intrusion is growing. The problem of stepping-stone detection was first discussed in a paper by Staniford-Chen and Heberlein (1995).[5] This paper noted that attackers take advantage of the decentralized architecture of the Internet to hide their point of origin by seeking out and logging into insecure hosts. The intruders then launch attacks on their intended targets with little risk of identification. Staniford-Chen and Heberlein presented a possible solution to the problem by developing the concept of a thumbprint. Their algorithm calculates a thumbprint (similar to a checksum) for each interval of each connection; the thumbprint is stored and later used to verify whether or not two connections had the same content. The paper described various ways in which these thumbprint logs could be used to trace back an intrusion. Reactive methods include host-based solutions such as CIS (Caller Identification System) and Caller ID. These methods trace intrusions back through the chain of intermediate hosts.
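
As an illustration of the thumbprint idea attributed above to Staniford-Chen and Heberlein, the following is a minimal sketch rather than their actual algorithm. It assumes that each connection is available as a list of (timestamp, payload) records and substitutes a SHA-256 digest for the thumbprint function; the function names and the 60-second interval are hypothetical choices made here for illustration.

```python
import hashlib


def thumbprints(packets, interval=60.0):
    """Return {interval_index: digest} for one connection.

    `packets` is an iterable of (timestamp_seconds, payload_bytes) pairs.
    The digest is an assumed stand-in for the paper's thumbprint function.
    """
    hashers = {}
    for timestamp, payload in packets:
        slot = int(timestamp // interval)  # which time interval the packet falls in
        hashers.setdefault(slot, hashlib.sha256()).update(payload)
    return {slot: h.hexdigest() for slot, h in hashers.items()}


def matching_intervals(prints_a, prints_b):
    """Return the sorted interval indices whose thumbprints agree."""
    shared = prints_a.keys() & prints_b.keys()
    return sorted(slot for slot in shared if prints_a[slot] == prints_b[slot])
```

Because any difference in content changes the digest, a sketch like this would fail on encrypted or re-encoded sessions, which is the limitation that motivated the timing-based approaches discussed next.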

Zhang and Paxson (2000)

Yin Zhang and Vern Paxson took the stepping stone intrusion detection problem further by developing an algorithm based on the distinctive characteristics of interactive Internet traffic, including packet size and timing.[6] They noted that a stepping-stone connection pair differs from a randomly selected pair of connections in that the stepping-stone connections are likely to have statistically correlated traffic characteristics, including connection contents, inter-packet spacing, ON/OFF patterns of activity, and traffic volume or rate. They then noted that content-based detection techniques are not effective when the content is encrypted, as is the case in Secure Shell (SSH). Therefore, Zhang and Paxson’s algorithm did not collect information on packet contents, but rather focused on the information contained in the packet headers. Their goal was to develop an algorithm that economized on the amount of information gathered and the resulting processing load. For this reason, they devised filtering criteria based on packet size, which significantly reduce the packet capture load on the monitor. They also restricted their screening by eliminating connections that (1) shared the same port on the same host, (2) had inconsistent directions, or (3) had inconsistent timing. They then filtered out network traffic that ran fairly continuously in order to focus on traffic exhibiting the ON/OFF patterns characteristic of interactive sessions.

Huang, Lychev and Yang (2007)

This paper[7] developed a connection chain detection algorithm to identify interactive attacks by correlating the outgoing stream of the outgoing connection (Send) with the outgoing stream of the incoming connection (Echo). The authors hypothesized that the frequency with which packets leave a stepping stone in the Echo stream depends on the frequency in the Send stream.
If the algorithm identifies a host with a Send-Echo pair whose statistical relationship is within a certain margin of linearity, then there is a high probability that the host is being used as a stepping stone.

Yang and Huang (2008)

This paper[8] proposed using a “Step-Function” method to detect the length of a connection chain from a host to the targeted computer. The step function is obtained by matching TCP Send and Echo packets and computing the packet round-trip times (RTTs) as a measure of the length of the connection chain. The paper then developed two algorithms to match TCP packets in real time. The “Conservative” algorithm matches packets precisely but only for a small number of packets. The “Heuristic Match” algorithm matches more packets but with less certainty about how closely the packets are related. The data generated by either algorithm was used to estimate levels in the RTTs, which form a step function as more and more hosts are added to the connection chain.

Zhang, Yang and Ye (2009)

This paper[9] proposed four models to describe stepping-stone intrusion and applied statistical methods using signal processing technology, such as Fourier and Laplace transforms, to run experiments to detect such intrusions. The detection methods use correlation coefficients to compare two pairs of signals received at a host computer: the incoming and outgoing Send packet stream and the incoming and outgoing Echo (acknowledgements) packet stream. By monitoring the incoming and outgoing connections simultaneously and collecting the TCP/IP packets flowing through the connections, the paper discussed four methods to analyze the information:

Model 1: Sequence Model – Collect information on the order of elements, the number of elements in each sequence, and the value of each timestamp collected on the ordered TCP packets. The incoming and outgoing streams are considered to be separate, discrete signals.
The intervals between the timestamps of the packets are analyzed by computing correlation coefficients to determine the degree of correlation between the signals.

Model 2: Pair Model – The two signals are considered to be connected. Build a series of paired Send and Echo packets for both the incoming and outgoing signals, and compare the number of pairs identified to measure how closely the two signals are related by calculating their correlation coefficients.

Model 3: RTT Model – The paired stream of packets is analyzed for the difference in the timestamps of the Send and Echo packets for the same connection (incoming or outgoing). This difference is called the round-trip time (RTTi and RTTo), which represents the length of the connection chain from the host to the target of the intrusion. Different pairs will have different RTTs, conforming to a chi-distribution which could be tested for statistical significance.

Model 4: Crossing Model – This variant of the model pairs the Send stream of the incoming connection with the Echo stream of the outgoing connection and vice versa. The differences in timestamps (RTTio and RTToi) must be related according to a chi-distribution, which can be tested for statistical significance.

The paper reported only limited experimental results to test these four models. The authors ran a controlled experiment with a connection chain of three computers located in different parts of the US. They then sent packets through the chain, collected data, and attempted to compute correlation coefficients between the incoming and outgoing connections. The preliminary results reported did not really verify their models.
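
The correlation step of the Sequence Model (Model 1) can be illustrated with a short sketch. It assumes Pearson's correlation coefficient and truncates the two interval sequences to a common length; the source paper's exact coefficient and alignment procedure are not reproduced here, and the function names are hypothetical.

```python
import math


def intervals(timestamps):
    """Return the gaps between consecutive packet timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def pearson(x, y):
    """Pearson correlation of two sequences, truncated to the shorter length.

    Truncation is an assumption of this sketch, not a step from the paper.
    """
    n = min(len(x), len(y))
    if n < 2:
        raise ValueError("need at least two paired values")
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def interval_correlation(incoming_ts, outgoing_ts):
    """Correlate the inter-packet intervals of two packet streams."""
    return pearson(intervals(incoming_ts), intervals(outgoing_ts))
```

A coefficient near 1 for the incoming and outgoing streams at a host would be consistent with relayed traffic, while a value near 0 would suggest unrelated connections. Any decision threshold would have to be chosen empirically.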
The authors made several assumptions that limit the wider application of these models to detecting stepping stone intrusion in real life, such as the very narrow scope of their experiment (they monitored only one port of the host computer, which connected to only one port of the connected computer).

3. Proposed Alternative Solutions

The problem with current stepping-stone detection algorithms, in practice, is that they collect either too much data (in which few of the packets actually match) or not enough data to run significant statistical tests of correlation. They are also too limited in scope to be of practical benefit. Zhang, Yang and Ye admitted the limitations in their paper but pointed out that it was only a first step, and they proposed that future work should use multiple characteristics to test packet flow correlation, not just one parameter as in their current models. In a recent unpublished lecture, Dr. Yang presented some new intrusion detection methods that looked at time-jittering and chaff-perturbation characteristics.[10]

While the current state of the art in stepping stone detection is rather unsatisfactory, it is possible that such algorithms could eventually be used to improve existing network intrusion detection methods. Current techniques are based on sniffer programs such as TCPdump and Wireshark, which tend to look for suspicious activity such as packets that access unusual ports on victims’ computers or run certain known programs such as Echo-Chargen,[11] which can crash networks. There are other ways of trying to detect a stepping-stone intrusion as well, such as monitoring traffic volume and lag time between certain host computers, or using packet monitors and analyzing transaction log files to spot suspicious activities.
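
The port-based screening described above can be sketched as a simple filter over packet records that have already been captured. This is an illustrative sketch only: the record layout, the function name, and the allow-list of expected ports are assumptions made here, not part of any tool's interface. Ports 7 and 19 are the well-known Echo and Chargen service ports.

```python
from typing import Iterable, List, NamedTuple


class PacketRecord(NamedTuple):
    """One captured packet header (hypothetical layout for this sketch)."""
    src: str
    dst: str
    dst_port: int


# Hypothetical allow-list of ports this site expects to see in use.
EXPECTED_PORTS = frozenset({22, 25, 53, 80, 443})

# Echo (7) and Chargen (19), the services abused in Echo-Chargen attacks.
ECHO_CHARGEN_PORTS = frozenset({7, 19})


def suspicious(records: Iterable[PacketRecord],
               expected: frozenset = EXPECTED_PORTS) -> List[PacketRecord]:
    """Return records aimed at Echo/Chargen or at ports outside the allow-list."""
    return [
        r for r in records
        if r.dst_port in ECHO_CHARGEN_PORTS or r.dst_port not in expected
    ]
```

A filter of this kind only narrows the set of packets worth examining. Deciding whether a flagged connection belongs to a stepping-stone chain would still require correlation methods such as those reviewed in Section 2.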

Future research work includes developing improved algorithms that match packet pairs by analyzing the packet streams for human-like characteristics typical of an interactive session (such as time-jitter) and by distinguishing erroneous chaff packets from normal packets. If these future algorithms could evaluate matched pairs by comparing them against trouble-making packet types already well known in the network intrusion literature, then the matching process should improve.

An alternative proposal for improving these theoretical stepping stone detection algorithms is to test them not with very structured, limited experiments, as in the Zhang, Yang and Ye paper, but with real-world experiments involving real hackers. One technique that could be used is to invite hackers to attack a host (called a “honey pot”) already configured with packet matching algorithms, and then run all the algorithms at the same time to see which method produces the greatest number of detected packet matches from different zombie source computers.

4. Conclusion

Stepping-stone detection is very important in the field of network security. Zhang, Yang and Ye’s four methods represent first steps in using digital signal processing techniques to estimate sets of correlation coefficients in order to detect stepping stone intrusion based on different characteristics of packet flows. The techniques have limited usefulness at present but could potentially improve existing network intrusion detection methods, which rely on tedious analysis of network traffic logs and searches for suspicious activity. In the future, algorithms based on some of these methods could detect stepping stone intrusion in real time and set off alarms to network managers. In the meantime, however, the best we can do is to investigate network intrusion after the damage has already been done.

[1] S. C. Lee and C. Shields, “Tracing the Source of Network Attack: A Technical, Legal and Societal Problem”, Proceedings of the 2001 IEEE Workshop on Information Assurance and Security, U.S. Military Academy, June 5-6, 2001, pp. 239-246.
[2] Yongzhong Zhang, Jianhua Yang and Chunming Ye, “Modeling and Detecting Stepping-Stone Intrusion”, IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 7, July 2009.
[3] Joseph Menn, Fatal System Error. New York: PublicAffairs Publishing Co., 2010.
[4] Yongzhong Zhang, Jianhua Yang and Chunming Ye, op. cit.
[5] Stuart Staniford-Chen and L. Todd Heberlein, “Holding Intruders Accountable on the Internet”, Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, May 1995, pp. 39-49.
[6] Yin Zhang and Vern Paxson, “Detecting Stepping Stones”, Proceedings of the Ninth USENIX Security Symposium, Denver, CO, August 2000, pp. 67-81.
[7] Shou-Hsuan Stephen Huang, Robert Lychev and Jianhua Yang, “Stepping-Stone Detection via Request-Response Traffic Analysis”, 2007, pp. 276-285.
[8] Jianhua Yang and Shou-Hsuan Stephen Huang, “Matching TCP Packets and Its Application to the Detection of Long Connection Chains on the Internet”, 19th International Conference on Advanced Information Networking and Applications (AINA 2005); Jianhua Yang and Shou-Hsuan Stephen Huang, “Matching TCP/IP Packets to Detect Stepping-Stone Intrusion”, IJCSNS International Journal of Computer Science and Network Security, Vol. 6, No. 10, October 2006.
[9] Yongzhong Zhang, Jianhua Yang and Chunming Ye, op. cit.
[10] Jianhua Yang, “Resist Intruder’s Manipulation via Context-based TCP/IP Packet Matching”, unpublished PowerPoint presentation, Columbus State University, March 25, 2010.
[11] Stephen Northcutt and Judy Novak, Network Intrusion Detection, third edition. Indianapolis: New Riders Publishing Co., 2003.