CLUSTERING APPROACH TO PREDICT THE CONGESTION LEVEL IN A WIRELESS NETWORK

Pravalika D, U.G. Student, Department of ECE, Sridevi Women’s Engg. College, Hyderabad, Telangana, pravalikadonti@gmail.com
Eesha L, U.G. Student, Department of ECE, Sridevi Women’s Engg. College, Hyderabad, Telangana, eeshalella7@gmail.com
Mounika M, U.G. Student, Department of ECE, Sridevi Women’s Engg. College, Hyderabad, Telangana, madagonimounika14@gmail.com
Dr. V. Balaji, Professor, Department of ECE, Sridevi Women’s Engg. College, Hyderabad, Telangana, balaji.phd.auc2008@gmail.com

Abstract— Congestion in a network arises because the wireless part has lower bandwidth than the wired part. The wireless side requires extensive planning because it is difficult to predict the number of nodes connected to the network over a period of time. In this paper we study the relationship between the retransmission timer and network congestion. We propose a cluster-based methodology to determine the status of an access point with respect to the channel, which helps in planning a better network. We use traces collected by wireless monitoring at the 62nd Internet Engineering Task Force (IETF) meeting held in Minneapolis, MN, in March 2005.

Keywords— Congestion, TCP flags, Wireless monitoring, Clustering, Access point.

Typically, the back-haul wireline portion of a wireless network is well provisioned to handle the network load. Therefore, there is a compelling need to understand the performance of the wireless portion of heavily utilized and congested wireless networks [2]. Evaluating a wireless network requires generating workloads that test the capability and performance of the new protocol or technique being studied. The lack of realistic traffic is a major limitation that has forced researchers to generate synthetic data for the simulations and experiments with which they evaluate the performance of wireless technologies.

The rest of the paper is organized as follows.
In Section II we present the motivation and related work. Our experimental and analytical methodology is presented in Section III. Finally, in Section IV, we conclude with a discussion of how this research may be applied to current issues in wireless networks.

I. INTRODUCTION

A wireless network is a network whose nodes communicate over interconnections that do not use wires [1]. It is implemented with electromagnetic waves at the physical layer. One important advantage of such a network is that it is inexpensive. A wireless network allows devices to stay connected to the network while roaming untethered to any wires. Access points amplify Wi-Fi signals, so that a device can be far from a router and still be connected to the network.

Wireless networks also use the Open Systems Interconnection (OSI) reference model for the transmission of data. The model applies to wireless networks much as it does to wired networks, with some differences in the data link layer, where wireless networks coordinate access to a common air medium and also deal with errors caused by the inherent nature of the wireless medium. At the physical layer, the data is transmitted in the form of radio waves.

Congestion control is a network layer issue. It is concerned with what happens when there is more data in the network than can be sent with reasonable packet delay and without packet loss. Multiple losses within one RTT affect TCP transmission under both a typical proactive scheme and a typical reactive scheme in the wireless environment. A wireless network with a high density of nodes within a single collision domain has a high probability of congestion, which decreases performance significantly. In general, the effects of congestion include drastic drops in network throughput, unacceptable packet delays and session disruptions.

II. MOTIVATION
TCP is widely used in wired networks, where packet loss is the result of congestion and hence the congestion window size is reduced. Fading, shadowing and handoff are some of the causes of loss that occur infrequently in wireless links.

M. Balazinska et al. [4], D. Kotz et al. [5], T. Henderson et al. [6], A. Jardosh et al. [7] and V. Balaji et al. [8] have carried out various studies of wireless network deployments. These studies analyze a wide range of wireless network behavior and provide insight into how deployed networks act in different scenarios. The majority of the studies were conducted within the wireless LANs of university campuses. Their research includes extensive amounts of raw wireless traffic data collected for subsequent analysis. We propose to build a model for congestion prediction from such raw data. Real-time data is chosen because its use is not limited to simulations; it can also be used in experimental deployments, which are essential for wireless protocol evaluation.

TCP congestion control algorithms play a critical role in improving the performance of TCP, and by preventing congestion collapse they regulate the amount of network traffic on the Internet. However, it is challenging to predict whether a complex network is behaving normally and to analyze network dynamics. The congestion window is one of the most important elements of TCP sender state for studying the per-connection state of TCP in the Internet.

Wireless local area networks (WLANs) have become very popular, but the complex behavior of wireless signal propagation creates significant challenges. We present an efficient opportunistic retransmission protocol that improves network performance in dynamic infrastructure WLANs.
The idea is to exploit overhearing nodes to retransmit (or relay) on behalf of the source after they learn about a failed transmission. Opportunistic retransmission leverages the fact that wireless networks inherently use broadcast transmission and that errors are mostly location dependent. Thus, if the intended recipient does not receive the packet, other nodes may have received it and become candidate retransmitters for that packet. With multiple wireless devices distributed in space, the chance that at least one available device can transmit the packet increases. Candidate relays participate if they have a higher chance than the source of delivering the packet successfully, which results in increased throughput.

Estimation of the retransmission timeout (RTO) is an area of TCP that has not received the same level of analysis as other TCP flow control mechanisms. Estimating a suitable value for the RTO is very important. Too small a value may cause needless sender timeouts even though the ACK is in transit from receiver to sender. Too large an RTO value could significantly reduce overall throughput. Results from a large-scale Internet traffic study by Balakrishnan et al. [BP98] show that approximately 50% of all packet losses required a timeout to recover. Another recent study found that over 85% of all timeouts occur because the fast retransmit mechanism is not triggered. While there are some proposals that aim to reduce TCP’s reliance on timeout expiry, these will take time to be discussed, agreed upon and perhaps ultimately deployed on a wide enough scale. In the meantime, there is a clear need for renewed research into TCP RTO mechanisms.

III. GOAL OF OUR WORK

In this paper we investigate raw wireless trace files and develop a clustering model. Lost packet segments are analyzed for networks connected over Wi-Fi by building a cluster model of the nodes that are connected together.
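Because the retransmission timer discussed in Section II is central to the loss analysis in this paper, a small worked example of RTO estimation may help. The sketch below assumes the widely used smoothed estimator standardized in RFC 6298 (smoothed RTT plus four times the RTT variation); it is not taken from the traces or tools used in this work, and the class name, parameter names and default bounds are illustrative assumptions.

```python
class RtoEstimator:
    """Illustrative RTO estimator in the style of RFC 6298 (an assumption,
    not the method used to process the IETF traces)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4          # multiplier applied to the RTT variation

    def __init__(self, min_rto=1.0, max_rto=60.0, granularity=0.0):
        self.min_rto = min_rto          # lower bound on the RTO, in seconds
        self.max_rto = max_rto          # upper bound on the RTO, in seconds
        self.granularity = granularity  # clock granularity, in seconds
        self.srtt = None                # smoothed RTT (unset until first sample)
        self.rttvar = None              # RTT variation
        self.rto = 1.0                  # initial RTO before any sample

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both state variables.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated with the old SRTT, then SRTT is updated.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto


if __name__ == "__main__":
    estimator = RtoEstimator(min_rto=0.2)
    for sample in (0.10, 0.12, 0.30, 0.11):  # made-up RTT samples in seconds
        print(round(estimator.update(sample), 4))
```

A value that is too small relative to the measured RTT variation corresponds to the needless timeouts described above, while the lower bound `min_rto` trades faster loss recovery against spurious retransmissions.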
The majority of TCP segments carry data, while others are simple acknowledgements of previously received data. TCP completes the connection with a 3-way handshake before data is transferred. The purpose of each TCP segment is determined by its TCP flags. Each flag is a control bit that indicates a connection state or carries information about how a packet should be handled. This allows both transmitter and receiver to specify the flags to be used so that data is handled correctly. The acknowledgment (ACK) flag is used to confirm successful delivery of packets. Lost segments are the foundation of our work.

To implement our methodology, we used data from the wireless network deployed at the Internet Engineering Task Force (IETF) meeting in March 2005. The IETF network consisted of 38 Airespace 1250 access points (APs) distributed over three floors. Each Airespace access point supported up to four virtual APs. A virtual AP is a logical AP that exists within a physical device and enables the wireless LAN to be segmented into multiple broadcast domains. Thus, at the IETF, a total of 152 APs (38 physical APs × 4 ESSIDs per physical AP) were available for utilization [8]. In this work we study the data over a 60-minute interval and, in contrast to the evaluation methods used in [8], consider only the TCP and UDP packets. Figure 1 shows the TCP and UDP traffic flow captured during the 60 minutes. Figure 2 shows the duplicate acknowledgements received due to packet loss.

The multi-step cluster method we propose is a scalable cluster analysis algorithm designed to handle very large data sets. In the first pass the data is pre-clustered into smaller clusters. In the second pass these smaller clusters are broken down into still smaller clusters. The pre-cluster step uses a sequential clustering approach.
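The sequential pre-cluster pass can be illustrated with a minimal sketch. It makes two simplifying assumptions that are not part of the proposed method: Euclidean distance to the cluster centroid stands in for the log-likelihood distance, and a flat list of sub-clusters stands in for the CF tree. The names `ClusterFeature` and `precluster` are hypothetical.

```python
import math


class ClusterFeature:
    """Compact summary of one sub-cluster: record count and per-dimension sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def add(self, point):
        # Absorb a record by updating the summary; the record itself is not stored.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def precluster(records, threshold):
    """Scan records one by one; merge each into the nearest existing
    sub-cluster if it lies within `threshold`, otherwise start a new one.
    Assumption: Euclidean distance replaces the log-likelihood distance."""
    clusters = []
    for record in records:
        nearest, best = None, float("inf")
        for cf in clusters:
            distance = math.dist(cf.centroid(), record)
            if distance < best:
                nearest, best = cf, distance
        if nearest is not None and best <= threshold:
            nearest.add(record)
        else:
            clusters.append(ClusterFeature(record))
    return clusters


if __name__ == "__main__":
    # Made-up two-dimensional records, for illustration only.
    sample = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    for cf in precluster(sample, threshold=1.0):
        print(cf.n, cf.centroid())
```

Raising `threshold` makes the pass merge more aggressively and yields fewer sub-clusters, which mirrors how increasing the threshold distance shrinks the summary when it grows too large.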
The pre-cluster step scans the data records one by one and decides whether the current record should be merged with a previously formed cluster or should start a new cluster, based on the distance criterion described here. The procedure is implemented by constructing a modified cluster feature (CF) tree. The CF tree consists of levels of nodes, and each node contains a number of entries. A leaf entry (an entry in a leaf node) represents a final sub-cluster. The non-leaf nodes and their entries are used to guide a new record quickly into the correct leaf node. If the CF tree grows beyond the allowed maximum size, it is rebuilt from the existing CF tree by increasing the threshold distance criterion. The rebuilt CF tree is smaller and hence has space for new input records.

Clustering is the process of assigning a set of observations to smaller groups, called clusters, so that the observations in the same cluster are related in some sense. Clustering is an unsupervised learning method that is often used for statistical data analysis and in machine learning, data mining, pattern recognition, image analysis and bioinformatics. The log-likelihood distance measure is the most popular method for measuring the distance between clusters. The distance between two clusters is related to the decrease in log-likelihood as they are combined into one cluster. The distance between clusters j and s is defined in the section below.

REFERENCES
[1] “Overview of Wireless Communications,” cambridge.org. http://www.cambridge.org/us/catalogue/catalogue.asp?isbn=0521837162&ss=exc. Retrieved 2008-02-08.
[2] A. P. Jardosh, K. N. Ramachandran, K. C. Almeroth, and E. M. Belding-Royer, “Understanding Congestion in IEEE 802.11b Wireless Networks,” University of California.
[3] S. Karpinski, E. M. Belding, and K. C. Almeroth, “Towards Realistic Models of Wireless Workload,” University of California.
[4] M. Balazinska and P.
Castro, “Characterizing mobility and network usage in a corporate wireless local-area network,” in ACM MobiSys, San Francisco, CA, USA, May 2003, pp. 303–316.
[5] D. Kotz and K. Essien, “Analysis of a campus-wide wireless network,” in ACM MobiCom, September 2002.
[6] T. Henderson, D. Kotz, and I. Abyzov, “The changing usage of a mature campus-wide wireless network,” in ACM MobiCom, September 2004.
[7] A. Jardosh, K. Ramachandran, K. Almeroth, and E. Belding-Royer, “Understanding link-layer behavior in highly congested IEEE 802.11b wireless networks,” in ACM Sigcomm EWIND, Philadelphia, PA, USA, August 2005.
[8] V. Balaji and V. Duraisamy, “Cluster based packet loss prediction using TCP ACK packets in wireless network.”
[9] http://hubpages.com/hub/congestion-control
[10] http://en.wikipedia.org/wiki/TCP_congestion_avoidance_algorithm
[11] H. Balakrishnan, V. Padmanabhan, S. Seshan, M. Stemm, and R. H. Katz, “TCP Behavior of a Busy Internet Server: Analysis and Improvements,” in Proceedings of IEEE INFOCOM ’98, San Francisco, March 1998.

The clustering result obtained for the nodes connected through a wireless access point is shown in Figure 3. We observe that 33% abnormal loss occurred when packets were exchanged between the nodes over a wireless connection.

IV. CONCLUSION

We proposed a multi-step clustering algorithm that clusters out the time periods in which abnormally high packet loss was observed, due either to congestion or to other effects common in wireless networks. The proposed clustering algorithm was able to cluster the data with very good efficiency. Further studies need to be done over a longer time frame.