International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014 An Efficient Supervised Learning approach over Firewall Log Data SanthaKumariAllam1,A Ramakrishna2 1,2 Final M.Tech Student1, AssistantProfessor 2 Dept of CSE,Vignan institute of engineering for women,Visakhapatnam Abstract:Analyzing the firewall log data is always an interesting research area in the field of network traffic analysis. In this paper we are proposing an efficient Classification(Naïve Bayesian Classification) technique for analyze incoming node , based on the training dataset of firewall log data .We compute the posterior probability of the node by forwarding testing sample towards training samples of firewall log. I.INTRODUCTION Firewall protection is local security policy in local networks. It will secure from unknown access of another nodes in the local network. It secures from anonymous user communication in local networks. It is very secure in small size networks and more effective in managing networks. In hosting intra networks firewall technology the development of securing our networks and maintaining the firewall policies limit the efficiency of firewall. In firewall development the similar data packet mat similar more than one filtering rule. Individual firewalls in inter firewall protection in the same path perform various actions on the similar traffic. The system administration must have particular attention to maintain the similar firewall perform different filtering actions[1][2]. The efficiency of firewall security dependent on providing the management methods and the network administrators used to analyze and verify the correctness of filtering rules. In this firewall protection it uses IP Address and default gateway to block computers to communicate with another nodes. The data packet is blocked by particular rule when the data is matched with other network fields of the filtering rule[3]. Network layer firewalls is also referred as packet filters and it operates at low level of the TCP/IP protocols. It not allows thedata packets to send through the firewall until they are similar to already declared group of rules. The firewall manager defines the rules and regulations or default rules may apply. The token packet filter is come from the context of BSD operating systems. ISSN: 2231-5381 In network layer the firewalls basically classified into two categories such as stateful and stateless. Coming to Stateful firewalls it maintainsa context about the active sessions and it uses the state information to increase the efficiency of packet processing. In any previously existing network connection can be explained by many properties consisting of source and destination IP address and the present situation of communication’s life span. If any packet does not similar toprevious connection and it will be analyzingbased on group of rules for another novel connections. If any packet matching to previous connection based on comparative analysiswith the firewall's state table and will allowsending without further processing[4][5]. The Stateless firewalls need less storage and it is very faster for filters that need thelow time to filter than maintain a session. They also required for filteringthesession based network protocols and that have no topic of a session. Theydon’t make much complex predictions based on the stage communications between hosts are reached. II. RELATED WORK A firewall is implemented in network and is operated by trusted or untrusted locations. It permits traffic from the trusted location to untrusted locations and it does not need any external configuration. The main purpose of firewall is maintaining the untrusted or non-requested users from accessing our computers[6]. A firewall may be a software or hardware and it is normally placed at the network with the range and safe guard the incoming and outgoing connections. Its main mechanism is limits the traffic. A device or any applications are using one or many connections it protects very deeply. The data packets are limited based on the some features such as source address, destination address, protocol (TCP/IP etc.), source port and the destination port. It prepares some set of rules and regulations to filter the data. It consists of the rules which deny or accept the http://www.ijettjournal.org Page 1 International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014 network connections based on the details of the nodes which is going to connect[7][8]. There is a solution for filtering the data packets based on the cryptographic techniques which is explained below. This method consists of a secret key, encryption, and decryption. 1. Key generation method: It is a dynamically changing algorithm that gives keySk 2. Encryption: It is a method that inputs key Sk and formal text message M results cipher text 3. Decryption: It is a method thatinputs a key Sk and converts and results a plaintext m. 4. This method should the below property: For all m ∈ M and Sk∈ K, Pr [Decrypt (Encrypt (m)) = m] = 1. Encryption is a method that the informationwhich could either be a file or mail message into ciphertextin aunformatted without a decoding key in order to avoid anyone except the intended recipient from reading thatdata. Decryption is the reverse process of converting encodeddata to its original un-encoded form, plaintext. A key incryptography is a long sequence of bits used by encryption /decryption algorithms. Secret Key Cryptography (SKC): Usesa single key for both encryption and decryption. A firewallconfiguration is specified as a sequence of rules. Each rule ina firewall configuration is of the form <Predicate>-><decision> The <predicate of a rule is a Boolean expression oversome packet fields together with the [physical networkinterface on which a packet arrives. The <decision> of a rulecan be accept, or discard, or a combination of these decisionswith other options such as a logging option. A packet matchesa rule if a firewall configuration overlaps if there is at least onepacket that can match both rules[9][10]. III. PROPOSED SYSTEM We are proposing an efficient firewall data classification over log data or training dataset which ISSN: 2231-5381 consists of source ip address or name, Destination ip address and port number, type of protocol and number of packets transmitted from source to destination. When a node connects if retrieves the Meta data i.e testing dataset and forwards to the training dataset .both training and testing datasets CAN be forwarded to Bayesian classifier for analyzing the behavior of the connected node. We proposed a novel and efficient trust computation mechanism with naive Bayesian classifier by analyzing the new agent information with existing agent information, by classifying the feature sets or characteristics of the agent. This approach shows optimal results than the traditional trust computation approaches. In our approach we proposes an efficient classification based approach for analyzing the anonymous users over network traffic and calculates the trust measures based on the training data with the anonymous testing data. Our architecture contributes with the following modules like Analysis agent, Neighborhood node, Classifier and data collection and preprocess as follows 1) Analysis agent –Analysis agent or Home Agent is present in the system and it monitors its own system continuously. If an attacker sends any packet to gather information or broadcast through this system, it calls the classifier construction to find out the attacks. If an attack has been made, it will filter the respective system from the global networks. 2) Neighbouring node - Any system in the network transfer any information to some other system, it broadcast through intermediate system. Before it transfer the message, it send mobile agent to the neighbouring node and gather all the information and it return back to the system and it calls classifier rule to find out the attacks. If there is no suspicious activity, then it will forward the message to neighbouring node. 3) Data collection - Data collection module is included for each anomaly detection subsystem to collect the values of features for corresponding layer in an system. Normal profile is created using the data collected during the normal scenario. Attack data is collected during the attack scenario. 4) Data pre-process - The audit data is collected in a file and it is smoothed so that it can be used for anomaly detection. Data pre-process is a technique to process the information with the test train data. In the entire layer anomaly detection systems, the above mentioned preprocessing technique is used http://www.ijettjournal.org Page 2 International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014 2. Get (source IP, dest IP, port no, protocol, no. of packets Node 1 with firewall Log Node 2 with firewall Log 1. Connect 3. Training & Testing Sample 5. Status (o / 1) Classification 4. Computes posterior probability Node 3 with firewall Log Node 4 with firewall Log Fig1: Proposed Architecture For the classification process we are using Bayesian classifier for analyzing the neighbor node testing data with the training information.Bayesian classifier is defined by a set C of classes and a set A of attributes. A generic class belonging to C is denoted by cjand a generic attribute belonging to A as Ai. Consider a database D with a set of attribute values and the class label of the case. The training of the Naïve Bayesian Classifier consists of the estimation of the conditional probability distribution of each attribute, given the class. P (H/Xi) is our confidence that Xi is an incoming node In our example we will consider a synthetic dataset which consists of various anonymous and non anonymous users node names, type of protocols and number of packets transmitted and class labels, that is considered as our feature set C (c1,cc,……cn) for training of system and calculates overall probability for positive class and negative class and then calculate the posterior probability with respect to all features ,finally calculate the trust probability. P(H), P(Xi) and P(Xi/H) may be estimated from given training and testing data samples Algorithm to classify malicious agent: P(H) is Prior Probability of H and it is probability that any given training sample is an agent regardless of its anomaly or not anomaly behavior P(H/X) is a conditional probability and P(H) is independent of X Estimating probabilities: P(H|Xi)=P(Xi|H)*P(H)/P(Xi) Steps Involved: 1. Each training data sample is of attribute type X= (xj) j =1(1….n), where xj is the values of X for attribute Aj Sample space: set of agent 2. Suppose there are m decision classes Cj, j=1(1…m). H= Hypothesis that X is a node P(Ci|X) > P(Cj|X) for 1<= j <= m, j>i ISSN: 2231-5381 http://www.ijettjournal.org Page 3 International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014 i.e. classifier assigns X to decision class Cj having highest posterior probability conditioned on testing sample X The decision class for which P(Cj|X) is maximum is known as maximum posterior hypothesis of the sample. From Bayes Theorem 3. P(Xi) is constant and Only need be maximized. if class initial probabilities not known prior then we can assume all decision classes to be more equally likely decision classes Otherwise maximize the samples P(Ci) = Si/S 4. Naïve assumption for attribute independence P(X|Cj) = P(x1,…..,xm|C) = PP(xk|C) 5. To classify an unknown testing sample Xi, compute each decision class Ciand Sample X is assigned to the class iff ( Prob(Xi|Ci)P(Ci)> P(Xi|Cj) P(Cj) ). In the above classification algorithm , computes the posterior probabilities of the input samples with respect to the data records in the training dataset over all positive and negative probabilities, analyzes the network traffic with positive and negative probabilities IV. CONCLUSION We are concluding our research work with efficient classification approach by analyzing the anonymous behaviors of the log data packet analysis with their respective posterior probabilities of the individual attribute And final class labels to compute final probabilities of the connected node. Experimentally proved that Preprocessed firewall log analysis gives optimal performance than traditional approaches. REFERENCES 1) Internet assigned numbers authority (IANA), http://www.iana.org/assignments/port-number (last accessed October, 2009) 2) A. Madhukar, C. Williamson, A longitudinal study of p2p traffic classification, in: MASCOTS ’06: Proceedings of the14th IEEE International Symposium on Modeling, ISSN: 2231-5381 Analysis,and Simulation, IEEE Computer Society, Washington, DC,USA, 2006, pp. 179– 188.doi:http://dx.doi.org/10.1109/MASCOTS.2006.6. 3) J. Klensin, SIMPLE MAIL TRANSFER PROTOCOL, IETF RFC 821, April 2001; http://www.ietf.org/rfc/rfc2821.txt [4] Bro intrusion detection system - Bro overview, http://broids. org, as of August 14, 2007. [5] V. Paxson, “Bro: A system for detecting network intruders in real-time,” Computer Networks, no. 31(23-24), pp. 2435–2463, 1999. [6] Azzouna, Nadia Ben and Guillemin, Fabrice, Analysis of ADSL Trafficon an IP Backbone Link, IEEE IEEEGlobalTelecommunicationsConference 2003, San Francisco, USA,December 2003. [7] Cho, Kenjiro, Fukuda, Kenshue, Esaki, Hiroshi and Kato, Akira, The Impact and Implications of the Growth inResidential User-to-User Traffic, ACM SIGCOMM 2006, Pisa, Italy, September 2006. [8] Balachandran, Anand; Voelker, Geoffrey M.; Bahl, Paramvir and Ragan,P. Venkat, Characterizing user behavior andnetwork performance in a public wireless LAN, Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 195-205, 2002. [9] Internet assigned numbers authority (IANA), http://www.iana.org/assignments/port-number (last accessedOctober, 2009) BIOGRAPHIES Mr A Ramakrishna, a well-known by his excellence in teaching who received Mtech(CSE) from JNTU Kakinada. He is working assistant professor, Dept of CSE at Vignan Institute of Engineer for Women. He has motivated no.of students in his seven years of teaching. His area of interest includes Network Security, Wireless Sensor Networks and expert in Web Technologies and guided many other projects. SanthaKumari, received B-tech (IT)degree from Vignan Institute of Engineering for Women under JNTUK in the year 2012 and pursuing Mtech(CSE) in Vignan Institute of Engineering for Women .My interested area includes Data Mining, Image Processing, Network Security. http://www.ijettjournal.org Page 4