International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 5 - May 2013
ISSN: 2231-5381

Spam And The Techniques Used For Spam Filters: A Review

Prachi Oswal (1) and Prof. Anurag Jain (2)
(1) Department of Computer Science & Engineering, Radharaman Institute of Technology & Science, Bhopal, India
(2) Radharaman Institute of Technology & Science, Bhopal, India

ABSTRACT

Today's cut-throat competition in business drives organizations and companies to improvise and invent new ways to promote their business and remain in the fray. Spam is one such messaging technique: it pushes promotional information to the public for the sender's commercial benefit, without regard for its pros and cons. These unsolicited e-mails have become a major problem on today's Internet, causing financial damage to companies and annoying users. In this paper we survey spam and the approaches that have been proposed to deal with these unwanted mails.

KEYWORDS

Spam, Spam Filter, Unsolicited Commercial e-mail

1. INTRODUCTION

Electronic mail is among the most reliable and usually the fastest modes of communication for sharing information. E-mail also has low transmission costs, and electronic messaging is easy to automate, commercially or otherwise, according to the requirements of the user. These properties make it attractive for commercial advertising, and in recent years organizations have increasingly seen electronic messaging abused to flood users' mailboxes with unsolicited messages. This abuse is called spamming, the act of sending messages in bulk, and the word Spam has become a synonym for such messages. The word is originally derived from "spiced ham", the luncheon meat sold as SPAM, which is a registered trademark of Hormel Foods Corporation [1].
Monty Python's Flying Circus used the term spam in the so-called spam sketch as a synonym for frequent occurrence, and the term was later adopted for unsolicited mass mail. By analogy with the origin of the word Spam, all other e-mail is called ham. Spam is conventionally referred to as unsolicited bulk e-mail (UBE) or unsolicited commercial e-mail (UCE).

2. LITERATURE SURVEY AND BACKGROUND

Electronic mail is an integral medium of information sharing on the web. Commercial organizations use it extensively to promote their products and services and to win new customers. They do so because the service is easy to use and nearly free of cost when messages are sent in bulk. Consequently different portal

2.1 Spam Mails

Researchers have defined spam mail in various ways. According to Vapnik et al. (1999) [2], spam mails are unwanted bulk mails, essentially the electronic version of the junk mail delivered by the postal service. Similarly, Oda and White (2003) [3] define spam as the electronic equivalent of junk mail, which typically covers a range of unsolicited and undesired advertisements and bulk e-mail messages. According to Lazzari et al. (2005) [4], spam consists of electronic messages posted blindly to thousands of recipients and represents one of the most serious and urgent information overload problems. Zhao and Zhang (2005) [5] explain that spam, or junk mail, is an unauthorized intrusion into a virtual space, the e-mail box. Youn and McLeod (2007) [6] describe spam as bulk e-mail, that is, e-mail that was not asked for and that is sent to multiple recipients.
Wu and Deng (2008) [7] define spam e-mails, also known as 'junk e-mails', as unsolicited messages sent in bulk (unsolicited bulk e-mail) with the sender's identity, address, and header information hidden or forged. In the same vein, Amayri and Bouguila (2009) [8] assert that spam e-mails can be recognized either by content or by delivery manner, that is, according to the volume of dissemination and whether delivery was permitted. Another definition, proposed by Spamhaus (2010), is that an electronic message is "spam" if (A) the recipient's personal identity and context are irrelevant because the message is equally applicable to many other potential recipients; AND (B) the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent.

A spam filter is a classifier that sorts the e-mail messages sent to a user, as accurately as possible, into spam or ham (non-spam). Here we are primarily concerned with the online personal spam filtering process shown in Figure 1 [9]. As the figure shows, when e-mail arrives the spam filter classifies each message either as ham, which is put in the inbox, or as spam, which is quarantined (that is, kept in the junk folder). The user is assumed to read the inbox regularly, while the junk folder is checked less often because it is not expected to contain legitimate e-mail. The user can note the filter's misclassification errors (spam in the inbox and ham in the junk folder) and report them to the learning-based filter. The filter then uses this feedback to update its internal model, improving its future predictive performance. It is, however, cumbersome to expect the user always to report the errors.
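The feedback loop described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of any cited work: the source does not prescribe a learning algorithm, so a simple word-count naive Bayes learner is assumed here, and the names OnlineSpamFilter, learn, classify, and report_error are hypothetical.

```python
import math
import re
from collections import Counter


class OnlineSpamFilter:
    """Toy online filter (assumed learner: naive Bayes over word counts)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.msg_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lower-case the text and split it into simple word tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, label):
        """Update the internal model with one labelled message."""
        self.word_counts[label].update(self.tokenize(text))
        self.msg_counts[label] += 1

    def spam_log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text), up to smoothing."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.msg_counts.values())
        # Add-one smoothed class priors.
        score = math.log((self.msg_counts["spam"] + 1) / (total + 2))
        score -= math.log((self.msg_counts["ham"] + 1) / (total + 2))
        n_spam = sum(self.word_counts["spam"].values())
        n_ham = sum(self.word_counts["ham"].values())
        for tok in self.tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            p_spam = (self.word_counts["spam"][tok] + 1) / (n_spam + len(vocab))
            p_ham = (self.word_counts["ham"][tok] + 1) / (n_ham + len(vocab))
            score += math.log(p_spam / p_ham)
        return score

    def classify(self, text):
        """Route a message: 'ham' goes to the inbox, 'spam' to the junk folder."""
        return "spam" if self.spam_log_odds(text) > 0 else "ham"

    def report_error(self, text, true_label):
        """User feedback on a misclassified message updates the model."""
        self.learn(text, true_label)


if __name__ == "__main__":
    spam_filter = OnlineSpamFilter()
    spam_filter.learn("win free money now", "spam")
    spam_filter.learn("meeting agenda for monday", "ham")
    print(spam_filter.classify("cheap pills"))    # unseen words: stays in inbox
    spam_filter.report_error("cheap pills", "spam")  # user reports the miss
    print(spam_filter.classify("cheap pills"))    # now quarantined
```

In this sketch the only way the model improves after deployment is through report_error, which mirrors the dependence on user reports noted above: if the user stops reporting errors, the model stops adapting.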
[Fig 1: Spam Filter Process. Diagram labels: sender computer or server, Internet, client mail server.]

2.2 Spam Filtering

Spam has been described as "unsolicited, unwanted email that was sent indiscriminately, directly or indirectly, by a sender having no current relationship with the recipient" [6]. A huge amount of spam is generated every day, wasting significant Internet resources as well as users' time. It has been projected that e-mail traffic would reach 419 billion e-mails per day, of which 83 percent would be spam, which translates into roughly 347 billion spam e-mails each day (rad, 2012). Spam attacks both the computer and its users. Spam e-mail can contain viruses, keyloggers, phishing attacks and more. These types of malware can compromise a user's sensitive private data by capturing bank account information, usernames and passwords.

The reviewed references make clear that the most important characteristic of any spam filter is that it blocks junk mail efficiently and reliably. The research community evaluates the performance of spam filters against several criteria. One criterion is protection against unwanted spam when multiple user accounts are created. In addition, the filter should be able to protect against mails carrying attacks such as worms and viruses, as well as phishing attacks. Once mails are classified, they should be blocked efficiently and effectively according to their category, for example community-based. Beyond blocking and protection, a further criterion is that rules or protocols let users change the settings of the spam filter according to their requirements. Finally, the spam filter should be compatible with the e-mail client and service provider.
Several anti-spam methods and spam filters have been proposed. Russel W. et al. [10] work with system log files, which are critical for troubleshooting complex modern computer systems, using data mining techniques such as filtering and clustering. Their research focused on using readily accessible Bayesian spam filters to categorize log entries, which they did effectively.

Mithilesh et al. [11] analyze malicious activities such as UCE (unsolicited commercial e-mail, or spam), which they describe as an imminent menace to today's Internet. They comparatively analyze different spam filtering techniques and provide the gist of each for researchers.

Ola Amayri and Nizar Bouguila [12] propose content-based spam filtering using hybrid generative discriminative learning of both textual and visual features. Their framework builds probabilistic support vector machine (SVM) kernels from mixtures of Langevin distributions. Through empirical experiments they demonstrate the effectiveness and merits of the proposed learning framework. At the same time, their approach failed to filter personal mails efficiently.

In [13], Chang et al. propose a model that separates the original feature space into several disjoint feature groups. Individual models on these groups of features are learned using logistic regression, and their predictions are combined using the naive Bayes principle to produce a robust final estimate. They argue, both empirically and theoretically, that their model performs better.

Chang et al. also address personalized filtering of gray mail in [14]. Their paper studies this class of mail using a large real-world e-mail corpus and signature-based campaign detection techniques.
The analysis shows that an optimal filter will inevitably perform unsatisfactorily on gray mail unless user preferences are taken into consideration. To address this, they designed a lightweight user model that is highly scalable and can easily be combined with a global spam filter. It incorporates both partial and complete user feedback on message labels and catches up to 40 percent more spam from gray mail in the low false-positive region.

Cormack and Kolcz [15] examine spam filter evaluation with imprecise ground truth. They explain that, when trained and evaluated on accurately labeled datasets, online e-mail spam filters make fewer errors than classifiers in similar kinds of applications.

3. CONCLUSION

In this paper we have briefly discussed the problem of spam and given an overview of spam characteristics and spam filter features. There is no common definition of spam, but several sources agree that the core feature of spam messages is that they are unsolicited, that is, unwanted junk or bulk mails. Spam causes many problems of both an economic and an ethical nature. A vital point to keep in mind when designing a spam filter is the reaction of spammers: every useful anti-spam technique meets active, intelligent opposition. One approach worth considering is a learning-based filtering solution. A spam filter should be designed with the major criteria discussed above in mind: security, reliability, blocking, rules, and compatibility. This research is never-ending as far as the Internet is concerned: spammers will always look for new techniques to send mail in bulk for commercial profit, and damage will result. To stop this, spam filters need to be improved continually to match the requirements and the nature of spam.
REFERENCES

[1] Hormel Foods Corporation. http://www.hormel.com
[2] Vapnik VN, Drucker H, Wu D (1999). Support Vector Machines for Spam Categorization. IEEE Transactions on Neural Networks, pp. 1048-1054.
[3] Oda T, White T (2003). Increasing the Accuracy of a Spam-Detecting Artificial Immune System. IEEE, pp. 390-396.
[4] Lazzari L, Mari M, Poggi A (2005). A collaborative and multi agent approach to email filtering. IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT'05), pp. 238-241.
[5] Zhao W, Zhang Z (2005). An E-mail Classification Model Based on Rough Set Theory. IEEE, pp. 403-408.
[6] Youn S, McLeod D (2007). Efficient Spam E-mail Filtering using Adaptive Ontology. IEEE International Conference on Information Technology (ITNG'07), pp. 249-254.
[7] Wu J, Deng T (2008). Research in Anti-Spam Method Based on Bayesian Filtering. IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, pp. 887-891.
[8] Amayri O, Bouguila N (2009). Online Spam Filtering Using Support Vector Machines. IEEE, pp. 337-340.
[9] Goodman J, Cormack GV, Heckerman D (2007). Spam and the ongoing battle for the inbox. Commun. ACM, 50(2): 24-33.
[10] W Russel Havens, Barry Lunt, ChiaChi (2012). Naive Bayesian filters for log file analysis. IEEE.
[11] Mithilesh K. P., Shanthi B P., and Aghila G. (2012). Spam Filtering: Comparative Analysis of filtering techniques. IEEE International Conference on Advances in Engineering, Science and Management (ICAESM-2012), March 30-31, 2012.
[12] Ola A. and Nizar B. (2012). Content-based spam filtering using hybrid generative discriminative learning of both textual and visual features. IEEE.
[13] Ming-Wei C., Wen-tau Y., and Christopher M. (2008). Partitioned Logistic Regression for Spam Filtering. KDD'08, August 24-27, 2008, Las Vegas, Nevada, USA. ACM.
[14] Ming-Wei C. and Wen-tau Y. (2008). Personalized Spam Filtering for Gray Mail.
[15] Gordon V. Cormack and Aleksander Kolcz (2009). Spam Filter Evaluation with Imprecise Ground Truth. SIGIR'09, July 19-23, 2009, Boston, Massachusetts, USA. ACM.

http://www.ijettjournal.org