International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 5 - May 2013
Spam And The Techniques Used For Spam Filters: A Review
Prachi Oswal (1) and Prof. Anurag Jain (2)
(1) Department of Computer Science & Engineering, Radharaman Institute of Technology & Science, Bhopal, India
(2) Radharaman Institute of Technology & Science, Bhopal, India
ABSTRACT
Today's cut-throat competition in business drives organizations and companies to improvise and invent different ideas to promote their business and remain in the fray. Spam is one such messaging and mailing technique: it promotes events and pushes information to the public for commercial benefit, without regard to its pros and cons. These unsolicited e-mails have nowadays become a major problem on the Internet, causing financial damage to companies and annoying users as well. In this paper we survey spam and try to convey the approaches that have been proposed to deal with these unwanted mails.
KEYWORDS
Spam, Spam Filter, Unsolicited Commercial e-mail
1. INTRODUCTION
Electronic mail is among the most reliable and usually the fastest modes of communication as far as information sharing is concerned. E-mail also has low transmission costs, and electronic messaging is easy to automate, commercially or otherwise, as the user requires. Because of these properties, e-mail is wide open to commercial advertising, and in recent years organizations have seen a swift growth in the abuse of electronic messaging, with users' mailboxes flooded by unsolicited messages.
One of the anomalies caused by electronic messaging is spamming, the act of sending messages in bulk, and the word spam has become a synonym for such messages.
ISSN: 2231-5381
The word is originally derived from "spiced ham" (a luncheon meat); SPAM is a registered trademark of Hormel Foods Corporation [1]. Monty Python's Flying Circus used the term in the so-called Spam sketch, where it occurs over and over, and the term was later adopted for unsolicited mass mail. By analogy with the word spam, all other (legitimate) e-mail is called ham. Conventionally, spam is referred to as unsolicited bulk e-mail (UBE) or unsolicited commercial e-mail (UCE).
2. LITERATURE SURVEY AND BACKGROUND
Electronic mail is an integrated medium of information sharing on the web. It is used extensively by commercial organizations to promote their products or services and to win new customers in the market, because the service is easy to use and sending messages in bulk costs almost nothing. Consequently different portal
2.1 Spam Mails
Researchers have defined spam mail in several ways. According to Vapnik et al. (1999) [2], spam mails are unwanted bulk mails; more specifically, spam is the electronic version of the junk mail delivered by the postal service. Similarly, Oda and White (2003) [3] define spam as the electronic equivalent of junk mail, which typically covers a range of unsolicited and undesired advertisements and bulk e-mail messages. According to Lazzari et al. (2005) [4], spam consists of electronic messages posted blindly to thousands of recipients, and it represents one of the most serious and urgent information-overload problems.
http://www.ijettjournal.org
Zhao and Zhang (2005) [5] explained that spam, or junk mail, is an unauthorized intrusion into a virtual space: the e-mail box. Further, Youn and McLeod (2007) [6] described spam as bulk e-mail, that is, e-mail that was not asked for and is sent to multiple recipients. Wu and Deng (2008) [7] defined spam e-mails, also known as "junk e-mails", as unsolicited messages sent in bulk (unsolicited bulk e-mail) with a hidden or forged sender identity, address, and header information. In the same fashion, Amayri and Bouguila (2009) [8] asserted that spam e-mails can be recognized either by their content or by their delivery manner, and indicated that spam e-mails were recognized according to the volume of dissemination and permissible delivery.
Another definition, proposed by Spamhaus (2010), states that an electronic message is "spam" if (A) the recipient's personal identity and context are irrelevant because the message is equally applicable to many other potential recipients; AND (B) the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent.
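The two Spamhaus conditions are joined by AND, so a message is spam only when both hold. The minimal Python sketch below makes that conjunction explicit. The `MessageContext` type and its fields are hypothetical inputs; deciding their values in practice (from mailing-list records or consent logs, for instance) is the hard part, and the definition itself does not say how to do it.

```python
from dataclasses import dataclass


@dataclass
class MessageContext:
    # Hypothetical inputs; the Spamhaus definition does not say how to obtain them.
    equally_applicable_to_many: bool  # (A) recipient's identity and context are irrelevant
    has_verifiable_permission: bool   # recipient verifiably granted deliberate, explicit,
                                      # still-revocable permission for the message


def is_spam_spamhaus(ctx: MessageContext) -> bool:
    """Spam under the Spamhaus definition: condition (A) AND condition (B)."""
    condition_a = ctx.equally_applicable_to_many
    condition_b = not ctx.has_verifiable_permission
    return condition_a and condition_b


# A bulk newsletter the recipient opted into is not spam; the same mail without consent is.
print(is_spam_spamhaus(MessageContext(True, True)))   # False
print(is_spam_spamhaus(MessageContext(True, False)))  # True
```

Under this reading, a personal message sent without prior permission (condition A false) is not spam either, which is what separates this definition from a purely consent-based one.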
A spam filter is a classifier that sorts the e-mail messages sent to a user, as accurately as possible, into spam or ham (non-spam). In this review we are primarily concerned with the online personal spam filtering process shown in Fig. 1 [9]. As the figure shows, when e-mail arrives the spam filter classifies each message either as ham, which is put in the inbox, or as spam, which is quarantined (that is, kept in the junk folder). The user is assumed to read the inbox regularly, while the junk folder is checked only infrequently because it is assumed not to contain legitimate e-mail. The user may notice the filter's misclassification errors (spam in the inbox and ham in the junk folder) and report them to a learning-based filter, which uses this feedback to update its internal model and thereby improve its future predictive performance. In practice, however, it is cumbersome to expect the user to report every error.
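The feedback loop described above can be sketched as a learning filter that is updated only when the user reports a misfiled message (train-on-error). The minimal Python sketch below assumes a multinomial naive Bayes model with Laplace smoothing and whitespace tokenization; the class and function names are hypothetical, and this is not the specific filter discussed in [9].

```python
import math
from collections import Counter


class OnlineNaiveBayesFilter:
    """Multinomial naive Bayes spam filter that can be updated one message at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant for word likelihoods
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lower-cased whitespace tokens; real filters use richer features.
        return text.lower().split()

    def learn(self, text, label):
        """Add one labelled message ('spam' or 'ham') to the model's counts."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        # log P(label) + sum of log P(token | label), both smoothed.
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            count = self.word_counts[label][token]  # Counter returns 0 for unseen tokens
            score += math.log((count + self.alpha) / (total_words + self.alpha * vocab_size))
        return score

    def classify(self, text):
        """Return 'spam' (quarantine in the junk folder) or 'ham' (deliver to the inbox)."""
        tokens = self.tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"]) | set(tokens)
        scores = {label: self._log_score(tokens, label, len(vocab)) for label in ("spam", "ham")}
        return max(scores, key=scores.get)


def deliver(spam_filter, text, reported_label=None):
    """Route a message; if the user reports a misclassification, learn from it."""
    predicted = spam_filter.classify(text)
    # Train-on-error: only messages the user flags as misfiled update the model.
    if reported_label is not None and reported_label != predicted:
        spam_filter.learn(text, reported_label)
    return "junk" if predicted == "spam" else "inbox"


nb = OnlineNaiveBayesFilter()
nb.learn("win free money now", "spam")
nb.learn("meeting agenda for monday", "ham")
print(deliver(nb, "meeting pills discount", reported_label="spam"))  # inbox (then learned)
print(nb.classify("pills discount"))                                  # spam
```

Because the model is just a set of counts, each reported error is absorbed by an incremental count update rather than by retraining on the whole mailbox, which is what makes the online setting practical.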
Fig 1: Spam Filter Process [diagram labels: Sender Computer or Server; Internet; Client Mail Server]

2.2 Spam Filtering
As noted earlier, spam is "unsolicited, unwanted email that was sent indiscriminately, directly or indirectly, by a sender having no current relationship with the recipient" [6]. A huge amount of spam is generated every day, wasting significant Internet resources as well as users' time. It has been projected that e-mail traffic would reach 419 billion e-mails per day, of which 83 percent would be spam, which translates into 347 billion spam e-mails each day (rad, 2012). Spam attacks both the computer and its users: spam e-mail can carry viruses, keyloggers, phishing attacks and more. These types of malware can compromise a user's sensitive private data by capturing bank account information, usernames and passwords.
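The projected figures quoted above are mutually consistent: applying the 83 percent spam share to the projected daily volume gives

```latex
0.83 \times 419 \times 10^{9} \approx 347.8 \times 10^{9} \ \text{spam e-mails per day}
```

which agrees with the roughly 347 billion spam e-mails per day cited.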
Several of the reviewed references make it clear that the most important characteristic of any spam filter is that it efficiently and reliably prevents and blocks junk mail.
The research community evaluates the performance of spam filters against several criteria. Protection against unwanted spam is one criterion when multiple user accounts are created. In addition, the filter should protect against mails carrying classified attacks, such as worms and viruses, as well as phishing attacks. Once mails are classified, they should be blocked efficiently and effectively according to their category (for example, community based). Besides blocking and protection, rules or settings that let users adapt the spam filter to their requirements are another parameter to consider. Taking all these parameters together, the spam filter should also be compatible with the e-mail client and the service provider.
Several anti-spam methods and spam filters have been proposed. Among the effective approaches, Russel W. et al. [10] worked on system log files, which are critical for troubleshooting complex modern computer systems, using the data mining techniques of filtering and clustering. Their research focused on using readily accessible Bayesian spam filters to categorize log entries, and they used them effectively. Another approach, by Mithilesh et al. [11], analyzes malicious activities such as UCE (unsolicited commercial e-mail, or spam), which has become an imminent menace to today's Internet; they comparatively analyzed different spam filtering techniques and provided the gist to researchers. Moreover, Ola Amayri and Nizar Bouguila [12] proposed content-based spam filtering using hybrid generative discriminative learning of both textual and visual features. In their paper they proposed a framework based on building probabilistic support vector machine (SVM) kernels from mixtures of Langevin distributions. Through empirical experiments they demonstrated the effectiveness and merits of the proposed learning framework. At the same time, however, their approach did not filter personal mails efficiently.
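To make the idea of a probabilistic kernel concrete, the Python sketch below fits a smoothed term distribution to each message and compares messages with the Bhattacharyya kernel, the sum over words of sqrt(p_i * q_i). This is a deliberately simpler, swapped-in stand-in: [12] builds its kernels from mixture models fitted to textual and visual features, which is not reproduced here. The function names, the whitespace tokenizer and the Laplace smoothing are assumptions.

```python
import math
from collections import Counter


def term_distribution(text, vocab, alpha=1.0):
    """Laplace-smoothed estimate of P(word | message) over a fixed vocabulary."""
    counts = Counter(w for w in text.lower().split() if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return [(counts[w] + alpha) / total for w in vocab]


def bhattacharyya_kernel(p, q):
    """Probability product kernel with exponent 1/2: sum_i sqrt(p_i * q_i).
    It equals 1.0 when p == q and is positive semi-definite, so an SVM can use it."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def gram_matrix(messages, vocab):
    """Pairwise kernel values between all messages."""
    dists = [term_distribution(m, vocab) for m in messages]
    return [[bhattacharyya_kernel(p, q) for q in dists] for p in dists]


messages = ["win free money now", "free money win prize", "meeting agenda for monday"]
vocab = sorted(set(" ".join(messages).lower().split()))
K = gram_matrix(messages, vocab)
print(round(K[0][1], 3), round(K[0][2], 3))  # 0.987 0.947
```

The two spam-like messages score closer to each other than to the meeting note. A Gram matrix of this kind could be handed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.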
In [13], Chang et al. proposed a model that separates the original feature space into several disjoint feature groups. Individual models are learned on these feature groups using logistic regression, and their predictions are combined using the naive Bayes principle to produce a robust final estimate. They sought to show that their model is better both empirically and theoretically. Chang et al. also studied personalized filtering of gray mail in [14]. In that paper they studied this class of mail using a large real-world e-mail corpus and signature-based campaign detection techniques. Their analysis shows that an optimal filter will inevitably perform unsatisfactorily on gray mail unless user preferences are taken into consideration. To address this, they designed a lightweight user model that is highly scalable and can easily be combined with a global spam filter. The model incorporates both partial and complete user feedback on message labels, and it catches up to 40 percent more spam from gray mail in the low false-positive region.
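The combination step attributed to [13] can be sketched as follows. If the G feature groups are assumed conditionally independent given the class (the naive Bayes assumption), then P(spam | x) is proportional to the product of the per-group posteriors P(spam | x_g) divided by P(spam)^(G-1); in log-odds form, the group logits are added and (G-1) copies of the prior logit are subtracted. The minimal Python sketch below trains one plain logistic regression per group by batch gradient descent. The class name, learning rate, epoch count and toy features are assumptions, not details taken from [13].

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p):
    return math.log(p / (1.0 - p))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


class PartitionedLogisticRegression:
    """One logistic model per disjoint feature group, combined under naive Bayes."""

    def __init__(self, groups):
        self.groups = groups  # list of disjoint lists of feature indices
        self.models = []
        self.prior = 0.5

    def fit(self, X, y):
        self.prior = sum(y) / len(y)  # P(spam); both classes must be present
        self.models = [train_logistic([[row[j] for j in g] for row in X], y)
                       for g in self.groups]
        return self

    def predict_proba(self, x):
        """P(spam | x) = sigmoid(sum_g logit P(spam | x_g) - (G - 1) * logit P(spam))."""
        log_odds = 0.0
        for g, (w, b) in zip(self.groups, self.models):
            log_odds += sum(wj * x[j] for wj, j in zip(w, g)) + b  # group-g logit
        log_odds -= (len(self.groups) - 1) * logit(self.prior)
        return sigmoid(log_odds)


# Hypothetical binary features: [0] contains "free", [1] contains "winner", [2] unknown sender.
X = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0],
     [0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
plr = PartitionedLogisticRegression(groups=[[0, 1], [2]]).fit(X, y)
print(plr.predict_proba([1, 1, 1]) > 0.5, plr.predict_proba([0, 0, 0]) < 0.5)  # True True
```

Subtracting the prior logit matters because each group model has already absorbed the class prior once; without the correction, a model with G groups would count the prior G times.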
Further, Gordon V. Cormack and Aleksander Kolcz [15] studied spam filter evaluation with imprecise ground truth. In their paper they explain that, when trained and evaluated on accurately labeled datasets, online e-mail spam filters achieve lower error rates than classifiers in similar kinds of applications.
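The effect of imprecise ground truth can be illustrated with two error rates commonly reported for spam filters: the percentage of ham misclassified as spam and the percentage of spam misclassified as ham (written hm% and sm% in TREC-style evaluations). In the hypothetical Python example below, the filter's output never changes, yet a single mislabeled gold message alters the measured spam error rate. The data are invented for illustration and do not reproduce the experiments in [15].

```python
def misclassification_rates(predicted, gold):
    """Return (ham misclassification %, spam misclassification %) w.r.t. gold labels."""
    ham_total = sum(1 for g in gold if g == "ham")
    spam_total = sum(1 for g in gold if g == "spam")
    if ham_total == 0 or spam_total == 0:
        raise ValueError("gold labels must contain both ham and spam")
    ham_errors = sum(1 for p, g in zip(predicted, gold) if g == "ham" and p == "spam")
    spam_errors = sum(1 for p, g in zip(predicted, gold) if g == "spam" and p == "ham")
    return 100.0 * ham_errors / ham_total, 100.0 * spam_errors / spam_total


true_labels = ["ham"] * 6 + ["spam"] * 4
predicted = ["ham"] * 6 + ["spam"] * 3 + ["ham"]  # the filter misses one spam
noisy_gold = ["spam"] + true_labels[1:]           # one ham mislabeled as spam by the adjudicator

print(misclassification_rates(predicted, true_labels))  # (0.0, 25.0)
print(misclassification_rates(predicted, noisy_gold))   # (0.0, 40.0)
```

The filter's correct handling of the mislabeled message is scored as an error, so label noise in the gold standard inflates the measured error of an already accurate filter.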
3. CONCLUSION
In this paper we have briefly discussed the problem of spam and tried to give an overview of spam characteristics and spam filter features. There is no common definition of spam, but several sources agree that the core feature of spam messages is that they are unsolicited, that is, unwanted junk or bulk mail. Spam causes many problems of both an economic and an ethical nature. A vital consideration in designing a spam filter is the reaction of spammers, who mount active, intelligent opposition to every useful anti-spam technique. One approach worth considering is learning-based filtering. A spam filter should be designed with the major criteria discussed here in mind: security, reliability, blocking, rules, and compatibility.
Spam filtering is therefore a never-ending line of research as far as the Internet is concerned: spammers keep finding new techniques to send bulk mail for commercial profit, and damage follows. To counter this, spam filters need to be improved continually according to the requirements and the changing nature of spam.
REFERENCES
[1] Hormel Foods Corporation, http://www.hormel.com
[2] Vapnik V. N., Drucker H., Wu D., "Support Vector Machines for Spam Categorization", IEEE Transactions on Neural Networks, pp. 1048-1054.
[3] Oda T., White T., "Increasing the Accuracy of a Spam-Detecting Artificial Immune System", IEEE, pp. 390-396.
[4] Lazzari L., Mari M., Poggi A., "A Collaborative and Multi-Agent Approach to E-mail Filtering", IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT'05), pp. 238-241, 2005.
[5] Zhao W., Zhang Z., "An E-mail Classification Model Based on Rough Set Theory", IEEE, pp. 403-408, 2005.
[6] Youn S., McLeod D., "Efficient Spam E-mail Filtering using Adaptive Ontology", IEEE International Conference on Information Technology (ITNG'07), pp. 249-254, 2007.
[7] Wu J., Deng T., "Research in Anti-Spam Method Based on Bayesian Filtering", IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, pp. 887-891, 2008.
[8] Amayri O., Bouguila N., "Online Spam Filtering Using Support Vector Machines", IEEE, pp. 337-340, 2009.
[9] Goodman J., Cormack G. V., Heckerman D., "Spam and the Ongoing Battle for the Inbox", Communications of the ACM, 50(2), pp. 24-33, 2007.
[10] W. Russel Havens, Barry Lunt, Chia-Chi, "Naive Bayesian Filters for Log File Analysis", IEEE, 2012.
[11] Mithilesh K. P., Shanthi B. P., Aghila G., "Spam Filtering: Comparative Analysis of Filtering Techniques", IEEE International Conference on Advances in Engineering, Science and Management (ICAESM-2012), March 30-31, 2012.
[12] Ola Amayri, Nizar Bouguila, "Content-based Spam Filtering Using Hybrid Generative Discriminative Learning of Both Textual and Visual Features", IEEE, 2012.
[13] Ming-Wei C., Wen-tau Y., Christopher M., "Partitioned Logistic Regression for Spam Filtering", KDD'08, August 24-27, 2008, Las Vegas, Nevada, USA, ACM, 2008.
[14] Ming-Wei C., Wen-tau Y., "Personalized Spam Filtering for Gray Mail", 2008.
[15] Gordon V. Cormack, Aleksander Kolcz, "Spam Filter Evaluation with Imprecise Ground Truth", SIGIR'09, July 19-23, 2009, Boston, Massachusetts, USA, ACM, 2009.