Content Based Message Filtering on OSN User Wall
Using Rule Based System
Sruthi T.
Greeshma T. R.
M.Tech, Computer Science and Engineering,
KMCT College of Engineering Calicut, Calicut University,
sruthisivasankaran@gmail.com
Assistant Professor, Dept. of CSE,
KMCT CE Calicut, Calicut University
greeshmaramkumar@gmail.com
Abstract— Online Social Networks (OSNs) have been the hottest trend of the last few years for meeting people and sharing information with them. OSNs enable their users to keep in touch with friends by exchanging several types of content, including text, audio and video data. An important issue in today's OSNs is giving users the ability to control the messages posted on their own private space, so that unwanted content is not displayed. Up to now, OSNs have provided very little support for this requirement. To fill the gap, we propose a system that allows OSN users to directly control the messages posted on their walls. This is achieved through a flexible rule-based system, which allows users to customize the filtering policy applied to their walls, and a Machine Learning (ML) based soft classifier that automatically labels messages in support of content-based filtering.
Keywords--Online social networks, information filtering, short text classification
1. INTRODUCTION
Today, Online Social Networks (OSNs) play a vital role in our day-to-day life. They provide entertainment for the younger generation and act as an interactive medium to share, communicate, and distribute a significant amount of information about people's lives. OSNs enable their users to keep in touch with friends by exchanging several types of content, including text, audio and video data. As a consequence, unwanted content can be posted on particular public/private areas of an OSN, in general called walls, and an information filtering approach can be used to give users the ability to automatically control the messages written on their own walls by filtering out unwanted messages [1]. Today's OSNs do not support any content-based preferences to prevent such unwanted content from being displayed on a user's wall. For example, Facebook permits users to decide who is allowed to insert messages in their private space (i.e., friends, defined groups of friends, or friends of friends). However, no content-based preferences are supported, and therefore it is not possible to prevent undesired messages, for instance vulgar or offensive ones, no matter which user posts them.
The aim of our work is therefore to propose and implement an automated system called Filtered Wall (FW). Filtered Wall is an OSN filter that enables its users to filter unwanted messages from their private space, called the wall. The key idea of our system is the support for content-based user preferences. We exploit a Machine Learning (ML) text categorization procedure that automatically assigns each message a set of categories based on its content; the content-based message filtering section then produces the output of the system from the output of the text categorization section.
The remaining sections of this paper are organized as follows: Section 2 presents a literature survey, Section 3 analyzes the problem, Section 4 introduces the architecture of the proposed system, Section 5 describes the methodologies used, Section 6 gives applications of the system, Section 7 presents experimental results, Section 8 describes our contribution, Section 9 discusses future scope, and Section 10 concludes the paper.
2. LITERATURE SURVEY
Our aim is to design an online message filtering system that can filter unwanted messages from OSN user walls. The major part of the system is an information filter, which discards unwanted information. Many applications use the concept of information filtering. In the last few years, recommender systems [1] have become very popular; a recommender system is a type of information filtering system that predicts the preferences of a user. It gives importance to the user's interests and recommends items based on them. Recommender systems work mainly in two ways:
• Collaborative filtering
• Content based filtering
A. Collaborative filtering
A system that uses collaborative filtering is mainly based on users' preferences and actions, and it predicts what a user will like based on his or her similarity to other users. The rating of an item is determined by users' likes and dislikes [2]. A collaborative filtering system involves the collaboration of multiple agents while filtering information, and it requires a large dataset. Cold start, sparsity, first rater and popularity bias are known problems of collaborative filtering.
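To make the idea concrete, the following is a minimal sketch of user-based collaborative filtering, assuming a small in-memory rating matrix and cosine similarity over co-rated items. The data layout and the names `cosine` and `predict` are illustrative assumptions, not part of any cited system. It also shows where the cold start and sparsity problems surface: when no similar user has rated the item, no prediction can be made.

```python
from math import sqrt
from typing import Dict, Optional

# Hypothetical rating matrix: user -> {item: rating}.
Ratings = Dict[str, Dict[str, float]]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue
        num += sim * their[item]
        den += sim
    # None signals that no similar user rated the item (cold start / sparsity).
    return num / den if den else None
```

For example, with `{"u1": {"a": 5, "b": 3}, "u2": {"a": 5, "b": 3, "c": 4}, "u3": {"a": 1, "c": 2}}`, the predicted rating of item `c` for `u1` is the average of 4 and 2 weighted by similarity, and an item that nobody else rated yields `None`.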
B. Content based filtering
A content based filtering system focuses on the user's interests and selects items based on them: it suggests the items that best match the items the user has previously chosen. In a content based system each user is treated independently. Content based filtering avoids the first rater problem and is less affected by sparsity and by cold start for new items. However, it has some disadvantages: it requires content that can be encoded as meaningful features, and the user's taste must be represented as a learnable function of the content features.
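As an illustration, here is a minimal content-based recommender sketch, assuming each item is already encoded as a numeric feature vector (for example hypothetical topic weights for sports, politics and food). The user profile is the average vector of previously chosen items, and unseen items are ranked by cosine similarity to it. The function names and features are assumptions made for this example.

```python
from math import sqrt
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend(items: Dict[str, List[float]], chosen: List[str], k: int = 1) -> List[str]:
    """Rank items the user has not chosen by similarity to the user's profile."""
    dim = len(next(iter(items.values())))
    # Profile = average feature vector of previously chosen items (chosen must
    # be non-empty); only this user's own history is used.
    profile = [sum(items[i][d] for i in chosen) / len(chosen) for d in range(dim)]
    candidates = [i for i in items if i not in chosen]
    candidates.sort(key=lambda i: cosine(profile, items[i]), reverse=True)
    return candidates[:k]
```

Because the ranking depends only on item features, a brand-new item can be recommended as soon as its features are known, which is why the first rater problem does not arise; a new user with no history, however, still has no profile.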
Text classification, also called text categorization, is an important part of content based filtering. Content based filtering works well with machine learning based text classifiers. A machine learning approach learns from training data and builds classifiers for classifying new data. The main task of text classification is to assign each text to a predefined category. Classification algorithms such as Support Vector Machines (SVM), Naive Bayes, neural networks, and decision trees can be used for text classification.
Support vector machines [3] are based on the Structural Risk Minimization principle. An SVM analyzes and recognizes patterns in the input text and can perform linear as well as nonlinear classification. The SVM classifier is generally suitable when a small amount of labeled data and a large amount of unlabeled data are available. It is well suited for text classification because text has a high-dimensional input space, few irrelevant features and sparse document vectors, and most text classification problems are linearly separable.

The Naive Bayes classifier is a probabilistic classifier based on Bayes' theorem, which it uses to calculate class probabilities. Probabilistic models, especially those based on Naive Bayes, are among the most widely used approaches in automatic text classification [4].

Neural network classifiers consist of neurons arranged in layers that convert an input vector into an output. The most commonly used neural network is the multilayer feed-forward network, in which each unit feeds its output to all the units of the next layer and there is no feedback to the previous layer. A radial basis function (RBF) network is an artificial neural network that uses radial basis functions as activation functions; its output is a linear combination of radial basis functions of the inputs and neuron parameters [4].

Decision tree classifiers [4] use a hierarchical decomposition of the data space. At each node a predicate, or condition, on an attribute value is evaluated, and the class labels in the leaf nodes are used for classification.
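The Naive Bayes approach described above can be sketched as a multinomial classifier with add-one (Laplace) smoothing, assuming whitespace tokenization and a tiny hypothetical training set. This is a generic textbook formulation, not the exact classifier of any cited work.

```python
from collections import Counter, defaultdict
from math import log
from typing import List


class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_score(self, doc: str, label: str) -> float:
        # log P(c) + sum over words of log P(w | c); P(doc) is the same for
        # every class, so it is dropped when comparing classes.
        n_docs = sum(self.class_counts.values())
        total = sum(self.word_counts[label].values())
        score = log(self.class_counts[label] / n_docs)
        for w in doc.lower().split():
            score += log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_score(doc, c))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps an unseen word from forcing a class probability to zero.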
A TCS application consists of the TCS run-time system plus a rulebase. The rulebase defines which categories the application can assign to texts and contains the rules that make the categorization decisions for particular texts. The data-driven nature of TCS allows it to fully satisfy the requirements of ease of application development, portability to other applications, and maintainability: developing a new application, or porting an existing one to a different domain, requires rule development, not programming. Rule development is itself facilitated by the TCS Workbench, which provides interactive and non-interactive facilities for editing a TCS rulebase and running a TCS application against example input texts.
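The data-driven idea (a fixed engine plus an editable rulebase) can be illustrated with a small sketch. The rule format below, with trigger and exception word sets, is a hypothetical simplification invented for this example; it is not the actual TCS rule language.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Rule:
    """Hypothetical rule format: assign `category` when any trigger word
    occurs and no exception word occurs."""
    category: str
    triggers: Set[str]
    exceptions: Set[str] = field(default_factory=set)


def categorize(text: str, rulebase: List[Rule]) -> Set[str]:
    """Generic engine: all domain knowledge lives in the rulebase."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {rule.category for rule in rulebase
            if words & rule.triggers and not words & rule.exceptions}


# Porting to another domain means editing this data, not the engine above.
RULEBASE = [
    Rule("violence", {"kill", "attack", "shoot"}, exceptions={"game"}),
    Rule("politics", {"election", "minister", "parliament"}),
]
```

Here "They will attack the minister" is assigned both categories, while "Attack in the game!" is assigned none because the exception word suppresses the violence rule.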
3. ANALYSIS OF PROBLEM
A. Problem Definition
Social networks have been the hottest online trend of the last few years. In OSNs, information filtering can also be used for a different, more sensitive purpose, because OSNs allow users to post, or to comment on other posts, in particular public/private areas, in general called walls. In the proposed system, information filtering is therefore used to give users the ability to automatically control the messages written on their own walls by filtering out unwanted messages. The aim of the present work is to propose and experimentally evaluate an automated system, called Filtered Wall (FW), able to filter unwanted messages from OSN user walls. We exploit Machine Learning (ML) text categorization techniques to automatically assign each short text message a set of categories based on its content.
B. Existing System
Today OSNs provide very little support for preventing unwanted messages on user walls. For example, Facebook allows users to state who is allowed to insert messages in their walls (i.e., friends, friends of friends, or defined groups of friends). However, no content-based preferences are supported, and therefore it is not possible to prevent undesired messages, such as political or vulgar ones, no matter who posts them.


C. Limitation of Existing System
• No content-based preferences are supported, and therefore it is not possible to prevent undesired messages, no matter which user posts them.
• The use of information filtering is limited. Only simple forms of information filtering, such as basic filtering and strict filtering, are used, and they consider only the relationship between the message creator and the receiver.
4. PROPOSED SYSTEM
Here we introduce our system, called Filtered Wall (FW). It is a mini Online Social Network (OSN) filter able to filter out unwanted messages from social network user walls. In this section we discuss the conceptual architecture, the work flow and the advantages of the system.
A. Conceptual Architecture
In general, the architecture in support of OSN services is three-tier in nature (Figure 1). The three layers are:
• Social Network Manager (SNM)
• Social Network Application (SNA)
• Graphical User Interface (GUI)
Figure 1: Filtered Wall Conceptual architecture
1. Social Network Manager (SNM)
The first layer, the Social Network Manager, provides the essential OSN functionalities, such as profile and relationship administration. It also maintains all the data related to user profiles. The managed user data are then provided to the second layer for applying Filtering Rules (FRs).
2. Social Network Application (SNA)
The second layer is composed of Content Based Message Filtering (CBMF) and the Short Text Classifier (STC). This layer is responsible for categorizing messages according to the CBMF filters. The SNA also maintains a Black List (BL) of users who frequently send unwanted messages.
3. Graphical User Interface (GUI)
The third layer provides a Graphical User Interface for users who want to post messages. In this layer, Filtering Rules (FRs) are used to filter unwanted messages, and a Black List (BL) is provided for users who are temporarily prevented from publishing messages on a user's wall.
B. Filtered Wall Work Flow
We now consider the flow that a message follows, from its creator to its receiver, when it goes through Filtered Wall.
[Figure: Filtered Wall work flow: Input message → Black list checking → Short text classification → Text representation → Filtering rules → Filter/Pass]
1. A user enters the private wall of one of his/her contacts and tries to post a text message, which is intercepted by FW.
2. FW checks whether the message creator belongs to the BL. If he/she does, the message is immediately blocked without considering its content; otherwise, the message goes to the next step.
3. The message is passed to the STC, which extracts the content of the message.
4. FW uses the metadata provided by the STC, together with data extracted from the social graph and users' profiles, to enforce the filtering criteria.
5. FW publishes or filters the message based on the result of the previous step.
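The five steps can be summarized as a small control-flow sketch. The classifier and the rule check are passed in as functions because their internals are described later; `handle_post`, the returned status strings and the stub functions in the usage note are hypothetical names chosen for this illustration.

```python
from typing import Callable, Dict, Set

Labels = Dict[str, float]  # category -> membership score from the STC


def handle_post(creator: str, text: str, blacklist: Set[str],
                classify: Callable[[str], Labels],
                rules_block: Callable[[str, Labels], bool]) -> str:
    """Steps 2-5 of the work flow for one intercepted message (step 1)."""
    # Step 2: black list check, before any content analysis.
    if creator in blacklist:
        return "blocked"
    # Step 3: the short text classifier labels the message content.
    labels = classify(text)
    # Step 4: filtering rules combine the labels with creator information.
    if rules_block(creator, labels):
        return "filtered"
    # Step 5: no rule fired, so the message is published on the wall.
    return "published"
```

For instance, with a stub classifier that returns `{"offensive": 0.9}` for texts containing "idiot" and a rule that blocks offensive scores of at least 0.5, "you idiot" is filtered, "nice photo" is published, and any message from a black-listed creator is blocked without being classified.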
C. Advantages of Proposed System
• The system automatically filters unwanted messages from OSN user walls on the basis of both the message content and the relationship and characteristics of the message creator.
• The Black List helps users easily block messages from unwanted persons without considering the content of those messages.
5. METHODOLOGY
Here we introduce the methodologies we use: black list checking, short text classification and content-based message filtering.
A. Black List Checking
Here we introduce a special approach called the Black List (BL), a list of users who are temporarily prevented from posting messages to the wall of another user. In simple words, a black list is a temporary restriction on friends in Online Social Networks (OSNs). Users who belong to the black list, and whose messages are therefore blocked independently of their content, are called BL users. A Filtered Wall user can add another user to, or remove another user from, his/her BL based on the result of the filter.
Black list checking is the first phase of the system. When a message is posted on an OSN user wall, it is intercepted by FW, which checks whether the message creator belongs to the BL. If he/she does, the message is immediately blocked without considering its content; otherwise, it is passed to the classifier module.
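Since the restriction is temporary, a black list needs an expiry time per entry. The sketch below assumes a fixed ban length (the one-week default is a hypothetical parameter, not a value from this work) and drops expired entries lazily when they are checked.

```python
import time
from typing import Dict, Optional


class BlackList:
    """Temporary black list: creator -> time at which the ban expires."""

    def __init__(self, ban_seconds: float = 7 * 24 * 3600):  # hypothetical default
        self.ban_seconds = ban_seconds
        self._until: Dict[str, float] = {}

    def add(self, user: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._until[user] = now + self.ban_seconds

    def remove(self, user: str) -> None:
        self._until.pop(user, None)

    def is_blocked(self, user: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expiry = self._until.get(user)
        if expiry is None:
            return False
        if now >= expiry:
            # The restriction is temporary: forget expired entries.
            del self._until[user]
            return False
        return True
```

The optional `now` argument makes the expiry behavior easy to test; in normal use the current time is taken from `time.time()`.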
B. Short Text Classification
The short text classification module [7] is composed of two main phases: text representation and Machine Learning based classification.
a. Text representation
Text representation is a critical task in FW because it affects the classification process. Many features can be used to represent text; here we consider three types of features. Two of them, Bag of Words (BOW) and Document Properties (DP), are considered endogenous. We also introduce a third type, Contextual Features (CFs), which are exogenous in nature and characterize the environment in which the user is posting.
1. Total words: the total number of terms in the message is calculated.
2. Special characters: the set of special characters in the message is determined.
3. Stop words: the set of stop words in the message is determined.
4. Violence words: the percentage of violence words over the total number of words is calculated.
5. Vulgar words: the percentage of vulgar words over the total number of words is calculated.
6. Sexual words: the percentage of sexual words over the total number of words is calculated.
7. Offensive words: the percentage of offensive words over the total number of words is calculated.
8. Hate words: the percentage of hate words over the total number of words is calculated.
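The eight document properties above can be computed as in the following sketch. The stop word list and category lexicons are tiny hypothetical placeholders; a real deployment would need curated word lists for each category.

```python
import re
from typing import Dict

# Tiny illustrative word lists (hypothetical; not the lexicons used in FW).
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to", "of", "and"}
LEXICONS = {
    "violence": {"kill", "attack", "beat"},
    "vulgar": {"damn"},
    "sexual": {"sexy"},
    "offensive": {"idiot", "stupid", "loser"},
    "hate": {"hate"},
}


def document_properties(message: str) -> Dict[str, object]:
    words = re.findall(r"[a-z']+", message.lower())
    total = len(words)
    features: Dict[str, object] = {
        "total_words": total,
        "special_characters": sorted(set(re.findall(r"[^\w\s]", message))),
        "stop_words": sorted({w for w in words if w in STOP_WORDS}),
    }
    for category, lexicon in LEXICONS.items():
        hits = sum(1 for w in words if w in lexicon)
        # Percentage of category words over the total number of words.
        features[f"{category}_pct"] = 100.0 * hits / total if total else 0.0
    return features
```

For "You are a stupid idiot!!" this gives 5 total words, the special character "!", the stop words "a", "are" and "you", and an offensive-word percentage of 40.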
b. Machine Learning Based classification
In this phase, any machine learning based text classification procedure can be used for short text classification. Here we suggest a multilabel classifier based on a Bayesian network as a suitable method.
Figure: Short text classification
1. Counting number of words
A simple word counter algorithm finds the number of words (short texts) in the message.
2. Stop Word Removal Process
All the stop words present in the message are removed, which reduces the content size and improves the quality of the classification process.
3. Removal of Special Characters
Next, special characters such as ?, ! and " are removed. This further reduces the size of the message, that is, the number of short texts, and can improve the quality of the STC.
4. Removal of Repeated Words
After removing stop words and special characters, we remove repeated (duplicate) words, which also increases the efficiency of the Short Text Classifier. In this step we keep the frequency of occurrence of each removed word for later use in computing probabilities of occurrence.
5. Multilabel classification
Here we introduce a multilabel soft classifier based on Bayes' theorem. It performs automatic multi-labeling of messages, is easy to implement, and is very fast.
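The five steps above can be sketched as one pipeline. The multilabel soft classifier here is one possible realization, assuming one binary multinomial Naive Bayes model per category (category vs. rest) with add-one smoothing; the stop word list, the 0.5 decision threshold and all names are hypothetical choices for this illustration. The word frequencies kept in step 4 reappear as exponents in the likelihood.

```python
import re
from collections import Counter
from math import exp, log
from typing import Dict, List, Set, Tuple

STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to", "of", "and"}


def _strip(token: str) -> str:
    """Remove special characters such as ?, ! and quotes from a token."""
    return re.sub(r"[^\w]", "", token)


def preprocess(message: str) -> Tuple[int, Counter]:
    tokens = message.lower().split()
    n_words = len(tokens)                                        # 1. count words
    tokens = [t for t in tokens if _strip(t) not in STOP_WORDS]  # 2. stop words
    tokens = [_strip(t) for t in tokens]                         # 3. special characters
    freq = Counter(t for t in tokens if t)                       # 4. dedupe, keep counts
    return n_words, freq


class MultilabelNB:
    """One binary multinomial Naive Bayes model per category (category vs. rest)."""

    def fit(self, messages: List[str], label_sets: List[Set[str]]) -> "MultilabelNB":
        docs = [preprocess(m)[1] for m in messages]
        self.n_docs = len(docs)
        self.vocab = set().union(*docs)
        self.models = {}
        for c in sorted(set().union(*label_sets)):
            pos, neg, n_pos = Counter(), Counter(), 0
            for freq, labels in zip(docs, label_sets):
                if c in labels:
                    pos.update(freq)
                    n_pos += 1
                else:
                    neg.update(freq)
            self.models[c] = (pos, n_pos, neg, self.n_docs - n_pos)
        return self

    def _log_score(self, freq: Counter, counts: Counter, n_class: int) -> float:
        total = sum(counts.values())
        score = log((n_class + 1) / (self.n_docs + 2))  # smoothed prior
        for w, k in freq.items():  # k = kept frequency of a repeated word
            score += k * log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def predict_proba(self, message: str) -> Dict[str, float]:
        """Soft output: P(category | message) for every category."""
        freq = preprocess(message)[1]
        probs = {}
        for c, (pos, n_pos, neg, n_neg) in self.models.items():
            diff = self._log_score(freq, neg, n_neg) - self._log_score(freq, pos, n_pos)
            probs[c] = 1.0 / (1.0 + exp(diff)) if diff < 700 else 0.0
        return probs

    def labels(self, message: str, threshold: float = 0.5) -> Set[str]:
        """Hard multilabel decision: every category above the threshold."""
        return {c for c, p in self.predict_proba(message).items() if p >= threshold}
```

Because each category has its own model, one message can receive several labels at once (for example both "offensive" and "violence"), and the soft probabilities can be compared directly against the membership thresholds used by the filtering rules.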
C. CONTENT BASED MESSAGE FILTERING
Content based message filtering (CBMF) consists of Filtering Rules (FRs), by which users can customize their walls; an Online Setup Assistant (OSA), which determines what content a user considers unwanted; and Filter Trust, which computes the trust values of OSN users.
a.
Filtering Rules
Filtering Rules (FRs) are rules by which users can state what content should not be displayed on their wall. An FR depends on the following factors:
• Author
• CreatorSpec
• ContentSpec
• Action
A filtering rule FR is a tuple FR = (Author, CreatorSpec, ContentSpec, Action).
Definition 1 (Author)
Author is the person who defines the rule, for example, Bob.
Definition 2 (Creator Specification)
Creator Specification, denoted CreatorSpec, is simply defined as a set of OSN users. CreatorSpec can have one of the following forms, possibly combined:
1. A set of attribute constraints of the form an OP av, where an is a user profile attribute name, av is a profile attribute value, and OP is a comparison operator compatible with the domain of an.
2. A set of relationship constraints of the form (m, rt, minDepth, maxTrust), where m is an OSN user; the constraint denotes all the users in a relationship of type rt with m, having a depth greater than or equal to minDepth and a trust value less than or equal to maxTrust.
Example 1: CS1 = {Sex = Female, Age < 18} denotes all the females whose age is less than 18 years.
Example 2: CS2 = {(Alice, friends of, 2, 0.4)} denotes all the users who are friends of Alice and whose trust level is less than or equal to 0.4.
Example 3: CS3 = {(Alice, friends of, 2, 0.4), Sex = Female} selects only the female users from those identified by CS2.
Definition 3 (Content Specification)
Content Specification is denoted ContentSpec and represented by an expression (C, ml), where C is a class to which the content belongs and ml is the percentage of content in the corresponding class.
Definition 4 (Action)
Action ∈ {block, notify} denotes the action to be performed by the system on messages that are created by users identified by CreatorSpec and that match ContentSpec.
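A filtering rule can be evaluated as in the sketch below. Several details are assumptions made for illustration: the social graph is a hypothetical dictionary mapping (m, rt, creator) to a (depth, trust) pair, ml is expressed as a fraction in [0, 1] and read as a minimum score for class C, and a rule returns its action only when both CreatorSpec and ContentSpec match.

```python
import operator
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

Graph = Dict[Tuple[str, str, str], Tuple[int, float]]  # (m, rt, user) -> (depth, trust)


@dataclass
class FilteringRule:
    author: str
    attribute_constraints: List[tuple] = field(default_factory=list)     # (an, OP, av)
    relationship_constraints: List[tuple] = field(default_factory=list)  # (m, rt, minDepth, maxTrust)
    content_spec: Tuple[str, float] = ("", 0.0)                          # (C, ml), ml in [0, 1]
    action: str = "block"                                                # "block" or "notify"


def creator_matches(rule: FilteringRule, creator: str,
                    profiles: Dict[str, dict], graph: Graph) -> bool:
    profile = profiles.get(creator, {})
    for an, op, av in rule.attribute_constraints:
        if an not in profile or not OPS[op](profile[an], av):
            return False
    for m, rt, min_depth, max_trust in rule.relationship_constraints:
        edge = graph.get((m, rt, creator))
        if edge is None:
            return False
        depth, trust = edge
        if depth < min_depth or trust > max_trust:
            return False
    return True


def apply_rule(rule: FilteringRule, creator: str, labels: Dict[str, float],
               profiles: Dict[str, dict], graph: Graph) -> Optional[str]:
    """Return the rule's action if both CreatorSpec and ContentSpec match."""
    c, ml = rule.content_spec
    if creator_matches(rule, creator, profiles, graph) and labels.get(c, 0.0) >= ml:
        return rule.action
    return None
```

For example, Bob's rule with CS3 as its creator specification and ("vulgar", 0.5) as its content specification blocks a vulgar message from a female friend of Alice whose trust is 0.3, but not the same message from a male friend, nor a message from her that the classifier scores low on vulgarity.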
b. Online Setup Assistant
The OSA presents the user with a set of short texts belonging to different categories, selected from the data set. For each short text, the user tells the system whether to accept or reject it. Based on the user's responses to the OSA, FW determines whether content should be passed or filtered on the OSN user wall.
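One simple way to turn OSA responses into filtering preferences is sketched below, assuming a hypothetical policy: a category is treated as unwanted when the user rejected at least half of the sample texts shown from it. The 0.5 rate and the function name are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def osa_unwanted_categories(responses: Iterable[Tuple[str, bool]],
                            min_reject_rate: float = 0.5) -> Set[str]:
    """responses: (category, accepted) pairs collected by the OSA."""
    shown = defaultdict(int)
    rejected = defaultdict(int)
    for category, accepted in responses:
        shown[category] += 1
        if not accepted:
            rejected[category] += 1
    # Categories the user mostly rejected become candidates for filtering.
    return {c for c in shown if rejected[c] / shown[c] >= min_reject_rate}
```

The resulting categories could then be used as the class C in the user's ContentSpec.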
c. Filter Trust
In Online Social Networks (OSNs), trust is an important property. We consider trust as the assurance and confidence that information and people behave in an expected way. In OSNs, trust may generally be machine to machine, machine to human, or human to human. At a deeper level, trust is very important for security and privacy in OSNs.
Many algorithms are available for trust computation, and different sites use them according to their trust value requirements [8]. Here we introduce a new algorithm, called Filter Trust, for computing trust between users within an OSN based on the results of Filtered Wall. The basic concept of this trust computation is that initially there exists a definite trust between two users, based on the relationship type (rt) by which they are related, for example friends, family or colleagues. The trust value then changes according to the filter results, and the resulting filter trust value is used for filtering.
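Since the exact update formula is not specified here, the following is only a sketch of the described behavior under stated assumptions: hypothetical initial trust values per relationship type, a fixed penalty when a creator's message is filtered, a smaller reward when it passes, and clamping to [0, 1].

```python
from typing import Dict, Tuple

# Hypothetical starting trust per relationship type (rt).
INITIAL_TRUST = {"family": 0.8, "friends": 0.6, "colleagues": 0.5}


class FilterTrust:
    def __init__(self, penalty: float = 0.1, reward: float = 0.02, default: float = 0.3):
        self.penalty, self.reward, self.default = penalty, reward, default
        self.trust: Dict[Tuple[str, str], float] = {}  # (owner, creator) -> [0, 1]

    def get(self, owner: str, creator: str, rt: str) -> float:
        # First contact: start from the trust implied by the relationship type.
        return self.trust.setdefault((owner, creator), INITIAL_TRUST.get(rt, self.default))

    def update(self, owner: str, creator: str, rt: str, filtered: bool) -> float:
        t = self.get(owner, creator, rt)
        t = t - self.penalty if filtered else t + self.reward
        t = min(1.0, max(0.0, t))  # keep trust in [0, 1]
        self.trust[(owner, creator)] = t
        return t
```

With these parameters, a friend's trust starts at 0.6 and drops to 0.5 after one filtered message; repeated filtered messages drive it down to 0, at which point the owner could, for example, black-list that creator.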


6. APPLICATION
Our application is useful for all people who do not want other people to write unwanted messages, such as vulgar, political or sexual messages, on their own walls.
The system mainly targets famous personalities, who are the usual targets of this type of activity. If this facility were provided by OSN sites, people could protect their walls from such malpractice.
7. EXPERIMENTAL RESULTS
The experimental data came from a collection of messages from online social networks, covering different categories. In the proposed system, message filtering is performed by content based message filtering using filtering rules.

We present a mini OSN site with all the basic functionalities of OSNs.
Figure 7.1: GUI for posting
This figure shows the graphical interface for posting messages to another user in one's contacts. A user selects the user to whom he/she wants to post a message; only users who have a relationship with the wall owner can post on his/her wall. From a user's wall, he/she can select another user to post something on that user's private space.
We consider two cases of posting a text message: from the message receiver's point of view, a posted message may be a good message or a bad message. Good messages do not contain any unwanted content. Bad messages contain unwanted content from any category that the user does not want.

In this system, Filtering Rules are used to build a Filtered Wall that prevents unwanted messages.
Figure 7.2 Posting of message without unwanted content

Initially, we focus on violence, vulgar, sexual, offensive and hate messages and filter these messages.
Figure 7.3 Posting of message with unwanted content

Users can view the current filter performance as a graph showing which categories of messages are filtered (in percentage) and who the message creator is.
Figure 4: Current filter performance
It shows the current performance of user A's filter based on messages from user B.
In this graph of the current filter performance, the x-axis shows the categories to which the short texts belong and the y-axis shows the percentage.
8. OUR CONTRIBUTION
We created a mini OSN with all the basic functionalities:
o Register
o Login
o Search
o Send request
o View request
o Respond to request
o Post
o View post
Our additional contribution is the Filter Trust algorithm. With this technique, messages from a user can be filtered without considering the content of the posted messages, which increases the speed of our system.
9. FUTURE SCOPE
As future scope for this project, we propose image filtering. Our system can filter only text messages, so image filtering will be tried in a future system. The proposal is mainly based on the idea of content-based image retrieval; on this basis, we propose the idea of content-based image filtering (CBIF).
10. CONCLUSION
In this paper, a system to prevent the display of unwanted messages posted on OSN user walls has been presented. The new system, called Filtered Wall (FW), enables OSN users to directly control the messages posted on their private space by means of a flexible rule-based system. In addition, an OSN user can make the system more flexible by means of filtering rules and blacklist management. The use of Machine Learning has proved effective in identifying the different categories of content present in a message, such as vulgar, hate, sex, violence and offensive. The system allows users to customize their private space through the application of filtering criteria and a Machine Learning (ML) based soft classifier that automatically labels messages in support of content-based filtering.
REFERENCES
[1] M. Vanetti, E. Binaghi, E. Ferrari, B. Carminati, and M. Carullo, "A System to Filter Unwanted Messages from OSN User Walls," 2013.
[2] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 6, pp. 734-749, June 2005.
[3] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Proc. European Conf. Machine Learning, Springer, 1998, pp. 137-142.
[4] F. Sebastiani, "Machine Learning in Automated Text Categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
[5] M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari, "Content-Based Filtering in On-Line Social Networks," 2010.
[6] B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, and M. Demirbas, "Short Text Classification in Twitter to Improve Information Filtering," Proc. 33rd Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '10), pp. 841-842, 2010.
[7] V. Bobicev and M. Sokolova, "An Effective and Robust Method for Short Text Classification," Proc. 23rd Nat'l Conf. Artificial Intelligence (AAAI), D. Fox and C.P. Gomes, eds., pp. 1444-1445, 2008.
[8] J. A. Golbeck, "Computing and Applying Trust in Web-Based Social Networks," Ph.D. dissertation, Graduate School of the University of Maryland, College Park, 2005.