Content Based Message Filtering on OSN User Wall Using Rule Based System

Sruthi. T, M-Tech Computer Science and Engineering, KMCT College of Engineering Calicut, Calicut University, sruthisivasankaran@gmail.com
Greeshma. T. R, Assistant Professor, Dept. of CSE, KMCT CE Calicut, Calicut University, greeshmaramkumar@gmail.com

Abstract-- Online Social Networks (OSNs) have become, over the last few years, one of the most popular ways to meet people and share information with them. An OSN lets its users keep in touch with friends by exchanging several types of content, including text, audio, and video data. An important issue in today's OSNs is giving users the ability to control the messages posted on their own private space, so that unwanted content is not displayed. Up to now, OSNs have provided very little support for this requirement. To fill the gap, we propose a system that allows OSN users to directly control the messages posted on their walls. This is achieved through a flexible rule-based system, which allows users to customize the filtering policy applied to their walls, and a Machine Learning based soft classifier that automatically labels messages in support of content-based filtering.

Keywords-- Online social networks, information filtering, short text classification

1. INTRODUCTION

Today Online Social Networks (OSNs) play a vital role in our day-to-day life. They provide entertainment for the younger generation and act as an interactive medium to share, communicate, and distribute a significant amount of information about human life. An OSN enables its users to keep in touch with friends by exchanging several types of content, including text, audio, and video data.
Because users can post content on particular public or private areas, generally called walls, unwanted content may appear there. Information filtering can therefore be used to give users the ability to automatically control the messages written on their own walls by filtering out unwanted messages [1]. Today's OSNs do not support content-based preferences that would keep such unwanted content off user walls. For example, Facebook permits users to decide who is allowed to insert messages in their private space (i.e., friends, defined groups of friends, or friends of friends). However, no content-based preferences are supported, and it is therefore not possible to prevent undesired communications, for instance vulgar or offensive ones, regardless of the user who posts them.

The aim of our work is therefore to propose and implement an automated system called Filtered Wall (FW). Filtered Wall is an OSN filter that enables users to filter unwanted messages from their private space, called the wall. The key idea of our system is support for content-based user preferences. We exploit a Machine Learning (ML) text categorization procedure that automatically assigns to each message a set of categories based on its content; the content based message filtering component then produces the output of the system from the output of the text categorization component.

The remainder of this paper is organized as follows: Section 2 surveys the literature, Section 3 analyzes the problem, Section 4 introduces the architecture of the proposed system, Section 5 describes the methodologies used, Section 6 gives the application of the system, Section 7 presents experimental results, Section 8 states our contribution, Section 9 outlines future scope, and Section 10 concludes the paper.

2. LITERATURE SURVEY

Our aim is to design an online message filtering system that can filter unwanted messages from OSN user walls.
The major part of the system is an information filter that discards unwanted information. Many applications use the concept of information filtering. In the last few years, recommender systems [1] have become very popular. A recommender system is a type of information filtering system that predicts the preferences of a user: it gives importance to user interest and recommends items based on it. A recommender system can work mainly in two ways: collaborative filtering and content based filtering.

A. Collaborative filtering

A system that uses collaborative filtering relies on users' preferences and actions, and predicts what a user will like based on his/her similarity to other users. Item ratings are determined by users' likes and dislikes [2]. A collaborative filtering system involves the collaboration of multiple agents while filtering information, and it requires a large dataset. Cold Start, Sparsity, First Rater, and Popularity Bias are problems related to collaborative filtering.

B. Content based filtering

A content based filtering system focuses on user interest and selects items based on it. It suggests the best-matching item based on previously chosen items. In a content based system each user acts independently. Content based filtering does not suffer from the Cold Start, Sparsity, and First Rater problems. It also has disadvantages: it requires content that can be encoded as meaningful features, and user taste must be representable as a learnable function of those content features.

Text classification, also called text categorization, is an important part of content based filtering. Content based filtering works well with Machine Learning based text classifiers. A Machine Learning approach learns from training data and builds classifiers for classifying new data. The main task of text classification is to assign each text to a predefined category.
Classification algorithms such as Support Vector Machines (SVM), Naive Bayes, neural networks, and decision trees can be used for text classification.

Support Vector Machines [3] are based on the Structural Risk Minimization principle. An SVM analyzes and recognizes patterns in the input text and can perform linear as well as nonlinear classification. SVM classifiers are generally suitable when a small amount of labeled data and a large amount of unlabeled data are available. The SVM classifier is well suited to text classification because text has a high-dimensional input space, few irrelevant features, and sparse document vectors, and because most text classification problems are linearly separable.

The Naive Bayes classifier is a probabilistic classifier based on Bayes' theorem; it uses Bayes' rule for probability calculation. Probabilistic models, especially those based on Naive Bayes, are widely used in text classification and in many automatic text classification tasks [4].

Neural network classifiers consist of neurons arranged in layers that convert an input vector into an output. The most commonly used neural network is the multilayer feed-forward network, in which each unit feeds its output to all units of the next layer and there is no feedback to the previous layer. A radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of this network is a linear combination of radial basis functions of the inputs and neuron parameters [4].

Decision tree classifiers [4] perform a hierarchical decomposition of the data space. Each internal node tests a predicate or condition on an attribute value, and the class labels in the leaf nodes are used for classification.

A Text Categorization Shell (TCS) application consists of the TCS run-time system plus a rulebase.
The rulebase defines which categories the application can assign to texts and contains the rules that make the categorization decisions for particular texts. The data-driven nature of TCS allows it to fully satisfy the requirements of ease of application development, portability to other applications, and maintainability. Developing a new application, or porting an existing one to a different domain, requires rule development, not programming. Rule development is itself facilitated by the TCS Workbench, which provides interactive and non-interactive facilities for editing a TCS rulebase and running a TCS application against example input texts.

3. ANALYSIS OF PROBLEM

A. Problem Definition

Social networks are the hottest online trend of the last few years. In OSNs, information filtering can also be used for a different, more sensitive purpose. This is because in OSNs it is possible to post, or to comment on other posts, in particular public/private areas, generally called walls. In the proposed system, information filtering can therefore be used to give users the ability to automatically control the messages written on their own walls by filtering out unwanted messages. The aim of the present work is therefore to propose and experimentally evaluate an automated system, called Filtered Wall (FW), able to filter unwanted messages from OSN user walls. We exploit Machine Learning (ML) text categorization techniques to automatically assign to each short text message a set of categories based on its content.

B. Existing System

Today OSNs provide very little support for preventing unwanted messages on user walls. For example, Facebook allows users to state who is allowed to insert messages on their walls (i.e., friends, friends of friends, or defined groups of friends).
However, no content-based preferences are supported, and it is therefore not possible to prevent undesired messages, such as political or vulgar ones, no matter who posts them.

C. Limitation of Existing System

No content-based preferences are supported, so undesired messages cannot be prevented regardless of who posts them. The use of information filtering is also limited: existing systems use only simple forms of information filtering, such as basic filtering and strict filtering, which consider only the relationship between the message creator and the receiver.

4. PROPOSED SYSTEM

Here we introduce our system, called Filtered Wall (FW). It is a mini Online Social Network (OSN) filter able to filter out unwanted messages from social network user walls. In this section we discuss the conceptual architecture, work flow, and advantages of the system.

A. Conceptual Architecture

In general, the architecture in support of OSN services is three-tier in nature (Figure 1). The three layers are the Social Network Manager (SNM), the Social Network Application (SNA), and the Graphical User Interface (GUI).

Figure 1: Filtered Wall conceptual architecture

1. Social Network Manager (SNM)
The first layer, the Social Network Manager, provides the essential OSN functionalities, such as profile and relationship administration. It also maintains all the data related to user profiles. The data it maintains and administers is made available to the second layer for applying Filtering Rules (FRs).

2. Social Network Application (SNA)
The second layer is composed of Content Based Message Filtering (CBMF) and the Short Text Classifier (STC). This layer is central to message categorization according to the CBMF filters. The SNA also maintains a Black List (BL) of users who frequently send unwanted messages.

3. Graphical User Interface (GUI)
The third layer provides a Graphical User Interface to the user who wants to post messages as input.
In this layer Filtering Rules (FRs) are used to filter unwanted messages, and a Black List (BL) is provided for users who are temporarily prevented from publishing messages on the user's wall.

B. Filtered Wall Work Flow

We now consider the path a message follows from creator to receiver as it goes through Filtered Wall. The stages are: input message, black list checking, short text classification, text representation, filtering rules, and filter/pass.

1. A user enters the private wall of one of his/her contacts and tries to post a text message, which is intercepted by FW.
2. FW checks whether the message creator is in the BL. If so, the message is blocked immediately without considering its content; otherwise it goes to the next step.
3. The message is passed to the STC, which extracts the content of the message.
4. FW uses the metadata provided by the STC, together with data extracted from the social graph and users' profiles, to enforce the filtering criteria.
5. FW publishes or filters the message based on the previous step.

C. Advantages of Proposed System

The system automatically filters unwanted messages from OSN user walls on the basis of both the message content and the message creator's relationships and characteristics. The black list lets users easily block messages from unwanted persons without considering the content of the messages.

5. METHODOLOGY

Here we introduce the methodologies we use: black list checking, short text classification, and content based message filtering.

A. Black List Checking

The Black List (BL) is a list of users who are temporarily prevented from posting messages on the wall of another user. In simple words, a black list is a temporary restriction on friends in Online Social Networks (OSNs). Users who belong to the black list, and whose messages are therefore blocked independently of their content, are called BL users.
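The work flow of Section 4.B, with black list checking as its first phase, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `FilteredWall`, its method names, and the toy helpers `toy_classify` and `toy_rule` are hypothetical, and the classifier and rule check are passed in as plain callables so that any STC or FR logic could be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class FilteredWall:
    """Toy stand-in for FW (hypothetical names, for illustration only)."""
    # Hypothetical STC: maps message text to {category: membership value}.
    classify: Callable[[str], Dict[str, float]]
    # Hypothetical FR check: True when the message must be filtered.
    violates_rules: Callable[[str, Dict[str, float]], bool]
    # One black list per wall owner: owner -> set of blocked creators.
    blacklists: Dict[str, Set[str]] = field(default_factory=dict)

    def add_to_blacklist(self, owner: str, creator: str) -> None:
        self.blacklists.setdefault(owner, set()).add(creator)

    def remove_from_blacklist(self, owner: str, creator: str) -> None:
        self.blacklists.get(owner, set()).discard(creator)

    def post(self, owner: str, creator: str, text: str) -> str:
        # Step 2: a BL user is blocked without looking at the content.
        if creator in self.blacklists.get(owner, set()):
            return "blocked"
        # Step 3: the short text classifier labels the message.
        labels = self.classify(text)
        # Steps 4-5: the filtering criteria decide between filter and publish.
        if self.violates_rules(creator, labels):
            return "filtered"
        return "published"


def toy_classify(text: str) -> Dict[str, float]:
    # Assumption: a single placeholder word marks a message as vulgar.
    return {"vulgar": 1.0 if "badword" in text.lower() else 0.0}


def toy_rule(creator: str, labels: Dict[str, float]) -> bool:
    # Assumption: filter any message at least half "vulgar".
    return labels.get("vulgar", 0.0) >= 0.5
```

With these toy helpers, `FilteredWall(toy_classify, toy_rule).post("alice", "bob", "hello")` returns `"published"`, and the same call returns `"blocked"` once Bob has been added to Alice's black list.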
A Filtered Wall user can add another user to his/her BL, or remove one from it, based on the result of the filter. Black list checking is the first phase of the system: when a message is posted on an OSN user wall, it is intercepted by FW, which checks whether the message creator belongs to the BL. If he/she does, the message is blocked immediately without considering its content; otherwise it is passed to the classifier module.

B. Short Text Classification

The short text classification module [7] is composed of two main phases: text representation and Machine Learning based classification.

a. Text representation

Text representation is a critical task in FW because it affects the classification process. Many features can be used to represent text, but here we consider three types. Two of them, Bag of Words (BoW) and Document properties (DP), are endogenous. The third, Contextual Features (CFs), is exogenous and characterizes the environment where the user is posting. The following properties are computed for each message:

1. Total words: the total number of terms in the message is calculated.
2. Special characters: the collection of special characters in the message is determined.
3. Stop words: the collection of stop words in the message is determined.
4. Violence words: the percentage of violence words over the total number of words is calculated.
5. Vulgar words: the percentage of vulgar words over the total number of words is calculated.
6. Sexual words: the percentage of sexual words over the total number of words is calculated.
7. Offensive words: the percentage of offensive words over the total number of words is calculated.
8. Hate words: the percentage of hate words over the total number of words is calculated.

b. Machine Learning Based classification

In this section any one of the Machine Learning based text classification procedures can be used for short text classification.
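Whatever procedure is chosen, it operates on the features listed under text representation. As a minimal sketch of how those eight properties might be computed, the function below returns them as a dictionary. The word lists, the tokenizing regular expression, and the choice to report special characters and stop words as counts are all assumptions made for illustration; the paper does not specify its lexicons.

```python
import re
from typing import Dict, Set

# Placeholder vocabularies (assumptions; a real system would use full lexicons).
STOP_WORDS: Set[str] = {"a", "an", "the", "is", "to", "of", "and"}
CATEGORY_WORDS: Dict[str, Set[str]] = {
    "violence": {"kill", "attack"},
    "vulgar": {"damn"},
    "sexual": {"sexy"},
    "offensive": {"idiot", "stupid"},
    "hate": {"hate"},
}


def document_properties(message: str) -> Dict[str, float]:
    """Compute the eight per-message properties as a feature dictionary."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", message.lower())
    total = len(words)
    features: Dict[str, float] = {
        "total_words": float(total),
        # Anything that is neither alphanumeric nor whitespace.
        "special_characters": float(
            sum(1 for ch in message if not ch.isalnum() and not ch.isspace())
        ),
        "stop_words": float(sum(1 for w in words if w in STOP_WORDS)),
    }
    # Percentage of category words over the total number of words.
    for category, vocab in CATEGORY_WORDS.items():
        hits = sum(1 for w in words if w in vocab)
        features[f"{category}_pct"] = 100.0 * hits / total if total else 0.0
    return features
```

The resulting dictionary can be turned into a fixed-order feature vector and combined with Bag of Words and contextual features before classification.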
Here we suggest a multilabel classifier based on a Bayesian network as a suitable method.

Figure: Short text classification

1. Counting number of words: a simple word counter algorithm finds the number of words (short texts) in the message.
2. Stop word removal: removing all stop words present in the document reduces the content size and improves the quality of the classification process.
3. Removal of special characters: next, special characters such as ?, !, and " are removed. This again reduces the size of the message, that is, the number of short texts, and can improve the quality of the STC.
4. Removal of repeated words: after the removal of stop words and special characters, repeated or duplicate words are removed, which also increases the efficiency of the Short Text Classifier. In this step the frequency of occurrence of each removed word is kept for later use in computing the probability of occurrence.
5. Multilabel classification: here we introduce a multilabel soft classifier based on Bayes' theorem. It performs automatic multi-labeling of messages, is easy to implement, and is very fast.

C. Content Based Message Filtering

Content based message filtering (CBMF) consists of Filtering Rules (FRs), by which users customize their walls; an Online Setup Assistant (OSA), for determining the unwanted content of a user; and Filter Trust, for computing the trust value of an OSN user.

a. Filtering Rules

Filtering Rules (FRs) are rules by which users state what content should not be displayed on their walls. An FR depends on the following factors: Author, CreatorSpec, ContentSpec, and Action. A filtering rule FR is a tuple represented as

FR = (Author, CreatorSpec, ContentSpec, Action)

Definition 1 (Author). Author is the person who defines the rule, for example Bob.

Definition 2 (Creator Specification). A creator specification is a set of OSN users.
It is denoted CreatorSpec. A CreatorSpec can have one of the following forms, possibly combined:

1. A set of attribute constraints of the form an OP av, where an is a user profile attribute name, av is a profile attribute value, and OP is a comparison operator compatible with the domain of an.
2. A set of relationship constraints of the form (m; rt; minTrust; maxTrust), denoting the OSN users who are in a relationship of type rt with m, the user who specifies the rule, and whose trust value is greater than or equal to minTrust and less than or equal to maxTrust.

Example 1: CS1 = {Sex = Female, Age < 18} denotes all females whose age is less than 18 years.
Example 2: CS2 = {(Alice; friends of; 0.1; 0.4)} denotes all the users who are friends of Alice and whose trust level is at least 0.1 and at most 0.4.
Example 3: Finally, the creator specification CS3 = {(Alice; friends of; 0.1; 0.4), Sex = Female} selects only the female users from those identified by CS2.

Definition 3 (Content Specification). It is denoted ContentSpec and is represented by an expression (C, ml), where C is the class to which the content belongs and ml is the percentage of the content in that class.

Definition 4 (Action). Action ∈ {block, notify} denotes the action to be performed by the system on messages that match ContentSpec and are created by users identified by CreatorSpec.

b. Online Setup Assistant

The Online Setup Assistant (OSA) presents the user with a set of short texts, belonging to different categories, selected from the data set. For each short text, the user tells the system whether to accept or reject it. Based on the user's responses to the OSA, FW determines whether content should be passed or filtered on the OSN user wall.

c. Filter Trust

In Online Social Networks (OSNs) trust is an important property. Here we consider trust to be the assurance and confidence that information and people behave in the expected way. In an OSN, trust may in general be machine to machine, machine to human, or human to human.
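Trust enters the filtering rules defined above through the minTrust and maxTrust bounds of a creator specification. The following is a minimal sketch, not the authors' implementation, of how a rule FR = (Author, CreatorSpec, ContentSpec, Action) might be evaluated. The class and function names are hypothetical, the `relations` dictionary stands in for a social-graph lookup, and treating ml as a minimum percentage threshold is an assumption, since the paper only says that ml is the percentage of content in class C.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Comparison operators allowed in attribute constraints (an OP av).
OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


@dataclass
class CreatorSpec:
    # Attribute constraints as (an, OP, av) triples.
    attribute_constraints: List[Tuple[str, str, Any]] = field(default_factory=list)
    # Relationship constraint as (m, rt, minTrust, maxTrust), if any.
    relationship: Optional[Tuple[str, str, float, float]] = None


@dataclass
class FilteringRule:
    author: str
    creator_spec: CreatorSpec
    content_spec: Tuple[str, float]  # (C, ml)
    action: str                      # "block" or "notify"


def creator_matches(spec: CreatorSpec, profile: Dict[str, Any],
                    relations: Dict[Tuple[str, str], float]) -> bool:
    """relations maps (m, rt) to the creator's trust value (hypothetical lookup)."""
    for an, op, av in spec.attribute_constraints:
        if an not in profile or not OPS[op](profile[an], av):
            return False
    if spec.relationship is not None:
        m, rt, min_trust, max_trust = spec.relationship
        trust = relations.get((m, rt))
        if trust is None or not (min_trust <= trust <= max_trust):
            return False
    return True


def apply_rule(rule: FilteringRule, profile: Dict[str, Any],
               relations: Dict[Tuple[str, str], float],
               class_percentages: Dict[str, float]) -> Optional[str]:
    """Return the rule's action if creator and content both match, else None."""
    c, ml = rule.content_spec
    # Assumption: the content matches when its share in class C reaches ml.
    content_ok = class_percentages.get(c, 0.0) >= ml
    if content_ok and creator_matches(rule.creator_spec, profile, relations):
        return rule.action
    return None
```

A specification in the spirit of Example 3 would be `CreatorSpec([("Sex", "=", "Female")], ("Alice", "friends of", 0.1, 0.4))`, where the trust bounds are illustrative values.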
At a deeper level, trust is very important for security and privacy in OSNs. Many algorithms are available for trust computation, and different sites use them according to their trust value requirements [8]. Here we introduce a new algorithm, called Filter Trust, for computing trust between users within an OSN based on the results from Filtered Wall. The basic concept is that there initially exists a definite trust between two users, based on the relationship type (rt) by which the users are related: friends, family, colleagues, etc. The trust value then changes according to the filter result. The Filter Trust value is used for filtering purposes.

6. APPLICATION

Our application is useful for anyone who does not want other people to write unwanted messages, such as vulgar, political, or sexual messages, on his/her own wall. The system focuses mainly on famous personalities, since such activities usually happen to them. If this facility is provided by OSN sites, people can protect their walls from this type of malpractice.

7. EXPERIMENTAL RESULTS

The experimental data comes from a collection of messages from online social networks, composed of different categories. In the proposed system, message filtering is performed by content based message filtering using filtering rules. We present a mini OSN site with all the basic functionalities of OSNs.

Figure 7.1: GUI for posting

This figure shows the graphical interface for posting messages to another user who is among the poster's contacts. A user selects another user to post to, and only a user who has a relationship with the wall owner can post on his/her wall. From his/her own wall, a user can select another user and post something on that user's private space. There are two cases when posting a text message: the message may be a good message or a bad message from the point of view of the message receiver.
Good messages are messages that do not contain any unwanted content. Bad messages are messages containing unwanted content from any of the categories that the user does not want. In this system, Filtering Rules are used to build a Filtered Wall that prevents unwanted messages.

Figure 7.2: Posting of message without unwanted content

Initially, we focus on Violence, Vulgar, Sexual, Offensive, and Hate messages and filter these messages.

Figure 7.3: Posting of message with unwanted content

The user can see the current filter performance as a graph showing which categories of messages are filtered (in percentage) and who the message creator is.

Figure 4: Current filter performance

This figure shows the current performance of user A's filter based on messages from user B. The x axis shows the category to which the short texts belong, and the y axis shows the percentage.

8. OUR CONTRIBUTION

We create a mini OSN with all basic functionalities:

o Register
o Login
o Search
o Send request
o View request
o Respond to request
o Post
o View post

Our additional contribution is the Filter Trust algorithm. With this technique we can filter messages from a user without considering the content of the messages posted, which increases the speed of our system.

9. FUTURE SCOPE

We propose image filtering as the future scope of this project. Our system can currently filter only text messages, so image filtering will be attempted in a future system. The proposal is based on the idea of content based image retrieval, on the basis of which we propose content based image filtering (CBIF).

10. CONCLUSION

In this paper, a system to prevent the display of unwanted messages posted on OSN user walls has been presented.
The new system, called Filtered Wall (FW), enables OSN users to directly control the messages posted on their private space by means of a flexible rule-based system. In addition, an OSN user can make the system more flexible by means of filtering rules and blacklist management. The use of Machine Learning lets the system identify the different categories of content present in a message, such as vulgar, hate, sex, violence, and offensive. The system allows users to customize their private space through the application of filtering criteria and a Machine Learning (ML) based soft classifier that automatically labels messages in support of content-based filtering.

REFERENCES

[1] M. Vanetti, E. Binaghi, E. Ferrari, B. Carminati, and M. Carullo, "A System to Filter Unwanted Messages from OSN User Walls," 2013.
[2] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 6, pp. 734-749, June 2005.
[3] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Proc. European Conf. Machine Learning, Springer, pp. 137-142, 1998.
[4] F. Sebastiani, "Machine Learning in Automated Text Categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
[5] M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari, "Content-Based Filtering in On-Line Social Networks," 2010.
[6] B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, and M. Demirbas, "Short Text Classification in Twitter to Improve Information Filtering," Proc. 33rd Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '10), pp. 841-842, 2010.
[7] V. Bobicev and M. Sokolova, "An Effective and Robust Method for Short Text Classification," Proc. 23rd Nat'l Conf. Artificial Intelligence (AAAI), D. Fox and C.P. Gomes, eds., pp. 1444-1445, 2008.
[8] J. A. Golbeck, "Computing and Applying Trust in Web-Based Social Networks," Ph.D. dissertation, University of Maryland, College Park, 2005.