IJRISE_Paper_pran

International Journal of Research In Science & Engineering Volume: 1 Issue: 1 e-ISSN: 2394-8299 p-ISSN: 2394-8280 FINDING ‘K’ INFLUENCE SEED USERS IN MULTIPLE ONLINE SOICAL NETOWRK USING CLUSTERING Praneetha G N, Post Graduate Student, Dept of CS&E, SCE Bangalore, India. Mrs Ragini Krishna, Assistant. Prof , Dept of CS&E , SCE Bangalore, India. Mrs Kavitha S, Assistant. Prof , Dept of CS&E , SCE Bangalore, India. Dr. Prashanth C M, Prof & HOD, Dept of CS&E, SCE Bangalore, India. ABSTRACT The advent of Online Social Network (OSN) has been one of the most exciting events in this decade. Many popular OSN such as Facebook, Twitter, Linkedln and Flickr have become popular. People are becoming more interested in online social network and they depend on the social network for many purposes such as finding opinions of other people about any product, movie etc. Influence propagation has become an important mechanism for viral marketing. This further motivates the researchers to carry out extensive studies on various aspects of the influence propagation problem. Influence Maximization is the problem of finding a small set of influential users in the online social network so that their influence in the social network is maximized. There are many diffusion models like Linear Threshold Model and Independent Cascade Model that are used to find the maximum influential user in online social network. And many modification and upgradation are made to these diffusion models based on scalability and efficiency issues. But, clustering technique has not been used to find the influential user. In this paper, we present a clustering approach in order to find the minimum seed users for Influence Maximization. Fuzzy k means algorithm is used to find out the influential users in multiple online social networks. This approach reduces processing time when compared to diffusion models. Diffusion model processing time increase exponentially as network size increases. Fuzzzy k means algorithm processing time increases linearly as network size increases. Keywords: Influence Maximization, Multiple Online Social Network, Fuzzy k means algorithm.. ----------------------------------------------------------------------------------------------------------------------------1. INTRODUCTION The advent of Online Social Network (OSN) has been one of the most exciting events in this decade. Many popular OSN such as Facebook, Twitter, Linkedln and Flickr have become increasingly popular. These networks are extremely rich in content and linkage data which can be analyzed. The linkage data is essentially the graph structure of social network and the communication between nodes, whereas the content data contains the text, images and other multimedia data in social network. The richness of this network provides opportunities for data analysis in context of Online Social Network. There are several factors due to which the OSN has gained importance by researchers[1]. Some of the factors are availability of social data that are vast, distributed, noisy and dynamic. There are some research issues with respect to mining the social network sites using data mining techniques. One of the issues is Influence Propagation. In many markets, customers are strongly influenced by the opinions of their friends. Influence propagation has become an important mechanism for viral marketing. This further motivates the researchers to carry out extensive studies on various aspects of the influence propagation problem. Influence Maximization problem is a problem of finding a small set of nodes that maximizes the spread of influence. Influence Maximization problem was first studied by Domingo’s and Richardson[2] and proposed first algorithm for influence propagation. Then, Kempe et al.[3] gave two fundamental propagation models, named IJRISE| www.ijrise.org|editor@ijrise.org International Journal of Research In Science & Engineering Volume: 1 Issue: 1 e-ISSN: 2394-8299 p-ISSN: 2394-8280 Independent Cascade (IC) Model and Linear Threshold (LT) Model. Many other researchers extended this basic propagation models in terms of scalability and efficiency. But most of the works focussed on a single online social network whereas users now often are found in more than one social network. Dung T. Nguyen et. Al [10] proposed an algorithm to handle this problem. We introduce a new approach to solve Influence Maximization problem. Fuzzy k means algorithm is used to find the minimum seed users in multiplex online social network so that the influence is maximized. 2. RELATED WORK 2.1 Diffusion Models Then, Kempe et al. [3] studied influence propagation by focusing on two fundamental propagation models, named Independent Cascade (IC) Model and Linear Threshold (LT) Model. Here, a social network is modelled as a graph with nodes representing individuals and edges representing connections or relationship between two individuals. Influences are propagated in the network according to the model, such as the independent cascade (IC) model. Kempe et al. proved that the optimization problem is NP-hard, and present a greedy approximation algorithm guaranteeing that the influence spread is within the optimal influence spread. In LT model, a node v is influenced by each neighbour w according to a weight bv, w such that ∑ b v,w ≤ 1 ……. Eq.1 w active neighbor of v Step 1: Each node chooses a threshold from the interval [0,1]. Step 2: Initially A0 is a set of nodes that are active. Step 3: All nodes that were active in step 2 remains active, and these active nodes tries to activate its neighbour nodes. This is according to the equation give below ∑ b v,w ≥ θv ………..Eq.2 w active neighbor of v In IC Model, we again start with an initial set of active nodes A0, and the process unfolds in discrete steps according to the following randomized rule. When node v first becomes active in step t, it is given a single chance to activate each currently inactive neighbour w. If v succeeds, then w will become active in step t+1; but whether or not v succeeds, it cannot make any further attempts to activate w in subsequent rounds. Again, the process runs until no more activation is possible. The main disadvantage of the algorithm is that it was less efficient. 2.2 Heuristic Algorithm Chen et al. proposed a new propagation model similar to the greedy algorithm[4,5] but with a better efficient result. Scalability problem of IC Model is addressed by proposing a new heuristic algorithm. The main idea of heuristic scheme is to use local arborescence structures of each node to approximate the influence propagation. Maximum influence paths (MIP) are computed between every pair of nodes in the network via a Dijkstra shortestpath algorithm, and ignore MIPs with probability smaller than an influence threshold. The union of the MIPs starting or ending at each node into the arborescence structures, which represent the local influence regions of each node, is constructed. Influence propagated through this local arborescence is only considered, and this model is referred as the maximum influence arborescence (MIA) model. Chen et al. proposed the first scalable heuristic algorithm for influence maximization in the LT model which was referred as LDAG algorithm [6]. They constructed a local DAG for every node in the network and restricted the influence for the node to be within the local DAG structure. To construct local DAG, they proposed a fast greedy algorithm where the nodes were added one by one to local DAG such that the influence of these individual nodes is greater than θ. IJRISE| www.ijrise.org|editor@ijrise.org International Journal of Research In Science & Engineering Volume: 1 Issue: 1 e-ISSN: 2394-8299 p-ISSN: 2394-8280 2.3 Multiple Online Social Networks Existing works focused only on single online social network while users nowadays are found in several OSNs. Thus, it is important to study the influence in multiple social networks. Dung T. Nguyen et.al [10] proposed a method to handle this problem. Multiple OSN network are combined into one network while preserving the influential properties of the original network. After coupling the networks, LT model was being used to find the influential users. A new metric named Influence Relay was introduced which was used to study the flow of influence between the social networks. 2.4 Interest matching users Most of the related works focused only on the topology of the network but ignored the factors such as users interest in the information that is being propagated. Yilin Shen et.al [9] proposed a new method Total Seed Nodes Learning (TSNL) to capture these interest-matching users whose interest match with each other. First, a new network was constructed from k multiple networks using network coupling. Then, interest matching users were found out using semi-supervising learning and then minimum influential users were found out based on idea called Iterative Semi-Supervised Learning. In this paper, we try to solve the Influence Maximization problem from different perspective i.e, using clustering technique. Fuzzy k-means algorithm is used to find the minimum seed users for influence maximization in multiple online social networks. LT and IC Model processing time increases exponentially as social network size increases. But, Fuzzy k-means algorithm reduces the processing time. Processing time increases linearly as network size increases. 3. METHODOLOGY 1. 2. 3. The solution to the Influence Maximization using clustering includes four modules: OSN Combiner Data Aggregator Fuzzy k-means Clusterer 1. OSN Combiner: The aim of combining the social network is to reinforce the effect of crossing users. Crossing users refers to those users who appear in more than one network. If a link appears in more than one network, then the weight of the link in coupled network will be sum of all its weights on these networks. 2. Data Aggregator After combining multiple online social networks into one, for each node in the network, following things are calculated: Total outgoing edge, total outgoing weight, total incoming edge, total incoming weight and last active time. Time Score for each user in the network is calculated to know how much s/he is active in the network. Time Score can be calculated using the formula: Time Score= c*[(LA-RMT) / (Now-RMT)] Where, LA refers to Last Active time of the user, RMT refers to Recent Message Timestamp and NOW refers to the current time. The value of Time Score will be between 0 and 1, 0 means the user is not active for long time and 1 means the user is very active. Since the value is relatively small compared to other variables like number of edges, weights etc of the user, this value can be multiplied by constant ‘c’ to all users. 3. Fuzzy k-means clusterer In k-means clustering, data element belongs to only one cluster. In fuzzy k-means clustering, data elements can belong to more than one cluster. Each data element will be associated with a set of membership matrix which indicates the strength of the association between data element and a particular cluster. Fuzzy k means algorithm works as following: Step 1: Choose a number of clusters IJRISE| www.ijrise.org|editor@ijrise.org International Journal of Research In Science & Engineering Volume: 1 Issue: 1 e-ISSN: 2394-8299 p-ISSN: 2394-8280 Step 2: Assign randomly to each point coefficients for being the cluster. Step 3: Repeat the algorithm until the co-efficient change between two iteration does not exceed sensitivity threshold ε. Step 4: Compute the centroid of cluster. Step 5: Compute the membership matrix Initially, number of overlapping users will be number of clusters. Here, co-efficient will be time score that is being calculated. Considering overlapping users as cluster head, the association between the users and overlapping user is calculated. Membership matrix gives the association between users and overlapping users. Then, membership values with respect to each cluster are calculated by summing up the membership value of each user in that cluster. Then top k clusters are selected having high membership value. This represents the top k overlapping user who has high influence in the network. Figure 2 shows the system architecture. OSN #1 OSN #2 Relationship Data OSN #N ……………… Relationship Data Relationship Data OSN Combiner Combined OSN Relationship Data Common User List Data Aggregator Data Seed Fuzzy K-Means Clustering Iterations (Usually 1) Degree of Association Matrix: Nodes Cluster A Cluster B n a 0.3 0.2 b 0.5 0.05 1 0.1 0.3 … … Total 5.3 7.9 Required no.of Seed Users Cluster D 0.2 0.04 0.6 0.1 0.3 0.1 0.2 0.21 0.0 4.6 3.1 1.8 Get Top n Clusters having High Degree of Associativity IJRISE| www.ijrise.org|editor@ijrise.org … Cluster C Cluster International Journal of Research In Science & Engineering Volume: 1 Issue: 1 e-ISSN: 2394-8299 p-ISSN: 2394-8280 4.RESULTS Snapshot 1: User to Cluster Associativity Matrix The above snapshot represents the User to Cluster Associativity Matrix. Here, the number of seed users to be found from OSN is entered as 10. Hence 10 clusters are formed. With respect to each cluster, the user to cluster associativity is found out as shown in the above figure. Snapshot 2: Finding Seed User The above snapshot represents finding seed users. With respect to each cluster, the overlapping user who is almost near to the cluster center is considered as seed user for that particular cluster. Seed User ID for each cluster and its User Association with respect to that cluster is shown. Cluster Average Association for each cluster is also calculated to rank the seed Users based on this parameter. 5. CONCLUSION The study of Influence Maximization is being studied from years and solutions like Independent Cascade (IC) model and Linear Threshold (LT) Model was proposed. But this model neglected the effect of active users in the network and processing time of these model increased exponentially as network size increases. Hence, we proposed a new approach towards Influence Maximization in multiple online social networks i.e, using clustering technique. Fuzzy k means clustering is being used to find the influential seed user from multiple online social networks. Processing time increases linearly as network size increases. Hence, processing time is reduced when compared to the diffusion models like Independent Cascade (IC) model and Linear Threshold (LT) model. A new metric called ‘Timescore’ is introduced which specifies the how much user is active in the network. IJRISE| www.ijrise.org|editor@ijrise.org International Journal of Research In Science & Engineering Volume: 1 Issue: 1 e-ISSN: 2394-8299 p-ISSN: 2394-8280 ACKNOWLEDGEMENT I am thankful to “Mrs. Ragini Krishna” for their valuable advice and support extended to us without which i could not have been able to complete the paper. I express deep thanks to Dr. Prashanth C M, Head of Department (CS&E) for warm hospitality and affection towards me. I thank the anonymous referees for their reviews that significantly improved the presentation of this paper. Words cannot express our gratitude for all those people who helped directly or indirectly in my endeavor. I take this opportunity to express my sincere thanks to all staff members of CS&E department of SCE for the valuable suggestion. REFERENCES [1]. Kempe, J. M. Kleinberg, and ´E. Tardos, “Maximizing the spread of influence through a social network,” in Proc. Of the 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’03), 2003, pp.137-146. [2]. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. S. Glance, “Cost-effective outbreak detection in networks,” in Proc. of the 13th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’07), 2007, pp. 420–429 [3]. W. Chen, Y. Wang, and S. Yang, “Efficient influence maximization in social networks”, in Proc. of the 15th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’09), 2009. [4]. W. Chen, Y. Yuan, and L. Zhang, “Scalable influence maximization in social networks under the linear threshold model”, in Proc. of the 10th IEEE Int. Conf. on Data Mining (ICDM’10), 2010.. [5]. Y.Shen, T.N. Dinh, H. Zhang, and M. T. Thai. “Interest-matching information propagation in multiple online social networks in CIKM, 2012. [6]. Dung T. Nguyen, Huiyuan Zhang, Soham Das, My T. Thai, “Least cost influence in multiplex social networks: Model Representation and Analysis, 2012 IEEE 13 th International Conference on Data Mining. IJRISE| www.ijrise.org|editor@ijrise.org

IJRISE_Paper_pran

Related documents

Products

Support

IJRISE_Paper_pran

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib