A Predictor System for Social Network with Privacy Protection

International Journal of Engineering Trends and Technology (IJETT) – Volume 23 Number 7- May 2015
A Predictor System for Social Network with Privacy
Protection
Irine Mary Babu 1, Laya Devadas 2
1 M.Tech Scholar, 2 Assistant Professor
1, 2 College of Engineering, Munnar, Kerala, India
Abstract— Online social networks have become an essential
component of online activity and one of the most influential media. Online social networks (OSNs) such as YouTube, Facebook, and Twitter are used by an ever-growing number of people. These networks allow users to disclose their own information to others and to communicate, interact, and socialize with one another. They make frequent data sharing and inter-user communication immediately available. Privacy is one of the major concerns when disclosing or sharing social network data for social science research and business analysis. Recently, researchers have developed privacy models to prevent node identification through structure information. However, even when these privacy models are enforced, an attacker may still be able to infer a user's private information if a group of nodes largely shares the same sensitive attributes. This paper finds the friend relations in a social network and shares videos according to the users' interests, the friend's range, and the friend's prediction. The network is modeled as a graph in which each node represents a user and each link between nodes represents the relationship between those users.
Keywords— Data mining, predictor system, privacy preserving,
social network, text categorization, video sharing.
I. INTRODUCTION
Data mining is the process of automatically discovering useful knowledge in large data repositories. Data mining techniques are used to scour large databases in order to find novel and useful patterns that might otherwise remain unknown.
Online social networks are growing rapidly day by day and are changing the way people interact with each other. Online social networks such as YouTube, Facebook, and Twitter have become one of the most prominent activities on the web. Most business owners actively use these networks as part of their marketing. Users can communicate, connect, and socialize with each other. In addition, every user has a profile in which they can reveal their own information if they wish. On the positive side, this information gives researchers tremendous opportunities for analysis; on the negative side, it poses a threat to users' privacy.
Privacy is one of the most important concerns when revealing one's identity or information, and this information should be protected. Users are increasingly focused on securing their data. Because social networks are updated daily and grow rapidly, researchers increasingly collect information such as community growth, user behavior, and shared videos; at the same time, these networks should not reveal private information. The challenge is therefore to devise methods for publishing social network information with privacy protection. The social network can be modeled as a graph in which the nodes represent the users, the labels represent the information that each user provided, and the links represent the relationships between the users.
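The graph model described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `SocialGraph`, its methods, and the sample users and labels are assumptions made for the example and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """Undirected labeled graph: nodes are users, links are relationships."""
    labels: dict = field(default_factory=dict)  # user -> set of profile labels
    links: dict = field(default_factory=dict)   # user -> set of neighbouring users

    def add_user(self, user, labels=()):
        self.labels[user] = set(labels)
        self.links.setdefault(user, set())

    def add_link(self, a, b):
        # A relationship is stored symmetrically on both endpoints.
        self.links[a].add(b)
        self.links[b].add(a)

    def friends(self, user):
        return sorted(self.links[user])


# Hypothetical users and labels, for illustration only.
graph = SocialGraph()
graph.add_user("alice", {"music", "sports"})
graph.add_user("bob", {"music"})
graph.add_user("carol", {"cooking"})
graph.add_link("alice", "bob")
graph.add_link("alice", "carol")
```

Keeping labels and links in separate maps makes it easy to publish or withhold the two kinds of information independently.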
Researchers have proposed many techniques that prevent both information leakage and attacks by adversaries on these networks. These methods mostly concentrate on preventing identity disclosure. A collection of anonymization techniques and privacy models has been developed, such as k-anonymity [10], l-diversity [14], and t-closeness [17]. When a social network is published, the graph structure is published together with the corresponding social relationships. As a result, the structure may be exploited as a new means to compromise privacy.
A structure attack is an attack that uses graph structure information, for example the degree or the subgraph of a node, to identify the node. To prevent structure attacks, a published graph should satisfy k-anonymity [3] [9]. Current approaches for protecting graph privacy can be classified into two categories [6]: clustering [3], [9] and edge altering [1], [10], [9]. Clustering merges a subgraph into one super node, which is unsuitable for sensitive labeled graphs, because when a group of nodes is merged into one super node, the node-label relations are lost. Edge-altering methods keep the nodes of the original graph unchanged and only add, delete, or swap edges. Edge altering may largely destroy the properties of a graph. It can sometimes change the distance properties substantially by connecting two faraway nodes or by deleting the bridge link between two communities.
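As a rough illustration of a degree-based structure attack, the sketch below checks k-degree anonymity, one concrete form of the k-anonymity requirement for graphs in which every node must share its degree with at least k-1 other nodes. The function names and the toy graphs are assumptions made for this example.

```python
from collections import Counter


def degree_sequence(adjacency):
    """Map each node to its degree (number of neighbours)."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def is_k_degree_anonymous(adjacency, k):
    """True if every degree value is shared by at least k nodes."""
    counts = Counter(degree_sequence(adjacency).values())
    return all(count >= k for count in counts.values())


# Hypothetical toy graphs.
# A star: the centre "a" is the only node of degree 3, so it is identifiable.
star = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
# A 4-cycle: every node has degree 2, so no node stands out by degree.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
```

In the star, an attacker who knows that the target has three friends can single out node "a"; in the cycle, degree knowledge alone does not distinguish any node.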
Privacy breaches in social network data can be grouped into three categories [3]:
1) Identity disclosure: the identity of the individual associated with a node is revealed.
2) Link disclosure: sensitive relationships between two individuals are disclosed.
3) Content disclosure: the privacy of the data associated with each node is breached, e.g., the email messages sent and/or received by the individuals in an email communication graph.
ISSN: 2231-5381, http://www.ijettjournal.org
We believe that an ideal privacy protection system should consider all of these issues. However, protecting against each of the above breaches may require different techniques. For instance, for content disclosure, standard privacy-preserving data mining techniques [4] such as data perturbation and k-anonymization can help. For link disclosure, the various techniques studied by the link-mining community [5,7] can be useful.
In this paper, we concentrate on video sharing activities, since users are increasingly interested in sharing videos. The paper deals with privacy in video sharing by comparing the interests of users. The comparison is made using a text categorization technique, so that a video is shared according to the user's interest. In text categorization, a text or word may partially match many categories, and we have to find the best matching category for it. The Term Frequency/Inverse Document Frequency (TF-IDF) approach is commonly used to weigh each word in a text document according to how unique it is. The rest of the paper is organized as follows. Section II reviews previous work in the area. Section III describes text categorization. Section IV defines the problem, Section V presents the proposed technique, and Section VI describes the algorithmic strategy. Section VII reports the experimental results, and Section VIII concludes the paper.
II. RELATED WORK
A number of recent studies have proposed ways to protect private data in social networks. Most of these works focus on the graph structure, the labels on that graph, and the privacy of those labels. L. Sweeney proposed the k-anonymity model for protecting privacy [10]. A data holder may want to share a version of the data with researchers, but if the data contains private information, the data holder cannot release it without scientific guarantees that the individuals cannot be re-identified. The solution provided is a formal protection model called k-anonymity. When this model is applied, the resulting data looks anonymous, so that the individuals who are the subjects of the data cannot be identified. However, k-anonymity can still be vulnerable to attacks.
A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, in "l-diversity: privacy beyond k-anonymity" [14], show that k-anonymity does not guarantee privacy against attackers. The authors proposed a novel and powerful privacy definition called l-diversity and show that it is practical and can be implemented efficiently. They describe attacks on k-anonymity that leak information because of a lack of diversity in the sensitive attributes, for example when all the tuples in a group have the same sensitive value. The l-diversity principle overcomes this by requiring each group of indistinguishable tuples to contain at least l well-represented values of the sensitive attribute. Different adversaries can have different background knowledge leading to different inferences, and l-diversity simultaneously protects against all of them.
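The difference between the two models can be made concrete with a small check on a table. The sketch below is an assumption-laden illustration: the rows, the column names, and the helper functions are invented for the example, and the l-diversity test uses the simplest "distinct values" reading rather than the stronger variants defined in [14].

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_ids):
    """Group rows that are indistinguishable on the quasi-identifier columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_ids, k):
    """Every group of indistinguishable rows has at least k members."""
    groups = group_by_quasi_identifiers(rows, quasi_ids)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(rows, quasi_ids, sensitive, l_value):
    """Distinct l-diversity: each group holds at least l different sensitive values."""
    groups = group_by_quasi_identifiers(rows, quasi_ids)
    return all(
        len({row[sensitive] for row in group}) >= l_value
        for group in groups.values()
    )


# Hypothetical generalized table; the first group is homogeneous.
rows = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
```

The table is 2-anonymous on `zip` and `age`, yet anyone known to be in the first group is revealed to have the flu, which is exactly the leak that the 2-diversity check rejects.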
X. Ying and X. Wu [8] proposed a spectrum-preserving approach to randomizing social networks. The authors investigated how various properties of networks are affected by randomization. They studied how randomly deleting and swapping edges changes graph properties and proposed an eigenvalue-oriented random graph change algorithm. All edge-editing-based models aim to produce a published graph with as few edge changes as possible.
L. Zou, L. Chen, and M. T. Ozsu proposed k-automorphism, a general framework for privacy-preserving network publication [12]. With the growing number of social network applications, privacy concerns in social networks have become increasingly important, since social networks usually contain personal information. Simply removing all identifiable personal information (such as names and social security numbers) before releasing the data is insufficient, because an attacker can easily identify the target by performing different structural queries. The authors proposed k-automorphism to protect against multiple structural attacks and developed an algorithm called KM that ensures k-automorphism. They also discussed an extension of KM to handle "dynamic" releases of the data and showed that the algorithm performs well in terms of the protection it provides.
J. Cheng, A. W.-C. Fu, and J. Liu proposed k-isomorphism for privacy-preserving network publication against structural attacks [2]. The authors identified a new problem of enforcing k-security for protecting sensitive information concerning the nodes and links in a published network dataset. Their investigation led to k-isomorphism, where the selection of the anonymization algorithm depends on the adversary's knowledge and the targets of protection. The authors addressed protection against structural attacks when the target is only NodeInfo. They state that NodeInfo and LinkInfo are the two basic sources of sensitive information in network datasets and that both call for special efforts to secure them.
L. Liu, J. Wang, J. Liu, and J. Zhang, in their work on privacy preservation in social networks against sensitive edge disclosure [11], treated the weights on the edges as sensitive labels and proposed a method to preserve the shortest paths between most pairs of nodes in the graph. They considered preserving the privacy of the weights of certain edges while trying to keep the shortest path lengths close and the shortest paths of certain pairs of nodes exactly the same. The authors developed two privacy-preserving strategies for this application. The first strategy is based on Gaussian randomization multiplication, and the second is a greedy perturbation algorithm based on graph theory.
M. Hay, G. Miklau, D. Jensen, D. Towsley, and P. Weis, in "Resisting structural re-identification in anonymized social networks" [13], considered network data describing entities connected by relations such as friendship, communication, or a shared activity. They quantified the privacy risks associated with three different classes of attacks on the privacy of individuals in networks, based on the adversary's knowledge. They showed that network structure and size are the main sources of the risk of these attacks. They also proposed a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model, which guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with negligible bias.
III. TEXT CATEGORIZATION
With the rapid growth of online information, how to process these huge amounts of text has become an active research topic. Text categorization is one of the key tasks in this area. The goal of text categorization is the classification of documents into a fixed number of predefined classes. The working definition used throughout this paper assumes that every document of the user is assigned to exactly one class. More formally, given a set of classes C and a set of training documents (interests) I, there is a target concept T: I -> C which maps each document to a class of interest. T(I) is known for the documents in the training set. Through supervised learning, the information contained in the training samples can be used to find a model that approximates T. A fundamental problem in text categorization is how to improve classification accuracy. The goal is to find a model which maximizes accuracy.
A. Text Categorization Steps:
Generally, text categorization includes five main steps: document preprocessing, document reduction, stemming, model training, and testing and evaluation [15].
1) Document Preprocessing: This step removes HTML tags, rare words, and stop words, and may also apply some stemming.
2) Document Reduction: Documents contain a very large number of words. If all of them were chosen as features, classification would be infeasible, because the computer cannot process such an amount of data. We therefore have to select the most meaningful and representative features for classification. Stop words are a part of natural language that carries little meaning in a retrieval system. Stop words should be removed from a text because they make the text heavier and less important for analysts. Removing stop words reduces the dimensionality of the term space.
3) Stemming: Stemming techniques are used to find the root (stem) of a word. Stemming converts words to their stems, which incorporates a great deal of language-dependent linguistic knowledge. The hypothesis behind stemming is that words with the same stem or root mostly describe the same or closely related concepts in text, so words can be conflated by using stems.
4) Model Training: This is the most important part of text categorization. It includes choosing some documents from the corpus to form the training set, performing the learning on the training set, and then generating the model.
5) Testing and Evaluation: This step performs the classification on the testing set.
IV. PROBLEM DEFINITION
In a social network, privacy is the most important issue to address. The publication of social network data entails a privacy threat for its users. Sensitive information about the users of social networks should be protected.
Almost all privacy algorithms concentrate on the graph structure and on providing privacy for the labels, which hold either sensitive or non-sensitive information about the user. Authors have therefore proposed techniques such as k-anonymity, l-diversity, and many more. Video sharing in social networks is somewhat different. In this paper, we focus on video sharing on the basis of text categorization. Text categorization is the process of grouping documents into different categories or classes. With the amount of online information growing rapidly, the need for reliable automatic categorization has increased. For this text categorization task, the TF-IDF algorithm is reported not only to enable a better theoretical understanding but also to perform better in practice, without being conceptually or computationally more complex.
A. Objective
To develop a new technique that provides privacy and security for social network data in a sharing environment by examining user interest.
It helps publishers to publish unified data together to ensure privacy.
The non-sensitive data can be published for everyone in the network.
Low overhead.
V. PROPOSED TECHNIQUE
Here we propose a method to provide privacy for video information. Fig. 1 shows the proposed system architecture. The system takes a URL as input to the search engine. The search engine retrieves the web page, and preprocessing is then performed on the content of the page. The preprocessed web page is categorized and compared with the user interest. According to the user interest, the data is shared to the user profile.
For this, we create a network. The network consists of nodes, which represent the users; labels, which represent the data belonging to each user; and links, which represent the relationships between the users.
A. Preprocessing
The main objective of preprocessing is to obtain the key features or key terms from the stored text, and to strengthen the relevance between a word and a document and between a word and a category. The preprocessing step is crucial in determining the quality of the next stage, that is, the classification stage. It is important to select the significant keywords that carry the meaning and to discard the words that do not help to distinguish between documents. The preprocessing phase converts the original textual data into a structure that is ready for data mining. Finally, it removes all HTML tags, rare words, stop words, and so on.
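A minimal sketch of such a preprocessing step is shown below. It is illustrative only: the stop-word list is deliberately tiny, the suffix stripping is a crude stand-in for a real stemmer, rare-word removal is omitted, and all function names are assumptions rather than the paper's implementation.

```python
import re

# Assumed, deliberately small stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "for", "on"}
# Crude suffix stripping, checked in this order; not a full stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_html(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(page):
    """HTML removal, tokenization, stop-word removal, then stemming."""
    tokens = tokenize(strip_html(page))
    return [stem(token) for token in tokens if token not in STOP_WORDS]
```

For example, `preprocess("<p>The team played and is playing football</p>")` reduces both "played" and "playing" to the same root, so they count as one feature in the later classification stage.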
B. Categorizing
In this step, the words that result from preprocessing are categorized. Words with almost the same meaning fall into the same category. The categorized words are then compared with the categories given in the user profile. The content is automatically shared to another user's profile if that user is interested in the same category.
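The categorizing step can be sketched as a simple vocabulary match. The category names and vocabularies below are hypothetical, since the paper does not list its categories, and the majority-vote rule is only one plausible way to pick the best matching category.

```python
from collections import Counter

# Hypothetical category vocabulary; the paper does not list its categories.
CATEGORY_TERMS = {
    "sports": {"football", "goal", "match", "team"},
    "music": {"song", "guitar", "concert", "album"},
}


def categorize(terms):
    """Return the category whose vocabulary matches the most terms, or None."""
    hits = Counter()
    for term in terms:
        for category, vocabulary in CATEGORY_TERMS.items():
            if term in vocabulary:
                hits[category] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]


def matches_interest(terms, user_interests):
    """True if the content's category is among the interests in a user profile."""
    category = categorize(terms)
    return category is not None and category in user_interests
```

A page whose terms are `["team", "football", "song"]` is categorized as sports, so it would be shared with a user whose profile lists sports but not with one who lists only music.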
VI. ALGORITHMIC STRATEGY
A. TF-IDF classifier
To share videos according to the user's interest, we use the TF-IDF algorithm. TF-IDF can be used effectively for stop-word filtering in various subject fields, including text summarization and classification. The TF-IDF weight is a weight often used in information retrieval and text mining.
The three main design choices are:
1) The word weighting method is TF-IDF.
2) The text length normalization is done using the number of words.
3) The similarity measure is the inner product.
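The three design choices can be read as the following sketch, which is one plausible interpretation rather than the authors' code: term frequency is normalized by the number of words, the inverse document frequency is taken as log(N / df), and a text is assigned to the class whose prototype vector gives the largest inner product. The training data and all names are assumptions for illustration.

```python
import math
from collections import Counter


def tf(tokens):
    """Term frequency normalized by document length (number of words)."""
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}


def idf(documents):
    """Inverse document frequency: log(N / df) for each term."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens, idf_weights):
    """TF-IDF vector of a document; unseen terms get weight 0."""
    return {term: w * idf_weights.get(term, 0.0) for term, w in tf(tokens).items()}


def inner_product(u, v):
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


def classify(tokens, prototypes, idf_weights):
    """Pick the class whose prototype vector has the largest inner product."""
    vector = tfidf(tokens, idf_weights)
    return max(prototypes, key=lambda c: inner_product(vector, prototypes[c]))


# Hypothetical training data: one token list per class.
training = {
    "sports": ["football", "team", "goal", "video"],
    "music": ["guitar", "song", "concert", "video"],
}
idf_weights = idf(list(training.values()))
prototypes = {c: tfidf(doc, idf_weights) for c, doc in training.items()}
```

Because "video" occurs in every training document, its IDF is zero and it does not influence the decision, which is the stop-word filtering effect mentioned above.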
Algorithm Procedure
1) Enter the URL.
2) Get the meta content.
3) Remove the stop words.
4) Perform stemming.
5) After stemming, categorize the video.
6) Check whether friends with the same video interest are present.
   If true, share the video with the corresponding users.
   Else, the video is not shared.
The algorithm first checks the URL and gets its meta content, deletes all stop words, and applies stemming so that the result consists only of root words. It then categorizes the video and finds the friends with the same interest or category by looking at the videos collected by those users. After this, the URL is automatically shared to the user's profile and, at the same time, with those who have the same interest.
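The procedure can be sketched end to end as follows. The sketch makes several assumptions that are not in the paper: the meta content is passed in as a string instead of being fetched from a URL, stemming is omitted for brevity, and the category vocabularies, users, and interests are invented for the example.

```python
STOP_WORDS = {"a", "the", "of", "and", "in", "to", "for"}  # assumed short list
CATEGORY_TERMS = {  # hypothetical categories and vocabularies
    "sports": {"football", "goal", "match"},
    "music": {"guitar", "song", "concert"},
}


def categorize_video(meta_content):
    """Remove stop words, then pick the category with the most matching words."""
    words = [w for w in meta_content.lower().split() if w not in STOP_WORDS]
    scores = {
        category: sum(w in vocabulary for w in words)
        for category, vocabulary in CATEGORY_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def share_video(owner, meta_content, friends, interests):
    """Return the friends of `owner` who should receive the video."""
    category = categorize_video(meta_content)
    if category is None:
        return []  # no matching category: the video is not shared
    return sorted(f for f in friends[owner] if category in interests.get(f, set()))


# Hypothetical friend lists and interest profiles.
friends = {"alice": {"bob", "carol", "dave"}}
interests = {"bob": {"sports"}, "carol": {"music"}, "dave": {"sports", "music"}}
```

Only friends whose profile lists the video's category are returned, so a video that matches no category, or no friend's interest, is not shared with anyone.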
Fig.1 System architecture
VII. EXPERIMENTAL RESULTS
For this paper, we developed a network with a user interface. Each user can add interests to their profile, and data is shared according to these interests. The algorithm checks the URL, removes all stop words, and reduces the remaining words to root words. It then compares the category of the URL with the interest category of the user. After this, the URL is automatically shared to the user's profile and, at the same time, with those who have the same interest.
VIII. CONCLUSION
The use of social networking increases day by day. These networks contain valuable information as well as worthless information, and their users are increasingly focused on securing this information. In this paper, we design a complete privacy protection framework. The framework allows users to set their own interests in their profile and provides privacy according to those interests. The algorithm compares the keywords from the URL with the user's interests, so that videos are shared only with interested people.
REFERENCES
[1] Mingxuan Yuan, Lei Chen, Philip S. Yu, and Ting Yu, "Protecting Sensitive Labels in Social Network Data Anonymization," IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 3, March 2013.
[2] J. Cheng, A.W.-C. Fu, and J. Liu, "K-Isomorphism: Privacy Preserving Network Publication against Structural Attacks," Proc. Int'l Conf. Management of Data, pp. 459-470, 2010.
[3] K. Liu and E. Terzi, "Towards identity anonymization on graphs," in Proc. of the 2008 ACM SIGMOD Intl. Conf. on Management of Data, New York, USA: ACM, pp. 93-106, 2008.
[4] C. C. Aggarwal and P. S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, vol. 34 of Advances in Database Systems. Springer, 2008.
[5] L. Getoor and C. P. Diehl, "Link mining: a survey," ACM SIGKDD Explorations Newsletter 7, 2 (2005), 3-12.
[6] Gaurav P.R. and Gururaj T., "Anonymization: Enhancing Privacy and Security of Sensitive Data of Online Social Networks," International Journal of Computer Science and Information Technologies, Vol. 5 (4), 2014, 5995-6000.
[7] E. Zheleva and L. Getoor, "Preserving the privacy of sensitive relationships in graph data," in Proceedings of the International Workshop on Privacy, Security, and Trust in KDD (PinKDD'07), San Jose, CA, August 2007.
[8] X. Ying and X. Wu, "Randomizing social networks: a spectrum preserving approach," in SDM, 2008.
[9] B. Zhou and J. Pei, "Preserving Privacy in Social Networks Against Neighborhood Attacks," Proc. IEEE 24th Int'l Conf. Data Eng. (ICDE'08), pp. 506-515, 2008.
[10] L. Sweeney, "K-Anonymity: A Model for Protecting Privacy," Int'l J. Uncertain. Fuzziness Knowledge-Based Systems, 2002.
[11] L. Liu, J. Wang, J. Liu, and J. Zhang, "Privacy preserving in social networks against sensitive edge disclosure," in SIAM International Conference on Data Mining, 2009.
[12] L. Zou, L. Chen, and M. T. Özsu, "K-automorphism: a general framework for privacy-preserving network publication," PVLDB, 2(1), 2009.
[13] M. Hay, G. Miklau, D. Jensen, D. Towsley, and P. Weis, "Resisting structural re-identification in anonymized social networks," PVLDB, 1(1), 2008.
[14] A. Machanavajjhala et al., "l-diversity: Privacy beyond K-anonymity," ACM Trans. Knowledge Discovery Data, 2007.
[15] Mingyong Liu and Jiangang Ya, "An improvement of TFIDF weighting in text categorization," 2012 International Conference on Computer Technology and Science (ICCTS 2012), IPCSIT vol. 47 (2012), IACSIT Press, Singapore. DOI: 10.7763/IPCSIT.2012.V47.9
[16] Suvidha, Ravikishan, "A study on the Architecture for Text Categorization and Summarization," International Journal of Computer Trends and Technology, vol. 3, Issue 4, 2012.
[17] N. Li and T. Li, "T-closeness: Privacy beyond K-Anonymity and L-Diversity," IEEE 23rd Int'l Conf. Data Eng., 2010.