CS 8803 AIA
Project Proposal
CS 8803 AIA (Advanced Internet Application)
Project Proposal: Mining Social Networks
Project Members: Abhishek Saxena, Ankit Kharadi, Chirag Rajan.
Motivation
With the exponential growth in the number of users of social networks and online communities,
researchers have begun to explore and investigate the vast amount of information they
contain. So far there has been a tremendous interest in understanding the characteristics
of communities – how they form, what causes a community to grow or shrink in size and
why some communities are more active than others. For our project we aim to gather
user data on social networks (such as user profiles) and then apply statistical learning
techniques to gain new insight into aspects such as community dynamics and user
behavior. Our objective is twofold: first, to show the scope of
classification/prediction algorithms in social networks and, second, to study which
of these algorithms are well suited to this purpose. We aim to see whether we can
provide useful insight into social network behavior. As users of Google will know, its
AdSense feature is effective at placing advertisements relevant to a user's interests.
Our project tries to use similar techniques to classify users into relevant
groups. A simple example of how a classification algorithm could be useful here is to
examine a user's profile and determine whether he or she is a good fit for a community
that goes hiking on weekends. If so, we could recommend this community to the user so
that he or she may join.
Objectives
As frequent users of social networking sites, we feel that the interactivity of these
sites could be improved. Today's social networking sites offer a vast amount of
information, but users are expected to explore the myriad connections and find the
links themselves. Through our project we hope to discover some of these links and make
users aware of them, rather than leaving users to do their own research. As another
example, if a user's profile says that he or she is a senior in college, then it can be
inferred with high probability that the user is looking for a job, so the system
could categorize that user as a likely good fit for a community
related to job searches.
We would like our project to serve as a basis for future implementations. For
example, if users want to mine social network data in the future, we hope our project
will provide useful insight into which algorithms to use.
Georgia Institute of Technology
Proposed Work
The implementation roadmap for the project essentially consists of
gathering social network data, possibly from varied sources, with the requirement that
the data be rich enough to support interpretations beyond what is apparent from a
typical social network user's perspective. We intend to apply state-of-the-art
classification/prediction algorithms to the gathered data, as the task requires.
The techniques will include Support Vector Machines, Bayesian
methods, etc. Comparing these techniques is expected to yield insights from several
viewpoints. First, successful application of the techniques would point to the
wealth of information that can be gained and may offer new perspectives on the field of
social network data mining. Second, if one algorithm turns out to perform
consistently better than its peers on this data, that provides clues about the nature of
the data and hence suggests directions for further exploration. Additionally, from a
machine learning research perspective, testing the techniques in a novel domain can be
valuable, as it might help expose limitations of the existing approaches.
As for the details of the implementation roadmap, we intend
to use relevant APIs, such as the Facebook API, that allow us to obtain the underlying
social network graph structure and/or access individual users' information. Another
aim of the project, apart from those stated above, is to identify
important classes of data that allow subtle inferences to be made; this is another area
we plan to look into. The metrics that the team plans to use for evaluating
algorithm performance include:
1. Number of training examples required
2. Classification/Generalization accuracy
3. Overfitting
4. Performance with large feature spaces
5. Versatility across different classes of data
6. Algorithm-specific issues such as minimization of errors, ability to model feature
dependence, predictive power, etc., and an assessment of how far these capabilities
prove useful in the considered context.
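As a rough illustration of how the first three metrics above might be measured together, the following sketch computes a small learning curve: accuracy on the training subset and on held-out data for several training-set sizes. Everything in it is an assumption made for illustration, not part of the proposal: the synthetic two-feature "users", the label meaning, the helper names, and the choice of a plain k-nearest-neighbors classifier as the model under test.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among the k nearest training points."""
    # `train` is a list of (feature_vector, label) pairs; distance is squared Euclidean.
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, examples, k=3):
    """Fraction of `examples` whose label the k-NN classifier predicts correctly."""
    correct = sum(1 for x, y in examples if knn_predict(train, x, k) == y)
    return correct / len(examples)


def make_synthetic_users(n, rng):
    """Hypothetical data: two numeric profile features; label 1 means 'joins the group'."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label else 0.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data


def learning_curve(train, test, sizes, k=3):
    """Return (size, training accuracy, held-out accuracy) for each training-set size."""
    rows = []
    for n in sizes:
        subset = train[:n]
        # Training accuracy scores the subset against itself (each point is its own neighbor).
        rows.append((n, accuracy(subset, subset, k), accuracy(subset, test, k)))
    return rows


rng = random.Random(0)
train_set = make_synthetic_users(200, rng)
test_set = make_synthetic_users(100, rng)
curve = learning_curve(train_set, test_set, sizes=[10, 50, 200], k=3)
for n, train_acc, test_acc in curve:
    # A large gap between the two accuracies is one simple symptom of overfitting.
    print(f"n={n:3d}  train={train_acc:.2f}  held-out={test_acc:.2f}  gap={train_acc - test_acc:+.2f}")
```

Reading the rows from top to bottom shows how many training examples the classifier needs before held-out accuracy levels off, and the gap column gives a crude overfitting indicator. The same loop could be repeated for each algorithm being compared.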
Related Work
A glance at relevant work on applying machine learning
techniques to social networks reveals two main directions in the
ongoing research. Some approaches aim to detect the existence of connected
networks of users that qualify as social networks and cliques, while others
analyze pre-existing social networks to model their characteristics or
make predictions. Our work relates more to the latter. Active research
projects in this area include the MIT Reality Mining project, which
aims to use stochastic techniques to learn models of user behavior in
social networks; these models are in turn expected to allow predicting what a user or a
specific group of users is likely to do next. Several other projects, including work at
UMass and Cornell, identify potential challenges in social network data mining and
suggest customized approaches for handling relational data.
Plan of action
From an architecture point of view, we plan to divide the system into four
modules:
1. FQL engine: FQL is the query language we will use to extract data from a user's
profile.
2. Front End/GUI: This will be the user interface, which will inform users of the
classification results produced by the algorithms.
3. Database: All the information extracted using FQL will be stored in a relational
database.
4. Algorithms: We will use open source libraries to implement the various
classification algorithms.
In order to implement this project, we will use the following languages/tools:
Facebook API: To extract the relevant data, we plan to use the Facebook API.
The Facebook API has a proprietary query language, FQL, which will allow us to query a
user's profile. Once we have this profile information, we will store it in database
tables. We will then apply classification algorithms to the stored data to mine it for
patterns. Tentatively, we have chosen to focus only on the Facebook social network
because its readily available APIs make it easy to extract the data.
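The storage step described above can be sketched roughly as follows. This is only an illustration under several assumptions: the profile records are hand-written stand-ins for whatever a profile query would return, the field names (uid, education, interests) and the table name user_profile are hypothetical, and an in-memory sqlite3 database stands in for the MySQL database the proposal names, so that the example runs without a server.

```python
import sqlite3

# Hypothetical profile records, shaped like what a profile query might return.
# The field names (uid, education, interests) are assumptions for illustration.
profiles = [
    {"uid": 1, "education": "college senior", "interests": "hiking, camping"},
    {"uid": 2, "education": "graduate student", "interests": "chess, reading"},
]

# sqlite3 (in memory) stands in for the MySQL database named in the proposal.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_profile ("
    "uid INTEGER PRIMARY KEY, education TEXT, interests TEXT)"
)
conn.executemany(
    "INSERT INTO user_profile (uid, education, interests) "
    "VALUES (:uid, :education, :interests)",
    profiles,
)
conn.commit()


def load_feature_rows(connection):
    """Read stored profiles back as (uid, set of interest tokens) pairs."""
    rows = connection.execute(
        "SELECT uid, interests FROM user_profile ORDER BY uid"
    )
    return [
        (uid, {token.strip() for token in interests.split(",")})
        for uid, interests in rows
    ]


features = load_feature_rows(conn)
print(features)
```

Keeping the raw profile text in the table and deriving features on read, as load_feature_rows does here, is one possible design; it lets the feature extraction change without re-querying the social network.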
We have decided to use classification algorithms such as k-nearest neighbors,
Support Vector Machines (SVMs), Bayesian methods, etc., as stated earlier. Several
open source libraries, such as LibSVM, SVMLight, and JaHMM, are available for
implementing classifiers based on statistical models.
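To make the idea of a Bayesian classifier over profile data concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written in plain Python rather than with any of the libraries listed above. The training profiles, the labels "hiking_group" and "other", and the class name NaiveBayes are all hypothetical and chosen only to mirror the hiking-community example from the motivation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over bags of tokens, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        """Return the unnormalized log posterior of each class for a bag of tokens."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.token_counts[label].values()) + vocab_size
            for token in tokens:
                if token in self.vocabulary:  # ignore tokens never seen in training
                    score += math.log((self.token_counts[label][token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical training data: interest tokens from profiles and a community label.
train_profiles = [
    ["hiking", "camping", "photography"],
    ["trail", "running", "hiking"],
    ["chess", "reading", "movies"],
    ["movies", "gaming", "reading"],
]
train_labels = ["hiking_group", "hiking_group", "other", "other"]

model = NaiveBayes().fit(train_profiles, train_labels)
print(model.predict(["camping", "hiking", "music"]))  # prints "hiking_group"
```

A library-backed SVM or k-nearest-neighbors classifier could be swapped in behind the same fit/predict shape, which would make the side-by-side comparison of algorithms easier to organize.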
The database we will use to store the data will be MySQL.
The timelines we have set for the project so far are as follows:
Milestone                                  Date
Project proposal                           Feb 15
Feedback from proposal                     Feb 22
Studying of classification algorithms      Feb 28
Design of modules (Query Module,           March 7
  Front End/GUI, Database, Algorithms)
Implementation                             March 31
Testing of application                     April 7
Project report and presentation            April 14
Evaluation and Testing Methods
For evaluation, we plan to favor qualitative measures; that is, we would
like to emphasize the usefulness of the project work rather than an examination
of the quantitative accuracy of the resulting implementations. Our approach to
evaluation will therefore centre on interpreting our findings. In addition to the
algorithm-specific analysis discussed in the section on proposed work, we intend to
compare our results with those obtained in ongoing and previous
research, and to provide explanations where our results diverge from the established
ones. A particularly useful characteristic of this kind of work
may be the various ways in which it can complement related endeavors.
Another approach that can be used here is to elicit
feedback from users of a social network. Consider an application along the lines of our
focus that suggests interesting communities to a user based on her profile data and
activities: user feedback is one of the most useful clues to the system's performance
and utility.
Bibliography
1. Alessandro Acquisti and Ralph Gross (2006), Imagined Communities: Awareness,
Information Sharing, and Privacy on the Facebook, Carnegie Mellon University
2. Christopher J. C. Burges, A Tutorial on Support Vector Machines for Pattern
Recognition
3. David Jensen and Jennifer Neville, Data Mining in Social Networks
4. N. Eagle and A. Pentland (2006), Reality Mining: Sensing Complex Social
Systems, Personal and Ubiquitous Computing
5. N. Eagle, A. Pentland, and D. Lazer (2007), Inferring Social Network
Structure using Mobile Phone Data
6. P. Koutsourelakis (2007), Unsupervised Group Discovery and Link Prediction
in Relational Datasets: A Nonparametric Bayesian Approach, Technical Report
UCRL-TR-230743, Lawrence Livermore National Laboratory
7. Lafferty et al., Conditional Random Fields: Probabilistic Models for
Segmenting and Labeling Sequence Data
8. Hady W. Lauw et al., Mining Social Network from Spatio-Temporal Events
9. Brian Skyrms and Robin Pemantle, A Dynamic Model of Social Network
Formation, School of Social Sciences, University of California, Irvine
10. Tanzeem Choudhury (2003), Sensing and Modeling Human Networks, PhD
Thesis, MIT