Final Projects
CIS 6930.007/SYA 6933.904
Spring 2011
What you may take away
Interdisciplinary collaboration experience
Team work experience
Learning about the other field
The seed work of a publishable result
– In CS only: a plethora of good-quality peer-reviewed conferences on social network topics
• More significant results can be obtained by asking the
right SNA questions.
• Fun!
This is an experiment for all of us!
• Please communicate well with your team and
with us to make sure things go well
– Progress
– Useful/pleasant experience
Project selection
• Significant components of SNA and
parallel/distributed computing contribution
– A program that runs for an hour on your laptop
perhaps does not deserve the effort to be written
in parallel
• Unless it’s an online service with “real time”
– Can you contribute with SNA knowledge?
• Dataset available or possible to collect
– CS students: check APIs (and Terms of Use?)
One way to reason about parallelism:
Data vs. Task Parallelism
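To make the distinction concrete, here is a minimal sketch in Python. The toy graph and all function names are made up for illustration. Threads are used only to keep the sketch short; for CPU-bound work in CPython a process pool or a cluster framework would likely be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy undirected graph as an adjacency dict (made-up data for illustration).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

def degree(node):
    """Per-node work: the same function is applied to every node."""
    return node, len(GRAPH[node])

def count_edges():
    """Whole-graph task: each undirected edge appears twice in the dict."""
    return sum(len(nbrs) for nbrs in GRAPH.values()) // 2

def count_triangles():
    """Whole-graph task: count each triangle once, as u < v < w."""
    total = 0
    for u in GRAPH:
        for v in GRAPH[u]:
            if u < v:
                total += sum(1 for w in GRAPH[u] & GRAPH[v] if v < w)
    return total

def data_parallel_degrees(workers=2):
    # Data parallelism: one operation, many data items split across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(degree, GRAPH))

def task_parallel_stats(workers=2):
    # Task parallelism: different operations run concurrently on the same data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        edges = pool.submit(count_edges)
        triangles = pool.submit(count_triangles)
        return {"edges": edges.result(), "triangles": triangles.result()}
```

Many SNA measures that are computed independently per node or per pair fit the data-parallel pattern; a pipeline of distinct analyses over one dataset fits the task-parallel pattern.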
What’s next
• In this class: you’ll talk with potential teams
• In 2 weeks: your team submits a 1000-word project
proposal that includes:
Dataset description (or how to collect it)
Why appropriate for parallel computations
(Rough ideas on how to parallelize the code)
Responsibilities for each team member
• In-class team meetings with respective professors
• …
Project 1
• For a set of real networks (some social, some
biological or technological), compute the
correlation between edge betweenness for
every pair of nodes and the overlap between
their neighborhoods.
• Datasets are available:
– Most are applicable
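One possible reading of Project 1, sketched in Python under stated assumptions: the graph is unweighted and undirected with no self-loops, the comparison is restricted to pairs joined by an edge (edge betweenness is defined on edges), and neighborhood overlap is taken as shared neighbors divided by all other neighbors of the two endpoints. The toy graph and function names are hypothetical. The outer loop over source nodes is independent per source, which is what makes the computation a natural candidate for parallelization.

```python
from collections import deque
from math import sqrt

# Toy graph (assumption): two triangles joined by the bridge c-d.
TWO_TRIANGLES = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

def edge_betweenness(adj):
    """Brandes-style edge betweenness, one BFS per source node."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:  # each source is independent of the others
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate from the farthest nodes back
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair (s, t) was counted from both ends.
    return {edge: total / 2.0 for edge, total in score.items()}

def neighborhood_overlap(adj, u, v):
    """Shared neighbors over all neighbors of u or v, excluding u and v."""
    union = (adj[u] | adj[v]) - {u, v}
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def pearson(xs, ys):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else float("nan")

def betweenness_overlap_correlation(adj):
    """Correlate edge betweenness with neighborhood overlap over all edges."""
    eb = edge_betweenness(adj)
    edges = list(eb)
    overlaps = [neighborhood_overlap(adj, *tuple(e)) for e in edges]
    return pearson([eb[e] for e in edges], overlaps)
```

On the toy graph the bridge has the highest betweenness and zero overlap, so the correlation comes out negative.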
Project 2
• Path structure and strength of tie in the
legislature data (variant of forbidden triad). The
basic idea is that the greater the number of paths
of length 2 connecting a pair of nodes and the
stronger the ties are in these paths, the more
likely it is that the pair is connected by a tie ...
• Datasets:
– Legislature data
– Other weighted graphs may be available
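A rough sketch of the quantity behind Project 2, in Python. The weighted graph below is invented, not the legislature data, and scoring each two-step path by its weaker tie (the minimum of the two weights) is an assumption made for illustration; other choices such as the product or the sum are equally plausible. Each pair is scored independently, so the pair loop could be split across workers.

```python
from itertools import combinations

# Hypothetical symmetric weighted graph: TIES[u][v] is the strength of tie u-v.
TIES = {
    "A": {"B": 3, "C": 2},
    "B": {"A": 3, "C": 4},
    "C": {"A": 2, "B": 4, "D": 1},
    "D": {"C": 1},
}

def two_path_support(w, u, v):
    """Return (number of length-2 paths, summed path strength) between u and v."""
    shared = (set(w[u]) & set(w[v])) - {u, v}
    # Assumption: a path is only as strong as its weaker tie.
    strength = sum(min(w[u][k], w[v][k]) for k in shared)
    return len(shared), strength

def support_by_tie_status(w):
    """Average two-path strength for tied (True) vs. untied (False) pairs."""
    totals = {True: [0.0, 0], False: [0.0, 0]}
    for u, v in combinations(sorted(w), 2):
        _, strength = two_path_support(w, u, v)
        tied = v in w[u]
        totals[tied][0] += strength
        totals[tied][1] += 1
    return {tied: (s / n if n else 0.0) for tied, (s, n) in totals.items()}
```

If the hypothesis holds in a dataset, tied pairs should show higher average support than untied pairs; a real analysis would test this statistically rather than compare two means.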
Project 3
• Compare the centrality measures of different
nodes in a Facebook social graph to understand
the topology and the position of a node in the
graph. See the SNA course slide comparing high degree
with low closeness, which suggests many
ties in a cluster on the edge of the network.
• Datasets:
– Facebook
– Others as well
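The high-degree, low-closeness case can be illustrated with a small Python sketch. The graph is a made-up example (a 4-clique attached to the end of a chain), it is assumed to be connected and undirected, and using the mean as the cutoff for "high" and "low" is an arbitrary choice for illustration. Closeness needs one breadth-first search per node, and those searches are independent of each other.

```python
from collections import deque

# Made-up example: clique {p, q, r, s} attached to the chain s-x-y-z-u-v.
CLIQUE_ON_CHAIN = {
    "p": {"q", "r", "s"}, "q": {"p", "r", "s"}, "r": {"p", "q", "s"},
    "s": {"p", "q", "r", "x"},
    "x": {"s", "y"}, "y": {"x", "z"}, "z": {"y", "u"},
    "u": {"z", "v"}, "v": {"u"},
}

def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def closeness_centrality(adj):
    """Reachable nodes divided by total distance to them (one BFS per node)."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

def high_degree_low_closeness(adj):
    """Nodes with at-least-average degree but below-average closeness."""
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    mean_deg = sum(deg.values()) / len(deg)
    mean_clo = sum(clo.values()) / len(clo)
    return {v for v in adj if deg[v] >= mean_deg and clo[v] < mean_clo}
```

Here the clique members p, q, and r have many ties but sit at the edge of the network, while x has fewer ties and is closer to everyone else.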
Project 4
• Propose a hypothesis to test, collect data from
a relevant online source, and evaluate the
• Already proposed in Projects 8, 10, 12
Project 5
• Confirm or refute a result from social network
analysis (for example, Friedkin's
horizon of observability or the forbidden triad
hypothesis) using much larger and mediated
social networks (e.g., online social networks or
other content producing/sharing systems). An
overarching question for your investigation
could be: do mediated networks have
different characteristics than what is accepted
in traditional SNA?
Project 6
• Identifying ring voters in a community such as or The original problem
was posed as a job interview question
jobs/). The problem, however, is relevant for
other contexts, as well, such as eBay or rating
posts in any product review listing.
Project 7 (Shankar Prawesh)
• A study of social distance among
Information Systems (IS) researchers during the
period 1980-2010. Our analysis will be based
on premier journals published in this stream. This
period covers almost all of the major
development period in the field of information
systems and computer technologies.
• Notes:
– Is there a large enough dataset to make use of parallel
Project 8 (Ginger Johnson)
• This project seeks to analyze the role of social media
networks in contemporary political events in North Africa
and the Middle East.
• Datasets:
– Twitter dataset available (collected until 2009) – useful for
testing code while the newer data gets collected
– Twitter API available for collecting data
• A recent paper that analyzes the communication in Twitter:
What is Twitter, a Social Network or a News Media?
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue
Moon. Proceedings of the 19th International World Wide
Web (WWW) Conference, April 26-30, 2010, Raleigh NC
Project 9 (Oz Cimenler)
• The goal of this project is to contribute to our understanding
of how individuals collaborate within social networks,
in order to measure network effectiveness. One network
outcome that can serve as a viable alternative to
direct measurement of effectiveness is network
innovation. Homophilous networks encourage the
spread of innovation, but heterophilous network
connections provide unique opportunities for access to
innovation. Focusing especially on value homophily, we
will generate a network showing the flow of innovation
among USF researchers.
• Notes:
– Computationally challenging for parallel computation?
Project 10 (Amy Connolly)
• The Stanford SNAP database lists Amazon co-purchases from 2003,
connecting items as "people who bought item i also bought item j".
– First, can we collect current data and compare it
to the 2003 data (and/or, compare the 2003 data
from different time periods)?
– Then, does this network reflect the rich-get-richer
phenomenon (i.e., Barabási & Albert's scale-free
properties including preferential attachment)?
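One simple way to look for a rich-get-richer signal across two snapshots is sketched below in Python. The edge lists are invented placeholders, not SNAP data, and the sketch treats co-purchase links as undirected, which is a simplifying assumption. It groups items by their degree in the older snapshot and reports the average number of links gained by the newer one.

```python
from collections import Counter, defaultdict

# Invented placeholder snapshots: each edge is a pair of item identifiers.
OLD_EDGES = [("i1", "i2"), ("i1", "i3"), ("i1", "i4")]
NEW_EDGES = OLD_EDGES + [("i1", "i5"), ("i1", "i6"), ("i2", "i7")]

def degrees(edges):
    """Count how many links touch each item."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(edges):
    """Map degree k to the number of items with exactly k links."""
    return dict(Counter(degrees(edges).values()))

def mean_gain_by_old_degree(old_edges, new_edges):
    """Average degree gain between snapshots, grouped by old degree."""
    old, new = degrees(old_edges), degrees(new_edges)
    gains = defaultdict(list)
    for item, k in old.items():
        gains[k].append(new[item] - k)  # a Counter returns 0 for missing items
    return {k: sum(g) / len(g) for k, g in sorted(gains.items())}
```

Under preferential attachment the average gain should grow with the old degree. A heavy-tailed degree distribution alone does not establish the mechanism, so comparing snapshots in this way is a useful complement.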
Project 11 (Richard Salkowe)
• There is an extensive online database for
disaster declaration requests from 1953-2004
including awards and denials. There is also an
extensive online database related to voting
patterns of congressional representatives,
committee appointments, tenure, and party
affiliation. A Social Network Analysis of these
relationships may reveal potential tendencies
for disaster awards versus denials based on
network ties.
Project 12 (Jeremy Blackburn)
• Analysis of a service rating dataset. The
dataset is a bipartite graph: service providers
and customers (the service providers are
never customers). The project consists of large-scale
data collection, feature extraction, and
Project 13 (Larry Moore)
• Use the Internet Movie Database
( to infer relationships among