Final Projects
CIS 6930.007/SYA 6933.904
Spring 2011

What you may take away
• Interdisciplinary collaboration experience
• Team work experience
• Learning about the other field
• The seed work of a publishable result
  – In CS only: a plethora of good-quality peer-reviewed conferences on social network topics
• More significant results can be obtained by asking the right SNA questions.
• Fun!

This is an experiment for all of us!
• Please communicate well with your team and with us to make sure things go well
  – Progress
  – Useful/pleasant experience

Project selection
• Significant components of SNA and parallel/distributed computing contribution
  – A program that runs for an hour on your laptop probably does not justify the effort of writing it in parallel
    • Unless it’s an online service with “real time” requirements.
  – Can you contribute with SNA knowledge?
• Dataset available or possible to collect
  – CS students: check APIs (and Terms of Use?)

One way to reason about parallelism: Data vs. Task Parallelism

What’s next
• In this class: you’ll talk with potential teams
• In 2 weeks: your team submits a 1000-word project proposal that includes:
  – Objectives
  – Dataset description (or how to collect it)
  – Why appropriate for parallel computations
  – (Rough ideas on how to parallelize the code)
  – Responsibilities for each team member
• In-class team meetings with respective professors (02/28)
• …

Project 1
• For a set of real networks (some social, some biological or technological), compute the correlation between the edge betweenness of every pair of connected nodes and the overlap between their neighborhoods.
• Datasets are available:
  – Most are applicable

Project 2
• Path structure and strength of tie in the legislature data (a variant of the forbidden triad). The basic idea is that the greater the number of paths of length 2 connecting a pair of nodes, and the stronger the ties in these paths, the more likely it is that the pair is connected by a tie ...
• Datasets:
  – Legislature data
  – Other weighted graphs may be available

Project 3
• Compare the centrality measures of different nodes in a Facebook social graph to understand the topology and the position of a node in the graph. See the SNA course slide on the comparison of high degree vs. low closeness -> many ties in a cluster at the edge of the network.
• Datasets:
  – Facebook
  – Others as well

Project 4
• Propose a hypothesis to test, collect data from a relevant online source, and evaluate the hypothesis.
• Already proposed in Projects 8, 10, 12

Project 5
• Confirm or refute a result from social network analysis (for example, Friedkin's horizon of observability or the forbidden triad hypothesis) using much larger, mediated social networks (e.g., online social networks or other content producing/sharing systems). An overarching question for your investigation could be: do mediated networks have different characteristics than what is accepted in traditional SNA?

Project 6
• Identifying ring voters in a community such as reddit.com or digg.com. The original problem was posed as a job interview question (http://www.thesixtyone.com/#/info/settings/jobs/). The problem is, however, also relevant in other contexts, such as eBay or rating posts in any product review listing.

Project 7 (Shankar Prawesh)
• Study social distance among Information Systems (IS) researchers during the period 1980-2010. Our analysis will be based on premier journals published in this stream. This period covers almost the entire major development period of the field of information systems and computer technologies.
• Notes:
  – Is there a large enough dataset to make use of parallel computing?

Project 8 (Ginger Johnson)
• This project seeks to analyze the role of social media networks in contemporary political events in North Africa and the Middle East.
• Datasets:
  – Twitter dataset available (collected until 2009); useful for testing code while the newer data gets collected
  – Twitter API available for collecting data
• A recent paper that analyzes the communication in Twitter: What is Twitter, a Social Network or a News Media? Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. Proceedings of the 19th International World Wide Web (WWW) Conference, April 26-30, 2010, Raleigh NC (USA)

Project 9 (Oz Cimenler)
• The goal of this project is to contribute to our understanding of how individuals collaborate within social networks, in order to measure network effectiveness. One network outcome that can be a viable alternative to direct measurement of effectiveness is network innovation. Homophilous networks encourage the spread of innovation, but heterophilous network connections provide unique opportunities for access to innovation. Focusing especially on value homophily, we will generate a network showing the flow of innovation among researchers at USF.
• Notes:
  – Computationally challenging enough to warrant parallel computation?

Project 10 (Amy Connolly)
• The Stanford SNAP database lists Amazon co-purchases connecting "people who bought item i also bought item j" from 2003.
  – First, can we collect current data and compare it to the 2003 data (and/or compare the 2003 data from different time periods)?
  – Then, does this network reflect the rich-get-richer phenomenon (i.e., Barabasi & Albert's scale-free properties, including preferential attachment)?

Project 11 (Richard Salkowe)
• There is an extensive online database of disaster declaration requests from 1953-2004, including awards and denials. There is also an extensive online database related to voting patterns of congressional representatives, committee appointments, tenure, and party affiliation. A Social Network Analysis of these relationships may reveal potential tendencies for disaster awards versus denials based on network ties.

Project 12 (Jeremy Blackburn)
• Analysis of a service rating dataset. The dataset is a bipartite graph: service providers and customers (the service providers are never customers). The project consists of large-scale data collection, feature extraction, and calculation.

Project 13 (Larry Moore)
• Use the Internet Movie Database (IMDB.org) to infer relationships among actors.
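
One possible reading of Project 13, sketched in Python: link two actors whenever they appear in the same movie, and weight the link by the number of movies they share. The toy cast data, the names `casts`, `pairs_for_movie`, and `co_appearance`, and the choice of co-appearance as the inferred relationship are all assumptions made for illustration; the project statement does not specify any of them. The per-movie step reads only one cast list, so it is the kind of work that could be split across workers in a data-parallel fashion.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: movie title -> cast list (not real IMDb data).
casts = {
    "Movie A": ["Actor 1", "Actor 2", "Actor 3"],
    "Movie B": ["Actor 2", "Actor 3"],
    "Movie C": ["Actor 1", "Actor 4"],
}


def pairs_for_movie(cast):
    """Per-movie step: count each unordered actor pair in one cast once."""
    # Sorting makes (a, b) and (b, a) the same key; set() drops duplicates.
    return Counter(combinations(sorted(set(cast)), 2))


def co_appearance(casts):
    """Merge step: sum the per-movie pair counts into edge weights."""
    weights = Counter()
    for cast in casts.values():
        weights.update(pairs_for_movie(cast))
    return weights


edges = co_appearance(casts)
for (a, b), shared_movies in sorted(edges.items()):
    print(a, b, shared_movies)
```

In this toy input, Actor 2 and Actor 3 share two movies, so their edge gets weight 2, while every other pair gets weight 1. With real data, the loop in `co_appearance` is the natural place to distribute cast lists across processes and merge the partial counters afterwards.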