Trust, Influence, and Noise: Implications for Safety Surveillance

Bill Rand
Asst. Prof. of Marketing and Computer Science
Director of the Center for Complexity in Business

Data Science
http://www.rhsmith.umd.edu/ccb/
• Data
  > Large and rich sources of data of all types
  > Social media, GIS, loyalty cards, CRM, Open-source mainstream media
• Science
  > Developing theories of how and why people interact
  > Hypothesis creation, First principles of consumer behavior
• Storytelling
  > Explaining the science of the data to others
  > Analysis, Visualization, Modeling, Simulation

[Figure: SOCIAL LUMAscape, a map of social media companies arranged by category between MARKETER and CONSUMER. Category labels include Social Networks, Blogging Platforms, Social Advertising Platforms, Social Intelligence, Social Commerce Platforms, Social Data, Social Scoring, Gamification, and Traditional Publishers. Legend: denotes acquired company, denotes shuttered company. © LUMA Partners LLC 2015]

Cutting through the Noise
• Opportunity: Social media is a great marketing channel.
• Challenge: However, there is a lot of noise, and it's not apparent which users we should be paying attention to for monitoring.
• Solution: Identify properties that are indicative of future conversations.

Influence
• Influential users are ones who are able to reach a lot of users quickly with their messaging.
• How do you identify influentials?

Trust
• Trust is a measure of how much one user believes the content of another user.
• How does trust evolve on social media?
• Does understanding trust help you in modeling conversations?

Different Methods for Identification
[Figure: AUC versus threshold (0 to 5000) in three panels (Random forest, Deep learning, SVM), with one curve per specification: Baseline, Past scores, Static, Dynamic, Baseline+static, Baseline+dynamic]
• Baseline: How many messages do they generate?
• Past Scores: How many conversations have they created before?
• Static: How many friends?
• Dynamic: What are the dynamics of conversations?

Identifying Trends on Social Media
• To identify trends, you need to establish a baseline, but how do you establish that baseline?
• What matters?
  – Subject
  – Geography
  – Time

Sandy - "Near NYC"
Most Common (TF/IDF) Terms
hurricane, hurricanesandy, frankenstorm, storm, nyc, apocalypse, ny, food, water

Sandy - "Far NYC"
Most Common (TF/IDF) Terms
hurricane, hurricanesandy, storm, coast, weather, hit, beach, rain, wind

Inferring Geolocation in Social Media Data
• Geolocation in social media can be inferred from three different types of data:
  – Geoencoded Data
  – User-described Location
  – Ambient Geography
• Ambient Geography is the use of references in natural language text to help determine the location being referenced.
• We are developing a Bayesian modeling framework to constantly update a user’s most probable location based on their social media activity.
• Among the many benefits, we plan to use this tool to help verify the accuracy of social media content, since the proximity of a user to an event can help assess their credibility.

Challenges and Opportunities
• Challenges
  – We need better methods to automatically assess the quality and impact of social media content
  – The failure of Google Flu Trends indicates that the solution is
    not in big data analysis unguided by theory
  – There is a selection bias in terms of those who use social media to talk about health; we need to account for this bias
• Opportunities
  – These tools will have more resolution as we move into the future
  – New methods of filtering and content analysis will improve the overall results
  – Combining multiple signals about quality of content will improve surveillance
• In the end, we need to cut through the noise

Thanks! Questions?
wrand@umd.edu
@billrand
ter.ps/ccb
ter.ps/ccbssrn
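
Supplementary sketch: Bayesian location updating
The idea of constantly updating a user's most probable location, mentioned under Inferring Geolocation, could look roughly like the sketch below. This is not the framework described in the talk. The function name update_location_belief, the two candidate locations, and the per-term likelihood values are all hypothetical and invented purely for illustration; a real system would estimate the likelihoods from data.

```python
def update_location_belief(prior, likelihood):
    """One Bayes step: posterior(loc) is proportional to prior(loc) * P(evidence | loc)."""
    unnormalized = {loc: prior[loc] * likelihood.get(loc, 0.0) for loc in prior}
    total = sum(unnormalized.values())
    if total == 0:
        # The evidence is impossible under every candidate; keep the prior unchanged.
        return dict(prior)
    return {loc: p / total for loc, p in unnormalized.items()}


# Hypothetical candidate locations with a uniform prior.
belief = {"near_nyc": 0.5, "far_nyc": 0.5}

# Hypothetical P(term | location) values, NOT taken from the talk.
TERM_LIKELIHOOD = {
    "nyc": {"near_nyc": 0.8, "far_nyc": 0.2},
    "beach": {"near_nyc": 0.3, "far_nyc": 0.7},
}

# Each observed term in the user's posts updates the belief in turn.
for term in ["nyc", "nyc", "beach"]:
    belief = update_location_belief(belief, TERM_LIKELIHOOD[term])

most_probable = max(belief, key=belief.get)
print(most_probable, belief)
```

Each new post acts as one more piece of evidence, so the belief can keep shifting as the user's activity accumulates; the posterior from one step simply becomes the prior for the next.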