Trust, Influence, and Noise: Implications for Safety Surveillance

Bill Rand
Asst. Prof. of Marketing and Computer Science
Director of the Center for Complexity in Business
http://www.rhsmith.umd.edu/ccb/
Data Science
• Data
> Large and rich sources of data of all types
> Social media, GIS, loyalty cards, CRM, Open-source
mainstream media
• Science
> Developing theories of how and why people interact
> Hypothesis creation, First principles of consumer behavior
• Storytelling
> Explaining the science of the data to others
> Analysis, Visualization, Modeling, Simulation
SOCIAL LUMAscape
[Figure: SOCIAL LUMAscape, © LUMA Partners LLC 2015. A map of social media
company categories positioned between MARKETER and CONSUMER: Social Marketing
Management, Stream Platforms, URL Shorteners, Social Publishing Platforms,
Twitter Apps, Content Curation, Analytics, Facebook Apps, Social Branded Video,
Social Promotion Platforms, Social Advertising Platforms, Social Ad Networks,
Social Intelligence, Social Commerce Platforms, Advocate Platforms,
Social Networks - Other, Social TV, Social/Mobile Apps & Games,
Social Business Software, Social Search & Browsing, Social Data, Social Scoring,
Blogging Platforms, Facebook Gaming, Social Shopping,
Content Sharing (Reviews/Q&A/Docs), Image/Video Sharing, Social Referral,
Social Content & Forums, Community Platforms, Social Login/Sharing,
Traditional Publishers, Gamification. Other labels: External (Customer) Facing,
Internal (Employee) Facing. Legend: denotes acquired company; denotes shuttered
company.]
Cutting through the Noise
• Opportunity: Social Media is a great marketing channel.
• Challenge: However, there is a lot of noise, and it's not
apparent which users we should be paying attention to for
monitoring.
• Solution: Identify properties that are indicative of future
conversations.
Influence
• Influential users are
ones who are able to
reach a lot of users
quickly with their
messaging.
• How do you identify
influentials?
Trust
• Trust is a measure of
how much one user
believes the content of
another user.
• How does trust evolve
on social media?
• Does understanding
trust help you in
modeling
conversations?
Different Methods for Identification
• Baseline – How many messages do they generate?
• Past Scores – How many conversations have they created before?
• Static – How many friends?
• Dynamic – What are the dynamics of conversations?
[Figure: AUC versus Threshold in three panels, one per classifier: Random
forest, Deep learning, SVM. Each panel compares six specifications: Static,
Dynamic, Baseline, Baseline+static, Baseline+dynamic, Past scores.]
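The comparison of specifications by AUC can be illustrated with a minimal sketch. It is not the classifiers or data from the slides: the `users` rows are made-up toy values, the names `auc`, `baseline_auc` and `static_auc` are hypothetical, and each single feature is used directly as a ranking score rather than being fed to a random forest, deep network, or SVM. The sketch only shows what an AUC number means: the probability that a user who later starts a conversation is ranked above one who does not.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption, not from the talk):
# (messages_sent, friend_count, started_a_conversation_later)
users = [
    (120, 300, 1),
    (15, 2000, 0),
    (80, 150, 1),
    (5, 40, 0),
    (90, 900, 0),
    (200, 50, 1),
]
labels = [u[2] for u in users]

# "Baseline" feature: how many messages a user generates.
baseline_auc = auc([u[0] for u in users], labels)
# "Static" feature: how many friends a user has.
static_auc = auc([u[1] for u in users], labels)

print(f"baseline AUC: {baseline_auc:.3f}")
print(f"static AUC:   {static_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking, so a specification is only useful for picking users to monitor if it sits clearly above that level.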
Identifying Trends on Social Media
• To identify trends,
you need to establish
a baseline, but how
do you establish that
baseline?
• What matters?
– Subject
– Geography
– Time
Sandy - "Near NYC" Most Common (TF/IDF) Terms
hurricane
hurricanesandy
frankenstorm
storm
nyc
apocalypse
ny
food
water
Sandy - "Far NYC" Most Common (TF/IDF) Terms
hurricane
hurricanesandy
storm
coast
weather
hit
beach
rain
wind
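As a rough illustration of how TF/IDF term rankings like the ones above can be produced, here is a minimal sketch. The three short documents are invented stand-ins for a "near" group, a "far" group and a background group of tweets, and the weighting used (term frequency times log of N over document frequency) is one common variant; the exact weighting and background corpus behind the slides are not stated, so both are assumptions here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical toy corpus (assumption, not the Sandy data set).
names = ["near", "far", "background"]
docs = [
    "hurricane storm nyc water food nyc",
    "hurricane storm coast beach wind",
    "coffee weather traffic storm",
]

scores = tf_idf(docs)
for name, doc_scores in zip(names, scores):
    top = sorted(doc_scores, key=doc_scores.get, reverse=True)[:3]
    print(name, top)

top_near = max(scores[0], key=scores[0].get)
```

A term that appears in every document, such as "storm" in this toy corpus, gets a score of zero, which is why the choice of baseline corpus determines which terms stand out.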
Inferring Geolocation in Social Media Data
• Geolocation in social media can be inferred from three different types of
data:
– Geoencoded Data
– User-described Location
– Ambient Geography
• Ambient Geography is the use of references in natural language text to
help determine the location being referenced
• We are developing a Bayesian modeling framework to constantly update a
user’s most probable location based on their social media activity
• Among the many benefits, we plan to use this tool to help verify the
accuracy of social media content, since the proximity of a user to an
event can help assess their credibility
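The idea of repeatedly updating a user's most probable location can be sketched with a plain Bayes update over a small set of candidate places. This is only an illustration under stated assumptions, not the framework described above: the candidate locations, the likelihood values, and the function name `update_location` are all hypothetical, and each piece of evidence is treated as independent given the location.

```python
def update_location(prior, likelihood):
    """One Bayes step: posterior(loc) is proportional to prior(loc) * P(evidence | loc)."""
    unnormalized = {loc: prior[loc] * likelihood.get(loc, 0.0) for loc in prior}
    total = sum(unnormalized.values())
    if total == 0:
        # Evidence is impossible under every candidate; keep the prior unchanged.
        return dict(prior)
    return {loc: p / total for loc, p in unnormalized.items()}


# Start from a uniform prior over hypothetical candidate locations.
belief = {"NYC": 1 / 3, "Boston": 1 / 3, "Miami": 1 / 3}

# Hypothetical likelihoods P(evidence | location) for two observations:
evidence = [
    {"NYC": 0.6, "Boston": 0.3, "Miami": 0.1},  # user-described location in profile
    {"NYC": 0.7, "Boston": 0.2, "Miami": 0.1},  # ambient reference in a post
]

for likelihood in evidence:
    belief = update_location(belief, likelihood)

best = max(belief, key=belief.get)
print(best, round(belief[best], 3))
```

Each new post supplies another likelihood, so the belief can be refreshed as activity arrives, and the resulting probability of being near an event could serve as one input when weighing a user's credibility.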
Challenges and Opportunities
• Challenges
– We need better methods to automatically assess the quality and
impact of social media content
– The failure of Google Flu Trends indicates that the solution is not in big
data analysis unguided by theory
– There is a selection bias in who uses social media to talk about
health, and we need to account for this bias
• Opportunities
– These tools will have more resolution as we move into the future
– New methods of filtering and content analysis will improve the overall
results
– Combining multiple signals about quality of content will improve
surveillance
• In the end, we need to cut through the noise
Thanks!
Questions?
wrand@umd.edu
@billrand
ter.ps/ccb
ter.ps/ccbssrn