Predictive Semantic Social Media Analysis
David A. Ostrowski
System Analytics and Environmental Sciences
Research and Advanced Engineering
Ford Motor Company
Social media
• Influential
• Sample of the web
– News driven
• CRM
– Real-time
– Less biased
• Unique opportunities for analytics
Opportunities
• Old Model
– Reactive
• Damage control
• Inquiries
• Confirm positive reaction
• New Model
– Preemptive
• Focused engagement
– Promotions
– Events
– Media
• Anticipatory
Social Dimensions
• Describes affiliations across a network
• Values / Community
• Reinforced by relationships
• Utilize to predict purchase behavior
Relational Learning
• ‘Birds of a Feather’
• Leverage each local network for semantic understanding
• Relational learning => social dimensions
Framework Overview
• Relational learning
– Strengthen representation
– Support knowledge
• Unsupervised classification
– Generation of dimensions
• Supervised classification
– Dimensions => behavior
[Figure: Fb identifiers linked to values, political affiliations, schools, movies, buying habits, issue positions, television shows, religious views, and associations]
Framework Overview
[Figure: pipeline diagram. Recoverable labels: labels, taxonomy, local network, features, RN classification, k-means cluster, social dimension, higher-level features, supervised classification, behaviors]
Case Study One
• 4,000 Facebook identifiers
• Associations to two vehicle lines
• Question:
– What can we extract to distinguish between these two purchase behaviors?
Relational Learning Step
Facebook Accounts
• Extracted data from FB
• Consolidated interests
• Applied the RN algorithm
• Guided by taxonomy
[Figure: accuracy (0 to 100) versus missing labels (normalized, 45 to 90) for RN, Bayes, and k-means]
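The slides do not give the details of the RN algorithm. As a rough illustration only, the sketch below assumes a simple majority-vote relational neighbor classifier: an account with a missing label takes the most common label among its already-labeled neighbors, repeated until nothing changes. The graph, the account names, and the labels `outdoors` and `motorsport` are hypothetical, and the taxonomy guidance mentioned above is not modeled.

```python
from collections import Counter


def relational_neighbor(edges, known_labels, n_iter=10):
    """Majority-vote relational neighbor sketch (an assumption, not the
    author's implementation): fill in missing labels from neighbors."""
    # Build an undirected adjacency list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = dict(known_labels)  # working copy; known labels are never overwritten
    for _ in range(n_iter):
        updates = {}
        for node in neighbors:
            if node in known_labels:
                continue
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            if votes:
                # Sorting the candidates first makes tie-breaking deterministic.
                updates[node] = max(sorted(votes), key=lambda lab: votes[lab])
        if all(labels.get(node) == lab for node, lab in updates.items()):
            break  # no label changed in this pass
        labels.update(updates)
    return labels


# Hypothetical toy network: two tight groups joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
known = {"a": "outdoors", "b": "outdoors", "e": "motorsport", "f": "motorsport"}
inferred = relational_neighbor(edges, known)
print(inferred["c"], inferred["d"])
```

Updates are applied synchronously, so each pass only sees labels from the previous pass; other RN variants use weighted votes or class probabilities instead of hard labels.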
Preliminary cluster statistics
cluster   k=3          k=4          k=5          k=6
          veh1  veh2   veh1  veh2   veh1  veh2   veh1  veh2
1         46    21     44    14     21    35     7     20
2         39    42     16    27     8     22     43    14
3         13    36     12    24     1     12     6     16
4         -     -      26    32     0.3   15     13    8
5         -     -      -     -      45    14     9     9
6         -     -      -     -      -     -      19    35
Normalized differences between vehicle lines
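Statistics like those above can be derived by clustering the accounts and then counting, per vehicle line, how its accounts spread across the clusters. The sketch below is a minimal illustration under two assumptions the slides do not confirm: that the clustering is plain Lloyd's k-means on numeric feature vectors, and that each entry is the percentage of a vehicle line's accounts falling in a cluster. The feature vectors and the helper names `kmeans` and `cluster_shares` are hypothetical.

```python
import random
from collections import Counter


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_assign = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign


def cluster_shares(assign, vehicle_lines):
    """Percentage of each vehicle line's accounts that fall in each cluster."""
    totals = Counter(vehicle_lines)
    counts = Counter(zip(assign, vehicle_lines))
    return {(cluster, line): 100.0 * n / totals[line]
            for (cluster, line), n in counts.items()}


# Hypothetical feature vectors (e.g. interest scores) and vehicle-line tags.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
vehicle_lines = ["veh1", "veh1", "veh2", "veh2", "veh2", "veh1"]

for k in (2, 3):
    shares = cluster_shares(kmeans(points, k), vehicle_lines)
    print(k, sorted(shares.items()))
```

With this definition each vehicle line's shares sum to 100 for a given k, so large gaps between the veh1 and veh2 entries of a cluster indicate a dimension that separates the two lines.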
Extracted social dimensions
• Applied feature sets to k-means (k = 3 to 6)
• Each classification attempts to characterize the relationship between vehicle line and a social dimension (value, interest, ...)
• All classifications are considered for behavioral training
• Also considered community detection
– Via maximization of modularity, using the leading eigenvector of the modularity matrix
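The community detection step can be illustrated with the standard leading-eigenvector method: build the modularity matrix B with entries B_ij = A_ij - k_i k_j / (2m), find the eigenvector of its largest eigenvalue, and split the nodes by the sign of their entries. The sketch below is a single two-way split in pure Python, using shifted power iteration instead of a linear algebra library; the function name and the toy graph are hypothetical, and the slides' appendix suggests the actual work used R's igraph.

```python
def leading_eigenvector_split(n, edges, n_iter=500):
    """Split an undirected graph in two by the sign of the leading
    eigenvector of its modularity matrix. Returns (groups, modularity)."""
    # Adjacency matrix and node degrees.
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    deg = [sum(row) for row in A]
    two_m = sum(deg)  # twice the number of edges

    # Modularity matrix: B_ij = A_ij - k_i * k_j / (2m).
    B = [[A[i][j] - deg[i] * deg[j] / two_m for j in range(n)] for i in range(n)]

    # Power iteration on B + shift*I. The shift (a Gershgorin bound) makes all
    # eigenvalues non-negative, so the largest algebraic eigenvalue dominates.
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-uniform start
    for _ in range(n_iter):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    groups = [0 if x >= 0 else 1 for x in v]
    s = [1.0 if g == 0 else -1.0 for g in groups]
    # Modularity of the two-way split: Q = s^T B s / (4m).
    q = sum(B[i][j] * s[i] * s[j] for i in range(n) for j in range(n)) / (2 * two_m)
    return groups, q


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
groups, q = leading_eigenvector_split(6, edges)
print(groups, round(q, 3))
```

Finding more than two communities usually means repeating the split within each group until modularity stops improving, which this sketch leaves out.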
Applied Supervised Classification for Behavior Prediction
• Applied the sets to three machine learning algorithms
• Simple Bayes
– precision .7, recall .69
• Weightily Averaged One-Dependence Estimators (WAODE)
– precision .69, recall .70
• J48
– precision .69, recall .70
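The slides do not say how the precision and recall figures were averaged across the vehicle-line classes. As an illustration, the sketch below assumes a support-weighted average of the per-class values, which is how Weka's summary row reports them. The labels and predictions are made-up toy data.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Per-class precision and recall, averaged with class support as weights."""
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = 0.0
    for label, count in support.items():
        # A class that is never predicted contributes zero precision.
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / count
        precision += p * count / n
        recall += r * count / n
    return precision, recall


# Hypothetical true and predicted vehicle lines for five accounts.
y_true = ["veh1", "veh1", "veh1", "veh2", "veh2"]
y_pred = ["veh1", "veh1", "veh2", "veh2", "veh1"]
precision, recall = weighted_precision_recall(y_true, y_pred)
print(round(precision, 2), round(recall, 2))
```

Under this weighting, recall equals overall accuracy, so precision and recall that sit close together, as in the figures above, are what one would expect from roughly balanced errors across the two vehicle lines.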
Case Study Two
• 20,000 Facebook IDs across four vehicle lines
• Relational modeling
– Similar performance to the first case study
• Social dimensions generated for k = 3 to 7
– Not as much separation after k=6 clustering
• Precision, recall (among Simple Bayes, WAODE, J48)
– .469, .483
– .591, .588
– .534, .536
Next Steps
• Institutionalization
– Extract and define exactly what our dimensions are explaining in our data sets
• Relate to specific associations
– Values
– Community
Q/A
See me for friends and neighbors discount….
Appendix (software)
• ‘R’ igraph
• ‘R’ km module
• Weka
• Ruby - Watir