Social Media, Data Integration, and Human Computation
AnHai Doan
University of Wisconsin
@WalmartLabs
Background


Professor at University of Wisconsin-Madison
In 2010 took unpaid leave and joined Kosmix
– Bay-area startup, did semantic analysis of social media

Acquired by Walmart in 2011, became WalmartLabs
– Based in San Bruno, local office in India, hundreds of people

Why did Walmart buy a social-media startup?
– Wanted to catch up with Amazon (<10B online vs. >35B for Amazon)
– Major problems if it doesn’t get close within 10 years (see Borders)
– Kosmix/WalmartLabs helps in many ways
  – Provides a core of technical people, helps attract more
  – Improves traditional e-commerce
  – Builds the e-commerce of the future: Social + Local + Mobile
Major R&D Groups at WalmartLabs
Search and Products
Polaris
Giant product catalog
Product intelligence
Demand Generation
SEO, SEM
Customer targeting and personalization
Social, Mobile, and Local E-Commerce
Mining social data
Stores + Mobile
Build social/mobile apps (get on the shelf, gift recommendation, etc.)
Big Fast Data
Large-scale Machine Learning
Data Extraction & Integration
Crowdsourcing
Social Genome
Special Initiatives
Social Genome

Mine everything we can out of social data
– From tweets, FB feeds, Foursquare, blogs, etc.
– Mine users, organizations, products, sentiments, events, etc.


Connect them to those in the traditional Web world
Put them into a giant knowledge base
– Big, evolve rapidly over time
– Call this “social genome”

Use social genome to power multiple e-commerce applications
– Search
– Product intelligence
– Gift recommendation
– Personalized “Groupon”
– Etc.
Social Genome
[Figure: fragment of the social genome. A taxonomy rooted at “all” covers places (Egypt, Cairo, Tahrir), people (Twitter users such as @melgibson and @dsmith; FB users such as mel-gibson and davesmith; actors such as Angelina Jolie and Mel Gibson), and events (sports, celebrities, politics; e.g., Gibson car crash, Egyptian uprising). Edge labels: tweet-about, the-same-as, capital-of, located-in, related-to. Sample tweets: “@dsmith: Mel crashed. Maserati is gone.” and “@far213: Tahrir is packed!”]
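The knowledge base in the figure can be pictured as a set of labeled edges between entities. The sketch below is a toy in-memory version, not the WalmartLabs system: the class name `SocialGenome`, its methods, and the direction chosen for each edge are assumptions made for illustration. Only the relation labels and entity names come from the slide.

```python
from collections import defaultdict


class SocialGenome:
    """Toy store of (subject, relation, object) triples (hypothetical API)."""

    def __init__(self):
        # subject -> set of (relation, object) pairs
        self._out = defaultdict(set)

    def add(self, subject, relation, obj):
        self._out[subject].add((relation, obj))

    def objects(self, subject, relation):
        """Objects linked from `subject` by `relation`, sorted for stable output."""
        edges = self._out.get(subject, ())
        return sorted(o for r, o in edges if r == relation)

    def reachable(self, start, relations):
        """All nodes reachable from `start` using only the given relation labels."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for r, o in self._out.get(node, ()):
                if r in relations and o not in seen:
                    seen.add(o)
                    stack.append(o)
        return seen


# Edge directions below are an assumed reading of the figure.
genome = SocialGenome()
genome.add("Cairo", "capital-of", "Egypt")
genome.add("Tahrir", "located-in", "Cairo")
genome.add("@far213", "tweet-about", "Tahrir")
genome.add("@melgibson", "the-same-as", "Mel Gibson")
genome.add("@dsmith", "tweet-about", "Gibson car crash")

# Which places does a tweet about Tahrir connect to?
places = genome.reachable("@far213", {"tweet-about", "located-in", "capital-of"})
```

Following edges this way is what lets a tweet about Tahrir be connected to Cairo and Egypt, which is the kind of link the applications on the previous slide rely on.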
Building Social Genome: Three Sample Challenges
[Figure: the social genome fragment again, with three challenge areas marked 1, 2, and 3]
Extraction and Disambiguation: Traditional Methods Ill Suited for Social Media
[Figure: taxonomy fragment with people (actors: Angelina Jolie, Mel Gibson; professors: Mel Brocks) and events (sports; celebrities: Gibson car crash; politics: Egyptian uprising)]
Traditional text: “Mel was arrested again. What a dramatic fall since his Oscar-winning day.”
– Extraction: use rule-based / NLP / machine learning techniques
– Disambiguation: long-term, Web context: actor, movie, Oscar, Hollywood
Social media: “@dsmith: mel crashed. maserati is gone.”
– Extraction: use dictionaries
– Disambiguation: short-term, social context: crash, car, Maserati
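A minimal sketch of the dictionary-plus-context idea for the tweet above. The dictionary contents, the context word sets for "Mel Brocks", and the overlap-count scoring are all assumptions for illustration; the slide names the ingredients but not how they are combined.

```python
import re

# Hypothetical dictionary: lowercase surface form -> candidate entities.
DICTIONARY = {
    "mel": ["Mel Gibson", "Mel Brocks"],
    "maserati": ["Maserati"],
}

# Long-term (Web) context per entity. The Mel Gibson words are from the slide;
# the Mel Brocks entry is an assumed placeholder.
LONG_TERM = {
    "Mel Gibson": {"actor", "movie", "oscar", "hollywood"},
    "Mel Brocks": {"professor"},
}

# Short-term (social) context per entity, from the slide.
SHORT_TERM = {
    "Mel Gibson": {"crash", "car", "maserati"},
}


def tokens(text):
    """Lowercase alphanumeric tokens; punctuation and '@' are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_mentions(text):
    """Dictionary lookup, which still works on lowercase, ungrammatical text."""
    return [t for t in tokens(text) if t in DICTIONARY]


def disambiguate(mention, text):
    """Pick the candidate whose combined context overlaps the text the most."""
    words = set(tokens(text))

    def score(entity):
        context = LONG_TERM.get(entity, set()) | SHORT_TERM.get(entity, set())
        return len(words & context)

    # max() returns the first candidate on ties, so dictionary order breaks ties.
    return max(DICTIONARY[mention], key=score)


tweet = "@dsmith: mel crashed. maserati is gone."
mentions = extract_mentions(tweet)
entity = disambiguate("mel", tweet)
```

Exact word overlap is deliberately crude here: "crashed" does not match "crash", so only "maserati" contributes for this tweet. A real system would need stemming or a stronger similarity measure.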
Must Maintain a Highly Dynamic Social Genome
[Figure: taxonomy fragment with people (actors: Angelina Jolie, Mel Gibson; professors: Mel Brocks) and events (sports; celebrities: Gibson car crash; politics: Egyptian uprising)]
– Short-term, social context: crash, car, Maserati
– Long-term, Web context: actor, movie, Oscar, Hollywood
– Latency less than 2 seconds, maintained using a fast-data processing system
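One way to picture a short-term context that changes quickly is a per-entity sliding window over recently observed words. The sketch below is an assumption-laden toy: the class name, the one-hour window, and the use of explicit timestamps are all hypothetical, and it says nothing about how the 2-second latency target is actually met.

```python
from collections import defaultdict, deque


class ShortTermContext:
    """Per entity, the words seen with it during the last `window` seconds.

    Assumes observations arrive in non-decreasing timestamp order.
    """

    def __init__(self, window=3600.0):
        self.window = window
        # entity -> deque of (timestamp, word), oldest first
        self._events = defaultdict(deque)

    def observe(self, entity, words, now):
        queue = self._events[entity]
        for word in words:
            queue.append((now, word))

    def context(self, entity, now):
        """Drop expired observations, then return the surviving words."""
        queue = self._events[entity]
        while queue and now - queue[0][0] > self.window:
            queue.popleft()
        return {word for _, word in queue}


ctx = ShortTermContext(window=3600.0)
ctx.observe("Mel Gibson", ["crash", "car", "maserati"], now=0.0)
fresh = ctx.context("Mel Gibson", now=10.0)
stale = ctx.context("Mel Gibson", now=4000.0)
```

The long-term Web context would sit in a separate, slowly changing store; only the short-term side needs this kind of expiry.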
The Giant Traditional Taxonomy is the Secret Weapon
[Figure: taxonomy fragment rooted at “all”, with places (Tahrir located-in Cairo, Cairo capital-of Egypt) and people (actors: Angelina Jolie, Mel Gibson)]
Without it, dictionary-based extraction is not possible
Provide a framework to
– “understand” social media, find related concepts, “hang” social contexts

Very hard to develop, takes years
– Integrate data from multiple sources, like learning a foreign language

Partly explains why it was hard for others to catch up
→ To integrate social media, must integrate traditional data well, then bootstrap
Context is also Absolutely Critical
“Go Giants!”: SF Giants or NY Giants? Entity extraction finds both candidates; context disambiguates.
– Alice lives in NYC. Alice tweets “Go Giants!” → NY Giants
– Bob likes Buster Posey (SF Giants player). Bob tweets “Go Giants!” → SF Giants
– Charlie tweeted on Feb 4th (day before the Super Bowl (event); the Web is talking about the NY Giants). Charlie tweets “Go Giants!” → NY Giants
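The three cases can be sketched as a scorer that combines three context signals: where the user lives, what the user likes, and what the Web is currently talking about. The lookup tables, the equal weights, and the function name `resolve_team` are assumptions for illustration only.

```python
# Hypothetical lookup tables; a real system would draw these from the taxonomy.
CANDIDATES = {"giants": ["SF Giants", "NY Giants"]}
TEAM_CITY = {"SF Giants": "San Francisco", "NY Giants": "NYC"}
TEAM_PLAYERS = {"SF Giants": {"Buster Posey"}, "NY Giants": set()}


def resolve_team(mention, user, trending=frozenset()):
    """Return the best candidate for `mention`, or None if context cannot decide."""

    def score(team):
        points = 0
        if user.get("city") == TEAM_CITY[team]:
            points += 1  # user location
        if TEAM_PLAYERS[team] & set(user.get("likes", ())):
            points += 1  # user interests
        if team in trending:
            points += 1  # what the Web is talking about right now
        return points

    scores = {team: score(team) for team in CANDIDATES[mention]}
    best = max(scores.values())
    winners = [team for team, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


alice = resolve_team("giants", {"city": "NYC"})
bob = resolve_team("giants", {"likes": ["Buster Posey"]})
charlie = resolve_team("giants", {}, trending={"NY Giants"})
unknown = resolve_team("giants", {})
```

Returning `None` on a tie makes the lack of context explicit instead of guessing, which matches the point of the slide: without context, “Go Giants!” is genuinely ambiguous.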
Event Detection: Current Solutions
[Figure: Twitter, 4square, Facebook, Myspace, Flickr, … → event detection → events (sports; celebrities, e.g., Gibson car crash; politics, e.g., Egyptian uprising)]
• Lot of current work in academia / industry
• Limitations of most of the current solutions
– exploit just one kind of heuristic
  • e.g., find hot, trending, popular words (Egypt, revolt)
– do not exploit crowdsourcing
– do not scale
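For reference, the single "trending words" heuristic mentioned above can be sketched as a comparison of word rates between a baseline window and the current window. The function name, thresholds, add-one smoothing, and sample messages are assumptions; this illustrates the kind of heuristic the slide criticizes, not a recommended detector.

```python
from collections import Counter


def trending_words(baseline_msgs, current_msgs, min_count=2, ratio=1.5):
    """Words whose rate in the current window is at least `ratio` times
    their (smoothed) rate in the baseline window."""
    base = Counter(w for m in baseline_msgs for w in m.lower().split())
    cur = Counter(w for m in current_msgs for w in m.lower().split())
    base_total = max(sum(base.values()), 1)
    cur_total = max(sum(cur.values()), 1)

    hot = []
    for word, count in cur.items():
        if count < min_count:
            continue  # ignore words that are too rare to judge
        cur_rate = count / cur_total
        base_rate = (base[word] + 1) / (base_total + 1)  # add-one smoothing
        if cur_rate / base_rate >= ratio:
            hot.append(word)
    return sorted(hot)


baseline = ["good morning everyone", "coffee time", "good coffee today"]
current = ["egypt revolt now", "revolt in egypt", "good morning egypt"]
hot_words = trending_words(baseline, current)
```

On its own this finds hot words but not events: it cannot tell which words belong together, and it has no way to judge whether a burst is spam.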
Event Detection: Our Solution
[Figure: Twitter, Foursquare, … feed Detector 1, Detector 2, …, Detector n. Each detector emits candidate events. An event evaluator and ranker, shown together with Crowdsourcing Populations 1, 2, 3, ..., turns the candidate events into ranked events. Labeled: Muppet, a platform to process fast data over multiple machines.]
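A rough sketch of the evaluate-and-rank step, assuming each detector reports a score in [0, 1] per candidate event and each crowd vote is 0 or 1. The slide does not say how detector output and crowd input are combined; the weighted average, the `crowd_weight` parameter, and the function name are assumptions.

```python
def rank_events(detector_outputs, crowd_votes, crowd_weight=0.5):
    """Merge candidates from several detectors and rank them.

    detector_outputs: list of dicts, one per detector, event -> score in [0, 1]
    crowd_votes:      dict, event -> list of 0/1 votes from crowd workers
    """
    merged = {}
    for output in detector_outputs:
        for event, score in output.items():
            merged.setdefault(event, []).append(score)

    ranked = []
    for event, scores in merged.items():
        # A detector that did not report the event counts as a score of 0.
        detector_score = sum(scores) / len(detector_outputs)
        votes = crowd_votes.get(event, [])
        crowd_score = sum(votes) / len(votes) if votes else 0.0
        final = (1 - crowd_weight) * detector_score + crowd_weight * crowd_score
        ranked.append((event, final))

    # Highest score first; event name breaks ties for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


detectors = [
    {"Egyptian uprising": 0.9, "lunch": 0.6},
    {"Egyptian uprising": 0.8},
]
votes = {"Egyptian uprising": [1, 1, 1], "lunch": [0, 0, 1]}
ranking = rank_events(detectors, votes)
```

Averaging over all detectors rewards events that several independent heuristics agree on, and the crowd score gives a check that no single heuristic provides.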
Processing Fast Data

Big data management is well known by now
– use MapReduce implementations
– simple programming model, widespread adoption

But a lot of fast data is also emerging
– 150 M tweets / day, 1 billion FB shares / day,
3 M Foursquare checkins / day
– come into the system as very fast streams


Numerous applications over these streams
Need to process in real time
– to answer “what is happening now?”
Processing Fast Data

What we want: a platform that
– delivers real-time processing (over multiple machines)
– is highly scalable (as the data gets faster and faster)
– has a simple programming model
  – so developers can quickly write hundreds of apps
  – ideally like map-reduce, which developers already know
– has real-time query and storage capability
  – apps can query content in real time
  – distributed across multiple machines

Answer: Muppet, like Map-Reduce, but for fast data
– see “MapReduce-Style Processing of Fast Data”, VLDB-12
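To make "like Map-Reduce, but for fast data" concrete, here is a single-process toy in which a map function turns each incoming event into key/value pairs and an update function folds each value into per-key state that can be queried at any time. The map/update split, the per-key state dictionary, and every name below are assumptions for illustration, not details taken from this slide; the real platform distributes this work across machines.

```python
from collections import defaultdict


def map_checkin(event):
    """Map: turn one raw event into zero or more (key, value) pairs."""
    retailer = event.get("retailer")
    if retailer:
        yield retailer, 1


def update_count(key, value, state):
    """Update: fold one value into the running per-key state."""
    state["count"] = state.get("count", 0) + value


class FastDataApp:
    """Processes events one at a time; state is queryable between events."""

    def __init__(self, mapper, updater):
        self.mapper = mapper
        self.updater = updater
        self.states = defaultdict(dict)  # key -> per-key state

    def process(self, event):
        for key, value in self.mapper(event):
            self.updater(key, value, self.states[key])

    def query(self, key):
        # Return a copy so callers cannot mutate live state.
        return dict(self.states.get(key, {}))


app = FastDataApp(map_checkin, update_count)
stream = [
    {"retailer": "Walmart"},
    {"retailer": "Target"},
    {"retailer": "Walmart"},
    {"user": "no retailer in this event"},
]
for event in stream:
    app.process(event)
walmart_state = app.query("Walmart")
```

Unlike a batch job, there is no end of input: results are available after every event, which is what lets an app answer "what is happening now?".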
Using the Social Genome

Gift recommendation:
– “I love salt!”
– “Your friend has just tweeted about the movie SALT. Would you like to buy something related for her birthday?”
Using the Social Genome

Search query expansion
– “Advil” → “advil headache cramp”

Personalized “Groupon” with vendors
– “You seem to be interested in gourmet coffee.
If 50 persons sign up to buy the new DeLonghi coffee maker,
you can get that for a 50% discount.”

Stocking a local store
– Lot of people in Mountain View are interested in outdoor sport
– Stock up local Walmart store with related products

A Siri-like shopping assistant
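The search query expansion item above can be sketched with simple co-occurrence counts over social posts: words that often appear alongside the query term are appended to it. The function name, stopword list, and sample posts are invented for illustration; the slide does not say how expansion terms are actually chosen.

```python
from collections import Counter

# Small assumed stopword list, just enough for the sample posts.
STOPWORDS = frozenset({"a", "the", "for", "my", "is", "i", "and"})


def expand_query(query, posts, k=2):
    """Append the k words that most often co-occur with `query` in `posts`."""
    term = query.lower()
    cooccur = Counter()
    for post in posts:
        words = post.lower().split()
        if term in words:
            # Count each co-occurring word once per post.
            cooccur.update(
                w for w in set(words) if w != term and w not in STOPWORDS
            )
    # Most frequent first; alphabetical order breaks ties.
    top = sorted(cooccur.items(), key=lambda pair: (-pair[1], pair[0]))[:k]
    return " ".join([term] + [word for word, _ in top])


posts = [
    "took advil for my headache",
    "advil helps my cramp",
    "headache again need advil",
    "advil for cramp",
    "advil cured the headache",
    "love my new phone",
]
expanded = expand_query("Advil", posts)
```

With these sample posts the expansion reproduces the slide's example, because "headache" and "cramp" are the words people most often mention together with the product.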
Wrapping Up
The future of e-commerce: social, mobile, and local
Retailers must increasingly be data / Web players
Social media is important for e-commerce
Integrating social data is fundamentally much harder than integrating “traditional” data
– lack of context
– dynamic environment, new concepts appear quickly
– quality issues, lots of spam
– fast data
Must integrate “traditional” data well, then bootstrap
– giant taxonomy critical
Crowdsourcing becomes indispensable
– but raises interesting challenges