WEST2011 - California State University, East Bay

Lynne Grewe and Sushmita Pandey
California State University East Bay
lynne.grewe@csueastbay.edu
The Goal
 Using Social Data to make Social Advertisement
Recommendations.
[Figure: the user and friends use a social network application; PPARS sits between the social network and the advertisements and delivers a message such as “Your friends Nathan and Marty will like this”.]
The Problems
 What is the Social data?
 Which Social Data is useable/best?
 How do we capture and analyze it?
 How do we relate Social data to Advertisements?
 How do we deliver a Social Advertisement?
The Environment
 Social Network: MySpace, Facebook, Hi5, Orkut,
LinkedIn, Netlog, more
Overview of Talk
 PPARS overview
 Data – problem of multiple networks
 Example of Data
 Parsing
 Quantization
 Results
 Advertisement Recommendation Results
 Future Work
Our System Overview
 PPARS = Peer Pressure Advertisement Recommendation
System
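[Figure: PPARS pipeline. DATA INPUT (user-origin) feeds the FRONT END, which builds the quantized model (get user-friends quantized; ads quantized). Next come process groups, then group / ad matches & socialize, then Peer-Pressure Ad Selection, which returns the ad choice to the user.]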
Social Data
 Every network can provide different social data
 Two main splits: Facebook and OpenSocial (the majority
of the others).
 OpenSocial is an open standard adopted by over 30
containers and growing, with an international audience.
It allows for “standardized” access.
 Popular containers include MySpace, LinkedIn, Google,
Yahoo!, etc.
 Corporate support from Google, Yahoo!, IBM, Microsoft, and
more.
Data Fields
About Me
Activities
Addresses
Age
Body_type
Books
Cars
Children
Current_Location
Date_Of_Birth
Drinker
Emails
Ethnicity
Fashion
Food
Gender
Happiest_when
Has_app
Heroes
Humor
ID
Interests
Job_interests
Jobs
Languages_Spoken
Living_Arrangements
Looking_for
Movies
Music
Name
Network Presence
Nick Name
Pets
Phone
Political Views
Profile song
Profile url
Profile video
quotes
Relationship status
Religion
Romance
Scared Of
Schools
Sexual Orientation
Sports
Status
Tags
Thumbnail Url
Time Zone
Turn Ons
Turn Offs
TV Shows
URLS
Some Example Data
AboutMe Ok, so I am a graduate of with degrees in
Philosophy, and Religion. I currently live in
with my wife and daughter. I enjoy
Snowboarding/skiing, Motorcycles,
computers, sports cars, and hanging out
with friends.
Some Example Data
Age
33
Books
The Professor and the Madman, Plato, Aristotle, Locke, Hume,
Kant, luscombe
Movies
Things to do in Denver when yer dead, The Departed, Encino Man,
Real Genius
Music
Very Eclectic, including Pennywise, Disturbed, System
of a Down, Linkin Park, Senses Fail, Mudvayne,
Goldfinger, and a bunch of others I am sure I cannot
remember at this time
Music
allen to // chimaira // sw1tched // bleed the sky // destiny // 40
below summer // endo // nothingface // enhancer // watcha //
lamb of god // soilwork // skrape // flaw // unearth // slodust
// deftones // raunchy // devildriver // reveille // american
head charge // nonpoint // stutterfly // factory 81 // in flames
// (hed) p.e. // dry kill logic // primer 55 // 36 crazyfists //
sevendust // taproot // candiria // bionic jive // funeral for a
friend // .....
Television
Smallville, heros
Some Example Data
Interests
Snowboarding/skiing,
Motorcycles, computers,
sports cars, and hanging out
with friends.
Some Example Data
Status: Married
Status: In a Relationship
Smoker: No
Drinker: Yes
Heroes: Father
Heroes: “Hero position vacant, applications to me please...” (entered in German)
Looking_for: Networking, Friends
Ethnicity: White / Caucasian
Children: Proud parent
Sexual_Orientation: Straight
Some Example Data
Schools
University Of Nevada-Reno
Reno, NV
Graduated: N/A
Degree: Master's Degree
Major: Hydrogeology
2007 to Present
Purdue University-Main Campus
West Lafayette,Indiana
Graduated: 2003
Student status: Alumni
Degree: Bachelor's Degree
Major: Philosophy
Minor: CPT
Clubs: Purdue Student Government Liberal Arts
Student Council
Greek: Delta Chi
2001 to 2003
Reed Hs
Sparks, NV
Graduated: N/A
Student status: Alumni
Degree: High School Diploma
Social Data – which?
 Not all networks provide access to same data
 Users can keep information private
 Not all data is “social”
 Not all data is directly useful for advertisers
Data
 Not typically available / private
Current_Location
Date_Of_Birth
Addresses
Phone
 Not all data is “social”
ID
Name
Has_app
Nick_Name
Network Presence
Profile url
Profile song
Profile video
Thumbnail URL
URLs
Drinker
Emails
Ethnicity
Fashion
Food
 Not all data is directly useful for advertisers
Infrequent data
 Our scheme needs data that users have in common, so that we can
reason over a common feature space.
 Data that is NOT frequent:
Cars
Fashion
Food
Political Views
Pets
Heroes
Humor
Social Data - which
 First pass: chosen based on network availability and
commonality, user prevalence, and estimated
advertisement usefulness
 Balance between small sample space and feature
dimensionality
About Me
Activities
Age
Gender
Books
TV
Music
Looking For
Drinker
Relationship
Ethnicity
Religion
Language
Interests
Date_Of_Birth
Smoker
PPARS – Front End
[Figure: PPARS front end. Raw user data and friend data (Friend 1, Friend 2, ... Friend X; e.g. “I like cars, have 2 kids, ...”, “Movies: Star Wars”, “Age = 30”) go through PARSING to produce individual social data tokens. QUANTIZATION then uses web services, the user-origin ontology, and codebooks to produce the set of user and friend quantized data vectors.]
Parsing
Raw Social Data:
“I like lots of movies. Like: Star Wars, Star Wars II, Jaws. And I love Harrison Fords acting.”
 Create small social data tokens to pass to Quantization
Hierarchical Segmentation:
1. Null Data Test
2. Split by . / ! / ?
3. Split by :
4. Split by -
5. Split by ;
6. Split by ,
Individual Social Data Tokens:
• I like lots of movies
• Like
• Star Wars
• Star Wars II
• Jaws
• And I love Harrison Fords acting.
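The segmentation steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `parse_tokens` and the regular expressions are ours, and the delimiter order simply follows the order in which the splits are listed on the slide.

```python
import re

# Delimiter levels, in the order listed on the slide. Each level splits
# the pieces produced by the previous level (hierarchical segmentation).
DELIMITER_LEVELS = [r"[.!?]", r":", r"-", r";", r","]


def parse_tokens(raw):
    """Split one raw social data field into small social data tokens."""
    # Null data test: an empty or missing field yields no tokens.
    if raw is None or not raw.strip():
        return []
    pieces = [raw]
    for pattern in DELIMITER_LEVELS:
        next_pieces = []
        for piece in pieces:
            next_pieces.extend(re.split(pattern, piece))
        pieces = next_pieces
    # Trim whitespace and drop fragments that became empty.
    return [p.strip() for p in pieces if p.strip()]


tokens = parse_tokens(
    "I like lots of movies. Like: Star Wars, Star Wars II, Jaws. "
    "And I love Harrison Fords acting."
)
for token in tokens:
    print(token)
```

Note that a plain split on "-" also breaks hyphenated words, which is one reason syntax rules around delimiters are worth adding.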
Parsing Example
About Me input = "I work as an engineer at Motorola. I
work in the peripherals department and do chip
design. I am doing some management.”
Resulting Social Data Tokens:
 I work as an engineer at Motorola
 I work in the peripherals department and do chip design
 I am doing some management
Parsing Example
Interests input = “Internet, Movies, Reading,
Karaoke, Building alternate communities”
Resulting Social Data Tokens:
 Internet
 Movies
 Reading
 Karaoke
 Building alternate communities
Parsing Example
Music input = “Bands: Superdrag, Weezer, The Doors, The Beach Boys, Journey Solo
Artists: Billy Joel, Albums: Appetite for Destruction - Guns & Roses; Blue - Weezer”
Resulting Social Data Tokens:
 Bands
 Superdrag
 The Doors
 Cheap Trick
 The Beach Boys
 Journey Solo Artists
 Billy Joel
 Albums
 Appetite for Destruction
 Guns & Roses
 Blue
 Weezer
Note: the line return between “Journey” and “Solo Artists” was lost in
formatting, so the two are merged into one token.
Parsing
 Simple technique of segmentation
 Future work – include semantics of phrases to detect
potential “headings”, syntax rules around delimiters
like : and –
Quantization
Take a social data token and translate it into a numerical
feature vector.
“I like cars” → Cars = 0.2
 For each social data field we need to create meaningful
feature vector elements.
 For each social data field we need to come up with
techniques/algorithms to translate the raw social data
token into support for its different feature vector
elements.
Quantization- feature vector
 Pattern Recognition and Matching are later parts of
PPARS
 For these we need numerical representations of our user and
friend social data, and also of the Ads.
“I like cars” = ??? what ad ??
Cars = 0.2 → Ad with cars around 0.2
Quantization – feature vector
 For each social data element like “About Me”, “Gender”,
“Movies” we have designed its own feature vector.
   Result of the technique used to quantize the input social token data
   Result of studying keywords / trends in a user database of sample social tokens
 To understand this, let's first discuss the techniques
used to quantize social data tokens as they relate to the
“type” of data element.
Quantization and Social Data Type
Numerical Data
 Data is naturally numerical, e.g. Age, Date of Birth
 Can be quickly and effectively translated into a number in some defined range:
   Address: can be translated into latitude and longitude
   Phone: again limited in digits
   Time zone: again predefined ranges
Categorizable Data
 Data where there is a predefined accepted taxonomy, e.g. movies and their genre
 Data where categories can be derived through sample analysis and advertisement goals
 Examples: Interests, About Me, Food, Fashion
Indexed Data
 Data that has defined sets of values specific to either the container or OpenSocial
 Example: smoker = yes, no, occasionally, quit, never
 Other examples: gender, relationship, drinker, sexual orientation
Other
 Data for which we cannot easily derive an algorithm for categorizing
 Examples: Profile Image, Profile Song URL, etc.
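For the numerical data type, translating a value into a defined range can be as simple as the following sketch. The function name `quantize_range` and the age range of 13 to 93 are hypothetical choices for illustration; the slides do not state the ranges PPARS uses.

```python
def quantize_range(value, low, high):
    """Map a numeric value into [0.0, 1.0] over a predefined range.

    Values outside the range are clamped to its ends.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(value, low), high)
    return (clamped - low) / (high - low)


# Hypothetical range for the Age field (an assumption, not from the slides).
AGE_LOW, AGE_HIGH = 13, 93

age_feature = quantize_range(33, AGE_LOW, AGE_HIGH)
print(age_feature)
```

The same helper could serve time zones or coordinates, since each has a predefined range.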
Collapsing of Data
 Some data fields have almost the same meaning, or their content
typically overlaps greatly:
   About Me and Interests (and even Status)
   Age and Date of Birth
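As one illustration of collapsing, Age and Date of Birth can feed a single age value. This is a sketch under assumptions: the helper names are ours, and the rule "prefer the explicit Age, fall back to Date of Birth" is a guess at a reasonable policy, not something the slides specify.

```python
from datetime import date


def age_from_birth_date(birth, today):
    """Whole years elapsed between birth and today."""
    years = today.year - birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def collapse_age(age, birth, today):
    """Return one age value from the two overlapping fields.

    Assumed policy: use the explicit Age when present, otherwise derive
    it from Date of Birth, otherwise report None (no data).
    """
    if age is not None:
        return age
    if birth is not None:
        return age_from_birth_date(birth, today)
    return None


print(collapse_age(None, date(1978, 6, 15), date(2011, 5, 1)))
```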
Categorizable Data
 This is the bulk of the data fields: About Me, Interests,
Music, Movies, TV, Books, Looking For, Religion,
Ethnicity, Language
 Determine Feature Elements:
 Accepted “standard” taxonomies
 Web Service taxonomies
 Advertisement driven taxonomies
Categorization: Web Service
 For some of our social data fields we are able to utilize
popular web services to convert our social data tokens
into search hits that have categorized information
associated with them.
 Example: Internet Video Archive (IVA) and IMDB
   Use movie genre
IVA – movie search by actor “Robert Redford”
 http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx?DeveloperId=f377f57f-3bad-4704-8e80-1b643b206abd&SearchTerm=Robert+Redford

Some of the Results :
<item>
<Description>
<![CDATA[ The Unforeseen movie trailer - starring Robert Redford, Willie Nelson, Ann Richards, Gary Bradley, Judah
Folkman, William Greider. Directed by Laura Dunn. Theatrical Release Date: 2/29/2008 Genre: Documentary Rating:
Not Rated ]]>
</Description>
<Title>THE UNFORESEEN</Title>
<Language>English</Language>
<Country>United States</Country>
<SiteUrl />
<Studio>Two Birds Films</Studio>
<StudioID>3018</StudioID>
<Rating>Not Rated</Rating>
<Genre>Documentary</Genre>
<GenreID>13</GenreID>
IVA – movie search continued
 http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx?DeveloperId=f377f57f-3bad-4704-8e80-1b643b206abd&SearchTerm=Robert+Redford
<HomeVideoReleaseDate>9/16/2008</HomeVideoReleaseDate>
<TheatricalReleaseDate>2/29/2008</TheatricalReleaseDate>
<Director>Laura Dunn</Director>
<DirectorID>36635</DirectorID>
<Actor1>Robert Redford</Actor1>
<ActorId1>7105</ActorId1>
<Actor2>Willie Nelson</Actor2>
<ActorId2>8591</ActorId2>
<Actor3>Ann Richards</Actor3>
<ActorId3>36642</ActorId3>
<Actor4>Gary Bradley</Actor4>
<ActorId4>36637</ActorId4>
IVA – movie search continued
 http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx?DeveloperId=f377f57f-3bad-4704-8e80-1b643b206abd&SearchTerm=Robert+Redford
<HomeVideoReleaseDate>9/16/2008</HomeVideoReleaseDate>
<Link>http://videodetective.com/titledetails.aspx?publishedid=947964</Link>
<BoxOfficeInMillions>-1</BoxOfficeInMillions>
<!-- Television Content -->
<AirDayOfWeek>-1</AirDayOfWeek>
<AirStartTime />
<ShowLengthInMinutes>-1</ShowLengthInMinutes>
<IsTelevisionContent>false</IsTelevisionContent>
<FirstReleasedYear>2008</FirstReleasedYear>
<Image>http://content.internetvideoarchive.com/content/photos/1250/05253626_.jpg</Image>
<Duration>164</Duration>
<DateCreated>3/20/2008 8:00:00 AM</DateCreated>
<Media>Movie</Media>
<PublishedId>947964</PublishedId>
<DateModified>4/22/2011 1:57:00 PM</DateModified>
(and more fields; of these we selected GENRE)
IVA genres: our movie feature elements
VideoCategory
Not Assigned
Western
Action-Adventure
Children's
Comedy
Drama
Family
Horror
Musical
Mystery-Suspense
Non-Fiction
Sci-Fi
War
Health/ Workout
Documentary
Thriller
Biography
Romance
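Pulling the genre out of a search response like the one shown earlier could look like the sketch below. It uses Python's standard `xml.etree.ElementTree`. The sample document is a trimmed, well-formed stand-in built from the fields on the slides; the `<items>` wrapper element and the function name are assumptions, since the slides show only part of one `<item>`.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for a response; the <items> root is an assumption.
SAMPLE_RESPONSE = """<items>
  <item>
    <Title>THE UNFORESEEN</Title>
    <Rating>Not Rated</Rating>
    <Genre>Documentary</Genre>
    <GenreID>13</GenreID>
    <Actor1>Robert Redford</Actor1>
  </item>
</items>"""


def genres_from_response(xml_text):
    """Return the Genre of every <item> hit, in document order."""
    root = ET.fromstring(xml_text)
    genres = []
    for item in root.iter("item"):
        genre = item.findtext("Genre")
        if genre and genre.strip():
            genres.append(genre.strip())
    return genres


print(genres_from_response(SAMPLE_RESPONSE))
```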
Movie Quantization
 For each social data token (“Adam Sandler”, “Star Wars”) we
can get multiple hits.
 Example, “Robert Redford”, first 8 hits:
   Drama = 5
   Western = 1
   Documentary = 2
 These genres become our Movie feature elements.
 Issues:
   How do we know if a token is an actor name, movie title, director or other?
   Multiple hits for actor or director: what do we do? (evidence them all)
   Multiple hits for movie title: what do we do? (take first hit)
Order of Movie Quantization
 Given any social data element parsed from the user’s
MOVIE data, we cannot know a priori if it is a title or
an actor’s or director’s name. It may even be the genre of
movies a user likes.
1. Title search (take first hit)
2. Actor search (evidence all)
3. Director search (evidence all)
4. Keyword matching (see next)
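The search order above can be sketched as a small cascade. Everything here is illustrative: the four search functions are stand-in stubs rather than real service calls, and stopping at the first source that returns hits is our assumption about how the order is applied. Only the “Robert Redford” genre counts come from the slides.

```python
from collections import Counter


def quantize_movie_token(token, title_search, actor_search,
                         director_search, keyword_lookup):
    """Collect genre evidence for one token, trying sources in order."""
    hits = title_search(token)
    if hits:
        return Counter(hits[:1])      # title: take the first hit only
    for search in (actor_search, director_search):
        hits = search(token)
        if hits:
            return Counter(hits)      # actor / director: evidence all hits
    return Counter(keyword_lookup(token))


# Stub data standing in for web-service results (hypothetical, except the
# Robert Redford counts, which repeat the example from the slides).
TITLES = {"Star Wars": ["Sci-Fi", "Action-Adventure"]}
ACTORS = {"Robert Redford": ["Drama"] * 5 + ["Western"] + ["Documentary"] * 2}


def title_search(token):
    return TITLES.get(token, [])


def actor_search(token):
    return ACTORS.get(token, [])


def director_search(token):
    return []


def keyword_lookup(token):
    return []


evidence = quantize_movie_token("Robert Redford", title_search, actor_search,
                                director_search, keyword_lookup)
print(evidence)
```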
Quantization Result 1
Up,Forrest Gump,Rear Window,District 9,PacMan,WALL·E,My Flesh and Blood, MacMusical,
Yields:
 MOVIE_FAMILY=0.6, MOVIE_SCIFI=0.2,
MOVIE_DOCUMENTARY=0.4, MOVIE_THRILLER=0.2
Quantization using other services
 TV: IMDB,
http://www.imdb.com/search/title?title_type=tv_series&title=
 Books: Google Books Search,
http://books.google.com/books/feeds/volumes?
 Music: IVA’s music API,
http://api.internetvideoarchive.com/Music/**
Quantization via Keyword Matching
 What do we do when there is no pre-determined
taxonomy and no services for database hits?
 Natural Language Processing techniques
 Currently employ simple (but, effective and efficient)
technique of Keyword matching /lookup
 Create database of predetermined phrases/ keywords
 Lookup scheme to quantize social data token(s).
“I work as an engineer” → About Me lookup??
“Watch a lot of drama” → Movies lookup??
[Figure: individual social data tokens are looked up against the ontology codebooks to produce the set of user and friend quantized data vectors.]
Keyword Database
 Used on : About Me / Interests, Religion, Ethnicity,
Looking For, Language, Relationship
 Secondary use: Books, TV, Music, Movies
 When service fails to provide any hits
Keyword Database Creation
 manual scanning of hundreds (at the starting level) of user
profiles
 domain-specific expert (human) knowledge
 dictionaries and taxonomies when they exist
Issue: how do we determine weights for every entry?
 Expert determined (consistency) or all equal valued (no sense
of importance)
Issue: at the very beginning level, can we create a dictionary for
everything? No. Are there more advanced NLP
techniques?
Some arbitrary Keyword DB entries
Field      Feature element   Keyword    Weight
ABOUT_ME   HOME              Cats       0.2
ABOUT_ME   HOME              Children   0.2
ABOUT_ME   HOME              Daughter   0.2
ABOUT_ME   HOME              Dog        0.2
ABOUT_ME   HOME              home       0.5
Some arbitrary Keyword DB entries
Field      Feature element   Keyword      Weight
ABOUT_ME   ENTERTAINMENT     Shopping     0.2
ABOUT_ME   ENTERTAINMENT     Shows        0.2
ABOUT_ME   ENTERTAINMENT     Sing         0.2
ABOUT_ME   ENTERTAINMENT     Ski          0.2
ABOUT_ME   ENTERTAINMENT     Songwriter   0.2
Keyword DB- evidence weight
Issue: how do we determine weights for every entry?
 Expert determined (consistency)
 or all equal valued (no sense of importance)
System options: DB weights can take on different values, with an
option to run with all weights equal.
Keyword DB- ??
Issue: at the very beginning level, can we create a dictionary for
everything? No. Are there more advanced NLP
techniques to explore for inferences?
 While users can write anything (and do), remember we are
focused on Advertisement Recommendation, so the scope
of our language is limited to hits related to our feature vector
elements. This is a constrained problem.
 Home, Entertainment, Smoking, Work, Social, Movies, TV,
Shopping, Books, etc.: these are the kinds of areas we are
concerned with.
Types of Keyword Matching
STRICT
 Social data token must match exactly a DB entry
“Drama” → Drama √
“I like Drama” → Drama X
DB_ENTRY_CONTAINS_DATA_ELEMENT
 Data token must exist inside the DB entry
“Drama” → Drama and Comedy √
DB_ENTRY_PARTOF_DATA_ELEMENT
 Part of data token matches DB entry (this is further
segmenting data token)
“I like Drama” → Drama √
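The three matching types can be written as one small predicate. This is a sketch, not the PPARS implementation: the function name `matches` is ours, and comparing case-insensitively by plain substring is an assumption.

```python
STRICT = "STRICT"
DB_ENTRY_CONTAINS_DATA_ELEMENT = "DB_ENTRY_CONTAINS_DATA_ELEMENT"
DB_ENTRY_PARTOF_DATA_ELEMENT = "DB_ENTRY_PARTOF_DATA_ELEMENT"


def matches(token, db_entry, mode):
    """Decide whether a social data token hits a keyword DB entry."""
    t = token.strip().lower()
    e = db_entry.strip().lower()
    if mode == STRICT:
        return t == e        # token must equal the entry exactly
    if mode == DB_ENTRY_CONTAINS_DATA_ELEMENT:
        return t in e        # token appears inside the entry
    if mode == DB_ENTRY_PARTOF_DATA_ELEMENT:
        return e in t        # entry appears inside the token
    raise ValueError("unknown matching mode: " + mode)


print(matches("I like Drama", "Drama", STRICT))
print(matches("I like Drama", "Drama", DB_ENTRY_PARTOF_DATA_ELEMENT))
```

A plain substring test is the simplest choice, but it would also let an entry like "car" hit the token "scary"; matching on word boundaries avoids that.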
Quantization Results different
kinds of Keyword Matching
‘ I am a student and I work and love cars'
Output STRICT:
 No hits
 ABOUT_ME_ENTERTAINMENT = -1
 ABOUT_ME_WORK = -1
 ABOUT_ME_HOME = -1
 ABOUT_ME_SOCIAL = -1
 ABOUT_ME_FOOD = -1
Quantization Results different
kinds of Keyword Matching
‘ I am a student and I work and love cars'
Output DB_ENTRY_CONTAINS_DATA_ELEMENT
 No hits
 ABOUT_ME_ENTERTAINMENT = -1
 ABOUT_ME_WORK = -1
 ABOUT_ME_HOME = -1
 ABOUT_ME_SOCIAL = -1
 ABOUT_ME_FOOD = -1
Quantization Results different
kinds of Keyword Matching
‘ I am a student and I work and love cars'
Output DB_ENTRY_PARTOF_DATA_ELEMENT:
keyword = student → ABOUT_ME_WORK = 0.2
keyword = work → ABOUT_ME_WORK = 0.5
keyword = cars → ABOUT_ME_ENTERTAINMENT = 0.2
keyword = love → ABOUT_ME_HOME = 0.2, ABOUT_ME_SOCIAL = 0.2
Accumulated evidence (the two WORK hits sum to 0.7; an element with no hit stays at -1):
 ABOUT_ME_ENTERTAINMENT = 0.2
 ABOUT_ME_WORK = 0.7
 ABOUT_ME_HOME = 0.2
 ABOUT_ME_SOCIAL = 0.2
 ABOUT_ME_FOOD = -1
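The accumulation in this example can be reproduced with a short sketch. The keyword entries and weights are the ones shown in the example; the function name, the data layout, and matching on whole words (a stricter variant of DB_ENTRY_PARTOF_DATA_ELEMENT) are assumptions for illustration.

```python
import re

NO_EVIDENCE = -1

ELEMENTS = [
    "ABOUT_ME_ENTERTAINMENT",
    "ABOUT_ME_WORK",
    "ABOUT_ME_HOME",
    "ABOUT_ME_SOCIAL",
    "ABOUT_ME_FOOD",
]

# (keyword, feature element, weight) rows taken from the example.
KEYWORD_DB = [
    ("student", "ABOUT_ME_WORK", 0.2),
    ("work", "ABOUT_ME_WORK", 0.5),
    ("cars", "ABOUT_ME_ENTERTAINMENT", 0.2),
    ("love", "ABOUT_ME_HOME", 0.2),
    ("love", "ABOUT_ME_SOCIAL", 0.2),
]


def quantize_about_me(token):
    """Sum keyword weights per feature element; -1 marks no evidence."""
    words = set(re.findall(r"[a-z]+", token.lower()))
    vector = {name: NO_EVIDENCE for name in ELEMENTS}
    for keyword, element, weight in KEYWORD_DB:
        if keyword in words:
            if vector[element] == NO_EVIDENCE:
                vector[element] = 0.0   # first hit replaces the marker
            vector[element] += weight
    return vector


vector = quantize_about_me("I am a student and I work and love cars")
for name in ELEMENTS:
    print(name, "=", vector[name])
```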
Quantization Results 2 – using
DB_ENTRY_PARTOF_DATA_ELEMENT
“Fell in love with computers at 11, never got over it... Nonetheless,
I have always understood that human problems are solved by
people, not technology. My lifes work has been to empower
communities to design and build their own solutions.”
 6 data tokens from parsing
RESULTS:
 ABOUT_ME_ENTERTAINMENT = 0.2
 ABOUT_ME_WORK = 0.5
 ABOUT_ME_HOME = 0.2
 ABOUT_ME_SOCIAL = 0.2
 ABOUT_ME_FOOD = -1
Quantization Result 3 – good null
results
i am xing ju. test ABOUT ME for opensocial.
Parsed results:
 i am xing ju
 test ABOUT ME for opensocial
NO keyword db hits
 ABOUT_ME_ENTERTAINMENT=> -1
 ABOUT_ME_WORK => -1
 ABOUT_ME_HOME => -1
 ABOUT_ME_SOCIAL => -1
 ABOUT_ME_FOOD => -1
Quantization Results
 Garbage in and garbage out
 “LoL really dude that is the way to be” → no hits
 Is this garbage? “LoL” can mean “lots of love”. Could you interpret this
to be someone interested in social / friends? Future: deeper
interpretation / semantic analysis?
Indexed
 Smoker, Drinker, Gender, Relationship (some networks),
Looking for (some networks), etc.
 Example for Drinker:
   opensocial.Enum.Drinker.HEAVILY
   opensocial.Enum.Drinker.NO
   opensocial.Enum.Drinker.OCCASIONALLY
   opensocial.Enum.Drinker.QUIT
   opensocial.Enum.Drinker.QUITTING
   opensocial.Enum.Drinker.REGULARLY
   opensocial.Enum.Drinker.SOCIALLY
   opensocial.Enum.Drinker.YES
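Indexed data like this maps naturally onto a lookup table. The sketch below is only an illustration: the ordering of the Drinker values from least to most, the scaling to 0..1, and the function name are all assumptions; the slides list the enum values but not how PPARS numbers them.

```python
NO_EVIDENCE = -1

# Hypothetical ordering from "least" to "most"; not given in the slides.
DRINKER_ORDER = [
    "NO",
    "QUIT",
    "QUITTING",
    "OCCASIONALLY",
    "SOCIALLY",
    "YES",
    "REGULARLY",
    "HEAVILY",
]


def quantize_drinker(value):
    """Map a Drinker enum value (bare or fully qualified) into 0..1."""
    if not value:
        return NO_EVIDENCE
    # Accept "opensocial.Enum.Drinker.SOCIALLY" as well as "SOCIALLY".
    key = value.rsplit(".", 1)[-1].upper()
    if key not in DRINKER_ORDER:
        return NO_EVIDENCE
    return DRINKER_ORDER.index(key) / (len(DRINKER_ORDER) - 1)


print(quantize_drinker("opensocial.Enum.Drinker.SOCIALLY"))
```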
Quantized Feature Vector
 107 elements
 Normalized to (approximately) the range 0 to 1.0
Advertisement Description
 Experts manually determine the feature vector
weighting for each ad.
 Future:
   automate this from a survey / input directly from the
Advertiser
   Is there a way to analyze the ad message or image
(image understanding)? Will the results even match the
advertiser’s goals?
PPARS --- Advertisement Matching
 Not the focus of this talk
 Currently doing variations on KNN with different
forms of clustering
 Early results with a small advertising database and a
beginning Keyword database look good
 What kinds of groups: groups with the user in them or not?
Based only on in-common feature elements or not?
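To make the matching idea concrete, here is a minimal nearest-neighbour ranking of ads against a group's feature vector. It is not the PPARS matcher (which the talk describes only as variations on KNN with clustering). The distance measure, the rule of comparing only elements where both vectors have evidence, and all names and numbers are assumptions for illustration.

```python
import math

NO_EVIDENCE = -1


def distance(a, b):
    """Root-mean-square difference over elements both vectors have in common."""
    shared = [(x, y) for x, y in zip(a, b)
              if x != NO_EVIDENCE and y != NO_EVIDENCE]
    if not shared:
        return math.inf       # nothing in common: treat as infinitely far
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def nearest_ads(group_vector, ads, k=3):
    """Return the names of the k ads closest to the group vector."""
    ranked = sorted(ads.items(),
                    key=lambda item: distance(group_vector, item[1]))
    return [name for name, _ in ranked[:k]]


# Made-up three-element vectors, purely for the example.
group = [0.2, 0.7, NO_EVIDENCE]
ads = {
    "car_ad": [0.2, 0.6, 0.1],
    "toy_ad": [0.9, 0.1, 0.5],
    "blank_ad": [NO_EVIDENCE, NO_EVIDENCE, 0.4],
}

print(nearest_ads(group, ads, k=2))
```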
PPARS- Advertisement Delivery
 An area of future work could be the effective delivery of the
“social message” related to the selected ad. For now we use a simple
form of direct delivery.
Based on grouping of same gender and age and strong likes
in interests on home.
PPARS- Advertisement Delivery
 An area of future work could be the effective delivery of the
“social message” related to the selected ad. For now we use a simple
form of direct delivery.
Your friends Nathan and Marty will like this
Based on grouping of same gender and age and drinking.
This is a grouping the user is not part of, only friends.
PPARS- Advertisement Delivery
 Here the grouping is “loose”, related only by gender
and very loosely by age, so the advertisement match is
not great.
 Question: should we only serve to “strong” groups?
Analysis of Advertisement Results
 Groupings are tight when data allows
 Matches to advertisements in levels – best, top 10, etc.
are correct
Future Work
 Parsing – more syntax and semantics (NLP)
 Parsing – differences in different languages.
 Quantization – extend to Natural Language
Understanding in addition/replacement of Keyword
matching, effects of different evidence accumulation.
 Data Extrapolation – using inference to create hits in
more feature elements.