Computational User Intent Modeling
Hongning Wang (wang296@illinois.edu)
Advisor: ChengXiang Zhai (czhai@illinois.edu)
Department of Computer Science, University of Illinois at Urbana-Champaign
Urbana IL, 61801 USA
Joint Relevance and Freshness Learning (WWW’2012)
Content-Aware Click Modeling (WWW’2013)
Cross-Session Search Task Extraction (WWW’2013)
Unsupervised Discovery of Opposing Opinion Networks (CIKM’2012)
In contrast to traditional Web search, where topical relevance is often
the main ranking criterion, news search is characterized by the increased
importance of freshness. However, the estimates of relevance and
freshness, and especially the relative importance of these two aspects,
are highly specific to the query and to the time at which it is issued.
In this work, we propose a unified framework for modeling topical
relevance and freshness, as well as their relative importance, based on
click logs. We explore click statistics and content analysis techniques to
define a set of temporal features that predict the right mix of freshness
and relevance for a given query.
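To make the query-specific trade-off concrete, here is a minimal Python sketch under stated assumptions. It assumes a linear relevance score, a linear freshness score, and a query-specific weight alpha = sigmoid(w_q · query features) that mixes the two into one overall impression score. The feature choices (one BM25-like relevance feature, document age in days as the freshness feature) and the names `relevance_weight`, `impression`, `rank`, and `pairwise_hinge` are hypothetical illustrations, not the paper's exact formulation or learning procedure; the hinge loss only indicates how a clicked-over-skipped preference from a click log could score a candidate weighting.

```python
import math
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def relevance_weight(w_q: Sequence[float], query_features: Sequence[float]) -> float:
    """Query-specific weight alpha in (0, 1) on relevance; 1 - alpha goes to freshness."""
    return 1.0 / (1.0 + math.exp(-dot(w_q, query_features)))


def impression(w_r, w_f, w_q, query_features, rel_features, fresh_features) -> float:
    """Overall impression = alpha * relevance score + (1 - alpha) * freshness score."""
    alpha = relevance_weight(w_q, query_features)
    return alpha * dot(w_r, rel_features) + (1.0 - alpha) * dot(w_f, fresh_features)


def rank(docs: List[Tuple[str, Sequence[float], Sequence[float]]],
         w_r, w_f, w_q, query_features) -> List[str]:
    """docs: (doc_id, relevance features, freshness features); best impression first."""
    ordered = sorted(docs, key=lambda d: impression(w_r, w_f, w_q, query_features, d[1], d[2]),
                     reverse=True)
    return [doc_id for doc_id, _, _ in ordered]


def pairwise_hinge(clicked_score: float, skipped_score: float, margin: float = 1.0) -> float:
    """Loss for one click preference: the clicked doc should outscore the skipped one by a margin."""
    return max(0.0, margin - (clicked_score - skipped_score))


# Hypothetical features: relevance = [BM25-like score]; freshness = [age in days],
# with a negative weight so that newer documents score higher.
w_r, w_f, w_q = [1.0], [-1.0], [2.0]
docs = [("older_but_on_topic", [5.0], [30.0]), ("fresh_but_weaker", [3.0], [0.5])]
print(rank(docs, w_r, w_f, w_q, [-3.0]))  # ['fresh_but_weaker', 'older_but_on_topic']
print(rank(docs, w_r, w_f, w_q, [3.0]))   # ['older_but_on_topic', 'fresh_but_weaker']
```

The two documents swap order when only the query features change, which is the query-specific behavior the paragraph describes; in a learned model, `w_r`, `w_f`, and `w_q` would be fit to click preferences rather than set by hand.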
In this work, we propose a general Bayesian Sequential State (BSS)
model to address two deficiencies of existing click modeling
approaches: they fail to utilize document content information when
modeling clicks, and they are not optimized for distinguishing the relative
order of relevance among the candidate documents.
As our solution, a set of descriptive features and ranking-oriented
pairwise preferences are encoded in a probabilistic graphical model,
in which the dependency relations among a document's relevance quality
and the examination and click events under a given query are learned
automatically from the data.
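The dependency between examination and click events can be illustrated with a deliberately simplified model. The sketch below is not BSS: it is a cascade-style sequential model, assumed here for illustration, in which the top result is always examined, an examined result is clicked with probability sigmoid(w_click · ranking features), and the user moves on to the next result with probability sigmoid(w_exam · [bias, position, # clicks so far, distance to last click]). Because examination is latent, the likelihood of an observed click sequence sums over the last examined position. All function names and weights are hypothetical; BSS additionally uses content features and pairwise preferences, which are omitted here.

```python
import math
from itertools import product
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def continue_features(pos: int, n_clicks: int, last_click_pos: int) -> List[float]:
    """Hypothetical features for examining the next result.

    Distance to last click is measured from position 0 when nothing has been clicked yet.
    """
    return [1.0, float(pos), float(n_clicks), float(pos - last_click_pos)]


def session_likelihood(clicks: Sequence[int], rel_features: Sequence[Sequence[float]],
                       w_click: Sequence[float], w_exam: Sequence[float]) -> float:
    """P(observed clicks), marginalizing over the latent last examined position."""
    n = len(clicks)
    if n == 0:
        return 1.0
    last_clicked = max((i + 1 for i, c in enumerate(clicks) if c), default=0)
    total = 0.0
    prefix = 1.0  # P(clicks up to result i observed and result i examined); result 1 is always examined
    n_clicks, last_click_pos = 0, 0
    for i in range(n):
        pos = i + 1
        # An examined result is clicked with a relevance-driven probability.
        p_click = sigmoid(dot(w_click, rel_features[i]))
        prefix *= p_click if clicks[i] else 1.0 - p_click
        if clicks[i]:
            n_clicks, last_click_pos = n_clicks + 1, pos
        if pos == n:
            total += prefix  # the user examined every result
            break
        p_next = sigmoid(dot(w_exam, continue_features(pos, n_clicks, last_click_pos)))
        if pos >= last_clicked:
            total += prefix * (1.0 - p_next)  # stop here: later results unexamined, hence unclicked
        prefix *= p_next
    return total


rel = [[2.0], [0.5], [-1.0]]  # hypothetical ranking-feature scores for three results
w_click, w_exam = [1.0], [2.0, -0.3, -0.5, -0.2]
print(session_likelihood([1, 0, 0], rel, w_click, w_exam))
# Sanity check: the probabilities of all 2^3 click patterns sum to 1.
print(sum(session_likelihood(list(c), rel, w_click, w_exam) for c in product((0, 1), repeat=3)))
```

Because the continuation probability depends on the clicks observed so far, the sketch shows how click history can shape later examination; a learned model would fit `w_click` and `w_exam` from logs instead of fixing them.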
Search tasks frequently span multiple sessions; thus, developing
methods to extract these tasks from historical data is central to
understanding longitudinal search behaviors and to developing search
systems that support users' long-running tasks.
In this work, we develop a semi-supervised clustering model based
on the latent structural SVM framework, which is capable of learning
inter-query dependencies from users' search behaviors. A set of
effective automatic annotation rules is proposed as weak supervision to
relieve the burden of manual annotation. Our method paves the way for
user modeling and long-term, task-based personalized applications.
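The weak-supervision idea can be sketched as follows, assuming a "best-link" task structure in which each query either links to its single highest-scoring earlier query or starts a new task. The rule function mirrors heuristic constraints of the kind listed on this poster (identical queries, sub-queries, identical clicked URLs). The pairwise features, the weights `w`, the `threshold`, and the clicked URLs in the example are hypothetical, and the latent structural SVM training that would learn `w` from rule-labeled pairs is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set


@dataclass
class Query:
    text: str
    clicked_urls: Set[str] = field(default_factory=set)

    @property
    def terms(self) -> Set[str]:
        return set(self.text.lower().split())


def weak_label(a: Query, b: Query) -> bool:
    """Rule-based must-link label: identical queries, sub-queries, or identical clicked URLs."""
    identical = a.text.lower().strip() == b.text.lower().strip()
    sub_query = bool(a.terms) and bool(b.terms) and (a.terms <= b.terms or b.terms <= a.terms)
    return identical or sub_query or bool(a.clicked_urls & b.clicked_urls)


def pair_features(a: Query, b: Query) -> List[float]:
    """Hypothetical pairwise features: term Jaccard overlap and a shared-click indicator."""
    union = a.terms | b.terms
    jaccard = len(a.terms & b.terms) / len(union) if union else 0.0
    return [jaccard, 1.0 if a.clicked_urls & b.clicked_urls else 0.0]


def best_link_tasks(queries: Sequence[Query], w: Sequence[float], threshold: float) -> List[int]:
    """Link each query to its best-scoring earlier query, or start a new task if none beats threshold."""
    task_of: List[int] = []
    for j, q in enumerate(queries):
        best_i: Optional[int] = None
        best_score = threshold
        for i in range(j):
            score = sum(wk * fk for wk, fk in zip(w, pair_features(queries[i], q)))
            if score > best_score:
                best_i, best_score = i, score
        task_of.append(task_of[best_i] if best_i is not None else max(task_of, default=-1) + 1)
    return task_of


# Queries from the example search log; the clicked URLs are hypothetical.
log = [Query("bank of america"), Query("macy's sale"), Query("sas shoes"),
       Query("credit union"), Query("6pm.com", {"6pm.com"}),
       Query("coupon for 6pm shoes", {"6pm.com"})]
print(best_link_tasks(log, w=[1.0, 1.0], threshold=0.1))  # [0, 1, 2, 3, 4, 4]
```

Here the last query links to "6pm.com" through the shared click rather than to "sas shoes" through the shared term "shoes", because the click feature outscores the small term overlap under these illustrative weights.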
As more and more people freely express opinions and actively
interact with each other in discussion threads, online forums are
becoming a gold mine of information about people’s opinions and
social behaviors.
In this work, we study an interesting new problem: automatically
discovering opposing opinion networks of users from forum discussions,
i.e., subsets of users who strongly oppose each other on some
topic. Signals from both textual content (e.g., who says what) and social
interactions (e.g., who talks to whom) are exploited in an unsupervised
optimization framework.
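One way to see how such signals can be combined without labels is the sketch below, which assumes a simplified user-level formulation. The ReplyTo-text signal R and the topical signal T for each user pair are mixed into a signed edge weight (positive for agreement, negative for disagreement), and users are split into two groups by greedy local search that maximizes agreement within groups and disagreement across them. Each user carries a single stance, a crude stand-in for author consistency, and the initial stances stand in for the sentiment prior. The weights `lam_r` and `lam_t` and all names are hypothetical; the paper's optimization framework is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def signed_weights(pairs: Iterable[Tuple[str, str, int, int]],
                   lam_r: float = 1.0, lam_t: float = 0.5) -> Dict[Edge, float]:
    """pairs: (u, v, r, t) with r = reply-to text signal, t = topical signal.

    Each signal is +1 (agree), -1 (disagree), or 0 (unknown).
    """
    w: Dict[Edge, float] = defaultdict(float)
    for u, v, r, t in pairs:
        if u != v:
            w[tuple(sorted((u, v)))] += lam_r * r + lam_t * t
    return dict(w)


def objective(w: Dict[Edge, float], side: Dict[str, int]) -> float:
    """Agreement inside groups plus disagreement across groups (larger is better)."""
    return sum(wt * side[x] * side[y] for (x, y), wt in w.items())


def partition(w: Dict[Edge, float], prior: Dict[str, int]) -> Dict[str, int]:
    """Greedy local search: flip a user's side whenever that strictly increases the objective.

    prior must assign +1 or -1 to every user that appears in w.
    """
    side = dict(prior)
    improved = True
    while improved:
        improved = False
        for u in sorted(side):
            # Signed pull from u's neighbors; u's edges contribute side[u] * pull to the objective.
            pull = sum(wt * side[y if x == u else x] for (x, y), wt in w.items() if u in (x, y))
            if side[u] * pull < 0:
                side[u] = -side[u]
                improved = True
    return side


pairs = [("a", "b", 1, 1), ("c", "d", 1, 0), ("a", "c", -1, -1), ("b", "d", -1, 0)]
w = signed_weights(pairs)
side = partition(w, prior={u: 1 for u in "abcd"})
print(side)                # {'a': 1, 'b': 1, 'c': -1, 'd': -1}
print(objective(w, side))  # 5.0
```

Every flip strictly raises a bounded objective, so the search always terminates, though only at a local optimum; a different prior can lead to a different split.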
Relevance vs. Freshness
• Relevance
  • Topical relatedness
  • Metric: tf*idf, BM25, language model
• Freshness
  • Temporal closeness
  • Metric: age, elapsed time
• Trade-off
  • Query specific
  • To meet the user’s information need
Joint Relevance and Freshness Learning
• Key: freshness vs. relevance
• Query => trade-off
• URL => relevance
• URL => freshness
• Click => overall impression
Experimental Results
1. Model update trace in the training process.
2. Ranking performance comparison with baselines on the Yahoo! news search log: (a) on normal bucket clicks; (b) on random bucket clicks.

Modeling User Clicks
Figure: an impression, with the user's questions "Match my query?", "Redundant doc?", and "Shall I move on?"
• Relevance quality of a document: e.g., ranking features
• Chance to further examine the results: e.g., position, # clicks, distance to last click
• Chance to click on an examined and relevant document: e.g., clicked/skipped content similarity
Experimental Results
1. P@1 comparison between different click models over the random bucket click set and the normal click set from the Yahoo! news search log.
2. Feature weights learned by the BSS model.

Semi-supervised Structural Learning
• Search task: an atomic information need that may result in one or more queries
Example search log (session cutoff tѱ = 30 minutes):
Session        Time               Query
5/29/2012 S1   5/29/2012 5:26     bank of america
5/29/2012 S2   5/29/2012 11:11    macy's sale
               5/29/2012 11:12    sas shoes
5/30/2012 S1   5/30/2012 10:19    credit union
5/30/2012 S2   5/30/2012 12:25    6pm.com
               5/30/2012 12:49    coupon for 6pm shoes
Heuristic constraints
• Identical queries
• Sub-queries
• Identical clicked URLs
Structural knowledge
• Same task => tasks sharing related queries
• Latent
Experimental Results
1. Task extraction performance on the Bing web search log with increasing volume of weak supervision.
2. Identified latent search task structure.

Identifying Opposing Opinion Networks
Figure: a forum thread (e.g., “health care reform”) whose users' posts are linked by Reply To relations, with users split into a Supporting Group and an Against Group; example posts: “It’s human right!”, “Budget increase”, “It is nonsense!”, “I insist my point.”, “I agree with you!”, …
• Signal 1: ReplyTo text (R: agree/disagree)
• Signal 2: Author consistency (A)
• Signal 3: Topical similarity (T: agree/disagree)
Figure: optimization framework; labels: sentiment prior, opinion of posts (Text 1, Text 2, Text 3, …), v1, v2, v3, opinions (Agree/Disagree), "subject to".
Data: Hot Topics & Current Events forum on Military.com
• 43,483 threads
• 1,343,427 posts
• 34,332 users
• 7.7 reply-to relations per thread
Experimental Results
1. Accuracy of Agree/Disagree relation classification.
2. Accuracy of user opinion prediction.