
Understanding Web Browsing Behaviors
through Weibull Analysis of Dwell Time
Chao Liu, Ryen White, Susan Dumais
Microsoft Research at Redmond
Dwell Time as Implicit User Feedback
The most significant indicator of document relevance
besides clickthroughs [Kelly and Belkin, SIGIR’01,
Leveraged in various applications
Learning to rank [Agichtein et al., SIGIR’06]
Query expansion [Buscher et al., SIGIR’09]
BrowseRank, assuming an exponential dist. [Liu et al., SIGIR’08]
Questions Addressed in this Study
How do we model the dwell time distribution Pr(t|d)?
What does Pr(t|d) tell us about user browsing behaviors?
How is the distribution related to page-level features, and can we
predict the distribution based on page-level features?
We propose to model Pr(t|d) using Weibull distributions
The fitted Weibull distribution exhibits a strong negative aging effect,
which indicates a “screen-and-glean” browsing behavior
We can predict Pr(t|d) based on page features, which effectively
extends the application of dwell time to scenarios where dwell time
data is not available
A Primer on Weibull Analysis
Weibull distribution and analysis
Hazard function and aging effects
Weibull Analysis on Dwell Time
Screen-and-glean browsing pattern
Screening by categories
Predicting Dwell Time Distribution
Prediction performance
Feature importance
Weibull Analysis
Weibull analysis is a method for modeling positive-valued
data, such as time-to-failure data
Success beyond reliability engineering
Predicting product life
Comparing reliability of competing product designs
Establishing warranty policies or proactively managing
spare parts inventories
Survival analysis, weather forecasting, fading channels in
wireless communication, the length of labor strikes,
AIDS mortality and earthquake probabilities, etc.
Unfortunately, there is no prior Weibull analysis on Web data,
although the Web abounds with temporal data
Page dwell time, session length, time-to-first-click, etc.
Weibull Distribution
2-parameter Weibull distribution
λ: scale parameter
k: shape parameter
Exponential dist. when k = 1
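For reference, the standard density and cumulative distribution function of the 2-parameter Weibull distribution can be written with the λ and k listed above. The symbols f and F are notation assumed here, not taken from the slides.

```latex
f(x; \lambda, k) = \frac{k}{\lambda} \left( \frac{x}{\lambda} \right)^{k-1} e^{-(x/\lambda)^{k}},
\qquad
F(x; \lambda, k) = 1 - e^{-(x/\lambda)^{k}},
\qquad x \ge 0,\ \lambda > 0,\ k > 0
```

Setting k = 1 gives f(x) = (1/λ) e^{-x/λ}, the exponential density with mean λ.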
Weibull Analysis
Hazard function at time x
Instantaneous failure rate (or hazard rate) at time x
Amount of risk associated with an x-survivor at time x
Hazard function for Weibull distributions
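The standard definitions behind these two bullets can be written as follows, where f is the density and F the cumulative distribution function of the lifetime (notation assumed here). The second expression follows by substituting the Weibull density and distribution function with scale λ and shape k.

```latex
h(x) = \frac{f(x)}{1 - F(x)},
\qquad
h(x; \lambda, k) = \frac{k}{\lambda} \left( \frac{x}{\lambda} \right)^{k-1}
```

The exponent k - 1 is what drives aging: the hazard is constant when k = 1, decreasing in x when k < 1, and increasing in x when k > 1.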
Aging Effects from Hazard Function
k = 1: No aging
Constant failure rate
Exponential distribution
0<k<1: Negative aging
Decreasing failure rate
An initial screening has to be
passed in order to survive longer
Smaller k means harsher screening
k > 1: Positive aging
Increasing failure rate
Little to no screening at the
beginning but life becomes
tougher as time goes by
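The three aging regimes can be checked numerically. The sketch below evaluates the standard Weibull hazard rate (k/λ)(x/λ)^(k-1) at a few time points; the scale of 10 seconds, the time points, and the helper names are arbitrary choices for illustration, not values from the study.

```python
def weibull_hazard(x, lam, k):
    """Hazard rate h(x) = (k / lam) * (x / lam) ** (k - 1), for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1)


def aging(k):
    """Name the aging regime implied by the shape parameter k."""
    if k < 1:
        return "negative aging"
    if k > 1:
        return "positive aging"
    return "no aging"


lam = 10.0                      # arbitrary scale, in seconds
times = [1.0, 5.0, 20.0, 60.0]  # arbitrary evaluation points, in seconds

for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, lam, k) for t in times]
    print(k, aging(k), [round(r, 4) for r in rates])
```

With k = 0.5 the printed rates fall as time increases, with k = 1 they stay constant, and with k = 2 they rise.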
Weibull Analysis on Dwell Time and Beyond
Failure rate: abandon rate (dwell time analysis); click rate (click analysis)
Mean residual life: mean residual time on page (dwell time analysis); how soon to click … (click analysis)
The Web abounds with temporal data
Time to first click, session length, eye fixation, …
Weibull analysis goes well beyond hazard functions
Failure forecasting, corrective actions, …
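As one illustration of the mean residual life analogy, the expected remaining time on a page, given that the user has already stayed t seconds, is the integral of the survival function from t onward divided by the survival probability at t. The sketch below approximates it with the trapezoidal rule. The parameter values, the integration horizon, the step size, and the function names are assumptions made for this example.

```python
import math


def survival(t, lam, k):
    """Weibull survival function S(t) = exp(-(t / lam) ** k)."""
    return math.exp(-((t / lam) ** k))


def mean_residual_time(t, lam, k, horizon=20000.0, step=0.1):
    """Expected remaining time given survival to t (trapezoidal rule).

    Assumes the survival probability beyond `horizon` is negligible,
    which holds for the parameters used below.
    """
    n = int((horizon - t) / step)
    total = 0.5 * (survival(t, lam, k) + survival(t + n * step, lam, k))
    total += sum(survival(t + i * step, lam, k) for i in range(1, n))
    return total * step / survival(t, lam, k)


lam, k = 10.0, 0.5  # arbitrary scale (seconds) and a shape with negative aging
mrt = {t: mean_residual_time(t, lam, k) for t in (0.0, 10.0, 40.0)}
for t, value in mrt.items():
    print(f"stayed {t:5.1f}s -> expected remaining {value:6.2f}s")
```

For k = 0.5 the integral has the closed form 2λ(1 + sqrt(t/λ)), so the three values should be close to 20, 40 and 60 seconds. Under negative aging, the longer a user has already stayed, the longer they are expected to remain.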
Goodness-of-Fit Comparison
Dwell time collected for 205,873 pages (URLs) in the English
(US) market, each of which has a minimum of 10k dwell times
Comparison on Goodness-of-Fit (GoF)
Dwell times for each page are split into training (80%) and
testing (20%)
Models are fitted on the training split and evaluated on the testing split
Metrics: Log-likelihood and Kolmogorov–Smirnov distance
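The evaluation protocol above can be sketched on synthetic data. The example draws Weibull-distributed "dwell times", splits them 80/20, fits a Weibull and an exponential model by maximum likelihood on the training part, and compares held-out log-likelihood and Kolmogorov–Smirnov distance. The synthetic parameters (scale 10, shape 0.6), the bisection-based fitting routine, and all function names are assumptions for illustration. The slides do not state which fitting procedure was used.

```python
import math
import random


def fit_weibull_mle(xs, lo=0.01, hi=20.0, iters=60):
    """Maximum-likelihood (lam, k), via bisection on the profile equation for k."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(k):
        # The root of this function is the MLE of k; it increases with k.
        powered = [x ** k for x in xs]
        weighted = sum(p * l for p, l in zip(powered, logs))
        return weighted / sum(powered) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return lam, k


def log_likelihood(xs, lam, k):
    """Weibull log-likelihood; k = 1 gives the exponential model."""
    return sum(math.log(k / lam) + (k - 1) * math.log(x / lam) - (x / lam) ** k
               for x in xs)


def ks_distance(xs, lam, k):
    """Largest gap between the empirical CDF and the fitted Weibull CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-((x / lam) ** k))
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


random.seed(0)
# Synthetic dwell times: scale 10 s, shape 0.6 (both arbitrary).
draws = (random.weibullvariate(10.0, 0.6) for _ in range(5000))
times = [t for t in draws if t > 0]  # guard against log(0)
split = int(0.8 * len(times))
train, test = times[:split], times[split:]

lam_w, k_w = fit_weibull_mle(train)
lam_e = sum(train) / len(train)  # exponential MLE: scale = sample mean

ll_weibull = log_likelihood(test, lam_w, k_w)
ll_exponential = log_likelihood(test, lam_e, 1.0)
ks_weibull = ks_distance(test, lam_w, k_w)
ks_exponential = ks_distance(test, lam_e, 1.0)

print(f"Weibull:     lam={lam_w:.2f} k={k_w:.2f} LL={ll_weibull:.1f} KS={ks_weibull:.3f}")
print(f"Exponential: lam={lam_e:.2f} k=1.00 LL={ll_exponential:.1f} KS={ks_exponential:.3f}")
```

Because the synthetic data is itself Weibull with k < 1, the Weibull fit is expected to score a higher held-out log-likelihood and a smaller KS distance than the exponential fit. This shows only how the two metrics behave, and says nothing about the real dwell time data.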
Fitting λ and k
What’s the initial screening?
Strong Negative Aging
Screen-and-glean browsing pattern?
P(k|Category): Aging Effect w.r.t. Categories
Screening is harsher for less-entertaining topics
Dwell Time Prediction from Page Features
Why predict dwell time?
Extend dwell time to pages with little or no dwell time data
Enable third parties to leverage dwell time even if they don’t
have access to real dwell time data
Gain insights into what elements affect dwell time
Why use only page-level features?
Users decide how long to stay on a page based on their
experience and perception of it, rather than on signals such as PageRank
Advanced features like PageRank and inlink counts may not be
available to all parties
Experiment Setup
5000 randomly sampled pages with fitted λ and k as the target
Page-level features
Pages are crawled using a dynamic crawler, which parses the HTML,
executes all dynamic components (e.g., redirections, Flash, JavaScript),
and finally renders the page
“login” pages are removed as they are likely due to time-out redirection
4771 pages left
HtmlTag: frequencies of 93 HTML tags
Content: frequencies of top-1000 terms
Dynamic: statistics from dynamic crawling
Regressor: Multiple Additive Regression Trees (MART)
Effectiveness and feature interpretability
Prediction Results
Comparisons with various feature configurations
Prediction outperforms the baseline
HtmlTag and Dynamic are similarly effective on their own, and
complementary to each other when combined
Content > HtmlTag+Dynamic
Content+Dynamic is the best: Dynamic captures what users experience
after clicks whereas Content shows what users would see in the end
Baseline returns the mean λ and k
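The comparison between a boosted-tree regressor and the mean baseline can be illustrated with a toy example. MART is a form of gradient-boosted regression trees. The sketch below is a minimal gradient-boosting loop over single-split stumps with squared-error loss, which is not the implementation used in the study. The two features (`tag_density`, `script_ratio`), the synthetic relation to the shape parameter k, and all hyperparameters are hypothetical.

```python
import random


def fit_stump(rows, residuals):
    """Best single-split stump (feature, threshold, left mean, right mean)."""
    best = None
    n = len(rows)
    total = sum(residuals)
    for j in range(len(rows[0])):
        order = sorted(range(n), key=lambda i: rows[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += residuals[order[pos - 1]]
            lo, hi = rows[order[pos - 1]][j], rows[order[pos]][j]
            if lo == hi:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Maximising this gain is equivalent to minimising squared error.
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos)
            if best is None or gain > best[0]:
                best = (gain, j, 0.5 * (lo + hi),
                        left_sum / pos, right_sum / (n - pos))
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left, right = stump
    return left if row[j] <= threshold else right


def fit_boosted(rows, targets, rounds=50, rate=0.1):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(targets) / len(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + rate * predict_stump(stump, r) for p, r in zip(preds, rows)]
    return base, rate, stumps


def predict_boosted(model, row):
    base, rate, stumps = model
    return base + rate * sum(predict_stump(s, row) for s in stumps)


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def make_page():
    # Hypothetical page features and a made-up relation to the shape k.
    tag_density, script_ratio = random.random(), random.random()
    k = 0.5 + 0.3 * tag_density - (0.2 if script_ratio > 0.5 else 0.0)
    return [tag_density, script_ratio], k + random.gauss(0, 0.05)


random.seed(1)
pages = [make_page() for _ in range(400)]
train, test = pages[:300], pages[300:]
train_rows, train_k = [p[0] for p in train], [p[1] for p in train]
test_rows, test_k = [p[0] for p in test], [p[1] for p in test]

model = fit_boosted(train_rows, train_k)
mean_k = sum(train_k) / len(train_k)  # baseline: always predict the mean k

mse_model = mse([predict_boosted(model, r) for r in test_rows], test_k)
mse_baseline = mse([mean_k] * len(test_k), test_k)
print(f"boosted stumps MSE={mse_model:.4f}  mean baseline MSE={mse_baseline:.4f}")
```

The same loop could be run a second time with λ as the target. A mean baseline is a useful reference because any regressor that fails to beat it has learned nothing from the features.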
Important Features
Conclusions
The first Weibull analysis on Web dwell time
Dwell time exhibits a strong negative aging effect, which hints at a
prevalent “screen and glean” browsing pattern
Harsher screening for less-entertaining topics
Feasible to predict dwell time based on page-level features
Draws an analogy between dwell time and lifetime
Opens the door to Weibull analysis for temporal implicit feedback
Extending applicability to less-visited pages and parties without dwell
time data
Future work
Improving prediction accuracy through better feature engineering
Weibull analysis for IR
Yutaka Suzue
Krysta Svore
Qiang Wu
Wen-tau Yih
Xiaoxin Yin
Alice Zheng
Thank You!