Technion Dec04.ppt (3.124Mb)

Modeling the Dynamics of Online Auctions
Using a Functional Data Analytic Approach
Galit Shmueli (+ Wolfgang Jank)
Dept of Decision & Information Technologies
Robert H. Smith School of Business
University of Maryland, College Park
December 2004
Overview
Online auctions
    Importance
    How they work
    “Classical” empirical research and new opportunities
Where are the statisticians?
Using FDA for
    Representing auctions
    Studying auction dynamics
    Comparing auctions
    Exploring relations with other variables
Current & Future directions
2
Online Auctions
Central to the eMarketplace (eBay, Yahoo!, Amazon.com…)
High accessibility, low transaction costs
eBay has more than 27M active users (out of over 61M registered). At any moment there are ~10M items listed across more than 43,000 product categories, amounting to nearly $15 billion in gross merchandise sales (BusinessWeek, 2003)
3
Online Auctions
The focus of much empirical research
Players: IS and economists
We’re looking at this from a whole new
perspective! (and lots of this can be
applied to other eCommerce data)
4
eBay.com
By far the largest C2C auction site
Buy/sell anything imaginable
(Almost) anyone can buy/sell. Registration is free but requires a credit card.
Operates in many countries
5
How eBay auctions work:
Selling an item
Set some auction features (duration, opening price,…)
Describe item
Bells & whistles
+ more info on shipping, text description, payment options, etc.
6
How eBay auctions work:
Bidding on an item
Choose auction
Proxy bidding:
    Place max bid
    eBay bids for you
    Price increases by one increment
    Highest bidder pays 2nd highest bid
    Highest bid is not disclosed!
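The proxy-bidding rules above can be sketched in a few lines. This is a toy model, not eBay's actual implementation: the class name `ProxyAuction`, the fixed `increment`, and the rule that the displayed price is capped at the leader's hidden maximum are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxyAuction:
    """Toy second-price proxy auction with a fixed increment (an assumption)."""
    opening_price: float
    increment: float = 1.0
    high_bidder: Optional[str] = None
    high_max: float = 0.0   # hidden maximum bid of the current leader
    price: float = 0.0      # publicly displayed price

    def __post_init__(self) -> None:
        self.price = self.opening_price

    def place_max_bid(self, bidder: str, max_bid: float) -> bool:
        """Register a maximum bid; return True if the bidder leads afterwards."""
        if self.high_bidder is None:
            # First bid: price stays at the opening price.
            if max_bid < self.opening_price:
                return False
            self.high_bidder, self.high_max = bidder, max_bid
            return True
        if bidder == self.high_bidder:
            # The leader raises the hidden maximum; displayed price is unchanged.
            self.high_max = max(self.high_max, max_bid)
            return True
        if max_bid <= self.price:
            return False  # a new bid must beat the displayed price
        if max_bid > self.high_max:
            # New leader pays the old leader's maximum plus one increment.
            self.price = min(max_bid, self.high_max + self.increment)
            self.high_bidder, self.high_max = bidder, max_bid
            return True
        # The proxy outbids the challenger by one increment on the leader's behalf.
        self.price = min(self.high_max, max_bid + self.increment)
        return False
```

With an opening price of 10 and an increment of 1, maximum bids of 50, 30, and 60 from three different bidders move the displayed price to 10, 31, and 51, while the leader's maximum stays hidden throughout.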
7
Bidding on an item – cont.



Auction theory: bid your max and leave
In practice: lots of sniping
Sniping agents (wow – more data!)
8
Research Q’s Asked by
Economists and IS researchers
Auction design mechanisms – mostly regressions on final price
    Lucking-Reiley et al: Opening Bid, Number of Bidders, Number of Bids, Length of Auction, Reputation of Seller
    Bapna et al: Bid increments
Winner’s Curse – structural model + prior
    Winner likely to over-pay (Bajari & Hortacsu)
Reputation and trust – regression, probit model
    Seller rating effect on price or P(+ rating) (Wood et al; Ba & Pavlov)
Bid Sniping – bid time CDF
    Last-minute bidding to increase chances of success (Roth & Ockenfels)
    But early bidding also prevalent
Bid Shilling – t-tests
    Fraudulent “price-pushing” by the seller (Kauffman & Wood)
Bidding strategies – k-means clustering
    3 strategies: Participators, evaluators, opportunists (Bapna et al.)
9
No statisticians playing the
game!
Why? Data Accessibility?
eBay displays data for all auctions completed in the last 30 days.
    Millions of auctions (how do you sample?)
    Data are in HTML format!!!!
Researchers use spiders (web agents)
    People usually write their own code
    eBay changes the rules and formats
    eBay does NOT like spiders
    You really need some programming expertise
Commercial software (Andale, Hammertap)
    Data directly from eBay
    Limited (mostly aggregates)
    Expensive, unreliable
11
Lots of opportunities there!






No statistical framing (sample/pop, type
of data, etc)
No data visualization
Mostly “traditional” statistical methods
Ignoring data
Sampling issues
and more….
12
Unstated assumptions in current
(static) approach




An auction is an observation from a
population of eBay auctions (US market,
certain time-frame, etc.)
Sample collected by web-spider is random
and representative of population.
Data structure: multivariate, with a fixed set
of measurements on each auction
Auctions are independent
13
Visualizing Online Auction Data


Lots of empirical research, but no-one is
LOOKING at the data!
Ordinary displays not always useful
Shmueli & Jank, “Visualizing online auctions”, JCGS, forthcoming
14
Enlightening Visualizations
Detecting Fraud (color = seller rating)
15
Advanced visualizations for
interpreting modeling results
Surplus from eBay auctions (Bapna, Jank, &
Shmueli, 2004)



Data from sniping agent gives highest bid
What are factors that affect surplus?
Advanced, interactive visualizations help learn
the multidimensional structure of the data
and to interpret results of complicated
models!

Beats heavy statistical software like SAS
16
Understanding complicated results:
surplus model
Variable                  Coefficient   SE      P-value
Intercept                   2.51        0.52    <.0001
Categories*
  Antique/Art               0.41        0.10    <.0001
  Pottery/Glass             0.28        0.07    0.00
  Collectibles              0.41        0.05    <.0001
  EverythingElse            0.38        0.09    <.0001
  Toys/Hobbies              0.33        0.08    <.0001
  Music/Movie/Games         0.39        0.15    0.01
  Jewelry                  -0.30        0.12    0.01
  Automotive               -0.24        0.06    0.00
  Home/Garden              -0.26        0.05    <.0001
  Health/Beauty            -0.16        0.06    0.00
US Dollars**                0.20        0.04    <.0001
NUM_DAYS                   -0.15        0.07    0.03
SNIPE_TIME                 -0.23        0.06    0.00
NUM_BIDDERS***             -0.52        0.05    <.0001
PRICE***                    0.36        0.03    <.0001
S_RATING***                -0.03        0.01    0.00
W_RATING***                 0.03        0.01    0.02
OPENING_BID***             -0.17        0.02    <.0001
OPENING_BID x PRICE         0.04        0.00    <.0001
PRICE x NUM_BIDDERS         0.09        0.02    <.0001
NUM_DAYS x SNIPE_TIME       0.02        0.01    0.01
* Base Category: Books, Business/Industry, Clothing/Accessories, Computer, Coins/Stamps, Electronics, Photography, Sporting Goods
** Base category: Euros and GBP
*** The variables surplus, price, opening bid, winner rating, seller rating and number of bidders were transformed to the log-scale
[Figure: visualization of the surplus model; one axis is (log) Price]
17
Back to current research
Almost exclusively static
    Auction = snapshot at end
    Response: price, # bids,…
But eBay does show complete bid histories!
18
Our new dynamic approach


Auction = complete bid history
Response:




Price over time
# of bidders over time
Average bidder rating over time…
Interested in auction dynamics!

Car/horse race
19
Data Structure: Challenges



Each bid history = time series measured at
unequally-spaced time points, closed interval.
Bidding is usually sparse at mid-auction and
dense at auction end
Different auctions





Different number of bids, placed at different times
Different durations
Much variability across auctions
We have LOTS of auctions!
How to represent an auction?
20
Alternative representation:
Curves!



Functional Data Analysis is a
modern statistical approach suitable for
modeling objects (curves, 3D objects,
etc), not just scalars/vectors.
Made famous by the
two monographs of
Ramsay & Silverman
http://ego.psych.mcgill.ca/misc/fda
21
Example of FDA: Handwriting




Possible goal: detect
fraudulent signature
Twenty traces of writing
“fda” by same person
We can think of these
traces as functions with
X,Y coordinates
Use FDA to explore and
model similarities and
differences between the 20
traces.
22
FDA for bidding data




Bids from single auction are
represented by single entity
Assume a very flexible
underlying curve for all
auctions
Storage and computation:
represent each auction by
some basis function and a set
of coefficients
Perform statistical analyses on


the coefficients, or
a grid taken on the curves
23
The bidding path
(=the functional object)



An auction is represented by its
bidding path, a continuous function
relating $ (or other!) over time
In practice, bidding paths are
observed at random discrete time
points. These are in the observed
bid histories
We aim to reconstruct the
unobservable continuous profile
from the observed discrete bid
history
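Before any smoothing, the observed discrete bid history can be turned into a step function and read off on a common time grid. The sketch below shows only that step; the function name `step_path`, the `start_price` argument, and the example numbers in the usage note are hypothetical, not data from the talk.

```python
from bisect import bisect_right
from typing import List, Sequence


def step_path(times: Sequence[float], prices: Sequence[float],
              grid: Sequence[float], start_price: float) -> List[float]:
    """Price in effect at each grid time.

    `times` must be sorted in increasing order and aligned with `prices`.
    Before the first bid the path sits at `start_price`; afterwards it holds
    the most recent observed price (a right-continuous step function).
    """
    path = []
    for t in grid:
        k = bisect_right(times, t)  # number of bids placed at or before t
        path.append(prices[k - 1] if k > 0 else start_price)
    return path
```

For a 7-day auction with bids at days 0.4, 2.1, 6.5, 6.9, and 6.98, evaluating on the daily grid 0, 1, …, 7 gives a path that is flat through mid-auction and jumps at the end, which is the sparse-then-dense pattern described earlier.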
24
Recovering the bidding path
Use smoothing to recover the bidding path
One useful smoother is the Penalized Smoothing Spline
Piecewise polynomial with smooth breakpoints (knots \tau_1, \dots, \tau_L):
\[ f(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p + \sum_{l=1}^{L} \beta_{pl} (t - \tau_l)_+^p \]
Penalize curvature by minimizing
\[ \mathrm{PENNSE}_\lambda(f) = \underbrace{\sum_j \bigl( y_j - f(t_j) \bigr)^2}_{\text{fit}} + \lambda \underbrace{\int \bigl[ f''(x) \bigr]^2 dx}_{\text{curvature}} \]
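The fit-versus-curvature tradeoff can be tried out with a discrete analogue of the penalized criterion. The sketch below is not a smoothing spline: it assumes values on an equally spaced grid, replaces the integral of the squared second derivative by a sum of squared second differences (a Whittaker-type smoother), and solves the resulting linear system (I + lam * D'D) f = y. All function names are hypothetical.

```python
from typing import List, Sequence


def second_diff_matrix(n: int) -> List[List[float]]:
    """(n-2) x n matrix whose rows (1, -2, 1) take discrete second differences."""
    D = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    return D


def solve(A: List[List[float]], b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def penalized_smooth(y: Sequence[float], lam: float) -> List[float]:
    """Minimize sum (y_j - f_j)^2 + lam * sum (second differences of f)^2."""
    n = len(y)
    D = second_diff_matrix(n)
    A = [[(1.0 if i == j else 0.0)
          + lam * sum(D[k][i] * D[k][j] for k in range(n - 2))
          for j in range(n)] for i in range(n)]
    return solve(A, list(y))
```

With `lam = 0` the output reproduces the data exactly; as `lam` grows the second differences shrink toward zero and the fit approaches a straight line, mirroring the role of the smoothing parameter in the spline criterion.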
25
Smoothing Splines
for recovering bidding paths

Strengths
    Good tradeoff between fit and local variability
    Computationally cheap (+ numerically stable): well approximated by a finite set of B-spline basis functions
    \[ f(t) = \sum_{i=1}^{q} \theta_i \phi_i(t) \]
    For smooth derivatives penalize higher order derivatives
Challenges
    Must determine λ and knots
    Requires prior interpolation+smoothing
    Curves not necessarily monotone
26
From bid histories to bidding
paths: potential enhancements


Use live-bids rather than proxy-bids
Use monotone splines (non-decreasing)
[Figure: two panels, "Auction 6 (monotone splines)" and "Case 6 RMS residual = 0.073634"; y-axes log(livebid) and log(Current Price); x-axis Day of Auction (0 to 7)]
Integrate auction theory into curve requirements
(knot positions, polynomial order, etc)
27
Learning about Auction dynamics
(the auction as a car race)

1st derivative = velocity, 2nd = acceleration, 3rd=?
[Figure: Auction #1 and Auction #2]
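Velocity and acceleration can be approximated numerically once a bidding path has been evaluated on an evenly spaced grid. In the talk the derivatives come from the fitted spline itself; the finite-difference function below is only a stand-in under that grid assumption, and its name is hypothetical.

```python
from typing import List, Sequence


def derivative(values: Sequence[float], step: float) -> List[float]:
    """First derivative on an evenly spaced grid.

    Central differences in the interior, one-sided differences at the two ends.
    """
    n = len(values)
    d = [0.0] * n
    d[0] = (values[1] - values[0]) / step
    d[-1] = (values[-1] - values[-2]) / step
    for i in range(1, n - 1):
        d[i] = (values[i + 1] - values[i - 1]) / (2 * step)
    return d


# Price velocity is the first derivative of the path; acceleration is the
# derivative of the velocity.
path = [0.0, 1.0, 4.0, 9.0, 16.0]        # f(t) = t^2 on t = 0..4 (toy data)
velocity = derivative(path, step=1.0)
acceleration = derivative(velocity, step=1.0)
```

For the toy path f(t) = t^2 the interior velocities are 2, 4, 6 and the central acceleration is 2, as expected; the endpoint values are less accurate because they use one-sided differences.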
28
A sample of auctions


158 7-day auctions for new Palm M515 PDAs (new price approx. $250)
29
And their derivatives
30
Curve fitting:
Sensitivity Analysis



Smoothing splines + pre-smoothing vs. monotone smoothing splines
Choice of knots hardly influential
Smoothing parameter chosen ad-hoc
31
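Smoothing spline vs.
Monotone smoothing spline
Smoothing spline:
\[ f(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p + \sum_{l=1}^{L} \beta_{pl} (t - \tau_l)_+^p \]
\[ \mathrm{PENNSE}_\lambda(f) = \sum_j \bigl( y_j - f(t_j) \bigr)^2 + \lambda \int \bigl[ D^2 f(x) \bigr]^2 dx \]
Monotone smoothing spline:
\[ f(t) = C_0 + C_1 D^{-1} \exp\Bigl\{ \int_0^t \frac{D^2 f(x)}{D f(x)} \, dx \Bigr\} \]
\[ F_\lambda(f) = \sum_j \bigl[ y_j - f(t_j) \bigr]^2 + \lambda \int \Bigl[ \frac{D^2 f(x)}{D f(x)} \Bigr]^2 dx \]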
32
Basis function expansions
Splines: linear combination of B-splines
\[ f(t) = \sum_{i=1}^{q} \theta_i \phi_i(t) \]
Monotone: the ratio D^2 f / Df can be approximated by a linear combination of basis functions \phi_j
Fitted function:
\[ f(t) = \beta_0 + \beta_1 D^{-1} \bigl[ \exp\{ c^T D^{-1} \phi(t) \} \bigr] \]
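Why this construction yields a non-decreasing curve can be checked in a few lines. The sketch below writes the two constants as \beta_0 and \beta_1 and introduces w(x) as shorthand for the ratio D^2 f(x) / Df(x); both the notation and the name w are assumptions made for this note.

```latex
\begin{align*}
f(t) &= \beta_0 + \beta_1 \int_0^t \exp\Bigl\{ \int_0^s w(x)\,dx \Bigr\}\,ds \\
Df(t) &= \beta_1 \exp\Bigl\{ \int_0^t w(x)\,dx \Bigr\} > 0
  \quad \text{whenever } \beta_1 > 0 \\
D^2 f(t) &= w(t)\, Df(t)
  \quad \Longrightarrow \quad \frac{D^2 f(t)}{Df(t)} = w(t)
\end{align*}
```

Because the exponential is always positive, the slope never changes sign, so w can be estimated freely (for example as a linear combination of basis functions) without any explicit monotonicity constraint.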
33
Exploratory analysis of curves:
Auction Explorer
34
“Handling” the curves

Two approaches

Functional datum (fd):



Use curve coefficients directly in analysis
When: linear representation + linear operations
Grid


Use a set of discrete values from a grid taken on
the curves.
When: nonlinear operations and nonlinear
representation (e.g. monotone splines)
35
Exploring & Modeling
The Auction Curves

Summaries of curves




Average curve
95% CI for curve
Bid paths and/or
derivative curves
Compare subsets of
auctions
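On a common grid, the average curve and a pointwise 95% band reduce to per-grid-point summaries. The sketch below assumes each auction's curve has already been evaluated on the same grid and uses a normal approximation (mean plus or minus 1.96 standard errors); whether the talk's interval was built this way is not stated, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def pointwise_band(curves: Sequence[Sequence[float]],
                   z: float = 1.96) -> List[Tuple[float, float, float]]:
    """Return (lower, mean, upper) at each grid point.

    `curves` holds one equal-length sequence per auction, all on one grid.
    The band is mean +/- z * standard error, computed separately per point.
    """
    n = len(curves)
    band = []
    for values in zip(*curves):  # all auctions' values at one grid point
        m = mean(values)
        half = z * stdev(values) / n ** 0.5
        band.append((m - half, m, m + half))
    return band
```

The same function applies unchanged to derivative curves, and calling it on two subsets of auctions gives a quick visual comparison of their average dynamics.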
36
Exploratory analysis:
Auction Clustering

Using the bidding curve coefficients we apply cluster analysis (k-medoids)
[Figure: two clusters, labeled Early bidding and Sniping]
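A minimal k-medoids sketch on coefficient vectors is given below. It uses a simple alternating scheme (assign each point to its nearest medoid, then re-pick each medoid as the member with the smallest total distance to its cluster) rather than a full PAM implementation. The naive initialization with the first k points, the Euclidean distance, and the assumption that all points are distinct are choices made here for illustration.

```python
from typing import List, Sequence, Tuple


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two coefficient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_medoids(points: Sequence[Sequence[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Return (medoid indices, cluster label per point)."""
    medoids = list(range(k))  # naive init: first k points (an assumption)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest medoid for every point.
        labels = [min(range(k), key=lambda c: dist(p, points[medoids[c]]))
                  for p in points]
        # Update step: the member minimizing total within-cluster distance.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            best = min(members,
                       key=lambda i: sum(dist(points[i], points[j])
                                         for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: medoids did not change
        medoids = new_medoids
    return medoids, labels
```

Unlike k-means, each cluster center is an actual auction, so the representative of a cluster is a real bidding curve that can be plotted and inspected.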
37
Comparing cluster dynamics:
Phase-plane plots
[Figure: phase-plane plots for the Sniping and Early bidding clusters]
38
Characterizing the 2 Profiles
         Opening Bid     Seller Rating       Bidder Rating     # Bids
Early    46.01 (7.94)    908.16 (106.08)     101.86 (10.42)    7.04 (0.52)
Late     22.31 (6.94)    1171.54 (292.89)    94.29 (13.29)     11.13 (0.83)
The two profiles differ with respect to Opening Bid
Investigate this influence dynamically via Functional Regression
39
functional-PCA : When do
auctions behave differently?
Principal components as perturbations of the mean



When during
the auction do
bid curves
deviate
most/least?
PCA+ varimax
300 premium
wristwatches
40
Functional Regression Models



Involve a curve as a response/predictor
In our case, response = bidding path
Predictors:



Static: opening price, seller rating, etc.
Dynamic: current # bidders, current avg bidder
rating
Grid: fit a regression model at each grid point
and then interpolate the coefficients
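The grid approach can be sketched for a single static predictor such as the opening bid: fit an ordinary least-squares line at every grid point and collect the slopes into an estimated parameter curve. The function names below are hypothetical, the sketch handles one predictor only, and it stops before the interpolation step mentioned above.

```python
from typing import List, Sequence, Tuple


def ols_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Simple linear regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pointwise_regression(curves: Sequence[Sequence[float]],
                         predictor: Sequence[float]
                         ) -> List[Tuple[float, float]]:
    """Fit one regression per grid point.

    `curves[a][g]` is auction a's curve value at grid point g, and
    `predictor[a]` is that auction's static covariate (e.g. opening bid).
    """
    return [ols_line(predictor, values) for values in zip(*curves)]
```

Reading the slopes across grid points shows when during the auction the predictor matters most; passing derivative curves instead of bidding paths gives the analogous analysis for velocity or acceleration.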
41
Functional Regression of
Bidding Path vs. Opening Bid
Estimated
Parameter
Curve
42
Functional Regression of Bidding
Acceleration vs. Opening Bid
Estimated
Parameter
Curve
43
Interpretation: Opening Bid
and Auction Energy
[Figure: two diagrams with labels Value of Item, Potential, Market, Energy left in the auction, and Open Bid]
44
Current & Future Directions



Real-time forecasting of bidding paths of ongoing
auctions
Representing an auction in 2D (price + #bids over
time)
Modeling other aspects of auction data







Consumer surplus – with Ravi Bapna
Bid arrival process – with Ralph Russo (Iowa)
New predictors: currency, category, and dynamic ones
Effects of auction design changes
eBay addiction
Other eCommerce and IT applications
Papers: http://www.smith.umd.edu/ceme/statistics
45
Extras
Smoothing Spline Parameters
Order of the Spline
    Cubic spline: popular, provides smooth fit; 2nd derivative (curvature), no breakpoints
    To obtain m smooth derivatives, use spline of order m+2.
Knot locations (breakpoints)
    The more knots, the more flexible (wiggliness)
    Tradeoff between data-fit and variability of function
Smoothness penalty parameter λ
    λ → 0: fit approaches exact interpolation
    λ → ∞: fit approaches linear regression
47
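The two limiting cases of the smoothness penalty parameter follow directly from a curvature-penalized least-squares criterion. The note below restates such a criterion in generic form (the label J_\lambda is an assumption made for this note) and reads off both limits.

```latex
\begin{align*}
J_\lambda(f) &= \sum_j \bigl( y_j - f(t_j) \bigr)^2
  + \lambda \int \bigl[ f''(x) \bigr]^2 dx \\
\lambda \to 0 &: \quad \text{the penalty vanishes, so any } f \text{ with }
  f(t_j) = y_j \text{ for all } j \text{ attains } J_\lambda(f) \to 0
  \ \text{(interpolation)} \\
\lambda \to \infty &: \quad \text{a finite } J_\lambda(f) \text{ requires }
  \int \bigl[ f''(x) \bigr]^2 dx = 0
  \ \Rightarrow\ f(t) = a + b\,t
  \ \text{(the least-squares line)}
\end{align*}
```

Intermediate values of the parameter trade off the two terms, which is why the choice matters for the shape of the recovered bidding paths and even more for their derivatives.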
Alternatively:
B-spline basis functions
B-splines on a fixed grid of knots (s_1 < s_2 < … < s_q) give good approximation to most smooth functions
Computational aspect: numerical stability, especially for irregularly distributed time-points
\[ \hat{\theta} = (\Phi' W \Phi)^{-1} \Phi' W y \]
They form a set of natural cubic splines with limited support
\[ f(t) = \sum_{i=1}^{q} \theta_i \phi_i(t) \]
where \theta_i are the coefficients and \phi_i is the i-th basis function
48