Modeling the Dynamics of Online Auctions Using a Functional Data Analytic Approach

Galit Shmueli (+ Wolfgang Jank)
Dept of Decision & Information Technologies
Robert H. Smith School of Business
University of Maryland, College Park
December 2004

Overview
- Online auctions
  - Importance
  - How they work
- Where are the statisticians?
  - “Classical” empirical research and new opportunities
- Using FDA for
  - Representing auctions
  - Studying auction dynamics
  - Comparing auctions
  - Exploring relations with other variables
- Current & Future directions

Online Auctions
- Central in the eMarket place (eBay, Yahoo!, Amazon.com…)
- High accessibility, low transaction costs
- eBay has more than 27M active users (from over 61M registered). Every moment there are ~10M items across more than 43,000 product categories amounting to nearly $15 billion in gross merchandise sales (BusinessWeek, 2003)

Online Auctions
- The focus of much empirical research
- Players: IS and economists
- We’re looking at this from a whole new perspective! (and lots of this can be applied to other eCommerce data)

eBay.com
- Is by far the largest C2C auction site
- Buy/sell anything imaginable
- (Almost) anyone can buy/sell. You need a credit card to register (free).
- In lots of countries

How eBay auctions work: Selling an item
- Set some auction features (duration, opening price,…)
- Describe item
- Bells & whistles
- + more info on shipping, text description, payment options, etc.

How eBay auctions work: Bidding on an item
- Choose auction
- Proxy bidding:
  - Place max bid
  - eBay bids for you
  - Price increases by one increment
  - Highest bidder pays 2nd highest bid
  - Highest bid is not disclosed!

Bidding on an item – cont.
- Auction theory: bid your max and leave
- In practice: lots of sniping
- Sniping agents (wow – more data!)
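The proxy-bidding rules listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the increment is a single constant (eBay's real increment schedule is not given in these slides), the names `Bid` and `run_proxy_auction` are hypothetical, and the displayed price is taken to be the second-highest maximum plus one increment, capped at the leader's own maximum.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    bidder: str
    max_bid: float  # the secret maximum handed to the proxy


def run_proxy_auction(
    bids: List[Bid], opening_price: float, increment: float = 1.0
) -> Tuple[Optional[str], float, List[float]]:
    """Replay bids in arrival order; return (leader, current price, price history)."""
    leader: Optional[str] = None
    leader_max = 0.0
    price = opening_price
    history: List[float] = []
    for bid in bids:
        if leader is None:
            # First valid bid: price stays at the opening price.
            if bid.max_bid >= opening_price:
                leader, leader_max = bid.bidder, bid.max_bid
        elif bid.bidder == leader:
            # Leader raises their own maximum: displayed price is unchanged.
            leader_max = max(leader_max, bid.max_bid)
        elif bid.max_bid > leader_max:
            # New leader pays the old leader's maximum plus one increment,
            # but never more than their own maximum.
            price = min(bid.max_bid, leader_max + increment)
            leader, leader_max = bid.bidder, bid.max_bid
        elif bid.max_bid > price:
            # Proxy defends the leader: outbid the challenger by one increment,
            # capped at the leader's maximum (earlier bid wins ties).
            price = min(leader_max, bid.max_bid + increment)
        history.append(price)
    return leader, price, history


winner, final_price, path = run_proxy_auction(
    [Bid("A", 50.0), Bid("B", 30.0), Bid("C", 80.0)], opening_price=10.0
)
print(winner, final_price, path)
```

The recorded price path is a step function of the second-highest maximum, so the winner's own maximum never appears in it.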
Research Q’s Asked by Economists and IS researchers
- Auction design mechanisms: mostly regressions on final price
  - Lucking-Reiley et al: Opening Bid, Number of Bidders, Number of Bids, Length of Auction, Reputation of Seller
  - Bapna et al: Bid increments
- Winner’s Curse: structural model + prior
  - Winner likely to over-pay (Bajari & Hortacsu)
- Reputation and trust: regression, probit model
  - Seller rating effect on price or P(+ rating) (Wood et al; Ba & Pavlov)
- Bid Sniping: bid time CDF
  - Last-minute bidding to increase chances of success (Roth & Ockenfels)
  - But early bidding also prevalent
- Bid Shilling: t-tests
  - Fraudulent “price-pushing” by the seller (Kauffman & Wood)
- Bidding strategies: k-means clustering
  - 3 strategies: Participators, evaluators, opportunists (Bapna et al.)

No statisticians playing the game! Why? Data Accessibility?
- eBay displays data for all auctions completed in the last 30 days.
- Researchers use spiders (web agents)
  - Millions of auctions (how do you sample?)
  - Data are in HTML format!!!!
  - People usually write their own code
  - eBay changes the rules and formats
  - eBay does NOT like spiders
  - You really need some programming expertise
- Commercial software (Andale, Hammertap)
  - Data directly from eBay
  - Limited (mostly aggregates)
  - Expensive, unreliable

Lots of opportunities there!
- No statistical framing (sample/pop, type of data, etc)
- No data visualization
- Mostly “traditional” statistical methods
- Ignoring data
- Sampling issues
- and more…

Unstated assumptions in current (static) approach
- An auction is an observation from a population of eBay auctions (US market, certain time-frame, etc.)
- Sample collected by web-spider is random and representative of population.
- Data structure: multivariate, with a fixed set of measurements on each auction
- Auctions are independent

Visualizing Online Auction Data
- Lots of empirical research, but no-one is LOOKING at the data!
- Ordinary displays not always useful
- Shmueli & Jank, “Visualizing online auctions”, JCGS, forthcoming

Enlightening Visualizations
- Detecting Fraud (color = seller rating)

Advanced visualizations for interpreting modeling results
- Surplus from eBay auctions (Bapna, Jank, & Shmueli, 2004)
  - Data from sniping agent gives highest bid
  - What factors affect surplus?
- Advanced, interactive visualizations help to learn the multidimensional structure of the data and to interpret results of complicated models!
- Beats heavy statistical software like SAS

Understanding complicated results: surplus model

Variable                  Coefficient   SE      P-value
Intercept                    2.51       0.52    <.0001
Categories*
  Antique/Art                0.41       0.10    <.0001
  Pottery/Glass              0.28       0.07    0.00
  Collectibles               0.41       0.05    <.0001
  EverythingElse             0.38       0.09    <.0001
  Toys/Hobbies               0.33       0.08    <.0001
  Music/Movie/Games          0.39       0.15    0.01
  Jewelry                   -0.30       0.12    0.01
  Automotive                -0.24       0.06    0.00
  Home/Garden               -0.26       0.05    <.0001
  Health/Beauty             -0.16       0.06    0.00
US Dollars**                 0.20       0.04    <.0001
NUM_DAYS                    -0.15       0.07    0.03
SNIPE_TIME                  -0.23       0.06    0.00
NUM_BIDDERS***              -0.52       0.05    <.0001
PRICE***                     0.36       0.03    <.0001
S_RATING***                 -0.03       0.01    0.00
W_RATING***                  0.03       0.01    0.02
OPENING_BID***              -0.17       0.02    <.0001
OPENING_BID x PRICE          0.04       0.00    <.0001
PRICE x NUM_BIDDERS          0.09       0.02    <.0001
NUM_DAYS x SNIPE_TIME        0.02       0.01    0.01

* Base category: Books, Business/Industry, Clothing/Accessories, Computer, Coins/Stamps, Electronics, Photography, Sporting Goods
** Base category: Euros and GBP
*** The variables surplus, price, opening bid, winner rating, seller rating and number of bidders were transformed to the log-scale

[Figure: plots accompanying the surplus model; x-axis: (log) Price]

Back to current research
- Almost exclusively static
- Auction = snapshot at end
- Response: price, # bids,…
- But eBay does show complete bid histories!

Our new dynamic approach
- Auction = complete bid history
- Response:
  - Price over time
  - # of bidders over time
  - Average bidder rating over time…
- Interested in auction dynamics!
- Car/horse race

Data Structure: Challenges
- Each bid history = time series measured at unequally spaced time points on a closed interval.
- Bidding is usually sparse at mid-auction and dense at auction end
- Different auctions:
  - Different number of bids, placed at different times
  - Different durations
- Much variability across auctions
- We have LOTS of auctions!
- How to represent an auction?

Alternative representation: Curves!
- Functional Data Analysis is a modern statistical approach suitable for modeling objects (curves, 3D objects, etc), not just scalars/vectors.
- Made famous by the two monographs of Ramsay & Silverman
- http://ego.psych.mcgill.ca/misc/fda

Example of FDA: Handwriting
- Possible goal: detect fraudulent signature
- Twenty traces of writing “fda” by the same person
- We can think of these traces as functions with X,Y coordinates
- Use FDA to explore and model similarities and differences between the 20 traces.

FDA for bidding data
- Bids from a single auction are represented by a single entity
- Assume a very flexible underlying curve for all auctions
- Storage and computation: represent each auction by a set of basis functions and a set of coefficients
- Perform statistical analyses on the coefficients, or on a grid taken on the curves

The bidding path (= the functional object)
- An auction is represented by its bidding path, a continuous function relating price in $ (or another quantity!) to time
- In practice, bidding paths are observed at random discrete time points.
- These are in the observed bid histories
- We aim to reconstruct the unobservable continuous profile from the observed discrete bid history

Recovering the bidding path
- Use smoothing to recover the bidding path
- One useful smoother is the Penalized Smoothing Spline
- Piecewise polynomial with smooth breakpoints

$$f(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p + \sum_{l=1}^{L} \beta_{pl} (t - \tau_l)_+^p$$

- Penalize curvature by minimizing

$$\mathrm{PENSSE}_\lambda(f) = \underbrace{\sum_j \left(y_j - f(t_j)\right)^2}_{\text{fit}} + \lambda \underbrace{\int \left(f''(x)\right)^2 dx}_{\text{curvature}}$$

Smoothing Splines for recovering bidding paths
- Strengths
  - Good tradeoff between fit and local variability
  - Computationally cheap (+ numerically stable): well approximated by a finite set of B-spline basis functions

$$f(t) = \sum_{i=1}^{q} \beta_i \phi_i(t)$$

  - For smooth derivatives penalize higher order derivatives
- Challenges
  - Must determine $\lambda$ and knots
  - Requires prior interpolation+smoothing
  - Curves not necessarily monotone

From bid histories to bidding paths: potential enhancements
- Use live-bids rather than proxy-bids
- Use monotone splines (non-decreasing)
- Integrate auction theory into curve requirements (knot positions, polynomial order, etc)

[Figure: Auction 6 (monotone splines), Case 6, RMS residual = 0.073634; panels show log(livebid) and log(Current Price) against Day of Auction]

Learning about Auction dynamics (the auction as a car race)
- 1st derivative = velocity, 2nd = acceleration, 3rd = ?

[Figure: panels Auction #1 and Auction #2]

A sample of auctions
- 158 auctions for new Palm M515 PDAs
- 7-days, new $250

And their derivatives

Curve fitting: Sensitivity Analysis
- Smoothing splines + pre-smoothing monotone smoothing splines
- Choice of knots hardly influential
- Smoothing parameter chosen ad-hoc

Smoothing spline vs.
Monotone smoothing spline

Smoothing spline:

$$f(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p + \sum_{l=1}^{L} \beta_{pl} (t - \tau_l)_+^p$$

$$\mathrm{PENSSE}_\lambda(f) = \sum_j \left(y_j - f(t_j)\right)^2 + \lambda \int \left(D^2 f(x)\right)^2 dx$$

Monotone smoothing spline:

$$f(t) = C_0 + C_1 D^{-1}\left\{ \exp\left( \int_0^t \frac{D^2 f(x)}{D f(x)}\, dx \right) \right\}$$

$$F_\lambda(f) = \sum_j \left(y_j - f(t_j)\right)^2 + \lambda \int \left( \frac{D^2 f(x)}{D f(x)} \right)^2 dx$$

Basis function expansions
- Splines: linear combination of B-splines

$$f(t) = \sum_{i=1}^{q} \beta_i \phi_i(t)$$

- Monotone: the ratio $D^2 f / D f$ can be approximated by a linear combination of basis functions $\phi_j$
- Fitted function:

$$f(t) = \beta_0 + \beta_1 D^{-1}\left[ \exp\{ c^T D^{-1} \phi(t) \} \right]$$

Exploratory analysis of curves: Auction Explorer

“Handling” the curves
- Two approaches
  - Functional datum (fd): use curve coefficients directly in analysis
    - When: linear representation + linear operations
  - Grid: use a set of discrete values from a grid taken on the curves.
    - When: nonlinear operations and nonlinear representation (e.g. monotone splines)

Exploring & Modeling The Auction Curves
- Summaries of curves
  - Average curve
  - 95% CI for curve
- Bid paths and/or derivative curves
- Compare subsets of auctions

Exploratory analysis: Auction Clustering
- Using the bidding curve coefficients we apply cluster analysis (k-medoids)

[Figure: two clusters, labeled Early bidding and Sniping]

Comparing cluster dynamics: Phase-plane plots

[Figure: phase-plane plots, panels Sniping and Early bidding]

Characterizing the 2 Profiles

Profile   Opening Bid    Seller Rating       Bidder Rating     # Bids
Early     46.01 (7.94)   908.16 (106.08)     101.86 (10.42)    7.04 (0.52)
Late      22.31 (6.94)   1171.54 (292.89)    94.29 (13.29)     11.13 (0.83)

- The two profiles differ with respect to Opening Bid
- Investigate this influence dynamically via Functional Regression

Functional PCA: When do auctions behave differently?
- Principal components as perturbations of the mean
- When during the auction do bid curves deviate most/least?
- PCA + varimax
- 300 premium wristwatches

Functional Regression Models
- Involve a curve as a response/predictor
- In our case, response = bidding path
- Predictors:
  - Static: opening price, seller rating, etc.
  - Dynamic: current # bidders, current avg bidder rating
- Grid: fit a regression model at each grid point and then interpolate the coefficients

Functional Regression of Bidding Path vs. Opening Bid

[Figure: Estimated Parameter Curve]

Functional Regression of Bidding Acceleration vs. Opening Bid

[Figure: Estimated Parameter Curve]

Interpretation: Opening Bid and Auction Energy

[Figure: two diagrams with labels Value of Item, Open Bid, Potential Market Energy left in the auction]

Current & Future Directions
- Real-time forecasting of bidding paths of ongoing auctions
- Representing an auction in 2D (price + #bids over time)
- Modeling other aspects of auction data
  - Consumer surplus (with Ravi Bapna)
  - Bid arrival process (with Ralph Russo, Iowa)
- New predictors: currency, category, and dynamic ones
- Effects of auction design changes
- eBay addiction
- Other eCommerce and IT applications
- Papers: http://www.smith.umd.edu/ceme/statistics

Extras

Smoothing Spline Parameters
- Order of the spline
  - Cubic spline: popular, provides smooth fit; 2nd derivative (curvature), no breakpoints
  - To obtain m smooth derivatives, use spline of order m+2.
- Knot locations (breakpoints)
  - The more knots, the more flexible (wiggliness)
  - Tradeoff between data-fit and variability of function
- Smoothness penalty parameter $\lambda$
  - $\lambda \to 0$: fit approaches exact interpolation
  - $\lambda \to \infty$: fit approaches linear regression

Alternatively: B-spline basis functions
- B-splines on a fixed grid of knots ($s_1 < s_2 < \dots < s_q$) give good approximation to most smooth functions
- Computational aspect: numerical stability, especially for irregularly distributed time-points

$$\hat{\beta} = (\Phi' W \Phi)^{-1} \Phi' W y$$

- They form a set of natural cubic splines with limited support

$$f(t) = \sum_{i=1}^{q} \beta_i \phi_i(t)$$

  where the $\beta_i$ are the coefficients and $\phi_i$ is basis function $i$.
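The B-spline expansion and the weighted least-squares estimate above can be tried out with a small pure-Python sketch. Everything named here is an assumption for illustration: the functions `bspline_basis`, `solve`, `fit_bspline` and `evaluate`, the knot grid (one knot per day of a 7-day auction, repeated at the ends), and the synthetic noise-free data. It uses the Cox-de Boor recursion for the basis functions and omits the roughness penalty, so it corresponds to the unpenalized case rather than to the penalized smoothing spline itself.

```python
from typing import List, Optional, Sequence


def bspline_basis(i: int, order: int, knots: Sequence[float], t: float) -> float:
    """Value at t of the i-th B-spline of the given order (Cox-de Boor recursion)."""
    if order == 1:
        inside = knots[i] <= t < knots[i + 1]
        # Close the last non-empty interval so the right endpoint is covered.
        at_right_end = t == knots[-1] and knots[i] < knots[i + 1] == t
        return 1.0 if inside or at_right_end else 0.0
    value = 0.0
    left_width = knots[i + order - 1] - knots[i]
    if left_width > 0:
        value += (t - knots[i]) / left_width * bspline_basis(i, order - 1, knots, t)
    right_width = knots[i + order] - knots[i + 1]
    if right_width > 0:
        value += (knots[i + order] - t) / right_width * bspline_basis(i + 1, order - 1, knots, t)
    return value


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_bspline(times: Sequence[float], values: Sequence[float], knots: Sequence[float],
                order: int = 4, weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least squares: beta = (Phi' W Phi)^{-1} Phi' W y, with no penalty term."""
    q = len(knots) - order  # number of basis functions
    w = list(weights) if weights is not None else [1.0] * len(times)
    phi = [[bspline_basis(i, order, knots, t) for i in range(q)] for t in times]
    gram = [[sum(wj * row[r] * row[c] for wj, row in zip(w, phi)) for c in range(q)]
            for r in range(q)]
    rhs = [sum(wj * row[r] * yj for wj, row, yj in zip(w, phi, values)) for r in range(q)]
    return solve(gram, rhs)


def evaluate(beta: Sequence[float], knots: Sequence[float], t: float, order: int = 4) -> float:
    """f(t) = sum_i beta_i * phi_i(t)."""
    return sum(b * bspline_basis(i, order, knots, t) for i, b in enumerate(beta))


# Made-up demo: 7-day auction, cubic splines (order 4), interior knots at each day.
knots = [0.0] * 4 + [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] + [7.0] * 4
times = [0.25 * j for j in range(29)]                       # observation times in days
values = [0.1 * t**3 - t**2 + 2 * t + 5 for t in times]     # synthetic, noise-free curve
beta = fit_bspline(times, values, knots)
print(len(beta), evaluate(beta, knots, 3.3))
```

Each basis function is nonzero on only a few adjacent knot intervals, which is the limited support mentioned above. Real bid histories are sparse mid-auction, so the unpenalized normal equations can be singular on them; that is one reason for the penalty term and the pre-smoothing step.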