A Multiresolution Stochastic Process Model for Basketball Possession Outcomes Dan Cervone, Alex D’Amour, Luke Bornn, Kirk Goldsberry Harvard Statistics Department August 11, 2015 Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 NBA optical tracking data Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 NBA optical tracking data Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 NBA optical tracking data (x, y ) locations for all 10 players (5 on each team) at 25Hz. (x, y , z) locations for the ball at 25Hz. Event annotations (shots, passes, fouls, etc.). 1230 games from 2013-14 NBA, each 48 minutes, featuring 461 players in total. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 1.6 Expected Possession Value (EPV) RASHARD LEWIS POSS. LEBRON JAMES POSSESSION James shot Splits the defense; clear path to basket Slight dip in EPV after crossing 3 point line 0 3 Pass { Runs behind basket and defenders close Accelerates into the paint Pass { 1.0 1.2 Accelerates towards basket Slight dip in EPV after crossing 3 point line EPV constant while pass is en route 0.8 EPV 1.4 NORRIS COLE POSSESSION 6 9 12 15 18 time Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 EPV definition Let Ω be the space of all possible basketball possessions. For ω ∈ Ω Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 EPV definition Let Ω be the space of all possible basketball possessions. For ω ∈ Ω X (ω) ∈ {0, 2, 3}: point value of possession ω. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 EPV definition Let Ω be the space of all possible basketball possessions. For ω ∈ Ω X (ω) ∈ {0, 2, 3}: point value of possession ω. T (ω): possession length Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 EPV definition Let Ω be the space of all possible basketball possessions. For ω ∈ Ω X (ω) ∈ {0, 2, 3}: point value of possession ω. T (ω): possession length Zt (ω), 0 ≤ t ≤ T (ω): time series of optical tracking data. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 EPV definition Let Ω be the space of all possible basketball possessions. For ω ∈ Ω X (ω) ∈ {0, 2, 3}: point value of possession ω. T (ω): possession length Zt (ω), 0 ≤ t ≤ T (ω): time series of optical tracking data. (Z ) Ft = σ({Zs−1 : 0 ≤ s ≤ t}): natural filtration. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 EPV definition Let Ω be the space of all possible basketball possessions. For ω ∈ Ω X (ω) ∈ {0, 2, 3}: point value of possession ω. T (ω): possession length Zt (ω), 0 ≤ t ≤ T (ω): time series of optical tracking data. (Z ) Ft = σ({Zs−1 : 0 ≤ s ≤ t}): natural filtration. Definition The expected possession value (EPV) at time t ≥ 0 during a possession is (Z ) νt = E[X |Ft ]. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 EPV definition Let Ω be the space of all possible basketball possessions. For ω ∈ Ω X (ω) ∈ {0, 2, 3}: point value of possession ω. T (ω): possession length Zt (ω), 0 ≤ t ≤ T (ω): time series of optical tracking data. (Z ) Ft = σ({Zs−1 : 0 ≤ s ≤ t}): natural filtration. Definition The expected possession value (EPV) at time t ≥ 0 during a possession is (Z ) νt = E[X |Ft ]. EPV provides an instantaneous snapshot of the possession’s value, given its full spatiotemporal history. (Z ) νt is a Martingale: E[νt+h |Ft ] = νt for all h > 0. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Calculating EPV (Z ) νt = E[X |Ft ] Regression-type prediction methods: – Data are not traditional input/output pairs. – No guarantee of stochastic consistency. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Calculating EPV (Z ) νt = E[X |Ft ] Regression-type prediction methods: – Data are not traditional input/output pairs. – No guarantee of stochastic consistency. Markov chains: + Stochastically consistent. – Information is lost through discretization. – Many rare transitions. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Calculating EPV (Z ) νt = E[X |Ft ] Regression-type prediction methods: – Data are not traditional input/output pairs. – No guarantee of stochastic consistency. Markov chains: + Stochastically consistent. – Information is lost through discretization. – Many rare transitions. Brute force, “God model” for basketball. + Allows Monte Carlo calculation of νt by simulating future possession paths. – Zt is high dimensional and includes discrete events (passes, shots, turnovers). Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 A coarsened process Finite collection of states C = Cposs ∪ Cend ∪ Ctrans . Cposs : Ball possession states {player} × {region} × {defender within 5 feet} Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 A coarsened process Finite collection of states C = Cposs ∪ Cend ∪ Ctrans . Cposs : Ball possession states {player} × {region} × {defender within 5 feet} Dan Cervone (Harvard) Cend : End states Multiresolution Basketball Modeling {made 2, made 3, turnover} August 11, 2015 A coarsened process Finite collection of states C = Cposs ∪ Cend ∪ Ctrans . Cposs : Ball possession states {player} × {region} × {defender within 5 feet} Cend : End states {made 2, made 3, turnover} Ctrans : Transition states {{ pass linking c, c 0 ∈ Cposs }, {shot attempt from c ∈ Cposs }, turnover in progress, rebound in progress }. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 A coarsened process Finite collection of states C = Cposs ∪ Cend ∪ Ctrans . Cposs : Ball possession states {player} × {region} × {defender within 5 feet} Cend : End states {made 2, made 3, turnover} Ctrans : Transition states {{ pass linking c, c 0 ∈ Cposs }, {shot attempt from c ∈ Cposs }, turnover in progress, rebound in progress }. Ct ∈ C: state of the possession at time t. C (0) , C (1) , . . . , C (K ) : discrete sequence of distinct states. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Ctrans Cend Turnover Made 2pt Shot Made 3pt Player 4 Pass to P5 Player 3 Pass to P4 Cposs Player 2 Pass to P3 Player 1 Pass to P2 Possible paths for Ct Player 5 Rebound End of possession Figure 1: Schematic of the coarsened possession process C, with states (rectangles) and possible state transitions (arrows) shown. The unshaded states in the first row compose Cposs . Here, states corresponding to distinct ballhandlers are grouped together (Player 1 through 5), and the discretized court in each group represents the player’s coarsened position and defended state. The gray shaded rectangles are transition states, Ctrans , while the rectangles in the third row represent the end states, Cend . Blue arrows represent possible macrotransition entrances (and red arrows, macrotransition exits) when Player 1 has the ball; these terms are introduced in Section ??. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Ctrans Turnover Cend Made 2pt Shot Made 3pt Player 4 Pass to P5 Player 3 Pass to P4 Cposs Player 2 Pass to P3 Player 1 Pass to P2 Possible paths for Ct Player 5 Rebound End of possession Figure 1: Schematic of the coarsened possession process C, with states (rectangles) and possible min{s : s > t, Cs ∈ Ctrans } if Ct ∈ Cposs state transitions (arrows) τt = shown. The unshaded states in the first row compose Cposs . Here, states corresponding to distinct ballhandlers are grouped together (Player 1 tthrough 5), and the discretized t if C 6∈ Cposs court in each group represents the player’s coarsened position and defended state. The gray shaded δt =states, min{sCtrans : s, ≥ τt ,the Cs rectangles 6∈ Ctrans }. rectangles are transition while in the third row represent the end states, Cend . Blue arrows represent possible macrotransition entrances (and red arrows, macrotransition exits) when Player 1 has the ball; these terms are introduced in Section ??. ( Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Stopping times for switching resolutions ( min{s : s > t, Cs ∈ Ctrans } τt = t δt = min{s : s ≥ τt , Cs 6∈ Ctrans }. Dan Cervone (Harvard) Multiresolution Basketball Modeling if Ct ∈ Cposs if Ct ∈ 6 Cposs August 11, 2015 Stopping times for switching resolutions ( min{s : s > t, Cs ∈ Ctrans } τt = t δt = min{s : s ≥ τt , Cs 6∈ Ctrans }. if Ct ∈ Cposs if Ct ∈ 6 Cposs Key assumptions: A1 C is marginally semi-Markov. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Stopping times for switching resolutions ( min{s : s > t, Cs ∈ Ctrans } τt = t δt = min{s : s ≥ τt , Cs 6∈ Ctrans }. if Ct ∈ Cposs if Ct ∈ 6 Cposs Key assumptions: A1 C is marginally semi-Markov. (Z ) A2 For all s > δt , P(Cs |Cδt , Ft ) = P(Cs |Cδt ). Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Stopping times for switching resolutions ( min{s : s > t, Cs ∈ Ctrans } τt = t δt = min{s : s ≥ τt , Cs 6∈ Ctrans }. if Ct ∈ Cposs if Ct ∈ 6 Cposs Key assumptions: A1 C is marginally semi-Markov. (Z ) A2 For all s > δt , P(Cs |Cδt , Ft ) = P(Cs |Cδt ). Theorem Assume (A1)–(A2), then for all 0 ≤ t < T , X (Z ) νt = E[X |Cδt = c]P(Cδt = c|Ft ). c∈{Ctrans ∪Cend } Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Multiresolution models EPV: νt = (Z ) X c∈{Ctrans ∪Cend } Dan Cervone (Harvard) E[X |Cδt = c]P(Cδt = c|Ft ). Multiresolution Basketball Modeling August 11, 2015 Multiresolution models EPV: (Z ) X νt = c∈{Ctrans ∪Cend } E[X |Cδt = c]P(Cδt = c|Ft ). Let M(t) be the event {τt ≤ t + }. (Z ) M1 P(Zt+ |M(t)c , Ft ): the microtransition model. (Z ) M2 P(M(t)|Ft ): the macrotransition entry model. (Z ) M3 P(Cδt |M(t), Ft ): the macrotransition exit model. M4 P, with Pqr = P(C (n+1) = cr |C (n) = cq ): the Markov transition probability matrix. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Multiresolution models EPV: (Z ) X νt = c∈{Ctrans ∪Cend } E[X |Cδt = c]P(Cδt = c|Ft ). Let M(t) be the event {τt ≤ t + }. (Z ) M1 P(Zt+ |M(t)c , Ft ): the microtransition model. (Z ) M2 P(M(t)|Ft ): the macrotransition entry model. (Z ) M3 P(Cδt |M(t), Ft ): the macrotransition exit model. M4 P, with Pqr = P(C (n+1) = cr |C (n) = cq ): the Markov transition probability matrix. Monte Carlo computation of νt : (Z ) Draw Cδt |Ft using (M1)–(M3). Calculate E[X |Cδt ] using (M4). Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Microtransition model Player `’s position at time t is z` (t) = (x ` (t), y ` (t)). x ` (t + ) = x ` (t) + αx` [x ` (t) − x ` (t − )] + ηx` (t) ηx` (t) ∼ N (µ`x (z` (t)), (σx` )2 ). µx has Gaussian Process prior. y ` (t) modeled analogously (and independently). Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Microtransition model Player `’s position at time t is z` (t) = (x ` (t), y ` (t)). x ` (t + ) = x ` (t) + αx` [x ` (t) − x ` (t − )] + ηx` (t) ηx` (t) ∼ N (µ`x (z` (t)), (σx` )2 ). µx has Gaussian Process prior. y ` (t) modeled analogously (and independently). TONY PARKER WITH BALL Dan Cervone (Harvard) DWIGHT HOWARD WITH BALL Multiresolution Basketball Modeling August 11, 2015 Microtransition model Player `’s position at time t is z` (t) = (x ` (t), y ` (t)). x ` (t + ) = x ` (t) + αx` [x ` (t) − x ` (t − )] + ηx` (t) ηx` (t) ∼ N (µ`x (z` (t)), (σx` )2 ). µx has Gaussian Process prior. y ` (t) modeled analogously (and independently). TONY PARKER WITHOUT BALL Dan Cervone (Harvard) DWIGHT HOWARD WITHOUT BALL Multiresolution Basketball Modeling August 11, 2015 Microtransition model Player `’s position at time t is z` (t) = (x ` (t), y ` (t)). x ` (t + ) = x ` (t) + αx` [x ` (t) − x ` (t − )] + ηx` (t) ηx` (t) ∼ N (µ`x (z` (t)), (σx` )2 ). µx has Gaussian Process prior. y ` (t) modeled analogously (and independently). TONY PARKER WITHOUT BALL DWIGHT HOWARD WITHOUT BALL Defensive microtransition model based on defensive matchups [Franks et al., 2015]. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Macrotransition entry model Recall M(t) = {τt ≤ t + }: Six different “types”, based on entry state Cτt , ∪6j=1 Mj (t) = M(t). (Z ) Hazards: λj (t) = lim→0 P(Mj (t)|Ft ) . log(λj (t)) = [Wj` (t)]0 β `j + ξj` z` (t) Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Macrotransition entry model Recall M(t) = {τt ≤ t + }: Six different “types”, based on entry state Cτt , ∪6j=1 Mj (t) = M(t). (Z ) Hazards: λj (t) = lim→0 P(Mj (t)|Ft ) . log(λj (t)) = [Wj` (t)]0 β `j + ξj` z` (t) Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Macrotransition entry model Recall M(t) = {τt ≤ t + }: Six different “types”, based on entry state Cτt , ∪6j=1 Mj (t) = M(t). (Z ) Hazards: λj (t) = lim→0 P(Mj (t)|Ft ) . log(λj (t)) = [Wj` (t)]0 β `j + ξj` z` (t) + ξ˜j` (zj (t)) 1[j ≤ 4] Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Hierarchical modeling Dwight Howard’s shot chart: Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Hierarchical modeling Shrinkage needed: Across space. Across different players. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Basis representation of spatial effects Spatial effects ξj` `: ballcarrier identity. j: macrotransition type (pass, shot, turnover). Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Basis representation of spatial effects Spatial effects ξj` `: ballcarrier identity. j: macrotransition type (pass, shot, turnover). Functional basis representation ξj` (z) = [wj` ]0 φj (z). φj = (φji . . . φjd )0 : d spatial basis functions. wj` : weights/loadings. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Basis representation of spatial effects Spatial effects ξj` `: ballcarrier identity. j: macrotransition type (pass, shot, turnover). Functional basis representation ξj` (z) = [wj` ]0 φj (z). φj = (φji . . . φjd )0 : d spatial basis functions. wj` : weights/loadings. Information sharing φj allows for non-stationarity, correlations between disjoint regions [Higdon, 2002]. wj` : weights across players follow a CAR model [Besag, 1974] based on player similarity graph H. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Basis representation of spatial effects Basis functions φj learned in pre-processing step: Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Basis representation of spatial effects Basis functions φj learned in pre-processing step: Graph H learned from players’ court occupancy distribution: Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Inference “Partially Bayes” estimation of all model parameters: Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Inference “Partially Bayes” estimation of all model parameters: Multiresolution transition models provide partial likelihood factorization [Cox, 1975]. All model parameters estimated using R-INLA software [Rue et al., 2009, Lindgren et al., 2011]. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Inference “Partially Bayes” estimation of all model parameters: Multiresolution transition models provide partial likelihood factorization [Cox, 1975]. All model parameters estimated using R-INLA software [Rue et al., 2009, Lindgren et al., 2011]. Distributed computing implementation: Preprocessing involves low-resource, highly parallelizable tasks. Parameter estimation involves several CPU- and memory-intensive tasks. Calculating EPV from parameter estimates involves low-resource, highly parallelizable tasks. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 1.6 New insights from basketball possessions RASHARD LEWIS POSS. LEBRON JAMES POSSESSION James shot Splits the defense; clear path to basket Slight dip in EPV after crossing 3 point line 0 3 Pass { Runs behind basket and defenders close Accelerates into the paint Pass { 1.0 1.2 Accelerates towards basket Slight dip in EPV after crossing 3 point line EPV constant while pass is en route 0.8 EPV 1.4 NORRIS COLE POSSESSION 6 9 12 15 18 time Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 New insights from basketball possessions 1.6 LEGEND 1.4 1.09 2V P 0.00 D 2 A MOVEMENT A Macro 4 V 1.01 1.2 History Micro MACROTRANSITION VALUE (V) P 0.06 3 High Medium Low 4 V 1.42 P 0.83 C 1.0 EPV 3 V 0.98 P 0.04 1 15 0.8 B MACROTRANSITION PROBABILITY (P) High Medium Low P 0.04 TURNOVER V 0.00 P 0.02 OTHER V 0.99 P 0.02 3 C 5 V 1.03 0 3 6 9 12 15 18 time V 1.02 P 0.03 V 0.98 P 0.04 2 3 B 2 V 1.01 P 0.04 2 2 4 V 1.01 3 V 1.00 1 P 0.13 3 4 115 V 1.17 P 0.27 2 V 1.09 P 0.01 2 V 1.01 P 0.01 2 DEFENSE Deron Williams 1 Jason Terry 2 Joe Johnson 3 Andray Blatche 4 Brook Lopez 5 D 1 V 0.98 3 P 0.04 1 1.01 4V P 0.61 P 0.23 5 1 OFFENSE Norris Cole Ray Allen 3 Rashard Lewis 4 LeBron James 5 Chris Bosh 1 3 4 V 1.01 P 0.05 4 4 V 1.50 P 0.83 5 5 V 1.05 5 V 1.03 P 0.12 Dan Cervone (Harvard) TURNOVER V 0.00 P 0.16 OTHER V 1.00 P 0.16 P 0.04 TURNOVER V 0.00 P 0.01 OTHER V 0.97 P 0.13 Multiresolution Basketball Modeling V 1.03 5 P 0.00 TURNOVER V 0.00 P 0.04 OTHER V 1.06 P 0.06 August 11, 2015 New metrics for player performance EPV-added: Rank 1 2 3 4 5 6 7 8 9 10 Player Kevin Durant LeBron James Jose Calderon Dirk Nowitzki Stephen Curry Kyle Korver Serge Ibaka Channing Frye Al Horford Goran Dragic EPVA Rank 3.26 2.96 2.79 2.69 2.50 2.01 1.70 1.65 1.55 1.54 277 278 279 280 281 282 283 284 285 286 Player Zaza Pachulia DeMarcus Cousins Gordon Hayward Jimmy Butler Rodney Stuckey Ersan Ilyasova DeMar DeRozan Rajon Rondo Ricky Rubio Rudy Gay EPVA -1.55 -1.59 -1.61 -1.61 -1.63 -1.89 -2.03 -2.27 -2.36 -2.59 Table : Top 10 and bottom 10 players by EPV-added (EPVA) per game in 2013-14, minimum 500 touches during season. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 New metrics for player performance Shot satisfaction: Rank 1 2 3 4 5 6 7 8 9 10 Player Mason Plumlee Pablo Prigioni Mike Miller Andre Drummond Brandan Wright DeAndre Jordan Kyle Korver Jose Calderon Jodie Meeks Anthony Tolliver Satis. Rank 0.35 0.31 0.27 0.26 0.24 0.24 0.24 0.22 0.22 0.22 277 278 279 280 281 282 283 284 285 286 Player Satis. Garrett Temple Kevin Garnett Shane Larkin Tayshaun Prince Dennis Schroder LaMarcus Aldridge Ricky Rubio Roy Hibbert Will Bynum Darrell Arthur -0.02 -0.02 -0.02 -0.03 -0.04 -0.04 -0.04 -0.05 -0.05 -0.05 Table : Top 10 and bottom 10 players by shot satisfaction in 2013-14, minimum 500 touches during season. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Acknowledgements and future work Our EPV framework can be extended to better incorporate unique basketball strategies: Additional macrotransitions can be defined, such as pick and rolls, screens, and other set plays. Use more information in defensive matchups (only defender locations, not identities, are currently used). Summarize and aggregate EPV estimates into useful player- or team-specific metrics. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 Acknowledgements and future work Our EPV framework can be extended to better incorporate unique basketball strategies: Additional macrotransitions can be defined, such as pick and rolls, screens, and other set plays. Use more information in defensive matchups (only defender locations, not identities, are currently used). Summarize and aggregate EPV estimates into useful player- or team-specific metrics. Thanks to: Co-authors: Alex D’Amour, Luke Bornn, Kirk Goldsberry. Colleagues: Alex Franks, Andrew Miller. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015 References I [0] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological), 36(2):192–236. [0] Cox, D. R. (1975). A note on partially bayes inference and the linear model. Biometrika, 62(3):651–654. [0] Franks, A., Miller, A., Bornn, L., and Goldsberry, K. (2015). Characterizing the spatial structure of defensive skill in professional basketball. Annals of Applied Statistics. [0] Higdon, D. (2002). Space and space-time modeling using process convolutions. In Quantitative Methods for Current Environmental Issues, pages 37–56. Springer, New York, NY. [0] Lindgren, F., Rue, H., and Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Methodological), 73(4):423–498. [0] Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Methodological), 71(2):319–392. Dan Cervone (Harvard) Multiresolution Basketball Modeling August 11, 2015