Inference in High-dimensional Semi-parametric Graphical Models
Mladen Kolar
The University of Chicago Booth School of Business
Jan 6, 2016

Acknowledgments: Rina Foygel Barber, Junwei Lu, Han Liu

Scientists Are Interested in Networks
Networks are useful for
- visualization
- discovery of regularity patterns
- exploratory analysis
- ...
of complex systems.

What Network Should Scientists Learn From Data?
[Figure: a small example graph on six nodes with an uncertain edge]
- threshold covariance
- conditional independence structure
- ...

Probabilistic Graphical Models
- Graph G = (V, E) with p nodes
- Random vector X = (X_1, ..., X_p)'
- Represents conditional independence relationships between nodes
- Useful for exploring associations between measured variables

(a, b) ∉ E  ⟺  X_a ⊥ X_b | X_{V∖{a,b}}

Example: X_1 ⊥ X_6 | X_2, ..., X_5

Structure Learning Problem
Given an i.i.d. sample D_n = {x_i}, i = 1, ..., n, from a distribution P ∈ P,
learn the set of conditional independence relationships Ĝ = Ĝ(D_n).

(Some) existing work:
- Gaussian graphical models: GLasso (Yuan and Lin, 2007), CLIME (Cai et al., 2011), neighborhood selection (Meinshausen and Bühlmann, 2006)
- Ising models: neighborhood selection (Ravikumar et al., 2010), composite likelihood (Xue et al., 2012)
- Exponential family graphical models: exponential (Yang et al., 2012, 2013a), Poisson (Yang et al., 2013b), mixed (Yang et al., 2014)
- ...

Implications for Science
Some questions remain unanswered:
- How can we quantify the uncertainty of an estimated graph structure?
- How certain are we that there is an edge between nodes a and b?
- How do we construct honest, robust tests about edge parameters?
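For concreteness, the Gaussian graphical model branch of the work listed above can be tried in a few lines; scikit-learn's GraphicalLasso is one off-the-shelf implementation of the graphical lasso (Yuan and Lin, 2007). The chain graph, penalty level, and threshold below are illustrative choices, not anything from the talk:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Sparse precision matrix on a 4-node chain graph: edges (0,1), (1,2), (2,3).
Omega = np.eye(4) + np.diag([0.4, 0.4, 0.4], 1) + np.diag([0.4, 0.4, 0.4], -1)
Sigma = np.linalg.inv(Omega)

X = rng.multivariate_normal(np.zeros(4), Sigma, size=2000)

# Penalized maximum likelihood estimate of the precision matrix.
model = GraphicalLasso(alpha=0.05).fit(X)
Omega_hat = model.precision_

# Crude edge selection: threshold the off-diagonal entries.
edges = {(a, b) for a in range(4) for b in range(a + 1, 4)
         if abs(Omega_hat[a, b]) > 0.05}
print(edges)
```

Thresholding a penalized precision matrix is an ad hoc selection rule with no uncertainty attached, which is exactly the gap the questions above point at.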
Quantifying Uncertainty
For Gaussian graphical models: inference on values of the precision matrix Ω using an asymptotically normal estimator (Ren et al., 2013).
This talk: construction of confidence intervals for edge parameters in transelliptical graphical models.

Transelliptical Graphical Models

Background: Nonparanormal Model / Gaussian Copula
Nonparanormal distribution: X ∼ NPN_p(Σ; f_1, ..., f_p) if
    (f_1(X_1), ..., f_p(X_p))' ∼ N(0, Σ).
(Liu et al., 2009)

Background: Transelliptical Distribution
Transelliptical distribution: X ∼ TE_p(Σ, ξ; f_1, ..., f_p) if
    (f_1(X_1), ..., f_p(X_p))' ∼ EC_p(0, Σ, ξ),
where Σ = [σ_ab] ∈ R^{p×p} is a correlation matrix and P[ξ = 0] = 0.
Elliptical distribution: Z ∼ EC_p(μ, Σ, ξ) if
    Z = μ + ξ · Σ^{1/2} U,
where ξ is a random radius and U is a random unit vector.
(Liu et al., 2012b)

Tail Dependence
- Gaussian graphical model → nonparanormal model: allows non-Gaussian marginals
- Gaussian graphical model → elliptical model: allows heavy tail dependence
- Both extensions together → transelliptical model

Elliptical and transelliptical distributions allow for heavy tail dependence between variables.
Example: (X_1, X_2) ∼ multivariate t-distribution with d degrees of freedom.
Tail correlations: Corr( 1{X_1 ≥ q_α^{X_1}}, 1{X_2 ≥ q_α^{X_2}} )
[Figure: tail correlation as a function of the quantile level for d = 0.1, 1, 5, 10 and d = ∞ (Gaussian); heavier tails give larger tail correlations.]

Some Applications
Robust graphical modeling:
- Nonparanormal (Gaussian copula): Liu et al. (2009, 2012a); Xue and Zou (2012)
- Transelliptical (elliptical copula): Liu et al. (2012b)
- Mixed graphical models: Fan et al. (2014)
Many others: robust PCA, classification, portfolio allocation, ...

Robust Graphical Modeling
Data: X_1, ..., X_n ∼ TE_p(Σ, ξ; f_1, ..., f_p).
Underlying graph: edge (a, b) ∈ E if and only if ω_ab ≠ 0, where Ω = Σ^{-1} = [ω_kl].
Construct Σ̂ = [σ̂_ab] with σ̂_ab = sin( (π/2) τ̂_ab ), where
    τ̂_ab = (n choose 2)^{-1} Σ_{i<i'} sign( (X_ia − X_i'a)(X_ib − X_i'b) )
is Kendall's tau. Plug Σ̂ into, for example, the GLasso objective:
    Ω̂ = argmax_{Ω ≻ 0}  log|Ω| − tr(Σ̂Ω) − λ‖Ω‖_1.
(Liu et al., 2012a)

Robust Graphical Modeling
To establish statistical properties of the estimator Ω̂, it is sufficient (Ravikumar et al., 2011) to control
    ‖Σ̂ − Σ‖_max = max_{a,b} |σ̂_ab − σ_ab|.
Since σ̂_ab is a Lipschitz function of a U-statistic with a bounded kernel,
    P[ ‖Σ̂ − Σ‖_max ≥ Ct ] ≤ 2p² exp(−nt²/2).
Up to constants, Ω̂ therefore enjoys the same statistical properties as if the data were generated from a multivariate normal distribution.

Inference and Hypothesis Testing for ω_ab
Data: Y_1, ..., Y_n ∼ N(0, Σ); I = [p]∖{a, b}, J = {a, b}.
Fact: Y_J | Y_I ∼ N( Σ_JI Σ_II^{-1} Y_I, Ω_JJ^{-1} ).
Algorithm:
- Compute ε̂_{i,J} = Y_{i,J} − Y_{i,I}' β̂_J, where β̂_J is a (scaled) Lasso estimator.
- Compute Ω̂_JJ = ( Var-hat(ε̂_J) )^{-1}.
Under some conditions,
    sqrt( n / (ω̂_aa ω̂_bb + ω̂_ab²) ) · (ω̂_ab − ω_ab) →_D N(0, 1).
(Ren et al., 2013)

ROCKET: Robust Confidence Intervals via Kendall's Tau

Inference Under the Transelliptical Model
Let
    Θ_ab = [θ_aa θ_ab; θ_ba θ_bb] = Ω_JJ^{-1} = Cov( (ε_a, ε_b)' ).
Idea:
    θ_ab = E[ε_a ε_b] = E[ (Y_a − Y_I' γ_a)(Y_b − Y_I' γ_b) ]
         = E[Y_a Y_b] + γ_a' E[Y_I Y_I'] γ_b − E[Y_a Y_I'] γ_b − E[Y_b Y_I'] γ_a,
where γ_a = Σ_II^{-1} Σ_Ia and γ_b = Σ_II^{-1} Σ_Ib.
Our procedure constructs γ̂_a and γ̂_b and sets
    θ̂_ab = Σ̂_ab + γ̂_a' Σ̂_II γ̂_b − Σ̂_aI γ̂_b − Σ̂_bI γ̂_a.

Oracle Estimator
Suppose that γ_a = Σ_II^{-1} Σ_Ia, γ_b = Σ_II^{-1} Σ_Ib, and det(Θ_ab) are known. The oracle estimator is
    ω̃_ab = − θ̃_ab / det(Θ_ab),
where θ̃_ab = Σ̂_ab + γ_a' Σ̂_II γ_b − Σ̂_aI γ_b − Σ̂_bI γ_a.
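The rank-based construction above is easy to check numerically. A minimal sketch, assuming numpy and scipy are available; the latent correlation and the monotone marginal transforms below are made-up test values:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)

# Latent Gaussian pair with correlation rho, observed only through monotone
# marginal transforms (a nonparanormal / Gaussian-copula sample).
rho = 0.6
Z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=5000)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])  # monotone distortions

# Kendall's tau depends only on ranks, so it ignores the transforms ...
tau, _ = kendalltau(X[:, 0], X[:, 1])
# ... and the sine transform estimates the latent correlation rho.
sigma_hat = np.sin(np.pi / 2 * tau)
print(round(sigma_hat, 2))
```

Because τ̂_ab depends only on ranks, the same Σ̂ is obtained no matter which monotone transforms f_j distort the marginals; this is what makes the plug-in estimator robust.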
Note that ω̃_ab − ω_ab is a linear function of Σ̂ − Σ. Since
    σ̂_cd − σ_cd ≈ (π/2) cos( (π/2) τ_cd ) · (τ̂_cd − τ_cd),
ω̃_ab − ω_ab is approximately a linear function of T̂ − T, which is a U-statistic of the data.

Oracle Estimator: Asymptotic Normality
    sup_t | P[ sqrt(n) · (ω̃_ab − ω_ab)/S_ab ≤ t ] − Φ(t) | ≤ C / sqrt(n),
where S_ab is obtained using the theory of U-statistics.
Assumption: there exists C_kernel such that
    Cov( E[h(X, X') | X] ) ⪰ C_kernel · Cov( h(X, X') ),
where h(X, X') = sign(X − X') ⊗ sign(X − X').

Main Results
Estimation consistency: if n ≳ k_n² log(p), ‖γ_a‖_1 ≤ sqrt(k_n) ‖γ_a‖_2, λ_max(Σ)/λ_min(Σ) ≤ C_cov, and
    ‖γ̂_a − γ_a‖_1 ≲ sqrt( k_n² log(p_n)/n ),   ‖γ̂_a − γ_a‖_2 ≲ sqrt( k_n log(p_n)/n ),
then
    ‖Θ̂ − Θ̃‖_∞ ≲ k_n log(p_n)/n.
Asymptotic normality:
    sup_t | P[ sqrt(n) · (ω̂_ab − ω_ab)/Ŝ_ab ≤ t ] − Φ(t) | ≤ C / sqrt(n).

How To Estimate γ_a?
Lasso:
    γ̂_a = argmin_{γ: ‖γ‖_1 ≤ R}  (1/2) γ' Σ̂_II γ − γ' Σ̂_Ia + λ‖γ‖_1
- a non-convex problem, since Σ̂ need not be positive semi-definite; however, see Loh and Wainwright (2013)
- needs R such that ‖γ_a‖_1 ≤ R
Dantzig selector:
    γ̂_a = argmin ‖γ‖_1   s.t.   ‖Σ̂_II γ − Σ̂_Ia‖_∞ ≤ λ.

Minimax Optimality
    G_0(M, k_n) = { Ω = (Ω_ab)_{a,b ∈ [p]} : max_{a ∈ [p]} Σ_{b≠a} 1{Ω_ab ≠ 0} ≤ k_n, and M^{-1} ≤ λ_min(Ω) ≤ λ_max(Ω) ≤ M },
where M is a constant greater than one. Theorem 1 in Ren et al. (2013) states that
    inf_{ω̂_ab} sup_{G_0(M, k_n)} P[ |ω̂_ab − ω_ab| ≥ ε_0 ( n^{-1} k_n log(p_n) ∨ n^{-1/2} ) ] ≥ ε_0
for some constant ε_0 > 0.

Main Technical Ingredient
Sign-subgaussian property: let Z ∼ N(0, Σ) and let v ∈ R^p be a unit vector. The random variable v' sign(Z) is (v'Σv / σ_min(Σ))-subgaussian.
Deviation of Σ̂ − Σ: for any k ≥ 1, let
    B_k = { u ∈ R^p : ‖u‖_2² + ‖u‖_1²/k ≤ 1 }.
Let δ_1, δ_2 ∈ (0, 1) and k ≥ 1 with log(2/δ_2) + (k + 1) log(12p) ≤ n. Then, with probability at least 1 − δ_1 − δ_2,
    sup_{u,v ∈ B_k} u'(Σ̂ − Σ)v ≤ C_1 k · log(2p²/δ_1)/n + C_2 C(Σ) · sqrt( ( log(2/δ_2) + (k + 1) log(12p) ) / n ).
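The Dantzig-selector step for γ̂_a above is a linear program. A minimal sketch using scipy.optimize.linprog with the standard γ = u − v splitting; the helper name and the toy Σ are illustrative, not from the ROCKET code:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(S, b, lam):
    """Solve  min ||gamma||_1  s.t.  ||S @ gamma - b||_inf <= lam
    by splitting gamma = u - v with u, v >= 0 (a standard LP reformulation)."""
    d = S.shape[1]
    c = np.ones(2 * d)                  # objective: sum(u) + sum(v)
    A = np.hstack([S, -S])              # A @ [u; v] = S @ (u - v)
    A_ub = np.vstack([A, -A])           # two-sided sup-norm constraint
    b_ub = np.concatenate([lam + b, lam - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * d), method="highs")
    u, v = res.x[:d], res.x[d:]
    return u - v

# Toy check: recover gamma_a = Sigma_II^{-1} Sigma_Ia for a known Sigma.
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.0]])
S_II, S_Ia = Sigma[1:, 1:], Sigma[1:, 0]
gamma_hat = dantzig_selector(S_II, S_Ia, lam=1e-6)
print(np.round(gamma_hat, 3))
```

In the actual procedure S would be the transformed Kendall's tau matrix Σ̂_II and λ of order sqrt(log p / n); here λ is tiny, so the toy recovers Σ_II^{-1} Σ_Ia almost exactly.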
Kolar (Chicago Booth) Jan 6, 2016 25 Applications and Related work Consistency of PCA when p/n → 0 s-sparse PCA when s log(p)/n → 0 fast convergence of s-bandable Σ Existing results: Wegkamp and Zhao (2013), Han and Liu (2013) h i −(n/2)t2 /2 P |||Tb − T |||2 ≥ t ≤ 2p exp p|||Σ||| 2 2 +|||Σ||| +2pt 2 Mitra and Zhang (2014) S = {S : |||ΣSS |||2 < M, |S| < s} with |S| ≤ m h i P maxS∈S |||(Tb − T )SS |||2 ≥ C ∆s,m,t + ∆2s,m,t + ∆0s,m,t ≤ exp(−nt) where ∆s,m,t = p n−1 (d + log(m)) + t and ∆0s,m,t = M. Kolar (Chicago Booth) p n−1 log(m) + t + n−1 s log(p) + t Jan 6, 2016 26 Simulations Data generated from a 30 × 30 grid, sample size n = 400 Ωaa = 1, Ωab = M. Kolar (Chicago Booth) 0.24 for edges 0 for non edges , X ∼ EC(0, Ω−1 , t5 ) Jan 6, 2016 27 Simulations Check if the estimator is asymptotically normal (over 1000 trials): Quantiles of Ť(2,2),(2,3) ROCKET Pearson 8 6 3 6 4 2 4 2 1 2 0 0 0 −1 Quantiles of Ť(2,2),(3,3) −4 −4 −3 −4 −4 −2 −2 −2 −2 0 2 4 −6 −4 Standard Normal Quantiles −2 0 2 4 −6 −4 Standard Normal Quantiles 4 −2 0 2 4 Standard Normal Quantiles 6 3 5 4 2 2 1 0 0 −1 0 −2 −2 −4 −3 −4 −4 −2 0 2 4 −6 −4 Standard Normal Quantiles Quantiles of Ť(2,2),(10,10) Nonparanormal 4 −2 0 2 4 −5 −4 Standard Normal Quantiles 4 6 3 −2 0 2 4 Standard Normal Quantiles 5 4 2 2 1 0 0 −1 0 −2 −2 −4 −3 −4 −4 −2 0 2 Standard Normal Quantiles M. Kolar (Chicago Booth) 4 −6 −4 −2 0 2 Standard Normal Quantiles 4 −5 −4 −2 0 2 4 Standard Normal Quantiles Jan 6, 2016 28 Simulations ROCKET True edge Near non-edge Far non-edge Coverage 92.8 93.5 93.8 M. 
ROCKET width: true edge 0.54, near non-edge 0.56, far non-edge 0.57
Pearson: coverage 66.6 / 74.2 / 74.8, width 0.49 / 0.47 / 0.48
Nonparanormal: coverage 79.7 / 82.8 / 85.3, width 0.34 / 0.33 / 0.33
(coverage and width listed for true edge / near non-edge / far non-edge)

Simulations
Results for Gaussian data with the same Ω (grid graph):

                 ROCKET            Pearson           Nonparanormal
                 Coverage  Width   Coverage  Width   Coverage  Width
True edge        93.3      0.37    93.3      0.35    93.3      0.35
Near non-edge    94.7      0.38    94.1      0.34    93.9      0.34
Far non-edge     94.7      0.38    95.2      0.34    95.2      0.34

- All methods have ≈ 95% coverage
- ROCKET confidence intervals are only slightly wider

Stock Data
Stock return data: 452 stocks over 1257 days (Yahoo Finance / R package huge),
    X_ij = log( closing price of stock j on day i+1 / closing price of stock j on day i ).
Methods:
- ROCKET
- Gaussian graphical model ("Pearson") (Ren et al., 2013)
- Nonparanormal (Liu et al., 2009)

Stock Data
The data does not seem to fit the normal / nonparanormal model.

Stock Data
Estimated graphs for 64 stocks (Materials & Consumer Staples):
- For each pair (a, b) of stocks, calculate a p-value p_ab
- Draw an edge if p_ab < 0.001

Stock Data
Underlying distribution unknown → how to evaluate performance?
Use sample splitting to check for asymptotic normality:
- Split into 25 samples of size 50; let X^(l) ∈ R^{50×64} be the data on the l-th subsample
- For each pair (a, b), calculate Ω̌_ab^(l) and Š_ab^(l)
- According to the theory,
    z_ab^(l) = sqrt(n) · Ω̌_ab^(l) / Š_ab^(l) ≈ μ_ab + N(0, 1),   where μ_ab = sqrt(n) · Ω_ab / S_ab.

Stock Data
For each (a, b), z_ab^(1), ..., z_ab^(25) are ≈ i.i.d. draws from N(μ_ab, 1).
The sample variance of z_ab^(1), ..., z_ab^(25) should therefore be ≈ 1.
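The sample-splitting diagnostic above is model-free: for any correctly studentized, asymptotically normal estimator, the per-split z-scores should have sample variance close to 1. A toy illustration with a sample mean standing in for Ω̌_ab / Š_ab (the 25 × 50 split mirrors the slide, everything else is made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# 25 independent subsamples of size 50 from a heavy-tailed distribution.
data = rng.standard_t(df=5, size=(25, 50))

# Per-split studentized statistic z^(l) = sqrt(n) * theta_hat^(l) / s^(l).
# If the estimator is asymptotically N(mu, 1) after studentization, the
# z^(l) are approximately i.i.d. with variance 1 across splits.
z = np.sqrt(50) * data.mean(axis=1) / data.std(axis=1, ddof=1)
print(round(z.var(ddof=1), 2))
```

A sample variance far from 1 (as the talk reports for the Pearson and nonparanormal estimators on the stock data) signals that the claimed standard errors are not trustworthy.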
Summary
The ROCKET method:
- Theoretical guarantees for asymptotic normality over the transelliptical family
- Confidence intervals have the right coverage
Practical recommendation: we should use the transelliptical family in practice.
Code: https://github.com/mlakolar/ROCKET
Preprint: arXiv:1502.07641

Dynamic Networks
Data collected over a period of time is easily accessible.

Changing Associations Between Stock Prices

Setting
Given n i.i.d. copies of Y = (X, Z), where
    Z ∼ f_Z(z), Z ∈ [0, 1],   and   X | Z = z ∼ NPN_d(0, Σ(z), f).
Focus on the following three testing problems:
- Edge presence test: H_0: Ω_jk(z_0) = 0 for a fixed z_0 ∈ (0, 1) and j, k ∈ [d]
- Super-graph test: H_0: G*(z_0) ⊂ G for a fixed z_0 ∈ (0, 1) and a fixed graph G
- Uniform edge presence test: H_0: G*(z) ⊂ G for all z ∈ [z_L, z_U] ⊂ [0, 1] and a fixed G

Testing Statistic
The score function:
    Ŝ_{z|(j,k)}(β) = Ω̂_j'(z) ( Σ̂(z) β − e_k ).
A level-α test for H_0: Ω_jk(z_0) = 0 can be constructed using the fact that
    sqrt(nh) · σ̂_jk(z_0)^{-1} · Ŝ_{z_0|(j,k)}( Ω̂_{k∖j}(z_0) ) ⇝ N(0, 1).
For the null hypothesis H_0: G*(z) ⊂ G for all z ∈ [z_L, z_U], we use
    W_G = sqrt(nh) · sup_{z ∈ [z_L, z_U]} max_{(j,k) ∈ E^c} U_n[ω_z] Ŝ_{z|(j,k)}( Ω̂_{k∖j}(z) )
and estimate its quantile using the multiplier bootstrap.

Estimating Ω(z)
Local Kendall's tau correlation matrix T̂(z) = [τ̂_jk(z)] ∈ R^{d×d}:
    τ̂_jk(z) = Σ_{i<i'} ω_z(Z_i, Z_i') sign(X_ij − X_i'j) sign(X_ik − X_i'k) / Σ_{i<i'} ω_z(Z_i, Z_i'),
where ω_z(Z_i, Z_i') = K_h(Z_i − z) K_h(Z_i' − z).
Latent correlation matrix:
    Σ̂(z) = sin( (π/2) T̂(z) ).
Latent precision matrix, via calibrated CLIME:
    Ω̂_j^CCLIME, κ̂_j = argmin_{β ∈ R^d, κ ∈ R} ‖β‖_1 + γκ   s.t.   ‖Σ̂β − e_j‖_∞ ≤ λκ,  ‖β‖_1 ≤ κ.
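The local Kendall's tau step above can be sketched directly. The Gaussian kernel, the bandwidth h, and the smoothly varying correlation ρ(z) = 0.8z below are illustrative choices, not values from the paper:

```python
import numpy as np

def local_kendall_tau(Xj, Xk, Z, z, h):
    """Kernel-weighted Kendall's tau at index value z:
    pairs (i, i') are weighted by w_z(Zi, Zi') = K_h(Zi - z) K_h(Zi' - z)."""
    K = np.exp(-((Z - z) / h) ** 2 / 2)      # Gaussian kernel K_h
    W = np.outer(K, K)                        # product weights w_z
    Sj = np.sign(Xj[:, None] - Xj[None, :])
    Sk = np.sign(Xk[:, None] - Xk[None, :])
    iu = np.triu_indices(len(Z), k=1)         # pairs with i < i'
    return (W[iu] * Sj[iu] * Sk[iu]).sum() / W[iu].sum()

# Latent correlation varying smoothly with the index: rho(z) = 0.8 z.
rng = np.random.default_rng(3)
n = 2000
Z = rng.uniform(0, 1, size=n)
rho = 0.8 * Z
e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
X1 = e1
X2 = rho * e1 + np.sqrt(1 - rho**2) * e2

tau = local_kendall_tau(X1, X2, Z, z=0.75, h=0.1)
sigma_hat = np.sin(np.pi / 2 * tau)           # local latent correlation
print(round(sigma_hat, 2))
```

As in the static case, applying sin( (π/2) · ) to the local tau gives a local estimate of the latent correlation Σ(z); here the estimate at z = 0.75 should be near ρ(0.75) = 0.6.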
Estimating Ω(z)
For h_l n / log(dn) → ∞, h_u = o(1), and λ ≥ C_Σ ( h² + sqrt( log(d/h)/(nh) ) ):
    sup_{h ∈ [h_l, h_u]} sup_{z ∈ [0,1]} (λM)^{-1} ‖Ω̂(z) − Ω(z)‖_2 ≤ C;
    sup_{h ∈ [h_l, h_u]} sup_{z ∈ [0,1]} (λsM)^{-1} ‖Ω̂(z) − Ω(z)‖_1 ≤ C;
    sup_{z ∈ [0,1]} max_{j ∈ [d]} (λM)^{-1} ‖Ω̂_j' Σ̂ − e_j‖_∞ ≤ C.
The result hinges on
    sup_{h ∈ [h_l, h_u]} sup_{z ∈ [0,1]} ‖Σ̂(z) − Σ(z)‖_max / ( h² + sqrt( (nh)^{-1} [ log(d/h) ∨ log(δ^{-1} log(h_u h_l^{-1})) ] ) ) ≤ C_Σ.

Where To Find More Information?
Preprint: arXiv:1512.08298
Code: https://github.com/dreamwaylu/dynamicGraph

Thank you!

References
T. T. Cai, W. Liu, and X. Luo. A constrained ℓ1 minimization approach to sparse precision matrix estimation. J. Am. Stat. Assoc., 106(494):594–607, 2011.
J. Fan, H. Liu, Y. Ning, and H. Zou. High dimensional semiparametric latent graphical model for mixed data. ArXiv e-prints, arXiv:1404.7236, April 2014.
H. Liu, J. D. Lafferty, and L. A. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res., 10:2295–2328, 2009.
H. Liu, F. Han, M. Yuan, J. D. Lafferty, and L. A. Wasserman. High-dimensional semiparametric Gaussian copula graphical models. Ann. Stat., 40(4):2293–2326, 2012a.
H. Liu, F. Han, and C.-H. Zhang. Transelliptical graphical models. In Proc. of NIPS, pages 809–817, 2012b.
P.-L. Loh and M. J. Wainwright. Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. arXiv preprint arXiv:1305.2436, 2013.
N. Meinshausen and P. Bühlmann. High dimensional graphs and variable selection with the lasso. Ann. Stat., 34(3):1436–1462, 2006.
R. Mitra and C.-H. Zhang. Multivariate analysis of nonparametric estimates of large correlation matrices. ArXiv e-prints, arXiv:1403.6195, March 2014.
P. Ravikumar, M. J. Wainwright, and J. D. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Ann. Stat., 38(3):1287–1319, 2010.
P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electron. J. Stat., 5:935–980, 2011.
Z. Ren, T. Sun, C.-H. Zhang, and H. H. Zhou. Asymptotic normality and optimalities in estimation of large Gaussian graphical model. arXiv preprint arXiv:1309.6024, 2013.
M. Wegkamp and Y. Zhao. Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas. ArXiv e-prints, arXiv:1305.6526, May 2013.
L. Xue and H. Zou. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann. Stat., 40(5):2541–2571, 2012.
L. Xue, H. Zou, and T. Cai. Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. Ann. Stat., 40(3):1403–1429, 2012.
E. Yang, G. I. Allen, Z. Liu, and P. Ravikumar. Graphical models via generalized linear models. In Advances in Neural Information Processing Systems 25, pages 1358–1366, 2012.
E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. On graphical models via univariate exponential family distributions. ArXiv e-prints, arXiv:1301.4183, January 2013a.
E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. On Poisson graphical models. In Advances in Neural Information Processing Systems 26, pages 1718–1726, 2013b.
E. Yang, Y. Baker, P. Ravikumar, G. I. Allen, and Z. Liu. Mixed graphical models via exponential families. In Proc. 17th Int. Conf. Artif. Intel. Stat., pages 1042–1050, 2014.
M. Yuan and Y. Lin.
Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, 2007.