Inference in High-dimensional Semi-parametric
Graphical Models
Mladen Kolar
The University of Chicago
Booth School of Business
Jan 6, 2016
Acknowledgments
- Rina Foygel Barber
- Junwei Lu
- Han Liu
Scientists Are Interested in Networks

Networks are useful for
- visualization
- discovery of regularity patterns
- exploratory analysis
- ...
of complex systems.
What Network Should Scientists Learn From Data?

[Figure: graph on six nodes with "?" marking the unknown edge structure]

- threshold covariance
- conditional independence structure
- ...
Probabilistic Graphical Models

- Graph G = (V, E) with p nodes
- Random vector X = (X_1, ..., X_p)^T

Represents conditional independence relationships between nodes.
Useful for exploring associations between measured variables.

    (a, b) ∉ E  ⟺  X_a ⊥ X_b | X_{−{a,b}}   (all other coordinates)

Example: X_1 ⊥ X_6 | X_2, ..., X_5

[Figure: graph on six nodes illustrating the example]
Structure Learning Problem

Given an i.i.d. sample D_n = {x_i}_{i=1}^n from a distribution P ∈ P,
learn the set of conditional independence relationships: Ĝ = Ĝ(D_n).

(Some) Existing Work:
- Gaussian graphical models:
  GLasso (Yuan and Lin, 2007),
  CLIME (Cai et al., 2011),
  neighborhood selection (Meinshausen and Bühlmann, 2006)
- Ising models:
  neighborhood selection (Ravikumar et al., 2010),
  composite likelihood (Xue et al., 2012)
- Exponential family graphical models:
  exponential (Yang et al., 2012, 2013a),
  Poisson (Yang et al., 2013b),
  mixed (Yang et al., 2014), ...
Implications for Science

[Figure: estimated graph on six nodes]

Some questions remain unanswered:
- How can we quantify uncertainty of the estimated graph structure?
- How certain are we that there is an edge between nodes a and b?
- How can we construct honest, robust tests about edge parameters?
Quantifying uncertainty

For the Gaussian graphical model: inference on entries of the precision
matrix Ω using an asymptotically normal estimator (Ren et al., 2013).

This talk: construction of confidence intervals for edge parameters in
transelliptical graphical models.
Transelliptical Graphical Models
Background: Nonparanormal model / Gaussian copula

Nonparanormal distribution: X ∼ NPN_p(Σ; f_1, ..., f_p) if

    (f_1(X_1), ..., f_p(X_p))^T ∼ N(0, Σ)

(Liu et al., 2009)
Background: Transelliptical Distribution

Transelliptical distribution: X ∼ TE_p(Σ, ξ; f_1, ..., f_p) if

    (f_1(X_1), ..., f_p(X_p))^T ∼ EC_p(0, Σ, ξ),

where Σ = [σ_ab]_{a,b} ∈ R^{p×p} is a correlation matrix and P[ξ = 0] = 0.

Elliptical distribution: Z ∼ EC_p(μ, Σ, ξ) if

    Z = μ + ξ · Σ^{1/2} · U,

where ξ is a random radius and U is a random unit vector.

(Liu et al., 2012b)
Tail dependence

    Gaussian graphical model  →  Nonparanormal models
              ↓                          ↓
       Elliptical model       →  Transelliptical model

    → = allow non-Gaussian marginals
    ↓ = allow heavy tail dependence
Tail dependence

Elliptical and transelliptical distributions allow for heavy tail
dependence between variables.

(X_1, X_2) ∼ multivariate t-distribution with d degrees of freedom

Tail correlations: Corr( 1I{X_1 ≥ q_α^{X_1}}, 1I{X_2 ≥ q_α^{X_2}} )

[Figure: tail correlation as a function of the quantile level α ∈ [0.5, 1]
for d = 0.1, 1, 5, 10, and d = ∞ (Gaussian); heavier tails give larger
tail correlations]
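The heavy-tail-dependence phenomenon above can be reproduced with a short simulation (a sketch; the correlation ρ = 0.5, quantile level α = 0.9, the 3 degrees of freedom, and the sample size are illustrative choices, not values from the talk):

```python
import numpy as np

def tail_correlation(x1, x2, alpha):
    """Empirical Corr(1{X1 >= q_alpha^{X1}}, 1{X2 >= q_alpha^{X2}})."""
    i1 = (x1 >= np.quantile(x1, alpha)).astype(float)
    i2 = (x2 >= np.quantile(x2, alpha)).astype(float)
    return np.corrcoef(i1, i2)[0, 1]

rng = np.random.default_rng(0)
n, rho, alpha = 200_000, 0.5, 0.9
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

z = rng.standard_normal((n, 2)) @ L.T        # bivariate Gaussian, correlation rho
w = rng.chisquare(df=3, size=n) / 3          # t radius: t_3 = Z / sqrt(chi2_3 / 3)
t = z / np.sqrt(w)[:, None]                  # bivariate t with 3 degrees of freedom

g_corr = tail_correlation(z[:, 0], z[:, 1], alpha)  # Gaussian tail correlation
t_corr = tail_correlation(t[:, 0], t[:, 1], alpha)  # heavy-tailed: noticeably larger
print(g_corr, t_corr)
```

On this simulated data the t-distributed pair shows a clearly larger tail correlation than the Gaussian pair, matching the figure.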
Some applications

Robust graphical modeling:
- Nonparanormal (Gaussian copula)
  (Liu et al., 2009, 2012a; Xue and Zou, 2012)
- Transelliptical (elliptical copula)
  (Liu et al., 2012b)
- Mixed graphical models
  (Fan et al., 2014)

Many others: robust PCA, classification, portfolio allocation, ...
Robust Graphical Modeling

Data: X_1, ..., X_n ∼ TE_p(Σ, ξ; f_1, ..., f_p)

Underlying graph: edge (a, b) ∈ E iff ω_ab ≠ 0, where Ω = Σ^{-1} = [ω_kl].

Construct Σ̂ = [σ̂_ab], where σ̂_ab = sin( (π/2) τ̂_ab ) and

    τ̂_ab = (n choose 2)^{-1} Σ_{i<i'} sign( (X_ia − X_i'a)(X_ib − X_i'b) )

is Kendall's tau.

Plug into, for example, the GLasso objective:

    Ω̂ = argmax_{Ω ≻ 0}  log|Ω| − tr(Σ̂Ω) − λ||Ω||_1
(Liu et al., 2012a)
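A minimal sketch of the plug-in correlation estimate (assuming `scipy` for pairwise Kendall's tau; the resulting matrix can then be passed to any graphical-lasso solver, possibly after projection to the positive semidefinite cone, since sin((π/2)τ̂) need not be PSD):

```python
import numpy as np
from scipy.stats import kendalltau

def rank_correlation_matrix(X):
    """Sigma_hat with entries sin(pi/2 * tau_hat_ab), tau_hat = Kendall's tau."""
    n, p = X.shape
    S = np.eye(p)
    for a in range(p):
        for b in range(a + 1, p):
            tau, _ = kendalltau(X[:, a], X[:, b])
            S[a, b] = S[b, a] = np.sin(np.pi / 2 * tau)
    return S

# demo: a monotone transform of one margin leaves the estimate unchanged,
# because Kendall's tau is invariant to monotone marginal transformations
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500)
x = np.column_stack([z[:, 0], np.exp(z[:, 1])])   # distort the second margin
S = rank_correlation_matrix(x)                    # S[0, 1] still estimates 0.6
```

The invariance in the demo is exactly why the estimator is robust under the transelliptical family: only ranks enter the computation.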
M. Kolar (Chicago Booth)
Jan 6, 2016
15
Robust Graphical Modeling
b it is
In order to establish statistical properties of the estimator Ω
sufficient (Ravikumar et al., 2011) to control
b − Σ||max = max |b
||Σ
σab − σab |
a,b
Since σ
bab is a Lipschitz function of a U-statistic with bounded kernel
h
i
b − Σ||max ≥ Ct ≤ 2p2 exp(−nt2 /2).
P ||Σ
b as if the data
Up to constants, the same rate statistical properties of Ω
were generated from a multivariate normal distribution.
M. Kolar (Chicago Booth)
Jan 6, 2016
16
Inference and Hypothesis Testing for ω_ab

Data: Y_1, ..., Y_n ∼ N(0, Σ); let I = [p]\{a, b} and J = {a, b}.

Fact: Y_J | Y_I ∼ N( Σ_JI Σ_II^{-1} Y_I, Ω_JJ^{-1} )

Algorithm:
- Compute ε̂_{i,J} = Y_{i,J} − Y_{i,I}^T β̂_J, where β̂_J is a (scaled) lasso estimator
- Compute Ω̂_JJ = Var-hat(ε̂_J)^{-1}

Under some conditions,

    sqrt( n / (ω̂_aa ω̂_bb + ω̂_ab²) ) · (ω̂_ab − ω_ab)  →_D  N(0, 1)

(Ren et al., 2013)
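In low dimensions the same construction can be sketched numerically with OLS in place of the scaled lasso (an illustrative substitution; the high-dimensional version needs the lasso step, and the specific Ω below is a made-up example):

```python
import numpy as np

rng = np.random.default_rng(2)
# small precision matrix with known edge weight omega_ab = 0.5 for (a, b) = (0, 1)
Omega = np.array([[2.0, 0.5, 0.3, 0.0],
                  [0.5, 2.0, 0.0, 0.3],
                  [0.3, 0.0, 2.0, 0.5],
                  [0.0, 0.3, 0.5, 2.0]])
Sigma = np.linalg.inv(Omega)
n = 4000
Y = rng.multivariate_normal(np.zeros(4), Sigma, size=n)

J, I = [0, 1], [2, 3]
# regress Y_J on Y_I (OLS stand-in for the scaled lasso) and take residuals
B, *_ = np.linalg.lstsq(Y[:, I], Y[:, J], rcond=None)
E = Y[:, J] - Y[:, I] @ B
# residual covariance estimates Omega_JJ^{-1}, so invert it
Omega_JJ_hat = np.linalg.inv(E.T @ E / n)
omega_ab_hat = Omega_JJ_hat[0, 1]      # estimate of omega_ab = 0.5
```

The off-diagonal entry of the inverted residual covariance recovers ω_ab, which is exactly the quantity the asymptotic-normality display above is about.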
ROCKET

Robust Confidence Intervals via Kendall's Tau
Inference Under Transelliptical Model

Let

    Θ_ab = [ θ_aa  θ_ab ]
           [ θ_ba  θ_bb ]  = Ω_JJ^{-1} = Cov(ε_a, ε_b).

Idea:

    θ_ab = E[ε_a ε_b]
         = E[ (Y_a − Y_I^T γ_a)(Y_b − Y_I^T γ_b) ]
         = E[Y_a Y_b] + γ_a^T E[Y_I Y_I^T] γ_b − E[Y_a Y_I^T] γ_b − E[Y_b Y_I^T] γ_a,

where γ_a = Σ_II^{-1} Σ_Ia and γ_b = Σ_II^{-1} Σ_Ib.

Our procedure constructs γ̂_a and γ̂_b and sets

    θ̂_ab = σ̂_ab + γ̂_a^T Σ̂_II γ̂_b − Σ̂_aI γ̂_b − Σ̂_bI γ̂_a
Oracle estimator

Suppose that γ_a = Σ_II^{-1} Σ_Ia, γ_b = Σ_II^{-1} Σ_Ib, and det(Θ_ab) are known.

The oracle estimator:

    ω̃_ab = − θ̃_ab / det(Θ_ab),

where θ̃_ab = σ̂_ab + γ_a^T Σ̂_II γ_b − Σ̂_aI γ_b − Σ̂_bI γ_a.

Note that ω̃_ab − ω_ab is a linear function of Σ̂ − Σ.

Since σ̂_cd − σ_cd ≈ (π/2) cos( (π/2) τ_cd ) (τ̂_cd − τ_cd),

    ω̃_ab − ω_ab ≈ linear function of T̂ − T,

which is a U-statistic of the data.
Oracle Estimator

Asymptotic Normality for Oracle:

    sup_{t∈R} | P[ √n · (ω̃_ab − ω_ab)/S_ab ≤ t ] − Φ(t) | ≤ C/√n,

where S_ab is obtained using the theory of U-statistics.

Assumption: there exists C_kernel such that

    Cov( E[h(X, X') | X] ) ⪰ C_kernel · Cov( h(X, X') ),

where h(X, X') = sign(X − X') ⊗ sign(X − X').
Main results

Estimation consistency: if n ≳ k_n² log(p), ||γ_a||_1 ≤ √k_n ||γ_a||_2,
λ_max(Σ)/λ_min(Σ) ≤ C_cov, and

    ||γ̂_a − γ_a||_2 ≲ √( k_n log(p_n)/n ),    ||γ̂_a − γ_a||_1 ≲ √( k_n² log(p_n)/n ),

then

    ||Θ̂ − Θ̃||_∞ ≲ k_n log(p_n)/n.

Asymptotic Normality:

    sup_{t∈R} | P[ √n · (ω̂_ab − ω_ab)/Ŝ_ab ≤ t ] − Φ(t) | ≤ C/√n
How To Estimate γ_a?

Lasso:

    γ̂_a = argmin_{γ: ||γ||_1 ≤ R}  (1/2) γ^T Σ̂_II γ − γ^T Σ̂_Ia + λ||γ||_1

- non-convex problem; however, see Loh and Wainwright (2013)
- need R so that ||γ_a||_1 ≤ R

Dantzig selector:

    γ̂_a = argmin { ||γ||_1  s.t.  ||Σ̂_II γ − Σ̂_Ia||_∞ ≤ λ }
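The Dantzig selector is a linear program; here is a self-contained sketch using `scipy.optimize.linprog` with the standard γ = u − v, u, v ≥ 0 split (the small covariance and the sparse γ* below are made-up illustration data):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(S, b, lam):
    """min ||gamma||_1  s.t.  ||S @ gamma - b||_inf <= lam,
    solved as an LP in (u, v) with gamma = u - v, u, v >= 0."""
    d = len(b)
    c = np.ones(2 * d)                      # objective: sum(u) + sum(v) = ||gamma||_1
    A = np.block([[S, -S], [-S, S]])        # encodes  -lam <= S(u - v) - b <= lam
    b_ub = np.concatenate([lam + b, lam - b])
    res = linprog(c, A_ub=A, b_ub=b_ub, bounds=(0, None))
    u, v = res.x[:d], res.x[d:]
    return u - v

# demo with a population covariance and an exactly sparse gamma*
S = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
gamma_star = np.array([0.7, 0.0, -0.4])
b = S @ gamma_star                          # plays the role of Sigma_hat_{Ia}
gamma_hat = dantzig_selector(S, b, lam=0.05)
```

Because γ* is feasible here (its residual is exactly zero), the LP solution is guaranteed to satisfy the ℓ∞ constraint while having ℓ1 norm no larger than ||γ*||_1.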
Minimax optimality

    G_0(M, k_n) = { Ω = (Ω_ab)_{a,b∈[p]} : max_{a∈[p]} Σ_{b≠a} 1I{Ω_ab ≠ 0} ≤ k_n,
                    and M^{-1} ≤ λ_min(Ω) ≤ λ_max(Ω) ≤ M },

where M is a constant greater than one.

Theorem 1 in Ren et al. (2013) states that

    inf_{a,b} inf_{ω̂_ab} sup_{G_0(M,k_n)} P[ |ω̂_ab − ω_ab| ≥ ε_0 ( n^{-1} k_n log(p_n) ∨ n^{-1/2} ) ] ≥ ε_0.
Main Technical Ingredient

Prove a sign-subgaussian property: let Z ∼ N(0, Σ) and let v ∈ R^p be a
unit vector. The random variable v^T sign(Z) is (v^T Σ v / σ_min(Σ))-subgaussian.

Deviation of Σ̂ − Σ:

For any k ≥ 1, let B_k = { u ∈ R^p : ||u||_2² + ||u||_1²/k ≤ 1 }.

Let δ_1, δ_2 ∈ (0, 1) and k ≥ 1 with log(2/δ_2) + (k + 1) log(12p) ≤ n.
Then, with probability at least 1 − δ_1 − δ_2,

    sup_{u,v∈B_k} u^T (Σ̂ − Σ) v ≤ C_1 k · log(2p²/δ_1)/n
                                  + C_2 C(Σ) · √( ( log(2/δ_2) + (k + 1) log(12p) ) / n ).
Applications and Related work

- Consistency of PCA when p/n → 0
- s-sparse PCA when s log(p)/n → 0
- fast convergence for s-bandable Σ

Existing results: Wegkamp and Zhao (2013), Han and Liu (2013):

    P[ |||T̂ − T|||_2 ≥ t ] ≤ 2p exp( −(n/2)t² / ( 2( p|||Σ|||_2² + |||Σ|||_2 + 2pt ) ) )

Mitra and Zhang (2014): for S = { S : |||Σ_SS|||_2 < M, |S| < s } with |S| ≤ m,

    P[ max_{S∈S} |||(T̂ − T)_SS|||_2 ≥ C( Δ_{s,m,t} + Δ²_{s,m,t} + Δ'_{s,m,t} ) ] ≤ exp(−nt),

where Δ_{s,m,t} = √( n^{-1}(d + log(m)) + t ) and Δ'_{s,m,t} = √( n^{-1} log(m) + t ) + n^{-1}( s log(p) + t ).
Simulations

Data generated from a 30 × 30 grid graph, sample size n = 400:

    Ω_aa = 1,    Ω_ab = 0.24 for edges, 0 for non-edges,    X ∼ EC(0, Ω^{-1}, t_5)
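This design can be sketched as follows (a smaller 5 × 5 grid is used here for speed; dividing a Gaussian vector by √(χ²_df/df) is one standard way to realize an elliptical distribution with a t generator):

```python
import numpy as np

def grid_precision(m, edge_weight=0.24):
    """Precision matrix of an m x m grid graph: 1 on the diagonal,
    edge_weight for horizontally/vertically adjacent nodes."""
    p = m * m
    Omega = np.eye(p)
    for r in range(m):
        for c in range(m):
            i = r * m + c
            if c + 1 < m:                      # right neighbour
                Omega[i, i + 1] = Omega[i + 1, i] = edge_weight
            if r + 1 < m:                      # bottom neighbour
                Omega[i, i + m] = Omega[i + m, i] = edge_weight
    return Omega

def sample_elliptical_t(n, Omega, df=5, rng=None):
    """Draw n samples from an elliptical distribution EC(0, Omega^{-1}, t_df)."""
    rng = rng if rng is not None else np.random.default_rng()
    Sigma = np.linalg.inv(Omega)
    z = rng.multivariate_normal(np.zeros(len(Omega)), Sigma, size=n)
    w = rng.chisquare(df, size=n) / df
    return z / np.sqrt(w)[:, None]

Omega = grid_precision(5)                      # 5 x 5 grid, p = 25
X = sample_elliptical_t(400, Omega, df=5, rng=np.random.default_rng(3))
```

With edge weight 0.24 and maximum degree 4, the grid precision matrix is diagonally dominant and hence positive definite, matching the talk's setup.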
Simulations

Check if the estimator is asymptotically normal (over 1000 trials):

[Figure: QQ plots of the statistics Ť_{(2,2),(2,3)}, Ť_{(2,2),(3,3)}, and
Ť_{(2,2),(10,10)} against standard normal quantiles, for the ROCKET,
Pearson, and nonparanormal estimators]
Simulations

Results for the transelliptical data (grid graph):

                    ROCKET             Pearson            Nonparanormal
                Coverage  Width    Coverage  Width    Coverage  Width
True edge         92.8     0.54     66.6      0.49     79.7      0.34
Near non-edge     93.5     0.56     74.2      0.47     82.8      0.33
Far non-edge      93.8     0.57     74.8      0.48     85.3      0.33
Simulations

Results for Gaussian data with the same Ω (grid graph):

                    ROCKET             Pearson            Nonparanormal
                Coverage  Width    Coverage  Width    Coverage  Width
True edge         93.3     0.37     93.3      0.35     93.3      0.35
Near non-edge     94.7     0.38     94.1      0.34     93.9      0.34
Far non-edge      94.7     0.38     95.2      0.34     95.2      0.34

All methods have ≈ 95% coverage; ROCKET confidence intervals are only
slightly wider.
Stock data

Stock return data: 452 stocks over 1257 days (Yahoo Finance / R package huge)

    X_ij = log( closing price of stock j on day i+1 / closing price of stock j on day i )

Methods:
- ROCKET
- Gaussian graphical model ("Pearson") (Ren et al., 2013)
- Nonparanormal (Liu et al., 2009)
Stock data

The data do not seem to fit the normal / nonparanormal model.
Stock data

Estimated graphs for 64 stocks (Materials & Consumer Staples):
- for each pair (a, b) of stocks, calculate the p-value p_ab
- draw an edge if p_ab < 0.001
Stock data

Underlying distribution unknown → how to evaluate performance?

Use sample splitting to check for asymptotic normality:
- split into 25 samples of size 50
- let X^(l) ∈ R^{50×64} be the data in the l-th subsample
- for each pair (a, b), calculate Ω̌_ab^(l) and Š_ab^(l)

According to theory,

    z_ab^(l) = √n · Ω̌_ab^(l) / Š_ab^(l) ≈ μ_ab + N(0, 1),
    where μ_ab = √n · Ω_ab / S_ab^(l)
Stock data

For each (a, b), the values z_ab^(1), ..., z_ab^(25) are ≈ i.i.d. draws
from N(μ_ab, 1), so the sample variance of z_ab^(1), ..., z_ab^(25) should
be ≈ 1.
Summary

The ROCKET method:
- theoretical guarantees for asymptotic normality over the transelliptical family
- confidence intervals with the right coverage

Practical recommendation: use the transelliptical family in practice.

Code: https://github.com/mlakolar/ROCKET
Preprint: arXiv:1502.07641
Dynamic networks
Dynamic networks
Data collected over a period of
time is easily accessible
Changing Associations Between Stock Prices
Setting

Given n i.i.d. copies of Y = (X, Z), where

    Z ∼ f_Z(z),  Z ∈ [0, 1],
    X | Z = z ∼ NPN_d(0, Σ(z), f)

Focus on the following three testing problems:
- Edge presence test: H_0: Ω_jk(z_0) = 0 for a fixed z_0 ∈ (0, 1) and j, k ∈ [d]
- Super-graph test: H_0: G*(z_0) ⊂ G for a fixed z_0 ∈ (0, 1) and a fixed graph G
- Uniform edge presence test: H_0: G*(z) ⊂ G for all z ∈ [z_L, z_U] ⊂ [0, 1] and fixed G
Testing statistic

The score function:

    Ŝ_{z|(j,k)}(β) = Ω̂_j(z)^T ( Σ̂(z) β − e_k )

A level-α test for H_0: Ω_jk(z_0) = 0 can be constructed using the fact that

    √(nh) · σ̂_jk(z_0)^{-1} · Ŝ_{z_0|(j,k)}( Ω̂_{k\j}(z_0) )  ⇝  N(0, 1)

For the null hypothesis H_0: G*(z) ⊂ G for all z ∈ [z_L, z_U], we use

    W_G = √(nh) · sup_{z∈[z_L,z_U]} max_{(j,k)∈E^c} | U_n[ω_z] Ŝ_{z|(j,k)}( Ω̂_{k\j}(z) ) |

and estimate its quantile using the multiplier bootstrap.
Estimating Ω(z)

Local Kendall's tau correlation matrix T̂(z) = [τ̂_jk(z)] ∈ R^{d×d}:

    τ̂_jk(z) = Σ_{i<i'} ω_z(Z_i, Z_i') sign(X_ij − X_i'j) sign(X_ik − X_i'k)
               / Σ_{i<i'} ω_z(Z_i, Z_i'),

where ω_z(Z_i, Z_i') = K_h(Z_i − z) K_h(Z_i' − z).

Latent correlation matrix:

    Σ̂(z) = sin( (π/2) T̂(z) )

Latent precision matrix (calibrated CLIME):

    (Ω̂_j^{CCLIME}, κ̂_j) = argmin_{β∈R^d, κ∈R}  ||β||_1 + γκ
        s.t.  ||Σ̂β − e_j||_∞ ≤ λκ,  ||β||_1 ≤ κ
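The local Kendall's tau estimator can be sketched directly from the display above (a minimal sketch; the Gaussian kernel, the bandwidth h = 0.2, and the constant-Σ(z) demo data are illustrative choices):

```python
import numpy as np

def local_kendall_tau(X, Z, z, h):
    """Kernel-weighted Kendall's tau matrix tau_hat_jk(z): each pair (i, i')
    is weighted by w_z(Z_i, Z_i') = K_h(Z_i - z) K_h(Z_i' - z)."""
    n, d = X.shape
    Kh = np.exp(-0.5 * ((Z - z) / h) ** 2)             # Gaussian kernel (illustrative)
    W = np.outer(Kh, Kh)                               # pair weights w_z(Z_i, Z_i')
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # keep pairs with i < i'
    denom = W[mask].sum()
    signs = [np.sign(X[:, j][:, None] - X[:, j][None, :]) for j in range(d)]
    T = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            num = (W * signs[j] * signs[k])[mask].sum()
            T[j, k] = T[k, j] = num / denom
    return T

# latent correlation at z: Sigma_hat(z) = sin(pi/2 * T_hat(z))
rng = np.random.default_rng(4)
n = 400
Z = rng.uniform(size=n)
X = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n)  # constant Sigma(z)
Sigma_hat = np.sin(np.pi / 2 * local_kendall_tau(X, Z, z=0.5, h=0.2))
```

Since Σ(z) is constant in the demo, the local estimate at z = 0.5 should be close to the global latent correlation 0.5; with a genuinely varying Σ(z) the kernel weights localize the estimate around z.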
Estimating Ω(z)

For h_l · n/log(dn) → ∞, h_u = o(1), and λ ≥ C_Σ ( h² + √( log(d/h)/(nh) ) ):

    sup_{h∈[h_l,h_u]} sup_{z∈[0,1]}  (λM)^{-1}  || Ω̂(z) − Ω(z) ||_2  ≤ C;
    sup_{h∈[h_l,h_u]} sup_{z∈[0,1]}  (λsM)^{-1} || Ω̂(z) − Ω(z) ||_1  ≤ C;
    sup_{z∈[0,1]} max_{j∈[d]}        (λM)^{-1}  || Ω̂_j^T Σ̂ − e_j ||_∞ ≤ C.

The result hinges on

    sup_{h∈[h_l,h_u]} sup_{z∈[0,1]}  || Σ̂(z) − Σ(z) ||_max
      / ( h² + √( (nh)^{-1} ( log(d/h) ∨ log(δ^{-1}) ) log(h_u h_l^{-1}) ) )  ≤ C_Σ.
Where to find more information?
Preprint: arXiv:1512.08298
Code: https://github.com/dreamwaylu/dynamicGraph
Thank you!
References I
T. T. Cai, W. Liu, and X. Luo. A constrained ℓ1 minimization
approach to sparse precision matrix estimation. J. Am. Stat. Assoc.,
106(494):594–607, 2011.
J. Fan, H. Liu, Y. Ning, and H. Zou. High dimensional semiparametric
latent graphical model for mixed data. ArXiv e-prints,
arXiv:1404.7236, April 2014.
H. Liu, J. D. Lafferty, and L. A. Wasserman. The nonparanormal:
Semiparametric estimation of high dimensional undirected graphs. J.
Mach. Learn. Res., 10:2295–2328, 2009.
H. Liu, F. Han, M. Yuan, J. D. Lafferty, and L. A. Wasserman.
High-dimensional semiparametric Gaussian copula graphical models.
Ann. Stat., 40(4):2293–2326, 2012a.
H. Liu, F. Han, and C.-H. Zhang. Transelliptical graphical models. In
P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger,
editors, Proc. of NIPS, pages 809–817. 2012b.
References II
P.-L. Loh and M. J. Wainwright. Regularized M-estimators with
nonconvexity: Statistical and algorithmic theory for local optima.
arXiv preprint arXiv:1305.2436, 2013.
N. Meinshausen and P. Bühlmann. High dimensional graphs and
variable selection with the lasso. Ann. Stat., 34(3):1436–1462, 2006.
R. Mitra and C.-H. Zhang. Multivariate analysis of nonparametric
estimates of large correlation matrices. ArXiv e-prints,
arXiv:1403.6195, March 2014.
P. Ravikumar, M. J. Wainwright, and J. D. Lafferty. High-dimensional
Ising model selection using ℓ1-regularized logistic regression. Ann.
Stat., 38(3):1287–1319, 2010.
P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu.
High-dimensional covariance estimation by minimizing ℓ1-penalized
log-determinant divergence. Electron. J. Stat., 5:935–980, 2011.
References III
Z. Ren, T. Sun, C.-H. Zhang, and H. H. Zhou. Asymptotic normality
and optimalities in estimation of large Gaussian graphical model.
arXiv preprint arXiv:1309.6024, 2013.
M. Wegkamp and Y. Zhao. Adaptive estimation of the copula
correlation matrix for semiparametric elliptical copulas. ArXiv
e-prints, arXiv:1305.6526, May 2013.
L. Xue and H. Zou. Regularized rank-based estimation of
high-dimensional nonparanormal graphical models. Ann. Stat., 40
(5):2541–2571, 2012.
L. Xue, H. Zou, and T. Cai. Nonconcave penalized composite
conditional likelihood estimation of sparse Ising models. Ann. Stat.,
40(3):1403–1429, 2012.
References IV
E. Yang, G. I. Allen, Z. Liu, and P. Ravikumar. Graphical models via
generalized linear models. In F. Pereira, C. Burges, L. Bottou, and
K. Weinberger, editors, Advances in Neural Information Processing
Systems 25, pages 1358–1366. Curran Associates, Inc., 2012.
E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. On graphical models
via univariate exponential family distributions. ArXiv e-prints,
arXiv:1301.4183, January 2013a.
E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. On Poisson graphical
models. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and
K. Weinberger, editors, Advances in Neural Information Processing
Systems 26, pages 1718–1726. Curran Associates, Inc., 2013b.
E. Yang, Y. Baker, P. Ravikumar, G. I. Allen, and Z. Liu. Mixed
graphical models via exponential families. In Proc. 17th Int. Conf,
Artif. Intel. Stat., pages 1042–1050, 2014.
M. Yuan and Y. Lin. Model selection and estimation in the Gaussian
graphical model. Biometrika, 94(1):19–35, 2007.