Sure Screening for Gaussian Graphical Models
Shikai Luo, Rui Song, Daniela Witten
Siyuan Ma
March 20, 2015

Siyuan Ma GRASS March 20, 2015

Outline
1 Introduction
   Gaussian Graphical Models
   Conditional Dependency
   High Dimensionality
   Previous Works
2 The GRASS Algorithm
   Graphical Sure Screening
   Theoretical Properties
3 Results
   Simulation
   Analysis of Gene Expression Data

Introduction: Gaussian Graphical Models

Graphical Models
- In recent years, graphical modeling (network analysis) has been a topic of great interest in both the scientific and statistical communities.
- In genomics in particular, graphical models have been used extensively to model gene regulatory networks.

Figure: Gene regulatory network of the TF family GLI in human. Credit: http://rulai.cshl.edu/TRED/GRN/Gli.htm

Gaussian Graphical Model
- Consider the gene expression random vector $X = (X_1, X_2, \ldots, X_p)^T$.
- Each $X_j$ corresponds to a gene, i.e., a row in a gene expression matrix.
- Each sample (column) in a gene expression matrix can be viewed as a realization of $X$.
- We assume that $X \sim N(0, \Sigma)$.
- We further assume that each gene has mean 0 and standard deviation 1.

Translating the covariance matrix $\Sigma$ into a graph (network) $G$:

$$\Sigma = \begin{pmatrix} 1 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1p} \\ \sigma_{21} & 1 & \sigma_{23} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \sigma_{ij} & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \sigma_{p3} & \cdots & 1 \end{pmatrix}$$

- $|\sigma_{ij}| > 0 \Leftrightarrow$ there is an edge in graph $G$ between gene $i$ and gene $j$.
- In practice, we usually have only an estimate $\hat{\Sigma}$ of $\Sigma$.
- We can set a threshold on $|\hat{\sigma}_{ij}|$ to decide whether an edge exists between genes $i$ and $j$.
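As a rough illustration of the thresholding idea above, the sketch below builds a marginal dependency graph from data. The function name `threshold_graph`, the toy data, and the threshold 0.3 are assumptions made for this example and do not come from the slides. The data matrix here has samples in rows (the usual NumPy layout), which is the transpose of the genes-in-rows layout described above.

```python
import numpy as np

def threshold_graph(X, gamma):
    """Boolean adjacency matrix with an edge (i, j) when |sigma_hat_ij| > gamma.

    X is an (n samples) x (p genes) array. Columns are standardized first,
    so sigma_hat is the sample correlation matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    sigma_hat = Z.T @ Z / X.shape[0]
    adj = np.abs(sigma_hat) > gamma
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

# Toy data (hypothetical): gene 1 tracks gene 0, gene 2 is unrelated.
rng = np.random.default_rng(0)
n = 500
g0 = rng.normal(size=n)
g1 = g0 + 0.5 * rng.normal(size=n)
g2 = rng.normal(size=n)
X = np.column_stack([g0, g1, g2])

adj = threshold_graph(X, gamma=0.3)
```

With this toy data the only edge that should survive the threshold is the one between genes 0 and 1. This graph describes marginal dependence only.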

Introduction: Conditional Dependency

Conditional vs. Marginal Dependency: An Example
- Genes $X_1$, $X_2$, $X_3$, $X_4$.
- $X_3$ and $X_4$ are independent regulators.
- $X_1$ and $X_2$ are regulated by $X_3$ and $X_4$:
  $X_1 = X_3 + X_4 + \epsilon_1$,
  $X_2 = X_3 + X_4/2 + \epsilon_2$.
- $\epsilon_1$ and $\epsilon_2$ are independent noise terms.
- Given $X_3$ and $X_4$, $X_1$ and $X_2$ are independent.
- Marginally, $X_1$ and $X_2$ are not independent.

Conditional vs. Marginal Dependency
- Marginal dependency is determined by $\Sigma$. It is easy to estimate and widely used.
- Conditional dependency is determined by $\Sigma^{-1}$. It is much harder to estimate, but may reveal more meaningful relationships.

Introduction: High Dimensionality

Article: Karell et al., 2015

High dimensionality problems arise when the number of genes ($p$) is greater than the number of samples ($n$).

This is a problem ($p = 200$ rows, $n = 50$ columns):

$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,50} \\ \vdots & \ddots & \vdots \\ x_{200,1} & \cdots & x_{200,50} \end{pmatrix}$$

This is not a problem ($p = 200$ rows, $n = 1000$ columns):

$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,1000} \\ \vdots & \ddots & \vdots \\ x_{200,1} & \cdots & x_{200,1000} \end{pmatrix}$$

In our case, high dimensionality leads to a singular $\hat{\Sigma}$, which means we cannot simply estimate $\Sigma^{-1}$ by $\hat{\Sigma}^{-1}$.

Introduction: Previous Works

The Graphical Lasso
One of the previously established approaches is the graphical lasso:

$$\hat{\Theta} = \arg\max_{\Theta} \left\{ \log\det(\Theta) - \mathrm{trace}\left[(X^T X / n)\,\Theta\right] - \lambda \sum_{i \neq j} |\Theta_{ij}| \right\}$$

Solving this problem is not straightforward and is computationally expensive.

The GRASS Algorithm: Graphical Sure Screening

The goal is to estimate $\Sigma^{-1}$, or equivalently, to estimate the conditional dependency graph $E$.
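To make the link between $\Sigma^{-1}$ and the conditional dependency graph concrete, the four-gene example from the conditional vs. marginal dependency slide can be checked numerically. The sketch below assumes, purely for illustration, that $X_3$, $X_4$, $\epsilon_1$ and $\epsilon_2$ are independent with unit variance; the slides do not specify the variances.

```python
import numpy as np

# Write X = A z with z = (X3, X4, eps1, eps2) ~ N(0, I)  (unit variances assumed).
# Rows of A give X1, X2, X3, X4 as linear combinations of z.
A = np.array([
    [1.0, 1.0, 1.0, 0.0],   # X1 = X3 + X4 + eps1
    [1.0, 0.5, 0.0, 1.0],   # X2 = X3 + X4/2 + eps2
    [1.0, 0.0, 0.0, 0.0],   # X3
    [0.0, 1.0, 0.0, 0.0],   # X4
])

sigma = A @ A.T                # covariance matrix of (X1, X2, X3, X4)
theta = np.linalg.inv(sigma)   # precision matrix

marginal_12 = sigma[0, 1]      # nonzero: X1 and X2 are marginally dependent
conditional_12 = theta[0, 1]   # zero up to rounding: independent given X3, X4
```

Under these assumptions the (1, 2) entry of $\Sigma$ is 1.5 while the (1, 2) entry of $\Sigma^{-1}$ vanishes, so the missing edge between $X_1$ and $X_2$ shows up only in the precision matrix.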
Under the normality assumption, the GRASS algorithm is surprisingly simple:
- Start from an estimate $\hat{\Sigma}$ of $\Sigma$.
- Declare genes $i$ and $j$ conditionally dependent if $|\hat{\sigma}_{ij}| > \gamma_n$, where $\gamma_n$ is a threshold.
- This gives an estimated edge set $\hat{E}_{\gamma_n}$.

The GRASS Algorithm: Theoretical Properties

Connection with the Graphical Lasso

Theorem (Witten et al., 2011)
The connected components of the graphical lasso estimator are exactly the same as the connected components that result from the GRASS algorithm.

- The theorem applies when the GRASS threshold $\gamma_n$ equals the graphical lasso penalty $\lambda$.
- It suggests that the results of the graphical lasso and the GRASS algorithm are similar "from a distance".
- Intuitively this makes sense, because the inverse of a block-diagonal matrix is still block diagonal.

Sure Screening Property

Theorem (Sure Screening Property)
Assume that certain assumptions hold, and that $\log(p) = C_3 n^{\xi}$ for some constants $C_3 > 0$ and $\xi \in (0, 1 - 2\kappa)$. Let $\gamma_n = \frac{2}{3} C_1 n^{-\kappa}$. Then there exist constants $C_4$ and $C_5$ such that

$$P(E \subseteq \hat{E}_{\gamma_n}) \geq 1 - C_4 \exp(-C_5 n^{1 - 2\kappa})$$

The sure screening property guarantees that, with very high probability, GRASS produces no false negatives.
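The block-diagonal intuition and the "no false negatives" reading can be illustrated with a small population-level sketch. The precision matrix values (0.4), the threshold 0.1, and the helper name `grass_edges` are assumptions chosen for this example. The threshold is applied to the true $\Sigma$ rather than to an estimate $\hat{\Sigma}$, so this shows the idea and does not check the theorem.

```python
import numpy as np

def grass_edges(sigma, gamma):
    """Edge set {(i, j), i < j} with |sigma_ij| > gamma (the screening rule)."""
    p = sigma.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(sigma[i, j]) > gamma}

# Hypothetical precision matrix: two tridiagonal blocks of three genes each.
block = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.0, 0.4],
                  [0.0, 0.4, 1.0]])
zeros = np.zeros((3, 3))
theta = np.block([[block, zeros], [zeros, block]])

# True conditional dependency graph: nonzero off-diagonal entries of theta.
true_edges = {(i, j) for i in range(6) for j in range(i + 1, 6)
              if theta[i, j] != 0}

# Covariance, rescaled to unit variances as assumed on the slides.
sigma = np.linalg.inv(theta)
d = np.sqrt(np.diag(sigma))
corr = sigma / np.outer(d, d)

screened = grass_edges(corr, gamma=0.1)
missed = true_edges - screened   # false negatives
extra = screened - true_edges    # false positives
```

In this toy setting `missed` is empty, and the extra edges (0, 2) and (3, 5) stay inside the two blocks, so the screened graph has the same connected components as the true one.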

The sure screening property is based on the following assumption:

Assumption
For some constants $C_1 > 0$ and $0 < \kappa < 1/2$,

$$\min_{(i,j) \in E} |\sigma_{i,j}| \geq C_1 n^{-\kappa}$$

The implication of this assumption is that, if two genes are conditionally dependent, they should not be "too independent" marginally.

Control of False Positive Rate

Theorem (Control of False Positive Rate)
Assume that certain assumptions hold, and that $\log(p) = C_3 n^{\xi}$ for some constants $C_3 > 0$ and $\xi \in (0, 1 - 2\kappa)$ (the same condition as in the previous theorem). We can control the false positive rate at $f / |E^c|$ by choosing

$$\gamma_n = \Phi^{-1}\left(1 - \frac{f}{p(p-1)}\right) \Big/ \sqrt{n}$$

- This theorem ensures that we can control the false positive rate by controlling the number of possible false positives at $f$.
- Furthermore, with this threshold, the sure screening property still holds.
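The threshold in the false positive rate theorem is simple to compute. The sketch below uses only the Python standard library. The function name `fpr_threshold` and the example values $n = 50$, $p = 200$, $f = 10$ are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def fpr_threshold(n, p, f):
    """gamma_n = Phi^{-1}(1 - f / (p (p - 1))) / sqrt(n).

    n: number of samples, p: number of genes,
    f: target number of false positive edges.
    """
    quantile = NormalDist().inv_cdf(1.0 - f / (p * (p - 1)))
    return quantile / sqrt(n)

gamma = fpr_threshold(n=50, p=200, f=10)
stricter = fpr_threshold(n=50, p=200, f=1)      # fewer false positives allowed
more_data = fpr_threshold(n=100, p=200, f=10)   # larger sample size
```

Allowing fewer false positives raises the threshold, and a larger sample size lowers it.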

The false positive rate theorem is based on the following assumption:

Assumption
For the same $\xi$ as in Theorem 1,

$$\max_{(i,j) \notin E} |\sigma_{i,j}| = o\left(n^{-\frac{1-\xi}{2}}\right)$$

The implication of this assumption is that, if two genes are conditionally independent, they should not be "too dependent" marginally.

Results: Simulation

- Simulation A: A sparse graph. For all $i < j$, set $(i, j) \in E$ with probability 0.01.
- Simulation B: A graph with ten densely connected components. Partition the $p$ features into 10 equally sized, non-overlapping sets: $C_1 \cup C_2 \cup \cdots \cup C_{10} = \{1, \ldots, p\}$, $|C_k| = p/10$, $C_k \cap C_j = \emptyset$ for $k \neq j$. For all $i, j \in C_k$, set $(i, j) \in E$.
- Simulation C: A banded graph. For $|i - j| \leq 2$, set $(i, j) \in E$.

Figure: For Simulation B (panels a-c) and Simulation C (panels d-f) with $p = 100$ and $n = 50$, the adjacency matrices corresponding to the true edge set (panels (a) and (d)), the graphical lasso estimate (panels (b) and (e)), and the GRASS estimate (panels (c) and (f)) are shown. The adjacency matrices for graphical lasso and GRASS are averaged over 10 simulated data sets; the color of a cell in the heatmap corresponds to the fraction of these 10 data sets for which the corresponding edge is estimated to be present. Results for Simulation A are not shown, since in that setting the true edge set is not fixed across the simulated data sets.
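The three simulation designs can be sketched as edge-set generators. This covers only the edge sets described above; how nonzero entries of $\Sigma^{-1}$ are assigned on those edges is not stated on the slides and is not attempted here. The function names are hypothetical, indices are 0-based, and Simulation B assumes $p$ is divisible by 10.

```python
import numpy as np

def edges_sim_a(p, rng, prob=0.01):
    """Sparse graph: each pair i < j is an edge independently with probability prob."""
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if rng.random() < prob}

def edges_sim_b(p, n_blocks=10):
    """Ten fully connected components of equal size (p divisible by n_blocks)."""
    size = p // n_blocks
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if i // size == j // size}

def edges_sim_c(p, bandwidth=2):
    """Banded graph: (i, j) is an edge when 0 < |i - j| <= bandwidth."""
    return {(i, j) for i in range(p)
            for j in range(i + 1, min(i + bandwidth, p - 1) + 1)}

rng = np.random.default_rng(0)
p = 100
edges_a = edges_sim_a(p, rng)
edges_b = edges_sim_b(p)
edges_c = edges_sim_c(p)
```

For $p = 100$, Simulation B has $10 \times \binom{10}{2} = 450$ edges and Simulation C has $99 + 98 = 197$ edges, while the edge set of Simulation A changes with every draw.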

Figure: For Simulations A-C with $p = 200$ and $n = 50$, the number of false and true positive edges detected is displayed as the tuning parameter is varied. Results are shown for graphical lasso (black), neighborhood selection (blue), and GRASS (green).

- In Simulation B, the sparsity patterns of $\Sigma$ and $\Sigma^{-1}$ are identical. In this setting, GRASS outperforms the graphical lasso and neighborhood selection.
- Even for Simulations A and C, the vast majority of the large off-diagonal elements of $\Sigma$ correspond to non-zero elements of $\Sigma^{-1}$.

Figure: For Simulations A-C with $n = 100$ and $p = 50$, the off-diagonal elements of $\Sigma^{-1}$ (x-axis) and $\Sigma$ (y-axis) are shown. The 0.5% of largest absolute off-diagonal elements of $\Sigma$ are shown in red; the rest are in black. For all three setups, the vast majority of large off-diagonal elements of $\Sigma$ correspond to non-zero elements of $\Sigma^{-1}$.

- For Simulation A, because of the sparsity setting, with high probability a given column of $\Sigma^{-1}$ contains no more than one non-zero off-diagonal element.
- Consequently, $\Sigma^{-1}$ is (approximately) a block-diagonal matrix with blocks containing no more than two features (genes).

Results: Analysis of Gene Expression Data

- Gene expression data from Spira et al., 2007.
- Contains 22283 microarray-derived gene expression measurements from large airway epithelial cells sampled from 97 patients with lung cancer and 90 controls.
- Limited to the 1778 genes with the highest marginal variance.
- Features were standardized to have mean zero and standard deviation one.
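The two preprocessing steps just listed (variance filtering, then standardization) might look like the following sketch. The function name `preprocess` and the toy matrix are assumptions. The real data set is not reproduced here, and the original analysis may have differed in details such as the variance estimator.

```python
import numpy as np

def preprocess(expr, n_keep):
    """Keep the n_keep highest-variance genes, then standardize each gene.

    expr is a (genes x samples) array. Returns the indices of the kept genes
    (in their original order) and the standardized submatrix.
    """
    variances = expr.var(axis=1)
    keep = np.sort(np.argsort(variances)[-n_keep:])
    sub = expr[keep]
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, keepdims=True)
    return keep, z

# Toy matrix (hypothetical): 5 genes x 40 samples with very different spreads.
rng = np.random.default_rng(1)
scales = np.array([0.1, 5.0, 0.2, 3.0, 0.3])
expr = rng.normal(size=(5, 40)) * scales[:, None]

keep, z = preprocess(expr, n_keep=2)
```

In the analysis above the same two steps would correspond to `n_keep=1778` applied to the 22283 measurements.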

"Validation"
- Split the control samples into Set 1 and Set 2.
- Apply both the graphical lasso and GRASS to each set, giving estimated edge sets $\hat{E}_1^{GL}$, $\hat{E}_1^{GRASS}$, $\hat{E}_2^{GL}$, $\hat{E}_2^{GRASS}$.
- First treat the edges estimated by the graphical lasso on Set 1 as the gold standard, then treat the edges estimated by GRASS on Set 1 as the gold standard.
- Specifically, calculate the following:

$$\hat{E}_1^{GL} \cap \hat{E}_2^{GRASS} \cap \left(\hat{E}_2^{GL}\right)^c$$
$$\hat{E}_1^{GL} \cap \hat{E}_2^{GL} \cap \left(\hat{E}_2^{GRASS}\right)^c$$
$$\hat{E}_1^{GRASS} \cap \hat{E}_2^{GRASS} \cap \left(\hat{E}_2^{GL}\right)^c$$
$$\hat{E}_1^{GRASS} \cap \hat{E}_2^{GL} \cap \left(\hat{E}_2^{GRASS}\right)^c$$

Figure: Mean (and standard error) of accuracy of graphical lasso (GL) and GRASS on gene expression data, over 20 splits of the observations into Set 1 and Set 2. $|\hat{E}|$, the size of the estimated edge set, is also reported.

- Regardless of whether GRASS or the graphical lasso is treated as the gold standard, GRASS yields more accurate edge set recovery than the graphical lasso.
- These results are based on an analysis of the control observations. Similar results are obtained from the cases (results not shown).

Thanks!