Multivariate studies of receptor tyrosine kinase function in cancer

by

Joel Patrick Wagner

B.S., Chemical Engineering, B.S., Biochemistry, University of Wisconsin-Madison (2006)
M.Phil., Computational Biology, University of Cambridge (2007)

Submitted to the Department of Biological Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biological Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, June 2013

© Massachusetts Institute of Technology 2013. All rights reserved.

Author: Department of Biological Engineering, February 20, 2013
Certified by: Douglas A. Lauffenburger, Ford Professor of Bioengineering, Thesis Supervisor
Accepted by: Forest M. White, Chair, Graduate Program Committee

This doctoral thesis has been examined by a Committee of the Department of Biological Engineering as follows:

Professor Douglas Lauffenburger, Thesis Supervisor, Ford Professor of Bioengineering
Professor Ernest Fraenkel, Chairman, Thesis Committee, Associate Professor of Biological Engineering
Professor Forest White, Member, Thesis Committee, Associate Professor of Biological Engineering

Multivariate studies of receptor tyrosine kinase function in cancer

by Joel Patrick Wagner

Submitted to the Department of Biological Engineering on February 20, 2013, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biological Engineering

Abstract

Receptor tyrosine kinases (RTKs) are critical regulators of cellular homeostasis in multicellular organisms. They influence cell proliferation, migration, differentiation, and transcriptional activation, among other processes, and are therefore also relevant to cancer biology. Upon interaction with cognate ligand, RTKs initiate signaling cascades dependent in part on the phosphorylation of proteins.
From a computational perspective, this thesis has studied methods for quantifying relationships between measured signals (using Bayesian network inference, correlation, and mutual information-based methods), and between signals and cellular phenotypes (using linear regression, partial least squares regression, and feature selection methods). From a biological perspective, this thesis has studied signaling between RTKs, signaling and cell migration downstream of RTKs in epithelial versus mesenchymal cell states, and comparative signaling across six RTKs. In the last case, the results show that the six RTKs cluster into three classes based on their inferred signaling networks. Using publicly available transcriptional and pharmacological profiling data from hundreds of cancer cell lines, it was determined that expression of same-class RTK genes or their cognate ligands can correlate with insensitivity to drugs targeting other RTKs in that class. This suggests that resistance to RTK-targeted therapies in cancer may emerge in part because same-class RTKs can compensate for the reduced signaling of the inhibited receptor. The thesis concludes by quantitatively exploring the features of experimental data that improve model accuracy.

Thesis Supervisor: Douglas A. Lauffenburger
Title: Ford Professor of Bioengineering

Acknowledgments

I would like to first thank my advisor, Professor Doug Lauffenburger, for providing guidance and support throughout my time at MIT. Doug provides a unique lab environment that purposefully and substantively blends biological and computational research in a manner that few other labs in the world can claim to do. Implementing this type of hybrid, interdisciplinary environment is very important for the field, and to have been able to develop, train, and immerse myself in such an environment during my graduate studies has been an honor and a privilege.
I would also like to thank Doug for allowing me to attend and present at so many different conferences, meetings, and workshops. Sharing our work, while at the same time exposing myself to so many different and new ideas, has been an immense benefit both for my graduate experience and my career beyond. I would also like to thank Doug for allowing me to apply for and participate in the National Science Foundation's East Asia and Pacific Summer Institutes program in Singapore, where I worked with Edison Liu at the Genome Institute of Singapore. Having the flexibility to essentially take leave for nearly three months in the middle of my PhD was an eye-opening experience scientifically, professionally, and personally. Doug has consistently been supportive of anything that I thought would benefit my future career, even if it was not directly related to my work in his lab, which is something that very few PhD advisors actually do, and for that I am forever thankful. I would next like to thank my thesis committee members Professors Forest White and Ernest Fraenkel, who served as committee chair. Ernest and Forest have provided helpful guidance and thoughtful contributions to my research throughout my time at MIT. I am also thankful for their critical reading of this thesis document. I would like to thank my collaborator Professor Richard Jones at the University of Chicago, and his lab members, particularly Mark Ciaccio. Rich has served as an excellent collaborator throughout my graduate career. It was a real pleasure working with Rich on the Nature Methods paper. He was an engaged and thoughtful colleague, and a model for how a more experiment-centric researcher can interface with a more computation-centric researcher. Through emails, phone calls, and in-person discussions, including a week spent in his lab in Chicago and mutual attendance at two conferences, I have learned much and benefited greatly.
I would like to thank my collaborators Mark Sevecka and Alejandro Wolf-Yadlin from Gavin MacBeath's lab at Harvard University. I began collaborating with Mark early in my graduate career on numerous projects, and it has been a real pleasure. Mark always provided thorough, substantive, and thoughtful responses to my questions, for which I will always be thankful, and from which the projects have benefited greatly. I began working with Ale part way through my graduate career, which was also a pleasure. Ale and Mark put forth immense effort in collecting all of the data for the receptor tyrosine kinase project, and provided helpful guidance and discussion during the analysis of the data. I would like to thank my collaborators Shannon Hughes, Aaron Meyer, and HD Kim from the Lauffenburger lab. We worked together on the epithelial-mesenchymal transition project. Discussions with Shannon and Aaron about the nature of cell migration, and the signaling underlying it, were very useful and thought-provoking. I appreciate them taking the time to help educate me about the nuances of cell migration biology. I would like to thank my collaborator William Chen from Professor Peter Sorger's lab at Harvard University. Will is among the most thoughtful and interesting people I spoke with during my graduate career, and every discussion with him was a rewarding and enjoyable experience. I would like to individually thank Julio Saez-Rodriguez for his mentorship and guidance in the earlier stages of my graduate studies. Beyond Doug, no other person at MIT had a greater impact on my thought processes than Julio. I am fortunate to consider him a colleague and friend. 
I am grateful to many within the Lauffenburger lab and the wider Biological Engineering community at MIT for helpful and insightful discussions and support: Miles Miller, Melody Morris, Brian Joughin, Justin Pritchard, Kristen Naegle, Michael Beste, Edgar Sanchez, Seymour de Picciotto, Chris Ng, Carol Huang, Dave Clarke, Doug Jones, Jorge Valdez, Nate Tedford, Sarah Schrier, Jen Wilson, Kelly Benedict, Thomas Willems, Abby Hill, Ta-Chun Hang, and Greg Riddick. I am also thankful to Tommi Jaakkola and David Wingate in the Electrical Engineering and Computer Science department at MIT for helpful discussions regarding graphical modeling. And for their support with research and beyond at MIT, I would like to thank Lauffenburger lab manager Hsinhwa Lee, Lauffenburger lab administrative assistant JoAnn Sorrento, and Information Systems Administrator and all-around good guy Aran Parillo. Beyond MIT, I would like to thank David Heckerman at Microsoft Research for very helpful discussions regarding the data quality chapter of this thesis, Daniel Eaton from Kevin Murphy's lab at the University of British Columbia for providing the Bayesian Network Structure Learning MATLAB code and for very helpful discussions regarding Bayesian networks, and Nickel Dittrich from the University of Magdeburg for early work on data discretization methods. I would like to thank Professor Edison Liu formerly from the Genome Institute of Singapore, as well as Drs. Francesca Menghi and Xing Yi Woo in his lab. Ed graciously agreed to host me for the NSF EAPSI program, not only in his lab but also partly in his home, and for that I will be forever grateful. It was an amazing opportunity in every respect. I look forward to continuing my work with Ed as a postdoctoral associate in his lab at The Jackson Laboratory for Genomic Medicine.
I would like to thank my classmates in BE-2007: Edgar Sanchez, Jeff Wagner, Melody Morris, Brian Belmont, Steve Goldfless, Michelle Sukup, Emily Florine, Bryan Bryson, Francisco Delgado, Eddie Eltoukhy, Ricardo Gonzalez, Karunya Srinivasan, and Prabhani Atukorale. Their friendship, support, and collegiality, mixed with a viscous sublayer of craziness, have been the best part of graduate school. Thank you each. And guys, some day we will get those quals beards down. I would like to thank my undergraduate research advisers, who took the time to train and support me even when I was a novice: Professor Sean Palecek at the University of Wisconsin-Madison, along with Fang Li from his lab and Dagang Huang from Professor Eric Shusta's lab; Melissa Lambeth Kemp from Professor Doug Lauffenburger's lab at MIT; and Amariliz Rivera from Professor Eric Pamer's lab at Memorial Sloan-Kettering Cancer Center. I would also like to thank many of the influential teachers I have had throughout my formal education, who played immense and pivotal roles in determining who I would eventually become. I would like to thank them for dedicating a small portion of their life to improving my own, a truly generous and gracious act. From Jackson Elementary: Carol Steiner, Ruth Windmuller, Margaret Mihalic, Virginia Pliner, Jim Bugni, Nancy Reck, and Ken Govek. From Franklin Middle: Diane Bacon, Anne Smith, Renee Kasten, Scott Christy, Ron Huisheere, Chick Hawkins, Sheila Wanta, Brenda Winkler, and Jon Taft. From West High: Jim Van Abel, Mary Diedrich, Dean Cherry, Bryan Radue, Eleanor Hinz, Sue Kuester, Scott Winkler, Harlan Shupita, Bill Freude, Pam Sylvester, Ron Wallberg, Don Buntman, and Bill Zigmund.
From the University of Wisconsin-Madison: Alexandru Ionescu, Fleming Crim, Kenneth George, Sigurd Angenent, Claude Woods, Robert Morse, Fred Roesler, Ieva Reich, Regina Murphy, Marshall Slemrod, Manos Mavrikakis, John Yin, Nick Abbott, Rafael Chavez, Antony Stretton, Charles Hill, David Nelson, Michael Cox, Jeremi Suri, Thomas Martin, Paul Nealey, Ross Swaney, Eric Shusta, Gary Splitter, Arun Yethiraj, and Christos Maravelias. From the University of Cambridge: Stephen Eglen, Simon Tavaré, Johan Paulsson, and Julia Gog. From the Massachusetts Institute of Technology: Dane Wittrup, Bruce Tidor, Forest White, Ernest Fraenkel, Alan Grodzinsky, John Deutch, Arup Chakraborty, Roger Kamm, Stephen Bell, Frank Solomon, Tommi Jaakkola, David Gifford, and Monty Krieger. I would like to thank my Boy Scout leaders Mason Thibeault and Tom Seibert. Sadly, they are gone too soon; but I am forever thankful for their mentorship, guidance, and teaching. They were two of the most important role models in my life, and this thesis document is also a tribute to their years of selfless support. I would like to thank my family, especially my mom and stepdad, Linda and Craig, and my dad and stepmom, Al and Tina. Their support, guidance, encouragement, selflessness, and love are without bound. I would also like to thank my grandparents, my siblings (Simon, Matt, Jon, and Heather), and my siblings-in-law (Becky, Sally, and Dustin). I would especially like to thank Heather and Dustin for all their support during college. And lastly, I would like to thank my lovely girlfriend Brittany for all her support during these sometimes tumultuous and tiresome months. You are incredibly special. Thank you all. This thesis is by and for each of you.

Dicebat Bernardus Carnotensis nos esse quasi nanos, gigantium humeris insidentes, ut possimus plura eis et remotiora videre, non utique proprii visus acumine, aut eminentia corporis, sed quia in altum subvenimur et extollimur magnitudine gigantea.
-John of Salisbury, Metalogicon (1159)

Contents

1 Introduction
  1.1 Mutual advancement of measurement and modeling techniques
  1.2 Multivariate modeling techniques
    1.2.1 Causal interpretations across network inference methods
    1.2.2 Bayesian networks
    1.2.3 Similarities across seemingly disparate modeling strategies
  1.3 Modeling phenotypic data
    1.3.1 Identification of drug targets
  1.4 Overview of thesis contents

2 Systems analysis of EGF receptor signaling dynamics with microwestern arrays
  2.1 Introduction
  2.2 Results
    2.2.1 Fabrication of MWAs
    2.2.2 Validation of MWA method
    2.2.3 Comparison of macrowestern blots and MWAs
    2.2.4 Application of MWAs to analysis of EGFR signaling network
    2.2.5 Comparison of signaling network at different EGF input levels
    2.2.6 Bayesian network modeling of receptor layer connectivity
  2.3 Discussion
  2.4 Methods
    2.4.1 Signaling network inference modeling
    2.4.2 Testing for model significance
    2.4.3 Comparing different algorithm results
    2.4.4 Equivalence class analysis for Bayesian network algorithm
    2.4.5 Parent constraint analysis for Bayesian network algorithm

3 Signaling network state predicts Twist-mediated effects on breast cell migration across diverse growth factor contexts
  3.1 Introduction
  3.2 Results
    3.2.1 Diverse cell motility behavior and growth factor treatment responses in epithelial versus mesenchymal mode
    3.2.2 Quantitative analysis of growth factor-elicited multiple-pathway signaling network dynamics
    3.2.3 Node-to-node correlation topology model reveals quantitatively different signaling relationships between epithelial and mesenchymal states
    3.2.4 PLSR model-reduction analysis reveals quantitatively different pathway emphases between epithelial and mesenchymal modes
    3.2.5 Linear regression predicts cell speed more accurately than PLSR models
  3.3 Discussion
    3.3.1 Excerpt from Discussion in Kim et al.
    3.3.2 Additional discussion
  3.4 Methods
    3.4.1 Correlation network modeling
    3.4.2 Reduced PLSR models

4 Receptor tyrosine kinases fall into distinct classes based on their inferred signaling networks
  4.1 Introduction
  4.2 Results
    4.2.1 A systematic perturbation-based approach to uncover RTK-specific signaling networks
    4.2.2 RNAi perturbations reveal conserved Akt, MAPK, and PKC pathways across six RTKs
    4.2.3 Data-driven network inference reveals three RTK classes
    4.2.4 Consensus across inference methods reveals RTK class-specific signaling
    4.2.5 RTKs and ligands are co-expressed in cancer cell lines and enriched in certain solid tumor types
    4.2.6 RTK network class genes are correlated with responses to RTK-targeted therapies
  4.3 Discussion
  4.4 Materials and methods
    4.4.1 Cell culture
    4.4.2 Microarray fabrication
    4.4.3 Microarray probing
    4.4.4 Extraction of microarray data
    4.4.5 Data pre-processing
    4.4.6 Quantifying the consistency of biological replicates and shRNA pairs
    4.4.7 Quantifying shRNA effects
    4.4.8 shRNA effects simulations
    4.4.9 Identifying signaling time scales
    4.4.10 Data discretization
    4.4.11 Network inference algorithms
    4.4.12 Comparison of RTKs by inferred network structures through dimensionality reduction
    4.4.13 Network model edge weight threshold robustness
    4.4.14 Generating receptor class-specific consensus networks across inference methods
    4.4.15 Clustering the raw data
    4.4.16 Generating synthetic data for network inference
    4.4.17 Cancer Cell Line Encyclopedia mRNA expression principal component analysis
    4.4.18 Tumor histology enrichment/depletion
    4.4.19 Correlating gene expression and drug activity area
    4.4.20 Partial correlation between genes and drug response
    4.4.21 Comparison of RTKs by receptor-intrinsic properties through dimensionality reduction

5 Quality versus quantity: Identifying features of biological data for making better models
  5.1 Introduction
  5.2 Results
    5.2.1 A simple two-variable toy model
    5.2.2 Analytical estimates for prediction accuracy as a function of data range in the two-variable toy model
    5.2.3 Simulating data from multivariate linear regression networks
    5.2.4 Inferring Bayesian networks using simulated network data
    5.2.5 Bayesian network inference accuracy is a function of data range and discretization level
    5.2.6 An a priori discretization strategy based on experimental measurement parameters
    5.2.7 Predicted discretization corresponds strongly with best-performing discretization
  5.3 Discussion

6 Conclusion
  6.1 Emergent biological and computational insights
  6.2 Guidelines for analysis of large data sets
  6.3 Limitations of methods
  6.4 Future work

List of Figures

1-1 Bayesian network joint probability distributions
1-2 Network models as combined input-output models
1-3 Summary of multivariate modeling approaches
2-1 Microwestern array schematic
2-2 MWA validation of linear response
2-3 Comparison of MWA to traditional western blot
2-4 An MWA containing 6 cell lysates probed with 192 antibodies
2-5 Heatmap of dynamic responses to EGF in A431 cells
2-6 Consensus model of EGF receptor level influences modeled by Bayesian network inference with comparison to ARACNe and CLR
2-7 Bayesian network consensus model edge weights
2-8 Graphical comparisons of the Bayesian, ARACNe, and CLR networks
2-9 Comparing inference algorithms when removing the restriction that the Bayesian network edge weight be >0.3
2-10 Testing for network models' significance
2-11 Estimating parent-child input-output logic within the Bayesian network
2-12 Parent constraint analysis for Bayesian network algorithm
3-1 EMT markers and receptor levels for the human mammary epithelial cell model
3-2 Mesenchymal cells in monolayer lack E-cadherin junctions
3-3 Individual cell speed distributions
3-4 EMT and growth factor-dependent cell migration is context-dependent
3-5 Migratory potentials of different epithelial-like versus mesenchymal-like cell types
3-6 Relative basal phosphorylation levels in epithelial vs. mesenchymal state
3-7 Altered signaling pathway activities upon Twist-induced EMT
3-8 Correlative topological modeling
3-9 Correlation network with stricter multiple hypothesis correction
3-10 Signaling data used for cell speed prediction
3-11 3-site reduced PLSR predictions
3-12 4-site reduced PLSR predictions
3-13 5-site reduced PLSR predictions
3-14 Site enrichment in reduced PLSR models
3-15 Correlations among signals used for cell speed prediction
3-16 Prediction accuracy using 1- and 2-site linear regression models
3-17 Signals in high-scoring linear regression models
3-18 Percent error using 1- and 2-site linear regression models
3-19 Signals plotted versus cell speed in a univariate fashion
3-20 Raw signal-signal correlation values in pre-Twist vs. post-Twist
4-1 Data-rich, perturbation-based profiling uncovers RTK-specific signaling networks
4-2 Perturbations reveal specificity in RTK-induced signal transduction
4-3 shRNA effects for individual shRNAs (1% Storey FDR)
4-4 shRNA effects for individual shRNAs (1% Benjamini FDR)
4-5 Quantitative shRNA-induced effects for individual shRNAs
4-6 shRNA effects are not consistent with randomly distributed effects
4-7 Clustering RTK-specific network models reveals three RTK classes
4-8 Network model clusters are robust to applied edge weight threshold
4-9 Identifying RTK class-specific edges through consensus network edge frequency
4-10 Network models' consensus reveals core RTK signaling backbone and RTK class-specific interactions
4-11 Clustering the raw data directly
4-12 Clustering network topologies inferred from simulated data reveals underlying network differences but clustering raw data does not
4-13 Observed distribution of gene expression values in the CCLE
4-14 RTK and ligand expression in CCLE cell lines
4-15 Co-expression of the receptors and ligands for multiple RMA thresholds
4-16 Cell line histology enrichment results for multiple RMA thresholds
4-17 RTK class genes are correlated with anti-RTK therapy response
4-18 Gene expression values of tightest TKI258 kinase binders
4-19 Partial correlation between RTK genes and drug response
4-20 Clustering RTK biophysical properties does not reveal RTK network model clusters
5-1 Prediction accuracy in a two-variable system
5-2 Schematic for two-variable toy model
5-3 Network structures used to simulate data
5-4 Bayesian network inference accuracy is a function of data range and discretization level
5-5 Schematic for a priori discretization algorithm

List of Tables

3.1 Literature evidence for EMT network models
5.1 Number of parameters in local conditional probability table

Chapter 1

Introduction

This thesis has focused on the multivariate analysis of biological networks, with an emphasis on phosphorylation signaling networks activated downstream of receptor tyrosine kinases [1].
The interaction between receptor tyrosine kinases and extracellular cognate ligand initiates a signaling cascade dependent in part on the phosphorylation of specific amino acid residues on particular proteins. Phosphorylation is a reversible post-translational modification that in essence can change the properties of a protein so that it may perform or take part in particular functions that it could not do prior to being phosphorylated [2]. The phosphorylation of the amino acid tyrosine in particular has evolved as a primary means used by multicellular organisms for transmitting information between cells [3]. This communication is used to regulate key physiological processes critical to homeostasis in multicellular organisms, including cell proliferation, migration, differentiation, cell cycle progression, metabolic homeostasis, and transcriptional activation, among others [4]. As a result, studying phosphorylation networks is highly relevant to cancer, a disease in which many of these homeostatic mechanisms dependent on tyrosine phosphorylation signaling have gone awry [5].

The motivation for this thesis was to exploit advances in experimental measurement technologies for the purpose of building computational models of signaling network behavior in relevant cancer settings. The hope was to be able to predict, among the measured signals, how to perturb the system in a manner that would influence cell phenotype. The long-term goal of this approach would be to identify novel drug targets, or combinations of targets, that would be effective in treating cancer.

1.1 Mutual advancement of measurement and modeling techniques

The complexity of a biological model is limited by the complexity of the biological data used to build it.
As a result, as experimental measurement technologies have advanced (allowing the quantification of more signals under more conditions while using fewer reagents, requiring less time, and costing less money), the computational models derived from these data have generally seen a concurrent increase in their complexity [6, 7]. Early computational models of cell behavior, such as the mathematical model of cell migration published by DiMilla et al. [8] in 1991, did not incorporate measurements of intracellular signaling. This was in part because it was not until later in the 1990s that methods for measuring many signaling events, primarily by using site- and phosphorylation state-specific antibodies, were even available [9]. Even as such antibodies were developed, they were only available for a limited number of proteins and were generally measured using low-throughput Western blots. Therefore models developed around that time utilized a limited number of experimental measurements, such as the differential equation model published in 2002 by Schoeberl et al. [10] in which model predictions were only compared to experimentally measured Erk phosphorylation; or the differential equation model published in 2005 by Hua et al. [11] that was tuned using experimentally measured values of only two proteins, caspase-3 and caspase-8. As new experimental methods were developed for measuring more signaling activities per sample, it became feasible to build new types of models. For example, using a method published in 2003 for measuring the activities of multiple protein kinases in a given sample [12], in combination with a method also published in 2003 for using antibodies in concert with microarray technology [13], it was possible to build a multivariate regression model published in 2005 incorporating all 11 measured proteins that was predictive of cell death via apoptosis [14].
Around the same time, a method was developed for measuring the phosphorylation states of multiple proteins in single cells using flow cytometry [15], which enabled the construction of a causal network model of cell signaling [16] also published in 2005 and also containing measurements for 11 proteins. Moving closer in time to the beginning of this thesis in 2007-08, additional experimental methods were being developed that allowed one to measure an increasing number of phosphorylation sites in an increasing number of conditions. For example, mass spectrometry methods published in 2005 allowed one to measure scores of tyrosine phosphorylation events in a dynamic, site-specific manner [17], although typically only across fewer than eight conditions. Further improvements to protein microarray technology published in 2006 allowed one to measure about fifteen phosphorylation sites, but in hundreds of conditions [18]. As these methods began to measure more phosphorylation sites, how those sites related to one another in biological pathways became less clear. Unlike earlier modeling efforts that typically measured a handful of proteins from canonical pathways that had been studied for years, these newer data sets included many sites about which little was known.

1.2 Multivariate modeling techniques

One key aspect that led to the successful application of differential equation models to cell signaling problems is that it was reasonably understood how the relatively few measured proteins in the model related to one another through biochemical reactions. Through years, or in some cases decades, of research the pathways governing these canonical sites had been delineated through careful experimental study. In other words, the topology of the signaling network, or how the signals relate to and influence one another, was reasonably well established. However, when using experimental methods that measured many phosphorylation sites at once, including many sites
However, when using experimental methods that measured many phosphorylation sites at once, including many sites with poorly understood roles in signaling, the signaling network topology governing the relationships between these measured signals was also poorly understood. A type of modeling called "network inference" offered an appealing prospect for understanding the topology governing these measured phosphorylation sites [19]. Network inference aims to quantify the influences between measured signals, and by doing so estimate the topology governing the signaling network. Depending on the algorithm, network inference methods can capture linear, nonlinear but monotonic, and/or nonlinear and non-monotonic relationships between measured nodes. These methods can utilize continuous data, or can discretize the continuous data into bins. Methods utilizing continuous data typically rely on a standardized functional form to quantify signal relationships (e.g., linear, sigmoidal [20, 21], Gaussian [22]), whereas methods utilizing discrete data do so in an effort to capture any functional relationship between signals, without making assumptions about the underlying functional form.

1.2.1 Causal interpretations across network inference methods

The interpretation of a network inference result can vary.
There exist methods that attempt to specify causality between signals (e.g., Bayesian networks [22]), methods that largely rely upon an existing literature-derived causal network topology to perform computations (Boolean logic [23], fuzzy logic [21]), methods that attempt to quantify some degree of conditional independence between nodes but do not argue for causal interpretations (e.g., conditional mutual information [24], partial correlation [25], a.k.a. Gaussian graphical models [26]), methods that attempt to separate direct from indirect signaling influences in order to identify relationships that are more likely to be causal (e.g., context likelihood of relatedness (CLR) [27], the algorithm for the accurate reconstruction of cellular networks (ARACNe) [28]), methods that select subsets of the most predictive signals in order to identify relationships that are more likely to be causal (Inferelator [20]), and methods that infer symmetric (i.e., undirected and not causal) relationships between signals without any additional processing steps (e.g., Pearson correlation, Spearman correlation, mutual information). Most network inference methods quantify only pairwise relationships between signals (CLR, ARACNe, Pearson correlation, Spearman correlation, mutual information). While a given signal may be related to multiple other nodes in a pairwise model, there is no explicit relationship among the signals to which it is related (e.g., if signal A is correlated with signals B and C, there is no explicit accounting for the mutual regulation of A by B and C). Fewer methods explicitly consider cases where a signal is a function of several inputs simultaneously.
In logic models, AND and OR gates can be used to encode higher-order relationships; in models that explicitly consider feature selection, a signal can be a function of the multiple selected features; in partial correlation and conditional mutual information, the order of the encoded relationships depends on how many signals are considered in the conditional relationships (e.g., Nth-order partial correlation conditions upon N signals at a time, and thus can encode the mutual regulation of a child node by N parents); and in Bayesian networks, similar to partial correlation, the mutual regulation of a child node by N parents is encoded by conditioning upon N signals at a time. Most network inference research applied to cell biological data, including the earliest work in the field, has been applied to gene expression data. In such work, the authors would often make an assumption of causality between genes that were known transcription factors and genes that were not, i.e., that the transcription factors were influencing the expression of the non-transcription factor genes, and not the other way around [20, 27, 28]. This provided a broad assumption for determining causality between genes, and thus limited the need to develop methods for inferring causality from the data. In the context of cell signaling, however, apart from making assumptions about signaling originating at a stimulated cell surface receptor and influencing downstream signals, there is not a comparable assumption that could broadly provide causal interpretation of signaling network inference results.

1.2.2 Bayesian networks

As a result, this thesis originally focused on developing methods to infer Bayesian network models from cell signaling data. Bayesian networks belong to a larger class of graphical modeling techniques. Bayesian networks are represented by directed acyclic graphs, in which conditional independence relationships are encoded [29, 30].
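The notion of conditional independence mentioned here, and the first-order partial correlation described in Section 1.2.1, can be illustrated with a minimal sketch. The example assumes a hypothetical three-signal cascade A → B → C with Gaussian noise; the function name `partial_corr`, the noise levels, and the sample size are illustrative choices and are not taken from the thesis.

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, conditioning on z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical cascade: A drives B, and B drives C (no direct A -> C link).
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)

# Pairwise (zeroth-order) correlation sees a strong A-C relationship ...
r_ac = np.corrcoef(a, c)[0, 1]
# ... whereas conditioning on B removes it, because A and C are
# conditionally independent given B in this toy cascade.
r_ac_given_b = partial_corr(a, c, b)

print(f"corr(A, C)     = {r_ac:.3f}")
print(f"corr(A, C | B) = {r_ac_given_b:.3f}")
```

In this toy setting the pairwise correlation between A and C is large while the partial correlation is close to zero, which is the sense in which conditioning can separate direct from indirect influences.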
A Bayesian network is composed of two components: (1) the topology (or structure) of the directed acyclic graph, which encodes the conditional independence relationships (and thus encodes the causal interpretations of the network), and (2) the parameters that describe the local conditional probability tables for each set of parent-child input-output relationships, i.e., the likelihood a child node is in a particular state given the state(s) of its parent(s), or more simply, the functional relationship between parent and child nodes. A Bayesian network is a representation of the joint probability distribution across all nodes (i.e., signals or variables) in the network [31]. A fully connected Bayesian network can represent any joint probability distribution over those nodes; however, the utility of the Bayesian network is to produce a network that is not fully connected and can still faithfully represent the joint probability distribution across all nodes. This is their key feature: by producing a network that is not fully connected, the Bayesian network is attempting to identify the most salient conditional independencies in the data (Fig. 1-1). Bayesian networks can encode higher order parent-child relationships by conditioning the value of a given child node simultaneously on multiple parent nodes. Historically, the structure and parameters of Bayesian networks were originally determined by knowledge experts [32]. For example, one may have interviewed physicians to determine which symptoms were most predictive of a given disease, and the likelihoods associated with those symptoms. The resultant Bayesian network would then be used to make systematic predictions, or inferences, about the likelihood of a disease state given observations about a patient's symptoms. 
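As a minimal sketch of the expert-specified case just described, the following builds a tiny discrete Bayesian network in which a disease node is the parent of two symptom nodes. All node names and probabilities are hypothetical values chosen for illustration. The joint distribution is written as the product of local conditional probability tables, and a posterior is obtained by enumeration.

```python
from itertools import product

# Hypothetical structure: Disease -> Fever, Disease -> Cough.
# All probabilities below are illustrative assumptions.
p_disease = {1: 0.01, 0: 0.99}          # P(D = d)
p_fever_given_d = {1: 0.9, 0: 0.1}      # P(F = 1 | D = d)
p_cough_given_d = {1: 0.8, 0: 0.2}      # P(C = 1 | D = d)

def bernoulli(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(d, f, c):
    """P(D, F, C) = P(D) P(F | D) P(C | D), the factorization implied by the graph."""
    return (p_disease[d]
            * bernoulli(p_fever_given_d[d], f)
            * bernoulli(p_cough_given_d[d], c))

# The factorized joint is a valid distribution: it sums to one.
total = sum(joint(d, f, c) for d, f, c in product((0, 1), repeat=3))

# Inference by enumeration: P(D = 1 | F = 1, C = 1).
posterior = joint(1, 1, 1) / (joint(1, 1, 1) + joint(0, 1, 1))

print(f"sum of joint = {total:.6f}")
print(f"P(disease | fever, cough) = {posterior:.4f}")
```

The factorized network here needs 5 free parameters (1 + 2 + 2), compared with 7 for an unconstrained joint distribution over three binary variables; the saving grows quickly as more conditionally independent nodes are added.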
One could also take an existing Bayesian network topology, and data from the nodes in that network, and learn the corresponding parameters (i.e., conditional probability tables) associated with that network structure and those data. It was not until later in the development of Bayesian networks that algorithms were created for learning their structure and parameters simultaneously from data [33]. Applications in this thesis have focused on this latter method, whereby given a data set containing measurement values for a set of nodes, a Bayesian network structure and parameters are subsequently learned directly from the data set.

Figure 1-1: The joint probability distributions for a fully connected Bayesian network (left) and a network not fully connected (right). The objective of Bayesian network inference is to identify conditional independencies in the data, and by doing so create a simplified representation of the joint probability distribution across all measured signals.

Fully connected (left): $P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2)\,P(X_4 \mid X_1, X_2, X_3)$

Not fully connected (right): $P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3)\,P(X_4 \mid X_3, X_2)$

1.2.3 Similarities across seemingly disparate modeling strategies

One key issue generally not addressed in the literature is the similarity of what at first appear to be quite different modeling approaches. Over the course of this thesis, because so many different modeling approaches were explored, even beyond the so-called network inference approaches, the common features of these methods have become apparent. First, let me draw a distinction between "input-output-like" models and "network-like" models. An example of the former case is partial least squares regression (PLSR) (e.g., ref. [14]), whereas an example of the latter case is a Bayesian network model (e.g., ref. [16]). For PLSR, the general notion is to utilize all measurements simultaneously for predicting a small number of outputs, which have often corresponded to markers of phenotypic outcomes.
For a Bayesian network, as just described, the goal is to derive a graphical model displaying relationships between measured signals. In PLSR, the relationships between the measured signals (those used as inputs to predict the output) are not explicitly considered, whereas in the Bayesian network, these signal-to-signal relationships are the primary result of the model. These methods actually represent a shared algorithmic approach, but applied in different ways. In the case of PLSR, the primary result is a notion of input-output, whereby all signals are considered as predictors of the output(s). In the case of the Bayesian network, this type of process, whereby signals are considered as possible predictors of an output, is also occurring, but on a node-by-node basis. If one performs an input-output-type calculation iteratively for each measured signal (treating the remaining measured signals as candidate inputs, but then only selecting the most predictive inputs for each signal), the end result is a network-like structure in which only a few signals are used as predictors for every other signal (Fig. 1-2). Therefore, network-type methods can be conceptualized as applying a PLSR-type approach (i.e., many inputs, but one output) consecutively across every node, but only selecting a subset of the measured nodes as the most predictive inputs. While the methods for quantifying the relationships between signals of course vary between PLSR-type models and Bayesian network-type models, this concept nonetheless links these methods. Network-like methods also consider all signals as putative input signals for each node, but then undergo some feature selection-type procedure to explicitly select only a subset of those putative inputs as the final inputs for each node. A diagram outlining input-output-like models versus network-like models, and whether the models use discrete or continuous data, is shown in Fig. 1-3 for multiple modeling methods.
These insights are important to keep in mind when considering modeling methods that have been published in the past and discussed as if they were entirely different approaches. In the most basic sense, the modeling methods referenced here simply attempt to predict the values of some signals in terms of the values of other signals; but how they go about that process varies.

Figure 1-2: [Schematic: starting from many signals as predictors ("input-output-like"), for each node (1) score the other signals' utility, (2) eliminate unhelpful signals (feature selection: which subset of signals matter?), and (3) the result is a "network" of input-output relationships with few signals as predictors ("network-like").] Network models can be conceptualized as applying input-output approaches to each signal in turn, and then applying some manner of feature selection to choose which inputs are most useful. Thus, rather than being seen as disparate methods, input-output models and network models have a shared underlying modeling framework, but it is applied in two different ways.

Figure 1-3: [Diagram: one axis runs from many signals as predictors ("input-output-like") to few signals as predictors ("network-like"); the other runs from discrete input to continuous input. Methods marked with solid bullets: partial least squares regression (PLSR); multiple linear regression (MLR); Bayesian predictors; PLSR and MLR with feature selection; Bayesian networks; Pearson and Spearman correlation networks; mutual information networks (including CLR, ARACNe); partial correlation. Methods marked with open bullets: conditional mutual information; Gaussian Bayesian networks; dynamic Bayesian networks; constrained fuzzy logic; Boolean and probabilistic Boolean networks.] Different types of modeling approaches are summarized onto two axes: input-output-type models vs. network-type models, and methods using discrete binned data vs. continuous data as input. Methods noted by solid bullet points were explored in this thesis.
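The node-by-node view of network inference described in Section 1.2.3 can be sketched in a few lines. This is a simplified illustration rather than any of the methods used in the thesis: absolute Pearson correlation stands in for the scoring step (instead of PLSR or a Bayesian score), and the function name `infer_network`, the `max_parents` and `min_abs_corr` parameters, and the simulated data are all assumptions.

```python
import numpy as np

def infer_network(data, max_parents=2, min_abs_corr=0.3):
    """For each signal, score all other signals and keep only the best predictors.

    data: array of shape (n_samples, n_signals).
    Returns a dict mapping each node index to its selected predictor indices.
    """
    n_signals = data.shape[1]
    corr = np.corrcoef(data, rowvar=False)
    network = {}
    for node in range(n_signals):
        # Step 1: score every other signal's utility for predicting this node.
        scores = np.abs(corr[node]).copy()
        scores[node] = -np.inf  # a node is never its own predictor
        # Step 2: eliminate unhelpful signals (simple feature selection).
        ranked = np.argsort(scores)[::-1]
        network[node] = [int(j) for j in ranked[:max_parents]
                         if scores[j] >= min_abs_corr]
    # Step 3: the collected input-output relationships form the "network".
    return network

# Hypothetical data: signals 0 and 1 are coupled, as are signals 2 and 3.
rng = np.random.default_rng(1)
n = 2000
s0 = rng.normal(size=n)
s1 = s0 + 0.3 * rng.normal(size=n)
s2 = rng.normal(size=n)
s3 = s2 + 0.3 * rng.normal(size=n)
data = np.column_stack([s0, s1, s2, s3])

network = infer_network(data, max_parents=1)
print(network)
```

Because correlation is symmetric, the selected relationships in this sketch are undirected; replacing the scoring step with a regression or Bayesian score changes how relationships are quantified but not the overall score-then-select structure.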
1.3 Modeling phenotypic data

A fundamental question in cell biological modeling is how to incorporate phenotypic data; and in the context of this thesis, how to relate signaling data to phenotypic data. The most important factor in the approach is the frequency of the phenotypic measurements: are there corresponding phenotypic data for every signaling measurement, or were the phenotypic data collected at a different rate than the signaling data? In the former case, one can in principle include a "phenotype node" in any inferred network model, because for every signaling data point there is a phenotypic data point collected under the same condition. In the latter case, one likely has to somehow summarize the signaling data corresponding to each condition in which phenotype was measured. For example, if signaling time courses were collected, but only one phenotypic data point was measured per time course, then one may summarize the signaling data for the entire time course by calculating the area under the signaling time course trajectory, or calculating the average signaling value across the time course, etc. If signaling and phenotypic data are collected at the same rate, then one could summarize both the signal-signal relationships and the signal-phenotype relationships in the same network model. However, if the signaling and phenotypic data were not collected at the same rate, then it would suggest that one could have two models: one model in which signal-signal relationships are quantified, and one model in which the "summarized signal"-phenotype relationships are quantified. Of course, trivially, results from two modeling approaches could be combined visually into a single network-type diagram, but this would not mean the same modeling technique was applied to all signaling and phenotypic data simultaneously.
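The time-course summaries mentioned above (area under the trajectory, or an average across the time course) can be sketched as follows. The time points and signal values are hypothetical, and the function name `summarize_time_course` is an illustrative choice; the area is computed with the trapezoid rule written out explicitly.

```python
import numpy as np

def summarize_time_course(times, values):
    """Collapse one signaling time course into scalar summaries."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    # Area under the trajectory by the trapezoid rule.
    auc = float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(times)))
    return {
        "auc": auc,
        # Time-weighted average accounts for unevenly spaced time points.
        "time_average": auc / (times[-1] - times[0]),
        # Plain mean of the sampled values, ignoring spacing.
        "mean_of_samples": float(values.mean()),
    }

# Hypothetical phosphorylation time course (minutes, arbitrary units).
times = [0, 5, 15, 30, 60]
signal = [0.0, 1.0, 0.8, 0.4, 0.2]

summary = summarize_time_course(times, signal)
print(summary)
```

Each condition then contributes one summarized value per signal, which can be paired with the single phenotypic measurement collected for that condition.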
In some cases, one may wish to develop a model wherein the root nodes represent a small number of experimentally measured parameters (e.g., ligand levels or receptor activation levels), the intermediate nodes represent signaling nodes, and the terminal node(s) represents a phenotype(s) of interest. This type of approach is quite appealing conceptually. In this case, the values of the root nodes would be used to "forward-simulate" the network and propagate predictions from the root nodes to the phenotype nodes, and thus predict phenotypic outputs from a small number of experimentally measured inputs. This scenario, in which an entire network of signals is predicted using only the values of a small set of inputs, was explored briefly in this thesis, and has important implications for modeling phenotype. Forward-simulation-type models raise additional concerns beyond those raised by network-type or input-output-type models. If a forward-simulation model is built using linear interaction terms based on N measured inputs, then any subsequent downstream signal, including any terminal phenotype node, will always be predicted by linear combinations of those N inputs. This is because the composition (superposition) of linear functions results in a linear function; for example, if y = ax and z = by, then z = (ab)x, which is still linear in x. However, if a forward-simulation model is built using a nonlinear interaction function f(x), subsequent downstream nodes and the phenotype may be predicted by novel nonlinear combinations of the inputs, even beyond those that would be obtained by directly considering the relationship f(x) between the input nodes and phenotype node. This is because, in contrast to linear functions, the composition of nonlinear functions can create novel nonlinear functional relationships. Therefore, if one seeks to use this type of forward-simulation approach to model phenotype, the interaction terms in the model should be nonlinear in some manner.
If they are not, then the phenotype output can be best predicted by simply regressing against the N inputs. Lastly, there may be no need to build a model if its predictions can be easily measured experimentally anyway. If one is predicting a complex phenotype like cell migration speed, then it is reasonable to identify a small set of signaling measurements that are predictive of cell speed and that could subsequently be measured in lieu of cell speed. However, if one is measuring a proxy for a complex phenotype (e.g., cleaved PARP as a proxy for apoptosis), and one identifies a small set of signals predictive for cleaved PARP, this may not be worthwhile if measuring the signals required to predict cleaved PARP is as time-consuming or otherwise costly as measuring cleaved PARP itself. While the latter case may aid understanding about signals driving PARP cleavage, it may not be useful if one hopes to develop more easily measured proxies or biomarkers for complex cell phenotypes. In other words, if your model input is more complicated than your model output, it may not be a useful model in terms of experimental effort.

1.3.1 Identification of drug targets

A primary motivation for this thesis was the idea that, given a network model derived from high-throughput cell signaling experimental data, ideally collected in concert with relevant phenotypic data, we could develop methods for the de novo prediction of useful drug targets. In the context of differential equation models of cell signaling, sensitivity analysis had been used to identify putative drug targets [34]. However, the key to identifying drug targets for cancer is having some notion of what signaling features are important to the cell phenotype. To do this, an experimental data set should include phenotypic data directly or a proxy for phenotype that can be compared to other measured signals.
If such data are available, one could pursue the methods outlined in the previous section for modeling phenotype, and then seek to perturb nodes highly predictive of phenotype to determine if they actually affect phenotype. However, in many experimental data sets used in this thesis, often neither phenotypic data nor a useful proxy was available. As such, one simply has a network model of signal-signal relationships from which to predict useful drug targets. This is a task for which, to my knowledge, no great solution yet exists. One can search the literature for notions about how certain signals may be related to a phenotype of interest; but if that is the case, then one really has a proxy for phenotype and can thus proceed as outlined previously. In the case where one has only signaling data but no clear phenotype proxies, one could consider graph theoretical notions of robustness to hypothesize useful drug targets (e.g., ref. [35]). Further, one could consider the "druggability" [36] of particular measured signals to reduce the set of putative drug targets to only those that could likely be pursued clinically. In spite of these challenges, a novel application of network models to drug target discovery, which is not explicitly dependent on identifying nodes in a network model that may be useful drug targets, is described in Chapter 4 of this thesis.

1.4 Overview of thesis contents

By and large, this thesis has focused on three computational approaches: inferring so-called network models, predicting the values of an output signal (in terms of continuous, experimentally measured units) given one or more input signals, and quantifying the enrichment or depletion of features in data subsets.
These methods were applied to cell biological data from a variety of experimental systems using a variety of measurement technologies, but in all cases involved the investigation of signaling downstream of activated receptor tyrosine kinases in vitro using cancer cell lines or engineered cell lines relevant to cancer. Chapter 2 studies signaling downstream of epidermal growth factor receptor (EGFR) using microwestern arrays, a technology developed by Professor Richard Jones at the University of Chicago. The modeling portion of the paper studied signaling relationships among 15 phosphorylation sites on 10 receptor tyrosine kinases plus two sites on Src kinase. Using Bayesian networks and two mutual information-based network inference algorithms, network models were developed that hypothesized receptor-level crosstalk downstream of EGFR activation. Further computational analyses provided insights into how inference quality varies as a function of data set size, how prior knowledge can be used to restrict directionality in the Bayesian network, how Bayesian network complexity varies as a function of the maximum number of parent nodes allowed per child node, and how the identified discrete parent-child relationships translate to the continuous data space. Chapter 3 studies signaling downstream of five different receptor tyrosine kinases in the context of the epithelial-mesenchymal transition using a bead-based immunosandwich assay. Cell migration speed data were collected as well, although not at the same frequency as the signaling data. Pearson correlation was used to derive network models specific to the epithelial versus mesenchymal states. Feature selection was applied to PLSR models to identify reduced sets of signals that predicted cell speed more accurately than the full 11-signal PLSR models. The enrichment and depletion of particular sites in the high-scoring "reduced" PLSR models were quantified.
In addition, linear regression models were built, using only one or two sites as predictors, which also had better prediction accuracy than the 11-site PLSR models. Importantly, the signals identified in the reduced PLSR and linear regression models could be linked to known differences in epithelial versus mesenchymal cell migration. These predictions were tested experimentally in the mesenchymal case and successfully validated. Chapter 4 studies signaling downstream of six different receptor tyrosine kinases using lysate microarrays. Receptor-specific network models were developed using five different network inference methods. The consensus across the methods revealed signaling network features that grouped the receptors into three classes. Using publicly available genomic and pharmacological profiling data, it was discovered that increased expression of same-class receptors or ligands correlated with insensitivity to drugs targeting other receptors in that class. The enrichment of one receptor class across cell lines derived from tumors with different histologies was quantified, suggesting clinical relevance of the receptor class. These results suggest that inferred network structure itself can serve as a multivariate classifier of the biological condition(s) from which the network was derived. In this manner, the inferred network structures provided a means for predicting useful drug targets: receptors with similar inferred network structures may compensate for one another following targeted inhibition of a same-class receptor. Chapter 5 describes a theoretical project studying the features of experimental data that improve model accuracy. First using a simple two-variable toy model, we derive numerical and analytical estimates of linear regression model accuracy as a function of data quantity and features related to data quality.
Next, using data simulated from more realistic 15-node synthetic networks, we show that Bayesian network inference accuracy can also be cast in terms of data quantity and features related to data quality. In particular, we describe how increasing the range over which the data are sampled can improve model accuracy, but only if the continuous data are discretized in a manner that is consistent with the scale of heritable biological variation in the network. We describe an a priori discretization scheme, dependent only on experimental parameters related to the biological and technical variation in the data, that corresponds well with the best-performing discretization schemes in the simulated data. These results provide, to our knowledge for the first time, a discretization algorithm designed specifically to improve causal inference that is also described in terms familiar to experimental biologists.

Chapter 2

Systems analysis of EGF receptor signaling dynamics with microwestern arrays

Note: This chapter is based on a previously published paper, Ciaccio et al. (2010) [37]. The author contributions for that paper are as follows: C.P.C., M.F.C., and R.B.J. designed the experiments. C.P.C. and M.F.C. performed the cell culture and growth factor stimulations. M.F.C. and R.B.J. designed the microwestern array method. M.F.C. carried out microwestern experiments and organized the data into heat maps. J.P.W. and D.A.L. performed Bayesian network, CLR, and ARACNe analysis of the data. M.F.C., J.P.W., D.A.L., and R.B.J. wrote the original manuscript.

2.1 Introduction

Systems-level understanding of protein functions in biological processes remains a challenge.
The western blot [38] is a powerful protein analysis method because the electrophoretic separation step allows for reduction in sample complexity, and the antibody detection step then results in signal amplitude proportional to the abundance of the immobilized antigen at a physical location on the detection membrane that can be related to molecular size standards. Because western blots require a relatively large amount of sample and a great deal of human labor, they have been of limited utility in large-scale protein studies. Reverse-phase lysate arrays (RPAs), performed by arraying lysates directly on nitrocellulose-coated slides and probing them with antibodies, are useful for quantifying large numbers of proteins from limited amounts of material such as in biomarker discovery [39, 40]. In contrast to western blots, however, RPAs lack confirmatory data for signal veracity; in a side-by-side comparison of measurements from RPAs and western blots, only 4 of 34 phospho-specific antibodies examined had generated equivalent information [18]. The authors of the study had concluded that antibody cross-reactivity contributed substantial noise to RPAs, confounding true protein measurements. Many antibodies have been validated for use with the Luminex xMAP bead-sorting system, but this approach requires ~1,000-fold more cell material per protein analysis than RPAs, and the cost of detection reagents per protein is ~30-fold greater. Flow cytometry permits a (relatively small) cohort of proteins to be examined simultaneously in individual cells; this multiplexing feature has been exploited with Bayesian network modeling to predict new signaling network causalities [16]. In contrast to antibody-based methods, mass spectrometry can be used to identify new proteins. Using mass spectrometry, thousands of peptides have been assessed in lung cancers to identify commonly activated receptor tyrosine kinases and downstream signaling pathways [41].
Relative abundances can be examined quantitatively using isotopic labels across time points, cell types or perturbations as in examination of phosphorylation dynamics of HeLa [42] and mammary epithelial cells [43] after epidermal growth factor (EGF) or heregulin treatment. However, the large sample amount required by mass spectrometry can limit the number of conditions that can be analyzed; -10' cells are typically required for a mass spectrometry experiment [41] versus ~10^5 cells for an immunoblot or ~10^3 cells for RPAs [44]. Here we describe microwestern arrays (MWA), which combine the scalability of RPAs and retain vital attributes of western blots for highly multiplexed proteomic measurements: reduction of sample complexity and signals that can be related to protein size standards. In combination with suitable pan- and modification-specific antibodies, dynamics of protein abundance and modification may be simultaneously monitored across many samples. We demonstrate that MWA in combination with computational modeling techniques can yield useful systems-level biological insights for EGF receptor (EGFR) signaling dynamics.

Figure 2-1: Microwestern array (MWA) method. Schematic of the procedure. [Steps shown: treat cells with EGF (0, 1, 5, 15, 30, 60 min); lyse cells; array lysates and ladder onto gel; semidry electrophoresis; transfer to nitrocellulose; probe with 96 antibodies.]

2.2 Results

2.2.1 Fabrication of MWAs

Our strategy (Fig. 2-1) allows us to compare protein abundances and differences in post-translational modifications for cells stimulated under different conditions. To interface the microscopic western blots with microtiter-based liquid handling methods, we printed cell lysates via a noncontact microarrayer on gels in 96 identical blocks with dimensions of a 96-well plate [45]. Using these dimensions, 6 different lysates may be examined with 96 different antibodies or 24 different lysates may be examined with 24 different antibodies.
To increase the migration rate of large proteins and slow the rate of smaller ones, we used an acetate running buffer, obviating the need for a stacking gel. For each spot, 6 nL of sample was arrayed over the same gel position ten times, allowing for greater spotting density and signal than microdepositing the entire 60 nL in a single dispense. We arrayed one spot of size standard and six spots of experimental sample at 1 mm pitch at the top edge of each block. After printing, we subjected the samples to semidry electrophoresis for 12 min and then transferred them to a nitrocellulose membrane. We placed the membrane in a 96-well gasket (Arrayit) to isolate each set of 6 separated lysates and then incubated each block with a different antibody. After incubation with dye-labeled secondary antibody, we scanned the blot using an infrared fluorescence scanner. This format allows interrogation of 192 antibodies in 6 samples when two antibodies from different hosts (for example, rabbit and mouse) are used. A total of 1,152 antibody-sample readouts is therefore possible per MWA device. Each spot measurement required ~1,000 cells (equivalent to 250 ng of protein) and 16 ng of detection antibody, thus enabling analysis of ~4,000 protein abundances from the ~1 mg of A431 cell lysate collected.

2.2.2 Validation of MWA method

We compared the resolution and linearity in signal of MWAs with macroscopic gels using the Odyssey labeled protein molecular weight standard (LI-COR) (Fig. 2-2a). For proteins of 150, 50, and 25 kDa, the intensity of each ladder spot was proportional to the fold dilution over two orders of magnitude for both methods (Fig. 2-2b,c). The coefficient of variation from arraying, rehydration and transfer of a single band of the LI-COR ladder across the area of the membrane was <9%. We then tested the linearity of signal response in quantifying proteins from A431 human carcinoma cell lysates using a two-stage fluorescence immunodetection system (Fig. 2-2d,e).
We used five phospho- and two pan-specific antibodies to analyze 15-175 kDa proteins in EGF-stimulated A431 cell lysates. All MWAs showed a linear relationship between relative antigen concentration and signal intensity over their detectable range (100- to 1,000-fold). Assuming an expression level of 1.2 × 10^6 receptors per A431 cell [46], EGFR was detectable down to one cell equivalent (~2 attomoles; ~340 femtograms).

Figure 2-2: MWA validation of linear response. (a) Traditional 10% SDS-PAGE of 5 µL aliquots (left) and MWA of 60 nL (right) of twofold serial dilutions of the Odyssey protein ladder. (b,c) Median net signal intensities quantified for the indicated bands of the Odyssey protein ladder in the traditional western blot (b) and MWA (c) in a. (d) MWA analysis of twofold serial dilutions of lysates from A431 cells stimulated for 5 min with 200 ng/mL EGF and probed with seven rabbit primary antibodies directed to indicated proteins and detected with goat anti-rabbit Alexa Fluor 680-labeled secondary antibody. Arrows to the left of gels point to the spot that was quantified. Orange circles depicted to the left of gels indicate positions of protein molecular weight standards. Numbers to the left of arrows indicate known sizes of the proteins in the Odyssey protein standard adjacent to the quantified spots. (e) Median net signal intensity of each band versus relative concentration from the gels in d.
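The throughput and detection-limit figures quoted in Sections 2.2.1 and 2.2.2 can be checked with simple arithmetic. The sketch below uses only numbers stated in the text, plus two assumptions that are not from the source: Avogadro's constant, and an EGFR molecular mass of about 170 kDa (the 175 kDa apparent size shown on the blots would give roughly 349 femtograms instead).

```python
AVOGADRO = 6.02214076e23      # molecules per mole

# Detection limit: one A431 cell equivalent of EGFR.
receptors_per_cell = 1.2e6    # from the text
egfr_mass_da = 170_000        # assumed EGFR mass in g/mol (not from the source)

moles = receptors_per_cell / AVOGADRO
attomoles = moles * 1e18
femtograms = moles * egfr_mass_da * 1e15

# Throughput: 96 blocks, 6 lysates per block, 2 antibody hosts per block.
blocks = 96
lysates_per_block = 6
antibodies_per_block = 2
antibodies_per_device = blocks * antibodies_per_block
readouts_per_device = antibodies_per_device * lysates_per_block

# Sample budget: ~1 mg of lysate at 250 ng of protein per spot.
lysate_ng = 1e6
protein_per_spot_ng = 250
spots_from_lysate = lysate_ng / protein_per_spot_ng

print(f"{attomoles:.2f} amol, {femtograms:.0f} fg per cell equivalent")
print(f"{antibodies_per_device} antibodies, {readouts_per_device} readouts per device")
print(f"{spots_from_lysate:.0f} spot measurements from 1 mg of lysate")
```

Under these assumptions the arithmetic reproduces the quoted values of about 2 attomoles and about 340 femtograms per cell equivalent, 192 antibodies and 1,152 readouts per device, and about 4,000 measurements per milligram of lysate.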
We assumed linearity for all further analyses.

2.2.3 Comparison of macrowestern blots and MWAs

To compare performance of MWAs with macrowestern blots for monitoring phosphorylation dynamics, we selected a representative test set of 11 antibodies. Four had been previously shown to generate equivalent quantitative data by RPAs and western blots [18]; another four had been shown to result in substantial compression of dynamic range by RPAs owing to antibody cross-reactivity [18]. Measurements we obtained by MWAs were similar to those obtained by macrowestern blots for all antibodies (Fig. 2-3) and did not display the dynamic range compression observed for RPAs. For many protein phosphosites, including EGFR, IRS1 and AKT, we observed bands at the predicted size as well as at additional sizes that could obscure quantitative measurements by RPAs. The precision in estimating sizes of proteins >100 kDa by MWAs was ±10 kDa, and for smaller proteins ±5 kDa. Although we could determine protein sizes with precision approaching that of a standard western blot, proteins were not completely resolved unless they differed by more than the following: 75 kDa for >200 kDa proteins; 50 kDa for 100-200 kDa proteins; 25 kDa for 50-100 kDa proteins; and 10 kDa for <50 kDa proteins, corresponding to a migration distance of about 1.5 mm, twice the diameter of spotted protein (Figs. 2-2d and 2-3). Resolution equal to a macrowestern blot could be obtained by electrophoresing the samples for ~1.5 times the distance (Fig. 2-2a).

2.2.4 Application of MWAs to analysis of EGFR signaling network

To examine EGF signaling dynamics using MWAs, we chose antibodies to a wide range of phosphosites to monitor many molecular biological processes (see Supp. Fig. 1 and Supp. Table 1 in ref.
[37]): early positive growth factor response regulators, negative signaling regulators, downstream proliferation indicators, nutrition-sensing indicators, adhesion and migration indicators, phospholipid and calcium-state indicators, stress indicators, and transcription and cell-cycle indicators. To observe signaling dynamics at doses approximating physiological levels [47], we stimulated cells with 2, 50, 100, and 200 ng/mL EGF. We performed a mock stimulation to distinguish EGF-mediated signaling events from nutrition-related events. We probed all wells with a combination of rabbit and mouse antibodies to observe temporal dynamics in phosphorylation and control for variation in loading (Fig. 2-4 and see Supp. Table 2 in ref. [37]). The coefficient of variation from arraying, rehydration, transfer, binding of primary antibody and secondary antibody was <17%. We quantified 91 phosphosites from 67 proteins and 18 pan-specific protein abundances. We analyzed a total of 75 proteins in technical triplicate, resulting in ~9,800 signaling observations. Sufficient lysates remained for many subsequent analyses. We recorded integrated intensity, signal-to-background ratio and inferred sizes from spots detected with each antibody (see Supp. Table 1 and Supp. Figs. 2,3 in ref. [37]).

Figure 2-3: Comparison of MWA to traditional western blot. Indicated samples were analyzed by traditional western blots (left) and in triplicate in MWA format (middle). Lysates were from A431 cells stimulated with 200 ng/mL EGF and lysed at the indicated times after stimulation. β-actin monoclonal mouse primary antibody (detected with IR800 (LI-COR) secondary antibody; green) was co-probed with each of the eleven rabbit polyclonal primary antibodies (detected with Alexa Fluor 680 secondary antibody; red) to demonstrate equal loading of each sample. An arrow to the left of each blot indicates the band quantified, along with the corresponding sizes of the LI-COR protein standard. Fold change in fluorescence signals was quantified for both formats (right). Error bars, s.e.m. of the three technical replicates of the microwesterns shown.

Figure 2-4: An MWA containing 6 cell lysates probed with 192 antibodies. The red channel (700 nm laser) shows the stimulation of A431 cells with 200 ng/mL EGF probed with a panel of rabbit anti-human polyclonal antibodies detected with Alexa Fluor 680-labeled secondary antibodies. The green channel (800 nm laser) reflects a scan of the samples probed with mouse monoclonal anti-human β-actin antibody detected with IR800-labeled secondary antibodies to demonstrate the consistency of printing across the area of the membrane. L indicates the Odyssey protein molecular weight ladder; numbers indicate the time after EGF stimulation (0, 1, 5, 15, 30, and 60 min) at which the indicated samples were collected. The boxed areas of the red channel image (magnified on the left) were probed with a rabbit-derived antibody that recognizes the doubly phosphorylated Ser240 and Ser244 of S6 ribosomal protein (top) and with a rabbit-derived antibody that recognizes EGFR(Tyr1068) (bottom). The boxed areas of the green channel image (magnified on the left) were probed with mouse-derived β-actin antibody. For layout of antibodies for the entire image, see Supp. Table 2 in ref. [37]. Center-to-center distance between arrayed spots was 1 mm.
Seventeen of 91 phosphosites that we quantified here had been previously quantified in one recent mass spectrometry report using pan-phospho enrichment [42] and 22 phosphosites had been quantified in another study [48] using phosphotyrosine-specific enrichment (see Supp. Table 3 in ref. [37]). Many ubiquitous EGFR signaling proteins that we quantified by MWAs, including Tyr845 phosphorylation on EGFR (p-EGFR(Tyr845)), p-SHP2(Tyr542), p-p70S6K(Ser371), p-Raf(c-)(Ser338), p-p90RSK(Ser380) and p-Stat3(Ser727), had not been quantified in either mass spectrometry study [42, 48], suggesting that mass spectrometry detects only a fraction of phosphorylation events elicited by EGF. Of the 91 phosphosites that we quantified here, only four had been shown by others, using the RPA method, to yield quantitative measurements equivalent to western blots [40].

2.2.5 Comparison of signaling network at different EGF input levels

We next asked whether biological insights could be revealed using the MWA method. We organized five clusters of signaling profiles based on the time after stimulation at which maximal phosphorylation occurred (Fig. 2-5). Phosphosites within clusters were rank-ordered by fold-change. At the 2 ng/mL EGF input level, we observed several phosphosites from EGFR, ErbB2, PLC-γ, Gab1, Mek, p90RSK, p70S6K and Crkl that were absent upon mock treatment (Fig. 2-5 and see Supp. Figs. 4, 5 in ref. [37]). Conversely, many phosphosites related to phosphoinositide signaling displayed substantial fold change in mock stimulation but not EGF treatment, including sites from PDK1 and its downstream targets AKT, PKCγ and PKCδ; downstream targets of AKT including mTOR and FOXO1; and mTOR substrate p70S6K and its downstream target S6 ribosomal protein. We speculate that activation of PLC-γ after EGF stimulation led to hydrolysis of phosphatidylinositol 4,5-bisphosphate, causing downregulation of PDK1 and AKT. Reduced AKT activity could produce the observed A431 cell-cycle inhibition [49] through decreased phosphorylation (and therefore increased inhibitory activity) of cyclin-dependent kinase (CDK) inhibitors, including CDKN1A(Thr145) and CDKN1B(Thr157). Consistent with this notion, insulin-like growth factor (IGF), which stimulates PI3K and AKT, is also a potent mitogen for A431 cells [47].

Figure 2-5: A clustered heatmap profile of fold changes for antibody bands representing specific phosphorylation sites of proteins in A431 cells over the indicated six time points for four EGF stimulation concentrations and the no-EGF control. The net fold change is color-coded as indicated in the legend. Antibody bands were grouped into six clusters according to the time point at which maximal fold change occurred. The antibodies are in descending order, sorted in each cluster by the value of the fold change at the 200 ng/mL EGF stimulation condition at the time point representative of that particular cluster. Antibody names are listed on the right with an approximation of band size.

We then asked how the dynamic range and timing of the EGF signaling network were influenced by EGF input amount. The first 'wave' of phosphorylation peaking at 1 min after EGF input included 33 tyrosines from EGFR and other receptor tyrosine kinases (RTKs) and membrane-localized proteins (Fig. 2-5, and see Supp. Figs. 3, 4, and 5 and Supp. Table 1 in ref. [37]). At 5 min after EGF input, we observed serine and threonine sites from downstream kinases and transcription factors including Raf, MEK, p70S6 kinase, mTOR, and ATF2. At 15 min after EGF input, we observed phosphosites from Erk, P38 MAPK, and cell cycle-related kinases and substrates.
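The grouping used for Fig. 2-5 (assign each phosphosite to the time point of its maximal fold change, then rank sites within each group by that peak value) can be sketched as follows. This is a minimal illustration for a single stimulation condition; the function name, site names, and fold-change values are hypothetical and are not taken from the data set.

```python
import numpy as np

TIME_POINTS_MIN = (1, 5, 15, 30, 60)  # nonzero sampling times from the text

def cluster_by_peak_time(fold_changes, names, time_points=TIME_POINTS_MIN):
    """Assign each site to the time point of its maximal fold change,
    then rank sites within each cluster by that peak fold change (descending)."""
    fold_changes = np.asarray(fold_changes, dtype=float)
    peak_idx = fold_changes.argmax(axis=1)
    clusters = {t: [] for t in time_points}
    for name, row, idx in zip(names, fold_changes, peak_idx):
        clusters[time_points[idx]].append((name, row[idx]))
    return {
        t: [name for name, _ in sorted(members, key=lambda m: m[1], reverse=True)]
        for t, members in clusters.items()
    }

# Hypothetical fold-change profiles (rows: sites, columns: time points).
example_names = ["site_A", "site_B", "site_C", "site_D"]
example_fold_changes = [
    [8.0, 3.0, 2.0, 1.0, 1.0],
    [2.0, 6.0, 4.0, 2.0, 1.0],
    [5.0, 2.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 3.0, 9.0],
]
example_clusters = cluster_by_peak_time(example_fold_changes, example_names)
print(example_clusters)
```

In this toy input, site_A and site_C both peak at 1 min and are ordered by their peak fold change, while site_D falls into the 60 min group.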
Sites with phosphorylation peaking at 30 min included those of the Crkl adaptor protein and MAPKAPK2, a substrate of P38 MAPK. Proteins with sites peaking at 60 min included the PDK1 substrates AKT and PKCβ, and the AKT substrate 4EBP1, among others. The timing of most phosphorylation events was not affected by EGF concentrations.

Figure 2-6: Consensus model of EGF receptor level influences modeled by Bayesian network inference with comparison to ARACNe and CLR. (a) A consensus model of the EGF signaling network obtained by exact Bayesian model averaging following Bayesian network inference. Significant (p < 0.001) positive edges (green), significant (p < 0.01) negative edges (red, blunt edges), and interactions with a nonsignificant correlation coefficient (black) are shown. Edges for which the directionality could not be determined using equivalence class analysis are shown as undirected. (b) Heatmaps show the undirected adjacency matrices comparing the Bayesian network to the ARACNe and CLR networks. An edge between node i and node j is represented by matrix value (i, j). Because the undirected networks are compared, the adjacency matrix is symmetric across the diagonal, and thus only the lower triangular matrix of the adjacency matrix is shown. Edge weight thresholds were set to >0.3 for the Bayesian network and ARACNe (using ARACNe data processing inequality parameter τ = 0.03) and to Z >1.13 for CLR.
Eight of 11 edges present only in the Bayesian network and not in the ARACNe network would induce three-node triplets in the ARACNe network, which is precisely what ARACNe is designed to prune out. (c) Venn diagram comparing edges across the three networks. The ARACNe network forms a complete subnetwork of the CLR network and a near complete subnetwork of the Bayesian network, which forms a near complete subnetwork of the CLR network.

2.2.6 Bayesian network modeling of receptor layer connectivity

To elucidate the directional influences among phosphosites, we applied Bayesian network modeling approaches to phosphosites from proteins representing cell membrane-level influences of the EGF signaling network. This permitted us to verify known influences and identify new directional relationships underlying receptor-level crosstalk. Bayesian networks are graphical representations of conditional independencies in a probability distribution over a set of variables [29] and can potentially be inferred from experimental data such as those generated by MWAs. The network we analyzed comprised 17 phosphosites: two from the Src kinase and 15 from the ten RTKs for which we specifically observed fold-change measurements with all four EGF treatments and for which the basic local alignment search tool (BLAST) predicted little similarity with the 57 other human-genome-encoded RTKs and thus indicated a relatively low probability of antibody cross-reactivity (Fig. 2-6 and see Supp. Table 4 in ref. [37]). We considered each time point as an independent sample of the EGF-stimulated network state, giving 20 samples for each phosphosite (4 conditions across 5 nonzero time points of one biological replicate), and we normalized all data to the zero time point.
Only 17 phosphosites were considered for Bayesian network analysis, even though 91 phosphosites were measured in total, because the inference algorithm we used [50] performed exact Bayesian model averaging, and thus could only model networks of about 20 or fewer nodes due to computational limitations [51]. We chose all phosphosites measured on receptor tyrosine kinases (RTKs) in the data set, and the two sites on Src kinase, because of reports in the literature of RTK coactivation in cancer [52], and the role of Src family kinases in RTK signaling [53]. We hypothesized that by inferring a signaling network among the RTK and Src sites, we could gain insights into putative receptor-level signaling influences downstream of EGFR activation.

Given typically limited amounts of data, a variety of graph structures can be generated by Bayesian inference modeling that describe the data reasonably well, so a consensus model is often sought rather than aiming to find a unique best-scoring graph [29]. Accordingly, we created a consensus model (Fig. 2-6) containing only edges with a score >0.3, derived from exact Bayesian network model averaging over all directed acyclic graph structures having at most three parents per node [50, 51]. By considering only those directed acyclic graph (DAG) structures in the equivalence class of the consensus model with a directed edge from p-SRC(Tyr416) to p-EGFR(Tyr845), we determined directionality of the remaining compelled edges [54] (see Methods). Signs of directional influences (positive versus negative) could also be discerned. EGFR(Tyr845) is a known Src kinase substrate that is not phosphorylated by the EGFR kinase [55]. We used this prior knowledge only to distinguish edge directionality in the equivalence class; we used no prior structural knowledge to derive the consensus model.
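To make the idea of an edge score concrete, the sketch below computes marginal posterior edge probabilities for a three-node toy problem. It is not the algorithm used in this work: the thesis relies on a dynamic programming method with a non-uniform structure prior [50, 51], whereas this illustration enumerates every DAG by brute force and assumes a uniform prior over DAGs, which is feasible only for a handful of nodes. The function names and the toy data are hypothetical; the BDeu score uses an equivalent sample size of one and three discrete levels, as described in the Methods.

```python
from itertools import product
from math import lgamma

import numpy as np

def local_bdeu(data, child, parents, levels=3, ess=1.0):
    """Log BDeu score of one node given its parent set (discrete data)."""
    q = levels ** len(parents)            # number of parent configurations
    a_j, a_jk = ess / q, ess / (q * levels)
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, np.zeros(levels))[row[child]] += 1
    score = 0.0
    for n_jk in counts.values():          # unobserved configurations contribute 0
        score += lgamma(a_j) - lgamma(a_j + n_jk.sum())
        score += sum(lgamma(a_jk + n) - lgamma(a_jk) for n in n_jk)
    return score

def all_dags(n):
    """Yield every DAG on n labeled nodes as an adjacency matrix (adj[i, j]: i -> j)."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in product([0, 1], repeat=len(pairs)):
        adj = np.zeros((n, n), dtype=int)
        for (i, j), bit in zip(pairs, bits):
            adj[i, j] = bit
        if not np.linalg.matrix_power(adj, n).any():  # no walk of length n => acyclic
            yield adj

def edge_posteriors(data, levels=3, ess=1.0):
    """P(edge i -> j | data), averaged over all DAGs under a uniform structure prior."""
    n = data.shape[1]
    dags, log_scores = [], []
    for adj in all_dags(n):
        parents_of = [tuple(np.flatnonzero(adj[:, j])) for j in range(n)]
        log_scores.append(sum(local_bdeu(data, j, parents_of[j], levels, ess)
                              for j in range(n)))
        dags.append(adj)
    weights = np.exp(np.array(log_scores) - max(log_scores))
    weights /= weights.sum()
    return sum(w * adj for w, adj in zip(weights, dags))

# Hypothetical toy data: node 1 copies node 0, node 2 is independent of both.
toy = np.array([[i % 3, i % 3, (i // 3) % 3] for i in range(27)])
posterior = edge_posteriors(toy)
print(np.round(posterior, 3))
```

Because the BDeu score cannot distinguish Markov-equivalent graphs, the toy result splits the support for the node 0/node 1 connection between the two directions, which is the situation that the equivalence class analysis described above is meant to resolve.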
The three linked root nodes from which we derived most downstream influences in the graph structure included p-SRC(Tyr416), p-EGFR(Tyr845) and p-PDGFRB(Tyr1009). The model suggests that the EGFR and PDGFRA,B influence one another, with p-EGFR(Tyr1068), p-EGFR(Tyr1173), p-ERBB2(Tyr1221,Tyr1222) and p-KIT(C)(Tyr719) depicted directly downstream of both p-PDGFRB(Tyr1009) and p-EGFR(Tyr845). Notably, PDGFRB has previously been described to heterodimerize and transactivate the EGFR [56] in response to PDGF, even in the presence of a PDGFR inhibitor. Whereas others have previously suggested A431 cells lack PDGFR expression, we observed bands at the predicted molecular weights using several phospho- and pan-specific antibodies directed at the intracellular region of the receptor (see Supp. Figs. 1 and 6 in ref. [37]). Notably, the model depicted the phosphosite representing the activation loop tyrosine of either PDGFRA(Tyr849) or PDGFRB(Tyr857) (which, owing to homology, we could not distinguish by the antibody in our assay and hereafter designated as p-PDGFRA(Tyr849),PDGFRB(Tyr857)) to lie downstream of p-MET(Tyr1349), a root node, and p-EGFR(Tyr1173), which was downstream of the root nodes p-EGFR(Tyr845) and p-PDGFRB(Tyr1009). p-EGFR(Tyr1173) first displayed robust phosphorylation upon addition of 100 ng/mL EGF, the same concentration at which the activation loop of PDGFRA,B first displayed phosphorylation. At low EGF amounts, Src kinase may mediate the phosphorylation of some PDGFR sites other than Tyr849,Tyr857, but at higher EGF amounts, the PDGFR kinase itself becomes activated through a mechanism involving or concurrent with the phosphorylation of p-EGFR(Tyr1173). p-EGFR(Tyr1068), modeled to be upstream of p-EGFR(Tyr1086), p-ERBB4(Tyr1284) and both p-FGFR1(Tyr653,Tyr654) activation loop isoforms, was distinct among EGFR sites in displaying maximal phosphorylation at 5 min and sustained phosphorylation amplitude for the duration of the time course.
The edge directed from p-EGFR(Tyr1068) to p-FGFR1(Tyr653,Tyr654) (145 kDa) displayed a relatively high edge score (0.80; see Fig. 2-7 for all consensus Bayesian network edge weights), similar to that between p-EGFR(Tyr1068) and p-EGFR(Tyr1086) (edge score of 0.89), suggesting that EGFR can mediate FGFR1 activation. We speculate that the 145 kDa and 100 kDa forms of FGFR1 represent hyper- and hypo-glycosylated forms of the receptor, respectively. Hyperglycosylation of FGFR1 has been shown to inhibit its interaction with both FGF2 and heparin-derived oligosaccharides [57], which has been predicted to decrease its activity. Our model depicted only the 100 kDa form phosphosite to have downstream targets among the 17 phosphosites modeled. The only site negatively regulated in the model was p-PDGFRA(Tyr754), which recruits the SHP2 phosphatase [58] resulting in dephosphorylation of RASGAP recruitment sites on PDGFRA and B and increased MAPK signaling. Therefore down-regulation of p-PDGFRA(Tyr754) would be predicted to decrease MAPK signaling. Consistent with previous reports [59], our model suggested that p-SRC(Tyr527), a known inhibitory site of Src kinase, is disconnected from the EGF network.

To corroborate the Bayesian network results, we also inferred network connectivities using the 'algorithm for the reconstruction of accurate cellular networks' (ARACNe) and 'context likelihood of relatedness' (CLR) algorithm [28, 27]. ARACNe and/or CLR also identified 22 of 24 edges in the Bayesian network, though as undirected edges because these methods are based on mutual information (Figs. 2-6b,c, 2-8, 2-9). Experimental evidence suggests that consistency across network inference methods improves edge prediction accuracy [60, 61], and in our case here, data permutation studies showed that the topologies inferred by the Bayesian network, ARACNe and CLR were significant (P < 0.01) (see Methods and Fig. 2-10). Because in the context of proteomic signaling networks it is problematic to make broad assumptions about edge directionality absent extensive prior knowledge (for example, concerning particular kinase-substrate relationships), we believe that predicting edge directionality using methods such as Bayesian network modeling offers an appealing advantage.

Figure 2-7: Bayesian network consensus model edge weights. A graphical depiction of the consensus Bayesian network model showing all edges with an exact marginal posterior probability >0.3. EGFR phosphorylation sites are shown in blue for visual clarity. The model is a consensus of all Bayesian networks allowing a maximum of three parents per node, where the contribution of each Bayesian network to the consensus Bayesian network is weighted by the BDeu score of that Bayesian network (the BDeu score is simply a method for calculating the posterior probability of the Bayesian network model given the data). Thus, for a given edge $G_{ij}$, we can compute the probability that the edge is present given the data, $D$, by summing over all possible Bayesian networks, $p(G_{ij} = 1 \mid D) = \sum_{G} p(G \mid D)\, f(G_{ij})$, where $f(G_{ij})$ is one if there is an edge from node i to node j in network G and zero otherwise. If the resultant probability (edge weight) is close to one, that edge is found in nearly all high-scoring networks, whereas if the edge weight is low, that edge is found in few high-scoring networks. In the case where edges are shown in two directions (between EGFR(Tyr845) and PDGFRβ(Tyr1009), and between EGFR(Tyr1068) and FGFR1(Tyr653,Tyr654) (100 kDa)), this simply indicates that edges in both directions exceeded the 0.3 threshold; it does not indicate a cyclic interaction. Using this consensus network as a starting point, equivalence class analysis was performed. Thus, though the consensus network shown here has a directed edge from EGFR(Tyr845) to Src(Tyr416), there exist in the equivalence class of this consensus model Bayesian networks with a directed edge from Src(Tyr416) to EGFR(Tyr845). Because EGFR(Tyr845) is a known Src substrate, the edge from Src to EGFR was chosen and used to restrict the directionality of all other edges in the model. Edges that could not be restricted were shown as undirected in Figure 2-6.

Figure 2-8: Graphical comparisons of the Bayesian, ARACNe, and CLR networks. [Panel titles: Bayesian network (BN) edge weight >0.3 versus ARACNe (τ = 0.03) edge weight >0.3; BN edge weight >0.185 versus ARACNe (τ = 0.06) edge weight >0.3; BN edge weight >0.3 versus CLR edge weight (Z) >1.13; BN edge weight >0.18 versus CLR edge weight (Z) >1.13.] Graphical representations of the optimal comparisons between Bayesian/ARACNe (top) and Bayesian/CLR (bottom) networks for both the restriction that the Bayesian network edge threshold be >0.3 (left column) and placing no restriction on the Bayesian network edge threshold (right column), but requiring it to be at most >0.4 so as to be a significant network result (see Fig. 2-10). Data permutation studies also indicated that the ARACNe threshold had to be at least >0.26 and the CLR threshold had to be at most >1.15. Given these threshold limitations, the optimal comparisons between Bayesian/ARACNe and Bayesian/CLR networks were determined. Green edges are shared between the Bayesian network and ARACNe (or CLR); blue edges are only in the Bayesian network; and orange edges are only in the ARACNe (or CLR) network. The ARACNe and CLR thresholds were >0.3 and >1.13 in both cases (restricting the Bayesian network threshold to >0.3 and not), but that was because those thresholds gave the optimal comparison results, and was not enforced a priori. Note that, at the >0.3 threshold level for the Bayesian network (left column), the two edges found only in the Bayesian network, and not in the ARACNe or CLR networks, participate in three-parent interactions. This is logical, considering that ARACNe and CLR only consider undirected pairwise interactions. Similar results are seen for the lower Bayesian network threshold level (right column), where many of the edges found only in the Bayesian network participate in higher-order parent-child interactions. Additionally, it should be noted that 8 of 11 edges present only in the Bayesian network and not in the ARACNe network would induce three-node triplets in the ARACNe network, which is precisely what ARACNe prunes out using the Data Processing Inequality. Graph diagrams were generated using Graphviz (http://www.graphviz.org).

Figure 2-9: Comparing inference algorithms when removing the restriction that the Bayesian network edge weight be >0.3. [Panel titles: Comparing Bayesian and ARACNe Networks; Comparing Bayesian and CLR Networks.] Heatmaps show the undirected adjacency matrices comparing edges in the Bayesian network to edges in the ARACNe and CLR networks, when the edge weight thresholds for all three algorithms were allowed to take any value, as long as that value gave a significant network result above the 99% confidence bound. An edge between node i and node j is represented by matrix value (i, j). Because the undirected networks are compared, the adjacency matrix is symmetric across the diagonal. The undirected form of the Bayesian network is shown to simplify the comparison to the undirected edges in ARACNe and CLR. A search over Bayesian, CLR, and ARACNe edge weight thresholds, along with the ARACNe Data Processing Inequality tolerance parameter τ, found the optimal comparison between the networks. When comparing to ARACNe, the Bayesian network edge weight threshold of >0.185 was used and the ARACNe result used the Data Processing Inequality parameter τ = 0.06 and edge weight threshold >0.3. When comparing to CLR, the Bayesian network edge weight threshold of >0.18 was used, and the CLR result used edge weight (Z) threshold >1.13. Just as was the case when limiting the Bayesian network edge weight threshold to >0.3, it should be noted that again 8 of 11 edges present only in the Bayesian network and not in the ARACNe network would induce three-node triplets in the ARACNe network, which is precisely what ARACNe prunes out using the Data Processing Inequality. This matrix representation is analogous to the graphically displayed networks shown in the right column of Fig. 2-8. At these threshold settings, the ARACNe network is a complete subnetwork of the Bayesian network and one edge short of being a complete subnetwork of the CLR network.

Figure 2-10: [The opening of this caption is illegible in the source; the panels cover the Bayesian network, ARACNe, and CLR.] ... the 500 permuted data sets were analyzed. Confidence bounds for the 90% (green curve), 95% (blue curve), and 99% (red curve) percentile levels were estimated as described (see Methods). The actual number of edges from each non-permuted data set as a function of edge weight cutoff is also shown (bold black curve). Edge weight cutoff values that gave significant and non-significant network results are shaded in green and red, respectively.

2.3 Discussion

In contrast to RPAs, MWAs can reduce the complexity of lysates after arraying, minimizing effort in experimental scale-up. Most of the information of a traditional western blot can be obtained, using 200-fold less protein and antibody.
MWAs should be useful for analysis of proteins from cell lines and tissues from which there are sufficient lysates to print hundreds of MWAs that could be distributed en masse in an analogous manner to spotted DNA microarrays for interrogation with the user's choice of antibodies. The only devices required after printing are commercially available 96-well gaskets and an imager. The ability to obtain information regarding hundreds of proteins with the MWA method should allow advances in our understanding of cell context-specific networks underlying human disease when combined with appropriate computational modeling methods.

MWAs could also be very useful for large-scale, systematic validation of antibodies. Antibody collections could be systematically verified for selectivity by examining lysates from cells transfected with a cDNA or depleted for the cognate protein by RNAi. The amount of antibody obtained from a single rabbit immunization (~5 mg) would be sufficient for over 100,000 MWAs, thus minimizing lot-to-lot variability of polyclonal antibodies. MWAs could be useful for current efforts to build a human protein atlas; samples from tissues used for in situ analyses could be examined with MWAs to verify that signals observed with each antibody resulted from proteins of the predicted molecular weight(s).

The ability to gather dynamic information regarding hundreds of proteins under many conditions poses new challenges for computational modeling. The Bayesian network described here represents direct and/or indirect effects of a given node on other nodes as indicated by high-probability connecting arcs, which are hypothesized to represent relationships of influence among the phosphoproteins in the network. Using prior knowledge to restrict edge directionality across a Bayesian network equivalence class, one can bolster the case for assigning directionality to these edges.
To further support a case for interpreting network connections as causal, one could explicitly model the temporal data [62] and/or use interventional data [29, 51], which will be the subject of future inquiry.

The timing and amplitude of phosphorylation dynamics observed here coupled with the connectivities modeled in the Bayesian network suggest several candidate sources of RTK coactivation, each of which may be important in specific cancer contexts: (i) direct dimerization and/or phosphorylation by EGFR or other downstream tyrosine kinases as suggested by the rapid phosphorylation kinetics of Src, ErbB2 and ErbB4, coupled with their close proximity at the top of the network; (ii) activation of proteases that activate precursor growth factors or latent RTKs as might be predicted from the delayed phosphorylation amplitudes of FGFR1 (100 kDa) and MET activation loop sites coupled with their distance from EGFR in the network; and (iii) inactivation of tyrosine phosphatases through oxidation by reactive oxygen species [63]. Phosphorylation of Tyr542 of Shp2 phosphatase displayed the highest fold change of any site in our analysis; phosphorylation of this site has been suggested to relieve inhibition of phosphatase activity [64]. The sustained phosphorylation of this and other tyrosine sites at EGF concentrations >50 ng/mL suggests that Shp2 (and other cysteine-based tyrosine phosphatases) may be inactivated at such concentrations, thus unmasking many tyrosine kinase activities. Each of these mechanisms may have distinct roles in the context of cancers that have become resistant to single kinase inhibitors; systems-level analysis of other tyrosine kinase-driven cancers may be helpful in revealing appropriate therapeutic targets.

2.4 Methods

For a complete description of the experimental methods see Ciaccio et al. [37].
2.4.1 Signaling network inference modeling

Bayesian networks were modeled using a dynamic programming algorithm that computes the exact marginal posterior probability of edges in the Bayesian network derived from the dataset [50]. The algorithm was implemented using a modified version of the open-source Bayesian Network Structure Learning toolbox in Matlab [51]. Node conditional probability distributions were represented by multinomials using a uniform Dirichlet prior with equivalent sample size of one, and a prior over graph structures was calculated by accounting for the number of ways to choose parent sets in a graph, as previously described [50, 65]. Networks were scored using the Bayesian Dirichlet likelihood equivalent uniform (BDeu) score [66]. The BDeu score accounts for both model fit and complexity and thus avoids overfitting the data. Although this dynamic programming algorithm introduces a non-uniform prior over graph structures, it has been shown to perform better at structure learning tasks [50, 51] than local search methods that use a uniform prior over graph structures, such as Markov chain Monte Carlo searches over directed acyclic graphs [67], as well as Markov chain Monte Carlo over node orderings, which uses a non-uniform prior [65].

All nodes were discretized using three-level k-means clustering to indicate low, medium and high phosphoprotein levels (see Supp. Table 5 in [37]). Clustering was done using the squared Euclidean distance metric and repeated 50 times for each node to find the optimal clustering assignments. We believe that k-means clustering better represents the physiological diversity in signaling states of the phosphoproteins in the network than more arbitrary discretization schemes, like interval and quantile discretization, that do not try to explicitly capture clusters in the data.
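The discretization step can be illustrated with a small one-dimensional k-means routine. This is a NumPy sketch of the procedure as described (three levels, squared Euclidean distance, 50 random restarts, best solution kept), not the code used in this work; the function name, the seed, and the example values are hypothetical.

```python
import numpy as np

def discretize_three_levels(values, k=3, restarts=50, max_iter=100, seed=0):
    """Discretize one node's measurements into k ordered levels (0 = low, ..., k-1 = high)
    by k-means with squared Euclidean distance, keeping the best of several restarts."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    best_cost, best_labels = np.inf, None
    for _ in range(restarts):
        centers = rng.choice(x, size=k, replace=False)   # random initial centers
        for _ in range(max_iter):
            labels = ((x[:, None] - centers[None, :]) ** 2).argmin(axis=1)
            updated = np.array([
                x[labels == c].mean() if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(updated, centers):
                break
            centers = updated
        labels = ((x[:, None] - centers[None, :]) ** 2).argmin(axis=1)
        cost = ((x - centers[labels]) ** 2).sum()         # within-cluster sum of squares
        if cost < best_cost:
            rank = np.argsort(np.argsort(centers))        # relabel so levels follow magnitude
            best_cost, best_labels = cost, rank[labels]
    return best_labels

# Hypothetical normalized phosphosite measurements for one node.
example_values = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 4.0, 4.2, 3.9]
example_levels = discretize_three_levels(example_values)
print(example_levels.tolist())
```

Relabeling the clusters by the rank of their centers keeps the levels interpretable as low, medium, and high regardless of the order in which the random initialization found them.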
CLR was implemented using Matlab code provided by the original authors, with Z scores (edge weights) calculated as previously described [27]. ARACNe was implemented using the minet package in R [68]. To minimize the sources of variation between algorithms, the same discretized data that were used to learn the Bayesian network model were also used to learn the CLR and ARACNe models. The mutual information matrix for ARACNe was calculated using a simple histogram method in the minet package, and for CLR was calculated directly from the discretized data. The edge score thresholds for CLR and ARACNe were varied in an effort to maximize the similarity between the Bayesian network and ARACNe (or CLR), both given the >0.3 edge weight threshold for the Bayesian network (Fig. 2-6) and when this constraint on the Bayesian network edge weight threshold was removed, though in both cases staying within edge weight thresholds that gave significant network results (Figs. 2-8, 2-9, 2-10). The sign of the influences between nodes in the Bayesian network was estimated using pairwise correlation coefficients. Seventeen of 24 pairwise interactions had a highly significant (p < 0.001) positive correlation coefficient. Two of 24 had a significant (p < 0.05) negative correlation coefficient. The remaining 5 of 24 pairwise interactions had a nonsignificant (p > 0.05) correlation coefficient but were edges in two- or three-parent interactions, suggesting a simple pairwise correlation coefficient was not sufficient to capture the parent-child behavior. Notably, both negative interactions were directed at p-PDGFRα(Tyr754). Five of the six two-parent interactions (including all four with p-EGFR(Tyr845) and p-PDGFRβ(Tyr1009) as the parent set) were consistent with "and gate" behavior. The parent-child raw data from all one-, two- and three-parent interactions are plotted versus one another in Fig. 2-11.
Considering up to three parents per node in the Bayesian network captured almost all higher-order interactions in the dataset (Fig. 2-12). Additional higher-order interactions may be present, with simply not enough data for the Bayesian network to infer them; it may also be that such higher-order interactions are indeed not present, regardless of how much data are available describing the network. ARACNe and CLR, which only consider undirected pairwise interactions, thus represent useful, but likely not complete, approximations of interactions in this dataset.

Figure 2-11: Estimating parent-child input-output logic within the Bayesian network. Pairwise (top), two-parent (middle), and three-parent (bottom) influences were estimated by plotting the raw data for each parent with the raw data for its child node or children nodes. Pairwise correlation coefficients were used to estimate the sign of interaction between nodes in the Bayesian network. Significant (p < 0.001) positive correlations are shown with green circles; significant (p < 0.05) negative correlations are shown with red plus signs; and non-significant (p > 0.05) correlations are shown with black triangles. Note that all five edges with non-significant pairwise correlation coefficients participated in a two- or three-parent interaction, suggesting that a simple pairwise measure may be insufficient to capture their parent-child relationship anyway. For pairwise interactions, the data for the parent is plotted along the x-axis and the child along the y-axis, to represent the output of the child node as a function of the parent node input. Nodes for which the directionality could not be discerned using equivalence class analysis are shown in the title of each window plot with a "-" instead of "->". The correlation coefficient and the p-value for that coefficient are also shown in the title of each window plot. For two- and three-parent interactions, the child node is plotted along the z-axis. For three-parent interactions, because one is limited to plotting only two-parent interactions in three-dimensional space, the discretized levels of the third parent node are shown in blue ("low"), orange ("medium"), and red ("high"). When determining two-parent interactions, it was assumed for plotting purposes only that edges were directed from p-Src(Tyr416) to p-PDGFRβ(Tyr1009), from p-EGFR(Tyr845) to p-PDGFRβ(Tyr1009), from p-EGFR(Tyr1068) to p-FGFR1(Tyr653,Tyr654) (100 kDa), and from p-FGFR1(Tyr653,Tyr654) (100 kDa) to p-IGF1Rβ(Tyr1135,Tyr1136), p-INSRβ(Tyr1150,Tyr1151).

Figure 2-12: Parent constraint analysis for Bayesian network algorithm. Heatmaps of the Bayesian network adjacency matrices are shown for allowing from a maximum of one parent per node to a maximum of 8 parents per node. The weight of an edge from node i to node j is represented by matrix value (i, j). These Bayesian network edge weights represent directed edges (not undirected), and thus the adjacency matrix is not symmetric across the diagonal. All heatmaps are scaled to the same colorbar, shown on the right. These results indicate that, while much of the joint probability distribution of the Bayesian network is attributable to strictly pairwise (one-parent) interactions, additional higher-order interactions also contribute to the joint probability distribution.
2.4.2 Testing for model significance

Data permutation studies were performed to test the significance of the inferred network results. 500 permuted data sets were generated from the original discretized data set, in which the data for each node is permuted across conditions. In this way, correlations between nodes should be removed, but the actual number and type of discretized data for each node are the same. For each of the 500 data sets, Bayesian, ARACNe, and CLR networks were generated. The same 500 permuted data sets were used across the three methods. For each network resulting from each permuted data set, the edge weight threshold was varied from 1 to 0 by 0.001 decrements, and the number of edges appearing in the network at that threshold was counted. For the CLR networks, because the edge weights correspond to Z scores, which are not bounded by unity, the edge weight threshold was started at the maximum Z score obtained across all 500 permuted data networks and then decreased to zero in decrements of 0.001 times this maximum Z score. By counting the number of edges in each network as a function of edge weight threshold, the fraction of the 500 networks containing at least N edges as a function of edge weight threshold was determined. This fraction was used as an empirical estimate of the likelihood of obtaining a network with at least N edges as a function of edge weight threshold. If, for a particular number of edges N, it was never observed that exactly 10%, 5%, or 1% of the networks contained N edges at a particular edge weight threshold, then linear interpolation between the fractions that were observed (e.g., 1.2% and 0.8%) was used to estimate the edge weight threshold that would have given 1% of networks containing N edges at that edge weight threshold.
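The permutation procedure described above can be sketched as follows. This is an illustrative outline only, not the thesis code: the helper names (`permute_nodes`, `edge_counts`, `permutation_fractions`) are hypothetical, and the edge scorer `abs_corr_weights` is a stand-in (absolute pairwise correlation) for the Bayesian, ARACNe, or CLR edge weights, chosen only so the example runs with NumPy alone.

```python
import numpy as np

def permute_nodes(data, rng):
    """Shuffle each node (column) independently across conditions (rows)."""
    return np.column_stack([rng.permutation(data[:, j]) for j in range(data.shape[1])])

def edge_counts(weights, thresholds):
    """Number of edges whose weight exceeds each threshold."""
    w = np.asarray(weights).ravel()
    return np.array([(w > t).sum() for t in thresholds])

def abs_corr_weights(data):
    """Stand-in edge scorer (assumption): absolute pairwise correlation."""
    c = np.abs(np.corrcoef(data, rowvar=False))
    return c[np.triu_indices_from(c, k=1)]

def permutation_fractions(data, infer_weights, n_edges, n_perm=500, seed=0):
    """Fraction of permuted networks with at least n_edges, per threshold."""
    rng = np.random.default_rng(seed)
    thresholds = np.linspace(1.0, 0.0, 1001)  # 1 to 0 in decrements of 0.001
    counts = np.array([
        edge_counts(infer_weights(permute_nodes(data, rng)), thresholds)
        for _ in range(n_perm)
    ])
    return thresholds, (counts >= n_edges).mean(axis=0)

# Toy data: 20 conditions x 5 nodes of three-level discretized values.
demo_rng = np.random.default_rng(1)
demo = demo_rng.integers(1, 4, size=(20, 5))
thresholds, frac = permutation_fractions(demo, abs_corr_weights, n_edges=2, n_perm=50)
```

A significance bound for N edges at a given level (for example 1%) would then be read off as the threshold at which `frac` crosses that level, interpolating linearly between neighboring thresholds when the exact level is not observed.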
In this way, 90%, 95%, and 99% significance bounds were estimated from the permuted data sets, for all three inference methods, describing how many edges one would expect by chance (i.e., from permuted data) as a function of edge weight threshold. These bounds can be interpreted to suggest that X% of all permuted data sets would have at most as many edges as indicated by the confidence bound at a particular edge weight threshold. Thus, one must use a threshold at which the Actual data curve (black) is above the confidence bounds (Fig. 2-10). The data permutation studies indicate that, using the original 20-point data set, to surpass the 99% confidence bound (i.e., p < 0.01) one needs to use a Bayesian network edge weight threshold that is at most >0.4 (thus >~0.005 to >0.4 is above noise level), an ARACNe edge weight threshold that is at least >0.26 (thus >0.26 and higher is above noise level), and a CLR Z score threshold that is at most >1.15 (thus >0 to >1.15 is above noise level). To determine how many permuted data sets had to be analyzed to develop a reasonable estimate of these significance bounds, moving window mean and standard deviation values for these significance bounds were calculated when considering 20 to 500 permuted data sets, in increments of 20 data sets. For a particular significance bound, the values described the mean and standard deviation of the edge weight threshold expected to give a particular number of edges N. As expected, the standard deviation values approached zero as more permuted data sets were analyzed. Using cumulative values from the 481st to 500th permuted data sets (i.e., data set 1 to 481, 1 to 482, 1 to 483, etc. up to 1 to 500), the average standard deviation values for the 90%, 95%, and 99% significance bounds (in the case of the Bayesian network using 20 data points) were (0.2 ± 0.6) × 10⁻³, (0.4 ± 1.2) × 10⁻³, and (0.6 ± 1.6) × 10⁻³, respectively, indicating that using 500 permuted data sets is sufficient to generate a useful estimate for the significance bounds. As a comparison, using cumulative values from the 1st to 20th permuted data sets, the average standard deviation values for the 90%, 95%, and 99% significance bounds were (16 ± 19) × 10⁻³, (18 ± 22) × 10⁻³, and (18 ± 22) × 10⁻³, respectively, values that are 30- to 80-times larger than the cumulative values from the 481st to 500th permuted data sets. The standard deviation values calculated using the 481st to 500th permuted data sets are shown as horizontal error bars in Fig. 2-10, but are often so small they just look like vertical tick marks on the confidence bound curves. To consider a particular network inference result significant, the curve generated from counting the number of edges in the network derived from the original (nonpermuted) data should be above the confidence bounds (i.e., the inferred network should have more edges than one would expect by chance at a particular edge weight threshold). In all three algorithms (Bayesian, ARACNe, and CLR networks), the number of edges in the inferred network using the original 20-point data set is above the significance curve at the edge weight thresholds used to generate Fig. 2-6 (Fig. 2-10, top row). To ensure that a network derived from random data would indeed never exceed the significance bounds, a random data set was generated by drawing 20 data points of three-level discretized data (i.e., sampling 1's, 2's, and 3's) uniformly at random for each node. The three inference algorithms were applied to this random data set, along with the 500 data set permutation analyses as described above.
Using this random data, we see that the network derived from the original (though random) data never surpasses the significance bounds (Fig. 2-10, middle row) for any of the three inference methods. This shows that, if the data were indeed random, the three algorithms would generate networks that did not exceed the confidence bounds at any edge weight threshold, and were thus not significant. To test the effect of data set size on the significance of the network inference results, a pseudo data set was generated by appending the original data set of 20 data points per node onto itself to give one data set of 40 data points per node. In this way, the data contained the same "signals" as the original data set (i.e., all of the observation counts in the original data set were now simply doubled), but was twice the size. The three inference algorithms were applied to this 40-point data set, along with the 500 data set permutation analyses as described above. The results are shown in the bottom row of Fig. 2-10. Using this larger data set, the Bayesian network result is now significant at all edge weight thresholds, because compared to the 20-point data set case the confidence bounds are now much lower and the curve derived from the actual data is also higher. We see such a change in these curves because (1) with 40 data points to permute instead of just 20, one is less likely to have significant correlations appear between nodes in the data set, and (2) with the 40-point data set, the strength of the data relative to the prior is greater, and thus one obtains more significant edges at all thresholds. For ARACNe, the curve generated from the original 40-point data set is identical to the curve generated from the original 20-point data set (compare the top and bottom windows in the middle column).
This curve does not change because ARACNe simply uses mutual information (followed by the Data Processing Inequality) to obtain the resultant network, and because the 40-point data set contains the same "signals" as the 20-point data set, the mutual information between the nodes does not actually change. It should be noted that, while the calculated mutual information matrix does not change when increasing the data set from 20 to 40 points, the error of that mutual information measurement does decrease by a factor of one-half (∝ 1/20 vs. 1/40) [69]. As with the Bayesian network result, the confidence bounds for ARACNe using 40 data points change because, with 40 data points to permute instead of just 20, one is less likely to have significant correlations (mutual information) appear between nodes in the data set. This manifests itself as lower values of mutual information between nodes in the resultant mutual information matrix. Thus, with 40 data points, the confidence bounds shift to the left compared to 20 data points, allowing one to consider edges with lower weights as significant in the 40 data point case that one could not consider significant in the 20 data point case. Thus, in the 40 data point case the entire ARACNe network result is considered significant, whereas in the 20 data point case there are two edges that have edge weights below the confidence bounds. For CLR, as with ARACNe, the curve generated from the original 40-point data set is identical to the curve generated from the original 20-point data set (compare the top and bottom windows in the right column). Again, this is because CLR uses mutual information, which in this case is based on counts of observations in the data. Doubling the data set simply doubles the number of times each observation is seen, which does not change the mutual information matrix, but simply lowers the error of those mutual information measurements.
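The invariance of count-based mutual information to appending a data set onto itself can be checked with a short sketch. The code below is not the minet or CLR implementation; it is a plain plug-in estimator written for illustration, with hypothetical function names. The pruning step is a simplified version of the Data Processing Inequality, and its tolerance is assumed here to be additive (a multiplicative form also appears in the ARACNe literature).

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Plug-in mutual information (nats) between two discrete vectors."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)            # contingency table of counts
    joint /= joint.sum()                     # joint probabilities
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def mi_matrix(data):
    """Symmetric matrix of pairwise mutual information between columns."""
    n = data.shape[1]
    mi = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j])
    return mi

def dpi_prune(mi, tol=0.0):
    """Drop the weakest edge of each fully connected triplet (additive tolerance assumed)."""
    mi = np.asarray(mi, dtype=float)
    keep = mi > 0
    np.fill_diagonal(keep, False)
    remove = np.zeros_like(keep)
    for i, j, k in combinations(range(mi.shape[0]), 3):
        if keep[i, j] and keep[j, k] and keep[i, k]:
            trip = sorted([(mi[i, j], (i, j)), (mi[j, k], (j, k)), (mi[i, k], (i, k))],
                          key=lambda t: t[0])
            if trip[0][0] < trip[1][0] - tol:
                a, b = trip[0][1]
                remove[a, b] = remove[b, a] = True
    return keep & ~remove

rng = np.random.default_rng(0)
data20 = rng.integers(1, 4, size=(20, 5))   # toy 20-point, three-level data
data40 = np.vstack([data20, data20])        # same data appended onto itself
same_mi = np.allclose(mi_matrix(data20), mi_matrix(data40))
```

Because every joint count doubles along with the total, the estimated probabilities (and hence the mutual information and the pruned network) are unchanged; only the permutation-derived bounds move.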
However, quite notably, the confidence bounds for CLR when using the 40-point data set are essentially identical to the confidence bounds generated using the 20-point data set. This is a result of how CLR calculates its Z scores (edge weights). The essence of CLR is to put the network connectivities into "context". It does this by calculating Z scores, which correspond to mutual information (MI) values that have been mean-centered and normalized to unit variance (that is, the Z score is obtained by subtracting from each MI value the mean of MI values corresponding to all parents of each node, and then dividing by the standard deviation of MI values corresponding to all parents of each node). The mean and standard deviation of the MI values of the randomized data do decrease in going from the 20-point data set to the 40-point data set (as evidenced by the data permutation results for ARACNe); however, because CLR is normalizing these MI values, the resultant Z score actually does not change significantly in the 40-point data set compared to the 20-point data set. In addition to the "PLoS" method implemented in the CLR MATLAB code, this behavior was also seen using the Rayleigh, Normal, Beta, and KDE methods (data not shown). The Z scores tended to be slightly higher using the 40-point data than the 20-point data, particularly when using raw (non-discretized) data as input (data not shown). Nonetheless, in any case the resultant Z score did not change significantly with increased data. This invariance to increased data set size is seemingly a drawback of the "context" normalization of CLR.

2.4.3 Comparing different algorithm results

Results comparing the networks from the inference methods are shown in Figs. 2-6, 2-8, and 2-9. The data processing inequality tolerance parameter, τ, in ARACNe was varied from 0 to 0.20 by increments of 0.01, and then varied from 0.20 to 0.40 by increments of 0.05.
Among the resultant 25 ARACNe graphs, the maximum similarity to the Bayesian network result was found using τ = 0.03 (when restricting the Bayesian network edge weight threshold to be >0.3) and using τ = 0.06 (when simply restricting the Bayesian network edge weight threshold to be at most >0.4 to be significantly above the confidence bounds). Similarity between the graphs was calculated by first converting the Bayesian network to its undirected form, and then taking the ratio of the number of edges shared between the Bayesian network and ARACNe (or CLR) to the number of edges present in the Bayesian network or ARACNe (or CLR) but not shared between the networks. This metric was then calculated as a function of both the Bayesian network edge weight threshold and the ARACNe (or CLR) edge weight threshold (though requiring the Bayesian, ARACNe, and CLR edge weight thresholds to be within the range that gave a significant network result above the confidence bounds), or just the ARACNe (or CLR) edge weight threshold was varied (again within the range that gave a network result above the confidence bounds) if the Bayesian network edge weight threshold was set to >0.3. It should be noted that 8 of 11 edges present only in the Bayesian network at threshold >0.3 and not in the ARACNe network would induce three-node triplets in the ARACNe network, which is precisely what ARACNe prunes out using the Data Processing Inequality (DPI). When removing this restriction on the Bayesian network edge weight to be >0.3, the optimal comparison to ARACNe was found at Bayesian network edge weight >0.185 and ARACNe edge weight >0.3 with τ = 0.06. Even with increasing the DPI tolerance parameter to τ = 0.06, which should allow more three-node triplets in the ARACNe network, one still has 8 of 11 edges present only in the Bayesian network but missing from the ARACNe network that would induce triplets in the ARACNe network.
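The similarity metric defined above (shared edges divided by edges found in only one of the two networks, after dropping edge direction) can be written as a minimal sketch. The function names and the toy edge lists are hypothetical, and returning infinity for two identical networks is an assumption made here so that the ratio is always defined.

```python
def undirected(edges):
    """Convert directed (parent, child) pairs into a set of undirected edges."""
    return {frozenset(e) for e in edges}

def similarity(bn_edges, other_edges):
    """Shared edges divided by edges present in only one of the two networks."""
    a, b = undirected(bn_edges), undirected(other_edges)
    shared = len(a & b)
    unshared = len(a ^ b)   # symmetric difference: edges not shared
    # Assumption: identical networks (no unshared edges) score infinity.
    return shared / unshared if unshared else float("inf")

# Toy example with hypothetical node names.
bn = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")]   # A<->B collapses to one edge
aracne = [("A", "B"), ("C", "B"), ("D", "E")]
score = similarity(bn, aracne)
```

In the toy example the two networks share A-B and B-C, while C-D and D-E each appear in only one network, giving a ratio of 2/2.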
Though the DPI tolerance parameter can be increased, using τ = 0.06 and ARACNe edge threshold >0.3 was found to give the optimal comparison to the Bayesian network (with no restriction on the BN edge threshold) across all ARACNe threshold values and τ values ranging from 0 to 0.20 (increments of 0.01) and from 0.20 to 0.40 (increments of 0.05). Additionally, it should be noted that, at the >0.3 Bayesian network edge threshold, both of the two Bayesian network edges present only in the Bayesian network and not in the ARACNe or CLR network (from p-IGF1Rβ(Y1135/1136)/p-INSRβ(Y1150/1151) to p-PDGFRα(Y754), and from p-PDGFRα(Y849)/p-PDGFRβ(Y857) to p-MET(Y1234/1235)) were edges participating in three-parent interactions. It is feasible that such a higher-order interaction would be only inferred by the Bayesian network and missed by ARACNe and CLR, which only consider pairwise interactions. From these Bayesian, ARACNe, and CLR network comparisons, ARACNe has been shown to infer the sparsest graphs and CLR the densest graphs. The sparseness of ARACNe is likely mostly attributable to the Data Processing Inequality; although applying the DPI is precisely the aim of ARACNe, it appears to explain many of the edges found by the Bayesian network and CLR but not by ARACNe. The density of the CLR graphs is likely attributable to the mutual information normalization procedure of CLR. While normalizing the mutual information does give one an indication of interaction strengths in the context of the background mutual information distribution, it may also tend to attribute significance to insignificant interactions as a result. This is supported by the permutation studies with CLR, in which high edge weights (Z scores) were given to random 20-point data sets as well as random 40-point data sets.
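The effect of the CLR normalization discussed above can be illustrated with a small sketch. This is not the CLR MATLAB code: the function name `clr_z_scores` is hypothetical, each node's background is assumed to be its MI values with all other nodes (diagonal excluded), negative Z scores are assumed to be clipped to zero, and the two per-node scores are assumed to be combined as the square root of the sum of squares, which is the rule commonly given for CLR.

```python
import numpy as np

def clr_z_scores(mi):
    """Context-normalized edge scores from a symmetric MI matrix (sketch)."""
    mi = np.asarray(mi, dtype=float)
    n = mi.shape[0]
    off = ~np.eye(n, dtype=bool)
    # Background for node i: mean and std of its MI values with all other nodes.
    mu = np.array([mi[i, off[i]].mean() for i in range(n)])
    sd = np.array([mi[i, off[i]].std() for i in range(n)])
    # Row i holds the Z score of each MI value against node i's background.
    z = np.maximum(0.0, (mi - mu[:, None]) / sd[:, None])
    # Assumed combination rule: joint score from both nodes' backgrounds.
    zz = np.sqrt(z ** 2 + z.T ** 2)
    np.fill_diagonal(zz, 0.0)
    return zz

# Toy symmetric MI matrix (random values, for illustration only).
rng = np.random.default_rng(1)
a = rng.random((6, 6))
mi_demo = (a + a.T) / 2
z_full = clr_z_scores(mi_demo)
z_half = clr_z_scores(0.5 * mi_demo)   # uniformly smaller background MI
```

Under these assumptions, shrinking every MI value by the same factor leaves the Z scores unchanged, because the mean and standard deviation shrink by that factor too. This is one way to see why a lower MI background in permuted 40-point data need not lower the CLR scores.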
These results suggest that Bayesian networks, given a tractable network size (for exact inference methods presently ~100 nodes given layering constraints [50], though typically <25 nodes otherwise; though there is no limit on inexact Bayesian network inference methods per se), may provide a balance between the possible false negatives of ARACNe and the possible false positives of CLR. However, applying all algorithms to any properly sized data set is not computationally or otherwise burdensome, and thus such algorithms need not be leveraged against one another, but rather with one another to provide the opportunity for maximal biological insight.

2.4.4 Equivalence class analysis for Bayesian network algorithm

To determine which of the edges in the Bayesian network could be considered directional (compelled), the different directed acyclic graphs (DAGs) that were in the same equivalence class (i.e., different DAG structures that specify the same underlying joint probability distribution) as the consensus model were enumerated. At the edge weight threshold used (>0.3), the consensus Bayesian network model contained two cycles, manifested as two sets of bidirectional edges between two nodes (Fig. 2-7). Thus, there were four (2^(#cycles) = 4) candidate DAG structures represented by the consensus model. In this case all four represented valid (i.e., acyclic) DAGs, though that is not guaranteed. For each of those four DAG structures, all DAG structures in its equivalence class were enumerated using the Bayesian Network Structure Learning toolbox. Among all the DAGs in those four equivalence classes, only the subset of DAGs containing a directed edge from p-Src(Y416) to p-EGFR(Y845) were considered. Only those edges that were consistent (compelled) across all DAGs in all four equivalence classes containing that edge were considered directed in the final consensus Bayesian network model (Fig. 2-6).
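The first step above, expanding a consensus model with bidirectional edges into its 2^(#cycles) candidate structures and keeping only the acyclic ones, can be sketched as follows. The equivalence class enumeration itself was done with the Bayesian Network Structure Learning toolbox and is not reproduced here. The function names and the three-node toy graph are hypothetical.

```python
from itertools import product

def is_acyclic(nodes, edges):
    """Check for a valid DAG by repeatedly removing parentless nodes (Kahn's algorithm)."""
    indeg = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == len(nodes)

def candidate_dags(nodes, directed, bidirected):
    """Try both orientations of every bidirectional edge; keep acyclic graphs."""
    valid = []
    for choice in product([0, 1], repeat=len(bidirected)):
        oriented = [(u, v) if c == 0 else (v, u)
                    for (u, v), c in zip(bidirected, choice)]
        edges = list(directed) + oriented
        if is_acyclic(nodes, edges):
            valid.append(edges)
    return valid

# Toy graph: one fixed edge A->B and two bidirectional edges (2^2 = 4 candidates).
dags = candidate_dags(["A", "B", "C"], [("A", "B")], [("B", "C"), ("A", "C")])
```

In this toy case one of the four orientations (B->C with C->A) closes the loop A->B->C->A and is discarded, which illustrates why not every candidate is guaranteed to be a valid DAG.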
Any edges that did not have a consistent directionality across all DAGs in all four equivalence classes containing the Src to EGFR edge were shown as undirected in the final consensus Bayesian network model. The same procedure was repeated when finding the Bayesian network that was most similar to the ARACNe and CLR results. In that case, the Bayesian network edge weight threshold was reduced below 0.3, inducing even more bidirectional edges between two nodes. In that case, only those candidate DAG structures (among the possible 2^(#cycles) candidate DAG structures) that were acyclic were considered for the subsequent equivalence class enumeration. Any edges that did not have consistent directionality across all DAGs containing the Src to EGFR edge in all equivalence classes were shown as undirected in the final Bayesian network model (Fig. 2-8). Additionally, if one is considering assuming directionality for particular edges that are undirected in Fig. 2-6 or Fig. 2-8, it is important to note that, in order for a graph to represent a valid Bayesian network, the directionality chosen for any particular undirected edge cannot induce cycles.

2.4.5 Parent constraint analysis for Bayesian network algorithm

The maximum number of allowable parents was varied from one to eight and the resultant adjacency matrices were plotted (Fig. 2-12). This was tested to show that, as one allowed the algorithm to consider more parents, significant edge weights were given to the higher order parent interactions. Though the edge weights at higher parent limits are dominated by the 1-parent interactions (i.e., the adjacency matrix obtained when allowing a maximum of 1 parent dominates the edge weights when considering higher parent limits), there are nonetheless higher order parent interactions that begin to appear as high-scoring edges.
In particular, there are distinct increases in the number of high-scoring edges when moving from the 1-parent limit to the 2-parent limit, from the 2-parent limit to the 3-parent limit, and then slightly when moving from the 3-parent limit to the 4-parent limit. Beyond the 4-parent limit (i.e., using a maximum of 5 parents to a maximum of 8 parents), there were only minor changes in edge weights. Thus, increasing the maximum allowable parents N from one to four produces a commensurate increase in the number of N-parent interactions showing up as having significant edge weights in the Bayesian network. But, after a certain point (here N = 4), the algorithm no longer produces significant additional connections having N > 4 parents, either because there is not sufficient data to support the presence of those edges, or those edges are indeed not biologically significant regardless of how much data one may collect. Thus we see that, while strictly pairwise (1-parent) interactions dominate the Bayesian network edge weights, there are additional higher order interactions that are inferred by the algorithm. These results suggest that inference methods that attempt to model only pairwise (though not necessarily just 1-parent) interactions (like ARACNe and CLR) will likely capture a portion of the network interactions, but will also miss some higher-order interactions. For this work, apart from this figure, all Bayesian network inference was performed using a maximum of three parents per node. The results here thus suggest that limiting the maximum number of parents to three captures the majority of the joint probability distribution. Additionally, to give a concept of the Bayesian network algorithm running time, it took 7, 8, 12, 21, 54, 186 (3m 6s), 634 (10m 34s), and 2012 (33m 32s) seconds to run the inference algorithm on one data set (i.e., 17 nodes with 20 data points) when allowing a maximum of 1, 2, 3, 4, 5, 6, 7, and 8 parents, respectively.
The algorithm was run in MATLAB using a desktop computer with a 3.06 GHz Intel Core 2 Duo processor and 4 GB memory.

Chapter 3

Signaling network state predicts Twist-mediated effects on breast cell migration across diverse growth factor contexts

Note: Sections 3.1, 3.2.1, 3.2.2, 3.3.1, 3.4.1, and portions of 3.2.3 in this chapter were previously published in Kim et al. (2011) [70]. The author contributions for that paper are as follows: H.-D.K., A.W., F.B.G., and D.A.L. designed the research plan; H.-D.K., S.K.A., and A.S.M. performed the experiments; H.-D.K. and J.P.W. performed the computational modeling; H.-D.K., A.S.M., J.P.W., and D.A.L. wrote the paper. The remaining sections of this chapter were written by J.P.W., based on computational research designed by J.P.W. and D.A.L. and computational research performed by J.P.W.

3.1 Introduction

In the phenomenon of epithelial-mesenchymal transition (EMT) [71], polarized epithelial cells loosen their cell-cell junctions and acquire the ability to migrate through extracellular matrices as single cells in a mesenchymal manner [71, 72]. Although great progress has been made on identifying and understanding components and mechanisms involved in the process of EMT induction (e.g., [73, 74]), the "before" versus "after" consequences of this transition for signaling pathway control of cell migration have not yet been investigated from a multipathway, network-wide perspective. Cell migration results from a set of carefully orchestrated biophysical processes regulated by numerous key signaling pathways whose activities can be influenced downstream of a range of growth factor receptors. It is appreciated that these growth factor receptor-elicited signaling activities may be modulated in a "before" versus "after" manner by EMT induction [75], whether by TGFβ or other developmental cues or inflammation-related stimuli [76, 77].
However, a current challenge is to characterize this likely complex modulation from a multipathway network perspective and to establish an approach for predictive understanding of how the multiple pathway activities integrate to yield different migration behavior in post-induction compared with pre-induction conditions. This challenge is especially important for, among other motivations, gaining insights concerning how prospective targeted drug effects are influenced by whether tumor cells are in epithelial or mesenchymal state [78]. As one currently clinically urgent application example, the epidermal growth factor receptor (EGFR) is commonly overexpressed or mutated in epithelium-derived tumors, and its activation is linked to progression and poor prognosis [79]. Therefore, EGFR has been the target of many small molecule inhibitors and monoclonal antibody antagonists, which have met with limited clinical success [80, 81, 82]. Recent studies exploiting EMT markers and gene expression signatures suggest that cells with low levels of epithelial markers, such as E-cadherin, and high levels of mesenchymal protein expression, such as N-cadherin and vimentin, display resistance against these inhibitors [83, 84]. Therefore, the decreased sensitivity of mesenchymal-like tumors to EGFR antagonists argues for an ability to bypass EGFR dependence to activate the downstream signaling pathways necessary for cell migration and survival [85]. Cell activation through other receptors including the insulin-like growth factor-1 receptor (IGF-1R), fibroblast growth factor receptor (FGFR), and platelet-derived growth factor receptor (PDGFR), has been suggested to play a role in resistance to EGFR antagonists [84, 86].
Thus, improved understanding of how EMT-mediated changes in multiple growth factor signaling networks contribute to cell invasion may necessarily shift investigational focus toward the design of novel therapeutics targeting tangential tyrosine kinase pathways or intracellular signaling nexi for use in treating EGFR inhibition-resistant carcinomas. As a first multipathway network level study of how signaling pathway activities governing cell migration downstream of receptor tyrosine kinase stimulation differ between "before EMT" and "after EMT" conditions, we use here an established human mammary epithelial cell line (hMLE) immortalized and transformed via introduction of a minimal set of oncogenes [87] and focus on EMT induction by Twist1 [88], via its ectopic expression in hMLEs as previously characterized [89]. Twist expression has been demonstrated in multiple studies in vitro, in mouse models, and in human patients, to be associated with breast tumor invasiveness, metastasis, and poor disease prognosis (e.g. [89, 90, 91, 92]), and thus represents a pathophysiologically and clinically important system for analysis. It also may be as simple an induction process as can be examined, because other EMT inducers such as TGFβ and TNFα act via multiple transcription factors including Twist along with others [77], so our initial study here may indicate basic signaling network modulation insights that can be expanded upon in future analogous investigations of the more pleiotropic EMT inducers. In this basic study, we quantitatively characterize the migration characteristics of hMLEs before and after Twist-mediated induction in both monolayer (indicative of epithelial mode) and single cell (indicative of mesenchymal mode) migration assays under stimulation by a panel of growth factors present in carcinoma environments including EGF, HRG, IGF, and HGF [86, 93, 94, 95].
Across this broad landscape of extracellular treatment conditions, we measured phosphorylation states of 14 signaling pathway nodes to ascertain how Twist-mediated changes in many of these signals may be associated with consequent changes in the cell motility behaviors. Computational modeling with a partial least-squares regression (PLSR) framework demonstrated that quantitative combinations of multiple signals can account for the various motility behaviors across all growth factor treatments in both epithelial and mesenchymal migration modes and, in fact, can successfully predict a priori the motility behavior for epithelial and mesenchymal modes in a new growth factor context, PDGF stimulation. We then constructed a complementary computational model, using a correlative topology framework, to identify influences among the signaling nodes that were modulated by the Twist-mediated EMT induction.

3.2 Results

3.2.1 Diverse cell motility behavior and growth factor treatment responses in epithelial versus mesenchymal mode

hMLEs ectopically expressing a vector control or Twist1, a transcription factor previously shown to induce EMT, were used as a model of EMT-induced phenotypic switch (called "epithelial" or "pre-Twist", versus "mesenchymal" or "post-Twist" cells hereafter). Cells were cultured in serum-free medium upon seeding to assess growth factor-stimulated cell migration. The cells in epithelial and mesenchymal modes maintained their respectively appropriate EMT markers in this medium (Figs. 3-1, 3-2). Although invasive carcinomas and cells of mesenchymal developmental origins may invade as single cells, epithelial cells can also migrate but do so within established monolayers. To consider both types of migration, we seeded cells labeled with whole-cell tracking dye either sparsely to achieve single-cell migration or in a confluent monolayer with unlabeled cells for migration with cell-cell contact (Fig. 3-4A, B).
Upon serum-starvation, cells were treated with saturating levels of EGF. As anticipated, sparse post-Twist cells migrated significantly, whereas pre-Twist cells that were maintained as single cells throughout the experiment exhibited little movement (Fig. 3-4A). Pre-Twist cells with intact cell-cell contacts (Fig. 3-4A) or in a confluent monolayer (Fig. 3-4B) displayed significant locomotion, consistent with previous reports of mammary epithelial cells [97, 98]. In contrast, post-Twist cells exhibited a contact-mediated reduction in motility. Moreover, consistent with clinical observations [85], post-Twist cells displayed resistance to inhibition of invasion via inhibition of EGF signaling (Fig. 3-4C). Similar differences in motility behavior were observed with respect to invasion into a three-dimensional collagen I matrix (Fig. 3-5A); post-Twist cells invaded to a significant extent whereas pre-Twist cells did not.

Figure 3-1: EMT markers and receptor levels for the human mammary epithelial cell model. (A) Western Blot for E-cadherin as an epithelial marker and N-cadherin and vimentin as mesenchymal markers. (B) Western Blot for total levels of EGFR, HER2, IGF1R, Met, and PDGFRβ. Cells were seeded overnight and incubated in complete media (Serum) or serum-free media (Serum-free) for 24 hours before cell lysis. GAPDH is used as loading control.

Figure 3-2: Mesenchymal cells in monolayer lack E-cadherin junctions. Immunofluorescence images of epithelial (top) or mesenchymal (bottom) hMLER cells stained with an antibody against E-cadherin (left) and phalloidin (middle). Cells were seeded on coverslips in serum media for 24 hours. After a wash with PBS, cells were fixed with 4% paraformaldehyde and permeabilized with 0.2% Triton-X. Cells were blocked with 10% BSA and incubated with an antibody against E-cadherin (BD Biosciences, San Jose, CA) in a 1% BSA solution. After three washes with PBS, cells were incubated with an AlexaFluor 488 secondary antibody and phalloidin (Invitrogen, Carlsbad, CA). Mounted coverslips were imaged on a Deltavision (Applied Precision, Issaquah, WA).

Figure 3-3: Individual cell speed distributions of human mammary epithelial cells under various growth factor treatments. Box-and-whisker plots of individual cell speeds of hMLER cells under various growth factor treatments (SF, EGF, HRG, IGF, HGF, PDGF). Grey dots indicate measured average cell speed for individual cells. Edges of the boxes indicate the 25th and 75th percentiles and the whiskers indicate the 10th and 90th percentiles. The line in the box indicates the median and the cross the mean of the distribution. Fig. 3-4C, D contain the summary figure depicting mean ± S.E.

Figure 3-4: EMT and growth factor-dependent human mammary epithelial cell migration is contingent on its context, which is recapitulated by other human breast cancer cell lines. (A,B) DIC and epifluorescence overlay and cell tracks of epithelial (left) and mesenchymal (right) cells in the sparse (A) and monolayer (B) migration assay. Cells were labeled with a whole-cell dye CMFDA and either seeded sparsely (A) or mixed with unlabeled cells and seeded in confluence (B) before a 24-hour serum-starvation and treatment with saturating levels of EGF. After 1 hour of stimulation, migration tracks over 18 hours (red) were generated via semi-automated tracking of centroids (grey circles) of labeled cells. Time-lapse movies are provided under Movie S1 (in ref. [70]). (C,D) Cell speeds of epithelial (black) or mesenchymal (red) cells under stimulation of various growth factors quantified from the sparse (C) or monolayer (D) migration tracks. Cell speeds were calculated from cell tracks after 7 to 19 hours of stimulation. (E) Cell speeds of epithelial cells in monolayer migration assay (black) and mesenchymal cells in sparse migration assay (red) in the presence of varying levels of an EGF receptor kinase inhibitor, AG1478. AG1478 was added simultaneously with EGF. Cell speeds are normalized to their respective no-inhibitor control cell speeds. p < 0.0001 via two-way ANOVA between cell lines. All data are shown as mean ± S.E. Box-and-whisker plots of individual cell speeds are shown in Fig. 3-3. N = 269-390 (C,D) and N = 109-175 (E) cells for monolayer migration and N = 16-117 (C,D) and N = 31-66 (E) cells for sparse migration obtained from 3 independent biological replicates. *p < 0.05, **p < 0.01, ***p < 0.0001 compared to serum-free condition (C,D) or sparse condition (E) (see Experimental Procedures in ref. [70] for details on statistical analyses).

Figure 3-5: Migratory potentials of different epithelial-like versus mesenchymal-like cell types. (A) EGF-stimulated mesenchymal cells are highly migratory in three-dimensional collagen I matrix. Epithelial (top) and mesenchymal (bottom) cells were seeded in a neutralized 2.0 mg/mL collagen I solution. Upon gelation of collagen I, cells were serum-starved for 24 h and stimulated with saturating levels of EGF. Cells were imaged via phase-contrast microscopy over 6 h. Arrows indicate migratory mesenchymal cells in the three-dimensional collagen I matrix. Dashed lines are provided as a reference. Details of methodology can be found in ref. [96]. (B) Cell speeds of human breast cancer cell lines (BT549, T47D, MDA-MB-453, MDA-MB-231) migrating in complete medium in a sparse or monolayer migration assay. Cell lines are color-coded according to their widely accepted EMT state; red for mesenchymal and black for epithelial. All data are shown as mean ± S.E.

We also considered whether this differential behavior might be generalized to other breast tumor cell lines and similarly examined motility behavior of a panel of breast carcinoma cell lines in both confluent and sparse conditions. We found that lines representing the luminal subtype (T47D, MDA-MB-453) showed an epithelial pattern of migration, whereas lines representing the basal subtype (BT549, MDA-MB-231) diverged in their pattern of migration (Fig. 3-5B). However, the respective levels of Twist expression can explain the latter divergence: the MDA-MB-231 cells, which exhibited an epithelial-like migration pattern similar to the T47D and MDA-MB-453 cells, likewise express Twist at only low levels, whereas the BT549 cells, which exhibited a mesenchymal-like migration pattern, express Twist at a high level [99, 100]. Taken together, these findings indicate that motility behavior of the mammary epithelial cells is substantively altered by Twist expression and that insights gained in our model system may be relevant in at least some clinically relevant contexts.

To determine whether migration in response to carcinoma-related growth factors was altered upon EMT, we measured steady-state migration of epithelial and mesenchymal cells in response to EGF, HRG-β1, IGF-1, and HGF to activate the ErbB family, IGF1-R, and Met, respectively. Epithelial cells migrated very little as single cells for all stimuli, whereas single mesenchymal cells migrated robustly in response to select growth factors, notably EGF (Fig. 3-4D, Fig. 3-3).
Conversely, epithelial cells moved rapidly within monolayers, at or above the speeds attained by single mesenchymal cells, even in the absence of exogenous stimuli, with only modest enhancement by some of the growth factors (Fig. 3-4E, Fig. 3-3). Within monolayers, mesenchymal cells exhibited very low cell speeds that were enhanced only slightly by growth factor treatments. These results suggest that the degree of motility in both migratory modes is highly growth factor- and EMT-dependent. Each type of cell responded differentially to growth factor treatments based on its phenotype, indicating distinct processing of growth factor-elicited signals in pre-Twist versus post-Twist cells.

3.2.2 Quantitative analysis of growth factor-elicited multiple-pathway signaling network dynamics

We hypothesized that the changes in Twist-related gene expression could induce alteration of multiple pathways in the signaling network downstream of growth factor cues, leading to the observed EMT-dependent migratory responses. An exciting previous study has reported measurement of more than 1,000 biomolecular species at the mRNA, protein, phosphopeptide, or phosphoprotein level in tumors with epithelial and mesenchymal phenotypes to generate annotated molecular network graphs [74]. Our focus here, in contrast, is a quantitative, comparative analysis of changes in multipathway signaling network activities from before to after EMT induction in a particular cell line, with the goal of constructing computational model-based predictions of signaling pathway relationships to motility behavior. To achieve this, we assessed the early activation kinetics of 14 proteins downstream of receptor tyrosine kinase activation (Fig. 3-6A) in confluent pre-Twist cells and sparse post-Twist cells (at 0, 5, 10, 30, or 60 min after growth factor activation).
Measurements of 14 phosphosites over five time points, five growth factor treatments, two cell lines, and two to three technical replicates resulted in greater than 1,800 data points (Fig. 3-7). Interestingly, total receptor expression levels were not readily correlated with their activity in the context of EMT. For example, although EGFR activation was much greater in epithelial cells (Figs. 3-6B, 3-7) and the total EGF receptor levels were comparable or only slightly higher in epithelial cells (Fig. 3-1B), mesenchymal cells were strikingly more responsive to EGF treatment. This is not necessarily surprising, because it is appreciated that receptor expression changes (whether at mRNA or protein level) alone are typically not predictive of associated activity or inhibitor effectiveness; a prominent instance of this is the lack of correlation of EGFR expression in patient tumors with anti-EGFR kinase inhibitor efficacy (e.g., [101]). Thus, assays that focus on receptor expression levels may not by themselves effectively identify key targets for therapeutic intervention.

Figure 3-6: Basal phosphorylation changes in key migration signaling nodes in epithelial versus mesenchymal state. (A) Simplified schematic of receptor tyrosine kinase-activated signaling network involved in cell migration and the candidates for which phosphorylation was assessed via quantitation. Arrows indicate direct binding of the growth factors used in this study to their respective receptors. Solid lines indicate direct interactions between proteins whereas dashed lines indicate demonstrated indirect interactions. Some candidates have been grouped based on their demonstrated involvement in various biophysical processes of cell migration. This figure is intended as an illustration of the complexity of the signaling network and does not fully account for all components or interactions assessed. Phosphorylation of candidates with green or red 'P' symbols has been measured via a quantitative multiplexed bead-based ELISA assay or quantitative Western Blot assay, respectively. (B) Ratio of basal epithelial and mesenchymal phosphorylation, demonstrating changes in signaling before growth factor stimulation. Data are shown as mean ± S.E., based on error of measurements in each context. N = 2-3 biological replicates per context.

Figure 3-7: Early activation profiles of key regulators of cell migration exhibit altered signaling pathway activities upon Twist-induced EMT. Sixty-minute time courses of phosphorylation after stimulation of hMECs with various growth factors. Confluent epithelial cells (solid, circle) or sparsely seeded mesenchymal cells (dashed, diamond) were lysed at various times after stimulation with EGF (red), HRG (black), IGF-1 (blue), and HGF (green), and subjected to a high-throughput multiplexable bead-based ELISA or quantitative Western Blot using antibodies against various phosphorylation sites. Assay wells were loaded with equal mass of protein as assessed by a quantitative bicinchoninic acid assay. Mean fluorescence intensities (MFI) for Western Blots were obtained via densitometry. MFI values were normalized to the 0 min epithelial value within each phosphosite. N = 2-3 biological replicates. Data are shown as mean ± S.E.

The resulting activation profiles showed diverse kinetics across individual signals that were growth factor- and EMT state-dependent.
Basal phosphorylation levels were dependent on the EMT state, with EGFR, Met, Erk, Src, β-catenin, HSP27, and IRS-1 displaying significantly higher initial phosphorylation in epithelial cells, but Akt, GSK3α/β, PKCδ, PLCγ, and JNK displaying higher phosphorylation levels in mesenchymal cells (Fig. 3-6B). Dynamic changes in JNK, IRS-1, Src, HSP27, GSK3α/β, and β-catenin phosphorylation after growth factor treatment were cell-state specific and correlated with their initial phosphorylation levels (Fig. 3-7). However, activation of PKCδ and PLCγ along with EGFR canonical pathways Erk and Akt were relatively growth factor-dependent and insensitive to EMT state in most cases (Fig. 3-7). Visual inspection of signal differences across the diverse treatments and contexts offered little insight into which signals contribute most significantly to the profoundly different EMT-dependent migratory responses. The consequent implication is that cells must quantitatively integrate the activities of multiple signaling pathways to generate robust decisions concerning context- and treatment-dependent migration responses.

3.2.3 Node-to-node correlation topology model reveals quantitatively different signaling relationships between epithelial and mesenchymal states

Based on the inability of receptor expression to explain changes in growth factor responsiveness, striking changes in observed signaling, and the retained ability of cells to migrate in all contexts, we hypothesized that differences in downstream signaling might arise from quantitatively different signal-signal relationships downstream of receptor tyrosine kinase (RTK) activation in the epithelial vs. mesenchymal state. This is in addition to likely signaling-independent changes.
In order to investigate Twist-induced differences in node communication downstream of receptor signaling, correlative topology modeling was performed for the epithelial and mesenchymal contexts as described in section 3.4.1. Separate network topology inference models were constructed for each EMT cell state. Results using the Storey method are shown in Fig. 3-8, whereas the Bonferroni and Benjamini methods' results are shown in Fig. 3-9. In the context of network inference, each significant correlation value represents one undirected edge in the inferred network. In the epithelial state, using the Storey method with a false discovery rate (FDR) of 0.08 (p < 0.046) provides for 12 significant correlation values, resulting in an estimated 0.08 × 12 edges ≈ 1 false positive edge. In the mesenchymal state, using the Storey method with an FDR of 0.11 (p < 0.028) provides for 10 significant correlation values, resulting in an estimated 0.11 × 10 edges ≈ 1 false positive edge. Not surprisingly, the greatest number of node-to-node influence arcs was found in the Storey models, and the smallest number in the Bonferroni models, given the more conservative nature of the latter algorithm for assigning significance. Accordingly, all arcs found in the Bonferroni models were found in the corresponding Benjamini models and all arcs found in the Benjamini models were found in the corresponding Storey models.

Figure 3-8: Correlative topological modeling, comparing epithelial (left) and mesenchymal (right) situations, suggests quantitatively dominant nodes may arise from quantitatively different node-to-node influences. Edges between phosphorylation sites indicate statistically significant positive (black) or negative (red) Pearson correlation (Storey multiple hypothesis correction, ≈ 1 false positive edge). Edge end annotation indicates literature evidence for detected correlation (listed in Table 3.1), including direct phosphosite-specific or pathway-level evidence (arrowheads), protein-level evidence (diamond ends) and complex-level evidence (dot ends). Nodes with red font indicate phosphosites that signal for inhibition or degradation of the protein. Dashed edges have the highest first-order partial correlation p-value within each three-node triplet.

Figure 3-9: Correlative topological modeling suggests adjustment of node-to-node influence (pre-Twist and post-Twist networks). This figure is intended as an alternative to Fig. 3-8 using more rigorous multiple hypothesis testing. Using the Benjamini method for multiple hypothesis correction, in the epithelial state a false discovery rate of 0.15 (p < 0.02) was used, giving an estimated 0.15 × 8 edges = 1.2 false positive edges. In the mesenchymal state a false discovery rate of 0.10 (p < 0.015) was used, giving an estimated 0.10 × 9 edges = 0.9 false positive edges. With the exception of a now missing Akt-GSK3 edge, this mesenchymal state result matches the mesenchymal result using the Storey method shown in Fig. 3-8. A Bonferroni-corrected p < 0.05 (corresponding to p < 0.05/55, or p < 0.001, given the 55 correlation coefficients being considered in one cell state's network) was used to generate the epithelial and mesenchymal states' Bonferroni networks.

Table 3.1: Edge-specific literature evidence for epithelial and mesenchymal state network models in Figs. 3-8 and 3-9. Evidence is listed as site-specific (for evidence of the upstream node phosphorylating the downstream node at the measured phosphorylation site), protein-specific (for evidence of the upstream node phosphorylating the same type of amino acid, either pY or pS/pT, but at a different location on the protein), complex-level (for evidence of the two proteins represented by two correlated nodes being found in the same protein complex), or pathway-level (only true for PLCγ → PKCδ, via diacylglycerol).

Edge                                  Interaction Type    Reference
Epithelial State
  PKCδ → Akt (S473)                   Site-specific       [102]
  Src → β-catenin (pY)                Protein-level       [103]
  JNK → IRS-1 (pS)                    Protein-level       [104]
  FAK-PLCγ                            Complex-level       [105]
  PLCγ → PKCδ                         Pathway-level       [106]
Mesenchymal State
  PKCδ → GSK3α/β (S21/S9)             Site-specific       [107]
  PKCδ → β-catenin (S33/S37/S45)      Site-specific       [108]
  Src → PLCγ (pY)                     Protein-level       [109]
  PKCδ → IRS-1 (pS)                   Protein-level       [110]
  FAK-Erk                             Complex-level       [111]
Epithelial and Mesenchymal States
  Akt → GSK3α/β (S21/S9)              Site-specific       [112]
  Erk2 → IRS-1 (S636/S639)            Site-specific       [113]
  Akt-HSP27                           Complex-level       [114]

First-order partial correlation determines whether the correlation between two nodes may be explained by their mutual correlation with a third node [25]. First-order Pearson partial correlations were calculated for all three-node triplets present in the networks. A three-node triplet occurs when three nodes, A, B, and C, form a complete subnetwork, whereby significant pairwise correlation exists between nodes A and B, A and C, and B and C.
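To make the triplet test concrete, the following is a minimal sketch of a first-order Pearson partial correlation, using the standard formula r_AB·C = (r_AB - r_AC r_BC) / sqrt((1 - r_AC^2)(1 - r_BC^2)). The data vectors a, b, and c and the function names are hypothetical and chosen only for illustration; the sketch computes the coefficient alone and omits the p-value step used to choose the dashed edges.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical signals: a and b both follow c, each with its own small offsets,
# so their pairwise correlation is driven mostly by the shared node c.
c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
a = [ci + d for ci, d in zip(c, [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.2])]
b = [2 * ci + d for ci, d in zip(c, [-0.4, 0.4, 0.3, -0.3, -0.2, 0.2, 0.4, -0.4])]

r_ab = pearson(a, b)                 # strong pairwise correlation
r_ab_given_c = partial_corr(a, b, c) # much weaker once c is accounted for
print(f"r(a,b) = {r_ab:.3f}, r(a,b | c) = {r_ab_given_c:.3f}")
```

In this toy triplet the a-b edge would be the candidate for a dashed edge, because most of its correlation disappears after conditioning on c.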
Using the Storey method's resultant networks, in the epithelial network a three-node triplet exists between JNK, Erk1/2, and IRS-1. In the mesenchymal network, two three-node triplets exist: Akt, GSK3α/β, and HSP27; and PKCδ, GSK3α/β, and IRS-1. For each triplet, the partial correlation edge with the highest p-value is shown as a dashed edge in Fig. 3-8. These results suggest that the dashed edge in each triplet may exist because of mutual correlation with the third node. Striking differences in the set of significant edges suggest that network modulation occurs upon Twist-induced EMT, wherein information is processed via changes in the influence signaling pathway nodes have upon one another. Key similarities and differences will be noted, in the context of available literature information, in the Discussion section.

3.2.4 PLSR model-reduction analysis reveals quantitatively different pathway emphases between epithelial and mesenchymal modes

We next sought to identify which measured phosphosite signals were most predictive of cell migration speed in the epithelial versus mesenchymal state. Only average cell speed was experimentally determined for each growth factor condition (serum-free, EGF, HRG, IGF, HGF, PDGF), whereas signaling data were measured at five time points (0, 5, 10, 30, and 60 min) in each growth factor condition. To relate signals to cell speed, it was therefore necessary to summarize the signaling data in some manner for each growth factor condition. To do this, we calculated the area under the curve (AUC) of each signal's 0 to 60 min time course. In this manner, we could now calculate "summarized signal"-phenotype relationships, given the 6 growth factor conditions, a metric for the "quantity" of signal (AUC value) for each of the 11 signals in each growth factor condition, and the average cell speed values in each growth factor condition (Fig. 3-10).

Figure 3-10: Heatmaps indicate the relative signal levels, quantified using the area under the curve (AUC) of the signaling timecourse trajectory, of the 11 phosphorylation sites across the 6 growth factor contexts for the two cell states (epithelial monolayer and mesenchymal sparse). For display purposes only, the signal AUC values here were normalized relative to the maximum value of each signal. SF represents the serum-free condition; EGF, HRG, IGF, HGF, and PDGF represent RTK ligands.

Partial least squares regression (PLSR) offers a method for taking a collection of signals and reducing the signals to create a smaller number of so-called latent variables, which are also orthogonal (independent, uncorrelated) to one another, that can be used to predict the value(s) of an output variable(s) [115]. In our case we sought to use PLSR to relate the measured signals' AUC values to the measured average cell speed values in each growth factor condition. It is possible to use all 11 measured signals to predict cell speed, and then use the variable importance in the projection (VIP) score [116] to gain some insight into how important each signal is to the phenotype prediction. However, because there were only 11 total signals in the data set, we instead sought to use exhaustive feature selection methods to test signals' importance in predicting cell speed. To do this, PLSR models of varying size were constructed (Figs. 3-11, 3-12, and 3-13), using every combinatorial 3-, 4-, and 5-site subset of the 11 signaling measurements. Models in which the test and training error were both reduced relative to the full 11-site model were selected as high-scoring models. We focus on the results for the 4-site reduced models in Fig.
3-12, as they were the simplest models (simpler than 5-site models) that still had enough N-site models for enrichment analysis (the 3-site models did not). The statistical significance of observed phosphosite frequencies in the reduced PLSR models using 3, 4 and 5 phosphosites and different multiple hypothesis correction methods is summarized in Fig. 3-14.

Figure 3-11: 3-site reduced PLSR predictions for epithelial monolayer vs. mesenchymal sparse cell migration speed. (left) Test and training error for all possible three-variable, two-component combinations of signals in epithelial (top) and mesenchymal (bottom) situations. Red and blue spots respectively indicate the full model and reduced models with reduced error. (right) Variables selected for inclusion (blue) in models with reduced error, and test error relative to that of the full model. The small subplot with the 11 site names represents the fraction of high-scoring reduced PLSR models containing a given phosphosite.

Figure 3-12: 4-site reduced PLSR predictions for epithelial monolayer vs. mesenchymal sparse cell migration speed. (left) Test and training error for all possible four-variable, two-component combinations of signals in epithelial (top) and mesenchymal (bottom) situations. Red and blue spots respectively indicate the full model and reduced models with reduced error. (right) Variables selected for inclusion (blue) in models with reduced error, and test error relative to that of the full model. The small subplot with the 11 site names represents the fraction of high-scoring reduced PLSR models containing a given phosphosite.

Figure 3-13: 5-site reduced PLSR predictions for epithelial monolayer vs. mesenchymal sparse cell migration speed. (left) Test and training error for all possible five-variable, two-component combinations of signals in epithelial (top) and mesenchymal (bottom) situations. Red and blue spots respectively indicate the full model and reduced models with reduced error. (right) Variables selected for inclusion (blue) in models with reduced error, and test error relative to that of the full model. The small subplot with the 11 site names represents the fraction of high-scoring reduced PLSR models containing a given phosphosite.

Figure 3-14: Site enrichment in reduced PLSR models. The enrichment or depletion of individual phosphosites within the high-scoring subset of reduced PLSR models (i.e., models with mean test error and mean training error lower than the full 11-site PLSR model) was quantified using a two-tailed hypergeometric test. Enrichment was assessed in 3-, 4-, and 5-site reduced PLSR models for both the epithelial and mesenchymal states. Two multiple hypothesis correction methods were applied: the Benjamini false discovery rate, and the more stringent Bonferroni method. Akt and JNK were consistently enriched in the epithelial and mesenchymal models, respectively. (Thresholds listed in the figure: Benjamini FDR = 0.12 (p < 0.0473) for 3-site, FDR = 0.02 (p < 0.003) for 4-site, and FDR = 0.06 (p < 0.0331) for 5-site models; Bonferroni p < 0.0011. Reduced models scoring better than the full 11-site model: epithelial state, 3 of 165 (3-site), 18 of 330 (4-site), and 44 of 462 (5-site); mesenchymal state, 10 of 165 (3-site), 18 of 330 (4-site), and 36 of 462 (5-site).)

Given m high-scoring reduced models and the expected frequency of each site in those m models (each of the 11 sites appears in 120 of the 330, i.e., 11 choose 4, possible 4-site models, yielding an expected frequency of 0.36), we could estimate the likelihood of the observed frequency of each phosphosite in the set of high-scoring reduced models. The Benjamini [117] and Bonferroni [118] methods for multiple hypothesis correction were applied to the list of 22 p-values (11 phosphosites × 2 states = 22 p-values). Using an FDR of 0.02 (p < 0.003), in the epithelial state Akt and Src are enriched in the 4-site models, whereas GSK3α/β is depleted; and in the mesenchymal state JNK is enriched while HSP27 is depleted. This provides for an estimated 5 × 0.02 = 0.1 false positives. If we instead use a less conservative FDR of 0.06 (p < 0.013), in the epithelial state Akt and Src are enriched, whereas Erk, IRS-1, and GSK3α/β are depleted; and in the mesenchymal state JNK is enriched while Src and HSP27 are depleted. This less conservative FDR provides for an estimated 8 × 0.06 ≈ 0.5 false positives.

Tests for the likelihood of observed phosphosite frequencies in the reduced 3-site and 5-site PLSR models were also performed. For the 3-site models using the two-tailed Bonferroni-corrected p < 0.05, JNK was enriched in the mesenchymal state, whereas no sites appeared significantly differently than expected by chance in the epithelial state. Using the Benjamini method with an FDR of 0.12 (p < 0.0473), JNK was enriched and HSP27 was depleted in the mesenchymal state. No sites appeared significantly differently than expected by chance in the epithelial state. This provides for an estimated 2 × 0.12 = 0.24 false positives.
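The enrichment calculation for the 4-site models can be sketched as follows. The model counts (330 possible models, 120 containing any given site, 18 high-scoring models) come from the text; the observed site counts of 15 and 6 are hypothetical. The two-tailed p-value here sums the probabilities of all outcomes no more likely than the observed one, which is one common convention and an assumption of this sketch, since the exact two-tailed definition used for Fig. 3-14 is not stated.

```python
from math import comb

def hypergeom_pmf(k, n_total, n_with, n_draw):
    """P(exactly k of the n_draw high-scoring models contain the site)."""
    return comb(n_with, k) * comb(n_total - n_with, n_draw - k) / comb(n_total, n_draw)

def two_tailed_p(k, n_total, n_with, n_draw):
    """Sum the probabilities of all outcomes no more likely than the observed k."""
    lo = max(0, n_draw - (n_total - n_with))
    hi = min(n_with, n_draw)
    p_obs = hypergeom_pmf(k, n_total, n_with, n_draw)
    pmf = (hypergeom_pmf(i, n_total, n_with, n_draw) for i in range(lo, hi + 1))
    return sum(p for p in pmf if p <= p_obs * (1 + 1e-9))

n_sites, subset_size = 11, 4
n_models = comb(n_sites, subset_size)             # 330 possible 4-site models
n_with_site = comb(n_sites - 1, subset_size - 1)  # 120 of them contain a given site
expected_freq = n_with_site / n_models            # about 0.36, as in the text

n_high = 18  # high-scoring 4-site models (from the text)
# Hypothetical observed counts, for illustration only:
p_enriched = two_tailed_p(15, n_models, n_with_site, n_high)  # site in 15 of 18
p_typical = two_tailed_p(6, n_models, n_with_site, n_high)    # site in 6 of 18

# Expected false positives = FDR x number of significant calls (5 calls at FDR 0.02)
expected_false_pos = 0.02 * 5
print(expected_freq, p_enriched, p_typical, expected_false_pos)
```

A site found in 15 of the 18 high-scoring models falls far below the p < 0.003 threshold, whereas a site found in 6 of 18, close to the expected 0.36 × 18 ≈ 6.5, is not distinguishable from chance.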
For the 5-site models using the two-tail Bonferroni-corrected p < 0.05, in the mesenchymal state JNK and #-catenin were enriched, while no sites appeared less often than expected by chance. In the epithelial state, Akt, Src, and PLCy were enriched, while no sites appeared less often than expected by chance. Using the Benjamini method with a FDR of 0.06 (p < 0.0331), in the epithelial state Akt, Src, and PLC 1 were enriched, while IRS-1 and GSK3a/0 were depleted. In the mesenchymal state JNK, 0-catenin, Erk, and HSP27 were enriched, while Src was depleted. This provides for an estimated 0.06 x 10 = 0.6 false positives. It should be noted that while depletion of a phosphosite indicates decreased predictive ability of a site, it does not necessarily indicate a poor correlation of that site with cell speed. Rather, depletion may indicate redundancy between the predictive ability of phosphosites because two sites are themselves correlated. This can be explored by comparing the enrichment results in Fig. 3-14 to the correlation p-values shown in Fig. 3-15, which summarizes the most well correlated (p < 0.1) signal pairs among the epithelial and mesenchymal signals' AUC values. These signal-signal correlations are different than those previously summarized in Figs. 3-8 and 3-9, which calculated the correlations among signals' fold-change values from the four individual nonzero time points across the five ligand conditions, i.e., 20 data points. In Fig. 3-15 the signal-signal correlations are calculated using the area under the curve (AUC) of each signal's entire time course, for each of the six treatment conditions (five ligands plus serum-free), as shown in Fig. 3-10. Looking at Fig. 3-14, in the epithelial state none of the signal pairs between the enriched and depleted phosphosites are correlated (p < 0.1), indicating that the depleted sites are not redundant with the enriched signals. 
Further, none of the signals that are enriched are correlated (p < 0.1) with one another, indicating that the enriched signals are not redundant. In the mesenchymal state, JNK is enriched whereas HSP27 is depleted, and these two signals are also the most well correlated in the mesenchymal data. This is consistent with HSP27 being depleted in the high-scoring reduced PLSR models because it is redundant with JNK. Src, which is depleted in the 5-site mesenchymal reduced PLSR models, is poorly correlated (p > 0.1) with the enriched signals JNK and HSP27. Among the four enriched signals in the 5-site mesenchymal reduced PLSR models, only one of the six signal pairs (JNK and HSP27) is correlated (p < 0.1). Thus, depletion of a signal in the high-scoring reduced PLSR models may be due to redundancy with the enriched signal(s), but it is not always the case; and the enriched signals here are not redundant with one another, except for JNK and HSP27 in the mesenchymal 5-site reduced PLSR model.

Figure 3-15: The p-values of the Pearson correlations among signals' AUC values in the epithelial and mesenchymal states. The log10 of the p-values for all signal pairs with p < 0.1 are shown. These represent correlations among the signals' AUC values shown in Fig. 3-10. Depletion of signals in Fig. 3-14 could occur because the enriched and depleted signals are correlated. These results show this to be the case for the AUC values of JNK and HSP27, the most well correlated signal pair in the mesenchymal state data. [Panels: epithelial state and mesenchymal state; x-axis: p-value ranking.]
These results are consistent with notions of "minimum redundancy, maximum relevance" in identifying useful features for prediction [119], whereby features that are well correlated with the output but not well correlated with one another (i.e., not redundant) are useful for prediction.

It should also be noted that the leave-one-out cross-validation performed here can be sensitive to the case in which the data from two or more conditions are correlated and numerically similar. For example, if Condition A in the training set is sufficiently similar to Condition B in the test set, then the test error would be lower than it would be if Conditions A and B were dissimilar. To account for this effect, one could implement some type of stratified cross-validation [120], in which one tries to explicitly account for the potential similarities across conditions in the training and test sets. In our case here, we can inspect the similarity across signals' AUC values in the different growth factor conditions (Fig. 3-10). For example, the serum-free and PDGF conditions in the epithelial monolayer state, and the serum-free and the HRG conditions in the mesenchymal sparse state have some similarity in the signals' values. We have not accounted for these similarities in the cross-validation procedure, but one could implement a method for doing so. These two pairs of conditions (serum-free and PDGF, and serum-free and HRG) have the lowest average cell speeds in the epithelial and mesenchymal states, suggesting that the similar signal values correspond to similar signaling network states that produce similar and low cell migration speeds. This suggests we have measured enough signals that govern cell migration speed. In contrast, if two growth factor conditions had similar values across the 11 measured signals but produced very different cell speeds, it would suggest that we were not measuring the signals most important for governing cell migration speed.
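One simple way to account for similar conditions, sketched below in Python, is to hold out similar conditions together so that a condition in the test set never has a near-duplicate in the training set. This is a grouped (leave-one-group-out) variant and is not necessarily the stratified scheme of ref. [120]; it was not performed in this work. The function name and the grouping dictionary are hypothetical, and the grouping of serum-free with PDGF simply mirrors the epithelial example mentioned above.

```python
def leave_group_out_splits(conditions, groups):
    """Yield (train, test) condition lists.

    Conditions that share a group label are held out together; a condition
    without an entry in `groups` forms its own single-member group.
    """
    labels = []
    for cond in conditions:
        label = groups.get(cond, cond)
        if label not in labels:
            labels.append(label)
    for label in labels:
        test = [c for c in conditions if groups.get(c, c) == label]
        train = [c for c in conditions if groups.get(c, c) != label]
        yield train, test


conditions = ["SF", "EGF", "HRG", "IGF", "HGF", "PDGF"]

# Hypothetical grouping for the epithelial state: serum-free and PDGF
# are treated as similar and therefore always held out together.
epithelial_groups = {"SF": "SF+PDGF", "PDGF": "SF+PDGF"}

splits = list(leave_group_out_splits(conditions, epithelial_groups))
```

With this grouping the six conditions give five folds instead of six, which further reduces an already small training set; that trade-off would need to be weighed before adopting such a scheme.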
3.2.5 Linear regression predicts cell speed more accurately than PLSR models

Given that many of the reduced PLSR models had better prediction accuracy (both in terms of mean training error and mean test error) than the full 11-site PLSR models, including using as few as three phosphosites in the reduced model, we next sought to determine how accurately even simpler models could predict cell speed. To test this approach, we used linear regression. Just as was done for the reduced PLSR models, the area under the curve (AUC) values of the signals' time courses were used as input (as a metric of signal quantity). Importantly, this AUC approach was required to model the phenotypic data because we only had one phenotypic data point (i.e., one average cell speed value) for each condition. Thus we had to summarize all the time points' data into one "condition-specific" signal value. If, instead, we had phenotypic data available at each time point, then we could incorporate the signaling data from each corresponding time point into the prediction task.

In this case, there were 6 conditions (5 growth factor treatments and one serum-free condition) and 11 phosphosite signals. The system would be underdetermined (more unknowns than equations) if we were to try to assess the full multiple linear regression solution by including all 11 sites in the model. To make the system solvable, we must select 6 or fewer phosphosites (5 or fewer if a zero-order constant term is included in the linear regression model) to include in the multiple linear regression model. To provide the simplest analysis possible, we only considered linear regression models using one or two phosphosites as predictors.
In other words, the models took the form

\[ \text{Cell speed}_i = m_1 x_{1,i} + \beta \]
\[ \text{Cell speed}_i = m_1 x_{1,i} + m_2 x_{2,i} + \beta \]

where $x_{j,i}$ represents the time course AUC value for phosphosite $j$ under condition $i$, $m_j$ represents the linear regression coefficient associated with phosphosite $j$, and $\beta$ represents the model's constant term (equivalent to the y-intercept in a model with one input variable). Because the number of phosphosites is small (11 sites), all linear regression models containing one or two phosphosites as predictors could easily be exhaustively considered. In other words, all 11 one-site models were considered, and all "11 choose 2" = 55 two-site models were considered. To include more sites in the model, one could also exhaustively search all N-site models, or use a non-exhaustive feature selection procedure (e.g., stepwise regression [121], Lasso [122], elastic net [123]). Models were scored based on their mean training error and mean test error from leave-one-out cross-validation. In this case, because there were 6 conditions, this amounted to 6-fold cross-validation. Models were built using the regress function in MATLAB. Training and test errors were quantified using the absolute difference between the predicted and observed cell speeds, just as was done for the reduced PLSR models' errors.

Mean training and test errors for all one- and two-site linear regression models, for the epithelial monolayer and mesenchymal sparse migration modes separately, are shown in Fig. 3-16. These results show that even one-site linear regression models offer training and test error levels that are comparable to or even lower than the full 11-site PLSR models' errors. Expanding to two-site models generates more models that provide comparable or lower error rates compared to the 11-site PLSR models. The best one-site predictors (i.e., lowest training and test errors) for epithelial and mesenchymal cell states are Akt and JNK, respectively.

Figure 3-16: Prediction accuracy using 1- and 2-site linear regression models. (top row) One-site model errors of epithelial (blue) and mesenchymal (red) cell speeds. (bottom row) Two-site model errors of epithelial (blue) and mesenchymal (red) cell speeds. All subplots' axes are drawn to the same scale. [Panels: Epithelial and Mesenchymal; x-axis: Mean Training Error (µm/hr).]

Figure 3-17: The same data as shown in Fig. 3-16, but with zoomed in axes and labels for the signals used in the 1- and 2-site models. (top row) One-site model errors of epithelial (blue) and mesenchymal (red) cell speeds. (bottom row) Two-site model errors of epithelial (blue) and mesenchymal (red) cell speeds. Data points are labeled with the phosphosite(s) used for prediction. Each subplot has its own axes scale. [Panels: Epithelial and Mesenchymal; x-axis: Mean Training Error (µm/hr).]

If we zoom in on the axes (Fig. 3-17), we can see that there are no stand-out winners for best two-site predictors. For the epithelial state, Akt-FAK offers the lowest test error, but Akt-PLCγ offers a slightly lower training error.
Akt-β-catenin has a comparable test error to Akt-PLCγ, but a higher training error. For the mesenchymal state, most two-site predictors that include JNK perform comparably well, except when JNK is paired with FAK, PKCδ, or Src. Thus, the best performing two-site predictors contain the best one-site predictors (Akt for epithelial, JNK for mesenchymal).

Figure 3-18: The same linear regression models as shown in Fig. 3-16, but now quantified using percent error instead of absolute error. (top row) One-site model errors of epithelial (blue) and mesenchymal (red) cell speeds. (bottom row) Two-site model errors of epithelial (blue) and mesenchymal (red) cell speeds. This shows that the average percent training and average percent test errors for the best 1- and 2-site models were about 5% and 10%, respectively, for both the epithelial and mesenchymal states. All subplots' axes are drawn to the same scale. [Panels: Epithelial and Mesenchymal; x-axis: Mean Training Error (%).]

The best two-site predictors for the epithelial and mesenchymal states provide about 0.5 µm/hr mean training error, whereas the full 11-site PLSR model provides a mean training error of about 0.3 and 0.6 µm/hr for the epithelial and mesenchymal states, respectively. Thus the two-site linear regression and 11-site PLSR models provide comparable training errors. However, the best two-site predictors for the epithelial and mesenchymal states provide about 1-1.5 µm/hr mean test error, whereas the full 11-site PLSR model provides a mean test error of about 2 and 7 µm/hr for the epithelial and mesenchymal states, respectively. Thus the two-site linear regression models have much lower mean test errors than the 11-site PLSR models.
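The exhaustive search over one- and two-site linear regression models with leave-one-out cross-validation can be sketched as follows. This is a minimal Python stand-in, not the MATLAB regress code used in this work: it fits ordinary least squares with a constant term through the normal equations, does not handle singular (collinear) designs, and uses synthetic AUC and speed values with hypothetical site names "A", "B", and "C" purely for illustration.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(rows, y):
    """Least-squares fit with a constant term; returns [m_1, ..., m_k, beta]."""
    design = [list(r) + [1.0] for r in rows]
    k = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(ata, aty)


def predict(coef, row):
    """Predicted cell speed: sum of m_j * x_j plus the constant term."""
    return sum(c * v for c, v in zip(coef, row)) + coef[-1]


def loocv_errors(rows, y):
    """Mean absolute training and test errors over the leave-one-out folds."""
    n = len(y)
    train_err, test_err = [], []
    for held in range(n):
        tr = [i for i in range(n) if i != held]
        coef = fit([rows[i] for i in tr], [y[i] for i in tr])
        train_err.append(sum(abs(predict(coef, rows[i]) - y[i]) for i in tr) / len(tr))
        test_err.append(abs(predict(coef, rows[held]) - y[held]))
    return sum(train_err) / n, sum(test_err) / n


def score_all_models(auc, speed, max_sites=2):
    """Score every model of up to max_sites sites; auc maps site -> AUC per condition."""
    results = {}
    for k in range(1, max_sites + 1):
        for sites in combinations(sorted(auc), k):
            rows = [[auc[s][i] for s in sites] for i in range(len(speed))]
            results[sites] = loocv_errors(rows, speed)
    return results


# Synthetic data for six conditions; speed depends linearly on site "A" only.
auc = {"A": [1, 2, 3, 4, 5, 6], "B": [2, 1, 4, 3, 6, 5], "C": [5, 3, 6, 1, 2, 4]}
speed = [2 * a + 1 for a in auc["A"]]
results = score_all_models(auc, speed)
best = min(results, key=lambda sites: results[sites][1])
```

With 11 phosphosites the same loop would score the 11 one-site and 55 two-site models described above; here, with three synthetic sites, it scores six models.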
These absolute error values for the linear regression models correspond to about 5% average training error and about 10% average test error for the best-scoring models (Fig. 3-18). The training errors for the high-scoring 4-site reduced PLSR models are lower than or comparable to the training errors associated with the best two-site linear regression models. For the epithelial state, only the best 4-site PLSR models offer test errors comparable to the best two-site linear regression models. For the mesenchymal state, the best two-site linear regression models offer lower test error than the best 4-site reduced PLSR models. Thus, the two-site linear regression models generally provide lower test error than either the full 11-site or reduced 4-site PLSR models, for both epithelial and mesenchymal cell states.

The most accurate one-site linear regression predictors for the epithelial and mesenchymal states, Akt and JNK, respectively, are also the two sites that were most frequently observed in the high-scoring 3-, 4-, and 5-site reduced PLSR models (Figs. 3-11, 3-12, and 3-13). Indeed, JNK was significantly (p < 0.0011) enriched in the high-scoring 3-, 4-, and 5-site reduced PLSR models for the mesenchymal state, while Akt was significantly (p < 0.0011) enriched in the high-scoring 4- and 5-site reduced PLSR models for the epithelial state (Fig. 3-14).

To gain a better understanding of why certain phosphosites were being included or excluded from the high-scoring reduced PLSR and one- and two-site linear regression models, we plotted the signals' AUC values versus cell speed in a univariate fashion (Fig. 3-19). This is the simplest visual representation for the task of predicting cell phenotype (here cell migration speed) from signaling data (here phosphosite time course AUC values). Inspecting these plots, it becomes clear why some phosphosites were useful one-site predictors and some were not.
For example, in the epithelial state, the Akt signal is reasonably linear with cell speed, whereas in the mesenchymal state, the JNK signal is very linear with cell speed. This plot also allows one to understand why some phosphosites performed poorly. For example, in the epithelial state, HSP27 signal is correlated well with cell speed with the exception of the IGF growth factor condition. Similarly, in the mesenchymal state, PKCδ is correlated well with cell speed with the exception of the HGF growth factor condition.

The analyses discussed so far have developed separate models for the epithelial and mesenchymal cell states. This is based on the notion that the signals governing cell migration may be different between the epithelial and mesenchymal states, particularly since the epithelial signaling data were measured in the monolayer state, whereas the mesenchymal signaling data were measured in the sparse state. This notion is further supported by the fact that different signals do correlate well with cell speed in the epithelial and mesenchymal states. However, it is also possible to combine the signaling and migration data from the epithelial and mesenchymal states, and in doing so create a sort of "pan-EMT" model. This approach assumes that, even though the signaling and migration data were collected across different cell types (epithelial vs. mesenchymal) and in different contexts (monolayer vs. sparse), the signals governing cell migration may be universal, in some sense, and therefore be maintained across these diverse biological settings. By plotting the signaling and cell speed data from the epithelial and mesenchymal states on the same axes (Fig. 3-19, bottom two rows), we can see how signals relate to cell speed if we combine the data. Such a pan-EMT model, which used all 11 signals in a PLSR model, was presented in Figures 4 and 5 in Kim et al. [70].
When combining data, the relationship between JNK signal and cell speed that was so strong in the mesenchymal state is now lost, because the functional relationship between JNK and cell speed in the epithelial state is fundamentally different from that in the mesenchymal state. Further, the relationship between Akt signal and cell speed that was strong in the epithelial state is now much weaker in the combined state, because the relationship between Akt and cell speed in the mesenchymal state is weak. Combining the data can also change the sign of the signal-cell speed relationship (i.e., positive vs. negative slope). In the epithelial state, Src and β-catenin are both negatively correlated with cell speed (i.e., increasing the phosphorylation of those sites on Src and β-catenin decreases cell speed). When combining data, both of these negative slopes are lost; however, the combined data may highlight a biphasic relationship between Src and β-catenin signal level and pan-EMT cell migration speed. In spite of these differences, this chapter has focused on the biological and computational implications of separate epithelial and mesenchymal state models, not a combined state model.

3.3 Discussion

3.3.1 Excerpt from Discussion in Kim et al.

Our objective in this report has been to investigate how activities in multiple signaling pathways downstream of a range of receptor tyrosine kinases are changed between pre-EMT condition and post-EMT condition, especially with respect to their contributions to regulation of cell motility. We emphasize that we have not aimed to investigate signaling pathway activities involved in or responsible for the act of EMT induction per se, for that question has been addressed with great effectiveness by several laboratories during the past decade (e.g., [71, 72, 73, 74, 75, 76]).
We also note that our analysis focuses on EMT induced by ectopic expression of the Twist1 transcription factor, for two reasons: first, Twist expression is strongly implicated in clinical tumor biology; and, second, it may represent a relatively tightly defined dysregulation, compared with extracellular inducers such as TGFβ and TNFα, which alter expression of multiple EMT-associated transcription factors along with Twist [77].

Figure 3-19: Signals plotted versus cell speed in a univariate fashion. Each subplot represents a different phosphosite's data. The x-data points represent the area under the curve of each signal's time course for one condition. The y-data represent cell migration speed. Blue and red data points represent epithelial and mesenchymal cell states respectively. The first two subplot rows show only the epithelial data; the middle two subplot rows show only the mesenchymal data; and the bottom two rows show both the epithelial and mesenchymal data on the same axes. The y-axes are the same within each pair of rows because the same cell speed data is plotted with each signal. EGF, HRG, IGF, HGF, and PDGF represent RTK ligands; SF represents the serum-free condition. [x-axis: Signal Timecourse Integral.]

We have found that although both epithelial- and mesenchymal-like cells possess migratory potential, their growth factor-elicited behavior is substantively distinct with respect to contexts under which vigorous motility is exhibited. Epithelial cells are predominantly motile only within confluent monolayers in which cell-cell contacts are maintained (consistent with previous findings [97, 98]), whereas mesenchymal-like cells are motile mainly as individual cells and exhibit this best when sparsely distributed (Fig. 3-4). The responsiveness to any particular growth factor depends on whether the cells are in an epithelial or mesenchymal-like state.

With respect to EMT-associated alterations in growth factor-induced signaling network activities downstream of the stimuli/cues that might be critically involved in disparate motility responses, we showed that quantitative and dynamic properties of numerous phosphoprotein signaling nodes were comparatively modulated from pre- to post-Twist conditions across the different growth factor treatments (Figs. 3-6, 3-7). PLSR analysis successfully demonstrated that multipathway signaling information can be quantitatively integrated to account for motility behavior across all observed contexts (see Fig. 4 in [70]), and can even predict a priori the motility responses in both epithelial and mesenchymal situations to treatment by an additional growth factor, PDGF (see Fig. 5 in [70]). Finally, we analyzed each EMT-condition separately in order to identify differences in signaling.
Correlative topological modeling suggested Twist-dependent differences in terms of a network-level explanation for disparate motility responses (Fig. 3-8). Specifically, we propose a concept of "operational rewiring," in which the dominance of particular nodes on motility is altered by quantitative modulation of node-to-node influences.

3.3.2 Additional discussion

The additional computational analyses presented here that were not presented in Kim et al. [70] have focused on building separate predictive models for cell migration in the epithelial (pre-Twist) and mesenchymal (post-Twist) cell states. By quantifying enrichment of phosphosites in high-scoring reduced PLSR models, and quantifying the errors associated with all one- and two-site linear regression models, we have explicitly addressed which phosphosites are most predictive of cell speed in these two cell states. These efforts both converged on the same conclusion: Akt and JNK are most associated with cell migration in the epithelial and mesenchymal states, respectively. Neither of these signals is well correlated with cell speed when combining epithelial and mesenchymal data in a pan-EMT model.

Regarding the pre-Twist versus post-Twist network models (Figs. 3-8 and 3-9), an alternative visualization strategy for the differences in the node-to-node influences between the two states is to simply plot the node-to-node correlation values in the pre-Twist condition versus the same values in the post-Twist condition (Fig. 3-20). The advantage of this visualization, compared to the graphically displayed network models with nodes and edges drawn, is that it does not require one to select a threshold and lose information related to correlation values that did not exceed the threshold. Further, this visualization also allows one to immediately see which node-to-node influences were most changed across the two conditions, versus which influences were generally maintained across conditions.
The downside of this visualization strategy is that it allows visualization of only two conditions' node-to-node influences at one time (although one could show additional conditions' dimensions using 3-dimensional plotting, or by incorporating additional node size and/or node color schemes in a 2-D plot). In this experiment only two conditions were considered, so it was sufficient; however, in other experiments with n conditions to compare, one would need to display n × (n - 1)/2 pairwise plots. From a biological perspective, perhaps most interesting are node-to-node influences that changed sign from pre-Twist to post-Twist (e.g., β-catenin-GSK3α/β is negatively correlated pre-Twist, but positively correlated post-Twist; and FAK-Src is weakly positively correlated pre-Twist, but weakly negatively correlated post-Twist), which may highlight fundamental biological changes during EMT.

Figure 3-20: Raw signal-signal correlation values in pre-Twist vs. post-Twist plotted against one another. The x-data represent the Pearson correlation value between pairs of measured signals in the pre-Twist state, whereas the y-data represent the Pearson correlation value between the same pairs in the post-Twist state. Dashed boxes indicate borders of significance, including p < 0.1, p < 0.05, and p < 0.01, such that data points outside of a given dashed box ("outside" in the horizontal direction for pre-Twist, and the vertical direction for post-Twist) are significant to that level. Data points are colored to indicate significance (p < 0.05) in the post-Twist state only (orange), pre-Twist state only (cyan), both states (red), or neither (white). [Plot title: Comparing Pre- and Post-Twist Correlation Coefficients; x-axis: Pre-Twist Correlation Coefficient.]

From a computational perspective, these reduced PLSR and linear regression results show that feature selection (i.e., using only subsets of the measured signals) can significantly improve model accuracy when trying to predict phenotype from signaling data. Both the reduced PLSR models and the two-site linear regression models can be more accurate (lower mean test error) than PLSR models that use all measured phosphorylation sites. The lower prediction accuracy of the full PLSR models stems from a fundamental feature of PLSR: the algorithm tries to not only capture variance between the measured signals and the phenotypic output(s), but it also tries to capture the variance within the measured signals themselves. As such, signals with high variance will be given more weight by the PLSR algorithm, even if that signal does not correlate well with the phenotypic output. Here 'high variance' indicates the variance associated with the mean-centered and unit variance-scaled values.
Thus, in the case when a signal has small variance in the original data prior to scaling, this signal will subsequently have high variance in the unit variance-scaled data (because dividing the original data by a small variance value will inflate the unit variance-scaled values). Thus, as a side point, one should have sufficient variability in the original data values to reduce the likelihood of inflated unit variance-scaled values.

It should also be noted that feature selection has generally not been explored in published PLSR models (one exception is ref. [124], wherein the PLSR models were implemented using plsregress in MATLAB). This is in part due to the underlying methods used to build the PLSR models. In some software (e.g., SIMCA-P), building many different PLSR models (e.g., using different sets of N-site models) is a laborious process involving a lot of input from the user. In contrast, building PLSR models in a more computer language-type environment (e.g., MATLAB) enables facile exploration of thousands of different PLSR models automatically after entering a few lines of code. Such computing barriers should be kept in mind when choosing modeling software.

From an experimental perspective, given that just two phosphosites can provide reasonable predictions for cell speed, these results suggest that one could measure these few sites in a signaling experiment in lieu of actually performing the migration experiment. Such an approach could be useful if one wanted to obtain cell speed estimates across many different growth factor treatment conditions, for which obtaining cell speed estimates would take an undesirably long time. To be even more practical from an experimental perspective, one could repeat the model-building exercises using time point-specific signaling data instead of signal time course AUC values.
That way, one could simply measure signaling values at an individual time point(s) instead of needing to obtain measurements from an entire time course. This could save substantial amounts of time compared to performing the cell migration experiments directly.

From a biological perspective, these results indicate that separate signals are likely driving cell migration when cells are in an epithelial monolayer state (Fig. 3-4B, left) versus a mesenchymal sparse (isolated single cells) state (Fig. 3-4A, right). The importance of Akt in epithelial monolayer cell migration and JNK in mesenchymal sparse cell migration is consistent with literature regarding epithelial versus mesenchymal cell biology. It has been shown in Madin-Darby canine kidney (MDCK) epithelial cells that the engagement of E-cadherins is necessary and sufficient for the induction of Akt activity upon adherens junction assembly [125]. E-cadherins are integral membrane glycoproteins that serve as adhesion receptors and promote homophilic calcium-dependent cell-cell interactions. They are found within adherens-type junctions in epithelia [125]. Adherens junctions are dynamic structures that physically connect neighboring epithelial cells, and also couple intercellular adhesive contacts to the cytoskeleton [126].

JNK is known to phosphorylate paxillin and regulate sparse cell migration; a small molecule inhibitor of JNK, SP600125, inhibited the directed movement of sparsely distributed keratocyte cells [127]. Paxillin is a focal adhesion-associated, tyrosine-phosphorylated adapter protein [128] that is also involved in signaling from integrins [129]. Phosphorylated JNK is known to localize at focal adhesions [130]. Focal adhesions are macromolecular protein complexes that transmit the effects of the extracellular matrix to the actin cytoskeleton through integrins [131]. These observations draw links between Akt signaling and cell-cell adhesion, and JNK signaling and cell-substrate adhesion.
These results are consistent with the experimental design used here: the pre-Twist epithelial cells were observed in a monolayer state in which cell-cell contacts were maintained, while the post-Twist mesenchymal cells were observed in a single cell sparse state in which cell-cell contacts were not maintained, but cell-substrate contacts were. Further, the observation that pre-Twist cells in the monolayer condition moved in a sheet-like manner (for a discussion of sheet-like, or collective, cell migration see ref. [132]), while post-Twist cells in the monolayer condition moved individually (Aaron Meyer, personal communication, August 8, 2010), suggests that post-Twist cells do not form cell-cell adhesions even when the cells are near one another in a monolayer. The observations are consistent with the results showing that pre-Twist cells in a monolayer state maintain their E-cadherin junctions, but that post-Twist cells in a monolayer state do not (Fig. 3-2).

Additionally, further experimental testing has validated the computational predictions about the importance of JNK signaling in mesenchymal cell migration. Using three different small molecule inhibitors of JNK (JNK-IN-8, EMD Millipore; TCS-6o, Tocris Bioscience; and SP600125, Selleck Chemicals), each inhibitor significantly reduced the migration of 12Z endometrial cells. Further, SP600125 significantly reduced the migration of MDA-MB-231 "triple-negative" breast cancer cells; the other two inhibitors were not tested against the MDA-MB-231 cells (Miles Miller, personal communication, January 8, 2013). While these cell lines are derived from the epithelium, both lines exhibit essentially post-EMT mesenchymal features.
Taken together, these results suggest that pre-Twist epithelial cell migration is governed by the adhesivity of cell-cell contacts mediated by adherens junctions, while post-Twist mesenchymal cell migration is governed by the adhesivity of cell-substrate contacts mediated by focal adhesions. In this manner, the pre-Twist epithelial monolayer migration results may actually reflect non-pathological, sheet-like migration phenomena associated with wound healing and development, whereas the post-Twist mesenchymal sparse migration results may reflect the pathophysiological, invasive type of migration more relevant in cancer. Thus, the transition from physiological to pathophysiological cell migration may be driven by a transition from cell-cell to cell-substrate dependent adhesion and migration.

The results presented here, through a combination of cell signaling and migration experimental data, computational analyses, validation of computational predictions, and literature review, demonstrate that JNK is a key mediator of mesenchymal cell migration. Further, there is growing evidence for the role of JNK in not just migration, but in mediating EMT itself [133, 134, 135]. To strengthen the hypothesis about EMT as a transition from cell-cell to cell-substrate adhesion, future experiments would have to quantify the presence of focal adhesions in the epithelial and mesenchymal states. It remains unclear whether epithelial cell migration depends on both cell-cell and cell-substrate adhesion, such that cell-cell adhesion is lost during EMT while cell-substrate adhesion remains; or whether cell-substrate adhesion is weak in the epithelial state, such that EMT represents a transition from primarily cell-cell adhesion to primarily cell-substrate adhesion.

3.4 Methods

For a full description of the experimental materials and methods, see Kim et al. [70].
3.4.1 Correlation network modeling

Pairwise Pearson correlation was used to quantify the relatedness between signaling nodes in the epithelial and mesenchymal cell states. First, the geometric mean of the phosphorylation fold-change relative to time zero from two to three biological replicates was calculated for each nonreceptor phosphosite time course. Using only the four nonzero time points across five growth factor treatments, this gave 20 data points per phosphosite per cell state. Given 11 nonreceptor phosphosites, the Pearson correlation was then calculated between each pair of phosphosites using this 11 × 20 data matrix. The p-values for nonzero correlation were calculated using a Student's t distribution for a transformation of the correlation. This provided (11 × 10)/2 = 55 unique pairwise correlation coefficients and p-values, neglecting self-correlations. Three separate methods were applied to account for multiple hypothesis testing: Bonferroni [118], Benjamini [117], and Storey [136]. Bonferroni is the most conservative, and Storey the least conservative, of these alternative methods, with respect to assigning statistical significance to correlations.

3.4.2 Reduced PLSR models

To determine subsets of phosphorylation sites, out of the 11 non-receptor sites measured, that were most predictive of cell speed, reduced PLSR models were created in which N = 3, 4, or 5 of the 11 sites were used in the model. In each case, all combinations of N sites were considered. For example, there are "11 choose 4", or 330, 4-site models to create; separate models were analyzed for the epithelial and mesenchymal cell states. The data used to create the reduced PLSR models were the integral values of the time courses from the 11 non-receptor phosphosites across the six experimental conditions (serum-free, EGF, HRG, IGF, HGF, and PDGF treatment), providing 66 data points.
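As a quick check on the number of reduced models, the subsets of N = 3, 4, or 5 sites can be enumerated directly. This is a minimal sketch rather than the thesis code (which was written in MATLAB); the site labels are placeholders introduced only for illustration.

```python
from itertools import combinations
from math import comb

# Placeholder labels: the thesis measures 11 non-receptor phosphosites,
# but their names are not assumed here.
sites = [f"site_{i}" for i in range(1, 12)]

subset_counts = {}
for n in (3, 4, 5):
    subsets = list(combinations(sites, n))
    # Cross-check the enumeration against the binomial coefficient.
    assert len(subsets) == comb(len(sites), n)
    subset_counts[n] = len(subsets)

print(subset_counts)  # {3: 165, 4: 330, 5: 462}
```

Under these counts, each cell state would have 165 + 330 + 462 = 957 candidate reduced models across the three values of N.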
Data values were mean-centered and scaled to unit variance for each phosphosite prior to building the PLSR models. All reduced PLSR models used two principal components and were implemented in MATLAB (MathWorks, Natick, MA) using the plsregress function. Given the six experimental conditions, each subset of sites was used to train a model on five of the experimental conditions, and the resultant model was used to predict the cell speed in the sixth, left-out condition. For each subset of sites, the arithmetic mean training error and arithmetic mean test error were calculated across the six experimental conditions. The errors were the absolute difference between the predicted and observed cell speeds. High-scoring reduced PLSR models were defined as those with both a lower mean training error and a lower mean test error than the full 11-site PLSR model. To compare different high-scoring reduced models, their distance from the origin in a plot of training error versus test error was calculated. These distances, relative to the distance from the origin of the 11-site model's errors, were then plotted to compare the quality of the reduced models to the 11-site model (Reduced/Full Model Error, Figs. 3-11, 3-12, and 3-13).

The significance of observed phosphosite frequencies in the reduced PLSR models using 3, 4 and 5 phosphosites is summarized in Fig. 3-14. A two-tailed hypergeometric test was performed to determine if certain sites appeared in the high-scoring models more or less often than expected by chance. For the Bonferroni correction, the cumulative likelihood of observing sites more than a maximum frequency and less than a minimum frequency was a Bonferroni-corrected p < 0.05 (p < 0.025/22, or p < 0.0011, for each tail). The Bonferroni correction of 22 was chosen given the 11 phosphosites across the two cell states. False discovery rate corrections were calculated based on the Benjamini method [117].
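The scoring of reduced models described in Section 3.4.2 can be sketched as follows. This is an illustrative Python sketch, not the MATLAB code used in the thesis. The function names and the example error values are assumptions, and the PLSR fitting step itself is left out.

```python
from math import hypot

def mean_abs_error(predicted, observed):
    """Arithmetic mean of absolute prediction errors across conditions."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def score_reduced_model(reduced_train, reduced_test, full_train, full_test):
    """Return (is_high_scoring, reduced/full error ratio).

    A reduced model is high-scoring if both of its mean errors are lower
    than those of the full model. The ratio compares distances from the
    origin in the plane of training error versus test error.
    """
    is_high_scoring = reduced_train < full_train and reduced_test < full_test
    ratio = hypot(reduced_train, reduced_test) / hypot(full_train, full_test)
    return is_high_scoring, ratio

# Made-up mean errors for a full 11-site model and one reduced model.
high, ratio = score_reduced_model(0.15, 0.20, 0.30, 0.40)

# Per-tail Bonferroni threshold: 0.05 split over two tails and 22 tests.
per_tail_alpha = 0.05 / 2 / 22
```

With these made-up values the reduced model sits at half the full model's distance from the origin, so its Reduced/Full Model Error would be about 0.5. The per-tail threshold evaluates to roughly 0.0011, matching the value quoted above.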
Chapter 4

Receptor tyrosine kinases fall into distinct classes based on their inferred signaling networks

Note: This chapter forms the basis of a manuscript that has been submitted for publication, Wagner and Wolf-Yadlin et al. (2013) [137]. The author contributions for that manuscript are as follows: A.W.-Y., M.S., and G.M. designed experimental research. M.S. performed experimental research. A.W.-Y., J.K.G., and D.E.R. contributed shRNA reagents and support. J.P.W. and D.A.L. designed computational research. J.P.W. performed all computational research and data analysis following extraction of the microarray data (including data pre-processing, quantifying shRNA effects, network inference, application of CCLE data, and study of receptor-intrinsic properties). J.P.W. designed all figures except portions of Fig. 4-1. J.P.W., A.W.-Y., M.S., D.A.L., and G.M. wrote the paper.

4.1 Introduction

Receptor tyrosine kinases (RTKs) are critical effectors of cell fate that are expressed ubiquitously during development and throughout the adult body. Fifty-eight RTKs are encoded within the human genome, belonging to 20 subfamilies as defined by genetic phylogeny [1]. RTKs initiate intracellular signaling events that elicit diverse cellular responses such as survival, proliferation, differentiation, and motility [138]. Dysregulation of RTK-activated pathways, often a consequence of receptor overexpression, gene amplification, and/or genetic mutation, is a causal factor underlying numerous cancers, leading to an increasing number of FDA-approved RTK-targeted therapies [1]. It has become increasingly clear that co-activation of multiple RTKs limits the efficacy of RTK-targeted therapies [52] and can serve as a mechanism of acquired resistance (e.g., [139, 140, 141]). Recent work has also shown that stimulation of tumor cells with certain RTK ligands can rescue cells from therapies targeting other RTKs [142, 143].
Thus, it seems that certain RTKs have sufficient redundancy to compensate for other RTKs upon targeted inhibition. Exactly which RTKs exhibit this redundancy and why remains unclear. Here, using a set of engineered isogenic cell lines, we measured the dynamic signaling networks of six RTKs while simultaneously perturbing thirty-eight different signaling nodes singly and in combination using RNA interference (RNAi). Applying multiple computational network inference approaches to the data, we found that certain RTKs exhibit functional redundancy because they are able to induce similar downstream signaling networks. The six RTKs studied here fall into three classes based on their inferred networks, and these classes are consistent with clinically observed modes of resistance to RTK-targeted therapies.

4.2 Results

4.2.1 A systematic perturbation-based approach to uncover RTK-specific signaling networks

Reverse engineering of biological networks is an attempt to infer the underlying structure of regulatory networks from gene expression or signal transduction data using computational network inference algorithms [19]. Although these approaches often uncover important regulatory interactions, spurious correlations in gene expression or protein activation levels make it difficult to isolate direct, causal interactions. To circumvent this limitation, a number of efforts have used targeted perturbations [16], sometimes in conjunction with dynamic measurements [144, 145], to constrain network topology and to infer directionality between nodes. Here, we used this strategy to infer the topology of RTK-activated signaling networks by systematically perturbing network nodes using RNAi and broadly measuring network dynamics under each perturbation condition using high-throughput lysate microarrays (Fig. 4-1).

[Figure 4-1 schematic. Workflow steps: six isogenic, RTK-specific cell lines (EGFR, FGFR1, c-Met, IGF-1R, NTRK2, PDGFRβ); infection with lentiviral shRNA expression vectors targeting signaling nodes; stimulation of each cell line with RTK-specific ligand (0 to 256 min); printing of lysate microarrays and probing with PTM-specific antibodies; data processing; RTK-specific network-level data in biological quadruplicate.]

Figure 4-1: Data-rich, perturbation-based profiling uncovers RTK-specific signaling networks. Six isogenic cell lines expressing either EGFR, FGFR1, c-Met, IGF-1R, NTRK2 or PDGFRβ were treated with lentiviral shRNA expression vectors to modulate the cellular abundance of 38 downstream signaling proteins. Upon stimulation with RTK-specific ligands, time-dependent signaling events were monitored using high-throughput lysate microarrays. The resulting compendium of signaling measurements, consisting of over half a million individual data points, served as a starting point for computational analysis, allowing insight into the mechanisms underlying RTK specificity.

We focused on a representative subset of six phylogenetically diverse RTKs: epidermal growth factor receptor (EGFR or ErbB1), fibroblast growth factor receptor 1 (FGFR1), insulin-like growth factor 1 receptor (IGF-1R), hepatocyte growth factor receptor (c-Met), neurotrophic tyrosine kinase receptor type 2 (NTRK2 or TrkB), and platelet-derived growth factor receptor beta (PDGFRβ). To isolate the unique features of each RTK from potentially confounding differences in gene expression levels, we used a set of six otherwise isogenic cell lines, each of which expresses one of the six RTKs at comparable levels, and in which downstream signaling can be activated by treatment with cognate ligand [146]. Thirty-eight proteins within these cell lines were systematically perturbed by lentivirus-mediated RNAi [147], individually (Table S1 in ref. [137]) or in pools (Table S2 in ref.
[137]), using a total of 88 short hairpin RNA (shRNA) interventions with a median knockdown efficiency of 77%. To account for possible off-target reactivity of the RNAi reagents, two different shRNA clones, targeting different regions of the same transcript, were used for each gene. Our perturbations broadly covered the pathways downstream of RTKs, notably the PI3K/Akt, Ras/MAPK and PLCγ/PKC/Ca2+ pathways [148], as well as a number of phosphatases, cytoskeletal components, and receptor-proximal adaptor proteins. For each cell line and each RNAi intervention, we followed signaling activity by treating the cell lines with RTK ligands for 10 different durations ranging from 1 to 256 minutes (plus a zero time point control for each). With all experiments performed in biological quadruplicate, over 24,000 unique lysates were collected in our study.

To query the state of activation across key signaling pathways in each lysate in a multiplex fashion, we used lysate ("reverse-phase") microarray technology [40]. Using antibodies we had validated in a previous study, and using methodology developed therein [149], we quantified the relative levels of 22 phosphorylation sites on 21 signaling proteins in each of the >24,000 lysates (Table S3 in ref. [137]). Collectively, these signaling measurements, comprising over half a million independent data points, report on the state of the receptor/adaptor layer of signaling, the MAPK, Akt, PKC and calcium signaling cascades, and a variety of transcription factors. To our knowledge, this constitutes the largest signaling data set recorded to date. It is our hope that these data prove useful to the signaling and systems biology communities. Processed raw data and discretized median data are provided in Tables S4 and S5 (in ref. [137]), respectively.
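The quoted data set sizes can be checked with simple arithmetic. The sketch below assumes that the three control shRNAs described in Section 4.2.2 were profiled alongside the 88 network-targeted shRNAs in every cell line. That assumption is not stated explicitly in the text, but it gives counts consistent with "over 24,000" lysates and "over half a million" data points.

```python
n_cell_lines = 6
n_network_shrnas = 88
n_control_shrnas = 3   # assumption: the controls from Section 4.2.2 are included
n_time_points = 11     # 10 stimulation durations plus the zero time point
n_replicates = 4       # biological quadruplicate
n_phosphosites = 22

n_lysates = (
    n_cell_lines
    * (n_network_shrnas + n_control_shrnas)
    * n_time_points
    * n_replicates
)
n_measurements = n_lysates * n_phosphosites

print(n_lysates, n_measurements)  # 24024 528528
```

Without the control hairpins the same product gives 23,232 lysates, just under the stated figure, which is why the controls are assumed to be counted.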
4.2.2 RNAi perturbations reveal conserved Akt, MAPK, and PKC pathways across six RTKs

To assess the effect of each RNAi intervention on each measured network node, shRNA knockdown effects were first quantified for all 88 network-targeted shRNAs (shRNAs targeting signaling proteins) relative to three control shRNAs (two shRNAs targeting GFP and one empty-vector control). For every shRNA perturbation, we calculated the area under the curve (AUC) across the 11-point time courses; AUC was calculated separately for each RTK, each signal, and each biological replicate. A hairpin was considered to affect a signal only if its AUC was significantly greater than or less than the AUC values of all three controls (1% false discovery rate (FDR), two-sample t-test). Example hairpin effects are shown in Fig. 4-2A. We confirmed consistency among the biological replicates: 75% of RTK-signal pairs had a coefficient of variation < 10% (Fig. 4-2B). Additionally, Pearson correlation coefficients between the network-wide responses for pairs of shRNAs targeting the same gene exceeded 0.85 for 95% of the measured signals (Fig. 4-2C). shRNA pairs exhibiting lower correlation values generally did so in multiple cell lines (e.g., MEK2 has low correlation values for multiple cell lines), suggesting biological, rather than technical, origins for the variability. A conservative summary of network-wide shRNA effects, showing only significant effects (1% FDR) that were observed consistently with both hairpin clones, is shown in Fig. 4-2D. Receptor-specific shRNA effects, including effects that were observed for only one of the two shRNA clones targeting each gene, are provided in Figs. 4-3, 4-4, and 4-5.

Figure 4-2: Perturbations reveal specificity in RTK-induced signal transduction. (A) Time courses showing example shRNA perturbations that increase, decrease, and have no effect on p-ERK1/2 signaling in the IGF-1R cell line. Values shown are averages ± standard deviations of four biological replicates at each time point. Solid and empty squares of the same color represent data from two different shRNAs. (B) The coefficient of variation (c.v.) across four biological replicates was calculated at each time point under each shRNA perturbation for each measured signal. The median of the resultant c.v. values is shown. (C) The Pearson correlation between measured signals resulting from two shRNAs targeting the same gene, when considering all signals and time points together. (D) All six RTKs induce the canonical downstream pathways Ras/MAPK, PI3K/Akt and PKC/Ca2+, but each RTK activates its own complement of non-canonical signaling events. Phosphosite-perturbation pairs are categorized into increased (red shades) or decreased (blue shades) measured signals relative to shRNA controls, based on a threshold of statistical significance. Yellow outlines denote data points where the protein products of the shRNA targets contain measured phosphosites. (E) The large number of shRNA-induced increases in signaling across 1-3 RTKs indicates that negative regulatory interactions may play a key role in conferring specificity to RTK signaling networks.

Figure 4-3: shRNA effects for individual shRNAs using a 1% Storey FDR correction (less conservative). For a given phosphosite row, stacked bars above or below the horizontal line indicate increased or decreased signal effects, respectively. The six colors represent the six RTKs by the same color scheme as used in the main text, as initially introduced in Fig. 4-1: cyan, EGFR; purple, FGFR1; yellow, IGF-1R; red, c-Met; green, NTRK2; blue, PDGFRβ. Asterisks indicate shRNAs used in pools. Phosphosite rows and shRNA columns are organized by approximate pathway membership. Thin vertical lines separate different shRNA pools. Thick vertical lines separate pathways.

Figure 4-4: shRNA effects for individual shRNAs using a 1% Benjamini FDR correction (more conservative).

Tallying the number of significant perturbation effects across RTKs revealed that network connections within the Akt, MAPK, and PKC pathways are uniquely conserved across RTKs. Perturbations within each of these pathways, specifically (i) PI3K → PDPK1 → Akt → GSK3, (ii) Raf → MEK → ERK → p90RSK, and (iii) PLCγ → PKCβ/δ → MARCKS, are propagated throughout the entire pathway in the majority of cases across all six RTKs. In contrast, most other perturbation effects are observed only across small subsets of RTKs. Additionally, the directionality of relationships between signaling nodes is highly conserved across RTKs. Considering only shRNA effects that are consistent across both hairpin clones, all 107 of 107 shRNA effects observed in at least two cell lines affect the signal in the same direction (either increased or decreased) across cell lines. When pooled shRNAs are included, more than 96% (184/191) of shRNA effects show consistent directional effects. Even when pairs of shRNA clones that are not consistent with each other are included, 92% (565/612) of shRNA effects are directionally consistent.
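The hairpin-effect calls tallied above rest on a per-replicate AUC for each time course and an FDR-controlled comparison against the controls. The sketch below illustrates both steps under stated assumptions: the trapezoidal rule is assumed for the AUC (the text does not name the integration rule), the Benjamini-Hochberg step-up procedure stands in for the FDR correction, and the p-values are made-up inputs rather than outputs of the two-sample t-test.

```python
def trapezoid_auc(times, values):
    """Area under a time course by the trapezoidal rule (assumed rule)."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )

def benjamini_hochberg(pvalues, fdr):
    """Return booleans marking which p-values are rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# The 11 time points (minutes) used in the study.
times = [0, 1, 2, 4, 8, 16, 32, 64, 96, 128, 256]
auc_flat = trapezoid_auc(times, [1.0] * len(times))  # constant signal of 1

# Made-up p-values for four hypothetical hairpin-versus-control comparisons.
pvals = [0.0001, 0.004, 0.03, 0.2]
calls = benjamini_hochberg(pvals, fdr=0.01)
print(auc_flat, calls)  # 256.0 [True, True, False, False]
```

In the thesis, a hairpin is called as affecting a signal only when it differs significantly from all three controls, so in practice the three per-control calls for a hairpin would be combined with a logical AND.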
These results indicate that perturbation sensitivities across RTK-activated signaling networks, if they are present, are generally conserved (i.e., no reversal in directionality), but that RTKs use distinct subsets of the available RTK connectivity space.

In addition to many reduced signals, some phosphorylation sites exhibit increased levels following shRNA-mediated perturbations. These increases, however, tend to be conserved across fewer RTKs. Only one increased signal is observed across all six RTKs (p-MEK1/2 increases in response to ERK knockdown by the ERK shRNA pool), and only three increases are observed across five RTKs (p-MEK1/2 and p-ERK1/2 both increase in response to the GSK3 shRNA pool, and p-Akt increases in response to the PTPN pool). Notably, robust feedback within the MAPK pathway and crosstalk between the Akt and MAPK pathways are observed across all six RTKs. Although these regulatory events have been observed by others (feedback from Erk to Sos and from Erk to c-Raf [150], and negative regulation of MEK/ERK by GSK3 [151]), the extent to which they are conserved across RTKs was not previously appreciated. It is also notable that knockdowns resulting in increased signals across five or six RTKs are observed only with shRNA pools but not with single shRNAs, suggesting that protein isoform-specific roles across RTKs can be overcome by concurrently perturbing multiple isoforms.

The specificity of shRNA effects is quantified in Fig. 4-2E, summarizing the number of each type of colored square in Fig. 4-2D. This summary shows that most perturbations affect only 1-3 RTKs. Most pan-specific effects (affecting five or six RTKs) are decreases in signals at phosphosites within the MAPK, Akt, and PKC pathways. The observed effects across RTKs and across the measured signals are significant relative to a model that assumes shRNA effects occur randomly (Fig. 4-6, χ² test, p = 0).
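The random-placement null model behind Fig. 4-6 can be sketched as below, using the counts given in the figure caption (1,232 increased-signal effects scattered among 6 × 88 × 22 = 11,616 RTK-shRNA-phosphosite combinations). This is an illustrative sketch, not the thesis code: the function names, the random seed, and the reduced number of repeats are assumptions.

```python
import random
from math import comb

N_RTKS, N_SHRNAS, N_SITES = 6, 88, 22
N_PAIRS = N_SHRNAS * N_SITES   # 1,936 shRNA-phosphosite pairs
N_SLOTS = N_RTKS * N_PAIRS     # 11,616 RTK-shRNA-phosphosite combinations
N_EFFECTS = 1232               # increased-signal effects (Fig. 4-6 caption)

def simulate_once(rng):
    """Scatter effects at random; histogram of RTKs affected per pair."""
    per_pair = [0] * N_PAIRS
    for slot in rng.sample(range(N_SLOTS), N_EFFECTS):
        # Each pair owns exactly one slot per RTK, so counts stay in 0..6.
        per_pair[slot % N_PAIRS] += 1
    hist = [0] * (N_RTKS + 1)
    for k in per_pair:
        hist[k] += 1
    return hist

def hypergeometric_pmf(k):
    """P(a given pair is affected in exactly k of the six RTKs)."""
    return (
        comb(N_RTKS, k)
        * comb(N_SLOTS - N_RTKS, N_EFFECTS - k)
        / comb(N_SLOTS, N_EFFECTS)
    )

rng = random.Random(0)
n_sims = 200  # the thesis reports 2,500 repeats; fewer here for speed
hists = [simulate_once(rng) for _ in range(n_sims)]
sim_mean = [sum(h[k] for h in hists) / n_sims for k in range(N_RTKS + 1)]
expected = [N_PAIRS * hypergeometric_pmf(k) for k in range(N_RTKS + 1)]
```

Comparing `sim_mean` with `expected` shows the convergence to the hypergeometric distribution noted in the caption; the observed tallies would then be compared against the spread of the simulated histograms.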
Thus, although the RTKs share many of the same pathways, they exhibit different levels of sensitivity to targeted perturbations.

[Figure 4-6 plots. Panels (A) and (B) each show increased-signal and decreased-signal effects. x-axes: number of RTKs affected (A) and number of sites affected (B). Series: simulations, simulation average, theoretical (hypergeometric), and observed.]

Figure 4-6: Observed shRNA-induced effects across (A) RTKs and (B) phosphosites are not consistent with a model in which shRNA effects are randomly distributed. The total number of increased (1,232) and decreased (1,346) signal effects resulting from the Storey 1% FDR correction were randomly distributed in silico among the 11,616 RTK-shRNA-phosphosite combinations (6 RTKs × 88 shRNAs × 22 phosphosites). The number of increased and decreased signal effects was then tallied across the RTKs and across the phosphosites. This simulation was repeated 2,500 times. The simulation average converged to the hypergeometric distribution. Because the distributions of randomly distributed shRNA effects are not consistent with the distributions of observed shRNA effects, often with empirical p << 1/2,500 (4 × 10⁻⁴) when comparing individual values in the distributions, we conclude that the distributions of shRNA effects across RTKs and across phosphosites are significantly non-random.

Changes in phosphorylation signals following shRNA perturbations can arise from one or more of the following phenomena: (1) reduction in the concentration of a kinase or phosphatase, directly affecting the phosphorylation levels of its substrate; (2) transcriptional, translational, or post-translational feedback or compensation in the network; and (3) modulation of scaffold or protein complex stoichiometries [152], including decreases in the concentrations of proteins to which phosphatases dock. We expect the non-specific effect of lentiviral infection itself to be minimal because, rather than being dominated by a possible "infection signal", many hairpins exhibit different effects on the signaling network compared to empty vector and GFP-targeted control hairpins. Post-translational feedback in the network may function through feedback reactions (such as Erk phosphorylating inhibitory sites on c-Raf/Raf-1, ref. [150]), or through more indirect effects. As an example of the latter, competitive inhibition-type effects may occur when there is an increase in the amount of enzyme available for a given substrate following a reduction in the concentration of one of that enzyme's other substrates [153]. This effect has been called retroactivity. For example, if a kinase has multiple substrates, reducing the concentration of one substrate may increase phosphorylation of its other substrates. Similarly, if a phosphatase has multiple substrates, reducing the concentration of one may decrease phosphorylation of its others. This concept has been considered theoretically in the context of kinase inhibitors [154], where inhibiting a kinase can turn on a quiescent parallel pathway.
To our knowledge, however, this has not been considered in the context of RNAi perturbations. While demonstrating specific cases of this effect is not our goal here, we simply submit that some shRNA perturbation effects may stem from this nonintuitive indirect phenomenon. Lastly, observed shRNA effects are not just a function of which proteins are phosphorylated by a kinase (or dephosphorylated by a phosphatase), but rather which residues are phosphorylated or dephosphorylated. Thus, the absence of an shRNA effect at a particular phosphosite does not necessarily imply that the corresponding proteins are not connected, as they may functionally relate through a phosphosite that is not measured in our study.

4.2.3 Data-driven network inference reveals three RTK classes

To better understand the signaling network topology and dynamics underlying these perturbation-induced effects, network inference was performed using each cell line's data separately. The zero minute time point was separated to represent the basal, unstimulated network state, and the remaining ten time points were grouped into three time scales based on k-means clustering of the temporal data across all RTKs. Although the algorithm did not require it, the resulting four time scales contained only contiguous time points: basal (0 min), early (1, 2 min), intermediate (4, 8, 16 min), and late (32, 64, 96, 128, 256 min). This provided four time scales for each of the six RTKs, yielding 24 different data subsets.

The use of complementary network inference methods can improve confidence by circumventing the biases inherent in any single algorithm [155]. We therefore used five different network inference algorithms: Bayesian networks [50], mutual information [156], context likelihood of relatedness (CLR) [27], Spearman correlation, and Pearson correlation. Each of these five methods was applied to the 24 different data subsets across RTKs and time scales, yielding 24 different network states per method.

[Figure 4-7 plots. One panel per inference method, including mutual information, CLR, Spearman correlation, and Pearson correlation. Axes: multidimensional scaling eigenvalues. Annotations: Cluster 1, Cluster 2, Cluster 3; marker legend for the six RTK cell lines and four time scales.]

Figure 4-7: Clustering RTK-specific network models reveals three RTK classes. Connectivities of RTK signaling networks were derived from our large-scale signaling data using five different network inference algorithms. Relationships between RTK-specific networks were then visualized in two dimensions using multidimensional scaling. Marker color denotes receptor cell line. Marker size denotes the four time scales from basal (smallest markers) to late (largest markers). Dashed outlines and marker shapes represent k-means clustering assignments.

Figure 4-8: Network model clusters are robust to an intermediate range of applied edge weight thresholds. Results for clustering the network model structures, here visualized using the first two eigenvalues from multidimensional scaling, for each of the five inference methods across a range of edge weight thresholds. Thresholds were determined by varying the percentile ranking of edge weights for each method. Percentiles increase from left to right, and are indicated at the top of each row. Plot marker shapes indicate cluster assignments, while marker size indicates time scale (basal = smallest, late = largest). The six colors represent the six RTKs by the same color scheme as used in the main text, as initially introduced in Fig. 4-1: cyan, EGFR; purple, FGFR1; yellow, IGF-1R; red, c-Met; green, NTRK2; blue, PDGFRβ. The following percentile ranges provide robust clustering of the three RTK network classes for each inference method: Spearman (50-70), Pearson (30-70), mutual information (50-70), CLR (30-70), and Bayesian (40-50).

To visualize differences in the inferred network structures across RTKs and time scales, adjacency matrices describing the topology of the inferred networks were analyzed using multidimensional scaling [157] (Fig. 4-7). Remarkably, the five inference methods consistently revealed three distinct RTK classes: EGFR/FGFR1/c-Met, IGF-1R/NTRK2, and PDGFRβ, regardless of the time scale. These three RTK classes are robust, as they are maintained across a wide range of network model edge weight thresholds (Fig. 4-8).
This indicates that the signaling networks downstream of these six RTKs operate according to three identifiable programs, and that the majority of variation in the inferred network structures arises from differences between RTKs, rather than between time scales. Inferred signal-signal relationships are largely maintained across time for a given RTK.

4.2.4 Consensus across inference methods reveals RTK class-specific signaling

To determine which edges account for the differences in network topology across the three observed RTK classes, a consensus network was developed using all five inference methods and all four time scales for each RTK. This approach identified edges consistently observed within one, two, or all three RTK classes (Fig. 4-9). Because variation in network structure arises primarily from different RTKs rather than different time scales, this consensus approach highlights RTK class-specific edges conserved across most time scales.

The consensus network reveals a striking pan-RTK signaling core shared by all six RTKs, along with sets of RTK class-specific edges (Fig. 4-10A). Notably, the IGF-1R/NTRK2 and PDGFRβ networks both contain fewer edges than the EGFR/FGFR1/c-Met network, with all edges in the IGF-1R/NTRK2 network and all except one edge in the PDGFRβ network also present in the EGFR/FGFR1/c-Met network (Fig. 4-10B). This suggests that, among the measured phosphosites, the EGFR, FGFR1, and c-Met receptors exhibit a greater degree of coordination in their responses to growth factor stimulation, as a denser network implies more highly correlated signal-signal relationships compared to the sparser IGF-1R/NTRK2 and PDGFRβ networks.

Figure 4-9: Identifying RTK class-specific edges through consensus network edge frequency. Heatmap values indicate the fraction of network models (when considering all five inference methods and all four time scales) containing the indicated edge.
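The two steps behind the consensus analysis, thresholding edge weights by percentile and then scoring each edge by the fraction of network models that contain it, can be sketched as follows. This is a minimal sketch under stated assumptions: the nearest-rank percentile convention, the toy node names and weights, and the 0.6 consensus cutoff are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import ceil

def threshold_edges(edge_weights, percentile):
    """Keep edges at or above the given percentile of all edge weights.

    Uses the nearest-rank percentile convention (an assumption).
    """
    weights = sorted(edge_weights.values())
    rank = max(1, ceil(len(weights) * percentile / 100))
    cutoff = weights[rank - 1]
    return {edge for edge, w in edge_weights.items() if w >= cutoff}

def edge_frequency(models):
    """Fraction of network models (edge sets) that contain each edge."""
    counts = Counter(edge for model in models for edge in model)
    return {edge: n / len(models) for edge, n in counts.items()}

# Toy edge weights for one inferred network (hypothetical nodes A-D).
weights = {("A", "B"): 0.9, ("B", "C"): 0.5, ("A", "C"): 0.1, ("C", "D"): 0.7}
model1 = threshold_edges(weights, 50)

# Two further toy models, standing in for other methods or time scales.
model2 = {("A", "B"), ("B", "C")}
model3 = {("A", "B")}

freq = edge_frequency([model1, model2, model3])
consensus = {edge for edge, f in freq.items() if f >= 0.6}  # hypothetical cutoff
```

In the thesis setting there would be 20 models per RTK (five inference methods across four time scales), and the resulting frequencies correspond to the heatmap values in Fig. 4-9.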
The pan-RTK backbone identified through our network modeling approach contains the conserved MAPK, Akt, and PKC pathways, as previously highlighted by our direct analysis of shRNA-induced effects. In addition, it contains a variety of other conserved directional edges. The majority of RTK class-specific edges are signals related to c-Cbl, Shc, paxillin, and calmodulin, suggesting that these receptor-proximal signaling influences may play a central role in mediating RTK class-specific responses.

Figure 4-10: Network models' consensus reveals core RTK signaling backbone and RTK class-specific interactions. (A) RTK backbone edges are shown in thick black edges, while class-specific edges are colored. Nodes are colored according to their approximate biological function. Tyrosine and serine/threonine-containing phosphorylation epitopes are shown as ovals and boxes, respectively. (B) A Venn diagram showing shared and class-specific edges across the three RTK classes. All IGF-1R/NTRK2 edges and all but one of the PDGFRβ edges are present in the EGFR/FGFR1/c-Met network. (C) Median signal values (across all time points, shRNA conditions, and biological replicates) for each phosphosite relative to the EGFR cell line.

Some nodes in the RTK class network models have no inputs, and thus have no directed path from the receptor phosphorylation site (labeled 'RTK' in Fig. 4-10A) to the node. This can occur across all RTKs (for example, Akt has no inputs in any of the RTK classes' models), or within an RTK class (for example, the MAPK cascade beginning with c-Raf has no inputs in the PDGFRβ class network, but has c-Cbl as an input in the other RTK classes' models). In the latter case, this does not necessarily imply that the node without an input is unphosphorylated in the RTK cell line(s) from that class compared to the other cell lines.
For example, the median phosphorylation level of c-Raf in the PDGFRβ cell line is comparable to or higher than that in the other cell lines (Fig. 4-10C). Instead, nodes lacking inputs for some or all RTK classes are likely under the influence of unmeasured signal(s), termed hidden nodes, or under the influence of other measured signal(s), but in a potentially complex way not captured by the model.

To determine if clustering the raw data directly could recapitulate the network model clusters, the raw data were clustered using (1) median signal values across all time points, (2) signal values from all time points, (3) signal values from each time scale, or (4) signal values from each time point (Fig. 4-11). Data from all shRNA perturbations were used in each case. The network model clusters were recapitulated using the raw data in only two of 16 clustering scenarios (late time scale and 256 min.), and one of the three clusters was recapitulated in an additional three scenarios (0, 2, and 4 min.). That all five network inference methods highlighted the same three RTK classes, but clustering of the raw data generally did not, suggests that inferred network topologies contain information not accessible by clustering the raw data directly.

To explore this notion further, we generated synthetic data from networks with four different known topologies (see section 4.4.16). We simulated five sample data sets per network, and then attempted to classify the resultant twenty data sets according to their underlying network using either the raw data or the network topologies inferred from the raw data. The inferred topologies clearly segregated according to their underlying network, whereas the raw data did not (Fig. 4-12).
This further supports the notion that inferred network topologies, in which relationships between measured signals are explicitly quantified, provide insight into the multivariate structures underlying raw data beyond what can be observed by clustering the raw data directly. This strengthens our case for using the identified RTK network model clusters as relevant indicators of signaling network differences among the six RTKs.

Figure 4-11: Clustering the raw data directly. The raw data were clustered using (A) the median signal values, (B) all time points together, (C) each time scale separately, or (D) individual time points. Cases where one of three raw data clusters matched the network model clusters are shown with blue titles, while cases where all three raw data clusters matched the network model clusters are shown with red titles. Marker shapes indicate cluster assignments. The six marker colors represent the six RTKs by the same color scheme as used in the main text, as initially introduced in Fig. 4-1: cyan, EGFR; purple, FGFR1; yellow, IGF-1R; red, c-Met; green, NTRK2; blue, PDGFRβ.

Figure 4-12: Clustering network topologies inferred from simulated data reveals underlying network differences but clustering raw data does not. (A) Four synthetic networks used to simulate data. Network structures are defined based on edges between individually numbered nodes, whose positions vary from network to network. (B) Simulated data sets for five independent samples from each of the four networks. Rows of each heatmap represent nodes and columns represent conditions. All heatmaps are shown using the same colorbar scale. (C) Using principal component analysis to visualize the raw data, normalized raw data, and Spearman correlation matrices, and multidimensional scaling to visualize the binary Spearman correlation matrix (correlation values exceeding the 60th percentile). Marker colors indicate which of the four synthetic networks the data were generated from. The inferred network topologies (i.e., correlation matrices) cluster according to their underlying networks, but the raw data do not.
[Panel B axis: signal values across 200 conditions. Panel C titles: Raw Data (PCA); Normalized Raw Data (PCA); Continuous Correlation Matrix (PCA); Binary Correlation Matrix (Jaccard Distance).]

Figure 4-13: Observed distribution of gene expression values in the CCLE. All gene expression values in the CCLE were included (when considering all 18,926 genes across all 967 cell lines, i.e., 18,926 x 967 ≈ 18.3 million gene expression values). The bimodal nature of the plot suggested a natural range over which to consider genes to be expressed versus not expressed.
[x-axis: mRNA expression level (RMA).]
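Returning to the comparison in Fig. 4-12C, the idea of comparing data sets by their inferred topology rather than by their raw values can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the thesis code: each data set is reduced to the set of node pairs whose absolute Spearman correlation exceeds the 60th percentile of all pairwise values (whether absolute or signed values were thresholded is an assumption here), and data sets are then compared by Jaccard distance. Function names and toy values are hypothetical.

```python
from itertools import combinations
from math import ceil, sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def binary_topology(data, percentile=60):
    """Node pairs whose |rho| exceeds the given percentile of all pairwise |rho|.

    `data` maps each node name to its signal values across conditions.
    """
    rho = {
        (u, v): abs(spearman(data[u], data[v]))
        for u, v in combinations(sorted(data), 2)
    }
    ordered = sorted(rho.values())
    cutoff = ordered[ceil(len(ordered) * percentile / 100) - 1]  # nearest-rank percentile
    return {pair for pair, value in rho.items() if value > cutoff}


def jaccard_distance(edges_a, edges_b):
    """1 - |intersection| / |union| of two edge sets (0.0 if both are empty)."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(union)


# Two toy samples in which node a tracks node b, and one in which a tracks c.
sample_1 = {"a": [1, 2, 3, 4, 5, 6], "b": [2, 4, 5, 8, 10, 13], "c": [3, 1, 6, 2, 5, 4]}
sample_2 = {"a": [6, 5, 4, 3, 2, 1], "b": [12, 11, 7, 6, 3, 1], "c": [2, 5, 1, 6, 3, 4]}
sample_3 = {"a": [1, 2, 3, 4, 5, 6], "b": [4, 1, 6, 2, 5, 3], "c": [1, 3, 4, 7, 9, 12]}

topo_1, topo_2, topo_3 = (binary_topology(s) for s in (sample_1, sample_2, sample_3))
same_network = jaccard_distance(topo_1, topo_2)
different_network = jaccard_distance(topo_1, topo_3)
```

In the toy data, samples 1 and 2 have quite different raw values yet identical binarized topologies, whereas sample 3 shares node a's raw values with sample 1 but has a different topology. A distance matrix built this way could then be passed to multidimensional scaling, as the caption describes.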
4.2.5 RTKs and ligands are co-expressed in cancer cell lines and enriched in certain solid tumor types

To determine the degree of expression of the six RTKs and ligands used in this study in relevant cancer cell lines, we analyzed the Cancer Cell Line Encyclopedia (CCLE) data set, which includes mRNA expression values for ~19,000 genes in 967 cancer cell lines [158]. Expression values exceeding five on an RMA (robust multi-chip average) scale were used to define expressed genes. This threshold was chosen based on the observed bimodal distribution of RMA values across all genes in the CCLE (Fig. 4-13). EGFR, FGFR1, MET, and IGF1R were widely expressed (97, 96, 81, and 95% of cell lines, respectively), whereas PDGFRB and NTRK2 were expressed in only 23% and 4% of cell lines, respectively. The degree of co-expression for RTK and ligand pairs varied across the six RTKs (Fig. 4-14A). Because of the nature of the CCLE experiment design, any observed gene expression would be limited to expression in tumor cells and not stromal cells. We used co-expression of receptor and ligand as an indicator of potential autocrine activation of these RTKs.

Figure 4-14: RTK and ligand expression in CCLE cell lines. (A) Co-expression of receptors (black) and the ligands used in this study (red) across 967 cell lines in the CCLE. (B) Considering co-expression of multiple cognate ligands in the CCLE increases the number of cell lines co-expressing receptor and at least one ligand. (C) Gene expression levels of the six RTKs displayed in principal component space. (D) mRNA expression values for EGFR, MET, and FGFR1 plotted against one another. Red circles indicate cell lines with greater than median expression values of the three RTKs. Tumor histologies (E) enriched or (F) depleted for co-expression of EGFR/FGFR1/MET. Red markers indicate cell lines derived from the indicated tumor histology type.
[Panel B ligand sets: EGFR, EGF or HBEGF; FGFR1, FGF1, 2, 4, 5, or 6; NTRK2, BDNF or NTF3; PDGFRβ, PDGFB or D. Panel E histologies: carcinoma, glioma, malignant melanoma. Panel F histologies: hematopoietic neoplasm, lymphoid neoplasm, neuroblastoma.]

The low co-expression of some receptors and the ligands used in our study may be partly explained by the fact that some receptors can be activated by multiple ligands. For example, if in addition to considering the ligands used in our experiments we consider additional ligands that can activate EGFR (ref. [159]), FGFR1 (ref. [160]), NTRK2 (ref. [161]) and PDGFRβ (ref. [162]), we see that most cell lines expressing an RTK express at least one cognate ligand (Fig. 4-14B). We estimate that the signaling networks induced by these other family ligands would be similar to those induced by the ligands used in our study. c-Met is only known to be activated by HGF, and data for IGF2 (another ligand for IGF-1R) and NTF4 (another ligand for NTRK2) were not available in the CCLE. Low co-expression of some receptors and ligands may also be partly explained by the fact that some receptors are more commonly activated in a paracrine manner, and thus we would not expect high receptor and ligand co-expression in tumor cells alone. For example, c-Met is often activated by HGF secreted from stromal cells [163], although HGF-independent activation may also occur [164]. Thus we observe that the RTKs used in this study and one or more of their cognate ligands are co-expressed across many cell lines in the CCLE. Co-expression of the receptors and ligands is robust to the RMA threshold used to define expression (Fig. 4-15).
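The co-expression calls described above reduce to thresholding RMA values and counting cell lines. The following is a minimal sketch under stated assumptions: expression is held in a nested dictionary keyed by cell line and gene symbol, the cell line names and RMA values are invented for illustration, and the function name is hypothetical. Only the RMA > 5 default and the EGF/HBEGF ligand pair come from the text and Fig. 4-14B.

```python
def coexpression_fraction(expression, receptor, ligands, threshold=5.0):
    """Among cell lines expressing `receptor`, the fraction that also
    express at least one of `ligands`.

    A gene counts as expressed when its RMA value exceeds `threshold`.
    """
    receptor_lines = [
        line for line, genes in expression.items() if genes[receptor] > threshold
    ]
    if not receptor_lines:
        return 0.0
    with_ligand = [
        line
        for line in receptor_lines
        if any(expression[line][ligand] > threshold for ligand in ligands)
    ]
    return len(with_ligand) / len(receptor_lines)


# Invented RMA values for four hypothetical cell lines.
toy_expression = {
    "line_A": {"EGFR": 9.1, "EGF": 3.2, "HBEGF": 7.4},
    "line_B": {"EGFR": 8.0, "EGF": 6.1, "HBEGF": 4.0},
    "line_C": {"EGFR": 7.5, "EGF": 3.0, "HBEGF": 3.9},
    "line_D": {"EGFR": 4.2, "EGF": 8.8, "HBEGF": 9.0},
}

# Only the ligand used in the experiments versus any cognate ligand.
single_ligand = coexpression_fraction(toy_expression, "EGFR", ["EGF"])
any_ligand = coexpression_fraction(toy_expression, "EGFR", ["EGF", "HBEGF"])

# Repeating the call over several thresholds mirrors the robustness check.
by_threshold = {
    t: coexpression_fraction(toy_expression, "EGFR", ["EGF", "HBEGF"], threshold=t)
    for t in (4.0, 5.0, 6.0)
}
```

As in the text, allowing any cognate ligand raises the co-expressing fraction relative to considering a single ligand.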
Having established widespread co-expression of receptor and ligands in the CCLE, we next sought to determine which tumor types were co-expressing the RTKs used in our study. To provide a two-dimensional visual representation in which cell lines segregate according to global differences in their gene expression levels, principal component analysis (PCA) was applied to the matrix of ~19,000 gene expression values across all 967 cell lines. The resulting layout of the 967 cell lines in principal component space is shown in Fig. 4-14C. PC1 and PC2 explain 8.1% and 4.6% of the variance across cell lines, respectively. The color of each circle represents the expression level of the RTK in each cell line. These PCA results show that many cancer cell lines express multiple RTKs. Further, these data show that EGFR, FGFR1, MET, and PDGFRB are expressed at high levels only in particular cell types, whereas IGF1R is expressed at high levels in nearly all cell types, and NTRK2 is only expressed in a small subset of cell types.

Figure 4-15: Co-expression of the receptors and ligands for multiple RMA thresholds. The co-expression of receptors and cognate ligands, as shown by Venn diagrams, is robust to the RMA threshold used to define expression when considering (A) only ligands used in this study, or (B) multiple RTK-activating ligands. Different RMA thresholds are shown down the rows, while different RTKs are shown across columns.

Given the number of cell lines co-expressing EGFR, FGFR1, and MET, and the similarities of the EGFR, FGFR1, and c-Met network models, we sought to identify which tumor types were co-expressing these three RTKs. Plotting the expression levels of these three RTKs on the same axes indicates that they are indeed co-expressed at high levels in many cell lines (Fig. 4-14D).
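One way to ask whether a tumor histology is over- or under-represented among co-expressing cell lines is a hypergeometric test, the distribution named in the caption of Fig. 4-16. The sketch below is an assumption-laden illustration rather than the thesis implementation: the use of one-sided tail probabilities is assumed, the function names are hypothetical, and the toy counts are invented.

```python
from math import comb


def hypergeom_pmf(k, total, signature, histology):
    """P(X = k): probability that exactly k of the `histology` cell lines
    carry the signature when `signature` of `total` lines carry it."""
    return (
        comb(signature, k)
        * comb(total - signature, histology - k)
        / comb(total, histology)
    )


def histology_enrichment(total, signature, histology, observed):
    """Expected overlap plus one-sided enrichment and depletion p-values.

    total:     all cell lines considered
    signature: lines co-expressing the three RTKs
    histology: lines of the histology being tested
    observed:  lines of that histology that also co-express the RTKs
    """
    expected = histology * signature / total
    lowest = max(0, histology - (total - signature))
    highest = min(histology, signature)
    p_enriched = sum(
        hypergeom_pmf(k, total, signature, histology)
        for k in range(observed, highest + 1)
    )
    p_depleted = sum(
        hypergeom_pmf(k, total, signature, histology)
        for k in range(lowest, observed + 1)
    )
    return expected, p_enriched, p_depleted


# Invented counts: 20 lines, 8 with the signature, 5 of one histology, 4 overlapping.
toy_expected, toy_p_enriched, toy_p_depleted = histology_enrichment(20, 8, 5, 4)
```

The expected count corresponds to the grey points in Fig. 4-16 and the observed count to the red points; a small enrichment or depletion p-value would then be subject to multiple-testing correction across histologies.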
Using information in the CCLE about the original tumor histology of each cell line, we calculated which tumor histologies had more or less EGFR/FGFR1/MET co-expression than expected by chance. Carcinoma, glioma, and melanoma cell lines were enriched for EGFR/FGFR1/MET co-expression (p = 8.7 x 10^-26, p = 3.0 x 10^-6, and p = 1.9 x 10^-3, respectively, Fig. 4-14E), whereas hematopoietic neoplasm, lymphoid neoplasm, and neuroblastoma cell lines were depleted (p = 2.0 x 10^-30, p = 1.7 x 10^-24, and p = 7.8 x 10- , respectively, Fig. 4-14F). These results are robust to the RMA threshold used to define the co-expression signature, with the exception of the melanoma cell lines. At higher expression thresholds, melanoma cell lines are actually depleted for the RTK co-expression, suggesting that the three RTKs are not all expressed at high levels in the same melanoma cell lines. A full assessment of EGFR/FGFR1/MET co-expression enrichment or depletion across 20 tumor histologies in the CCLE, as a function of expression level threshold, is shown in Fig. 4-16. Overall, these results suggest that specific patterns of RTK co-expression (e.g., EGFR/FGFR1/MET) are overrepresented in certain tumor types.

We propose that the pre-existing co-expression of multiple RTKs from the same network model class within many cell lines in the CCLE is consistent with notions of primary (or intrinsic) resistance to RTK-targeted inhibitors. This redundancy of RTK networks may also be at play in the development of longer term acquired resistance to RTK inhibition, through selection of subpopulations of cells with higher levels of compensatory RTKs, and/or feedback within the same cells that increases expression of compensatory RTKs.

Figure 4-16: Cell line histology enrichment results for multiple RMA thresholds. The enrichment or depletion of cell lines with EGFR/FGFR1/MET co-expression in cell lines with different histological origins is shown. The values on the x-axis indicate the RMA threshold used to first identify which cell lines were co-expressing the three RTKs. Grey data points indicate the number of cell lines from each histology type expected, under the hypergeometric distribution, to also co-express these three RTKs. Red data points indicate the actual observed number of cell lines from each histology type that also co-express these three RTKs. Note that subplots have different y-axis scales. Significance (indicated by black circles) was determined using a 5% FDR (p < 0.0191) with the Benjamini method.
[x-axis: RMA threshold used to define EGFR, FGFR1, and c-Met co-expression signature. Legend: significantly enriched or depleted for EGFR/FGFR1/c-Met co-expression.]

4.2.6 RTK network class genes are correlated with responses to RTK-targeted therapies

Given the results that certain RTKs fall into classes with shared inferred network topologies, that these RTKs are frequently co-expressed in cancer cell lines, and that certain tumor types are enriched for this co-expression, we next sought to assess if RTK co-expression had implications for response to RTK-targeted therapies. We first asked whether RTKs within a network class were correlated with resistance to therapies targeting RTKs in that same class. This is based on the notion that RTKs within the same class appear to have shared underlying signaling network topologies, and thus these RTKs may be more capable of compensating for inhibition of other RTKs in that class.
For example, resistance to EGFR inhibitors may be mediated more effectively by FGFR1 and c-Met than by IGF-1R, NTRK2, or PDGFRβ. To test this hypothesis, we again turned to the CCLE data set. In addition to gene expression data, the CCLE contains cell growth inhibition data across 500 cell lines for 24 anticancer compounds, including EGFR, FGFR1, c-Met, and IGF-1R kinase inhibitors (erlotinib, TKI258, PHA-665752, and AEW541, respectively).

There are caveats associated with using the CCLE data to assess drug resistance mechanisms. First, the kinase inhibitors' off-target effects may complicate interpretation of these resistance profiles [165]. In the case of the EGFR inhibitor erlotinib, the c-Met inhibitor PHA-665752, and the multi-kinase inhibitor TKI258 that also targets FGFR1, we see that these compounds have many off-target effects compared to, for example, the EGFR/HER2 inhibitor lapatinib (Fig. 4-17A). These off-target effects likely induce their own minor resistance mechanisms in concert with the resistance mechanisms for each compound's primary target(s). Regarding TKI258, although it binds 18 kinases with greater affinity than it binds FGFR1, many of these other kinases are expressed at low levels across the CCLE cell lines (Fig. 4-18). A second caveat is that the receptors and their ligands are not always co-expressed, as noted in Fig. 4-14, so ligand-mediated receptor activation is less likely for some receptors. Nevertheless, we believe that meaningful conclusions can be drawn from a comparison of network structure similarity and small molecule growth inhibition data.

Figure 4-17: RTK class genes are correlated with anti-RTK therapy response. (A) Affinity of kinase inhibitors lapatinib, erlotinib, PHA-665752, and TKI258 for 442 kinases (data from ref. [165]). On-target effects are shown in red bars. (B) Correlating RTK gene expression with responses to EGFR, FGFR1, c-Met, and IGF-1R inhibitors across hundreds of cancer cell lines. Cell lines with RMA expression values >5 or <5 are shown in blue or grey, respectively. Values above subplots are Spearman correlation coefficients and p-values. Red lines indicate linear fits to the data. Genes significantly correlated (1% FDR) with drug response are shown with red font titles.

Figure 4-18: Gene expression values of tightest TKI258 kinase binders. The gene expression values of the 24 kinases that bind TKI258 most tightly are shown, as indicated by the color of each marker in principal component space (analogous to the plots in Fig. 4-14C). Kinases are sorted in order of decreasing affinity from upper left to lower right. Each marker corresponds to one cell line in the CCLE. Although TKI258 binds to 18 kinases more tightly than it binds to FGFR1, many of these other binders are expressed at low levels in the CCLE. All subplots are shown on the same color scale.

To quantify the relationship between gene expression and response to a given drug, we calculated the Spearman correlation coefficient between each gene's expression values and the activity area for that drug in each drug-treated cell line. Activity area is a metric for growth inhibition: greater activity area implies more growth inhibition, and thus increased sensitivity to the drug. Gene expression values that positively correlate with activity area thus denote sensitivity to the drug, whereas negatively correlated genes denote resistance. In this manner, we used the diversity across ~500 cell lines (irrespective of their histological origin, copy number variation, or mutation status) to understand how variation in gene expression correlated with variation in drug response.

We considered significant correlations among expression levels of each of the six RTKs and six cognate ligands and the four relevant inhibitors (1% FDR, p < 0.0054). Consistent with expectations, we saw that EGFR and IGF1R expression were correlated with sensitivity to the EGFR and IGF-1R inhibitors, respectively. More interestingly, EGFR expression was correlated with resistance to the c-Met and FGFR1 inhibitors; FGFR1 expression was correlated with resistance to the EGFR inhibitor; and MET expression was correlated with resistance to the FGFR1 inhibitor. NTRK2 and PDGFRB expression were not significantly correlated with responses to any of the four inhibitors (Fig. 4-17B). Among the six cognate ligands, BDNF (the ligand for NTRK2) was correlated with resistance to the IGF-1R inhibitor, and IGF1 was correlated with resistance to the EGFR inhibitor.
None of the other ligands (EGF, FGF1, HGF, PDGFB) were significantly correlated with responses to any of the four inhibitors (see Fig. S12 in ref. [137]). These correlations were calculated using only cell lines that expressed a given gene with RMA > 5. Using correlations is notable because, rather than just considering a binary relationship between the presence/absence of RTK expression and drug response, they capture the graded relationship between the continuous level of RTK expression and drug response. In other words, it is the quantitative level of gene expression that can be considered relevant, not just the presence or absence of an RTK. To confirm this finding, the same analysis was performed using different RMA thresholds (> 0, 4, 4.5, 5.5, 6) for both receptors and ligands (see Fig. S12 in ref. [137]). All noted receptor-drug correlations are robust at low thresholds, whereas only the erlotinib-EGFR and AEW541-IGF1R correlations are maintained for RMA > 5.5 and > 6. Additionally, at low thresholds MET expression is correlated with resistance to the c-Met inhibitor, and PDGFRB expression is correlated with resistance to the EGFR inhibitor. Regarding ligands, the AEW541-BDNF correlation is maintained at all low thresholds, but because fewer cell lines strongly express the ligands as compared to the receptors, the ligand-drug correlations are less robust to the RMA threshold.

Thus, despite the caveats associated with the CCLE, we see several cases of intra-class genes associated with resistance (EGFR with the c-Met and FGFR1 inhibitors, FGFR1 with the EGFR inhibitor, MET with the FGFR1 inhibitor, and BDNF with the IGF-1R inhibitor), and only one case of an inter-class resistance mechanism (IGF1 with the EGFR inhibitor) using the RMA > 5 threshold. This supports the notion that co-expression of same-class RTKs may contribute to resistance to RTK-targeted therapies.
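The expression-versus-response analysis above combines a rank correlation with a false discovery rate cutoff, and can be sketched as below. This is a hedged illustration, not the thesis code: the text names only "the Benjamini method", so the Benjamini-Hochberg step-up procedure is assumed; all expression values, activity areas, and p-values are invented; and the function names are hypothetical. In practice a library routine such as `scipy.stats.spearmanr` would also supply the p-values, which this standard-library sketch does not compute.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


def interpret(rho):
    """Positive correlation with activity area denotes sensitivity, negative resistance."""
    return "sensitivity" if rho > 0 else "resistance"


def benjamini_hochberg_cutoff(pvalues, fdr=0.01):
    """Largest p-value declared significant at the given FDR, or None.

    Step-up rule: find the largest i with p_(i) <= fdr * i / m.
    """
    m = len(pvalues)
    cutoff = None
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= fdr * i / m:
            cutoff = p
    return cutoff


# Invented data: one gene's RMA values and a drug's activity areas in five cell lines.
toy_expression = [5.2, 6.0, 7.1, 8.3, 9.0]
toy_activity_area = [3.0, 2.4, 2.5, 1.1, 0.6]
toy_rho = spearman(toy_expression, toy_activity_area)
toy_call = interpret(toy_rho)

# Invented p-values for five gene-drug pairs.
toy_pvalues = [0.0004, 0.003, 0.02, 0.3, 0.6]
cutoff_1pct = benjamini_hochberg_cutoff(toy_pvalues, fdr=0.01)
cutoff_5pct = benjamini_hochberg_cutoff(toy_pvalues, fdr=0.05)
```

Restricting the inputs to cell lines above a chosen RMA threshold before calling `spearman` would mirror the threshold sweep described in the text.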
To strengthen the argument that expression of an individual RTK correlates with drug response, we calculated the Spearman partial correlation coefficients [25] between each RTK gene and each drug, and each ligand gene and each drug, while controlling for the expression of the remaining five RTK genes or five ligand genes (Fig. 4-19). These results indicate that most correlations identified in Fig. 4-17B are maintained for the corresponding partial correlations, providing stronger evidence that the correlation between an RTK and drug response is due to that RTK individually and not because of correlation between RTKs' gene expression values. The significant (5% FDR) receptor partial correlation values are constant for all RMA thresholds up to RMA > 4.5, whereas the ligand partial correlation values are only constant for the two lowest RMA thresholds. The receptor partial correlations are consistent with the three RTK network classes, with the exception of PDGFRB correlating with resistance to erlotinib. However, the ligand partial correlations are not as consistent with the RTK network classes, perhaps because multiple ligands can activate some receptors and therefore analyzing one ligand is insufficient.

Figure 4-19: The Spearman partial correlation coefficients between each receptor gene and each drug, and each ligand gene and each drug, while controlling for the expression of the remaining five RTK genes or remaining five ligand genes. Each subplot shows a different RMA threshold that was applied to the gene of interest. For each RMA threshold a 5% Benjamini false discovery rate was applied, with p-values noted.

4.3 Discussion

In this study, we integrated pathway-level phosphorylation measurements, RNAi perturbations, and computational network inference to quantify signaling network specificity across six RTKs. The shRNA perturbations revealed a core set of Akt, MAPK, and PKC pathways conserved across all six RTKs, which were recapitulated in RTK-specific network inference models, along with additional RTK-specific signaling relationships. Importantly, the six RTKs' network models clustered into three classes: EGFR/FGFR1/c-Met, IGF-1R/NTRK2, and PDGFRβ. Using gene expression data from the CCLE, we showed co-expression of RTK and ligand pairs across many cancer cell lines, along with enrichment for EGFR/FGFR1/MET co-expression in carcinoma, glioma, and malignant melanoma cell types. Using corresponding anticancer drug response data from the CCLE, we showed evidence for intra-class resistance mechanisms prevailing over inter-class mechanisms, whereby expression of EGFR was correlated with resistance to c-Met and FGFR1 inhibition, expression of FGFR1 was correlated with resistance to EGFR inhibition, expression of MET was correlated with resistance to FGFR1 inhibition, and expression of BDNF (the ligand for NTRK2) was correlated with resistance to IGF-1R inhibition. The relationships between EGFR and c-Met inhibition, FGFR1 and EGFR inhibition, and MET and FGFR1 inhibition were also maintained for partial correlation calculations.

The novel application of systematic RNAi perturbations in concert with pathway-wide signaling measurements, made feasible by the use of lysate microarray technology, enabled the inference of the most comprehensive RTK-specific signaling network models to date.
Because network inference is fundamentally a question of quantifying correlation among measured signals, and because the primary driver of variability in the signaling data used here was RNAi perturbation responses, we conclude that RTKs with similar inferred networks have downstream signals that respond similarly to perturbations. Based on this notion, we propose that RTKs within the same network model class are more capable of promoting resistance to therapies targeting RTKs in that class than are RTKs in a different class.

There is extensive literature evidence consistent with the notion of intra-class drug resistance mechanisms among the six RTKs studied here. The most comprehensive evidence comes from two recent studies that measured the ability of different growth factors to rescue cells from various RTK inhibitors [143, 142]. Harbinski et al. observed (i) the ability of EGF family ligands and FGF family ligands to rescue c-Met-dependent cell lines from c-Met inhibition; (ii) the ability of EGF family ligands and HGF to rescue FGFR2- and FGFR3-amplified cell lines from FGFR inhibition; and (iii) synergistic growth inhibition with combined FGFR1 and c-Met inhibition both in vitro and in vivo. Wilson et al. observed that (i) FGF2 (FGF-basic) and HGF can each at least partially rescue the four tested EGFR-mutant cell lines included in their study from EGFR inhibition by erlotinib; (ii) EGF and NRG1 can each at least partially rescue all three tested MET-amplified cell lines from c-Met inhibition by crizotinib, and FGF2 can at least partially rescue two of them; and (iii) EGF and NRG1 can at least partially rescue three of four tested FGFR-amplified cell lines from FGFR inhibition, and HGF can rescue one of the four cell lines. Strikingly, PDGF-AB ligand never rescued any cell line and IGF-1 only partially rescued three of the 41 tested cancer cell lines from any drug.
Although our study used PDGF-BB and FGF1 (FGF-acidic) ligands, these results nonetheless indicate that the rescue potential of growth factors mimics the RTK classes extracted from our network models: EGF family, FGF family, and HGF ligands have generally similar rescue potential across cell types, whereas IGF-1 and PDGF family ligands have sparse to non-existent rescue potential.

Beyond these ligand rescue experiments, additional evidence exists that is consistent with the network model classes identified here. The link between EGFR and c-Met is well established: c-Met can compensate following anti-EGFR therapy (e.g., [139]), and, conversely, EGFR can compensate following anti-c-Met therapy [140, 166]. Additional evidence linking FGFR1 and EGFR signaling also exists. Combining EGFR and FGFR family kinase inhibitors has been shown to exhibit additive [167] or synergistic [168] growth inhibition, and combining dominant-negative forms of both EGFR and FGFR1 resulted in synergistic increases in cell death [168]. Additionally, FGFR1/FGF2 autocrine signaling has been observed in non-small cell lung cancer (NSCLC) cell lines that do not respond to gefitinib [169], and the induction of FGFR2 and FGFR3 expression has been observed in response to gefitinib in gefitinib-sensitive NSCLC and head and neck squamous cell carcinoma cell lines [170].

There has also been evidence for resistance to EGFR inhibitors by the de-repression of IGF-1R signaling [171]. Later work by the same group, however, showed that IGF-1R compensates poorly for EGFR loss because IGF-1R only strongly maintains activation of the Akt pathway, whereas c-Met activates both the Akt and MAPK pathways [139]. Consistent with these reports, our results show that IGF-1R exhibits the lowest median Mek, ERK, and p90RSK phosphorylation among the six RTKs, whereas c-Met exhibits similar levels to EGFR (Fig. 4-10C). These observations are consistent with the weak rescue potential of IGF-1 noted above.
All non-EGFR cell lines exhibit comparable p-Akt and p-GSK3 signals that are actually higher than those in the EGFR cell line, suggesting that activation of Akt is not a distinguishing feature among these six RTKs, at least in the context of saturating doses of growth factor in our isogenic system.

The evidence is less clear for therapies against IGF-1R, NTRK2, and PDGFRβ, in part because there are fewer studies of targeted therapies against these RTKs. EGFR has been cited as a reason for primary but not acquired resistance to anti-IGF-1R therapy [172], whereas others suggest the insulin receptor is the primary driver of resistance to anti-IGF-1R therapy [173]. Resistance to anti-PDGFRβ therapy (in the form of imatinib, which targets Abl, c-Kit, and PDGFRα/β) seems to involve mutations of the targeted proteins and amplification of Src family kinases, rather than compensation by other RTKs [174]. To our knowledge, no studies have addressed resistance to any anti-NTRK2 therapies.

In addition to RTK network phenocopying, there may be other mechanisms driving co-activation of particular RTKs, especially chromosomal structure processes [175]. Notably, EGFR and MET are both present on chromosome 7, and all genes on chromosome 7 are significantly amplified in sets of glioma (FDR < 10^-10) and lung (FDR < 0.25) tumors [176]. Further, MET is located at a fragile site on chromosome 7 that makes it prone to amplification [177]. Literature evidence that other RTKs are present at fragile sites was not found. Thus, amplification of MET may be an especially prevalent mechanism for resistance to EGFR inhibitors because not only does c-Met phenocopy the EGFR/FGFR1 network, it is also prone to co-amplification with EGFR.

Although same-class RTKs have similar network models and are capable of rescuing cells from and causing resistance to inhibition of other RTKs in that class, these same-class RTKs are not fully redundant.
Simply inspecting the RTK-specific shRNA effects shows that, although there are similarities in shRNA effects across same-class RTKs, these effects are not identical (Figs. 4-3, 4-4, and 4-5). Further, if we observed and/or perturbed different or additional signaling nodes beyond those studied here, we might see these same-class RTKs' network models diverge from one another. That multiple RTKs exist, and that they are co-expressed in cancer cells, suggests that these RTKs are not fully redundant. Thus, while the network models we developed here are sufficient to classify ligand rescue and drug resistance patterns, the similarity of same-class RTKs is nevertheless a relative concept. The evidence for the importance of RTK network phenocopying in drug resistance is strong, but the exact mechanism enabling this behavior is unclear. Given that the six cell lines differ predominantly by a single variable (the identity of the expressed RTK), we speculate that receptor-intrinsic biophysical properties may cause the RTKs to group into the three identified network classes. To this end, we compared the sequences of the six RTKs' cytoplasmic and kinase domains. We also compared previously published data about kinase inhibitors' binding affinities for these receptors [165], hypothesizing that using small molecule binding profiles as a proxy for kinase substrate specificity may provide an explanation for the observed RTK classes. However, none of these three properties, when clustered, produced clusters identical to the network models (Fig. 4-20). The inhibitor profile clusters did match the kinase domain clusters, suggesting that the kinase domain sequence largely explains the RTKs' differential sensitivities to kinase inhibitors.

Figure 4-20: Clustering RTK biophysical properties does not reveal RTK network model clusters. [Panels: cytoplasmic domain sequence; kinase domain sequence; kinase inhibitor binding profiles. Horizontal axis: Eigenvalue 1.] The cytoplasmic sequences and kinase domain sequences were each clustered across the six RTKs, but did not reveal the same clusters as the RTK network models. Data concerning the affinity of each RTK for numerous small molecule kinase inhibitors were also clustered, but again did not match the network model clusters. Dotted ellipses and marker shapes represent k-means clustering assignments. The six marker colors represent the six RTKs by the same color scheme as used in the main text, as initially introduced in Fig. 4-1: cyan, EGFR; purple, FGFR1; yellow, IGF-1R; red, c-Met; green, NTRK2; blue, PDGFRβ.

Some previous work has explored the notion that receptor recruitment interactions define specificity in receptor-activated signaling. Using chimeric EGF and insulin receptors, early work showed that RTK cytoplasmic domains encode kinase specificity, mitogenic and transforming potential, and receptor routing [178]. Others have shown in yeast that kinase domains encode limited intrinsic discriminatory specificity, and that the functional identity of a kinase is instead largely determined by its recruitment interactions [179]. These observations are consistent with our results showing that RTK-proximal edges in the network models tend to be RTK-specific, whereas downstream edges tend to be conserved across all RTKs. Thus, although we are not certain how these three RTK clusters emerge, their emergence is unlikely to be driven purely by kinase specificity, and is instead likely to arise from specificity in receptor-proximal protein recruitment. In conclusion, the RTK signaling classes identified in this study are consistent with clinically observed mechanisms of resistance to targeted therapies in cancer. The limited efficacy of single-agent RTK-directed therapies may therefore be due in part to the pre-existing co-expression of same-class RTKs across a diverse spectrum of tumor types.
In this scenario, these tumors are primed to compensate for the loss of RTK function following therapy. We submit that classifying RTKs by their inferred networks and then therapeutically targeting same-class receptors, either in combination or sequentially, may provide clinical benefit by delaying or preventing the onset of resistance.

4.4 Materials and methods

4.4.1 Cell culture

Isogenic HEK-293 cells expressing EGFR, FGFR1, IGF-1R, c-Met, NTRK2 or PDGFRβ were described previously [146]. All cell lines were cultured in Dulbecco's Modification of Eagle's Medium (DMEM; Mediatech, Manassas, VA) supplemented with 10% fetal bovine serum (FBS; HyClone, Logan, UT), 2 mM glutamine (Mediatech), 100 I.U./mL penicillin and 100 µg/mL streptomycin (Mediatech). Additionally, cell culture media contained 150 µg/mL Hygromycin B (Invitrogen, Carlsbad, CA) to maintain stable integration of RTK expression cassettes. Lentiviral shRNA expression vectors were produced using a three-plasmid system as described previously [147, 180]. Briefly, HEK293T cells were co-transfected with plasmid pLKO.1 containing the shRNA expression cassette of interest, as well as packaging plasmids pCMV-dR8.91 (containing HIV gag, pol and rev genes) and pMD2.G (coding for VSV-G envelope protein). Medium was replaced after 24 hours, and viral supernatants were harvested 48 and 72 hours post-transfection. Viral stocks were centrifuged and decanted to remove cellular debris, and stored in aliquots at −80 °C. Relative virus titers were determined by transducing A549 lung carcinoma cells at low multiplicity of infection, selecting for viral integrants with puromycin (Invitrogen) and measuring relative cell densities by Resazurin viability assay. All viral stocks were then diluted to match the lowest-titer individual virus. Viral pools were generated by mixing equal volumes of the titer-normalized component viruses. The total viral titer of each pool thus matched the titer of each component virus.
A complete list of all 76 individual shRNA constructs used in this study is given in Table S1 (in ref. [137]), and a list of all 12 shRNA pools used in this study is given in Table S2 (in ref. [137]). For gene knockdown experiments, RTK-expressing HEK293 cells were first plated onto D-lysine coated 96-well plates (BD Biosciences, Franklin Lakes, NJ) at a density of 20,000 cells/cm². After 24 hours, medium was replaced with medium containing lentiviral particles and 10 µg/mL polybrene (Sigma-Aldrich, Saint Louis, MO), and plates were centrifuged at 1,178 g for 30 minutes at 37 °C for enhanced infection efficiency. For single and pooled shRNAs targeting signaling proteins (test shRNAs), cells were infected in biological quadruplicates per cell line and time point. Cells were also treated in parallel with non-targeting shRNA vectors (control shRNAs) shGFP49 (8 replicates), shGFP477 (8 replicates) and pLKO.1empty (4 replicates). Mock-infected cells (not treated with virus) were included as an additional control for infection efficiency (12 replicates). Twenty-four hours post-infection, medium was replaced with medium containing 1.5 µg/mL of puromycin (Invitrogen) to select for virally infected cells. We observed complete cell death of mock-infected cells within 48 hours, while no sign of cell death was evident for any virally infected cells. Ninety-six hours post-infection, at which time cells were 70-80% confluent, cells were washed once with phosphate-buffered saline (PBS) and incubated in serum-free medium for an additional 24 hours. To initiate RTK signaling, cells were then stimulated with the cognate ligands of each RTK: EGF (EGFR), FGF1/FGF-acidic (FGFR1), IGF1 (IGF-1R), HGF (c-Met), BDNF (NTRK2) and PDGF-BB (PDGFRβ) (all Peprotech, Rocky Hill, NJ). After 1, 2, 4, 8, 16, 32, 64, 96, 128 or 256 minutes, cells were washed with ice-cold PBS and lysed in 2% SDS buffer as described previously [181, 18].
Lysates of cells not treated with RTK ligands served as the 0-minute time point. Cell lysates were cleared by filtration through 0.2-µm filter plates (Pall Corporation, East Hills, NY) and stored at −80 °C until microarraying.

4.4.2 Microarray fabrication

Custom lysate microarrays were printed by Aushon Biosystems (Billerica, MA) on 11.5 cm × 7.5 cm single-pad nitrocellulose-coated glass slides. Slides were custom-manufactured by Grace Bio-Labs (Bend, OR) and were generously provided as a gift. Lysates were arrayed at a spot-to-spot spacing of 400 µm using 8 depositions with solid 110 µm pins, which resulted in an average feature diameter of 180 µm when visualizing spot protein content. Each lysate in our experiment, including lysates of cells treated with control shRNAs and lysates of mock-infected cells, was initially spotted once on each microarray slide. A small number of microarray source plates were then re-printed onto the same slides in cases where spots were missed due to instrument errors, as assessed visually under a microscope. Each microarray ultimately contained a total of 26,496 microarray features, 25,344 of which represented biologically unique lysates. Following microarray printing, slides were stored dry, in the dark, and at room temperature until further processing.

4.4.3 Microarray probing

To remove the buffer and detergent contained in each microarray spot, slides were washed three times for 5 min each with 1X PBS/0.1% Tween-20 (PBST), incubated in Tris/HCl (pH 9) for 72 h with daily replenishment, washed again with PBST, and centrifuged dry. Slides were then blocked with 5% BSA/PBST for 1 h at 4 °C. Microarrays were incubated in a pool of 1:1,000 anti-β-actin antibody (Sigma-Aldrich) and 1:1,000 phosphospecific antibody (Table S3) in 5% BSA/PBST at 4 °C for 24 h.
Following washing, slides were incubated in a pool of 1:1,000 680 nm-dye-labeled anti-rabbit and 1:1,000 800 nm-dye-labeled anti-mouse antibodies (LI-COR, Lincoln, NE) in 5% BSA/PBST for 24 h at 4 °C. Slides were washed again three times for 5 min each with PBST, and centrifuged dry. Microarrays were scanned in the 680 nm and 800 nm channels using the Odyssey™ imaging system (LI-COR) at 21 µm resolution.

4.4.4 Extraction of microarray data

Slides were visually inspected and initial feature finding and spot centering were performed using the ArrayPro™ software package (MediaCybernetics, Bethesda, MD). Spots with morphological defects, notably spots of non-circular shape, spots affected by lint or scratches, and spots overlapping with neighboring spots, were manually flagged and excluded from our data set. We then used custom-built code for MATLAB® 7.4 (The Mathworks, Natick, MA) to refine the positioning of the circular areas over which the ArrayPro™ software would integrate the microarray spots to derive signal intensities. Signal intensities from both target proteins and β-actin were then integrated accordingly, and target protein signals were normalized to their respective β-actin signal intensities to account for any differences in lysate concentration or spotting. Normalized signal was used in all subsequent data analysis steps.

4.4.5 Data pre-processing

To remove data outliers that were not detected by visual inspection of the microarrays, a smoothing window approach was applied. For a time point $t_i$ within a time course from a particular phosphosite, RTK cell line, and shRNA condition, the data from the three time points $t_{i-1}$, $t_i$, and $t_{i+1}$ across all biological replicates were grouped together in a vector $x$.
An upper bound was defined as $Q_3(x) + 1.5 \times \mathrm{IQR}(x)$ and a lower bound was defined as $Q_1(x) - 1.5 \times \mathrm{IQR}(x)$, where $\mathrm{IQR}(x)$ is the interquartile range of $x$, and $Q_1(x)$ and $Q_3(x)$ are the first and third quartiles of $x$, respectively. Any data replicates at time $t_i$ that were above the upper bound or below the lower bound were flagged. This procedure was applied to time points sequentially, starting with the first time point in each time series. When applied to the first time point in each time series, only the first and second time points were used. When applied to the last time point in each time series, only the penultimate and last time points were used. This time window approach allowed us to take advantage of the temporal dependence of the data, as phosphorylation levels at adjacent time points were expected to have approximately similar values. Data for a given time point could only be flagged by smoothing if there were at least three replicate data points initially present in the vector $x$. In total, less than 2.1% (11,644/564,960) of all collected data points were flagged, either because of poor spot morphology or by the smoothing window approach. After flagging outliers, the flagged data point(s) at $t_i$ (for a given RTK, phosphosite, and shRNA condition) were replaced with the mean value of the remaining data replicates at time $t_i$. Each test shRNA had four biological replicates associated with each RTK, phosphosite and time point. Because a small number of microarray source plates were printed more than once onto each slide, additional technical replicates were available in some instances. In addition, several control shRNAs had 8 or 12 biological replicates associated with each RTK, phosphosite and time point. In these cases, every fourth replicate was averaged together to condense the replicates into only four replicates per shRNA, RTK, phosphosite and time point.
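The flagging, replacement, and condensing steps described above can be illustrated with a minimal Python sketch. It is not the MATLAB code used in this work. The function names (`iqr_bounds`, `flag_and_replace`, `condense`) are hypothetical, the quartile convention is an assumption that may differ slightly from MATLAB's, and flags are computed from the original data before any replacement, which is one reading of the procedure.

```python
from statistics import mean, quantiles


def iqr_bounds(x):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for the values in x."""
    # Assumption: 'inclusive' quartiles; other quartile conventions shift the bounds slightly.
    q1, _, q3 = quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flag_and_replace(time_course):
    """time_course[i] holds the replicate values measured at time point i."""
    n = len(time_course)
    cleaned = []
    for i, reps in enumerate(time_course):
        # Window of t_(i-1), t_i, t_(i+1); it shrinks to two time points at either end.
        window = [v for j in range(max(0, i - 1), min(n, i + 2))
                  for v in time_course[j]]
        if len(window) < 3:
            cleaned.append(list(reps))
            continue
        lower, upper = iqr_bounds(window)
        kept = [v for v in reps if lower <= v <= upper]
        if not kept:
            cleaned.append(list(reps))  # no unflagged replicate left to average
            continue
        fill = mean(kept)
        # Flagged replicates are replaced by the mean of the remaining replicates.
        cleaned.append([v if lower <= v <= upper else fill for v in reps])
    return cleaned


def condense(replicates, k=4):
    """Average every k-th replicate together, e.g. (1, 5, 9), (2, 6, 10), ..."""
    return [mean(replicates[i::k]) for i in range(k)]
```

Calling `condense` on twelve replicates numbered 1 to 12 averages the groups (1, 5, 9), (2, 6, 10), (3, 7, 11), and (4, 8, 12).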
For example, if there were 12 replicate data points, they would be condensed into four data points based on the following scheme: (1, 5, 9), (2, 6, 10), (3, 7, 11), and (4, 8, 12). This condensing step was done after any individual replicates were replaced in the flagging step. The processed replicate data for all RTKs, phosphosites, time points and shRNA conditions are available in Table S4.

4.4.6 Quantifying the consistency of biological replicates and shRNA pairs

To quantify the consistency across biological replicate measurements for each phosphosite in each RTK cell line, the coefficient of variation (c.v.) across the four biological replicates at each time point (across 11 time points) in each shRNA time course (across 91 shRNA conditions) was calculated, producing 91 × 11 = 1,001 c.v. values. For each of the six RTKs and 22 phosphosites, the median of those 1,001 values is shown in Fig. 4-2B. To quantify the consistency across pairs of shRNAs directed at the same gene, for each pair of shRNAs targeting one of 38 unique genes, the median signal values (calculated across the four biological replicates) were compared across all phosphosites and all time points. Thus, for each shRNA pair, the Pearson correlation coefficient between two vectors, each containing 22 phosphosites × 11 time points = 242 median data values, was calculated. These correlation values across the six RTKs and 38 unique genes are shown in Fig. 4-2C.

4.4.7 Quantifying shRNA effects

To quantify shRNA-induced effects on measured signals, area under the curve (AUC) values were compared between time courses of test shRNAs (shRNAs targeting signaling proteins) and control shRNAs (pLKO.1empty, shGFP477 and shGFP49). We first assembled four time series vectors for each phosphosite, RTK cell line and shRNA by randomly assigning each of the four replicate measurements at each time point into one of the four time series vectors.
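As an illustration only, the random assembly of replicate time series, together with a trapezoid-rule AUC over the non-uniform sampling times and an equal-variance two-sample t statistic, might be sketched in Python as follows. The function names are hypothetical, and the p-value (which requires the t distribution with n_a + n_b − 2 degrees of freedom) is deliberately left out because the standard library does not provide it.

```python
import math
import random
from statistics import mean, variance

# Sampling times in minutes (0 = unstimulated lysate), taken from the time course above.
TIMES = [0, 1, 2, 4, 8, 16, 32, 64, 96, 128, 256]


def assemble_series(replicates_by_time, rng):
    """Randomly assign the replicates at each time point to separate time series.

    replicates_by_time[i] holds the replicate values at time point i;
    the result contains one full time series per replicate.
    """
    shuffled = [rng.sample(reps, len(reps)) for reps in replicates_by_time]
    return [list(series) for series in zip(*shuffled)]


def trapezoid_auc(times, values):
    """Trapezoid-rule area under the curve for unevenly spaced time points."""
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))
```

In this sketch, each of the four assembled series would be reduced to one AUC value with `trapezoid_auc(TIMES, series)`, and the four test AUC values would then be compared with the four AUC values of each control in turn.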
We then calculated the four AUC values associated with each of the time series by the trapezoid method (using the trapz function in MATLAB R2009a), accounting for the non-uniform intervals between time points in the time series. Thus, each replicate time series was represented by a single AUC value. For each test shRNA, we then compared its four AUC values to the four AUC values of each of the three control shRNAs in turn. A two-tailed, two-sample t-test assuming equal sample variances yielded p-values $p_{\mathrm{pLKO}}$, $p_{\mathrm{GFP477}}$, and $p_{\mathrm{GFP49}}$. Performing this procedure on all 88 test shRNAs, 22 phosphosites, and 6 RTKs generated three lists of 11,616 p-values. Using each list of p-values separately, we used the Storey method [136] to determine significance levels for each of the three control shRNAs. At a 1% false discovery rate (FDR), the significance levels were calculated to be $\alpha_{\mathrm{pLKO}} = 0.02871$, $\alpha_{\mathrm{GFP477}} = 0.01625$, and $\alpha_{\mathrm{GFP49}} = 0.02515$. shRNA-induced effects on measured phosphosites were considered significant only if all three p-values were below the FDR-corrected levels of significance (i.e., $p_{\mathrm{pLKO}} < \alpha_{\mathrm{pLKO}}$, $p_{\mathrm{GFP477}} < \alpha_{\mathrm{GFP477}}$, and $p_{\mathrm{GFP49}} < \alpha_{\mathrm{GFP49}}$), and if the shRNA-induced change in AUC value was either an increase over all three control shRNAs or a decrease over all three control shRNAs. To impose additional stringency, only instances where a measured signal was significantly affected (as defined above) by both shRNAs targeting each gene are shown in Fig. 4-2D. Using the alternative Benjamini method [117] to calculate levels of significance, we obtained $\alpha_{\mathrm{pLKO}} = 0.00399$, $\alpha_{\mathrm{GFP477}} = 0.00315$, and $\alpha_{\mathrm{GFP49}} = 0.00377$. shRNA-induced effects that pass the significance level using this alternative method are shown in Fig. 4-4.

4.4.8 shRNA effects simulations

We performed simulations to determine whether the observed pattern of shRNA-induced effects was consistent with a model where shRNA effects are randomly distributed across RTKs or across phosphosites.
First, the total numbers of significantly decreased and increased signal effects across the 6 RTKs, 22 phosphosites, and 88 test shRNAs (11,616 cases in total) were tallied as 1,346 (11.6%) and 1,232 (10.6%), respectively. This was computed using the 1% Storey false discovery rate method as described above, with the exception that we did not require that the same significant effect be observed for both test shRNAs targeting each gene of interest, as was conservatively required for Fig. 4-2D and Fig. 4-2E. Next, the same number of increases and decreases in signal were randomly distributed in silico among the 11,616 total RTK-phosphosite-shRNA combinations. We then tallied the total number of RTKs (0 to 6) exhibiting an increased or decreased signal effect for each phosphosite-shRNA pair. This analysis considered how the shRNA effects were distributed across the number of RTK cell lines. Additionally, we tallied the total number of phosphosites (0 to 22) exhibiting significantly increased or decreased signal for each RTK-shRNA pair. This analysis considered how the shRNA effects were distributed across the number of measured phosphosites. This simulation was repeated for 2,500 different random assignments of the decreased and increased signal effects. To corroborate the results of our simulation, we also derived analytical estimates of the expected distribution of shRNA-induced effects across RTKs and across phosphosites when assuming a random hypergeometric distribution. For the distribution of effects across RTKs we assumed drawings of six samples out of 11,616 at a time, while for the distribution of effects across phosphosites we assumed drawings of 22 samples out of 11,616 at a time. Both our simulations and analytical results showed that the observed distributions of shRNA effects across either RTKs or phosphosites are not consistent with this model that assumes randomly distributed hairpin effects.
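A minimal Python sketch of one random assignment and of the corresponding hypergeometric expectation is given below for the tally across RTKs. It is an illustration under stated assumptions rather than the code used here: the function names are hypothetical, and the flat index layout (RTK as the slowest-varying dimension) is an arbitrary choice that does not affect the tally.

```python
import random
from math import comb

N_RTK, N_SITE, N_SHRNA = 6, 22, 88
N_PAIRS = N_SITE * N_SHRNA          # 1,936 phosphosite-shRNA pairs
N_TOTAL = N_RTK * N_PAIRS           # 11,616 RTK-phosphosite-shRNA combinations


def simulate_rtk_tally(n_effects, rng):
    """Scatter n_effects at random over all combinations, then count how many
    phosphosite-shRNA pairs show the effect in 0, 1, ..., 6 RTKs."""
    hits = rng.sample(range(N_TOTAL), n_effects)
    per_pair = [0] * N_PAIRS
    for idx in hits:
        # Assumed layout: idx = rtk * N_PAIRS + pair, so idx % N_PAIRS is the pair.
        per_pair[idx % N_PAIRS] += 1
    tally = [0] * (N_RTK + 1)
    for count in per_pair:
        tally[count] += 1
    return tally


def hypergeom_pmf(k, population, successes, draws):
    """Probability of k successes in `draws` draws without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


# Expected number of pairs with k affected RTKs for the 1,346 observed decreases.
expected = [N_PAIRS * hypergeom_pmf(k, N_TOTAL, 1346, N_RTK)
            for k in range(N_RTK + 1)]
```

Repeating `simulate_rtk_tally` over many seeds and averaging the tallies should approach `expected`; the analogous tally across phosphosites would use 22 draws per RTK-shRNA pair.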
The significance of this comparison was measured using a chi-squared goodness-of-fit test. In the tests, we compared the number of increases or decreases in signal observed across zero to six RTKs with that expected by chance. In both cases, the distribution of observed effects was significantly different from the distribution of random effects (p = 0, using 3 degrees of freedom given 7 bins and 3 parameters in the hypergeometric distribution). Similarly, we compared the number of increases or decreases in signal observed across zero to 22 phosphosites with that expected by chance. In both cases, the distribution of observed effects was significantly different from the distribution of random effects (p = 0, using 19 degrees of freedom given 23 bins and 3 parameters in the hypergeometric distribution).

4.4.9 Identifying signaling time scales

To facilitate analysis of dynamic changes in signaling network structure, we wished to aggregate the 11 time points in our data set into broader time scales representing basal, early, intermediate, and late signaling events. First, the time zero data were taken to represent the basal network state. To determine which of the remaining 10 time points in our data set correspond to early, intermediate, and late time scales, we subjected our data to k-means clustering (k = 3) using the squared Euclidean distance metric and 200 replicates of each cluster assignment (using the kmeans function in MATLAB R2009a). For each time point, data were first compiled across all 6 RTKs, 22 phosphosites, 91 shRNAs (88 test shRNAs + 3 control shRNAs) and 4 biological replicates into a vector of 6 × 22 × 91 × 4 = 48,048 data points. The input for the clustering algorithm then consisted of a matrix of 10 time points × 48,048 data points. This pan-RTK approach identified time scales that were indicative of signaling dynamics across all RTKs.
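For readers unfamiliar with replicated k-means, the following Python sketch shows the general idea: Lloyd's algorithm with the squared Euclidean distance is restarted from several random initializations, and the solution with the lowest within-cluster sum of squares is kept. This is a generic illustration, not a reimplementation of the MATLAB kmeans function; the function names, the initialization from randomly chosen data points, and the handling of empty clusters are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, k, rng, max_iter=100):
    """One k-means run from a random initialization; returns (labels, sse)."""
    centroids = rng.sample(points, k)  # assumption: initialize from data points
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, sse


def kmeans(points, k, replicates=200, seed=0):
    """Best of `replicates` random restarts, judged by within-cluster SSE."""
    rng = random.Random(seed)
    runs = (lloyd(points, k, rng) for _ in range(replicates))
    return min(runs, key=lambda run: run[1])[0]
```

In the setting described above, `points` would hold one row per time point (10 rows of 48,048 values) and `k` would be 3.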
4.4.10 Data discretization

The Bayesian network, mutual information, and CLR algorithms we employed in our study require discrete data as their input. Because our experimental phosphorylation data were continuous in nature, we discretized all time course data into four levels, with 1 indicating the lowest phosphorylation values and 4 indicating the highest phosphorylation values. To further increase data robustness, the median data value was calculated across the biological replicates at each time point (for each RTK, phosphosite, and shRNA condition), following the previously described data pre-processing step. The median data were subsequently discretized. For each phosphosite, data were discretized separately for each RTK and time scale. Within each data subset (for a particular phosphosite, RTK, and time scale), the Z scores of the raw data were calculated. Those data points with Z > 4 were set to discrete value 4. Those data points with Z < −4 were set to discrete value 1. The remaining data points were discretized according to 4-level k-means clustering with the squared Euclidean distance metric and 100 replicates of each cluster assignment (using the kmeans function in MATLAB R2009a). The ordinality of the discrete data was always maintained, such that 1 and 4 consistently represented the low and high raw signal values, respectively. The discrete data for all RTKs, phosphosites, and shRNA conditions are available in Table S5.

4.4.11 Network inference algorithms

The core Bayesian network inference algorithm was implemented as previously described [37], using a modified version of the Bayesian Network Structure Learning toolbox in MATLAB R2009a [182] based on the algorithm of Koivisto and Sood [50]. Here the equivalent sample size (ESS) in the Dirichlet parameter prior was varied for each time scale to help normalize for varying sample size across time scales.
ESS values of 20, 1, 1, and 3.4 × 10⁴ were used for the basal, early, intermediate, and late time scales, respectively. shRNA perturbations were modeled as perfect interventions. That is, when a measured phosphosite (e.g., c-Raf Ser289, Ser296, Ser301) was present on the protein product of a transcript targeted by an individual shRNA (e.g., CRAF) or shRNA pool (e.g., RAF pool), then these phosphosite data were considered to be under the influence of that shRNA intervention. In such cases the discrete data were not modified from their previously determined values, but the network scoring function was modified to take the intervention into account. Prior knowledge was used to restrict viable Bayesian network structures. The RTK phosphosite was not allowed to have any parent nodes (i.e., no incoming edges), and the transcription factor sites (c-Jun, NF-κB, STAT1, STAT3) were not allowed to have any child nodes (i.e., no outgoing edges), except if those child nodes were other transcription factor sites. Nodes were restricted to a maximum of three parents. That is, when computing the posterior edge probabilities, consensus networks containing all possible one-, two-, and three-parent node-node interactions were considered. Higher-order parent-child relationships, beyond three-parent interactions, were not considered. The directionality of the edges shown in Fig. 4-10 was based on the consensus directionality observed in the 24 Bayesian networks inferred across the six RTKs and four time scales, along with the prior knowledge assumptions. Because the RTK phosphosite was assumed in the prior knowledge to be a root node, all nodes connected to it were required to be child nodes. Similarly, because the phosphosites on transcription factors were required to have no child nodes, except for other transcription factor phosphosites, all nodes connected to transcription factor sites were required to be parents of the transcription factor nodes.
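The structural restrictions described above can be expressed compactly. The following Python sketch is illustrative only and is not part of the inference toolbox: the node labels ("RTK", "ERK", and the ASCII spelling "NF-kB") and the function names are hypothetical, and the parent-set count includes the empty parent set as an added assumption.

```python
from math import comb

# Transcription factor phosphosites, which may only have other TF sites as children.
TF_SITES = {"c-Jun", "NF-kB", "STAT1", "STAT3"}
MAX_PARENTS = 3


def edge_allowed(parent, child, rtk_site="RTK"):
    """Return True if the directed edge parent -> child respects the prior knowledge."""
    if parent == child:
        return False                      # no self-edges
    if child == rtk_site:
        return False                      # the RTK phosphosite is a root node
    if parent in TF_SITES and child not in TF_SITES:
        return False                      # TF sites have no non-TF children
    return True


def n_parent_sets(n_nodes, max_parents=MAX_PARENTS):
    """Candidate parent sets of size 0..max_parents for one node, before
    applying the prior-knowledge restrictions."""
    return sum(comb(n_nodes - 1, m) for m in range(max_parents + 1))
```

Capping the number of parents keeps the search tractable: with 20 nodes, each node has 1,160 candidate parent sets of size zero to three, compared with 2^19 unrestricted subsets.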
Edges inferred between transcription factor phosphosites were left undirected, under the assumption that an edge between two transcription factors likely represented mutual coordination by an unmeasured node(s), rather than the action of one phosphosite on another. The edges from the calmodulin phosphosite were the most uncertain in the consensus directionality analysis. As such, the directions of these three edges (to PKCδ, paxillin, and RSK3) are least confident. CLR was implemented in MATLAB R2009a using code provided by Faith et al. [27], with Z scores (edge weights) calculated using the plos method. The mutual information matrix was calculated using a simple histogram method within the CLR code. Spearman and Pearson correlation networks were calculated using the median data (median across the biological replicates) and the corr function in MATLAB R2009a. For the CLR, mutual information, Spearman, and Pearson networks, all 22 measured phosphosites were used as input for the algorithms. Due to algorithmic memory constraints, we were able to use only 20 out of the 22 measured phosphosites (p-S6 (Ser240, Ser244) and p-CREB (Ser133) were left out) for Bayesian network inference. However, the same discretized data for the 20 nodes in the Bayesian network inference were used for those 20 of 22 nodes present in the other algorithms. It should be emphasized that the network inference results are consensus models across all shRNA perturbations in a given time scale. Thus these networks provide a representation of the dominant signal-signal relationships across all 91 shRNA conditions. As a result, perturbation effects observed in only one or two shRNA conditions are likely washed out and do not appear in the consensus networks. This is likely one explanation for why certain shRNA effects (such as GSK3 shRNA pool effects on MEK and ERK signals) are not seen in the network models.
Because there are many signaling nodes that were not measured in our data set, it is possible that two sites connected by an edge in the network inference models are actually under the mutual regulation of one or more unmeasured signaling nodes. To the extent that such hidden variables exert influence on our measured phosphoproteins, our inferred networks represent coarse approximations of the actual network topologies.

4.4.12 Comparison of RTKs by inferred network structures through dimensionality reduction

To compare inferred network structures across RTKs, multidimensional scaling (MDS) was used as a dimensionality reduction technique. To enable comparisons across different network inference methods, first each network structure's adjacency matrix was converted into a binary vector describing the presence or absence of each edge. For each network inference method, pairwise distances between all 24 networks' binary vectors were calculated using the pdist function in MATLAB R2009a with the Jaccard distance metric. We focused on the Jaccard distance as our metric for network structure comparison because it considers binary features, and it does not consider cases where two observations (networks) both have a value of zero (are both missing a particular edge). The Jaccard distance matrices were then used as input for classical MDS using the cmdscale function in MATLAB R2009a. All of the resultant MDS eigenvalue features were then clustered using k-means clustering (k = 3) to identify groups of similar network structures. For all four network inference methods, clustering was performed using the squared Euclidean distance metric and 200 replicates of each cluster assignment (using the kmeans function in MATLAB R2009a). Notably, if the Euclidean distance metric were used instead of the Jaccard distance metric, the multidimensional scaling procedure would be identical to principal component analysis.
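To make the distance calculation concrete, a minimal Python sketch of the edge-vector conversion (for an undirected network) and the Jaccard distance is given below. It is an illustration and not the MATLAB pdist implementation; the function names are hypothetical, and returning 0 for two networks with no edges at all is an assumption made here to avoid division by zero. The MDS step itself requires an eigendecomposition and is omitted.

```python
def edge_vector(adjacency):
    """Flatten the upper triangle of a symmetric adjacency matrix (no self-edges)."""
    n = len(adjacency)
    return [1 if adjacency[i][j] else 0
            for i in range(n) for j in range(i + 1, n)]


def jaccard_distance(u, v):
    """1 - |edges in both| / |edges in either|; shared absences are ignored."""
    either = sum(1 for a, b in zip(u, v) if a or b)
    both = sum(1 for a, b in zip(u, v) if a and b)
    if either == 0:
        return 0.0  # assumption: two empty networks are treated as identical
    return 1.0 - both / either


def pairwise_distances(vectors):
    """Symmetric matrix of Jaccard distances between all pairs of edge vectors."""
    n = len(vectors)
    return [[jaccard_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
```

For 22 nodes, `edge_vector` yields 231 undirected edge features; a directed network would instead need both triangles of the adjacency matrix.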
For the Bayesian networks, the MDS input networks had an edge weight threshold of > 0.1 applied. For the CLR and mutual information networks, the MDS input networks had an edge weight threshold of Z > 1 and MI > 0.3 applied, respectively. For the Spearman and Pearson correlation networks, the 60th percentile of the absolute value of the correlation coefficients was calculated across the 24 correlation networks, corresponding to |correlation coefficient| > 0.35 and |correlation coefficient| > 0.30, respectively. For the Bayesian networks, the adjacency matrix vectors contained 400 directed edge features (self-edges were excluded). For the four undirected network inference methods, the adjacency matrix vectors contained 231 undirected edge features (again excluding self-edges). It should be noted that because the Bayesian networks are directed network structures, while the other four methods are undirected network structures, this provides more edge features to capture in the Bayesian networks' MDS analysis. Additionally, the Bayesian networks only contain 20 of the full 22 phosphosite nodes, while all 22 are included in the other three inference methods. These two aspects may explain why two of the 24 networks in the Bayesian network MDS cluster analysis are not assigned to the same clusters as in the other four methods' MDS clustering results.

4.4.13 Network model edge weight threshold robustness

To determine the robustness of the network model clusters (EGFR/FGFR1/c-Met, IGF-1R/NTRK2, and PDGFRβ) to the edge weight threshold applied to each network inference method's result, the edge weight threshold was varied over a range of values and then clustering was repeated at each value. The range was based on the 10th to 90th percentile of the edge weight values, at 10-percentile increments. For the case of Spearman and Pearson correlation, the percentile was calculated using the absolute values of the correlation coefficients.
The other three inference methods (mutual information, CLR, Bayesian) have strictly nonnegative edge weights, so no absolute value was needed. For the Bayesian network edge weights, to increase the dynamic range of the sensitivity analysis, edge weights < 0.02 and > 0.98 were removed before calculating the 10-percentile increments. This is because, by the algorithm's design, most of the resultant edge weights are near zero and several are unity.

4.4.14 Generating receptor class-specific consensus networks across inference methods

The frequency of each edge in the five inference methods and four time scales was calculated for each RTK. The same edge thresholds used for the dimensionality reduction were applied. To directly compare the five inference methods, the Bayesian networks were converted to an undirected form. Further, because the Bayesian networks included only 20 of 22 measured nodes, while the four other inference methods contained all 22 nodes, edges were normalized to the total number of instances in which they were considered across the five inference methods (i.e., 4 time scales × 5 inference methods = 20, versus 4 time scales × 4 inference methods = 16 for the edges connecting nodes excluded from the Bayesian networks). This provided a scale between 0 and 1 for each edge, representing that edge's frequency within a particular RTK across four or five inference methods and four time scales (shown in Fig. 4-9). To generate class-specific networks, it was required that an edge appeared with a frequency > 0.5 for each RTK within an RTK set and < 0.25 for each RTK outside the RTK set. RTK sets included (1) each individual RTK class, (2) two of the three RTK classes, and (3) all three RTK classes. For example, the c-Cbl-CaM edge appeared with a frequency of 0.85, 0.85, 0.8, 0, 0, and 0.1 for the EGFR, FGFR1, c-Met, IGF-1R, NTRK2, and PDGFRβ receptors, respectively.
As such, the c-Cbl-CaM edge was considered to be specific to the EGFR/FGFR1/c-Met class, but absent from the IGF-1R/NTRK2 and PDGFRβ classes. Pan-RTK backbone edges were required to have a frequency > 0.5 across all six RTKs.

4.4.15 Clustering the raw data

When clustering the median signal values, each signal's median value was first calculated across all biological replicates, shRNA perturbations, and time points. This gave an indication of the typical level of phosphorylation for each phosphosite in each RTK cell line. The resulting matrix of 22 phosphosites by 6 RTKs was then mean-centered and unit variance scaled across each phosphosite. This matrix was then clustered using the kmeans function in MATLAB R2009a with k = 3 and 100 replicates with random initial centroid assignments.

When clustering signals from all time points together, data matrices representing the data for each RTK and all 22 phosphosites were first constructed (each a matrix with 22 rows and 11 × 91 × 4 = 4,004 columns). The data were then mean-centered and unit variance scaled for each signal separately (i.e., across rows). This process was repeated for all six RTKs. These normalized matrices were then converted into vectors to form a new matrix of 6 rows (one per RTK) and 22 × 4,004 = 88,088 columns. This matrix was then clustered using the kmeans function in MATLAB R2009a with k = 3 and 100 replicates with random initial centroid assignments.

A similar approach was taken to cluster signals from each time scale, and signals from each time point. In each case, signals from the relevant time point(s) were first mean-centered and unit variance scaled for each RTK separately, and then the resultant matrices were converted to vectors and compiled into a multi-RTK matrix. This matrix was then clustered using the kmeans function in MATLAB R2009a with k = 3 and 100 replicates with random initial centroid assignments. The plots shown in Fig.
4-11 represent the first two components of the PCA loadings from each clustered data subset. These plots are simply for visualization purposes, to show the approximate relative similarity between the RTKs' data. The actual clustering was done using the full data matrix, not just the first two components. The marker shapes in Fig. 4-11 indicate which cluster each data point belongs to.

4.4.16 Generating synthetic data for network inference

Directed acyclic networks were randomly generated allowing only one parent node per child node and containing only one root node (i.e., source signal). The signal levels for the root node were 200 points randomly sampled from a uniform distribution between values 1 and 6. The signal levels for each downstream node were specified based on the signal level of its input parent node, namely $y_{\text{output}} = y_{\text{input}}$. Data were simulated in a step-wise fashion, such that the only input to the simulation process was the signal levels of the root node. Then, at each step in the simulation from parent to child node, heritable variation was added to each node's data. This variation was drawn from a random normal distribution with mean zero and a 10% coefficient of variation. This variation-added signal was then used as input for the node's child node in the network. Heritable variation was also added to the terminal nodes in the network, even though they have no child nodes. Once all nodes were simulated, non-heritable variation was added to the simulated data. This variation was drawn from a uniform distribution over the range ±1, and this variation was added independently for each node. As an example, in the simple case of a two-node network A → B, the 200 signal values for A are drawn from a uniform distribution, and then those values have random normally distributed heritable variation added to them. The resulting values, A', are then used as input for node B.
The values of node B are then based directly on its input node, so the values for B are equal to A'. Then random normally distributed heritable variation is added, creating B'. After the simulation, random uniformly distributed noise is added independently to both A' and B', creating A'' and B'', which are the final output from the simulation.

To generate the results in Fig. 4-12, four synthetic networks were generated, each containing 22 nodes. This is the same number of nodes measured in our experimental signaling data. For each network, five independent data sets were simulated. Because the input levels and the heritable and non-heritable variation are all stochastic, this generated five different data sets per network. To analyze the data by PCA, for each of the twenty data sets the matrix of 22 nodes × 200 conditions was converted into a vector, providing a final input matrix for PCA of 20 data sets × 4,400 data points. In the case of the normalized raw data, data for each node were first mean-centered and unit variance scaled before putting the data set into vector format. Spearman correlation was used to represent inferred network topologies. For the binary case, a threshold of the 60th percentile correlation value (> 0.8198) was used (the same correlation percentile used in Fig. 4-7). The percentile was calculated based on the correlation values across all 20 data sets. PCA was used for the raw data, normalized raw data, and continuous correlation values, while multidimensional scaling was used for the binary correlation values.

4.4.17 Cancer Cell Line Encyclopedia mRNA expression principal component analysis

CCLE mRNA data were downloaded from the CCLE web site (http://www.broadinstitute.org/ccle) from the file CCLE_Expression_Entrez_2012-04-06.gct. The data were analyzed using the principal components analysis function princomp in MATLAB R2009a.
The input matrix for this function contained the robust multi-chip average (RMA) gene expression values of 18,926 genes in 967 cell lines. The matrix was entered such that the genes were considered 'observations' and the cell lines were considered 'variables'. Before PCA was applied, the gene expression values were mean-centered and unit variance scaled for each gene across all cell lines. The PCA results plotted in Fig. 4-14 represent the first two components of the resultant PCA coefficients, or loadings.

4.4.18 Tumor histology enrichment/depletion

Enrichment and depletion of EGFR, FGFR1, and MET mRNA co-expression in particular tumor histologies was assessed using the cell line information provided in Barretina et al. [158]. First, it was determined whether each cell line did or did not co-express EGFR, FGFR1, and MET given a particular RMA threshold defining expressed genes (e.g., RMA > 5). Next, it was determined whether cell lines originally derived from tumors of particular histologies exhibited EGFR, FGFR1, and MET co-expression either more or less often than expected by chance, given the total number of cell lines co-expressing these genes, the total number of cell lines of each histology, and the overlap of the two sets. This was quantified using the hypergeometric test as implemented by the hygepdf function in MATLAB R2009a. The probability of observing as many or more cases of overlap (N) between EGFR/FGFR1/MET co-expressing cells and cells of a given histology was obtained by summing the probability density function from the number of cell lines with overlap N to the total number of cell lines (967). Conversely, the probability of observing as many or fewer cases of overlap N was obtained by summing the probability density function from zero cases to N cases. Cell lines with histology "other" or with no histology information provided were not considered for enrichment.
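The two tail sums can be sketched as follows. This is a minimal Python illustration and not the MATLAB hygepdf-based code used in the thesis; the function names are hypothetical, and the counts in the usage example (other than the 967 cell lines) are invented purely for demonstration.

```python
from math import comb


def hypergeom_pmf(k, total, n_coexpressing, n_histology):
    """P(overlap = k) when n_histology cell lines are drawn without replacement
    from `total` cell lines, of which n_coexpressing co-express the genes."""
    return (comb(n_coexpressing, k)
            * comb(total - n_coexpressing, n_histology - k)
            / comb(total, n_histology))


def enrichment_depletion_p(overlap, total, n_coexpressing, n_histology):
    """Return (p_enriched, p_depleted): the upper- and lower-tail sums.
    Overlaps outside [k_min, k_max] have zero probability, so restricting the
    sums to this interval is equivalent to summing from 0 or up to `total`."""
    k_max = min(n_coexpressing, n_histology)
    k_min = max(0, n_histology - (total - n_coexpressing))
    p_enriched = sum(hypergeom_pmf(k, total, n_coexpressing, n_histology)
                     for k in range(overlap, k_max + 1))
    p_depleted = sum(hypergeom_pmf(k, total, n_coexpressing, n_histology)
                     for k in range(k_min, overlap + 1))
    return p_enriched, p_depleted


# Hypothetical counts: 200 co-expressing lines, 50 lines of one histology,
# 20 lines in both sets, out of 967 cell lines in total.
p_enriched, p_depleted = enrichment_depletion_p(20, 967, 200, 50)
print(p_enriched, p_depleted)
```

Because both tails include the observed overlap itself, the two p-values sum to slightly more than one.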
For each RMA threshold and each histology type, the lower of the enrichment and depletion p-values was selected. Given the 20 histologies and 11 tested RMA thresholds, this provided a list of 20 × 11 = 220 p-values. Applying the Benjamini method with a 5% FDR to this list of p-values corresponded to p < 0.0191.

4.4.19 Correlating gene expression and drug activity area

Pharmacological profiling data were downloaded from the CCLE web site (http://www.broadinstitute.org/ccle) from the file CCLE_NP24.2009_profiling_2012.02.20.csv. Spearman correlation was calculated between gene expression values and drug activity area. The six RTKs and six cognate ligands used in this study were considered together for multiple hypothesis correction. That is, at each RMA threshold, the Spearman correlation and associated p-values were calculated across the 12 genes (six RTKs and six ligands) and 4 drugs, providing 48 p-values. These p-values were then corrected for a 1% FDR using the Benjamini method. The 1% FDR p-values for the > 0, 4, 4.5, 5, 5.5, and 6 RMA thresholds were p < $5.93 \times 10^{-3}$, $3.72 \times 10^{-3}$, $5.93 \times 10^{-3}$, $5.41 \times 10^{-3}$, $3.76 \times 10^{-3}$, and $2.80 \times 10^{-3}$, respectively. Cell lines with values of zero for the measured activity area were not included in the correlation calculations, because there was no indication of how insensitive to a drug a cell line with zero activity area may be.

4.4.20 Partial correlation between genes and drug response

The Spearman partial correlation values were calculated between each receptor gene and each drug while controlling for the expression of the remaining five receptors, and between each ligand gene and each drug while controlling for the expression of the remaining five ligands, using the partialcorr function in MATLAB R2010b. Only the individual gene that was being considered for the partial correlation calculation had to exceed the applied RMA threshold.
For example, when calculating the partial correlation between EGFR and erlotinib using a threshold of RMA > 5, only cell lines with EGFR > 5 were considered, but the expression levels of the remaining five RTKs in those cell lines could be below five. The 5% Benjamini false discovery rate was applied for each RMA threshold separately based on the list of 24 p-values (6 genes × 4 drugs), which were calculated using the partialcorr function.

4.4.21 Comparison of RTKs by receptor-intrinsic properties through dimensionality reduction

To compare different receptor-specific intrinsic properties, multidimensional scaling and principal components analysis were used as dimensionality reduction techniques. We extracted $K_d$ values describing the affinities between 72 kinase inhibitor drugs and our six RTKs from the data set by Davis et al. [165]. Of the 72 inhibitors, 61 bound to at least one RTK with $K_d$ < 10 µM. The $K_d$ values were converted to $\log_{10}(K_a)$ values, and, to ensure that non-measurable docking interactions would not numerically dominate the clustering results, those inhibitor-receptor interactions that were not measured to bind were set to $\log_{10}(K_a) = 3$ (i.e., $K_d$ = 1 mM). This matrix of 6 RTKs × 31 inhibitor compounds was then used as input for principal components analysis.

The amino acids comprising the cytoplasmic domains of the six RTKs were accessed from http://www.uniprot.org and defined as follows: EGFR (aa669-1210), FGFR1 (aa398-822), IGF-1R (aa960-1367), c-Met (aa956-1390), NTRK2 (aa455-822), and PDGFRβ (aa557-1106). The amino acids comprising the kinase domains of the six RTKs were defined as follows: EGFR (aa712-979), FGFR1 (aa478-767), IGF-1R (aa999-1274), c-Met (aa1078-1345), NTRK2 (aa538-807), and PDGFRβ (aa600-962). For both the kinase and cytoplasmic domains, the domains were aligned across RTKs using the multialign function in MATLAB R2009a with the Gonnet scoring matrix.
Pairwise distances between all aligned sequences were then calculated using the seqpdist function in MATLAB R2009a, also using the Gonnet scoring matrix. This distance matrix was then used as input for classical multidimensional scaling using the cmdscale function in MATLAB R2009a.

For the kinase inhibitor data, kinase domain sequences, and cytoplasmic domain sequences, all five eigenvalues resulting from MDS were used for subsequent k-means clustering. For all receptor-intrinsic properties, k-means clustering was performed with the city block distance metric and 200 replicates of each cluster assignment.

Chapter 5

Quality versus quantity: Identifying features of biological data for making better models

Note: This work will be submitted for publication. It is based on computational research designed by J.P.W. and D.A.L. and performed by J.P.W. The authors thank David Heckerman (Microsoft Research), William Chen (Harvard University), and Brian Joughin (M.I.T.) for helpful discussions.

5.1 Introduction

Within the realm of computational models applied to biological systems, there is a general lack of understanding of what features of biological data make useful models. While there are heuristics, like high signal-to-noise and the collection of multiple biological replicates, that generally guide experimentalists when planning studies and collecting data, these notions have generally not been quantitatively explored with regard to subsequent model accuracy. To begin to address this need, here we generate data from simple synthetic models and derive insights from their analysis. Using a simple two-variable toy model, as well as a more realistic although still simplified multivariate network model, we highlight explicit features of data that improve model accuracy.
In the two-variable case, "accuracy" refers to the relative error between the data points produced from a linear model, and the predictions of a linear model inferred from noisy manifestations of the same data. In the multivariate case, "accuracy" refers to the similarity between the synthetic network topology and the inferred Bayesian network topology. Modeling the two-variable system with linear regression and the multivariate system with Bayesian networks, we show that prediction accuracy is a function of data quantity and also of features related to data quality. In the linear regression case, increasing the range over which the data are sampled improves accuracy. In the Bayesian network case, increasing the range over which the data are sampled can also improve accuracy, but only if the data are discretized in a manner that corresponds to biologically meaningful variation in the data. Further, the Bayesian network results highlight the necessity of the propagation of variation within the network, here termed heritable variation, for causal inference. An algorithm is developed for the automatic identification of a discretization scheme for each signaling node, using only information about the variation across biological replicates and the technical precision of the measurements, that improves the accuracy of causal Bayesian network inference. While existing literature has discussed nonuniform discretization approaches (e.g., [183, 184]), the results here cast the problem in terms of parameters familiar to experimental biologists. These results, for the first time to our knowledge, provide a simple method for identifying a discretization strategy on a data set-specific basis.

5.2 Results

5.2.1 A simple two-variable toy model

We began our analysis by exploring the simplest possible model of input-output behavior, a two-variable linear toy model of the form y = mx. In other words, the output, y, is a direct linear function of the input, x.
First, x-data are generated by randomly sampling N data points from a uniform distribution over the interval from $x_{\min}$ to $x_{\max}$, namely, $x \sim U(x_{\min}, x_{\max})$. A uniform distribution was chosen to represent a biological signal distribution that may result from a set of experimental measurements performed across multiple sufficiently biologically different conditions. In contrast, a set of signaling measurements taken from one biological condition may follow a normal distribution. In this manner, a biological condition would represent one instantiation of the biological network state; for example, as measured at a single time point under one growth factor concentration with no external perturbations. Sufficiently biologically different states would result from diverse experimental conditions: stimulating cells with different concentrations of growth factor, stimulating cells with different growth factors, stimulating cells with combinations of growth factors, perturbing the concentrations of proteins using RNAi, small molecule inhibitors, or antibody inhibitors, measuring signals at different time points, or any combination of the above. While we do not prove here that such a uniform distribution could be attainable, it is an assumption we use for our modeling efforts based on the study of experimental data gathered under many of the different conditions just outlined.

Given the x-data, the corresponding y-data are set by y = mx. Normally distributed random noise is then added to the x-data and y-data separately, such that

$$x_{\text{noisy}} = \mathcal{N}(x, \sigma_x)$$

$$y_{\text{noisy}} = \mathcal{N}(y, \sigma_y)$$

where $\mathcal{N}(\mu, \sigma)$ represents data from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then, a new fitted slope parameter, $m_{\text{noisy}}$, is inferred from $x_{\text{noisy}}$ and $y_{\text{noisy}}$ using linear regression. This parameter is then used to predict the expected output given the originally observed x-data, such that $y_{\text{pred}} = m_{\text{noisy}} x$.
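A minimal Python sketch of one such simulation is given here for illustration. It also evaluates the mean absolute percent error that the text defines immediately afterwards. Two choices are assumptions rather than details stated in the text: the slope is fit by least squares through the origin (the exact regression form is not specified), and the function name `simulate_toy_model` and its argument names are hypothetical.

```python
import random


def simulate_toy_model(n, x_max, x_min=1.0, m=1.0, sigma_x=1.0, sigma_y=1.0,
                       rng=None):
    """Run one simulation of the toy model; return (m_noisy, MPE in percent)."""
    rng = rng or random.Random()
    # Noise-free data: x ~ U(x_min, x_max), y = m * x.
    x = [rng.uniform(x_min, x_max) for _ in range(n)]
    y = [m * xi for xi in x]
    # Independent normally distributed noise on each variable.
    x_noisy = [rng.gauss(xi, sigma_x) for xi in x]
    y_noisy = [rng.gauss(yi, sigma_y) for yi in y]
    # Assumption: least-squares slope for a line through the origin.
    m_noisy = (sum(a * b for a, b in zip(x_noisy, y_noisy))
               / sum(a * a for a in x_noisy))
    # Predict from the original (noise-free) x-data and score against y.
    y_pred = [m_noisy * xi for xi in x]
    mpe = 100.0 / n * sum(abs((yi - pi) / yi) for yi, pi in zip(y, y_pred))
    return m_noisy, mpe


# Average the error over repeated simulations for two sampling ranges.
rng = random.Random(0)
for x_max in (2.0, 20.0):
    errors = [simulate_toy_model(100, x_max, rng=rng)[1] for _ in range(200)]
    print(x_max, sum(errors) / len(errors))
```

Because the predictions are made from the noise-free x-data, the error in this sketch reduces to 100 × |m − m_noisy| / |m|, so it directly measures how well the slope is recovered.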
The mean absolute percent error (MPE) was then used to quantify the prediction accuracy:

$$\text{MPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i - y_{\text{pred},i}}{y_i} \right|$$

Thus, as the inferred parameter $m_{\text{noisy}}$ approaches the original parameter m, the MPE will approach zero. To simplify the model, the following parameters were used: m = 1, $x_{\min}$ = 1, and $\sigma_x = \sigma_y = 1$. We then sought to explore the prediction accuracy as a function of N, the number of data points used to train the model, and $x_{\max}$, which sets the range over which the x-data were sampled. The value of N was varied from 10 to $10^4$, and the value of $x_{\max}$ was varied from 1.1 to 50. Because the parameter fit was dependent on randomly generated input data and noise, for each value of N and $x_{\max}$ the procedure was repeated 1,000 times. The mean values from these 1,000 simulations are plotted in Fig. 5-1. Note that a relative error measure (MPE) was used instead of an absolute error measure (e.g., mean squared error) because increasing the value of $x_{\max}$ itself increases the absolute error.

These results show that the prediction error is a function both of the quantity of data, N, used to train the model, and of the range over which the training data were sampled, $x_{\max}/x_{\min}$. As one increases the quantity of data, the error decreases, although only up to about 1,000 data points. And, as one increases the range over which the x-data were sampled, the error also decreases, although only appreciably up to $x_{\max}/x_{\min} \approx 10$. Further, these results show that a high $x_{\max}/x_{\min}$ ratio can compensate for small quantities of data; similarly, large quantities of data can compensate for a low $x_{\max}/x_{\min}$ ratio. For example, as seen in the inset of Fig. 5-1, 20 data points with $x_{\max}/x_{\min} \approx 12$ provide about the same average prediction accuracy (MPE ≈ 4%) as 10,000 data points with $x_{\max}/x_{\min} \approx 8$. Thus, in this simple two-variable toy model, prediction accuracy is a function of both data quantity and the $x_{\max}/x_{\min}$ ratio.

5.2.2 Analytical estimates for prediction accuracy as a function of data range in the two-variable toy model

The two-variable toy model just introduced has a number of parameters involved in the data generation, including m, $x_{\min}$, $x_{\max}$, $\sigma_x$, and $\sigma_y$. Because of this, the numerical results presented in Fig. 5-1 are dependent on the parameters used, namely m = 1, $x_{\min}$ = 1, and $\sigma_x = \sigma_y = 1$. It is possible, however, to derive dimensionless analytical estimates of data quality that are not dependent on individual numerical simulations.

Figure 5-1: Prediction accuracy in a two-variable system as a function of data quantity and the range over which the data were sampled, $x_{\max}/x_{\min}$. The mean absolute percent error is plotted across different data set sizes (different color markers) as a function of the range of the input data used to build the linear regression model. The error is quantified between the underlying linear model and the data predicted from a linear model fit to noisy manifestations of the underlying data. These results show that a small data set with wide range can be as predictive as a large data set with narrow range. (inset) Zoomed in plot from the dashed region of the larger plot.

Figure 5-2: Schematic for the two-variable toy model outlining the conditions required to accurately infer the slope of a line. Red points indicate the original data, $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$, whereas black points indicate the positions of those same data while accounting for the characteristic error values, $\varepsilon_x$ and $\varepsilon_y$.
If we have two points on a line, $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$, whereby $x_{\max} > x_{\min}$, $y_{\max} > y_{\min}$, and $x_{\min} > 0$, and we have some characteristic error associated with the measurement of both variables, $\varepsilon_x > 0$ and $\varepsilon_y > 0$, then we can ask under which conditions the two points remain distinguishable given their errors. In other words,

$$x_{\max} - \varepsilon_x > x_{\min} + \varepsilon_x$$

$$y_{\max} - \varepsilon_y > y_{\min} + \varepsilon_y$$

This is motivated by the observation that, if the two points are distinguishable beyond their associated error values, then the slope of the line between those two points can be inferred (Fig. 5-2). An analogous result can be derived for the case where the line has a negative slope (y = -mx), whereby $y_{\max} < y_{\min}$ and it is required that $y_{\max} + \varepsilon_y < y_{\min} - \varepsilon_y$. Using the definition for the slope of the line, $m = (y_{\max} - y_{\min})/(x_{\max} - x_{\min})$, these equations can be rearranged to show that the slope of the line can be well identified if

$$\frac{x_{\max}}{x_{\min}} > 1 + \frac{2}{x_{\min}} \max\left(\varepsilon_x, \frac{\varepsilon_y}{|m|}\right)$$

Here the error terms represent the degree of uncertainty about the position of each data point. The terms within the max operator represent the error introduced to the data points by the error in the x-variable versus the y-variable. The max operator is used because the separation between the two data points must exceed the error in both dimensions for the points to be distinguishable. The term $\varepsilon_y/|m|$ represents the range of error in the x-variable introduced by the error in the y-variable, given the slope of the line, m. Thus we can see that in the case of zero error in both variables ($\varepsilon_x = \varepsilon_y = 0$), the slope of the line between the two points can be determined simply when $x_{\max}/x_{\min} > 1$.

For the data generated in Fig. 5-1, normally distributed noise with mean zero and $\sigma_x = \sigma_y = 1$ was added to the data. For such a normal distribution, let us approximate the characteristic error as ≈ 2 standard deviations away from the mean (corresponding to the ≈ 5% tails of the distribution). Thus, in this case, $\varepsilon_x \approx 2\sigma_x = 2$ and $\varepsilon_y \approx 2\sigma_y = 2$.
Using the slope m = 1, we estimate that the slope of the line will be well specified when $x_{\max}/x_{\min} > 5$. These analytical estimates are in good agreement with the numerical simulation results shown in Fig. 5-1, namely that when $x_{\max}/x_{\min}$ is greater than ≈ 5, we expect good separation between $x_{\max}$ and $x_{\min}$ relative to the noise in the data and thus reasonable prediction accuracy.

We can now inspect the analytical expression for general trends about inference quality as a function of the relevant parameters. If we reduce the slope to m = 0.5, but keep the other parameters constant, we see that we need $x_{\max}/x_{\min} > 9$. However, if we increase the slope to m = 2 while keeping the other parameters constant, we see that the requirement is still $x_{\max}/x_{\min} > 5$. This is because if $\varepsilon_x = \varepsilon_y$, then the error introduced by the y-error will only dominate the max operator when $|m| < 1$. More generally, if we say $\varepsilon_y = a\varepsilon_x$, then the error introduced by the y-error will only dominate the max operator when $|m| < a$. This is because as the slope of the line increases and $|m| > a$, small changes in x result in greater changes in y, which will tend to extend beyond the error introduced by $\varepsilon_y$.

Thus we see that, for a simple two-variable linear problem, increasing the range over which the input x-data are sampled increases the prediction accuracy. The analytical expression shows how prediction accuracy is a function of $x_{\min}$, $|m|$, $\varepsilon_x$, and $\varepsilon_y$. The numerical results show how this relationship is also a function of the number of sampled data points (based on the normal error distribution applied to the simulated data). This is a simple case of inference, in which we are inferring the magnitude of the slope of a line. This shows that inference quality is a function of (1) data quality, here defined as the range of the sampled data relative to the noise in the sampled data and the functional relationship between the input and output (i.e., the line's slope), and (2) data quantity.
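The analytical criterion is simple enough to evaluate directly. Rearranging the two distinguishability inequalities gives $x_{\max}/x_{\min} > 1 + (2/x_{\min})\max(\varepsilon_x, \varepsilon_y/|m|)$, and the short Python sketch below (with a hypothetical function name) reproduces the three cases discussed in the text.

```python
def min_range_ratio(x_min, eps_x, eps_y, m):
    """Lower bound on x_max / x_min for the two end points to remain
    distinguishable: 1 + (2 / x_min) * max(eps_x, eps_y / |m|)."""
    if x_min <= 0 or m == 0:
        raise ValueError("requires x_min > 0 and a nonzero slope")
    return 1.0 + (2.0 / x_min) * max(eps_x, eps_y / abs(m))


# Parameters from the text: x_min = 1 and eps_x = eps_y = 2.
for slope in (0.5, 1.0, 2.0):
    print(slope, min_range_ratio(1.0, 2.0, 2.0, slope))
# 0.5 9.0
# 1.0 5.0
# 2.0 5.0
```

The bound stops improving once |m| exceeds the ratio of the y-error to the x-error, because the x-error term then dominates the max operator.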
Next we sought to show that these insights could be extended to more relevant multivariate network models.

5.2.3 Simulating data from multivariate linear regression networks

To extend the results comparing data quality and data quantity into a more relevant setting, we next generated simulated data from network models in which the relationships between nodes in the network were defined by linear regression functions. Directed acyclic networks were randomly generated based on the maximum number of allowable parents (inputs) per node, while containing only one root node (i.e., source signal). The signal levels for the root node were generated from N linearly equally spaced points over the range $\text{input}_{\min}$ to $\text{input}_{\max}$. The signal levels for each node downstream of the root node were specified based on the signal levels of its input parent node(s) using multiple linear regression. For a node j with a set of parent nodes P, the data for j, termed $x_j$, are set according to the sum of its inputs,

$$x_j = \sum_{i \in P} m_{ij} x_i$$

where $m_{ij}$ represents the linear regression coefficient specifying the relationship between node i and node j. This coefficient is analogous to the slope of the linear regression line used in the two-variable toy model.

Data were simulated in a step-wise fashion, such that the only input to the simulation process was the signal levels of the root node. Then, at each step in the simulation from parent to child nodes, heritable variation was added to each node's data. This variation was drawn from a random normal distribution with a standard deviation equal to $\varepsilon_{\text{biological}}$ multiplied by the absolute value of the node's pre-variation levels. In other words, normally distributed values with a coefficient of variation of $(100 \times \varepsilon_{\text{biological}})$% were randomly added to each node's data at each step in the step-wise simulation. This variation-added signal was then used as input for the node's child node(s) in the network.
Heritable variation is also added to the terminal nodes in the network, even though they have no child nodes. Once all nodes were simulated, non-heritable variation was added to the simulated data. This variation was drawn from a uniform distribution over the range $\pm\varepsilon_{\text{technical}}$, and this variation was added independently for each node. Note that, as defined here, $\varepsilon_{\text{biological}}$ is a dimensionless quantity, but that $\varepsilon_{\text{technical}}$ has dimensions identical to the measured values $x_i$.

In biological terms, $\varepsilon_{\text{biological}}$ is intended to represent some degree of stochastic fluctuation in the signals' levels across different biological conditions. As described in Section 5.2.1, a biological condition would represent one instantiation of the biological network state. The source of these fluctuations may be variation in mRNA synthesis or degradation rates, protein synthesis or degradation rates, or other sources, but their end effect would be fluctuations in signal levels across conditions that are consistent with the quantitative signal-signal biochemical relationships underlying the biological network measured in those conditions, i.e., heritable variation.

Understanding how the data are simulated is a vital step for subsequent understanding, so we will discuss an example here. Consider a simple network of three nodes with the structure A → B → C, in which the two interaction parameters are both unity ($m_{AB} = 1$, $m_{BC} = 1$), $\varepsilon_{\text{biological}} = 0.1$, and $\varepsilon_{\text{technical}} = 1$. The simulation would first generate data for A, $x_A$, based on a specified range from $\text{input}_{\min}$ to $\text{input}_{\max}$. Heritable variation would then be added to $x_A$ based on a normal distribution with 10% coefficient of variation. Thus the heritable variation is proportional to the signal level, and added to each data point individually,

$$x_{A,\text{heritable}} = \mathcal{N}(x_A, 0.1 \times |x_A|)$$

This variation-added version of node A's data, $x_{A,\text{heritable}}$, would then be used as input for calculating the levels of node B.
Because $m_{AB} = 1$, this simply implies that $x_B = x_{A,\text{heritable}}$. Then heritable variation is added to $x_B$, just as it was for $x_A$,

$$x_{B,\text{heritable}} = \mathcal{N}(x_B, 0.1 \times |x_B|)$$

And again, this variation-added version of node B's data would then be used as input for calculating the levels of node C. And again, because $m_{BC} = 1$, this simply implies that $x_C = x_{B,\text{heritable}}$. Heritable variation is then added to $x_C$, even though it is the terminal node,

$$x_{C,\text{heritable}} = \mathcal{N}(x_C, 0.1 \times |x_C|)$$

After the simulation is complete, non-heritable variation is added to each node's data independently. Each simulated data point for each node has non-heritable noise added to it independently. This variation is drawn randomly from a uniform distribution over the range $\pm\varepsilon_{\text{technical}}$, and thus this non-heritable variation is not proportional to the signal level,

$$x_{A,\text{final}} = x_{A,\text{heritable}} + U(-1, +1)$$

$$x_{B,\text{final}} = x_{B,\text{heritable}} + U(-1, +1)$$

$$x_{C,\text{final}} = x_{C,\text{heritable}} + U(-1, +1)$$

Using this procedure, data can be simulated for directed acyclic networks of arbitrary size and complexity, given there is only one root node. These simulated data contain node-specific heritable variation but also node-specific non-heritable variation. Given a network structure, the parameters required for this simulation are the number of simulated data points (N), the range of signal values used for the root node ($\text{input}_{\min}$ to $\text{input}_{\max}$), the interaction parameters ($m_{ij}$), the magnitude of heritable variation ($\varepsilon_{\text{biological}}$), and the magnitude of non-heritable variation ($\varepsilon_{\text{technical}}$).

This procedure provided a method for simulating data from networks of known structure. Using the simulated data, and the known network structure as a benchmark, we could then assess the quality of network inference results as a function of the parameters used to generate the simulated data.
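The step-wise procedure can be summarized in a short Python sketch, given here only as an illustration under stated assumptions. The function names, the dictionary-based network encoding, and the requirement that nodes be listed in topological order are hypothetical conveniences; the noise models follow the description above.

```python
import random


def linspace(lo, hi, n):
    """n linearly equally spaced values from lo to hi (n >= 2)."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


def simulate_network(parents, weights, root_values,
                     eps_bio=0.1, eps_tech=1.0, rng=None):
    """Simulate a single-root directed acyclic network.

    parents: dict mapping node -> list of parent nodes, with keys listed in
             topological order (the root first, with an empty parent list).
    weights: dict mapping (parent, child) -> interaction parameter m_ij.
    Returns (heritable, final), two dicts mapping node -> list of values.
    """
    rng = rng or random.Random()
    n = len(root_values)
    heritable = {}
    for node, pars in parents.items():
        if not pars:
            clean = list(root_values)
        else:
            # x_j = sum over parents i of m_ij * x_i, using the parents'
            # variation-added values so that the variation propagates.
            clean = [sum(weights[(p, node)] * heritable[p][k] for p in pars)
                     for k in range(n)]
        # Heritable variation: normal, standard deviation = eps_bio * |level|.
        heritable[node] = [rng.gauss(v, eps_bio * abs(v)) for v in clean]
    # Non-heritable variation: uniform on [-eps_tech, +eps_tech], added
    # independently to every value after the whole network is simulated.
    final = {node: [v + rng.uniform(-eps_tech, eps_tech) for v in vals]
             for node, vals in heritable.items()}
    return heritable, final


# The three-node example A -> B -> C with unit interaction parameters.
parents = {"A": [], "B": ["A"], "C": ["B"]}
weights = {("A", "B"): 1.0, ("B", "C"): 1.0}
heritable, final = simulate_network(parents, weights, linspace(1.0, 10.0, 50),
                                    rng=random.Random(0))
print(final["C"][:3])
```

Setting both variation parameters to zero returns the root values unchanged at every node, which is the degenerate case in which the data carry no causal information.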
5.2.4 Inferring Bayesian networks using simulated network data

Having a method for simulating multivariate network-level data, we next sought to assess our ability to infer the underlying network structures from these data using Bayesian network inference. Data were simulated from regression networks of varying complexity by changing the maximum allowable number of parent nodes per child node. The number of parents for each node was chosen randomly from a discrete uniform distribution. The three networks used in this study, each containing 15 nodes but allowing different maximum numbers of parents, are shown in Fig. 5-3. To simplify the simulation of these networks, all interaction parameters $m_{ij}$ were set to unity. The heritable variation parameter $\varepsilon_{\text{biological}}$ was set to 0.1, and the non-heritable variation parameter $\varepsilon_{\text{technical}}$ was set to unity. To explore the effects of sampled data range on inference, three different ranges were used as inputs for the root node in each network. These ranges were 1.9-2.1 ("low narrow" range), 8.9-9.1 ("high narrow" range), and 1-10 ("broad" range). Given the lower and upper bounds of each range, linearly equally spaced data were sampled to generate input data containing 50, 100, 200, or 1,000 data points. Thus, as more data points were sampled, the density of sampling increased but the range of sampling did not. Heritable variation proportional to the input signal level was then added, and these data were then used as input to simulate the entire network, as described in Section 5.2.3.

Figure 5-3: Three different directed acyclic graphs (1 parent max., 2 parents max., and 3 parents max.) were used to simulate synthetic data from which Bayesian network models were inferred. Each network contains 15 nodes, but varies in the maximum allowable number of parent nodes per child node. Each network contains only one root node. Data were simulated by modeling the edges in each network using linear regression relationships.
Because the heritable and non-heritable variation were randomly generated, the simulation was repeated independently three times for each condition. Thus, in total, 108 data sets were simulated (3 network structures × 3 input ranges × 4 data set sizes × 3 replicates = 108). Before further considering the simulated network data, let us preface it with a discussion of the role of $\varepsilon_{\text{biological}}$ and $\varepsilon_{\text{technical}}$ in causal network inference. If $\varepsilon_{\text{biological}} = 0$ and $\varepsilon_{\text{technical}} = 0$, the data generated for any given node would be identical to the root node's input data except for the slope terms $m_{ij}$, which would in effect scale each node's data, and, in cases of multiple parent inputs, the child's values would be higher because of summing the parents' values. This case would represent an unsolvable problem because many causal models would explain the data equally well; in effect, there would be no causal information in the network. If we have $\varepsilon_{\text{technical}} > 0$ but $\varepsilon_{\text{biological}} = 0$, the variation in the signals would be only technical in nature, and any causal model inferred from these data would represent random signal-signal relationships in the data. To derive causal influences that actually represent node-to-node signal propagation, one needs $\varepsilon_{\text{biological}} > 0$. Having $\varepsilon_{\text{technical}} > 0$ is not required for causal inference; indeed, the greater its magnitude the more it confounds causal inference. For each Bayesian network model, prior knowledge was applied such that the root node from the simulated data was not allowed to have any parent nodes in the Bayesian network. Further, the maximum number of parents allowed in each Bayesian network model was restricted to the maximum number of parents found in each synthetic network. None of the simulated data conditions were treated as a perturbation or intervention by the model (i.e., none of the nodes were ever 'clamped' when inferring the Bayesian network).
To clarify, when referring to the different numbers of simulated data points (50, 100, 200, or 1,000), the term "data points" refers to the number of instances (or conditions) in which the entire network state was observed. In other words, 50 data points refers to 50 observations of the entire 15-node network state, which therefore actually corresponds to 15 × 50 = 750 unique numerical values. In an experimental biology context, these 50 data points would represent the total number of conditions in which the network's signals were measured, and may therefore constitute a range of multiplexed experimental scenarios (e.g., 5 time points × 10 RNAi perturbations, 2 time points × 25 RNAi perturbations, 5 growth factor concentrations × 2 time points × 5 RNAi perturbations, etc.). The main decisions one makes in Bayesian network inference are (1) the application of prior knowledge (in the network structure and/or the parent-child interaction parameters), (2) the explicit modeling of interventions, and, if one is using Bayesian network algorithms dependent on discrete data and not continuous data, then (3) how to discretize the continuous data values. By 'discretize' we mean to transform continuous data values into binned values. For example, discretizing continuous values into three states may correspond to something like "low", "medium", and "high" signal values. Historically, early applications of Bayesian networks in biology analyzed gene expression data. As such, because the classification of genes that were "underexpressed", "normal", or "overexpressed" was such a common scheme in the analysis of gene expression data, it was natural to consider three-level discretization because of its similarity to this scheme [22]. Subsequent application of Bayesian networks to phosphoprotein data also used three-level discretization [16].
Other notable applications of Bayesian networks in a biological context used discretization levels ranging from two to four [185, 186], four [187], or twenty-five [188]. Applications of mutual information, another method using discrete data that generally only quantifies pairwise relationships, have also reported use of multiple discretization levels, including six [189], ten [156, 27], or values ranging from seven to seventeen [190]. (In ref. [190] the number of discretization levels used was not stated explicitly, but instead was calculated ex post facto based on their citation of ref. [191], which recommends discretizing data measured across $n$ conditions into $\sqrt{n}$ bins.) Given the lack of guidelines or consensus regarding an appropriate number of discretization levels for applying Bayesian network inference to a given data set, we chose to test multiple discretization levels, including 2, 4, 6, 8, and 10 levels. Quantile discretization was used to separate the data into bins (meaning that an equal number of data points were put into each bin, although the boundaries of those bins in the continuous space may not be equally spaced). This is in contrast to interval discretization, in which the bins are equally spaced in the continuous data space, but the number of data points per bin may not be equal. Quantile discretization was chosen to ensure that changing the number of discretization levels would change which data points were in each bin. In contrast, if interval discretization had been used, changing the number of discretization levels may not have changed which data points were in each bin (for example, if all the data points were near the minimum and maximum observed values, then increasing the number of discretization levels may just create more empty bins in between those values). The area under the receiver operating characteristic (ROC) curve (AUROC) was used to quantify the accuracy of the inferred Bayesian network models.
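The distinction between the quantile and interval discretization schemes described above can be illustrated with a short Python sketch. The function names `quantile_bins` and `interval_bins` and the toy data are assumptions made for illustration; ties in the quantile scheme are broken by sort order, and the interval scheme assumes the values are not all identical.

```python
def quantile_bins(values, n_bins):
    """Assign bins 0..n_bins-1 so that bins hold (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels

def interval_bins(values, n_bins):
    """Assign bins of equal width between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Toy data clustered near the minimum and maximum observed values.
data = [1.0, 1.1, 1.2, 1.3, 9.7, 9.8, 9.9, 10.0]
q2, q4 = quantile_bins(data, 2), quantile_bins(data, 4)
i2, i4 = interval_bins(data, 2), interval_bins(data, 4)
```

For these toy data, going from two to four quantile bins splits each cluster in two, whereas going from two to four interval bins leaves every data point grouped with the same neighbors and simply creates two empty bins in the middle of the range.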
The ROC curve represents the trade-off between true positive rate (also called sensitivity or recall) and false positive rate, as determined by comparing the edges in the inferred Bayesian network to the edges in the synthetic network used to generate the data from which the Bayesian network was inferred. Because the Bayesian network inference algorithm employed here [50] uses exact Bayesian model averaging to derive a consensus model with probabilities for the likelihood of each edge feature given the data [65], one can vary the threshold at which an edge is considered 'significant' to derive different network structures. This allows one to traverse the space of true positive rate versus false positive rate. The edge weight threshold, $P$, was varied between the minimum and maximum observed edge weight values. Thus, at each value of the observed edge weights, the Bayesian network structure generated by only considering edges with weight $p \geq P$ was compared to the synthetic network structure. The networks were scored based on the presence of directed edges, so that a high AUROC score reflects an inferred network that not only properly detected a relationship between two nodes (e.g., A-B), but also properly detected the directionality (i.e., causality) of that relationship (i.e., A→B). Earlier work by Margolin et al. [28] generated ROC curves for Bayesian network models by varying the equivalent sample size (ESS) parameter in the parameter prior (what Margolin et al. call the "Dirichlet pseudocount"). Although the details of their implementation of the LibB software (Friedman and Elidan, http://compbio.cs.huji.ac.il/LibB/programs.html) are not clear from the paper [28], it is likely that they inferred a single high-scoring Bayesian network structure, not a consensus model, and thus did not have edge weight scores to threshold for generating ROC curve data.
Given the subsequently studied sensitivity of the inference process on the ESS value [192], and that modifying the ESS shifts the weighting between prior and observed data, it is not clear that varying ESS is an appropriate method for generating ROC curve data for Bayesian networks.

5.2.5 Bayesian network inference accuracy is a function of data range and discretization level

The mean AUROC values across the three replicates are shown in Fig. 5-4. The values plotted above the dashed line in each subplot will be discussed later. These results reveal two striking trends: inference accuracy is a function of (1) the range of the input data and (2) the number of bins used to discretize the data. Further, how inference accuracy varies as a function of the discretization scheme depends on the range of the input data. Additionally, although less surprisingly, increasing the size of the data set increases inference accuracy (regardless of input data range or synthetic network complexity, i.e., maximum parents allowed), and increasing the synthetic network complexity generally decreases inference accuracy (for a given data set size). While these latter trends are not unexpected, how they varied as a function of data set size and network complexity was not a priori known. To begin to understand why inference accuracy is a function of the input data range and the number of discretization levels applied, we can consider the origins of the synthetic data. In all cases, heritable variation was added to the data that was proportional to the signal level, but non-heritable variation that was not proportional to the signal level was also added. That non-heritable variation was drawn from a uniform distribution between -1 and +1, resulting in an expected average magnitude of 0.5. Thus, in the "low narrow" range case, which varied from 1.9-2.1, the non-heritable variation was about 0.5/2 or 25% of the average input signal level.
In the "high narrow" range case, which varied from 8.9-9.1, the non-heritable variation was about 0.5/9 or about 6% of the average input signal level. And in the "broad" range case, which varied from 1-10, the non-heritable variation was about 0.5/5.5 or about 9% of the average input signal level. Thus, in part, the inference accuracy varied across the input ranges because of a signal-to-noise-type issue, where in this case 'noise' refers to non-heritable variation, because the "low narrow" range case had an especially poor signal-to-noise ratio.

[Figure 5-4 plot: one subplot for each synthetic network (1, 2, or 3 parents max.) and data set size (1,000, 200, 100, or 50 data points); x-axis: number of bins used to discretize data (2, 4, 6, 8, 10); y-axis ticks from 0.5 to 1; legend: "low narrow" range, "high narrow" range, "broad" range.]

Figure 5-4: Bayesian network inference accuracy is a function of data range and discretization level. The mean AUROC values across the three replicates are shown. Vertical error bars indicate the standard deviation of the AUROC values across the three replicates. Each subplot shows the AUROC values across the five discretization schemes (x-axis; 2, 4, 6, 8, and 10 quantile levels) for a given synthetic network and data set size. Within each subplot, values are plotted for the three simulated ranges ("low narrow", "high narrow", and "broad"), each shown in a different color. The square markers plotted above the dashed line represent the predicted number of discretization levels for each data range, as determined by the algorithm presented in Section 5.2.6.

However, such notions do not explain why the inference accuracy varies as a function of discretization level, nor why that variation depends on the range of the input data. To understand these behaviors, we must consider the heritable variation that was added to the data. Heritable variation, in which fluctuations in the value of a parent node are essentially passed on to its child node(s), is key to causal inference. To extract the causal dependencies between measured nodes, we must be able to faithfully identify these fluctuations across conditions. In the simplest sense, if the non-heritable variation and heritable variation have a similar magnitude, then the causal node-to-node fluctuations will be washed out by the non-causal (i.e., non-heritable) fluctuations. However, even when the heritable variation is of a greater magnitude than the non-heritable variation, there is still another requirement for accurate inference: the resolution, or granularity, of the discrete data must be sufficiently fine such that fluctuations due to heritable variation coincide with different discrete states. As an example, if we have signal A that takes values from 1 to 10 and has some constant heritable variation of magnitude 5, then discretizing that signal into two bins may be sufficient to capture heritable variation in the discrete data. However, if we consider another signal B with range 1 to 100 with the same heritable variation of magnitude 5, discretizing that signal into just two bins (e.g., values 1-50 as "low" and values 51-100 as "high") will generally not capture in the discrete data most of the heritable variation fluctuations.
Only data fluctuations near value 50 will be translated into changes in discrete states. In other words, changes that occur in values approximately 1-40 and 60-100, for example, will all be considered identical according to the discrete data. Thus, discretizing the data from signal B into only two bins would underutilize the causal information in the data. Instead, one will likely need to discretize signal B into more bins than signal A in order to extract the fluctuations in the data that correspond to heritable variation. At the same time, one cannot simply arbitrarily increase the number of bins used to discretize data for two reasons. First, at a sufficiently fine level of discretization, the differences in raw data values that are being placed into different discrete bins will no longer correspond to heritable variation, and will instead correspond to non-heritable variation. In other words, discretizing too finely can begin to ascribe heritable value (i.e., placing data points into different discrete bins) to data that does not represent heritable variation, and thus is akin to overfitting the original data. Second, increasing the number of discretization levels also increases the number of parameters in the conditional probability tables of the Bayesian network (when using discrete Bayesian networks based on multinomial local conditional probability distributions [66]). The number of parameters required to characterize the local conditional probability distribution of a node with $p$ parents, assuming both the parent and child nodes are discretized to the same number of levels $C$, is $C^{p+1} - C^{p}$ [193]. The number of parameters for 2, 4, 6, 8, and 10 discrete levels given 1, 2, or 3 parents is shown in Table 5.1.

Table 5.1: This table describes the number of parameters in the local conditional probability table of a Bayesian network for the case in which a node has 1, 2, or 3 parent nodes, and assuming the child node and its parent(s) have the same number of discrete states.

Number of                   Number of parents
discretization levels        1        2        3
 2                           2        4        8
 4                          12       48      192
 6                          30      180    1,080
 8                          56      448    3,584
10                          90      900    9,000

This table therefore provides guidelines for approximately how many data points one should have to parameterize a Bayesian network with a given degree of complexity and given number of discrete states in the data: in general, one should have more data points than parameters. As such, some of the decreased inference accuracy shown in Fig. 5-4 may result from having too few data points to parameterize the Bayesian network models. However, in certain cases the inference accuracy is still high even though the number of data points is less than the number of parameters in the table. This may be because the linear relationships between nodes (which underlie the synthetic data) are still captured sufficiently well by an under-parameterized model, and/or the fact that the synthetic networks with a maximum of 2 or 3 parents also contain 1- and 2-parent relationships. This latter effect may further reduce the data requirements necessary to parameterize the full joint distribution across all signals (in contrast to just the local conditional distributions as given in Table 5.1). In summary, using too many discretization levels may decrease inference accuracy because of fitting to non-heritable variation in the data, and/or having insufficient data for the more complex parameterizations induced by using many discretization levels.
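The entries of Table 5.1 follow directly from the expression $C^{p+1} - C^{p}$, as the following Python sketch shows. The function name `n_parameters` and the choice of 200 data points for the comparison are assumptions made for illustration.

```python
def n_parameters(levels, n_parents):
    """C**(p+1) - C**p for a node with p parents, where the child and
    all of its parents have C discrete states."""
    return levels ** (n_parents + 1) - levels ** n_parents

LEVELS = (2, 4, 6, 8, 10)
PARENTS = (1, 2, 3)
table = {C: [n_parameters(C, p) for p in PARENTS] for C in LEVELS}

# Combinations for which a data set of 200 points has no more data
# points than parameters in a single local conditional probability table.
n_data = 200
under_determined = [(C, p) for C in LEVELS for p in PARENTS
                    if n_parameters(C, p) >= n_data]

for C in LEVELS:
    print(C, table[C])
```

With 200 data points, for example, a three-parent node discretized to 4 levels (192 parameters) still has slightly more data points than parameters, whereas the same node discretized to 6 levels (1,080 parameters) does not.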
With these conceptual insights in mind, we now have a better understanding of why the inference accuracy varies as a function of discretization level, and why that variation depends on the range of the input data. Inference accuracy using the "low narrow" range data, likely because it has the poorest signal-to-noise ratio, is generally not strongly affected by the number of discretization levels. The "high narrow" range data generally achieves the highest accuracy using 2 or 4 discretization levels, with sometimes drastic decreases in accuracy as one increases up to 10 discrete levels. The "broad" range data generally achieves the highest accuracy using about 6 discretization levels, and sometimes even exhibits a biphasic relationship between discretization level and accuracy. These changes highlight the fact that more discrete levels are generally required to exploit the heritable variation present in the "broad" range data compared to the "high narrow" range data. If one discretizes the "broad" range data into too few states (e.g., 2), the inference accuracy is often less than the "high narrow" range data discretized to 2 states. In contrast, if one discretizes the "broad" range data to 6 states, the inference accuracy is often better than the "high narrow" range data discretized to 6 states. Overall, these simulation results provide a rationale for the importance of choosing a discretization strategy appropriate for each data set.

5.2.6 An a priori discretization strategy based on experimental measurement parameters

While the previous section provides rationale for the importance of discretization, what is needed is an a priori method for estimating the most useful number of discretization levels for a data set given its experimental parameters, in particular the magnitudes of the heritable and non-heritable variation.
To pursue such a method, we developed the notion of heritable variation windows; namely, how many significantly biologically different sub-ranges are present in between the minimum and maximum observed values for a given signal. By "biologically different" we mean signal differences on the same order of magnitude as the heritable variation. Similar to notions just discussed, the general concept is that a signal with range 1-100 and heritable variation 5 has more "heritable variation windows" within that large range than a signal with the same heritable variation but a range 1-10. To quantify the appropriate width of a heritable variation window, we utilize the knowledge of how the synthetic data were generated. The heritable variation used to generate the synthetic data was drawn from a normal distribution with a mean value equal to the original signal and a given coefficient of variation $\varepsilon_{\text{biological}}$, such that for a given signal A,

$$x_{A,\text{heritable}} = \mathcal{N}(x_A,\ \varepsilon_{\text{biological}} \times |x_A|)$$

The non-heritable variation was drawn from a uniform distribution with range $\pm\varepsilon_{\text{technical}}$. Using this information, we can proceed in a manner similar to that described in Section 5.2.2. The basic notion is that the heritable variation window must be wide enough to account for both the heritable variation and the non-heritable variation, but no wider. Let us begin with the assumption that the minimum observed value, $x_{\min} > 0$, for a given signal represents the lower boundary of a heritable variation window, such that:

$$x_{\min} = B_1 - (Z \times \varepsilon_{\text{biological}})B_1 - \varepsilon_{\text{technical}}$$

where $B_1$ represents the center of the first heritable variation window, and $Z$ represents a scaling factor contributing to the width of the heritable variation window.
In other words, by starting at the center of the first heritable variation window, subtracting some degree of heritable variation contributed by $(Z \times \varepsilon_{\text{biological}})B_1$, and subtracting some degree of non-heritable variation contributed by $\varepsilon_{\text{technical}}$, one arrives at the minimum observed value, $x_{\min}$. This approach can be repeated to identify the center of the second heritable variation window, $B_2$:

$$B_2 = B_1 + (Z \times \varepsilon_{\text{biological}})B_1 + (Z \times \varepsilon_{\text{biological}})B_2 + 2\varepsilon_{\text{technical}}$$

In other words, $B_2$ must be far enough from $B_1$ to account for the heritable variation associated with $B_1$, $(Z \times \varepsilon_{\text{biological}})B_1$, the heritable variation associated with $B_2$, $(Z \times \varepsilon_{\text{biological}})B_2$, and the non-heritable variation associated with $B_1$ and $B_2$, namely $2\varepsilon_{\text{technical}}$. In this manner, the centers of all heritable variation windows can be determined, up until the point at which a window center exceeds the maximum observed value, $x_{\max}$ (Fig. 5-5).

Figure 5-5: Schematic for a priori discretization algorithm. Red arrows indicate the portion of each window attributable to heritable variation, $\varepsilon_{\text{biological}}B_i$, whereas blue arrows indicate the portion of each window attributable to non-heritable variation, $\varepsilon_{\text{technical}}$.

The centers of the windows can be solved for recursively, first by solving for $B_1$ as a function of $x_{\min}$,

$$B_1 = \frac{x_{\min} + \varepsilon_{\text{technical}}}{1 - Z \times \varepsilon_{\text{biological}}}$$

and then for any subsequent window $B_n > B_1$,

$$B_n = \frac{(1 + Z \times \varepsilon_{\text{biological}})B_{n-1} + 2\varepsilon_{\text{technical}}}{1 - Z \times \varepsilon_{\text{biological}}}$$

while $B_n < x_{\max}$. We can then use the number of identified heritable variation windows between $x_{\min}$ and $x_{\max}$ as a metric for the number of significantly biologically different sub-ranges into which we can discretize our data. Therefore, this recursive formula provides an a priori method to choose a discretization scheme using only $x_{\min}$, $x_{\max}$, $\varepsilon_{\text{biological}}$, $\varepsilon_{\text{technical}}$, and $Z$ as inputs. Using this formula, we then re-analyzed the synthetic data underlying the results shown in Fig.
5-4 to estimate the number of discretization states for each of the 108 data sets. Because a 10% coefficient of variation was used to generate the heritable variation, we set $\varepsilon_{\text{biological}}$ to 0.1. And because the non-heritable variation was drawn from a uniform distribution $U(-1, +1)$, for which the expected magnitude is 0.5, we set $\varepsilon_{\text{technical}}$ to 0.5. Because the non-heritable variation was drawn from a uniform distribution, using 0.5 as its characteristic value only captures 50% of that distribution (while a characteristic value of 1 would capture 100% of that distribution). Therefore, one may want a more conservative estimate for the characteristic value of $\varepsilon_{\text{technical}}$. The $Z$ term represents the scaling factor for the heritable variation. In this case, because the heritable variation was drawn from a normal distribution with coefficient of variation $\varepsilon_{\text{biological}}$, if $\varepsilon_{\text{technical}}$ were zero then $Z$ would represent the number of standard deviations from the window center to the window edge. However, because the width of each window relies on both heritable and non-heritable variation, the distance from a given window center, $B_n$, to its edge is given by $(Z \times \varepsilon_{\text{biological}})B_n + \varepsilon_{\text{technical}}$. Here we used $Z = 1.64$, corresponding to the 90th percentile tails of a normal distribution.

5.2.7 Predicted discretization corresponds strongly with best-performing discretization

The results for the number of discretization states derived using the recursive formula outlined in the previous section are plotted in Fig. 5-4 above the dashed line in each subplot. The horizontal error bars represent the standard deviations associated with these predictions. Discretization predictions were made for each of the 15 nodes separately and for each replicate data set. The mean predicted discretization level across the three replicates was calculated for each node. The standard deviation among these resultant 15 mean values is what is shown in the horizontal error bars.
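For reference, the recursive window calculation of Section 5.2.6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation used to generate the predictions: the function name `heritable_variation_windows` is hypothetical, the number of windows is taken as the number of centers falling below the maximum observed value, and the guard requiring $Z \times \varepsilon_{\text{biological}} < 1$ is added here so that the recursion is well defined.

```python
def heritable_variation_windows(x_min, x_max, eps_bio, eps_tech, z=1.64):
    """Return the window centers B_1, B_2, ... that lie below x_max."""
    shrink = 1.0 - z * eps_bio
    if x_min <= 0 or shrink <= 0:
        raise ValueError("requires x_min > 0 and z * eps_bio < 1")
    centers = []
    # B_1 = (x_min + eps_tech) / (1 - z * eps_bio)
    b = (x_min + eps_tech) / shrink
    while b < x_max:
        centers.append(b)
        # B_n = ((1 + z * eps_bio) * B_(n-1) + 2 * eps_tech) / (1 - z * eps_bio)
        b = ((1.0 + z * eps_bio) * b + 2.0 * eps_tech) / shrink
    return centers

# Illustrative call with observed minimum 1 and maximum 10, using the
# parameter values stated in the text.
centers = heritable_variation_windows(1.0, 10.0, eps_bio=0.1, eps_tech=0.5)
n_bins = len(centers)
```

For an observed range of exactly 1 to 10 this sketch returns three window centers. Note that the inputs are the minimum and maximum of the observed (variation-containing) data for a given node, which generally span a wider range than the noise-free root inputs, so the predicted number of bins for simulated data will typically differ from this idealized example.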
The color of each prediction matches the color of the data range ("low narrow", "high narrow", "broad") it is associated with. Because the predicted number of discretization states was simply a function of the raw data, it is not restricted only to the levels tested using the synthetic data (2, 4, 6, 8, and 10 levels), and may therefore be any nonzero value. With few exceptions, the range of the predicted number of discretization levels aligns very well with the number of discretization states that provided the most accurate Bayesian network inference result in the synthetic data. This demonstrates that the a priori discretization algorithm, using only features of the simulated data itself and only one user-defined parameter $Z$ as inputs, is a useful tool for predicting how many bins one should use to discretize these data. Inspecting the results, the error bars associated with the prediction increase as the complexity of the network increases. This is an artifact of the method used to simulate the data. Because the data for nodes with multiple parent inputs were determined by simply summing the values of each of the parents, this meant that nodes with multiple parents took on a higher range of values, and thus were discretized by the algorithm into more bins. Further, even if a given node only had one parent, but was downstream of a node that at one point had multiple parents, then its values too would take on a higher range. However, because not all nodes in the maximum 2- and 3-parent synthetic networks were downstream of a multi-parent node, not all the nodes experienced this increased range effect. Thus, some nodes in the multi-parent networks were predicted to be discretized to more states than other nodes, causing the increase in error bars. To consider cases without this summation effect, one can inspect the algorithm predictions for the maximum 1-parent synthetic network.
Here, because the discretization algorithm is deterministic for a given data set, the variation shown in the predictions' horizontal error bars reflects variation from the simulated data sets' replicates. To translate these results into more practical experimental biology terms, $\varepsilon_{\text{biological}}$ would correspond to the coefficient of variation measured across biological replicates, and $\varepsilon_{\text{technical}}$ would correspond to the precision of the measurement for a given experimental procedure. A key assumption employed in all analyses here was that the heritable variation was proportional to the measured signal, but the non-heritable variation was not. Thus, the assumption implies that $\varepsilon_{\text{biological}}$ reflects signal-proportional experimental error, but $\varepsilon_{\text{technical}}$ reflects signal-independent experimental error. Additionally, although the results using $Z = 1.64$ were satisfactory, the algorithm could be further tuned by changing the value of $Z$: increasing its magnitude will result in wider heritable variation windows and therefore fewer predicted discretization bins, whereas decreasing its magnitude will result in narrower heritable variation windows, and therefore more predicted discretization bins. Lastly, because the discretization algorithm is performed on a node-specific basis, in practice each node could be discretized into its own specific number of bins. This was not explored explicitly here, but it is likely that discretizing each node to its value determined by the algorithm would improve inference accuracy, compared to using the same number of bins for all nodes as was done here.

5.3 Discussion

Here we have quantitatively explored, using synthetic data from both a two-variable toy model and more realistic 15-node networks, how prediction accuracy varies as a function of data quantity and features related to data quality.
We can now see that the notions explored in the two-variable toy model are elemental corollaries of the lessons learned from the multivariate network models. The data range term $x_{\max}/x_{\min}$ explored in Fig. 5-1 is analogous to the heritable variation term in the network data. That is, one must have a sufficiently high value of $x_{\max}/x_{\min}$, which will drive the output $y$ based on the "heritable" system behavior $y = mx$, to overcome the non-heritable variation present in $x_{\text{noisy}}$ and $y_{\text{noisy}}$. The inference of the slope parameter $m$ is aided by a larger $x_{\max}/x_{\min}$; but even if one has a small $x_{\max}/x_{\min}$, the accuracy of the inferred parameter will increase as one increases the data set size, because by using more samples the prediction will converge to the expected slope according to the law of large numbers. However, there is a key difference between the two-variable toy model and the multivariate network model: causality. In the two-variable toy model, the modeling task was simply to infer the slope of the relationship between the two signals: whether $x$ was upstream of $y$ or $y$ upstream of $x$ was not considered. In contrast, in the multivariate network models, the entire task was centered on identifying the causal "upstream-downstream" relationships between signals. In fact, the heritable variation introduced in the network models' synthetic data is actually antagonistic to the process of trying to quantify the slope between signals. If there were no heritable variation induced in the data ($\varepsilon_{\text{biological}} = 0$), the slope parameter could actually be inferred more easily, if that were a goal, because the remaining variation from $\varepsilon_{\text{technical}}$ was typically small compared to the heritable variation. However, without the heritable variation there would be no node-specific variation introduced into the data that was passed from parent to child node, and thus there could be no causality inferred from the data.
The necessity of the propagation of variation from signal to signal through the network for enabling causal inference was recently discussed, albeit in the context of quantitative trait loci in metabolic pathways, by Blair et al. [194]. The propagation of heritable variation is a requirement for causal inference regardless of whether the model relies on continuous data (as in ref. [194]) or on discrete data, as discussed here. The additional requirement in the discrete case, quantified by our work here, is that the differences in the raw data underlying different discrete states must be on the same order of magnitude as the heritable variation propagated from signal to signal. The quality of inference will suffer if continuous data are split into too many bins, and differences between bins largely reflect non-heritable variation; but it will also suffer if too few bins are used, and the data within any given bin actually corresponds to differences in the continuous data that are biologically (i.e., causally) meaningful. It should be noted that an agglomerative discretization method, in which an effort is made to preserve the total mutual information between pairs of measured signals, is outlined in ref. [193]. While this approach does explicitly calculate a metric for the degree of information loss as a function of the number of discretization levels used, and generally advises one to choose a discretization scheme that does not result in substantial loss of information, it does not provide an explicit estimate of a discretization level. Further, it does not explicitly consider experimental parameters in the algorithm. That is, it does not consider the degree of heritable and non-heritable variation in the data. We sought to frame the discretization problem in terms of experimentally measurable parameters, and not just the numerical features (e.g., mutual information content) of the data set.
Nonetheless, it remains unclear how that method would compare to the method outlined here.

One aspect missing from this work is determining, given a predicted discretization scheme, which edge weight threshold to apply to the inferred Bayesian network model. That is, while the algorithm corresponds well with discretization schemes providing high AUROC scores in the synthetic data, because the ROC curve calculation considers all edge weights, the algorithm does not actually suggest which edge weight is best for each discretization scheme. Future work could consider, for each ROC curve, which edge weight threshold provides a balance between true positive rate and false positive rate, and this could be an additional output for the algorithm. Short of a predicted edge weight threshold, one could always use p > 0.5, which ensures that no pair of signals is connected in both directions in the thresholded consensus Bayesian network.

Another notion not explored here is the impact of explicitly modeling perturbations in the synthetic data. For example, one could simulate a portion of the synthetic data to mimic an RNAi knockdown, such that the signal level would be greatly reduced (or perhaps increased) in response. Explicitly modeling such perturbations would likely reduce the amount of data required to properly infer causal relationships in the data. However, the fact that no perturbations were simulated in the synthetic data, and thus no perturbations were explicitly modeled in the Bayesian network algorithm, is actually encouraging: it indicates that correct causal relationships can nonetheless be inferred, even when explicit perturbations are not present, if one has sufficient range and heritable variation in the data. In this manner, the range of the signal data is likely a mimetic of the effects of perturbations, which themselves typically "push" signals to extremes of their natural physiological range.
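The suggestion above, choosing for each ROC curve an edge weight threshold that balances true positive rate against false positive rate, could be sketched as follows for synthetic data in which the true edges are known. This is only an illustration: the balance criterion used here (maximizing TPR minus FPR, often called Youden's J statistic) is one possible choice that the thesis does not specify, the function name is hypothetical, and the edge probabilities in the usage note are made up. The sketch assumes `edge_probs` contains every candidate directed edge.

```python
def best_edge_threshold(edge_probs, true_edges):
    """Pick the edge weight threshold that maximizes TPR - FPR.

    edge_probs: dict mapping (parent, child) -> inferred edge probability,
                assumed to cover every candidate directed edge.
    true_edges: set of (parent, child) pairs in the known synthetic network.
    Returns (threshold, tpr, fpr); an edge is kept if its weight >= threshold.
    """
    positives = [e for e in edge_probs if e in true_edges]
    negatives = [e for e in edge_probs if e not in true_edges]
    if not positives or not negatives:
        raise ValueError("need at least one true and one false candidate edge")

    best = None
    # Every distinct edge weight is a candidate threshold (one ROC point each).
    for threshold in sorted(set(edge_probs.values()), reverse=True):
        tp = sum(edge_probs[e] >= threshold for e in positives)
        fp = sum(edge_probs[e] >= threshold for e in negatives)
        tpr = tp / len(positives)
        fpr = fp / len(negatives)
        balance = tpr - fpr
        # Ties are resolved in favor of the higher (more stringent) threshold.
        if best is None or balance > best[0]:
            best = (balance, threshold, tpr, fpr)
    return best[1], best[2], best[3]
```

For instance, with hypothetical weights `{("A","B"): 0.9, ("B","C"): 0.7, ("A","C"): 0.4, ("C","B"): 0.3, ("C","A"): 0.2, ("B","A"): 0.1}` and true edges A->B and B->C, the function returns a threshold of 0.7, at which both true edges and no false edges are retained. With real data the true edges are unknown, so such a threshold would have to be calibrated on synthetic data generated to resemble the experiment.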
Notions related to the data quality features explored quantitatively in this work have been discussed previously in the literature, but to our knowledge only in qualitative terms. The novelty of our work here is that it explores these notions in quantitative terms, using both analytical expressions and simulated network-level data. For example, from Basso et al. [189]: "...[G]enetic interactions are best inferred when the genes explore a substantial dynamical range. Traditionally, this has been achieved by systematic perturbations in simple organisms (e.g., by large-scale gene knockouts or exogenous constraints), which are not easily obtained in more complex cellular systems. We show here that an equivalent dynamic richness can be efficiently achieved by assembling a considerable number of naturally occurring and experimentally generated phenotypic variations of a given cell type [emphases added]." And from Hartemink [195] commenting on Basso et al. [189]: "Basso et al. demonstrate that as long as the available data explore a wide range in the 'expression space' of the system, biologically meaningful interactions can be recovered by computational algorithms."

The conclusions discussed here actually relate to a more generalizable concept for discrete models, namely, nonuniform discretization (e.g., [183, 184]). The notion is to finely discretize regions of the functional space that one knows with high confidence, and to coarsely discretize regions that one knows with low confidence. While Kozlov and Koller [183] discuss the application of their method to hybrid Bayesian networks, Reshef et al. [184] only consider pairwise relationships between signals and do not attempt to infer causality. Importantly, neither method incorporates tangible experimental parameters, such as measures of heritable and non-heritable variation as described here, into its algorithm.
As such, we believe our results provide novel insights not only into the features of data useful for building causal models, but also into how those features can be expressed in terms familiar to experimentalists.

Chapter 6

Conclusion

This thesis has focused on improving our understanding of receptor tyrosine kinase signaling in cancer using multivariate computational methods paired with experimental cell signaling and phenotype measurements. By collaborating with numerous experimental colleagues who used a variety of technologies to measure signaling in a range of biological settings, much has been learned about biological modeling that may not have been learned had this thesis focused exclusively on one type of experimental data, or one particular biological topic. In addition, given the variety of experimental data, numerous modeling methods were explored and applied during the thesis. By becoming familiar with multiple modeling methods, generalizable modeling lessons could be learned and insights gained that may not have been realized had this thesis focused heavily on only one modeling method. This broad spectrum of experience has been a great asset for this thesis.

The results from Chapter 2 highlighted, in a manner that was not appreciated previously, the possible role of signaling relationships between receptor tyrosine kinases, even though only one receptor's ligand was used to stimulate the cells. This has led to continued experimental study of receptor-to-receptor signaling mechanisms. In Chapter 3, fundamental differences in the receptor tyrosine kinase signaling networks and migration modes between epithelial and mesenchymal cells were highlighted. In Chapter 4, similarities in signaling across six receptor tyrosine kinases were identified and subsequently linked to possible roles in cancer drug resistance.
In Chapter 5, analytical, numerical, and conceptual arguments were proposed for identifying features of experimental data that produce more accurate models.

6.1 Emergent biological and computational insights

There are emergent connections between the chapters' biological conclusions. Notions of receptor tyrosine kinase crosstalk highlighted in Chapter 2 may also relate to the receptor network classes identified in Chapter 4. For example, because of the shared underlying signaling networks across same-class receptors, they may also share intracellular activation mechanisms (e.g., by receptor-proximal docking proteins). As such, if one receptor is activated, it may activate intracellular proteins that could potentially interact with same-class receptors even without ligand-dependent activation of the other receptor. Regarding the epithelial-to-mesenchymal transition (EMT) studied in Chapter 3, the same-class receptors EGFR, FGFR1, and c-Met studied in Chapter 4 have demonstrated roles in EMT [196], including suggested switching from EGFR signaling in an epithelial state to FGFR1 signaling in a mesenchymal state [84]. This potential ability of FGFR1 to compensate for EGFR in an EMT-dependent manner, combined with the knowledge from Chapter 4 that EGFR and FGFR1 belong to the same network class, may also help explain how cells that have undergone EMT are less sensitive to EGFR inhibitors [84]. In this manner, FGFR1 may function as a sort of "mesenchymal version" of EGFR.

This thesis has also provided novel insights from a computational perspective. While it has thoroughly explored arguably complex methods, like Bayesian networks, mutual information, and partial least squares regression, it has also demonstrated the power of simple methods. In Chapter 3, it was shown that linear regression models using only one or two phosphorylation sites as predictors of cell speed could be more accurate than a partial least squares regression model using 11 phosphorylation sites.
Chapter 3 also discussed network models derived using Pearson correlation, arguably the simplest measure of similarity between two signals. In Chapter 4 it was demonstrated that network topologies derived using Pearson and Spearman correlation were as accurate, when used as multivariate classifiers of receptor signaling networks, as Bayesian networks and methods based on mutual information. And lastly, in Chapter 5, conceptual insights into data quality features were first obtained using a system of just two variables. Thus, importantly, these results show that biological models do not have to be complicated to be useful. Indeed, unjustified complexity can obscure biological insight.

6.2 Guidelines for analysis of large data sets

In the course of working on multiple projects, all involving the analysis of relatively large, relatively involved data sets, some lessons have emerged that may serve as useful guidelines for future study of large data sets. First, plot the data. Although what one plots may vary based on what type of model one is constructing, visualizing the data in some manner will almost always be helpful to understanding it and therefore to modeling it. If one seeks to build a network model quantifying relationships between measured signals, one should always plot all pairwise signal-signal relationships. For example, given three measured signals A, B, and C, one should generate plots of the data from A vs. B, A vs. C, and B vs. C. When seeking to use measured signals to predict a given output of interest (e.g., some phenotypic quantity), always plot each signal individually versus the output. Additionally, one should start modeling efforts in this case by simply calculating the Spearman correlation between each signal and the output.
One could also use the Pearson correlation; but the Spearman correlation can capture nonlinear (but monotonic) relationships and, because it is rank-based, is also typically more robust to outlier data points than the Pearson correlation.

For any given modeling task, the null hypothesis (i.e., the starting point) should always be one of simplicity rather than one of complexity. If one builds a simple model and it is not sufficiently predictive, only at that point should one consider a more complex modeling approach. Thus, added complexity should be justified. If one only considers complex modeling approaches to analyze biological data, then the notion that biology is complex becomes a self-fulfilling prophecy.

Along these lines, one common concept cited in biological models is nonlinearity. What is not always realized is that the relationship between two nonlinear signals can itself be linear, and thus sometimes linear methods are sufficient. Modeling methods that promote their utility for capturing nonlinear relationships (for example, mutual information-based methods [28, 156], Bayesian networks [185, 195], and fuzzy logic [21]) generally have not provided evidence that the underlying data exhibit nonlinear relationships, and have generally not compared the predictive capacity of their nonlinear methods to the predictive capacity of simpler linear methods. Notably, Faith et al. [27] compared CLR and mutual information to Pearson correlation, finding that Pearson correlation could outperform mutual information and perform comparably to CLR; but unfortunately they never considered Spearman (nonlinear, monotonic) correlation. Linear methods should be used first; and if proven insufficient, then nonlinear methods should be used. This is especially true for nonlinear methods relying on discrete data given the results in Chapter 5, in which the sensitivity of Bayesian network inference to discretization level was described.
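The recommendation in Section 6.2 to begin with the Spearman correlation between each signal and the output can be illustrated with a short sketch. The helper names below are hypothetical, and the exponential signal-output relationship in the usage note is a made-up example chosen only because it is monotonic but strongly nonlinear; the Spearman correlation is computed here in the standard way, as the Pearson correlation of the ranked data with ties assigned their average rank.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranked data."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

As a usage example, for a signal taking the values 0 through 9 and an output equal to the exponential of the signal, `spearman` returns 1 (the relationship is perfectly monotonic) while `pearson` returns a clearly smaller value, because the relationship is far from a straight line. Comparing the two coefficients signal by signal is therefore a cheap first check on whether a linear description is adequate.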
6.3 Limitations of methods

While the methods discussed in this thesis have provided substantial insights into receptor tyrosine kinase signaling, they also have limitations. One strong limitation is the availability of data. Inferring network models from data, in the most basic sense, requires a sufficient number of data points to faithfully describe the functional relationships between measured signals. To develop models that are arguably causal, one typically needs even more data points. In the treatment of cancer, it is becoming increasingly clear that variability in tumor composition between patients, and even within the same patient, is an important factor in determining which patients will respond to drugs. This motivates efforts to build patient-specific models for understanding treatment strategies. However, patient-specific protein signaling data are not available in great quantities, and it is not clear that the amounts available are sufficient for constructing patient-specific signaling network models at this time. Thus, while signaling network models will certainly remain relevant for in vitro cell line-based studies, and likely even in vivo animal models [197], it is not clear they will be readily applicable to patient-specific data.

Another limitation with Bayesian network inference in particular is the difficulty of inferring large networks. Other methods that do not argue for causal interpretations or calculate conditional independence relationships are not as limited by large networks. The core Bayesian network inference algorithm used throughout this thesis [50], because it performs exact Bayesian model averaging by scoring all possible network structures, is limited to inferring networks containing about 20 nodes.
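To give a sense of how quickly the space of possible network structures grows with the number of nodes, the number of directed acyclic graphs on n labeled nodes can be counted with Robinson's recurrence. This count is an illustration added here and is not taken from the thesis; the function name is hypothetical. Exact averaging methods generally rely on dynamic programming rather than visiting structures one at a time, so the count indicates the size of the space being averaged over, not the running time of any particular algorithm.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of directed acyclic graphs on n labeled nodes.

    Robinson's recurrence:
        a(0) = 1
        a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    # Print the count for a few network sizes up to the ~20-node limit.
    for n in (2, 3, 4, 5, 10, 20):
        print(n, count_dags(n))
```

The first few values are 1, 3, 25, 543, and 29281 for one through five nodes, and the growth is faster than exponential in n, which is consistent with the practical limit on network size noted above.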
While the computational complexity of the problem can be reduced by limiting the number of parents per child node, and limiting the number of bins used to discretize the data, these may not be sufficient, or desirable, in all situations. To consider networks with more than 20 nodes, at the cost of no longer exhaustively scoring all networks, one could use search methods like Markov chain Monte Carlo (MCMC) [65]. Thus, one could analyze a much larger network with MCMC-type approaches, but as with optimization problems more generally, increasing the dimensionality of the optimization problem (i.e., the number of nodes in the network in this case) may make it more difficult to identify high-scoring regions of the search space.

Another limitation of signaling network inference is the presence of so-called hidden variables [198], namely variables that are present in the system but not measured in the experiment. Two measured signals that appear correlated in a given data set may exhibit a functional biological relationship, or they may appear correlated because of mutual regulation by a third but unmeasured signal. In protein signaling data sets, about 15 to 100 signals are typically measured; while this is a prodigious improvement over previous experimental methodologies, vast multitudes of signals still go unmeasured. These hidden variables confound network inference results even when causal methods are not used. When causal methods are used, any interpretation of causality must be tempered by the possibility that any given network relationship is due to the unaccounted-for influence of an unmeasured signal.

Lastly, using a signaling network model to identify drug targets is still a challenge. While Chapter 4 described results using the entire inferred network topology as a multivariate classifier of receptor function, how to use the inferred relationships on an edge-specific basis is less clear.
Further, if one has phenotypic data, it is not clear that having an inferred signaling network "upstream" of the signal-phenotype predictions is going to be useful. It may be that simply trying to predict phenotype directly will yield the greatest insights into what may be a useful drug target, namely the signals most predictive of phenotype.

6.4 Future work

Despite these limitations, there are prospects for future research. It may be that trying to argue causality by using Bayesian networks is not always necessary to gain insight, given the data requirements to describe causal and higher order parent-child relationships, the limits on network size, and the presence of hidden variables. As such, one could consider using simpler methods, like correlation and pairwise mutual information approaches, to gain insights into gross differences in network topology across different conditions of interest. In this manner, one would derive consensus networks across these simpler methods, and then compare the consensus networks between conditions of interest to identify similarities and differences in the network structures. The most striking differences could be researched against known signaling mechanisms and literature, and then followed up experimentally to determine if the observed network differences, while not arguably causal from an algorithmic perspective, nonetheless reflect real biological differences between those conditions.

However, if one wants to argue causal interpretations of the network inference result using Bayesian networks, I suggest a modified approach. One can begin by inferring a Bayesian network structure as has been done throughout this thesis. But then for each inferred network structure, one could fit the identified signal relationships to linear and/or nonlinear functions using the continuous data.
In other words, if a Bayesian network result identifies a hypothesized link A->B, then return to the continuous data and fit the data for B as a function of A. One can then determine, in the continuous data space, how well A predicts B. This procedure could be repeated at different edge weight thresholds to identify a threshold that corresponds to high prediction accuracy in the continuous data space. These continuous functions could then be used to perform sensitivity analysis for evaluating the effects of perturbations to the system. In this manner, one would have an underlying network topology derived using causal semantics, but the functions then applied to those network interactions would be continuous and potentially nonlinear. This may ease interpretation of the prediction results compared to operating in the discrete data space, in which inference predictions made using Bayesian networks are presented in probabilistic terms [199]. Prior knowledge about the signaling network structure was applied to the Bayesian networks inferred in Chapters 2, 4, and 5; and prior knowledge about the parameters of the conditional probability tables was introduced, by varying the equivalent sample size parameter to account for different data set sizes, in Chapter 5. Most literature related to applying prior knowledge to Bayesian networks in biology has focused on modifying the structure prior [200]. However, to my knowledge, no one has discussed use of the parameter prior for incorporating prior knowledge. Modifying the equivalent sample size (ESS) value in particular should allow facile incorporation of prior knowledge on a parent-child specific basis. To begin, the ESS value could be altered for just the one-parent interactions. If this were insufficient, one could consider encoding higher order prior knowledge (e.g., about mutual regulation of one child node by two parent nodes) using ESS values for two-parent interactions. 
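The refitting procedure proposed above, returning to the continuous data for each inferred link A->B and repeating the evaluation at different edge weight thresholds, could be sketched as follows. This is a minimal illustration under stated assumptions: only a straight-line fit is shown (the proposal also allows nonlinear functions), prediction accuracy is summarized by the coefficient of determination R^2, and the function names, edge weights, and data layout are all hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit ys ~ intercept + slope * xs.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def mean_r_squared(edge_weights, data, threshold):
    """Average R^2 of linear refits over edges with weight >= threshold.

    edge_weights: dict mapping (parent, child) -> inferred edge weight.
    data: dict mapping signal name -> list of continuous measurements.
    Returns None if no edge passes the threshold.
    """
    kept = [edge for edge, weight in edge_weights.items() if weight >= threshold]
    if not kept:
        return None
    scores = [fit_line(data[parent], data[child])[2] for parent, child in kept]
    return sum(scores) / len(scores)
```

Scanning `mean_r_squared` over a grid of thresholds would then indicate which threshold retains edges that are well supported in the continuous data space. The fitted slopes and intercepts are the continuous functions that could subsequently be used for the sensitivity analysis described above.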
Methods could also be developed to improve the inference of large networks. Initial work in this thesis explored results obtained by breaking a large network into subnetworks, inferring a Bayesian network for each subnetwork, and then piecing the subnetworks back together by normalizing against how many times a given interaction was considered across the subnetworks. Other work in this thesis also considered large, directed, linear and nonlinear regression-based networks as an alternative to the Bayesian network approach. In that approach, one or more root nodes would be specified. Then, the signal best predicted by the root nodes would be determined and added to the network. Then, given the set of root nodes and the new node, the signal best predicted by that set would be added to the network. This process would be repeated until all the nodes were incorporated into the network. Multiple parents could be specified for each child node. Additionally, the accuracies of linear and nonlinear interaction terms were compared, and only if the nonlinear method provided significantly improved accuracy would it be used in place of the linear function. This type of approach would allow one to "forward-simulate" the entire network based only on the values of the root nodes.

Lastly, modeling temporal data remains a challenge. Dynamic Bayesian networks [62], while often cited as a solution to modeling dynamics and cycles in the context of Bayesian networks, really just represent a multi-layered Bayesian network. Further, it is not well established how to split the time points from a given time course into those multiple layers [201]. Thus there is still work to be done to determine how to implement dynamic Bayesian networks in a biologically relevant manner. But more importantly, signals that are correlated temporally are still not necessarily causal.
For example, the signaling data from different receptor tyrosine kinases discussed in Chapter 4 contain signals collected across 11 time points and also across 91 perturbation conditions. If one calculates the Spearman correlation between signals across time points for a given perturbation condition (i.e., a measure of the similarity between signals' time courses), and compares it to the Spearman correlation between signals across perturbations for a given time point (i.e., a measure of the similarity between signals' perturbation responses), the two can be very different. Some signals have similarly shaped time courses, but respond very differently to perturbations; and some signals have very differently shaped time courses, but nonetheless respond very similarly to perturbations. The implications of these phenomena should be further studied, particularly if models calculate signaling relationships across temporal and perturbation data combined together.

Bibliography

[1] M.A. Lemmon and J. Schlessinger. Cell signaling by receptor tyrosine kinases. Cell, 141(7):1117, 2010.
[2] T. Hunter. Why nature chose phosphate to modify proteins. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1602):2513-2516, 2012.
[3] W.A. Lim and T. Pawson. Phosphotyrosine signaling: evolving a new cellular communication system. Cell, 142(5):661-667, 2010.
[4] T. Hunter. Tyrosine phosphorylation: thirty years and counting. Current Opinion in Cell Biology, 21(2):140-146, 2009.
[5] D. Hanahan and R.A. Weinberg. Hallmarks of cancer: the next generation. Cell, 144(5):646-674, 2011.
[6] T. Ideker and D. Lauffenburger. Building with a scaffold: emerging strategies for high- to low-level cellular modeling. Trends in Biotechnology, 21(6):255-262, 2003.
[7] K.A. Janes and D.A. Lauffenburger. A biological approach to computational models of proteomic networks. Current Opinion in Chemical Biology, 10(1):73-80, 2006.
[8] P.A. DiMilla, K. Barbee, and D.A.
Lauffenburger. Mathematical model for the effects of adhesion and mechanics on cell migration speed. Biophysical Journal, 60(1):15-37, 1991.
[9] K. Nagata, I. Izawa, and M. Inagaki. A decade of site-and phosphorylation state-specific antibodies: recent advances in studies of spatiotemporal protein phosphorylation. Genes to Cells, 6(8):653-664, 2001.
[10] B. Schoeberl, C. Eichler-Jonsson, E.D. Gilles, and G. Muller. Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors. Nature Biotechnology, 20(4):370, 2002.
[11] F. Hua, M.G. Cornejo, M.H. Cardone, C.L. Stokes, and D.A. Lauffenburger. Effects of Bcl-2 levels on Fas signaling-induced caspase-3 activation: molecular genetic tests of computational model predictions. The Journal of Immunology, 175(2):985-995, 2005.
[12] K.A. Janes, J.G. Albeck, L.X. Peng, P.K. Sorger, D.A. Lauffenburger, and M.B. Yaffe. A high-throughput quantitative multiplex kinase assay for monitoring information flow in signaling networks: application to sepsis-apoptosis. Molecular & Cellular Proteomics, 2(7):463-473, 2003.
[13] U.B. Nielsen, M.H. Cardone, A.J. Sinskey, G. MacBeath, and P.K. Sorger. Profiling receptor tyrosine kinase activation by using Ab microarrays. Proceedings of the National Academy of Sciences, 100(16):9330-9335, 2003.
[14] K.A. Janes, J.G. Albeck, S. Gaudet, P.K. Sorger, D.A. Lauffenburger, and M.B. Yaffe. A systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis. Science, 310(5754):1646-1653, 2005.
[15] J.M. Irish, R. Hovland, P.O. Krutzik, O.D. Perez, O. Bruserud, B.T. Gjertsen, and G.P. Nolan. Single cell profiling of potentiated phospho-protein networks in cancer cells. Cell, 118(2):217-228, 2004.
[16] K. Sachs, O. Perez, D. Pe'er, D.A. Lauffenburger, and G.P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523, 2005.
[17] Y. Zhang, A. Wolf-Yadlin, P.L. Ross, D.J.
Pappin, J. Rush, D.A. Lauffenburger, and F.M. White. Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Molecular & Cellular Proteomics, 4(9):1240-1250, 2005.
[18] M. Sevecka and G. MacBeath. State-based discovery: a multidimensional screen for small-molecule modulators of EGF signaling. Nature Methods, 3(10):825-831, 2006.
[19] M. Bansal, V. Belcastro, A. Ambesi-Impiombato, and D. Di Bernardo. How to infer gene networks from expression profiles. Molecular Systems Biology, 3(1), 2007.
[20] R. Bonneau, D.J. Reiss, P. Shannon, M. Facciotti, L. Hood, N.S. Baliga, V. Thorsson, et al. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biology, 7(5):R36, 2006.
[21] M.K. Morris, J. Saez-Rodriguez, D.C. Clarke, P.K. Sorger, and D.A. Lauffenburger. Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli. PLoS Computational Biology, 7(3):e1001099, 2011.
[22] N. Friedman, M. Linial, I. Nachman, and D. Pe'er. Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4):601-620, 2000.
[23] J. Saez-Rodriguez, L.G. Alexopoulos, J. Epperlein, R. Samaga, D.A. Lauffenburger, S. Klamt, and P.K. Sorger. Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Molecular Systems Biology, 5(1), 2009.
[24] K. Wang, M. Saito, B.C. Bisikirska, M.J. Alvarez, W.K. Lim, P. Rajbhandari, Q. Shen, I. Nemenman, K. Basso, A.A. Margolin, et al. Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nature Biotechnology, 27(9):829-837, 2009.
[25] A. De La Fuente, N. Bing, I. Hoeschele, and P. Mendes.
Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics, 20(18):3565-3574, 2004.
[26] J. Krumsiek, K. Suhre, T. Illig, J. Adamski, and F.J. Theis. Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data. BMC Systems Biology, 5(1):21, 2011.
[27] J.J. Faith, B. Hayete, J.T. Thaden, I. Mogno, J. Wierzbowski, G. Cottarel, S. Kasif, J.J. Collins, and T.S. Gardner. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biology, 5(1):e8, 2007.
[28] A.A. Margolin, I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R.D. Favera, and A. Califano. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7(Suppl 1):S7, 2006.
[29] D. Pe'er. Bayesian network analysis of signaling networks: a primer. Science Signaling, 2005(281):pl4, 2005.
[30] D. Heckerman. A tutorial on learning with Bayesian networks. Innovations in Bayesian Networks, pages 33-82, 2008.
[31] K.B. Korb and A.E. Nicholson. Bayesian Artificial Intelligence, volume 1. Chapman & Hall/CRC, 2003.
[32] D. Heckerman. Probabilistic similarity networks. Networks, 20(5):607-636, 1990.
[33] G.F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309-347, 1992.
[34] B. Schoeberl, E.A. Pace, J.B. Fitzgerald, B.D. Harms, L. Xu, L. Nie, B. Linggi, A. Kalra, V. Paragas, R. Bukhalid, et al. Therapeutically targeting ErbB3: a key node in ligand-induced activation of the ErbB receptor-PI3K axis. Science Signaling, 2(77):ra31, 2009.
[35] A.L. Hopkins. Network pharmacology: the next paradigm in drug discovery. Nature Chemical Biology, 4(11):682-690, 2008.
[36] T.H. Keller, A. Pichota, and Z. Yin. A practical view of 'druggability'. Current Opinion in Chemical Biology, 10(4):357-361, 2006.
[37] M.F. Ciaccio, J.P. Wagner, C.P.
Chuu, D.A. Lauffenburger, and R.B. Jones. Systems analysis of EGF receptor signaling dynamics with microwestern arrays. Nature Methods, 7(2):148-155, 2010.
[38] W. Burnette. "Western blotting": electrophoretic transfer of proteins from sodium dodecyl sulfate polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Analytical Biochemistry, 112:195-203, 1981.
[39] C.P. Paweletz, L.A. Liotta, and E.F. Petricoin. New technologies for biomarker analysis of prostate cancer progression: Laser capture microdissection and tissue proteomics. Urology, 57(4):160-163, 2001.
[40] C.P. Paweletz, L. Charboneau, V.E. Bichsel, N.L. Simone, T. Chen, J.W. Gillespie, M.R. Emmert-Buck, M.J. Roth, E.F. Petricoin, L.A. Liotta, et al. Reverse phase protein microarrays which capture disease progression show activation of pro-survival pathways at the cancer invasion front. Oncogene, 20(16):1981-1989, 2001.
[41] K. Rikova, A. Guo, Q. Zeng, A. Possemato, J. Yu, H. Haack, J. Nardone, K. Lee, C. Reeves, Y. Li, et al. Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell, 131(6):1190-1203, 2007.
[42] J.V. Olsen, B. Blagoev, F. Gnad, B. Macek, C. Kumar, P. Mortensen, and M. Mann. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell, 127(3):635-648, 2006.
[43] A. Wolf-Yadlin, N. Kumar, Y. Zhang, S. Hautaniemi, M. Zaman, H.D. Kim, V. Grantcharova, D.A. Lauffenburger, and F.M. White. Effects of HER2 overexpression on cell signaling networks governing proliferation and migration. Molecular Systems Biology, 2(1), 2006.
[44] R. Tibes, Y.H. Qiu, Y. Lu, B. Hennessy, M. Andreeff, G.B. Mills, and S.M. Kornblau. Reverse phase protein array: validation of a novel proteomic technology and utility for analysis of primary leukemia specimens and hematopoietic stem cells. Molecular Cancer Therapeutics, 5(10):2512-2521, 2006.
[45] R.B. Jones, A. Gordus, J.A. Krall, and G.
MacBeath. A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature, 439(7073):168-174, 2005.
[46] H. Sunada, B.E. Magun, J. Mendelsohn, and C.L. MacLeod. Monoclonal antibody against epidermal growth factor receptor is internalized without stimulating receptor phosphorylation. Proceedings of the National Academy of Sciences, 83(11):3825-3829, 1986.
[47] G.N. Gill and C.S. Lazar. Increased phosphotyrosine content and inhibition of proliferation in EGF-treated A431 cells. Nature, 1981.
[48] A. Wolf-Yadlin, S. Hautaniemi, D.A. Lauffenburger, and F.M. White. Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proceedings of the National Academy of Sciences, 104(14):5860-5865, 2007.
[49] F. Chang, J.T. Lee, P.M. Navolanic, L.S. Steelman, J.G. Shelton, W.L. Blalock, R.A. Franklin, and J.A. McCubrey. Involvement of PI3K/Akt pathway in cell cycle progression, apoptosis, and neoplastic transformation: a target for cancer chemotherapy. Leukemia, 17(3):590-603, 2003.
[50] M. Koivisto and K. Sood. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research, 5:549-573, 2004.
[51] D. Eaton and K. Murphy. Exact Bayesian structure learning from uncertain interventions. In AI & Statistics, volume 2, pages 107-114, 2007.
[52] J.M. Stommel, A.C. Kimmelman, H. Ying, R. Nabioullin, A.H. Ponugoti, R. Wiedemeyer, A.H. Stegh, J.E. Bradner, K.L. Ligon, C. Brennan, et al. Coactivation of receptor tyrosine kinases affects the response of tumor cells to targeted therapies. Science, 318(5848):287, 2007.
[53] P.A. Bromann, H. Korkaya, and S.A. Courtneidge. The interplay between Src family kinases and receptor tyrosine kinases. Oncogene, 23(48):7957-7968, 2004.
[54] D.M. Chickering. Learning equivalence classes of Bayesian-network structures. Journal of Machine Learning Research, 2:445-498, 2002.
[55] J. Downward, P. Parker, and M.D. Waterfield.
Autophosphorylation sites on the epidermal growth factor receptor. Nature, 311(5985):483-485, 1984. [56] Y. Saito, J. Haendeler, Y. Hojo, K. Yamamoto, and B.C. Berk. Receptor heterodimerization: essential mechanism for platelet-derived growth factor-induced epidermal growth factor receptor transactivation. Molecular and Cellular Biology, 21(19):6387-6394, 2001. 219 [57] L. Duchesne, B. Tissot, T.R. Rudd, A. Dell, and D.G. Fernig. N-glycosylation of fibroblast growth factor receptor 1 regulates ligand and heparan sulfate coreceptor binding. Journal of Biological Chemistry, 281(37):27178-27189, 2006. [58] S. Ekman, A. Kallin, U. Engstroem, C.H. Heldin, and L. Roennstrand. SHP2 is involved in heterodimer specific loss of phosphorylation of Tyr771 in the PDGF3-receptor. Oncogene, 21:1870-1875, 2002. [59] K.L. Gould and T. Hunter. Platelet-derived growth factor induces multisite phosphorylation of pp60c-src and increases its protein-tyrosine kinase activity. Molecular and Cellular Biology, 8(8):3345-3356, 1988. [60] R.C. Taylor, G. Acquaah-Mensah, M. Singhal, D. Malhotra, and S. Biswal. Network inference algorithms elucidate Nrf2 regulation of mouse lung oxidative stress. PLoS Computational Biology, 4(8):e1O00166, 2008. [61] I. Cantone, L. Marucci, F. Iorio, M.A. Ricci, V. Belcastro, M. Bansal, S. Santini, M. di Bernardo, D. di Bernardo, M.P. Cosma, et al. A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell, 137(1):172, 2009. [62] D. Husmeier. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics, 19(17):2271-2282, 2003. [63] Y.W. Lou, Y.Y. Chen, S.F. Hsu, R.K. Chen, C.L. Lee, K.H. Khoo, N.K. Tonks, and T.C. Meng. Redox regulation of the protein tyrosine phosphatase PTP1B in cancer cells. FEBS Journal,275(1):69-88, 2008. [64] W. Lu, K. Shen, and P.A. Cole. Chemical dissection of the effects of tyrosine phosphorylation of SHP-2. 
Biochemistry, 42(18):5461-5468, 2003. [65] N. Friedman and D. Koller. Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50(1):95-125, 2003. [66] D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197-243, 1995. [67] D. Madigan, J. York, and D. Allard. Bayesian graphical models for discrete data. InternationalStatistical Review/Revue Internationale de Statistique, pages 215-232, 1995. [68] P.E. Meyer, F. Lafitte, and G. Bontempi. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics, 9(1):461, 2008. 220 [69] R. Steuer, J. Kurths, C.O. Daub, J. Weise, and J. Selbig. The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18(suppl 2):S231-S240, 2002. [70] H.D. Kim, A.S. Meyer, J.P. Wagner, S.K. Alford, A. Wells, F.B. Gertler, and D.A. Lauffenburger. Signaling network state predicts Twist-mediated effects on breast cell migration across diverse growth factor contexts. Molecular 6 Cellular Proteomics, 10(11), 2011. [71] J.P. Thiery. Epithelial-mesenchymal transitions in development and pathologies. Current Opinion in Cell Biology, 15(6):740-746, 2003. [72] R. Kalluri and R.A. Weinberg. The basics of epithelial-mesenchymal transition. Journal of Clinical Investigation, 119(6):1420, 2009. [73] J.P. Thiery and J.P. Sleeman. Complex networks orchestrate epithelialmesenchymal transitions. Nature Reviews Molecular Cell Biology, 7(2):131-142, 2006. [74] S. Thomson, F. Petti, I. Sujka-Kwok, P. Mercado, J. Bean, M. Monaghan, S.L. Seymour, G.M. Argast, D.M. Epstein, and J.D. Haley. A systems view of epithelial-mesenchymal transition signaling states. Clinical and Experimental Metastasis, 28(2):137-155, 2011. Signaling networks guiding epithelial[75] A. Moustakas and C.H. Heldin. and cancer progression. 
Cancer embryogenesis during transitions mesenchymal Science, 98(10):1512-1520, 2007. [76] J. Xu, S. Lamouille, and R. Derynck. TGF--induced epithelial to mesenchymal transition. Cell Research, 19(2):156-172, 2009. [77] J.M. L6pez-Novoa and M.A. Nieto. Inflammation and EMT: an alliance towards organ fibrosis and cancer progression. EMBO Molecular Medicine, 1(6-7):303314, 2009. [78] A. Singh and J. Settleman. EMT, cancer stem cells and drug resistance: an emerging axis of evil in the war on cancer. Oncogene, 29(34):4741-4751, 2010. [79] R.I. Nicholson, J.M. Gee, and M.E. Harper. EGFR and cancer prognosis. European Journal of Cancer (Oxford, England: 1990), 37:S9, 2001. [80] E.M. Bublil and Y. Yarden. The EGF receptor family: spearheading a merger of signaling and therapeutics. Current Opinion in Cell Biology, 19(2):124-134, 2007. [81] J.R. Grandis and J.C. Sok. Signaling through the epidermal growth factor receptor during the development of malignancy. Pharmacology & Therapeutics, 102(1):37-46, 2004. 221 [82] Y. Yarden and M.X. Sliwkowski. Untangling the ErbB signalling network. Nature Reviews Molecular Cell Biology, 2(2):127-137, 2001. [83] B.A. Frederick, B.A. Helfrich, C.D. Coldren, D. Zheng, D. Chan, P.A. Bunn, and D. Raben. Epithelial to mesenchymal transition predicts gefitinib resistance in cell lines of head and neck squamous cell carcinoma and non-small cell lung carcinoma. Molecular Cancer Therapeutics, 6(6):1683-1691, 2007. [84] S. Thomson, F. Petti, I. Sujka-Kwok, D. Epstein, and J.D. Haley. Kinase switching in mesenchymal-like non-small cell lung cancer lines contributes to EGFR inhibitor resistance through pathway redundancy. Clinical and Experimental Metastasis, 25(8):843-854, 2008. [85] S. Barr, S. Thomson, E. Buck, S. Russo, F. Petti, I. Sujka-Kwok, A. Eyzaguirre, M. Rosenfeld-Franklin, N.W. Gibson, M. Miglarese, et al. Bypassing cellular EGF receptor dependence through epithelial-to-mesenchymal-like transitions. 
Clinical and Experimental Metastasis, 25(6):685-693, 2008. [86] A. Chakravarti, J.S. Loeffler, and N.J. Dyson. Insulin-like growth factor receptor i mediates resistance to anti-epidermal growth factor receptor therapy in primary human glioblastoma cells through continued activation of phosphoinositide 3-kinase signaling. Cancer Research, 62(1):200-207, 2002. [87] B. Elenbaas, L. Spirio, F. Koerner, M.D. Fleming, D.B. Zimonjic, J.L. Donaher, N.C. Popescu, W.C. Hahn, and R.A. Weinberg. Human breast cancer cells generated by oncogenic transformation of primary mammary epithelial cells. Genes & Development, 15(1):50-65, 2001. [88] J.H. Taube, J.I. Herschkowitz, K. Komurov, A.Y. Zhou, S. Gupta, J. Yang, K. Hartwell, T.T. Onder, P.B. Gupta, K.W. Evans, et al. Core epithelialto-mesenchymal transition interactome gene-expression signature is associated with claudin-low and metaplastic breast cancer subtypes. Proceedings of the National Academy of Sciences, 107(35):15449-15454, 2010. [89] J. Yang, S.A. Mani, J.L. Donaher, S. Ramaswamy, R.A. Itzykson, C. Come, P. Savagner, I. Gitelman, A. Richardson, and R.A. Weinberg. Twist, a master regulator of morphogenesis, plays an essential role in tumor metastasis. Cell, 117(7):927-939, 2004. [90] T.A. Martin, A. Goyal, G. Watkins, and W.G. Jiang. Expression of the transcription factors snail, slug, and twist and their clinical significance in human breast cancer. Annals of Surgical Oncology, 12(6):488-496, 2005. [91] M.A. Eckert, T.M. Lwin, A.T. Chang, J. Kim, E. Danis, L. Ohno-Machado, and J. Yang. Twist1-induced invadopodia formation promotes tumor metastasis. Cancer Cell, 19(3):372-386, 2011. 222 [92] Y. Soini, H. Tuhkanen, R. Sironen, I. Virtanen, V. Kataja, P. Auvinen, A. Mannermaa, and V.M. Kosma. Transcription factors zeb1, twist and snail in breast carcinoma. BMC Cancer, 11(1):73, 2011. [93] M.G. Ponzo, R. Lesurf, S. Petkiewicz, F.P. O'Malley, D. Pinnaduwage, I.L. Andrulis, S.B. Bull, N. Chughtai, D. Zuo, M. 
Souleimanova, et al. Met induces mammary tumors with diverse histologies and is associated with poor outcome and human basal breast cancer. Proceedings of the National Academy of Sciences, 106(31):12903-12908, 2009. [94] J. Ma, M.C. DeFrances, C. Zou, C. Johnson, R. Ferrell, and R. Zarnegar. Somatic mutation and functional polymorphism of a novel regulatory element in the HGF gene promoter causes its aberrant expression in human breast cancer. Journal of Clinical Investigation, 119(3):478, 2009. [95] I.R. Hutcheson, J.M. Knowlden, S.E. Hiscox, D. Barrow, JM Gee, J.F. Robertson, 1.0. Ellis, R.I. Nicholson, et al. Heregulin 131 drives gefitinib-resistant growth and invasion in tamoxifen-resistant MCF-7 breast cancer cells. Breast Cancer Research, 9(4):R50, 2007. [96] H.D. Kim, T.W. Guo, A.P. Wu, A. Wells, F.B. Gertler, and D.A. Lauffenburger. Epidermal growth factor-induced enhancement of glioblastoma cell migration in 3D arises from an intrinsic increase in speed but an extrinsic matrix-and proteolysis-dependent increase in persistence. Molecular Biology of the Cell, 19(10):4249-4259, 2008. [97] E.J. Joslin, L.K. Opresko, A. Wells, H.S. Wiley, and D.A. Lauffenburger. EGF-receptor-mediated mammary epithelial cell migration is driven by sustained ERK signaling from autocrine stimulation. Journal of Cell Science, 120(20):3688-3699, 2007. [98] C. Hidalgo-Carcedo, S. Hooper, S.I. Chaudhry, P. Williamson, K. Harrington, B. Leitinger, and E. Sahai. Collective cell migration requires suppression of actomyosin at cell-cell contacts mediated by DDR1 and the cell polarity regulators Par3 and Par6. Nature Cell Biology, 13(1):49, 2011. [99] R.M. Neve, K. Chin, J. Fridlyand, J. Yeh, F.L. Baehner, T. Fevr, L. Clark, N. Bayani, J.P. Coppe, F. Tong, et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell, 10(6):515-527, 2006. [100] T. Blick, E. Widodo, H. Hugo, M. Waltham, ME Lenburg, RM Neve, and EW Thompson. 
Epithelial mesenchymal transition traits in human breast cancer cell lines. Clinical and Experimental Metastasis, 25(6):629-642, 2008. [101] A. De Luca and N. Normanno. Predictive biomarkers to tyrosine kinase inhibitors for the epidermal growth factor receptor in non-small-cell lung cancer. Current Drug Targets, 11(7):851-864, 2010. 223 [102] L. Li, K. Sampat, N. Hu, J. Zakari, and S.H. Yuspa. Protein kinase C negatively regulates Akt activity and modifies UVC-induced apoptosis in mouse keratinocytes. Journal of Biological Chemistry, 281(6):3237-3243, 2006. [103 M. Guarino. Src signaling in cancer invasion. Journal of Cellular Physiology, 223(1):14-26, 2010. [104] V. Aguirre, T. Uchida, L. Yenush, R. Davis, and M.F. White. The c-JunNH2-terminal kinase promotes insulin resistance during association with insulin receptor substrate-i and phosphorylation of Ser307. Journal of Biological Chemistry, 275(12):9047-9054, 2000. [105] X. Zhang, A. Chattopadhyay, Q. Ji, J.D. Owen, P.J. Ruest, G. Carpenter, and S.K. Hanks. Focal adhesion kinase promotes phospholipase C-'}1 activity. Proceedings of the National Academy of Sciences, 96(16):9021-9026, 1999. [106] M.P. Wymann and R. Schneiter. Lipid signalling in disease. Nature Reviews Molecular Cell Biology, 9(2):162-176, 2008. [107] X. Fang, S. Yu, J.L. Tanyi, Y. Lu, J.R. Woodgett, and G.B. Mills. Convergence of multiple signaling cascades at glycogen synthase kinase 3: Edg receptormediated phosphorylation and inactivation by lysophosphatidic acid through a protein kinase C-dependent intracellular pathway. Molecular and Cellular Biology, 22(7):2099-2110, 2002. [108] J. Gwak, M. Cho, S.J. Gong, J. Won, D.E. Kim, E.Y. Kim, S.S. Lee, M. Kim, T.K. Kim, J.G. Shin, et al. Protein kinase C-mediated -catenin phosphorylation negatively regulates the Wnt/#-catenin pathway. Journal of Cell Science, 119(22):4702-4709, 2006. [109] F. Liao, HS Shin, and SG Rhee. 
In vitro tyrosine phosphorylation of PLC-7y1 and PLC-72 by SRC family protein tyrosine kinases. Biochemical and Biophysical Research Communications, 191(3):1028-1033, 1993. [110] S.V. del Rinc6n, Q. Guo, C. Morelli, H.Y. Shiu, E. Surmacz, and W.H. Miller. Retinoic acid mediates degradation of IRS-1 by the ubiquitin-proteasome pathway, via a PKC-dependant mechanism. Oncogene, 23(57):9269-9279, 2004. [111] S. Ishibe, D. Joly, Z.X. Liu, and L.G. Cantley. Paxillin serves as an ERKregulated scaffold for coordinating FAK and Rac activation in epithelial morphogenesis. Molecular Cell, 16(2):257-267, 2004. [112] R.H. Alvarez, V. Valero, and G.N. Hortobagyi. Emerging targeted therapies for breast cancer. Journal of Clinical Oncology, 28(20):3366-3379, 2010. [113] M. Luo, P. Langlais, Z. Yi, N. Lefort, E.A. De Filippis, H. Hwang, C.Y. Christ-Roberts, and L.J. Mandarino. Phosphorylation of human insulin receptor substrate-i at serine 629 plays a positive role in insulin signaling. Endocrinology, 148(10):4895-4905, 2007. 224 [114] R. Wu, H. Kausar, P. Johnson, D.E. Montoya-Durango, M. Merchant, and M.J. Rane. Hsp27 regulates Akt activation and polymorphonuclear leukocyte apoptosis by scaffolding MK2 to Akt signal complex. Journal of Biological Chemistry, 282(30):21598-21608, 2007. [115] S. de Jong. SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18(3):251-263, 1993. [1161 I.-G. Chong and C.-H. Jun. Performance of some variable selection methods when multicollinearity is present. Chemometrics and Intelligent Laboratory Systems, 78(1):103-112, 2005. [117] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), pages 289-300, 1995. [118] R.G. Miller. Simultaneous Statistical Inference. Springer-Verlag, 1981. [119] C. Ding and H. Peng. 
Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology, 3(02):185-205, 2005. [120] U. M. Braga-Neto and E. R. Dougherty. Is cross-validation valid for smallsample microarray classification? Bioinformatics, 20(3):374-380, 2004. [121] R.B. Bendel and A.A. Afifi. Comparison of stopping rules in forward "stepwise" regression. Journal of the American Statistical Association, 72(357):4653, 1977. [122] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267-288, 1996. [123] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301-320, 2005. [124] B.D. Cosgrove, B.M. King, M.A. Hasan, L.G. Alexopoulos, P.A. Farazi, B.S. Hendriks, L.G. Griffith, P.K. Sorger, B. Tidor, J.J. Xu, et al. Synergistic drugcytokine induction of hepatocellular death as an in vitro approach for the study of inflammation-associated idiosyncratic drug hepatotoxicity. Toxicology and Applied Pharmacology,237(3):317-330, 2009. [125] S. Pece, M. Chiariello, C. Murga, and J.S. Gutkind. Activation of the protein kinase Akt/PKB by the formation of E-cadherin-mediated cell-cell junctions Evidence for the association of phosphatidylinositol 3-kinase with the E-cadherin adhesion complex. Journal of Biological Chemistry, 274(27):19347-19351, 1999. 225 [126] B. Baum and M. Georgiou. Dynamics of adherens junctions in epithelial establishment, maintenance, and remodeling. Journal of Cell Biology, 192(6):907917, 2011. [127] C. Huang, Z. Rajfur, C. Borchers, M.D. Schaller, and K. Jacobson. JNK phosphorylates paxillin and regulates cell migration. Nature, 424(6945):219-223, 2003. [128] M.D. Schaller. Paxillin: a focal adhesion-associated adaptor protein. Oncogene, 20(44):6459, 2001. [129] D.S. Harburger and D.A. Calderwood. 
Integrin signalling at a glance. Journal of Cell Science, 122(2):159-163, 2009. [130] E.A.C. Almeida, D. Ilid, Q. Han, C.R. Hauck, F. Jin, H. Kawakatsu, D.D. Schlaepfer, and C.H. Damsky. Matrix survival signaling from fibronectin via focal adhesion kinase to c-Jun-NH2-terminal kinase. The Journal of Cell Biology, 149(3):741-754, 2000. [131] M.A. Wozniak, K. Modzelewska, L. Kwong, and P.J. Keely. Focal adhesion regulation of cell behavior. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research, 1692(2):103-119, 2004. [132] P. Friedl and D. Gilmour. Collective cell migration in morphogenesis, regeneration and cancer. Nature Reviews Molecular Cell Biology, 10(7):445 457, 2009. [133] J.F. Santibaiez. JNK mediates TGF-01-induced epithelial mesenchymal transdifferentiation of mouse transformed keratinocytes. FEBS Letters, 580(22):5385-5391, 2006. [134] Q. Liu, H. Mao, J. Nie, W. Chen, Q. Yang, X. Dong, and X. Yu. Transforming growth factor #1 induces epithelial-mesenchymal transition by activating the JNK-Smad3 pathway in rat peritoneal mesothelial cells. Peritoneal Dialysis International,28(Supplement 3):S88-S95, 2008. [135] J. Wang, I. Kuiatse, A.V. Lee, J. Pan, A. Giuliano, and X. Cui. Sustained cJun-NH2-kinase activity promotes epithelial-mesenchymal transition, invasion, and survival of breast cancer cells by regulating extracellular signal-regulated kinase activation. Molecular Cancer Research, 8(2):266-277, 2010. [136] J.D. Storey and R. Tibshirani. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 100(16):9440-9445, 2003. [137] J.P. Wagner, A. Wolf-Yadlin, M. Sevecka, J.K. Grenier, D.E. Root, D.A. Lauffenburger, and G. MacBeath. Receptor tyrosine kinases fall into distinct classes based on their inferred signaling networks. Submitted, 2013. [138] S.R. Hubbard and J.H. Till. Protein tyrosine kinase structure and function. Annual Review of Biochemistry, 69(1):373-398, 2000. 226 [139] A.B. Turke, K. 
Zejnullahu, Y.L. Wu, Y. Song, D. Dias-Santagata, E. Lifshits, L. Toschi, A. Rogers, T. Mok, L. Sequist, et al. Preexistence and clonal selection of MET amplification in EGFR mutant NSCLC. Cancer Cell, 17(1):77-88, 2010. [140] J. Qi, M.A. McTigue, A. Rogers, E. Lifshits, J.G. Christensen, P.A. Jdnne, and J.A. Engelman. Multiple mutations and bypass mechanisms can contribute to development of acquired resistance to MET inhibitors. Cancer Research, 71(3):1081-1091, 2011. [141] Z. Zhang, J.C. Lee, L. Lin, V. Olivas, V. Au, T. LaFramboise, M. AbdelRahman, X. Wang, A.D. Levine, J.K. Rho, et al. Activation of the AXL kinase causes resistance to EGFR-targeted therapy in lung cancer. Nature Genetics, 44(8):852-860, 2012. [142] T.R. Wilson, J. Fridlyand, Y. Yan, E. Penuel, L. Burton, E. Chan, J. Peng, E. Lin, Y. Wang, J. Sosman, et al. Widespread potential for growth-factordriven resistance to anticancer kinase inhibitors. Nature, 2012. [143] F. Harbinski, V.J. Craig, S. Sanghavi, D. Jeffery, L. Liu, K.A. Sheppard, S. Wagner, C. Stamm, A. Buness, C. Chatenay-Rivauday, et al. Rescue screens with secreted proteins reveal compensatory potential of receptor tyrosine kinases in driving cancer growth. Cancer Discovery, 2012. [144] J. Tegner, M.K.S. Yeung, J. Hasty, and J.J. Collins. Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proceedings of the National Academy of Sciences, 100(10):5944-5949, 2003. [145] R.J. Prill, J. Saez-Rodriguez, L.G. Alexopoulos, P.K. Sorger, and G. Stolovitzky. Crowdsourcing network inference: the DREAM predictive signaling network challenge. Science Signaling, 4(189):mr7, 2011. [146] A. Gordus, J.A. Krall, E.M. Beyer, A. Kaushansky, A. Wolf-Yadlin, M. Sevecka, B.H. Chang, J. Rush, and G. MacBeath. Linear combinations of docking affinities explain quantitative differences in RTK signaling. Molecular Systems Biology, 5(1), 2009. [147] J. Moffat, D.A. Grueneberg, X. Yang, S.Y. Kim, A.M. Kloepfer, G. Hinkle, B. 
Piqani, T.M. Eisenhaure, B. Luo, J.K. Grenier, et al. A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell, 124(6):1283-1298, 2006. [148] M.D. Marmor, K.B. Skaria, Y. Yarden, et al. Signal transduction and oncogenesis by ErbB/HER receptors. InternationalJournal of Radiation Oncology, Biology, Physics, 58(3):903, 2004. [149] M. Sevecka, A. Wolf-Yadlin, and G. MacBeath. Lysate microarrays enable high-throughput, quantitative investigations of cellular signaling. Molecular B Cellular Proteomics, 10(4), 2011. 227 [150] O.E. Sturm, R. Orton, J. Grindlay, M. Birtwistle, V. Vyshemirsky, D. Gilbert, M. Calder, A. Pitt, B. Kholodenko, and W. Kolch. The mammalian MAPK/ERK pathway exhibits properties of a negative feedback amplifier. Science Signaling, 3(153):ra90, 2010. [151] Q. Wang, Y. Zhou, X. Wang, and B.M. Evers. Glycogen synthase kinase-3 is a negative regulator of extracellular signal-regulated kinase. Oncogene, 25(1):4350, 2005. [152] J.E. Ferrell Jr. What do scaffold proteins really do? 2000(52):pel, 2000. Science Signaling, [153] Y. Kim, Z. Paroush, K. Nairz, E. Hafen, G. Jimenez, and S.Y. Shvartsman. Substrate-dependent control of MAPK phosphorylation in vivo. Molecular Systems Biology, 7(1), 2011. [154] M.L. Wynn, A.C. Ventura, J.A. Sepulchre, H.J. Garcia, and S.D. Merajver. Kinase inhibitors can produce off-target effects and activate linked pathways by retroactivity. BMC Systems Biology, 5(1):156, 2011. [155] D. Marbach, J.C. Costello, R. Kiiffner, N.M. Vega, R.J. Prill, D.M. Camacho, K.R. Allison, M. Kellis, J.J. Collins, G. Stolovitzky, et al. Wisdom of crowds for robust gene network inference. Nature Methods, 2012. [156] A.J. Butte and I.S. Kohane. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In Pacific Symposium on Biocomputing, volume 5, pages 418-429, 2000. [157] G.A.F. Seber. Multivariate Observations. John Wiley and Sons, 1984. [158] J. 
Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A.A. Margolin, S. Kim, C.J. Wilson, J. Lehair, G.V. Kryukov, D. Sonkin, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483(7391):603-607, 2012. [159] R.C. Harris, E. Chung, and R.J. Coffey. EGF receptor ligands. Experimental Cell Research, 284(1):2-13, 2003. [160] X. Zhang, O.A. Ibrahimi, S.K. Olsen, H. Umemori, M. Mohammadi, and D.M. Ornitz. Receptor specificity of the fibroblast growth factor family. Journal of Biological Chemistry, 281(23):15694-15700, 2006. [161] S.P. Squinto, T.N. Stitt, T.H. Aldrich, S. Davis, SM Bianco, C. RadzieJewski, D.J. Glass, P. Masiakowski, M.E. Furth, D.M. Valenzuela, et al. trkb encodes a functional receptor for brain-derived neurotrophic factor and neurotrophin-3 but not nerve growth factor. Cell, 65(5):885, 1991. [162] J. Andrae, R. Gallini, and C. Betsholtz. Role of platelet-derived growth factors in physiology and medicine. Genes & Development, 22(10):1276-1312, 2008. 228 [163] R. Straussman, T. Morikawa, K. Shee, M. Barzily-Rokni, Z.R. Qian, J. Du, A. Davis, M.M. Mongare, J. Gould, D.T. Frederick, et al. Tumour microenvironment elicits innate resistance to RAF inhibitors through HGF secretion. Nature, 487(7408):500-504, 2012. [164] A.K. Mitra, K. Sawada, P. Tiwari, K. Mui, K. Gwin, and E. Lengyel. Ligandindependent activation of c-Met by fibronectin and a5/31-integrin regulates ovarian cancer invasion and metastasis. Oncogene, 30(13):1566-1576, 2011. [165] M.I. Davis, J.P. Hunt, S. Herrgard, P. Ciceri, L.M. Wodicka, G. Pallares, M. Hocker, D.K. Treiber, and P.P. Zarrinkar. Comprehensive analysis of kinase inhibitor selectivity. Nature Biotechnology, 29(11):1046-1051, 2011. [166] S. Corso, E. Ghiso, V. Cepero, J.R. Sierra, C. Migliore, A. Bertotti, L. Trusolino, P.M. Comoglio, and S. Giordano. Activation of HER family members in gastric carcinoma cells mediates resistance to MET inhibition. 
Molecular Cancer, 9, 2010. [167] M.E. Marshall, T.K. Hinz, S.A. Kono, K.R. Singleton, B. Bichon, K.E. Ware, L. Marek, B.A. Frederick, D. Raben, and L.E. Heasley. Fibroblast growth factor receptors are components of autocrine signaling networks in head and neck squamous cell carcinoma cells. Clinical Cancer Research, 17(15):50165025, 2011. [168] H. Fischer, N. Taylor, S. Allerstorfer, M. Grusch, G. Sonvilla, K. Holzmann, U. Setinek, L. Elbling, H. Cantonati, B. Grasl-Kraupp, et al. Fibroblast growth factor receptor-mediated signals contribute to the malignant phenotype of non-small cell lung cancer cells: therapeutic implications and synergism with epidermal growth factor receptor inhibition. Molecular Cancer Therapeutics, 7(10):3408-3419, 2008. [169] L. Marek, K.E. Ware, A. Fritzsche, P. Hercule, W.R. Helton, J.E. Smith, L.A. McDermott, C.D. Coldren, R.A. Nemenoff, D.T. Merrick, et al. Fibroblast growth factor (FGF) and FGF receptor-mediated autocrine signaling in nonsmall-cell lung cancer cells. Molecular Pharmacology, 75(1):196-207, 2009. [170] K.E. Ware, M.E. Marshall, L.R. Heasley, L. Marek, T.K. Hinz, P. Hercule, B.A. Helfrich, R.C. Doebele, and L.E. Heasley. Rapidly acquired resistance to EGFR tyrosine kinase inhibitors in NSCLC cell lines through de-repression of FGFR2 and FGFR3 expression. PLoS One, 5(11):e14117, 2010. [171] M. Guix, A.C. Faber, S.E. Wang, M.G. Olivares, Y. Song, S. Qu, C. Rinehart, B. Seidel, D. Yee, C.L. Arteaga, et al. Acquired resistance to EGFR tyrosine kinase inhibitors in cancer cells is mediated by loss of IGF-binding proteins. Journal of Clinical Investigation, 118(7):2609, 2008. [172] F. Huang, A. Greer, W. Hurlburt, X. Han, R. Hafezi, G.M. Wittenberg, K. Reeves, J. Chen, D. Robinson, A. Li, et al. The mechanisms of differential 229 sensitivity to an insulin-like growth factor-1 receptor inhibitor (BMS-536924) and rationale for combining with EGFR/HER2 inhibitors. Cancer Research, 69(1):161-170, 2009. [173] C. 
Garofalo, MC Manara, G. Nicoletti, MT Marino, PL Lollini, A. Astolfi, G. Pandini, JA Lopez-Guerrero, KL Schaefer, A. Belfiore, et al. Efficacy of and resistance to anti-IGF-IR therapies in Ewing's sarcoma is dependent on insulin receptor signaling. Oncogene, 30(24):2730-2740, 2011. [174] D. Milojkovic and J. Apperley. Mechanisms of resistance to imatinib and second-generation tyrosine inhibitors in chronic myeloid leukemia. Clinical Cancer Research, 15(24):7519-7527, 2009. [175] K. Inaki and E.T. Liu. Structural mutations in cancer: mechanistic and functional insights. Trends in Genetics, 28(11):550-559, 2012. [176] R. Beroukhim, G. Getz, L. Nghiemphu, J. Barretina, T. Hsueh, D. Linhart, I. Vivanco, J.C. Lee, J.H. Huang, S. Alexander, et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proceedings of the National Academy of Sciences, 104(50):20007-20012, 2007. [177] A. Hellman, E. Zlotorynski, S.W. Scherer, J. Cheung, J.B. Vincent, D.I. Smith, L. Trakhtenbrot, and B. Kerem. A role for common fragile site induction in amplification of human oncogenes. Cancer Cell, 1(1):89-97, 2002. [178] H. Riedel, TJ Dull, AM Honegger, J. Schlessinger, and A. Ullrich. Cytoplasmic domains determine signal specificity, cellular routing characteristics and influence ligand binding of epidermal growth factor and insulin receptors. The EMBO Journal,8(10):2943, 1989. [179] A.P. Won, J.E. Garbarino, and W.A. Lim. Recruitment interactions can override catalytic interactions in determining the functional identity of a protein kinase. Proceedings of the National Academy of Sciences, 108(24):9809-9814, 2011. [180] L. Naldini, U. Blomer, P. Gallay, D. Ory, R. Mulligan, FH Gage, IM Verma, and D. Trono. In-vivo gene delivery and stable transduction of nondividing cells by a lentiviral vector. Science, 272(5259):263-267, 1996. [181] S.M. Chan, J. Ermann, L. Su, C.G. Fathman, and P.J. Utz. 
Protein microarrays for multiplex analysis of signal transduction pathways. Nature Medicine, 10(12):1390 1396, 2004. [182] D. Eaton and K. Murphy. Bayesian structure learning using dynamic programming and MCMC. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, 2007. 230 [183] A.V. Kozlov and D. Koller. Nonuniform dynamic discretization in hybrid networks. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, pages 314-325. Morgan Kaufmann Publishers Inc., 1997. [184] D.N. Reshef, Y.A. Reshef, H.K. Finucane, S.R. Grossman, G. McVean, P.J. Turnbaugh, E.S. Lander, M. Mitzenmacher, and P.C. Sabeti. Detecting novel associations in large data sets. Science, 334(6062):1518-1524, 2011. [185] J. Yu, V.A. Smith, P.P. Wang, A.J. Hartemink, and E.D. Jarvis. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics, 20(18):3594-3603, 2004. [186] K. Sachs. Bayesian network models of biological signaling pathways. PhD thesis, Massachusetts Institute of Technology, 2006. [187] K. Sachs, S. Itani, J. Fitzgerald, L. Wille, B. Schoeberl, MA Dahleh, and GP Nolan. Learning cyclic signaling pathway structures while minimizing data requirements. In Pacific Symposium on Biocomputing, pages 63-74, 2009. [188] P.J. Woolf, W. Prudhomme, L. Daheron, G.Q. Daley, and D.A. Lauffenburger. Bayesian analysis of signaling networks governing embryonic stem cell fate decisions. Bioinformatics, 21(6):741-753, 2005. [189] K. Basso, A.A. Margolin, G. Stolovitzky, U. Klein, R. Dalla-Favera, and A. Califano. Reverse engineering of regulatory networks in human B cells. Nature Genetics, 37(4):382-390, 2005. [190] C. Olsen, P.E. Meyer, and G. Bontempi. On the impact of entropy estimation on transcriptional regulatory network inference based on mutual information. EURASIP Journal on Bioinformatics and Systems Biology, 2009(1):308959, 2009. [191] Y. Yang and G. Webb. 
On why discretization works for naive-bayes classifiers. AI 2003: Advances in Artificial Intelligence, pages 440-452, 2003. [192] T. Silander, P. Kontkanen, and P. Myllymaki. On sensitivity of the MAP Bayesian network structure to the equivalent sample size parameter. arXiv preprint arXiv:1206.5293, 2012. [193] A.J. Hartemink. Principled computational methods for the validation discovery of genetic regulatory networks. PhD thesis, Massachusetts Institute of Technology, 2001. [194] R.H. Blair, D.J. Kliebenstein, and G.A. Churchill. What can causal networks tell us about metabolic pathways? PLoS ComputationalBiology, 8(4):e1002458, 2012. [195] A.J. Hartemink. Reverse engineering gene regulatory networks. Nature Biotechnology, 23(5):554-555, 2005. 231 [196] R. Kalluri and E.G. Neilson. Epithelial-mesenchymal transition and its implications for fibrosis. Journal of Clinical Investigation, 112(12):1776-1784, 2003. [197] K.S. Lau, V. Cortez-Retamozo, S.R. Philips, M.J. Pittet, D.A. Lauffenburger, and K.M. Haigis. Multi-scale in vivo systems analysis reveals the influence of immune cells on TNF-a-induced apoptosis in the intestinal epithelium. PLoS Biology, 10(9):e1001393, 2012. [198] G. Elidan and N. Friedman. Learning hidden variable networks: the information bottleneck approach. Journal of Machine Learning Research, 6(1):81, 2006. [199] B. D'Ambrosio. Inference in Bayesian networks. AI Magazine, 20(2):21, 1999. [200] S. Mukherjee and T.P. Speed. Network inference using informative priors. Proceedings of the National Academy of Sciences, 105(38):14313-14318, 2008. [201] J.W. Robinson and A.J. Hartemink. Learning non-stationary dynamic bayesian networks. Journal of Machine Learning Research, 11:3647-3680, 2010. 232