Multivariate studies of receptor tyrosine kinase
function in cancer
by
Joel Patrick Wagner
B.S., Chemical Engineering, B.S., Biochemistry
University of Wisconsin-Madison (2006)
M.Phil., Computational Biology
University of Cambridge (2007)
Submitted to the Department of Biological Engineering
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Biological Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2013
© Massachusetts Institute of Technology 2013. All rights reserved.
Author ........................
Department of Biological Engineering
February 20, 2013
Certified by ........................
Douglas A. Lauffenburger
Ford Professor of Bioengineering
Thesis Supervisor
Accepted by ........................
Forest M. White
Chair, Graduate Program Committee
This doctoral thesis has been examined by a Committee of the Department of
Biological Engineering as follows:
Professor Douglas Lauffenburger
Thesis Supervisor
Ford Professor of Bioengineering
Professor Ernest Fraenkel
Chairman, Thesis Committee
Associate Professor of Biological Engineering
Professor Forest White
Member, Thesis Committee
Associate Professor of Biological Engineering
Multivariate studies of receptor tyrosine kinase function in
cancer
by
Joel Patrick Wagner
Submitted to the Department of Biological Engineering
on February 20, 2013, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy in Biological Engineering
Abstract
Receptor tyrosine kinases (RTKs) are critical regulators of cellular homeostasis in
multicellular organisms. They influence cell proliferation, migration, differentiation,
and transcriptional activation, among other processes, and are therefore also relevant
to cancer biology. Upon interaction with cognate ligand, RTKs initiate signaling
cascades dependent in part on the phosphorylation of proteins. From a computational perspective, this thesis has studied methods for quantifying relationships between measured signals (using Bayesian network inference, correlation, and mutual
information-based methods), and between signals and cellular phenotypes (using linear regression, partial least squares regression, and feature selection methods). From
a biological perspective, this thesis has studied signaling between RTKs, signaling and
cell migration downstream of RTKs in epithelial versus mesenchymal cell states, and
comparative signaling across six RTKs. In the latter case, the results show that the
six RTKs cluster into three classes based on their inferred signaling networks. Using
publicly available transcriptional and pharmacological profiling data from hundreds
of cancer cell lines, it was determined that expression of same-class RTK genes or
their cognate ligands can correlate with insensitivity to drugs targeting other RTKs
in that class. This suggests that resistance to RTK-targeted therapies in cancer may
emerge in part because same-class RTKs can compensate for the reduced signaling of
the inhibited receptor. The thesis concludes by quantitatively exploring the features
of experimental data that improve model accuracy.
Thesis Supervisor: Douglas A. Lauffenburger
Title: Ford Professor of Bioengineering
Acknowledgments
I would like to first thank my advisor, Professor Doug Lauffenburger, for providing
guidance and support throughout my time at MIT. Doug provides a unique lab environment that purposefully and substantively blends biological and computational
research in a manner that few other labs in the world can claim to do. Implementing this type of hybrid, interdisciplinary environment is very important for the field,
and to have been able to develop, train, and immerse myself in such an environment
during my graduate studies has been an honor and a privilege. I would also like to
thank Doug for allowing me to attend and present at so many different conferences,
meetings, and workshops. Sharing our work, while at the same time exposing myself
to so many different and new ideas, has been an immense benefit both for my graduate experience and my career beyond. I would also like to thank Doug for allowing
me to apply for and participate in the National Science Foundation's East Asia and
Pacific Summer Institutes program in Singapore, where I worked with Edison Liu at
the Genome Institute of Singapore. Having the flexibility to essentially take leave
for nearly three months in the middle of my PhD was an eye-opening experience
scientifically, professionally, and personally. Doug has consistently been supportive
of anything that I thought would benefit my future career, even if it was not directly
related to my work in his lab, which is something that very few PhD advisors actually
do, and for that I am forever thankful.
I would next like to thank my thesis committee members Professors Forest White
and Ernest Fraenkel, who served as committee chair. Ernest and Forest have provided
helpful guidance and thoughtful contributions to my research throughout my time at
MIT. I am also thankful for their critical reading of this thesis document.
I would like to thank my collaborator Professor Richard Jones at the University
of Chicago, and his lab members, particularly Mark Ciaccio. Rich has served as an
excellent collaborator throughout my graduate career. It was a real pleasure working
with Rich on the Nature Methods paper. He was an engaged and thoughtful colleague, and a model for how a more experiment-centric researcher can interface with
a more computation-centric researcher. Through emails, phone calls, and in-person
discussions, including a week spent in his lab in Chicago and mutual attendance at
two conferences, I have learned much and benefited greatly.
I would like to thank my collaborators Mark Sevecka and Alejandro Wolf-Yadlin
from Gavin MacBeath's lab at Harvard University. I began collaborating with Mark
early in my graduate career on numerous projects, and it has been a real pleasure.
Mark always provided thorough, substantive, and thoughtful responses to my questions, for which I will always be thankful, and from which the projects have benefited
greatly. I began working with Ale part way through my graduate career, which was
also a pleasure. Ale and Mark put forth immense effort in collecting all of the data
for the receptor tyrosine kinase project, and provided helpful guidance and discussion
during the analysis of the data.
I would like to thank my collaborators Shannon Hughes, Aaron Meyer, and HD
Kim from the Lauffenburger lab. We worked together on the epithelial-mesenchymal
transition project. Discussions with Shannon and Aaron about the nature of cell
migration, and the signaling underlying it, were very useful and thought-provoking.
I appreciate them taking the time to help educate me about the nuances of cell
migration biology.
I would like to thank my collaborator William Chen from Professor Peter Sorger's
lab at Harvard University. Will is among the most thoughtful and interesting people I
spoke with during my graduate career, and every discussion with him was a rewarding
and enjoyable experience.
I would like to individually thank Julio Saez-Rodriguez for his mentorship and
guidance in the earlier stages of my graduate studies. Beyond Doug, no other person
at MIT had a greater impact on my thought processes than Julio. I am fortunate to
consider him a colleague and friend.
I am grateful to many within the Lauffenburger lab and the wider Biological Engineering community at MIT for helpful and insightful discussions and support: Miles
Miller, Melody Morris, Brian Joughin, Justin Pritchard, Kristen Naegle, Michael
Beste, Edgar Sanchez, Seymour de Picciotto, Chris Ng, Carol Huang, Dave Clarke,
Doug Jones, Jorge Valdez, Nate Tedford, Sarah Schrier, Jen Wilson, Kelly Benedict,
Thomas Willems, Abby Hill, Ta-Chun Hang, and Greg Riddick. I am also thankful
to Tommi Jaakkola and David Wingate in the Electrical Engineering and Computer
Science department at MIT for helpful discussions regarding graphical modeling. And
for their support with research and beyond at MIT, I would like to thank Lauffenburger lab manager Hsinhwa Lee, Lauffenburger lab administrative assistant JoAnn
Sorrento, and Information Systems Administrator and all-around good guy Aran Parillo.
Beyond MIT, I would like to thank David Heckerman at Microsoft Research for
very helpful discussions regarding the data quality chapter of this thesis, Daniel Eaton
from Kevin Murphy's lab at the University of British Columbia for providing the
Bayesian Network Structure Learning MATLAB code and for very helpful discussions
regarding Bayesian networks, and Nickel Dittrich from the University of Magdeburg
for early work on data discretization methods.
I would like to thank Professor Edison Liu formerly from the Genome Institute
of Singapore, as well as Drs. Francesca Menghi and Xing Yi Woo in his lab. Ed
graciously agreed to host me for the NSF EAPSI program, not only in his lab but
also partly in his home, and for that I will be forever grateful. It was an amazing
opportunity in every respect. I look forward to continuing my work with Ed as a
postdoctoral associate in his lab at The Jackson Laboratory for Genomic Medicine.
I would like to thank my classmates in BE-2007: Edgar Sanchez, Jeff Wagner,
Melody Morris, Brian Belmont, Steve Goldfless, Michelle Sukup, Emily Florine, Bryan
Bryson, Francisco Delgado, Eddie Eltoukhy, Ricardo Gonzalez, Karunya Srinivasan,
and Prabhani Atukorale. Their friendship, support, and collegiality, mixed with a
viscous sublayer of craziness, have been the best part of graduate school. Thank you
each. And guys, some day we will get those quals beards down.
I would like to thank my undergraduate research advisers, who took the time to
train and support me even when I was a novice: Professor Sean Palecek at the University of Wisconsin-Madison, along with Fang Li from his lab and Dagang Huang
from Professor Eric Shusta's lab; Melissa Lambeth Kemp from Professor Doug Lauffenburger's
lab at MIT; and Amariliz Rivera from Professor Eric Pamer's lab at
Memorial Sloan-Kettering Cancer Center.
I would also like to thank many of the influential teachers I have had throughout
my formal education, who played immense and pivotal roles in determining who I
would eventually become. I would like to thank them for dedicating a small portion
of their life to improving my own, a truly generous and gracious act.
From Jackson Elementary: Carol Steiner, Ruth Windmuller, Margaret Mihalic, Virginia Pliner, Jim Bugni, Nancy Reck, and Ken Govek.
From Franklin Middle: Diane Bacon, Anne Smith, Renee Kasten, Scott Christy, Ron
Huisheere, Chick Hawkins, Sheila Wanta, Brenda Winkler, and Jon Taft.
From West High: Jim Van Abel, Mary Diedrich, Dean Cherry, Bryan Radue, Eleanor
Hinz, Sue Kuester, Scott Winkler, Harlan Shupita, Bill Freude, Pam Sylvester, Ron
Wallberg, Don Buntman, and Bill Zigmund.
From the University of Wisconsin-Madison: Alexandru Ionescu, Fleming Crim, Kenneth George, Sigurd Angenent, Claude Woods, Robert Morse, Fred Roesler, Ieva Reich, Regina Murphy, Marshall Slemrod, Manos Mavrikakis, John Yin, Nick Abbott,
Rafael Chavez, Antony Stretton, Charles Hill, David Nelson, Michael Cox, Jeremi
Suri, Thomas Martin, Paul Nealey, Ross Swaney, Eric Shusta, Gary Splitter, Arun
Yethiraj, and Christos Maravelias.
From the University of Cambridge: Stephen Eglen, Simon Tavaré, Johan Paulsson,
and Julia Gog.
From the Massachusetts Institute of Technology: Dane Wittrup, Bruce Tidor, Forest
White, Ernest Fraenkel, Alan Grodzinsky, John Deutch, Arup Chakraborty, Roger
Kamm, Stephen Bell, Frank Solomon, Tommi Jaakkola, David Gifford, and Monty
Krieger.
I would like to thank my Boy Scout leaders Mason Thibeault and Tom Seibert.
Sadly, they are gone too soon; but I am forever thankful for their mentorship, guidance, and teaching. They were two of the most important role models in my life, and
this thesis document is also a tribute to their years of selfless support.
I would like to thank my family, especially my mom and stepdad, Linda and Craig,
and my dad and stepmom, Al and Tina. Their support, guidance, encouragement,
selflessness, and love are without bound. I would also like to thank my grandparents,
my siblings (Simon, Matt, Jon, and Heather), and my siblings-in-law (Becky, Sally,
and Dustin). I would especially like to thank Heather and Dustin for all their support
during college.
And lastly, I would like to thank my lovely girlfriend Brittany for all her support
during these sometimes tumultuous and tiresome months. You are incredibly special.
Thank you all. This thesis is by and for each of you.
Dicebat Bernardus Carnotensis nos esse quasi nanos, gigantium humeris insidentes, ut possimus plura eis et remotiora videre, non utique proprii visus
acumine, aut eminentia corporis, sed quia in altum subvenimur et extollimur
magnitudine gigantea.
-John of Salisbury, Metalogicon (1159)
Contents

1 Introduction
1.1 Mutual advancement of measurement and modeling techniques
1.2 Multivariate modeling techniques
1.2.1 Causal interpretations across network inference methods
1.2.2 Bayesian networks
1.2.3 Similarities across seemingly disparate modeling strategies
1.3 Modeling phenotypic data
1.3.1 Identification of drug targets
1.4 Overview of thesis contents

2 Systems analysis of EGF receptor signaling dynamics with microwestern arrays
2.1 Introduction
2.2 Results
2.2.1 Fabrication of MWAs
2.2.2 Validation of MWA method
2.2.3 Comparison of macrowestern blots and MWAs
2.2.4 Application of MWAs to analysis of EGFR signaling network
2.2.5 Comparison of signaling network at different EGF input levels
2.2.6 Bayesian network modeling of receptor layer connectivity
2.3 Discussion
2.4 Methods
2.4.1 Signaling network inference modeling
2.4.2 Testing for model significance
2.4.3 Comparing different algorithm results
2.4.4 Equivalence class analysis for Bayesian network algorithm
2.4.5 Parent constraint analysis for Bayesian network algorithm

3 Signaling network state predicts Twist-mediated effects on breast cell migration across diverse growth factor contexts
3.1 Introduction
3.2 Results
3.2.1 Diverse cell motility behavior and growth factor treatment responses in epithelial versus mesenchymal mode
3.2.2 Quantitative analysis of growth factor-elicited multiple-pathway signaling network dynamics
3.2.3 Node-to-node correlation topology model reveals quantitatively different signaling relationships between epithelial and mesenchymal states
3.2.4 PLSR model-reduction analysis reveals quantitatively different pathway emphases between epithelial and mesenchymal modes
3.2.5 Linear regression predicts cell speed more accurately than PLSR models
3.3 Discussion
3.3.1 Excerpt from Discussion in Kim et al.
3.3.2 Additional discussion
3.4 Methods
3.4.1 Correlation network modeling
3.4.2 Reduced PLSR models

4 Receptor tyrosine kinases fall into distinct classes based on their inferred signaling networks
4.1 Introduction
4.2 Results
4.2.1 A systematic perturbation-based approach to uncover RTK-specific signaling networks
4.2.2 RNAi perturbations reveal conserved Akt, MAPK, and PKC pathways across six RTKs
4.2.3 Data-driven network inference reveals three RTK classes
4.2.4 Consensus across inference methods reveals RTK class-specific signaling
4.2.5 RTKs and ligands are co-expressed in cancer cell lines and enriched in certain solid tumor types
4.2.6 RTK network class genes are correlated with responses to RTK-targeted therapies
4.3 Discussion
4.4 Materials and methods
4.4.1 Cell culture
4.4.2 Microarray fabrication
4.4.3 Microarray probing
4.4.4 Extraction of microarray data
4.4.5 Data pre-processing
4.4.6 Quantifying the consistency of biological replicates and shRNA pairs
4.4.7 Quantifying shRNA effects
4.4.8 shRNA effects simulations
4.4.9 Identifying signaling time scales
4.4.10 Data discretization
4.4.11 Network inference algorithms
4.4.12 Comparison of RTKs by inferred network structures through dimensionality reduction
4.4.13 Network model edge weight threshold robustness
4.4.14 Generating receptor class-specific consensus networks across inference methods
4.4.15 Clustering the raw data
4.4.16 Generating synthetic data for network inference
4.4.17 Cancer Cell Line Encyclopedia mRNA expression principal component analysis
4.4.18 Tumor histology enrichment/depletion
4.4.19 Correlating gene expression and drug activity area
4.4.20 Partial correlation between genes and drug response
4.4.21 Comparison of RTKs by receptor-intrinsic properties through dimensionality reduction

5 Quality versus quantity: Identifying features of biological data for making better models
5.1 Introduction
5.2 Results
5.2.1 A simple two-variable toy model
5.2.2 Analytical estimates for prediction accuracy as a function of data range in the two-variable toy model
5.2.3 Simulating data from multivariate linear regression networks
5.2.4 Inferring Bayesian networks using simulated network data
5.2.5 Bayesian network inference accuracy is a function of data range and discretization level
5.2.6 An a priori discretization strategy based on experimental measurement parameters
5.2.7 Predicted discretization corresponds strongly with best-performing discretization
5.3 Discussion

6 Conclusion
6.1 Emergent biological and computational insights
6.2 Guidelines for analysis of large data sets
6.3 Limitations of methods
6.4 Future work
List of Figures

1-1 Bayesian network joint probability distributions
1-2 Network models as combined input-output models
1-3 Summary of multivariate modeling approaches
2-1 Microwestern array schematic
2-2 MWA validation of linear response
2-3 Comparison of MWA to traditional western blot
2-4 An MWA containing 6 cell lysates probed with 192 antibodies
2-5 Heatmap of dynamic responses to EGF in A431 cells
2-6 Consensus model of EGF receptor level influences modeled by Bayesian network inference with comparison to ARACNe and CLR
2-7 Bayesian network consensus model edge weights
2-8 Graphical comparisons of the Bayesian, ARACNe, and CLR networks
2-9 Comparing inference algorithms when removing the restriction that the Bayesian network edge weight be >0.3
2-10 Testing for network models' significance
2-11 Estimating parent-child input-output logic within the Bayesian network
2-12 Parent constraint analysis for Bayesian network algorithm
3-1 EMT markers and receptor levels for the human mammary epithelial cell model
3-2 Mesenchymal cells in monolayer lack E-cadherin junctions
3-3 Individual cell speed distributions
3-4 EMT and growth factor-dependent cell migration is context-dependent
3-5 Migratory potentials of different epithelial-like versus mesenchymal-like cell types
3-6 Relative basal phosphorylation levels in epithelial vs. mesenchymal state
3-7 Altered signaling pathway activities upon Twist-induced EMT
3-8 Correlative topological modeling
3-9 Correlation network with stricter multiple hypothesis correction
3-10 Signaling data used for cell speed prediction
3-11 3-site reduced PLSR predictions
3-12 4-site reduced PLSR predictions
3-13 5-site reduced PLSR predictions
3-14 Site enrichment in reduced PLSR models
3-15 Correlations among signals used for cell speed prediction
3-16 Prediction accuracy using 1- and 2-site linear regression models
3-17 Signals in high-scoring linear regression models
3-18 Percent error using 1- and 2-site linear regression models
3-19 Signals plotted versus cell speed in a univariate fashion
3-20 Raw signal-signal correlation values in pre-Twist vs. post-Twist
4-1 Data-rich, perturbation-based profiling uncovers RTK-specific signaling networks
4-2 Perturbations reveal specificity in RTK-induced signal transduction
4-3 shRNA effects for individual shRNAs (1% Storey FDR)
4-4 shRNA effects for individual shRNAs (1% Benjamini FDR)
4-5 Quantitative shRNA-induced effects for individual shRNAs
4-6 shRNA effects are not consistent with randomly distributed effects
4-7 Clustering RTK-specific network models reveals three RTK classes
4-8 Network model clusters are robust to applied edge weight threshold
4-9 Identifying RTK class-specific edges through consensus network edge frequency
4-10 Network models' consensus reveals core RTK signaling backbone and RTK class-specific interactions
4-11 Clustering the raw data directly
4-12 Clustering network topologies inferred from simulated data reveals underlying network differences but clustering raw data does not
4-13 Observed distribution of gene expression values in the CCLE
4-14 RTK and ligand expression in CCLE cell lines
4-15 Co-expression of the receptors and ligands for multiple RMA thresholds
4-16 Cell line histology enrichment results for multiple RMA thresholds
4-17 RTK class genes are correlated with anti-RTK therapy response
4-18 Gene expression values of tightest TKI258 kinase binders
4-19 Partial correlation between RTK genes and drug response
4-20 Clustering RTK biophysical properties does not reveal RTK network model clusters
5-1 Prediction accuracy in a two-variable system
5-2 Schematic for two-variable toy model
5-3 Network structures used to simulate data
5-4 Bayesian network inference accuracy is a function of data range and discretization level
5-5 Schematic for a priori discretization algorithm
List of Tables

3.1 Literature evidence for EMT network models
5.1 Number of parameters in local conditional probability table
Chapter 1
Introduction
This thesis has focused on the multivariate analysis of biological networks, with an
emphasis on phosphorylation signaling networks activated downstream of receptor tyrosine kinases [1]. The interaction between receptor tyrosine kinases and extracellular
cognate ligand initiates a signaling cascade dependent in part on the phosphorylation
of specific amino acid residues on particular proteins. Phosphorylation is a reversible
post-translational modification that in essence can change the properties of a protein so that it may perform or take part in particular functions that it could not
do prior to being phosphorylated [2]. The phosphorylation of the amino acid tyrosine in particular has evolved as a primary means used by multicellular organisms
for transmitting information between cells [3]. This communication is used to regulate key physiological processes critical to homeostasis in multicellular organisms,
including cell proliferation, migration, differentiation, cell cycle progression, metabolic
homeostasis, and transcriptional activation, among others [4]. As a result, studying
phosphorylation networks is highly relevant to cancer, a disease in which many of
these homeostatic mechanisms dependent on tyrosine phosphorylation signaling have
gone awry [5].
The motivation for this thesis was to exploit advances in experimental measurement technologies for the purpose of building computational models of signaling network behavior in relevant cancer settings. The hope was to be able to predict, among
the measured signals, how to perturb the system in a manner that would influence
cell phenotype. The long-term goal of this approach would be to identify novel drug
targets, or combinations of targets, that would be effective in treating cancer.
1.1 Mutual advancement of measurement and modeling techniques
The complexity of a biological model is limited by the complexity of the biological
data used to build it. As a result, as experimental measurement technologies have
advanced-allowing the quantification of more signals under more conditions while
using fewer reagents, requiring less time, and costing less money-the computational
models derived from these data have generally seen a concurrent increase in their
complexity [6, 7].
Early computational models of cell behavior, such as the mathematical model of
cell migration published by DiMilla et al. [8] in 1991, did not even incorporate measurements of intracellular signaling. This was in part because it was not until later in the 1990s that
methods for measuring many signaling events, primarily by using site- and phosphorylation state-specific antibodies, became available [9]. Even as such antibodies
were developed, they were available for only a limited number of proteins and were
generally used with low-throughput Western blots. Therefore, models developed
around that time utilized a limited number of experimental measurements, such as
the differential equation model published in 2002 by Schoeberl et al. [10] in which
model predictions were only compared to experimentally measured Erk phosphorylation; or the differential equation model published in 2005 by Hua et al. [11] that
was tuned using experimentally measured values of only two proteins, caspase-3 and
caspase-8.
As new experimental methods were developed for measuring more signaling activities per sample, it became feasible to build new types of models. For example, using
a method published in 2003 for measuring the activities of multiple protein kinases
in a given sample [12], in combination with a method also published in 2003 for using
antibodies in concert with microarray technology [13], it was possible to build a multivariate regression model published in 2005 incorporating all 11 measured proteins
that was predictive of cell death via apoptosis [14]. Around the same time, a method
was developed for measuring the phosphorylation states of multiple proteins in single
cells using flow cytometry [15], which enabled the construction of a causal network
model of cell signaling [16] also published in 2005 and also containing measurements
for 11 proteins.
Moving closer in time to the beginning of this thesis in 2007-08, additional experimental methods were being developed that allowed one to measure an increasing
number of phosphorylation sites in an increasing number of conditions. For example,
mass spectrometry methods published in 2005 allowed one to measure scores of tyrosine phosphorylation events in a dynamic, site-specific manner [17], although typically
only across fewer than eight conditions. Further improvements to protein microarray
technology published in 2006 allowed one to measure about fifteen phosphorylation
sites, but in hundreds of conditions [18]. As these methods began to measure more
phosphorylation sites, how those sites related to one another in biological pathways
became less clear. Unlike earlier modeling efforts that typically measured a handful of
proteins from canonical pathways that had been studied for years, these newer data
sets included many sites about which little was known.
1.2 Multivariate modeling techniques
One key aspect that led to the successful application of differential equation models
to cell signaling problems is that it was reasonably understood how the relatively few
measured proteins in the model related to one another through biochemical reactions.
Through years, or in some cases decades, of research the pathways governing these
canonical sites had been delineated through careful experimental study. In other
words, the topology of the signaling network, or how the signals relate to and influence
one another, was reasonably well established.
However, when using experimental
methods that measured many phosphorylation sites at once, including many sites
with poorly understood roles in signaling, the signaling network topology governing
the relationships between these measured signals was also poorly understood.
A type of modeling called "network inference" offered an appealing prospect for
understanding the topology governing these measured phosphorylation sites [19].
Network inference aims to quantify the influences between measured signals, and by
doing so estimate the topology governing the signaling network. Depending on the
algorithm, network inference methods can capture linear, nonlinear but monotonic,
and/or nonlinear and non-monotonic relationships between measured nodes. These
methods can utilize continuous data, or can discretize the continuous data into bins.
Methods utilizing continuous data typically rely on a standardized functional form to
quantify signal relationships (e.g., linear, sigmoidal [20, 21], Gaussian [22]), whereas
methods utilizing discrete data do so in an effort to capture any functional relationship between signals, without making assumptions about the underlying functional
form.
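The distinction between linear, monotonic, and non-monotonic relationships can be illustrated with a minimal sketch. The synthetic signals, the bin count of 8, and the helper names below are assumptions chosen for illustration and are not taken from the analyses in this thesis. The sketch scores four simulated signal pairs with Pearson correlation, Spearman correlation, and mutual information computed on equal-width bins.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: sensitive to linear relationships."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (assumes no ties)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

def mutual_information(x, y, bins=8):
    """Mutual information (nats) after binning both signals into equal-width bins."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()            # joint distribution over bins
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Hypothetical signals: one reference signal and four candidate partners.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 5000)
relationships = {
    "linear": 2.0 * x,
    "monotonic": np.exp(5.0 * x),
    "non-monotonic": x ** 2,
    "unrelated": rng.uniform(-1.0, 1.0, 5000),
}
scores = {
    name: (pearson(x, y), spearman(x, y), mutual_information(x, y))
    for name, y in relationships.items()
}
for name, (r, rho, mi) in scores.items():
    print(f"{name:>14}: Pearson={r:+.2f}  Spearman={rho:+.2f}  MI={mi:.2f} nats")
```

In this toy setting the quadratic relationship is nearly invisible to both correlation measures but is picked up by the binned mutual information, which is the motivation given above for discretizing data when no functional form is assumed.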
1.2.1 Causal interpretations across network inference methods
The interpretation of a network inference result can vary. There exist methods that
attempt to specify causality between signals (e.g., Bayesian networks [22]), methods
that largely rely upon an existing literature-derived causal network topology to perform computations (Boolean logic [23], fuzzy logic [21]), methods that attempt to
quantify some degree of conditional independence between nodes but do not argue
for causal interpretations (e.g., conditional mutual information [24], partial correlation [25], a.k.a., Gaussian graphical models [26]), methods that attempt to filter
out direct from indirect signaling influences in an attempt to identify relationships
that are more likely to be causal (e.g., context likelihood of relatedness (CLR) [27],
the algorithm for the accurate reconstruction of cellular networks (ARACNe) [28]),
methods that select subsets of the most predictive signals in an attempt to identify
relationships that are more likely to be causal (Inferelator [20]), and methods that
infer symmetric (i.e., undirected and not causal) relationships between signals without any additional processing steps (e.g., Pearson correlation, Spearman correlation,
mutual information).
Most network inference methods quantify only pairwise relationships between signals (CLR, ARACNe, Pearson correlation, Spearman correlation, mutual information). While a given signal may have multiple other nodes that it is related to in a
pairwise model, there is no explicit relationship between the set of signals a particular signal is related to (e.g., if signal A is correlated with signals B and C, there is
no explicit accounting for the mutual regulation of A by B and C). Fewer methods
explicitly consider cases where a signal is a function of several inputs simultaneously.
In logic models, AND and OR gates can be used to encode higher-order relationships;
in models that explicitly consider feature selection, a signal can be a function of the
multiple selected features; in partial correlation and conditional mutual information,
the order of the encoded relationships depends on how many signals are considered
in the conditional relationships (e.g., Nth-order partial correlation conditions upon
N signals at a time, and thus can encode the mutual regulation of a child node by N
parents); and in Bayesian networks, similar to partial correlation, the mutual regulation of a child node by N parents is encoded by conditioning upon N signals at a
time.
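As a small illustration of how conditioning changes a pairwise result, the following sketch simulates a hypothetical three-signal chain A -> B -> C and compares ordinary Pearson correlation with first-order partial correlation. The chain, the noise levels, and the function name are assumptions made for this example. With three signals, conditioning on "all other signals" via the inverse covariance matrix is the same as first-order partial correlation.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation of each pair of columns given all remaining columns."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))  # inverse covariance
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical chain: A influences B, and B influences C.
rng = np.random.default_rng(1)
n = 5000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
data = np.column_stack([a, b, c])

corr = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)
print("Pearson r(A, C):        ", round(float(corr[0, 2]), 2))
print("Partial r(A, C | B):    ", round(float(pcor[0, 2]), 2))
print("Partial r(A, B | C):    ", round(float(pcor[0, 1]), 2))
```

Here A and C are strongly correlated in the pairwise sense, yet their partial correlation given B is close to zero, whereas the direct A-B relationship survives conditioning.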
Most network inference research applied to cell biological data, including the earliest work in the field, has been applied to gene expression data. In such work, the
authors would often make an assumption of causality between genes that were known
transcription factors and genes that were not, i.e., that the transcription factors were
influencing the expression of the non-transcription factor genes, and not the other
way around [20, 27, 28]. This provided a broad assumption for determining causality
between genes, and thus limited the need to develop methods for inferring causality
from the data. In the context of cell signaling, however, apart from making assumptions about signaling originating at a stimulated cell surface receptor and influencing
downstream signals, there is not a comparable assumption that could broadly provide
causal interpretation of signaling network inference results.
1.2.2 Bayesian networks
As a result, this thesis originally focused on developing methods to infer Bayesian
network models from cell signaling data. Bayesian networks belong to a larger class
of graphical modeling techniques.
Bayesian networks are represented by directed
acyclic graphs, in which conditional independence relationships are encoded [29, 30].
A Bayesian network is composed of two components: (1) the topology (or structure) of
the directed acyclic graph, which encodes the conditional independence relationships
(and thus encodes the causal interpretations of the network), and (2) the parameters
that describe the local conditional probability tables for each set of parent-child
input-output relationships, i.e., the likelihood a child node is in a particular state
given the state(s) of its parent(s), or more simply, the functional relationship between
parent and child nodes.
A Bayesian network is a representation of the joint probability distribution across
all nodes (i.e., signals or variables) in the network [31]. A fully connected Bayesian
network can represent any joint probability distribution over those nodes; however,
the utility of the Bayesian network is to produce a network that is not fully connected
and can still faithfully represent the joint probability distribution across all nodes.
This is their key feature: by producing a network that is not fully connected, the
Bayesian network is attempting to identify the most salient conditional independencies in the data (Fig. 1-1). Bayesian networks can encode higher order parent-child
relationships by conditioning the value of a given child node simultaneously on multiple parent nodes.
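The factorization idea can be made concrete with a short sketch. It assumes four binary nodes with the sparse structure X1 -> X2, X2 -> X4, X3 -> X4 (X3 has no parents), which matches the non-fully-connected factorization P(X1)P(X2|X1)P(X3)P(X4|X3,X2). All probability values are hypothetical and were chosen only for illustration.

```python
import itertools

# Hypothetical conditional probability tables for four binary nodes.
p_x1 = {0: 0.7, 1: 0.3}
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_x3 = {0: 0.6, 1: 0.4}
p_x4_given_x2_x3 = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.6, 1: 0.4},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.1, 1: 0.9},
}

def joint(x1, x2, x3, x4):
    """Joint probability as the product of each node's local conditional."""
    return (p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3[x3]
            * p_x4_given_x2_x3[(x2, x3)][x4])

states = list(itertools.product([0, 1], repeat=4))
total = sum(joint(*s) for s in states)

# X1 and X3 are marginally independent in this structure.
p_x1_x3 = sum(joint(*s) for s in states if s[0] == 1 and s[2] == 1)

def free_parameters(parents_per_node):
    """Binary nodes: one free probability per configuration of the parents."""
    return sum(2 ** k for k in parents_per_node)

sparse_params = free_parameters([0, 1, 0, 2])  # parents of X1, X2, X3, X4
full_params = free_parameters([0, 1, 2, 3])    # fully connected ordering
print(total, p_x1_x3, sparse_params, full_params)
```

The sparse structure needs fewer free parameters than the fully connected one while still defining a valid joint distribution, which is the simplification that Bayesian network inference seeks.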
Historically, the structure and parameters of Bayesian networks were
determined by knowledge experts [32]. For example, one may have interviewed physicians to determine which symptoms were most predictive of a given disease, and the
likelihoods associated with those symptoms. The resultant Bayesian network would
then be used to make systematic predictions, or inferences, about the likelihood of a
disease state given observations about a patient's symptoms. One could also take an
existing Bayesian network topology, and data from the nodes in that network, and
Fully connected: $P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2)\,P(X_4 \mid X_1, X_2, X_3)$
Not fully connected: $P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3)\,P(X_4 \mid X_3, X_2)$
Figure 1-1: The joint probability distributions for a fully connected Bayesian network (left)
and a network not fully connected (right). The objective of Bayesian network inference
is to identify conditional independencies in the data, and by doing so create a simplified
representation of the joint probability distribution across all measured signals.
learn the corresponding parameters (i.e., conditional probability tables) associated
with that network structure and those data. It was not until later in the development
of Bayesian networks that algorithms were created for learning their structure and
parameters simultaneously from data [33]. Applications in this thesis have focused
on this latter method, whereby given a data set containing measurement values for a
set of nodes, a Bayesian network structure and parameters are subsequently learned
directly from the data set.
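For the simpler case in which the structure is already fixed, parameter learning for discrete data reduces to counting. The sketch below is a minimal maximum-likelihood example for one hypothetical binary parent-child pair; the true probabilities and sample size are assumptions for illustration, and the structure-learning algorithms used in this thesis are not reproduced here.

```python
import numpy as np

# Simulate a hypothetical binary parent and child with known P(child=1 | parent).
rng = np.random.default_rng(2)
n = 20000
true_p_child_on = np.array([0.1, 0.8])          # indexed by parent state
parent = rng.integers(0, 2, size=n)
child = (rng.random(n) < true_p_child_on[parent]).astype(int)

# Count co-occurrences: rows are parent states, columns are child states.
counts = np.zeros((2, 2))
np.add.at(counts, (parent, child), 1)

# Maximum-likelihood conditional probability table: normalize each row.
cpt = counts / counts.sum(axis=1, keepdims=True)
print(cpt)
```

Each row of `cpt` estimates the distribution of the child given one parent state, which is the local conditional probability table described above.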
1.2.3 Similarities across seemingly disparate modeling strategies
One key issue generally not addressed in the literature is the similarity of what at
first appear to be quite different modeling approaches. Over the course of this thesis, because so many different modeling approaches were explored, even beyond the
so-called network inference approaches, the common features of these methods have
become apparent. First, let me draw a distinction between "input-output-like" models
and "network-like" models. An example of the former case is partial least squares
regression (PLSR) (e.g., ref. [14]), whereas an example of the latter case is a Bayesian
network model (e.g., ref. [16]).
For PLSR, the general notion is to utilize all measurements
simultaneously for predicting a small number of outputs, which have often
corresponded to markers of phenotypic outcomes.
For a Bayesian network, as just
described, the goal is to derive a graphical model displaying relationships between
measured signals. In PLSR, the relationships between the measured signals (those
used as inputs to predict the output) are not explicitly considered, whereas in the
Bayesian network, these signal-to-signal relationships are the primary result of the
model.
These methods actually represent a shared algorithmic approach, applied in
different ways. In the case of PLSR, the primary result is an input-output relationship,
whereby all signals are considered as predictors of the output(s). In the case of
the Bayesian network, this type of process, whereby signals are considered as possible
predictors of an output, is also occurring, but on a node-by-node basis. If one performs
an input-output-type calculation iteratively for each measured signal (treating the
remaining measured signals as candidate inputs, but then only selecting the most
predictive inputs for each signal), the end result is a network-like structure in which
only a few signals are used as predictors for every other signal (Fig. 1-2).
Therefore, network-type methods can be conceptualized as applying a PLSR-type
approach (i.e., many inputs, but one output) consecutively across every node, but
only selecting a subset of the measured nodes as the most predictive inputs. While
the methods for quantifying the relationships between signals of course vary between PLSR-type models and Bayesian network-type models, this concept nonetheless links these methods. Network-like methods also consider all signals as putative
input signals for each node, but then undergo some feature selection-type procedure
to explicitly select only a subset of those putative inputs as the final inputs for each
node. A diagram outlining input-output-like models versus network-like models, and
whether the models use discrete or continuous data, is shown in Fig. 1-3 for multiple
modeling methods.
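The node-by-node view described above can be sketched in a few lines. This is a deliberately simplified stand-in, not PLSR or Bayesian network scoring: candidate inputs are ranked by absolute Pearson correlation and the retained inputs are fit by ordinary least squares. The function name, the `max_parents` parameter, and the synthetic data are assumptions for illustration, and no directionality or acyclicity is enforced.

```python
import numpy as np

def infer_network(data, max_parents=2):
    """For each node, keep the most correlated other signals as its inputs."""
    n_nodes = data.shape[1]
    corr = np.corrcoef(data, rowvar=False)
    network = {}
    for child in range(n_nodes):
        # Step 1: score every other signal's utility as a predictor.
        candidates = [j for j in range(n_nodes) if j != child]
        ranked = sorted(candidates, key=lambda j: abs(corr[child, j]), reverse=True)
        # Step 2: eliminate all but the top-scoring signals (feature selection).
        parents = sorted(ranked[:max_parents])
        # Step 3: fit an input-output model using only the selected inputs.
        design = np.column_stack([data[:, parents], np.ones(len(data))])
        coef, *_ = np.linalg.lstsq(design, data[:, child], rcond=None)
        network[child] = dict(zip(parents, coef[:-1]))
    return network

# Hypothetical data: signal 2 depends on signals 0 and 1; signal 3 is unrelated.
rng = np.random.default_rng(4)
s0, s1, s3 = rng.normal(size=(3, 500))
s2 = s0 - s1 + 0.1 * rng.normal(size=500)
network = infer_network(np.column_stack([s0, s1, s2, s3]))
print(network[2])
```

Collecting the selected inputs for every node yields the "network" of input-output relationships; swapping the scoring and fitting steps for other choices gives different members of the same family of methods.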
These insights are important to keep in mind when considering modeling methods
that have been published in the past and discussed as if they were entirely different
approaches. In the most basic sense, the modeling methods referenced here simply
attempt to predict the values of some signals in terms of the values of other signals;
but how they go about that process varies.
[Figure 1-2 schematic. Labels: many signals as predictors ("input-output-like"); which subset of signals matter? (feature selection); for each node: 1. score other signals' utility, 2. eliminate unhelpful signals, 3. result is a 'network' of input-output relationships; few signals as predictors ("network-like").]
Figure 1-2: Network models can be conceptualized as applying input-output approaches to
each signal in turn, and then applying some manner of feature selection to choose which
inputs are most useful. Thus, rather than being seen as disparate methods, input-output
models and network models have a shared underlying modeling framework, but it is applied
in two different ways.
[Figure 1-3 diagram. Axis labels: many signals as predictors ("input-output-like") versus few signals as predictors ("network-like"); discrete input versus continuous input. Solid bullets are shown as "-" and open bullets as "o".]
Many signals as predictors ("input-output-like"):
- Partial least squares regression (PLSR)
- Multiple linear regression (MLR)
- Bayesian predictors
Few signals as predictors ("network-like"):
- PLSR and MLR with feature selection
- Bayesian networks
- Pearson and Spearman correlation networks
- Mutual information networks (including CLR, ARACNe)
- Partial correlation
o Conditional mutual information
o Gaussian Bayesian networks
o Dynamic Bayesian networks
o Constrained fuzzy logic
o Boolean and Probabilistic Boolean networks
Figure 1-3: Different types of modeling approaches are summarized onto two axes: input-output-type models vs. network-type models, and methods using discrete binned data vs.
continuous data as input. Methods noted by solid bullet points were explored in this thesis.
1.3 Modeling phenotypic data
A fundamental question in cell biological modeling is how to incorporate phenotypic
data and, in the context of this thesis, how to relate signaling data to phenotypic data. The most important factor in choosing an approach is the frequency of the
phenotypic measurements: are there corresponding phenotypic data for every signaling
measurement, or were the phenotypic data collected at a different rate than the signaling data? In the former case, one can in principle include a "phenotype node" in
any inferred network model, because for every signaling data point there is a phenotypic data point collected under the same condition. In the latter case, one likely has
to somehow summarize the signaling data corresponding to each condition in which
phenotype was measured. For example, if signaling time courses were collected, but
only one phenotypic data point was measured per time course, then one may summarize the signaling data for the entire time course by calculating the area under the
signaling time course trajectory, or calculating the average signaling value across the
time course, etc.
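A minimal sketch of these two summaries follows. The time points and signal values are hypothetical, and the trapezoidal rule is one assumed choice for computing the area under the trajectory.

```python
import numpy as np

# Hypothetical signaling time course (arbitrary units) with uneven sampling.
time_min = np.array([0.0, 1.0, 5.0, 15.0, 30.0, 60.0])
signal = np.array([0.1, 0.9, 1.0, 0.6, 0.3, 0.2])

def area_under_curve(t, y):
    """Trapezoidal area under a sampled trajectory."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

auc = area_under_curve(time_min, signal)
# Dividing by the duration gives a time-weighted average of the signal.
time_weighted_mean = auc / (time_min[-1] - time_min[0])
# The unweighted mean treats every sample equally, regardless of spacing.
simple_mean = float(signal.mean())
print(auc, time_weighted_mean, simple_mean)
```

With unevenly spaced time points the time-weighted and unweighted averages differ, so the choice of summary can change which signals appear related to the single phenotypic measurement.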
If signaling and phenotypic data are collected at the same rate, then one could
summarize both the signal-signal relationships and the signal-phenotype relationships in the same network model. However, if the signaling and phenotypic data
were not collected at the same rate, then it would suggest that one could have two
models: one model in which signal-signal relationships are quantified, and one model
in which the "summarized signal"-phenotype relationships are quantified. Of course,
trivially, results from two modeling approaches could be combined visually into a
single network-type diagram, but this would not mean the same modeling technique
was applied to all signaling and phenotypic data simultaneously.
In some cases, one may wish to develop a model wherein the root nodes represent a small number of experimentally measured parameters (e.g., ligand levels or
receptor activation levels), the intermediate nodes represent signaling nodes, and the
terminal node(s) represents a phenotype(s) of interest. This type of approach is quite
appealing conceptually. In this case, the values of the root nodes would be used to
"forward-simulate" the network and propagate predictions from the root nodes to
the phenotype nodes, and thus predict phenotypic outputs from a small number of
experimentally measured inputs. This scenario, in which an entire network of signals
is predicted using only the values of a small set of inputs, was explored briefly in this
thesis, and has important implications for modeling phenotype. Forward-simulation-type models raise additional concerns beyond those raised by network-type or input-output-type models.
If a forward-simulation model is built using linear interaction terms based on N
measured inputs, then any subsequent downstream signal, including any terminal
phenotype node, will always be predicted by linear combinations of those N inputs.
This is because the superposition of linear functions results in a linear function.
However, if a forward-simulation model is built using a nonlinear interaction function
f (x), subsequent downstream nodes and the phenotype may be predicted by novel
nonlinear combinations of the inputs, even beyond those that would be obtained by
directly considering the relationship f(x) between the input nodes and phenotype
node. This is because, in contrast to linear functions, the superposition of nonlinear
functions can create novel nonlinear functional relationships. Therefore, if one seeks
to use this type of forward-simulation approach to model phenotype, the interaction
terms in the model should be nonlinear in some manner. If they are not, then the
phenotype output can be best predicted by simply regressing against the N inputs.
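This argument can be checked numerically with a small sketch. It assumes a hypothetical two-input network with two intermediate signaling nodes and one phenotype node; the weights and the choice of tanh as the nonlinear interaction function are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
inputs = 2.0 * rng.normal(size=(1000, 2))          # N = 2 measured root nodes
w_hidden = np.array([[1.0, -0.5], [0.5, 1.0]])     # inputs -> intermediate nodes
w_out = np.array([1.0, -1.0])                      # intermediate nodes -> phenotype

# Linear interactions: forward simulation collapses to one linear map of the inputs.
linear_phenotype = (inputs @ w_hidden.T) @ w_out
collapsed_weights = w_out @ w_hidden
collapses = bool(np.allclose(linear_phenotype, inputs @ collapsed_weights))

def linear_fit_rms_residual(x, y):
    """RMS error left after regressing y directly on the inputs x (with intercept)."""
    design = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sqrt(np.mean((y - design @ coef) ** 2)))

# Nonlinear interactions: the phenotype is no longer a linear function of the inputs.
nonlinear_phenotype = np.tanh(inputs @ w_hidden.T) @ w_out
linear_residual = linear_fit_rms_residual(inputs, linear_phenotype)
nonlinear_residual = linear_fit_rms_residual(inputs, nonlinear_phenotype)
print(collapses, linear_residual, nonlinear_residual)
```

In the linear case a direct regression on the inputs reproduces the forward-simulated phenotype essentially exactly, so the intermediate nodes add no predictive power; with the nonlinear interaction a direct linear regression leaves a substantial residual.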
Lastly, there may be no need to build a model if its predictions can be easily
measured experimentally anyway. If one is predicting a complex phenotype like cell
migration speed, then it is reasonable to identify a small set of signaling measurements that are predictive of cell speed and could subsequently be measured in lieu of
cell speed. However, if one is measuring a
proxy for a complex phenotype (e.g., cleaved PARP as a proxy for apoptosis), and one
identifies a small set of signals predictive for cleaved PARP, this may not be worthwhile if measuring the signals required to predict cleaved PARP is as time-consuming
or otherwise costly as measuring cleaved PARP itself. While the latter case may aid
understanding about signals driving PARP cleavage, it may not be useful if one hopes
to develop more easily measured proxies or biomarkers for complex cell phenotypes.
In other words, if your model input is more complicated than your model output, it
may not be a useful model in terms of experimental effort.
1.3.1 Identification of drug targets
A primary motivation for this thesis was the idea that, given a network model derived
from high-throughput cell signaling experimental data, ideally collected in concert
with relevant phenotypic data, we could develop methods for the de novo prediction
of useful drug targets. In the context of differential equation models of cell signaling,
sensitivity analysis had been used to identify putative drug targets [34]. However,
the key to identifying drug targets for cancer is having some notion of what signaling features are important to the cell phenotype. To do this, an experimental data
set should include phenotypic data directly or a proxy for phenotype that can be
compared to other measured signals. If such data are available, one could pursue
the methods outlined in the previous section for modeling phenotype, and then seek
to perturb nodes highly predictive of phenotype to determine if they actually affect
phenotype.
However, in many experimental data sets used in this thesis, neither phenotypic
data nor a useful proxy was available.
As such, one simply has a network
model of signal-signal relationships from which to predict useful drug targets. This
is a task for which, to my knowledge, no great solution yet exists. One can search
the literature for notions about how certain signals may be related to a phenotype
of interest; but if that is the case, then one really has a proxy for phenotype and
can thus proceed as outlined previously. In the case where one has only signaling
data but no clear phenotype proxies, one could consider graph theoretical notions
of robustness to hypothesize useful drug targets (e.g., ref. [35]). Further, one could
consider the "druggability" [36] of particular measured signals to reduce the set of
putative drug targets to only those that could likely be pursued clinically. In spite
of these challenges, a novel application of network models to drug target discovery,
which is not explicitly dependent on identifying nodes in a network model that may
be useful drug targets, is described in Chapter 4 of this thesis.
1.4 Overview of thesis contents
By and large, this thesis has focused on three computational approaches: inferring
so-called network models, predicting the values of an output signal (in terms of continuous, experimentally measured units) given one or more input signals, and quantifying the enrichment or depletion of features in data subsets. These methods were
applied to cell biological data from a variety of experimental systems using a variety
of measurement technologies, but in all cases involved the investigation of signaling
downstream of activated receptor tyrosine kinases in vitro using cancer cell lines or
engineered cell lines relevant to cancer.
Chapter 2 studies signaling downstream of epidermal growth factor receptor (EGFR)
using microwestern arrays, a technology developed by Professor Richard Jones at the
University of Chicago. The modeling portion of the paper studied signaling relationships among 15 phosphorylation sites on 10 receptor tyrosine kinases plus two sites
on Src kinase. Using Bayesian networks and two mutual information-based network
inference algorithms, network models were developed that hypothesized receptor-level crosstalk downstream of EGFR activation. Further computational analyses provided insights into how inference quality varies as a function of data set size, how
prior knowledge can be used to restrict directionality in the Bayesian network, how
Bayesian network complexity varies as a function of the maximum number of parent
nodes allowed per child node, and how the identified discrete parent-child relationships translate to the continuous data space.
Chapter 3 studies signaling downstream of five different receptor tyrosine kinases in the context of the epithelial-mesenchymal transition using a bead-based immunosandwich assay. Cell migration speed data were collected as well, although not
at the same frequency as the signaling data. Pearson correlation was used to derive
network models specific to the epithelial versus mesenchymal states. Feature selection was applied to PLSR models to identify reduced sets of signals that predicted
cell speed more accurately than the full 11-signal PLSR models. The enrichment
and depletion of particular sites in the high-scoring "reduced" PLSR models was
quantified. Linear regression models were also built, using only one or two sites as
predictors, which also had better prediction accuracy than the 11-site PLSR models. Importantly, the signals identified in the reduced PLSR and linear regression
models could be linked to known differences in epithelial versus mesenchymal cell migration. These predictions were tested experimentally in the mesenchymal case and
successfully validated.
Chapter 4 studies signaling downstream of six different receptor tyrosine kinases
using lysate microarrays. Receptor-specific network models were developed using five
different network inference methods. The consensus across the methods revealed signaling network features that grouped the receptors into three classes. Using publicly
available genomic and pharmacological profiling data, it was discovered that increased
expression of same-class receptors or ligands correlated with insensitivity to drugs targeting other receptors in that class. The enrichment of one receptor class across cell
lines derived from tumors with different histologies was quantified, suggesting clinical
relevance of the receptor class. These results suggest that inferred network structure
itself can serve as a multivariate classifier of the biological condition(s) from which
the network was derived. In this manner, the inferred network structures provided
a means for predicting useful drug targets: receptors with similar inferred network
structures may compensate for one another following targeted inhibition of a same-class receptor.
Chapter 5 describes a theoretical project studying the features of experimental
data that improve model accuracy. First using a simple two-variable toy model, we
derive numerical and analytical estimates of linear regression model accuracy as a
function of data quantity and features related to data quality. Next, using data
simulated from more realistic 15-node synthetic networks, we show that Bayesian
network inference accuracy can also be cast in terms of data quantity and features
related to data quality. In particular, we describe how increasing the range over which
the data are sampled can improve model accuracy, but only if the continuous data
are discretized in a manner that is consistent with the scale of heritable biological
variation in the network. We describe an a priori discretization scheme, dependent
only on experimental parameters related to the biological and technical variation in
the data, that corresponds well with the best-performing discretization schemes in
the simulated data. These results provide, to our knowledge for the first time, a
discretization algorithm designed specifically to improve causal inference that is also
described in terms familiar to experimental biologists.
Chapter 2
Systems analysis of EGF receptor
signaling dynamics with
microwestern arrays
Note: This chapter is based on a previously published paper, Ciaccio et al. (2010) [37]. The
author contributions for that paper are as follows: C.P.C., M.F.C., and R.B.J. designed the
experiments. C.P.C. and M.F.C. performed the cell culture and growth factor stimulations.
M.F.C. and R.B.J. designed the microwestern array method; M.F.C. carried out microwestern experiments and organized the data into heat maps. J.P.W. and D.A.L. performed
Bayesian network, CLR, and ARACNe analysis of the data. M.F.C., J.P.W., D.A.L., and
R.B.J. wrote the original manuscript.
2.1 Introduction
Systems-level understanding of protein functions in biological processes remains a
challenge.
The western blot [38] is a powerful protein analysis method because the
electrophoretic separation step allows for reduction in sample complexity, and the antibody detection step then results in signal amplitude proportional to the abundance
of the immobilized antigen at a physical location on the detection membrane that
can be related to molecular size standards. Because western blots require a relatively
large amount of sample and a great deal of human labor, they have been of limited
utility in large-scale protein studies. Reverse-phase lysate arrays (RPAs), performed
by arraying lysates directly on nitrocellulose-coated slides and probing them with
antibodies, are useful for quantifying large numbers of proteins from limited amounts
of material such as in biomarker discovery [39, 40]. In contrast to western blots, however, RPAs lack confirmatory data for signal veracity; in a side-by-side comparison of
measurements from RPAs and western blots, only 4 of 34 phospho-specific antibodies
examined generated equivalent information [18]. The authors of the study
concluded that antibody cross-reactivity contributed substantial noise to RPAs, confounding true protein measurements. Many antibodies have been validated for use
with the Luminex xMAP bead-sorting system, but this approach requires ~1,000-fold
more cell material per protein analysis than RPAs, and the cost of detection reagents
per protein is ~30-fold greater. Flow cytometry permits a (relatively small) cohort of
proteins to be examined simultaneously in individual cells; this multiplexing feature
has been exploited with Bayesian network modeling to predict new signaling network
causalities [16].
In contrast to antibody-based methods, mass spectrometry can be used to identify new proteins. Using mass spectrometry, thousands of peptides have been assessed
in lung cancers to identify commonly activated receptor tyrosine kinases and downstream signaling pathways [41]. Relative abundances can be examined quantitatively
using isotopic labels across time points, cell types or perturbations as in examination
of phosphorylation dynamics of HeLa [42] and mammary epithelial cells [43] after
epidermal growth factor (EGF) or heregulin treatment. However, the large sample
amount required by mass spectrometry can limit the number of conditions that can
be analyzed; -10'
cells are typically required for a mass spectrometry experiment
[41] versus ~10^5 cells for an immunoblot or ~10^3 cells for RPAs [44].
Here we describe microwestern arrays (MWA), which combine the scalability of
RPAs and retain vital attributes of western blots for highly multiplexed proteomic
measurements: reduction of sample complexity and signals that can be related to
protein size standards. In combination with suitable pan- and modification-specific
[Figure 2-1 schematic. Steps: treat cells with EGF (0, 1, 5, 15, 30, 60 min); lyse cells; array lysates and ladder to gel; semidry electrophoresis; transfer to nitrocellulose; probe with 96 antibodies.]
Figure 2-1: Microwestern array (MWA) method. Schematic of the procedure.
antibodies, dynamics of protein abundance and modification may be simultaneously
monitored across many samples. We demonstrate that MWA in combination with
computational modeling techniques can yield useful systems-level biological insights
for EGF receptor (EGFR) signaling dynamics.
2.2 Results
2.2.1 Fabrication of MWAs
Our strategy (Fig. 2-1) allows us to compare protein abundances and differences
in post-translational modifications for cells stimulated under different conditions. To
interface the microscopic western blots with microtiter-based liquid handling methods,
we printed cell lysates via a noncontact microarrayer on gels in 96 identical blocks
with dimensions of a 96-well plate [45].
Using these dimensions, 6 different lysates
may be examined with 96 different antibodies or 24 different lysates may be examined
with 24 different antibodies. To increase the migration rate of large proteins and slow
the rate of smaller ones, we used an acetate running buffer, obviating the need for a
stacking gel. For each spot, 6 nL of sample was arrayed over the same gel position
ten times, allowing for greater spotting density and signal than microdepositing the
entire 60 nL in a single dispense. We arrayed one spot of size standard and six
spots of experimental sample at 1 mm pitch at the top edge of each block. After
printing, we subjected the samples to semidry electrophoresis for 12 min and then
transferred them to a nitrocellulose membrane. We placed the membrane in a 96-well gasket (Arrayit) to isolate each set of 6 separated lysates and then incubated
each block with a different antibody. After incubation with dye-labeled secondary
antibody, we scanned the blot using an infrared fluorescence scanner. This format
allows interrogation of 192 antibodies in 6 samples when two antibodies from different
hosts (for example, rabbit and mouse) are used. A total of 1,152 antibody-sample
readouts is therefore possible per MWA device. Each spot measurement required
~1,000 cells (equivalent to 250 ng of protein) and 16 ng of detection antibody, thus
enabling analysis of ~4,000 protein abundances from the ~1 mg of A431 cell lysate
collected.
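The throughput figures quoted above follow from simple arithmetic, reproduced here as a short check. The variable names are illustrative; all numbers are those stated in the text.

```python
# Device layout as described in the text.
blocks_per_device = 96          # identical blocks in a 96-well plate footprint
lysates_per_block = 6           # experimental samples arrayed per block
antibodies_per_block = 2        # two primary antibodies from different hosts

antibodies_per_device = blocks_per_device * antibodies_per_block
readouts_per_device = antibodies_per_device * lysates_per_block

# Each spot is built up from repeated small dispenses.
dispenses_per_spot = 10
nanoliters_per_dispense = 6
nanoliters_per_spot = dispenses_per_spot * nanoliters_per_dispense

# Number of spot measurements supported by the collected lysate.
lysate_collected_ng = 1_000_000  # about 1 mg of A431 cell lysate
protein_per_spot_ng = 250
measurements_from_lysate = lysate_collected_ng // protein_per_spot_ng

print(antibodies_per_device, readouts_per_device,
      nanoliters_per_spot, measurements_from_lysate)
```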
2.2.2 Validation of MWA method
We compared the resolution and linearity in signal of MWAs with macroscopic gels
using the Odyssey labeled protein molecular weight standard (LI-COR) (Fig. 2-2a).
For proteins of 150, 50, and 25 kDa, the intensity of each ladder spot was proportional
to the fold dilution over two orders of magnitude for both methods (Fig. 2-2b,c). The
coefficient of variation from arraying, rehydration and transfer of a single band of the
LI-COR ladder across the area of the membrane was < 9%.
We then tested the linearity of signal response in quantifying proteins from A431
human carcinoma cell lysates using a two-stage fluorescence immunodetection system
(Fig. 2-2d,e).
We used five phospho- and two pan-specific antibodies to analyze
15-175 kDa proteins in EGF-stimulated A431 cell lysates. All MWAs showed a
linear relationship between relative antigen concentration and signal intensity over
their detectable range (100- to 1,000-fold). Assuming an expression level of 1.2 × 10^6
receptors per A431 cell [46], EGFR was detectable down to one cell equivalent (~2
[Figure 2-2 image. Panels a-e: Odyssey ladder and A431 lysate dilution series on a macroscopic gel and on MWAs, with plots of signal intensity versus log relative concentration, each annotated with its R² value, for GAPDH, p-4E-BP1(Thr37,Thr46), p-Erk1/2(Thr202,Tyr204), p-EGFR(Tyr845) and p-Akt(Ser473).]
Figure 2-2: MWA validation of linear response. (a) Traditional 10% SDS-PAGE of 5 µL
aliquots (left) and MWA of 60 nL (right) of twofold serial dilutions of the Odyssey protein
ladder. (b,c) Median net signal intensities quantified for the indicated bands of the Odyssey
protein ladder in the traditional western blot (b) and MWA (c) in a. (d) MWA analysis
of twofold serial dilutions of lysates from A431 cells stimulated for 5 min with 200 ng/mL
EGF and probed with seven rabbit primary antibodies directed to indicated proteins and
detected with goat anti-rabbit Alexa Fluor 680-labeled secondary antibody. Arrows to
the left of gels point to the spot that was quantified. Orange circles depicted to the left
of gels indicate positions of protein molecular weight standards. Numbers to the left of
arrows indicate known sizes of the proteins in the Odyssey protein standard adjacent to the
quantified spots. (e) Median net signal intensity of each band versus relative concentration
from the gels in d.
attomoles; ~340 femtograms). We assumed linearity for all further analyses.
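The one-cell-equivalent detection limit can be reproduced from the receptor count per cell. The sketch below is a worked conversion only; the EGFR molecular weight of about 170 kDa is our assumption for illustration (the text does not state the value used), and the variable names are hypothetical.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
RECEPTORS_PER_CELL = 1.2e6      # EGFR copies per A431 cell, as assumed in the text [46]
EGFR_MW_G_PER_MOL = 170_000     # assumption: ~170 kDa for mature EGFR

moles = RECEPTORS_PER_CELL / AVOGADRO
attomoles = moles * 1e18
femtograms = moles * EGFR_MW_G_PER_MOL * 1e15

print(f"{attomoles:.2f} amol, {femtograms:.0f} fg")  # 1.99 amol, 339 fg
```

Under these assumptions, one cell equivalent corresponds to roughly 2 attomoles and about 340 femtograms of receptor.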
2.2.3
Comparison of macrowestern blots and MWAs
To compare performance of MWAs with macrowestern blots for monitoring phosphorylation dynamics, we selected a representative test set of 11 antibodies. Four
had been previously shown to generate equivalent quantitative data by RPAs and
western blots [18]; another four had been shown to result in substantial compression
of dynamic range by RPAs owing to antibody cross-reactivity [18].
Measurements
we obtained by MWAs were similar to those obtained by macrowestern blots for all
antibodies (Fig. 2-3) and did not display the dynamic range compression observed
for RPAs. For many protein phosphosites, including EGFR, IRS1 and AKT, we observed bands at the predicted size as well as at additional sizes that could obscure
quantitative measurements by RPAs. The precision in estimating sizes of proteins
>100 kDa by MWAs was ±10 kDa, and for smaller proteins ±5 kDa. Although we
could determine protein sizes with precision approaching that of a standard western
blot, proteins were not completely resolved unless they differed by more than the following: 75 kDa for >200 kDa proteins; 50 kDa for 100-200 kDa proteins; 25 kDa for
50-100 kDa proteins; and 10 kDa for <50 kDa proteins, corresponding to a migration
distance of about 1.5 mm, twice the diameter of spotted protein (Figs. 2-2d and 2-3).
Resolution equal to a macrowestern blot could be obtained by electrophoresing the
samples for ~1.5 times the distance (Fig. 2-2a).
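The size-dependent resolution limits above can be expressed as a simple lookup. The following sketch is illustrative only: the function names are hypothetical, the assignment of the exact boundary values (50, 100 and 200 kDa) to a size class is our assumption, and so is the choice to apply the threshold of the larger of the two bands.

```python
def min_resolvable_difference_kda(size_kda):
    """Approximate size difference (kDa) needed for complete resolution by MWA,
    using the size classes quoted in the text."""
    if size_kda > 200:
        return 75
    if size_kda >= 100:      # assumption: boundary values fall in the upper class
        return 50
    if size_kda >= 50:
        return 25
    return 10

def fully_resolved(size_a_kda, size_b_kda):
    """True if two bands differ by more than the threshold for the larger band.
    Using the larger band's class is an assumption; the text does not specify."""
    threshold = min_resolvable_difference_kda(max(size_a_kda, size_b_kda))
    return abs(size_a_kda - size_b_kda) > threshold

print(fully_resolved(175, 60))   # True: 115 kDa apart, threshold 50 kDa
print(fully_resolved(42, 37))    # False: 5 kDa apart, threshold 10 kDa
```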
2.2.4
Application of MWAs to analysis of EGFR signaling
network
To examine EGF signaling dynamics using MWAs, we chose antibodies to a wide
range of phosphosites to monitor many molecular biological processes (see Supp. Fig.
1 and Supp. Table 1 in ref. [37]): early positive growth factor response regulators,
negative signaling regulators, downstream proliferation indicators, nutrition-sensing
indicators, adhesion and migration indicators, phospholipid and calcium-state indicators, stress indicators, and transcription and cell-cycle indicators.
[Figure 2-3 image. Macrowestern blots (left), microwestern blots (middle) and fold-change time courses (right; x-axis, time in min from 0 to 60) for the eleven test antibodies, including sites on EGFR, Erk1/2, Akt, Mek and p90RSK.]
Figure 2-3: Comparison of MWA to traditional western blot. Indicated samples were analyzed by traditional western blots (left) and in triplicate in MWA format (middle). Lysates
were from A431 cells stimulated with 200 ng/mL EGF and lysed at the indicated times after
stimulation. β-actin monoclonal mouse primary antibody (detected with IR800 (LI-COR)
secondary antibody; green) was co-probed with each of the eleven rabbit polyclonal primary antibodies
(detected with Alexa Fluor 680 secondary antibody; red) to demonstrate equal loading of each sample. An arrow to
the left of each blot indicates the band quantified, along with the corresponding sizes of the LI-COR protein standard. Fold
change in fluorescence signals was quantified for both formats (right). Error bars, s.e.m. of
the three technical replicates of the microwesterns shown.
[Figure 2-4 image. Two-channel scan of one MWA with magnified insets; lanes are labeled L, 0, 1, 5, 15, 30 and 60 min.]
Figure 2-4: An MWA containing 6 cell lysates probed with 192 antibodies. The red channel
(700 nm laser) shows the stimulation of A431 cells with 200 ng/mL EGF probed with a
panel of rabbit anti-human polyclonal antibodies detected with Alexa Fluor 680-labeled
secondary antibodies. The green channel (800 nm laser) reflects a scan of the samples
probed with mouse monoclonal anti-human β-actin antibody detected with IR800-labeled
secondary antibodies to demonstrate the consistency of printing across the area of the
membrane. L indicates the Odyssey protein molecular weight ladder; numbers indicate the
time after EGF stimulation that the indicated samples were collected. The boxed areas of
the red channel image (magnified on the left) were probed with a rabbit-derived antibody
that recognizes the doubly phosphorylated Ser240 and Ser244 of S6 ribosomal protein (top)
and with a rabbit-derived antibody that recognizes EGFR(Tyr1068) (bottom). The boxed
areas of the green channel image (magnified on the left) were probed with mouse-derived
β-actin antibody. For layout of antibodies for the entire image, see Supp. Table 2 in ref. [37].
Center-to-center distance between arrayed spots was 1 mm.
To observe signaling dynamics at doses approximating physiological levels [47], we
stimulated cells with 2, 50, 100, and 200 ng/mL EGF. We performed a mock stimulation to distinguish EGF-mediated signaling events from nutrition-related events.
We probed all wells with a combination of rabbit and mouse antibodies to observe
temporal dynamics in phosphorylation and control for variation in loading (Fig. 2-4
and see Supp. Table 2 in ref. [37]). The coefficient of variation from arraying, rehydration, transfer, binding of primary antibody and secondary antibody was <17%.
We quantified 91 phosphosites from 67 proteins and 18 pan-specific protein abundances. We analyzed a total of 75 proteins in technical triplicate, resulting
in ~9,800 signaling observations. Sufficient lysates remained for many subsequent
analyses. We recorded integrated intensity, signal-to-background ratio and inferred
sizes from spots detected with each antibody (see Supp. Table 1 and Supp. Figs. 2,3 in
ref. [37]). Seventeen of 91 phosphosites that we quantified here had been previously
quantified in one recent mass spectrometry report using pan-phospho enrichment
[42] and 22 phosphosites had been quantified in another study [48] using phosphotyrosine-specific enrichment (see Supp. Table 3 in ref. [37]). Many ubiquitous EGFR
signaling proteins that we quantified by MWAs, including Tyr845 phosphorylation on
EGFR (p-EGFR(Tyr845)), p-SHP2(Tyr542), p-p70S6K(Ser371), p-Raf(c-)(Ser338),
p-p90RSK (Ser380) and p-Stat3(Ser727), had not been quantified in either mass spectrometry study [42, 48], suggesting that mass spectrometry detects only a fraction of
phosphorylation events elicited by EGF. Of the 91 phosphosites that we quantified
here, only four had been shown by others to give RPA measurements quantitatively
equivalent to those from western blots [40].
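The figure of roughly 9,800 observations is consistent with a simple tally of antibodies, conditions, time points and replicates. The sketch below shows one accounting that reproduces it; the text does not state exactly how the total was tallied, so the breakdown (in particular, counting the mock stimulation as a fifth condition and six time points per condition) is an assumption.

```python
phosphosite_antibodies = 91
pan_specific_antibodies = 18
conditions = 5        # 2, 50, 100 and 200 ng/mL EGF plus the mock stimulation
time_points = 6       # 0, 1, 5, 15, 30 and 60 min
replicates = 3        # technical triplicate

observations = ((phosphosite_antibodies + pan_specific_antibodies)
                * conditions * time_points * replicates)
print(observations)  # 9810
```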
2.2.5
Comparison of signaling network at different EGF input levels
We next asked whether biological insights could be revealed using the MWA method.
We organized five clusters of signaling profiles based on the time after stimulation
[Figure 2-5 image. Heatmap of fold change for each antibody band; columns are grouped by EGF concentration, each with time points 0, 1, 5, 15, 30 and 60 min; color scale, fold change from 0 to 10.]
Figure 2-5: A clustered heatmap profile of fold changes for antibody bands representing
specific phosphorylation sites of proteins in A431 cells over the indicated six time points
for four EGF stimulation concentrations and the no-EGF control. The net fold change
is color-coded as indicated in the legend. Antibody bands were grouped into six clusters
according to the time point at which maximal fold change occurred. The antibodies are
in descending order, sorted in each cluster by the value of the fold change at the 200
ng/mL EGF stimulation condition at the time point representative of that particular cluster.
Antibody names are listed on the right with an approximation of band size.
at which maximal phosphorylation occurred (Fig. 2-5). Phosphosites within clusters
were rank-ordered by fold-change. At the 2 ng/mL EGF input level, we observed
several phosphosites from EGFR, ErbB2, PLC-γ, Gab1, Mek, p90RSK, p70S6K and
Crkl that were absent upon mock treatment (Fig. 2-5 and see Supp. Figs. 4, 5 in
ref. [37]).
Conversely, many phosphosites related to phosphoinositide signaling displayed
substantial fold change in mock stimulation but not EGF treatment, including sites
from PDK1 and its downstream targets AKT, PKCγ and PKCδ; downstream targets
of AKT including mTOR and FOXO1; and mTOR substrate p70S6K and its downstream target S6 ribosomal protein. We speculate that activation of PLC-γ after EGF
stimulation led to hydrolysis of phosphatidylinositol 4,5-bisphosphate, causing downregulation of PDK1 and AKT. Reduced AKT activity could produce the observed
A431 cell-cycle inhibition [49] through decreased phosphorylation (and therefore increased inhibitory activity) of cyclin-dependent kinase (CDK) inhibitors, including
CDKN1A(Thr145) and CDKN1B(Thr157). Consistent with this notion, insulin-like
growth factor (IGF), which stimulates PI3K and AKT, is also a potent mitogen for
A431 cells [47].
We then asked how the dynamic range and timing of the EGF signaling network
were influenced by EGF input amount. The first 'wave' of phosphorylation peaking at
1 min after EGF input included 33 tyrosines from EGFR and other receptor tyrosine
kinases (RTKs) and membrane-localized proteins (Fig. 2-5, and see Supp. Figs. 3, 4,
and 5 and Supp. Table 1 in ref. [37]). At 5 min after EGF input, we observed serine
and threonine sites from downstream kinases and transcription factors including Raf,
MEK, p70S6 kinase, mTOR, and ATF2. At 15 min after EGF input, we observed
phosphosites from Erk, p38 MAPK, and cell cycle-related kinases and substrates.
Sites with phosphorylation peaking at 30 min included those of the Crkl adaptor
protein and MAPKAPK2, a substrate of p38 MAPK. Proteins with sites peaking
at 60 min included the PDK1 substrates AKT and PKC3, and the AKT substrate
4EBP1, among others. The timing of most phosphorylation events was not affected
by EGF concentrations.
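The grouping of phosphosites into temporal waves can be sketched as follows: each site is assigned to the time point at which its fold change is maximal, and sites within a group are then ranked by that maximal fold change. This is a minimal illustration rather than the code used to produce Fig. 2-5; the function name and the fold-change values are hypothetical.

```python
from collections import defaultdict

TIME_POINTS_MIN = [0, 1, 5, 15, 30, 60]

def cluster_by_peak_time(fold_changes):
    """Group sites by the time point of maximal fold change, then rank each
    group by that maximal value (largest first).

    fold_changes maps a site name to a list of fold changes, one per entry
    of TIME_POINTS_MIN.
    """
    clusters = defaultdict(list)
    for site, series in fold_changes.items():
        peak_index = max(range(len(series)), key=lambda i: series[i])
        clusters[TIME_POINTS_MIN[peak_index]].append((site, series[peak_index]))
    return {
        t: [site for site, _ in sorted(members, key=lambda m: m[1], reverse=True)]
        for t, members in sorted(clusters.items())
    }

# Hypothetical fold-change profiles, for illustration only.
example = {
    "p-EGFR(Tyr845)": [1.0, 6.0, 4.0, 3.0, 2.0, 1.5],
    "p-EGFR(Tyr1068)": [1.0, 5.0, 8.0, 7.5, 7.0, 6.5],
    "p-Erk1/2": [1.0, 1.5, 3.0, 4.0, 2.0, 1.2],
    "p-SHP2(Tyr542)": [1.0, 9.0, 8.0, 7.0, 6.0, 5.0],
}
clusters = cluster_by_peak_time(example)
```

In this toy example the 1 min cluster contains p-SHP2(Tyr542) followed by p-EGFR(Tyr845), the 5 min cluster contains p-EGFR(Tyr1068), and the 15 min cluster contains p-Erk1/2.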
[Figure 2-6 image. (a) Consensus network graph of the modeled phosphosites; (b) adjacency-matrix heatmaps comparing the Bayesian network with ARACNe and CLR; (c) Venn diagram of edges in the three networks.]
Figure 2-6: Consensus model of EGF receptor level influences modeled by Bayesian network
inference with comparison to ARACNe and CLR. (a) A consensus model of the EGF signaling network obtained by exact Bayesian model averaging following Bayesian network inference. Significant (p < 0.001) positive edges (green), significant (p < 0.01) negative edges
(red, blunt edges), and interactions with a nonsignificant correlation coefficient (black) are
shown. Edges for which the directionality could not be determined using equivalence class
analysis are shown as undirected. (b) Heatmaps show the undirected adjacency matrices
comparing the Bayesian network to the ARACNe and CLR networks. An edge between
node i and node j is represented by matrix value (i, j). Because the undirected networks
are compared, the adjacency matrix is symmetric across the diagonal, and thus only the
lower triangular matrix of the adjacency matrix is shown. Edge weight thresholds were
set to >0.3 for the Bayesian network and ARACNe (using ARACNe data processing inequality parameter τ = 0.03) and to Z >1.13 for CLR. Eight of 11 edges present only in
the Bayesian network and not in the ARACNe network would induce three-node triplets
in the ARACNe network, which is precisely what ARACNe is designed to prune out. (c)
Venn diagram comparing edges across the three networks. The ARACNe network forms a
complete subnetwork of the CLR network and a near complete subnetwork of the Bayesian
network, which forms a near complete subnetwork of the CLR network.
2.2.6
Bayesian network modeling of receptor layer connectivity
To elucidate the directional influences among phosphosites, we applied Bayesian network modeling approaches to phosphosites from proteins representing cell membrane-level influences of the EGF signaling network. This permitted us to verify known
influences and identify new directional relationships underlying receptor-level crosstalk. Bayesian networks are graphical representations of conditional independencies
in a probability distribution over a set of variables [29] and can potentially be inferred from experimental data such as those generated by MWAs. The network we
analyzed comprised 17 phosphosites: two from the Src kinase and 15 from the ten
RTKs for which we specifically observed fold-change measurements with all four EGF
treatments and for which the basic local alignment search tool (BLAST) predicted
little similarity with the 57 other human-genome-encoded RTKs and thus indicated
a relatively low probability of antibody cross-reactivity (Fig. 2-6 and see Supp. Table 4 in ref. [37]). We considered each time point as an independent sample of the
EGF-stimulated network state, giving 20 samples for each phosphosite (4 conditions
across 5 nonzero time points of one biological replicate), and we normalized all data
to the zero time point.
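The construction of the 20 samples per phosphosite can be sketched as below. This is an illustration under stated assumptions, not the processing code used here: we assume that each EGF dose is normalized to its own zero-time signal, and the function name and raw values are hypothetical.

```python
EGF_DOSES_NG_PER_ML = [2, 50, 100, 200]
TIME_POINTS_MIN = [0, 1, 5, 15, 30, 60]

def to_samples(intensities):
    """Turn one phosphosite's time courses into fold-change samples.

    intensities maps an EGF dose to a list of raw signals ordered as
    TIME_POINTS_MIN. Each nonzero time point is divided by that dose's
    zero-time signal and treated as one independent sample.
    """
    samples = []
    for dose in EGF_DOSES_NG_PER_ML:
        series = intensities[dose]
        baseline = series[0]
        samples.extend(value / baseline for value in series[1:])
    return samples

# Hypothetical raw signals for one phosphosite (arbitrary units).
raw = {
    2:   [100.0, 120.0, 130.0, 110.0, 105.0, 100.0],
    50:  [100.0, 300.0, 280.0, 220.0, 180.0, 150.0],
    100: [90.0, 450.0, 400.0, 300.0, 250.0, 180.0],
    200: [110.0, 660.0, 600.0, 500.0, 400.0, 330.0],
}
samples = to_samples(raw)
print(len(samples))  # 20
```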
Only 17 phosphosites were considered for Bayesian network analysis, even though
91 phosphosites were measured in total, because the inference algorithm we used [50]
performed exact Bayesian model averaging, and thus could only model networks of
about 20 or fewer nodes due to computational limitations [51]. We chose all phosphosites measured on receptor tyrosine kinases (RTKs) in the data set, and the two
sites on Src kinase, because of reports in the literature of RTK coactivation in cancer
[52], and the role of Src family kinases in RTK signaling [53]. We hypothesized that
by inferring a signaling network among the RTK and Src sites, we could gain insights
into putative receptor-level signaling influences downstream of EGFR activation.
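The limit of about 20 nodes can be made concrete with two counts. The sketch below assumes, as is typical for exact dynamic-programming approaches of this kind, that the computation is organized over subsets of nodes, so that its size grows as 2^n; the function names are illustrative and the counts are not taken from refs. [50, 51].

```python
from math import comb

def candidate_parent_sets(n_nodes, max_parents):
    """Number of parent sets available to one node when it may have at most
    max_parents parents chosen from the other n_nodes - 1 nodes."""
    return sum(comb(n_nodes - 1, k) for k in range(max_parents + 1))

def node_subsets(n_nodes):
    """Number of node subsets, which sets the table size for a dynamic
    program over subsets."""
    return 2 ** n_nodes

print(candidate_parent_sets(17, 3))   # 697
print(node_subsets(17))               # 131072
print(node_subsets(20))               # 1048576
print(node_subsets(91) > 10 ** 27)    # True
```

For 17 nodes the subset table is small, whereas for all 91 phosphosites it would exceed 10^27 entries, which illustrates why only a subset of sites could be modeled exactly.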
Given typically limited amounts of data, a variety of graph structures can be
generated by Bayesian inference modeling that describe the data reasonably well, so
a consensus model is often sought rather than aiming to find a unique best-scoring
graph [29].
Accordingly, we created a consensus model (Fig. 2-6) containing only
edges with a score >0.3, derived from exact Bayesian network model averaging over
all directed acyclic graph structures having at most three parents per node [50, 51].
By considering only those directed acyclic graph (DAG) structures in the equivalence class of the consensus model with a directed edge from p-SRC(Tyr416) to p-EGFR(Tyr845), we determined directionality of the remaining compelled edges [54]
(see Methods). Signs of directional influences (positive versus negative) could also
be discerned. EGFR(Tyr845) is a known Src kinase substrate that is not phosphorylated by the EGFR kinase [55]. We used this prior knowledge only to distinguish
edge directionality in the equivalence class; we used no prior structural knowledge to
derive the consensus model.
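The idea of model averaging over graph structures can be illustrated on a toy problem small enough to enumerate exhaustively. The sketch below is not the algorithm of refs. [50, 51], which avoids enumeration through dynamic programming; it simply lists every DAG on three nodes and computes, for each possible edge, the weighted fraction of graphs that contain it. The node labels, function names and the uniform weight are assumptions for illustration; in practice the weight of each graph would be its posterior score given the data.

```python
from itertools import combinations, permutations

def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly strip nodes that have no incoming edge."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        sources = {n for n in remaining if not any(v == n for _, v in edges)}
        if not sources:
            return False
        remaining -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True

def all_dags(nodes):
    """Enumerate every DAG on the given labeled nodes as a frozenset of edges."""
    possible = list(permutations(nodes, 2))
    for k in range(len(possible) + 1):
        for edges in combinations(possible, k):
            if is_acyclic(nodes, edges):
                yield frozenset(edges)

def edge_posteriors(nodes, weight):
    """Model-averaged edge probabilities: the normalized sum of weight(G)
    over all DAGs G that contain the edge."""
    dags = list(all_dags(nodes))
    total = sum(weight(g) for g in dags)
    return {
        edge: sum(weight(g) for g in dags if edge in g) / total
        for edge in permutations(nodes, 2)
    }

nodes = ("Src", "EGFR", "PDGFRB")
n_dags = len(list(all_dags(nodes)))
uniform = edge_posteriors(nodes, lambda g: 1.0)   # hypothetical flat weighting
print(n_dags, uniform[("Src", "EGFR")])           # 25 0.32
```

With a flat weighting every directed edge appears in 8 of the 25 DAGs, so no edge would pass a 0.3 threshold by much; informative data shift weight toward the graphs that explain them, and the edges shared by those graphs acquire high scores.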
The three linked root nodes from which we derived most downstream influences in
the graph structure included p-SRC(Tyr416), p-EGFR(Tyr845) and p-PDGFRB(Tyr1009).
The model suggests that the EGFR and PDGFRA,B influence one another, with p-EGFR(Tyr1068), p-EGFR(Tyr1173), p-ERBB2(Tyr1221,Tyr1222) and p-KIT(C)(Tyr719)
depicted directly downstream of both p-PDGFRB(Tyr1009) and p-EGFR(Tyr845).
Notably, PDGFRB has previously been described to heterodimerize and transactivate
the EGFR [56] in response to PDGF, even in the presence of a PDGFR inhibitor.
Whereas others have previously suggested A431 cells lack PDGFR expression, we
observed bands at the predicted molecular weights using several phospho- and pan-specific antibodies directed at the intracellular region of the receptor (see Supp. Figs. 1
and 6 in ref. [37]).
Notably, the model depicted the phosphosite representing the activation loop tyrosine of either PDGFRA(Tyr849) or PDGFRB(Tyr857) (which, owing to homology, we could not distinguish by the antibody in our assay and hereafter designated
as p-PDGFRA(Tyr849),PDGFRB(Tyr857)) to lie downstream of p-MET(Tyr1349),
a root node, and p-EGFR(Tyr1173), which was downstream of the root nodes p-EGFR(Tyr845) and p-PDGFRB(Tyr1009).
p-EGFR(Tyr1173) first displayed robust phosphorylation upon addition of 100 ng/mL EGF, the same concentration at
which the activation loop of PDGFRA,B first displayed phosphorylation; at low EGF
amounts, Src kinase may mediate the phosphorylation of some PDGFR sites other
than Tyr849,Tyr857, but at higher EGF amounts, the PDGFR kinase itself becomes
activated through a mechanism involving or concurrent with the phosphorylation of
p-EGFR(Tyr1173).
p-EGFR(Tyr1068), modeled to be upstream of p-EGFR(Tyr1086), p-ERBB4(Tyr1284)
and both p-FGFR1(Tyr653,Tyr654) activation loop isoforms, was distinct among
EGFR sites in displaying maximal phosphorylation at 5 min and sustained phosphorylation amplitude for the duration of the time course. The edge directed from
p-EGFR(Tyr1068) to p-FGFR1(Tyr653,Tyr654) (145 kDa) displayed a relatively high
edge score (0.80; see Fig. 2-7 for all consensus Bayesian network edge weights), similar to that between p-EGFR(Tyr1068) and p-EGFR(Tyr1086) (edge score of 0.89),
suggesting that EGFR can mediate FGFR1 activation. We speculate that the 145
kDa and 100 kDa forms of FGFR1 represent hyper- and hypo-glycosylated forms of
the receptor, respectively. Hyperglycosylation of FGFR1 has been shown to inhibit
its interaction with both FGF2 and heparin-derived oligosaccharides [57], which has
been predicted to decrease its activity. Our model depicted only the 100 kDa form
phosphosite to have downstream targets among the 17 phosphosites modeled. The
only site negatively regulated in the model was p-PDGFRA(Tyr754), which recruits
the SHP2 phosphatase [58] resulting in dephosphorylation of RASGAP recruitment
sites on PDGFRA and B and increased MAPK signaling. Therefore down-regulation
of p-PDGFRA(Tyr754) would be predicted to decrease MAPK signaling. Consistent with previous reports [59], our model suggested that p-SRC(Tyr527), a known
inhibitory site of Src kinase, is disconnected from the EGF network.
Figure 2-7: Bayesian network consensus model edge weights. A graphical depiction of the
consensus Bayesian network model showing all edges with an exact marginal posterior probability >0.3. EGFR phosphorylation sites are shown in blue for visual clarity. The model is
a consensus of all Bayesian networks allowing a maximum of three parents per node, where
the contribution of each Bayesian network to the consensus Bayesian network is weighted by
the BDeu score of that Bayesian network (the BDeu score is simply a method for calculating
the posterior probability of the Bayesian network model given the data). Thus, for a given
edge $G_{ij}$, we can compute the probability that the edge is present given the data, $D$, by summing
over all possible Bayesian networks, $p(G_{ij} = 1 \mid D) = \sum_{G} p(G \mid D)\, f(G_{ij})$, where $f(G_{ij})$ is
one if there is an edge from node $i$ to node $j$ in network $G$ and zero otherwise. If the resultant probability (edge
weight) is close to one, that edge is found in nearly all high-scoring networks, whereas if the
edge weight is low, that edge is found in few high-scoring networks. In the case where edges
are shown in two directions (between EGFR(Tyr845) and PDGFRβ(Tyr1009), and between
EGFR(Tyr1068) and FGFR1(Tyr653,Tyr654) (100 kDa)), this simply indicates that
edges in both directions exceeded the 0.3 threshold; it does not indicate a cyclic interaction.
Using this consensus network as a starting point, equivalence class analysis was performed.
Thus, though the consensus network shown here has a directed edge from EGFR(Tyr845)
to Src(Tyr416), there exist in the equivalence class of this consensus model Bayesian networks
with a directed edge from Src(Tyr416) to EGFR(Tyr845). Because EGFR(Tyr845)
is a known Src substrate, the edge from Src to EGFR was chosen and used to restrict the
directionality of all other edges in the model. Edges that could not be restricted were shown
as undirected in Figure 2-6.
[Figure 2-8 image. Panel titles: (top left) BN edge weight >0.3, ARACNe (τ = 0.03) edge weight >0.3; (top right) BN edge weight >0.185, ARACNe (τ = 0.06) edge weight >0.3; (bottom left) BN edge weight >0.3, CLR edge weight (Z) >1.13; (bottom right) BN edge weight >0.18, CLR edge weight (Z) >1.13.]
Figure 2-8: Graphical comparisons of the Bayesian, ARACNe, and CLR networks. Graphical
representations of the optimal comparisons between Bayesian/ARACNe (top) and Bayesian/CLR
(bottom) networks for both the restriction that the Bayesian network edge threshold be >0.3 (left
column) and placing no restriction on the Bayesian network edge threshold (right column), but
requiring it to be at most >0.4 so as to be a significant network result (see Fig. 2-10). Data
permutation studies also indicated that the ARACNe threshold had to be at least >0.26 and the
CLR threshold had to be at most >1.15. Given these threshold limitations, the optimal comparisons
between Bayesian/ARACNe and Bayesian/CLR networks were determined. Green edges are shared
between the Bayesian network and ARACNe (or CLR); blue edges are only in the Bayesian network;
and orange edges are only in the ARACNe (or CLR) network. The ARACNe and CLR thresholds
were >0.3 and >1.13 in both cases (restricting the Bayesian network threshold to >0.3 and not),
but that was because those thresholds gave the optimal comparison results, and was not enforced a
priori. Note that, at the >0.3 threshold level for the Bayesian network (left column), the two edges
found only in the Bayesian network, and in neither the ARACNe nor the CLR network, participate in three-parent interactions. This is logical, considering that ARACNe and CLR only consider undirected
pairwise interactions. Similar results are seen for the lower Bayesian network threshold level (right
column), where many of the edges found only in the Bayesian network participate in higher-order
parent-child interactions. Additionally, it should be noted that 8 of 11 edges present only in the
Bayesian network and not in the ARACNe network would induce three-node triplets in the ARACNe
network, which is precisely what ARACNe prunes out using the Data Processing Inequality. Graph
diagrams were generated using Graphviz (http://www.graphviz.org).
[Figure 2-9 image. Two adjacency-matrix heatmaps titled "Comparing Bayesian and ARACNe Networks" and "Comparing Bayesian and CLR Networks".]
Figure 2-9: Comparing inference algorithms when removing the restriction that the Bayesian
network edge weight be >0.3. Heatmaps show the undirected adjacency matrices comparing
edges in the Bayesian network to edges in the ARACNe and CLR networks, when the edge
weight thresholds for all three algorithms were allowed to take any value, as long as that
value gave a significant network result above the 99% confidence bound. An edge between
node i and node j is represented by matrix value (i, j). Because the undirected networks
are compared, the adjacency matrix is symmetric across the diagonal. The undirected form
of the Bayesian network is shown to simplify the comparison to the undirected edges in
ARACNe and CLR. A search over Bayesian, CLR, and ARACNe edge weight thresholds,
along with the ARACNe Data Processing Inequality tolerance parameter τ, found the optimal comparison between the networks. When comparing to ARACNe, the Bayesian network
edge weight threshold of >0.185 was used and the ARACNe result used the Data Processing
Inequality parameter τ = 0.06 and edge weight threshold >0.3. When comparing to CLR,
the Bayesian network edge weight threshold of >0.18 was used, and the CLR result used
edge weight (Z) threshold >1.13. Just as was the case when limiting the Bayesian network
edge weight threshold to >0.3, it should be noted that again 8 of 11 edges present only in the
Bayesian network and not in the ARACNe network would induce three-node triplets in the
ARACNe network, which is precisely what ARACNe prunes out using the Data Processing
Inequality. This matrix representation is analogous to the graphically displayed networks
shown in the right column of Fig. 2-8. At these threshold settings, the ARACNe network
is a complete subnetwork of the Bayesian network and one edge short of being a complete
subnetwork of the CLR network.
[Figure 2-10 image. Plots of the number of edges as a function of edge weight cutoff for the Bayesian network, ARACNe and CLR, with bounds obtained from permuted data sets. The opening lines of the caption are illegible in the source.]
Figure 2-10: [...] the 500 permuted data sets were analyzed. Confidence bounds for the 90% (green curve),
95% (blue curve), and 99% (red curve) percentile levels were estimated as described (see
Methods). The actual number of edges from each non-permuted data set as a function of
edge weight cutoff is also shown (bold black curve). Edge weight cutoff values that gave
significant and non-significant network results are shaded in green and red, respectively.
To corroborate the Bayesian network results, we also inferred network connectivities using the 'algorithm for the reconstruction of accurate cellular networks'
(ARACNe) and 'context likelihood of relatedness' (CLR) algorithm [28, 27]. ARACNe
and/or CLR also identified 22 of 24 edges in the Bayesian network, though as undirected edges because these latter methods are based on mutual information
(Figs. 2-6b,c, 2-8, 2-9). Experimental evidence suggests that consistency across network inference methods improves edge prediction accuracy [60, 61], and in our case
here, data permutation studies showed that the topologies inferred by the Bayesian
network, ARACNe and CLR were significant (P < 0.01) (see Methods and Fig. 2-10). Because in the context of proteomic signaling networks it is problematic to make
broad assumptions about edge directionality absent extensive prior knowledge (for
example, concerning particular kinase-substrate relationships), we believe that predicting edge directionality using methods such as Bayesian network modeling offers
an appealing advantage.
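The logic of a data permutation test for network significance can be sketched in a few lines. This is a simplified stand-in rather than the procedure used in this work: the edge statistic here is the absolute Pearson correlation (a placeholder for Bayesian, ARACNe or CLR edge weights), each variable is assumed to be shuffled independently across samples, and all names and data are hypothetical. A network at a given cutoff is called significant when it contains more edges than the chosen percentile of edge counts from permuted data.

```python
import random
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def count_edges(columns, cutoff):
    """Number of variable pairs whose |correlation| exceeds the cutoff."""
    return sum(abs(pearson(a, b)) > cutoff for a, b in combinations(columns, 2))

def permutation_bound(columns, cutoff, n_permutations=500, level=0.99, seed=0):
    """Edge count at the given percentile across permuted data sets."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_permutations):
        # Shuffle each variable independently to break any real dependence.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        counts.append(count_edges(shuffled, cutoff))
    counts.sort()
    index = min(len(counts) - 1, int(level * len(counts)))
    return counts[index]

# Toy data: 4 variables, 20 samples; the first two are perfectly related.
columns = [
    [float(i) for i in range(20)],
    [2.0 * i + 1.0 for i in range(20)],
    [float((7 * i) % 20) for i in range(20)],
    [float((11 * i + 3) % 20) for i in range(20)],
]
observed = count_edges(columns, cutoff=0.8)
bound = permutation_bound(columns, cutoff=0.8)
significant = observed > bound
```

Repeating the comparison over a range of cutoffs gives curves of observed and permuted edge counts against cutoff, which is the kind of comparison summarized in Fig. 2-10.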
2.3
Discussion
In contrast to RPAs, MWAs can reduce the complexity of lysates after arraying,
minimizing effort in experimental scale-up. Most of the information of a traditional
western blot can be obtained, using 200-fold less protein and antibody. MWAs should
be useful for analysis of proteins from cell lines and tissues from which there are
sufficient lysates to print hundreds of MWAs that could be distributed en masse in an
analogous manner to spotted DNA microarrays for interrogation with the user's choice
of antibodies. The only devices required after printing are commercially available 96-well gaskets and an imager. The ability to obtain information regarding hundreds of
proteins with the MWA method should allow advances in our understanding of cell
context-specific networks underlying human disease when combined with appropriate
computational modeling methods.
MWAs could also be very useful for large-scale, systematic validation of antibodies.
Antibody collections could be systematically verified for selectivity by examining
lysates from cells transfected with a cDNA or depleted for the cognate protein by
RNAi. The amount of antibody obtained from a single rabbit immunization (~5 mg)
would be sufficient for over 100,000 MWAs, thus minimizing lot-to-lot variability of
polyclonal antibodies. MWAs could be useful for current efforts to build a human
protein atlas; samples from tissues used for in situ analyses could be examined with
MWAs to verify that signals observed with each antibody resulted from proteins of
the predicted molecular weight(s).
The ability to gather dynamic information regarding hundreds of proteins under
many conditions poses new challenges for computational modeling. The Bayesian network described here represents direct and/or indirect effects of a given node on other
nodes as indicated by high-probability connecting arcs, which are hypothesized to
represent relationships of influence among the phosphoproteins in the network. Using
prior knowledge to restrict edge directionality across a Bayesian network equivalence
class, one can bolster the case for assigning directionality to these edges. To further
support a case for interpreting network connections as causal, one could explicitly
model the temporal data [62] and/or use interventional data [29, 51], which will be
the subject of future inquiry.
The timing and amplitude of phosphorylation dynamics observed here coupled
with the connectivities modeled in the Bayesian network suggest several candidate
sources of RTK coactivation, each of which may be important in specific cancer contexts: (i) direct dimerization and/or phosphorylation by EGFR or other downstream
tyrosine kinases as suggested by the rapid phosphorylation kinetics of Src, ErbB2
and ErbB4, coupled with their close proximity at the top of the network; (ii) activation of proteases that activate precursor growth factors or latent RTKs as might
be predicted from the delayed phosphorylation amplitudes of FGFR1 (100 kDa) and
MET activation loop sites coupled with their distance from EGFR in the network;
and (iii) inactivation of tyrosine phosphatases through oxidation by reactive oxygen
species [63].
Phosphorylation of Tyr542 of Shp2 phosphatase displayed the highest
fold change of any site in our analysis; this site has been suggested to relieve
inhibition of phosphatase activity [64]. The sustained phosphorylation of this and
other tyrosine sites at EGF concentrations >50 ng/mL suggests that it (and other
cysteine-based tyrosine phosphatases) may be inactivated at such concentrations,
thus unmasking many tyrosine kinase activities. Each of these mechanisms may have
distinct roles in the context of cancers that have become resistant to single kinase inhibitors; systems-level analysis of other tyrosine kinase-driven cancers may be helpful
in revealing appropriate therapeutic targets.
2.4
Methods
For a complete description of the experimental methods see Ciaccio et al. [37].
2.4.1 Signaling network inference modeling
Bayesian networks were modeled using a dynamic programming algorithm that computes the exact marginal posterior probability of edges in the Bayesian network derived from the dataset [50]. The algorithm was implemented using a modified version of the open-source Bayesian Network Structure Learning toolbox in Matlab [51].
Node conditional probability distributions were represented by multinomials using
a uniform Dirichlet prior with an equivalent sample size of one, and a prior over graph
structures was calculated by accounting for the number of ways to choose parent sets
in a graph, as previously described [50, 65]. Networks were scored using the Bayesian
Dirichlet likelihood equivalent uniform (BDeu) score [66]. The BDeu score accounts
for both model fit and complexity and thus guards against overfitting the data. Although this
dynamic programming algorithm introduces a non-uniform prior over graph structures, it has been shown to perform better at structure learning tasks [50, 51] than
local search methods that use a uniform prior over graph structures, such as Markov
chain Monte Carlo searches over directed acyclic graphs [67], as well as Markov chain
Monte Carlo over node orderings, which uses a non-uniform prior [65].
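To make the scoring step concrete, the following is a minimal sketch of the local BDeu score for one node given a candidate parent set. It is not the toolbox implementation used in this work; the function name `bdeu_local_score` and its arguments are hypothetical, and it assumes every node has the same number of discrete levels (three) and an equivalent sample size of one.

```python
from collections import Counter
from math import lgamma


def bdeu_local_score(child, parents, n_levels=3, ess=1.0):
    """Log BDeu score of one node given its parent columns (illustrative sketch).

    child   : list of discrete states (0 .. n_levels-1), one per condition
    parents : list of parent columns, each the same length as `child`
    ess     : equivalent sample size of the uniform Dirichlet prior
    """
    q = n_levels ** len(parents)        # number of parent configurations
    a_ij = ess / q                      # prior count per parent configuration
    a_ijk = ess / (q * n_levels)        # prior count per (configuration, child state)

    # One tuple of parent states per condition; the empty tuple if no parents.
    configs = list(zip(*parents)) if parents else [()] * len(child)
    n_ij = Counter(configs)
    n_ijk = Counter(zip(configs, child))

    # Unobserved configurations contribute zero, so only observed counts are summed.
    score = 0.0
    for n in n_ij.values():
        score += lgamma(a_ij) - lgamma(a_ij + n)
    for n in n_ijk.values():
        score += lgamma(a_ijk + n) - lgamma(a_ijk)
    return score
```

Because the prior counts shrink as the number of parent configurations grows, adding a parent only raises the score when the data support it, which is the fit-versus-complexity trade-off described above.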
All nodes were discretized using three-level k-means clustering to indicate low,
medium and high phosphoprotein levels (see Supp. Table 5 in [37]). Clustering was
done using the squared Euclidean distance metric and repeated 50 times for each
node to find the optimal clustering assignments. It is believed that by using k-means
clustering, we are better representing the physiological diversity in signaling states
of the phosphoproteins in the network, compared to more arbitrary discretization
schemes, like interval and quantile discretization, that do not try to explicitly capture
clusters in the data.
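As an illustration of the discretization step, here is a minimal sketch of one-dimensional three-level k-means with repeated random initializations, keeping the solution with the lowest within-cluster squared Euclidean distance. It is not the Matlab routine used in this work; the function name, the iteration cap, and the fixed random seed are assumptions made for the example.

```python
import random


def kmeans_discretize(values, k=3, restarts=50, iters=100, seed=0):
    """Return a level (0 = low, ..., k-1 = high) for each value (illustrative sketch)."""
    rng = random.Random(seed)

    def assign(centers):
        # Nearest center under squared Euclidean distance.
        return [min(range(k), key=lambda c: (v - centers[c]) ** 2) for v in values]

    best_labels, best_cost = None, float("inf")
    for _ in range(restarts):
        centers = rng.sample(values, k)          # random initial centers
        for _ in range(iters):
            labels = assign(centers)
            new_centers = []
            for c in range(k):
                members = [v for v, lab in zip(values, labels) if lab == c]
                # Keep the old center if a cluster becomes empty.
                new_centers.append(sum(members) / len(members) if members else centers[c])
            if new_centers == centers:
                break
            centers = new_centers
        labels = assign(centers)
        cost = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
        if cost < best_cost:
            # Relabel clusters so that levels are ordered by cluster center.
            order = sorted(range(k), key=lambda c: centers[c])
            rank = {c: r for r, c in enumerate(order)}
            best_labels, best_cost = [rank[lab] for lab in labels], cost
    return best_labels
```

Unlike interval or quantile binning, the level boundaries here are placed wherever the data form groups, so a node with two tightly packed states and one outlying state is not forced into equal-width or equal-count bins.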
CLR was implemented using Matlab code provided by the original authors, with
Z scores (edge weights) calculated as previously described [27]. ARACNe was implemented using the minet package in R [68]. To minimize the sources of variation
between algorithms, the same discretized data that were used to learn the Bayesian
network model were also used to learn the CLR and ARACNe models. The mutual
information matrix for ARACNe was calculated using a simple histogram method in
the minet package, and for CLR was calculated directly from the discretized data.
The edge score thresholds for CLR and ARACNe were varied in an effort to maximize the similarity between the Bayesian network and ARACNe (or CLR), both given
the >0.3 edge weight threshold for the Bayesian network (Fig. 2-6) and when this
constraint on the Bayesian network edge weight threshold was removed, though in
both cases staying within edge weight thresholds that gave significant network results
(Figs. 2-8, 2-9, 2-10).
The sign of the influences between nodes in the Bayesian network was estimated
using pairwise correlation coefficients. Seventeen of 24 pairwise interactions had a
highly significant (p < 0.001) positive correlation coefficient. Two of 24 had a significant (p < 0.05) negative correlation coefficient. The remaining 5 of 24 pairwise
interactions had a nonsignificant (p > 0.05) correlation coefficient but were edges in
two- or three-parent interactions, suggesting a simple pairwise correlation coefficient
was not sufficient to capture the parent-child behavior. Notably, both negative interactions were directed at p-PDGFRα(Tyr754). Five of the six two-parent interactions
(including all four with p-EGFR(Tyr845) and p-PDGFRβ(Tyr1009) as the parent
set) were consistent with "and gate" behavior. The parent-child raw data from all
one-, two- and three-parent interactions are plotted versus one another in Fig. 2-11.
Considering up to three parents per node in the Bayesian network captured almost all higher-order interactions in the dataset (Fig. 2-12).
Although additional
higher-order interactions may be present without there being enough data for
the Bayesian network to infer them, it may also be that such higher-order interactions are indeed not present, regardless of how much data are available describing the
network. ARACNe and CLR, which only consider undirected pairwise interactions,
thus represent useful, but likely not complete, approximations of interactions in this
dataset.
[Figure 2-11 panels: All pairwise interactions; Two-parent interactions; Three-parent interactions]
Figure 2-11: Estimating parent-child input-output logic within the Bayesian network. Pairwise (top), two-parent
(middle), and three-parent (bottom) influences were estimated by plotting the raw data for each parent with the
raw data for its child node or children nodes. Pairwise correlation coefficients were used to estimate the sign of
interaction between nodes in the Bayesian network. Significant (p < 0.001) positive correlations are shown with
green circles; significant (p < 0.05) negative correlations are shown with red plus signs; and non-significant (p > 0.05)
correlations are shown with black triangles. Note that all five edges with non-significant pairwise correlation coefficients
participated in a two- or three-parent interaction, suggesting that a simple pairwise measure may be insufficient to
capture their parent-child relationship anyway. For pairwise interactions, the data for the parent are plotted along
the x-axis and the child along the y-axis, to represent the output of the child node as a function of the parent node
input. Nodes for which the directionality could not be discerned using equivalence class analysis are shown in the title
of each window plot with a "-" instead of "->". The correlation coefficient and the p-value for that coefficient are
also shown in the title of each window plot. For two- and three-parent interactions, the child node is plotted along
the z-axis. For three-parent interactions, because one is limited to plotting only two-parent interactions in three-dimensional space, the discretized levels of the third parent node are shown in blue ("low"), orange ("medium"),
and red ("high"). When determining two-parent interactions, it was assumed for plotting purposes only that edges
were directed from p-Src(Tyr416) to p-PDGFRβ(Tyr1009), from p-EGFR(Tyr845) to p-PDGFRβ(Tyr1009), from
p-EGFR(Tyr1068) to p-FGFR1(Tyr653,Tyr654) (100 kDa), and from p-FGFR1(Tyr653,Tyr654) (100 kDa) to p-IGF1Rβ(Tyr1135,Tyr1136), p-INSRβ(Tyr1150,Tyr1151).
[Figure 2-12 panels: BN edge weight matrix for a maximum number of parents of 1, 2, 3, 4, 5, 6, 7, and 8]
Figure 2-12: Parent constraint analysis for Bayesian network algorithm. Heatmaps of the
Bayesian network adjacency matrices are shown for maximum parent limits ranging from one parent per node to eight parents per node. The weight of an edge from node i to
node j is represented by matrix value (i, j). These Bayesian network edge weights represent
directed edges (not undirected), and thus the adjacency matrix is not symmetric across the
diagonal. All heatmaps are scaled to the same colorbar, shown on the right. These results
indicate that, while much of the joint probability distribution of the Bayesian network is attributable to strictly pairwise (one-parent) interactions, additional higher-order interactions
also contribute to the joint probability distribution.
2.4.2 Testing for model significance
Data permutation studies were performed to test the significance of the inferred network results. 500 permuted data sets were generated from the original discretized
data set by permuting the data for each node across conditions. In this way,
correlations between nodes should be removed, but the actual number and type of
discretized data for each node are the same. For each of the 500 data sets, Bayesian,
ARACNe, and CLR networks were generated. The same 500 permuted data sets
were used across the three methods. For each network resulting from each permuted
data set, the edge weight threshold was varied from 1 to 0 by 0.001 decrements, and
the number of edges appearing in the network at that threshold was counted. For
the CLR networks, because the edge weights correspond to Z scores, which are not
bounded by unity, the edge weight threshold was instead started at the maximum Z
score obtained across all 500 permuted data networks and then decreased to zero in
decrements of 0.001 times this maximum Z score.
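The permutation and threshold sweep described above can be sketched as follows. This is an illustration only: the function names and the dictionary-of-lists data layout are assumptions, and the inference step that turns a permuted data set into edge weights is left out.

```python
import random


def permute_nodes(data, rng):
    """Shuffle each node's values across conditions, independently per node.

    data : dict mapping node name -> list of discretized values
    The values of each node are preserved; only their pairing across nodes changes.
    """
    return {node: rng.sample(vals, len(vals)) for node, vals in data.items()}


def edge_counts(weights, step=0.001, max_weight=1.0):
    """Count edges whose weight exceeds each threshold, from max_weight down to 0.

    For Z scores, max_weight would be the largest Z score across all permuted networks.
    """
    n_steps = round(1 / step)
    counts = []
    for i in range(n_steps + 1):
        threshold = max_weight * (n_steps - i) / n_steps
        counts.append((threshold, sum(w > threshold for w in weights)))
    return counts
```

Calling `permute_nodes` 500 times with the same random generator and reusing the resulting data sets for all three inference methods mirrors the design above, in which the methods are compared on identical permutations.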
By counting the number of edges in each network as a function of edge weight
threshold, the fraction of the 500 networks containing at least N edges as a function
of edge weight threshold was determined. This fraction was used as an empirical
estimate of the likelihood of obtaining a network with at least N edges as a function
of edge weight threshold. If, for a particular number of edges N, it was never observed
that exactly 10%, 5%, or 1% of the networks contained N edges at a particular edge
weight threshold, then linear interpolation between the fractions that were observed
(e.g., 1.2% and 0.8%) was used to estimate the edge weight threshold that would
have given 1% of networks containing N edges at that edge weight threshold. In this
way, 90%, 95%, and 99% significance bounds were estimated from the permuted data
sets, for all three inference methods, describing how many edges one would expect
by chance (i.e., from permuted data) as a function of edge weight threshold. These
bounds can be interpreted to suggest that X% of all permuted data sets would have
at most as many edges as indicated by the confidence bound at a particular edge
weight threshold. Thus, one must use a threshold at which the Actual data curve
(black) is above the confidence bounds (Fig. 2-10).
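A minimal sketch of how such a bound could be computed from the permuted networks is given below. The function names are hypothetical, and the sketch assumes thresholds are listed from high to low so that the fraction of networks with at least N edges does not decrease along the list.

```python
def fraction_with_at_least(counts_per_network, n_edges):
    """Fraction of permuted networks with at least n_edges edges at each threshold.

    counts_per_network : list (one entry per permuted network) of lists of edge
                         counts, all evaluated on the same threshold grid
    """
    n_networks = len(counts_per_network)
    n_thresholds = len(counts_per_network[0])
    return [
        sum(counts[i] >= n_edges for counts in counts_per_network) / n_networks
        for i in range(n_thresholds)
    ]


def threshold_at_fraction(thresholds, fractions, target):
    """Threshold at which the fraction equals `target`, by linear interpolation.

    Returns None if the target fraction is never reached.
    """
    for i, frac in enumerate(fractions):
        if frac >= target:
            if frac == target or i == 0:
                return thresholds[i]
            # Interpolate between the two observed fractions that bracket the target.
            t0, t1 = thresholds[i - 1], thresholds[i]
            f0, f1 = fractions[i - 1], frac
            return t0 + (target - f0) * (t1 - t0) / (f1 - f0)
    return None
```

With `target=0.01`, the returned threshold is one point on the 99% bound for that N; repeating over N traces the whole curve.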
The data permutation studies indicate that, using the original 20-point data set,
to surpass the 99% confidence bound (i.e., p < 0.01) one needs to use a Bayesian
network edge weight threshold that is at most >0.4 (thus >~0.005 to >0.4 is above
noise level), an ARACNe edge weight threshold that is at least >0.26 (thus >0.26
and higher is above noise level), and a CLR Z score threshold that is at most >1.15
(thus >0 to >1.15 is above noise level).
To determine how many permuted data sets had to be analyzed to develop a
reasonable estimate of these significance bounds, moving window mean and standard
deviation values for these significance bounds were calculated when considering 20 to
500 permuted data sets, in increments of 20 data sets. For a particular significance
bound, the values described the mean and standard deviation of the edge weight
threshold expected to give a particular number of edges N. As expected, the standard
deviation values approached zero as more permuted data sets were analyzed. Using
cumulative values from the 481st to 500th permuted data sets (i.e., data set 1 to 481,
1 to 482, 1 to 483, etc. up to 1 to 500), the average standard deviation values for the
90%, 95%, and 99% significance bounds (in the case of the Bayesian network using
20 data points) were (0.2 ± 0.6) × 10⁻³, (0.4 ± 1.2) × 10⁻³, and (0.6 ± 1.6) × 10⁻³,
respectively, indicating that using 500 permuted data sets is sufficient to generate a
useful estimate for the significance bounds.
As a comparison, using cumulative values from the 1st to 20th permuted data
sets, the average standard deviation values for the 90%, 95%, and 99% significance
bounds were (16 ± 19) × 10⁻³, (18 ± 22) × 10⁻³, and (18 ± 22) × 10⁻³, respectively,
values that are 30- to 80-times larger than the cumulative values from the 481st to 500th
permuted data sets. The standard deviation values calculated using the 481st to 500th
permuted data sets are shown as horizontal error bars in Fig. 2-10, but are
often so small they just look like vertical tick marks on the confidence bound curves.
To consider a particular network inference result significant, the curve generated
from counting the number of edges in the network derived from the original (non-permuted) data should be above the confidence bounds (i.e., the inferred network
should have more edges than one would expect by chance at a particular edge weight
threshold). For all three algorithms (Bayesian, ARACNe, and CLR networks), the
number of edges in the inferred network using the original 20-point data set is above
the significance curve at the edge weight thresholds used to generate Fig. 2-6 (Fig. 2-10, top row).
To ensure that a network derived from random data would indeed never exceed
the significance bounds, a random data set was generated by drawing 20 data points
of three-level discretized data (i.e., sampling 1's, 2's, and 3's) uniformly at random
for each node. The three inference algorithms were applied to this random data set,
along with the 500 data set permutation analyses as described above. Using this
random data, we see that the network derived from the original (though random)
data never surpasses the significance bounds (Fig. 2-10, middle row) for any of the
three inference methods. This shows that, if the data were indeed random, the three
algorithms would generate networks that did not exceed the confidence bounds at
any edge weight threshold, and were thus not significant.
To test the effect of data set size on the significance of the network inference
results, a pseudo data set was generated by appending the original data set of 20
data points per node onto itself to give one data set of 40 data points per node. In
this way, the data contained the same "signals" as the original data set (i.e., all of
the observation counts in the original data set were now simply doubled), but was
twice the size. The three inference algorithms were applied to this 40-point data set,
along with the 500 data set permutation analyses as described above. The results are
shown in the bottom row of Fig. 2-10.
Using this larger data set, the Bayesian network result is now significant at all
edge weight thresholds, because compared to the 20-point data set case the confidence
bounds are now much lower and the curve derived from the actual data is also higher.
We see such a change in these curves because 1) with 40 data points to permute
instead of just 20, one is less likely to have significant correlations appear between
nodes in the data set by chance, and 2) with the 40-point data set, the strength of the
data relative to the prior is greater, and thus one obtains more significant edges at
all thresholds.
For ARACNe, the curve generated from the original 40-point data set is identical
to the curve generated from the original 20-point data set (compare the top and
bottom windows in the middle column). This curve does not change because ARACNe
simply uses mutual information (followed by the Data Processing Inequality) to obtain
the resultant network, and because the 40-point data set contains the same "signals"
in the 20-point data set, the mutual information between the nodes does not actually
change. It should be noted that, while the calculated mutual information matrix
does not change when increasing the data set from 20 to 40 points, the error of that
mutual information measurement does decrease by a factor of one-half (∝ 1/20 vs.
1/40) [69].
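The invariance of mutual information to duplicating the data follows directly from its definition in terms of relative frequencies. The sketch below uses a simple plug-in (count-based) estimate in nats; the function name is hypothetical and this is not the histogram estimator of the minet package.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed (x, y) pairs of p(x,y) * log[ p(x,y) / (p(x) p(y)) ].
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )
```

Appending a data set to itself doubles every count and doubles n, so each ratio inside the sum is unchanged and `mutual_information(x + x, y + y)` equals `mutual_information(x, y)`.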
As with the Bayesian network result, the confidence bounds for ARACNe using 40
data points change because, with 40 data points to permute instead of just 20, one is
less likely to have significant correlations (mutual information) appear between nodes
in the data set. This manifests itself by having lower values of mutual information
between nodes in the resultant mutual information matrix. Thus, with 40 data points,
the confidence bounds shift to the left compared to 20 data points, allowing one to
consider edges with lower weights as significant in the 40 data point case that one
could not consider significant in the 20 data point case. Thus, in the 40 data point
case the entire ARACNe network result is considered significant, whereas in the 20
data point case there are two edges that have edge weights below the confidence
bounds.
For CLR, as with ARACNe, the curve generated from the original 40-point data
set is identical to the curve generated from the original 20-point data set (compare
the top and bottom windows in the right column). Again, this is because CLR uses
mutual information, which in this case is based on counts of observations in the data.
Doubling the data set simply doubles the number of times each observation is seen,
which does not change the mutual information matrix, but simply lowers the error
of those mutual information measurements. However, quite notably, the confidence
bounds for CLR when using the 40-point data set are essentially identical to the
confidence bounds generated using the 20-point data set.
This is a result of how CLR calculates its Z scores (edge weights). The essence
of CLR is to put the network connectivities into "context". It does this by calculating Z scores, which correspond to mutual information (MI) values that have been
mean-centered and normalized to unit variance (that is, the Z score is obtained by
subtracting from each MI value the mean of MI values corresponding to all parents of
each node, and then dividing by the standard deviation of MI values corresponding to
all parents of each node). The mean and standard deviation of the MI values of the
randomized data do decrease in going from the 20-point data set to the 40-point data
set (as evidenced by the data permutation results for ARACNe); however, because
CLR is normalizing these MI values, the resultant Z score actually does not change
significantly in the 40-point data set compared to the 20-point data set. In addition
to the "PLoS" method implemented in the CLR MATLAB code, this behavior was
also seen using the Rayleigh, Normal, Beta, and KDE methods (data not shown).
The Z scores tended to be slightly higher using the 40-point data than the 20-point
data, particularly when using raw (non-discretized) data as input (data not shown).
Nonetheless, in any case the resultant Z score did not change significantly with increased data. This invariance to increased data set size is seemingly a drawback of
the "context" normalization of CLR.
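The effect of the "context" normalization can be illustrated with a small sketch. It follows the description above (mean-centering and scaling the MI values of each node), and it additionally assumes that negative Z scores are clipped to zero and that the two per-node Z scores of an edge are combined as the square root of the sum of their squares; those two choices, the use of the population standard deviation, and the function name are assumptions of this example rather than a restatement of the CLR MATLAB code.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_z_scores(mi):
    """Context-normalized edge scores from a symmetric MI matrix (illustrative sketch)."""
    n = len(mi)
    # Background distribution for node i: MI between i and every other node.
    stats = []
    for i in range(n):
        background = [mi[i][j] for j in range(n) if j != i]
        stats.append((mean(background), pstdev(background)))

    def node_z(i, j):
        mu, sd = stats[i]
        # Clip at zero (assumption); guard against a flat background.
        return max(0.0, (mi[i][j] - mu) / sd) if sd > 0 else 0.0

    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                z[i][j] = sqrt(node_z(i, j) ** 2 + node_z(j, i) ** 2)
    return z
```

Multiplying every MI value by the same positive constant leaves these scores unchanged, because the mean and standard deviation rescale by the same factor. This is consistent with the observation above that lower MI values in permuted 40-point data do not translate into lower Z scores.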
2.4.3 Comparing different algorithm results
Results comparing the networks from the inference methods are shown in Figs. 2-6, 2-8, and 2-9. The data processing inequality tolerance parameter, τ, in ARACNe
was varied from 0 to 0.20 by increments of 0.01, and then varied from 0.20 to 0.40
by increments of 0.05. Among the resultant 25 ARACNe graphs, the maximum
similarity to the Bayesian network result was found using τ = 0.03 (when
restricting the Bayesian network edge weight threshold to be >0.3) and using τ = 0.06
(when simply restricting the Bayesian network edge weight threshold to be at most
>0.4 to be significantly above the confidence bounds). Similarity between the graphs
was calculated by first converting the Bayesian network to its undirected form, and
then taking the ratio of the number of edges shared between the Bayesian network
and ARACNe (or CLR) to the number of edges present in the Bayesian network
or ARACNe (or CLR) but not shared between the networks. This metric was
calculated either as a function of both the Bayesian network edge weight threshold and the
ARACNe (or CLR) edge weight threshold (requiring the Bayesian, ARACNe,
and CLR edge weight thresholds to be within the range that gave a significant network
result above the confidence bounds), or, when the Bayesian network edge weight threshold was set to >0.3, as a function of the ARACNe (or CLR) edge weight
threshold alone (again within the range that gave a network result above the
confidence bounds).
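The similarity metric described above can be written compactly with set operations. In this sketch the function names are hypothetical, edges are given as pairs of node names, and the case of two identical graphs (no unshared edges) is assumed to return infinity.

```python
def undirected(edges):
    """Collapse directed edges (a, b) into undirected edges {a, b}."""
    return {frozenset(edge) for edge in edges}


def graph_similarity(bn_edges, other_edges):
    """Ratio of shared edges to edges present in only one of the two graphs."""
    bn = undirected(bn_edges)          # Bayesian network converted to undirected form
    other = undirected(other_edges)    # ARACNe or CLR network
    shared = len(bn & other)
    unshared = len(bn ^ other)         # symmetric difference
    return float("inf") if unshared == 0 else shared / unshared
```

For example, a Bayesian network with edges A->B and B->C compared against an undirected network with edges A-B and C-D shares one edge and differs in two, giving a similarity of 0.5.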
It should be noted that 8 of 11 edges present only in the Bayesian network at
threshold >0.3 and not in the ARACNe network would induce three-node triplets
in the ARACNe network, which is precisely what ARACNe prunes out using the
Data Processing Inequality (DPI). When removing this restriction on the Bayesian
network edge weight to be >0.3, the optimal comparison to ARACNe was found at
Bayesian network edge weight >0.185 and ARACNe edge weight >0.3 with τ = 0.06.
Even with increasing the DPI tolerance parameter to τ = 0.06, which should allow
more three-node triplets in the ARACNe network, one still has 8 of 11 edges present
only in the Bayesian network but missing from the ARACNe network that would
induce triplets in the ARACNe network. Though the DPI tolerance parameter can
be increased, using τ = 0.06 and ARACNe edge threshold >0.3 was found to give
the optimal comparison to the Bayesian network (with no restriction on the BN edge
threshold) across all ARACNe threshold values and τ values ranging from 0 to 0.20
(increments of 0.01) and from 0.20 to 0.40 (increments of 0.05).
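To show why triplet-inducing edges are absent from the ARACNe result, the following sketch applies a DPI pruning step to a symmetric mutual information matrix. It assumes a multiplicative tolerance, removing the weakest edge of a closed triplet only when it is smaller than (1 - τ) times the next weakest edge; whether a given implementation applies the tolerance multiplicatively or additively is an assumption here, and the function name is hypothetical.

```python
from itertools import combinations


def dpi_prune(mi, tau=0.0):
    """Remove the weakest edge of every closed three-node triplet (illustrative sketch).

    mi  : symmetric matrix (list of lists) of edge weights; 0 means no edge
    tau : tolerance; larger values keep more triplets intact
    """
    n = len(mi)
    to_remove = set()
    for i, j, k in combinations(range(n), 3):
        edges = [((i, j), mi[i][j]), ((i, k), mi[i][k]), ((j, k), mi[j][k])]
        if any(weight <= 0 for _, weight in edges):
            continue                      # not a closed triplet
        edges.sort(key=lambda e: e[1])
        (weak_pair, weak_w), (_, next_w) = edges[0], edges[1]
        if weak_w < (1 - tau) * next_w:
            to_remove.add(weak_pair)      # decided on the original matrix
    pruned = [row[:] for row in mi]
    for a, b in to_remove:
        pruned[a][b] = pruned[b][a] = 0.0
    return pruned
```

With τ = 0 every closed triplet loses its weakest edge, so the pruned graph contains no triplets whose three weights are all distinct; raising τ spares weak edges that are nearly as strong as their neighbors.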
Additionally, it should be noted that, at the >0.3 Bayesian network edge threshold, both of the Bayesian network edges present only in the Bayesian network and not in the ARACNe or CLR network (from p-IGF1Rβ(Y1135/1136)/p-INSRβ(Y1150/1151) to p-PDGFRα(Y754), and from p-PDGFRα(Y849)/p-PDGFRβ(Y857)
to p-MET(Y1234/1235)) were edges participating in three-parent interactions. It is
feasible that such a higher-order interaction would be only inferred by the Bayesian
network and missed by ARACNe and CLR, which only consider pairwise interactions.
From these Bayesian, ARACNe, and CLR network comparisons, ARACNe has
been shown to infer the sparsest graphs and CLR the densest graphs. The sparseness
of ARACNe is likely mostly attributable to the Data Processing Inequality, which
(although applying the DPI is precisely the aim of ARACNe) appears to explain many of the
edges found by the Bayesian network and CLR but not by ARACNe. The density of
the CLR graphs is likely attributable to the mutual information normalization procedure of CLR. While normalizing the mutual information does give one an indication
of interaction strengths in the context of the background mutual information distribution, it may also tend to attribute significance to insignificant interactions as a result.
This is supported by the permutation studies with CLR, in which high edge weights
(Z scores) were given to random 20-point data sets as well as random 40-point data
sets. These results suggest that Bayesian networks, given a tractable network size (for
exact inference methods presently ~100 nodes given layering constraints [50], though
typically <25 nodes otherwise; though there is no limit on inexact Bayesian network
inference methods per se), may provide a balance between the possible false negatives
of ARACNe and the possible false positives of CLR. However, applying all algorithms
to any properly sized data set is not computationally or otherwise burdensome, and
thus such algorithms need not be leveraged against one another, but rather with one
another to provide the opportunity for maximal biological insight.
2.4.4 Equivalence class analysis for Bayesian network algorithm
To determine which of the edges in the Bayesian network could be considered directional (compelled), the different directed acyclic graphs (DAGs) that were in the
same equivalence class (i.e., different DAG structures that specify the same underlying joint probability distribution) as the consensus model were enumerated. At the
edge weight threshold used (>0.3), the consensus Bayesian network model contained
two cycles, manifested as two sets of bidirectional edges between two nodes (Fig. 2-7). Thus, there were four (2^(#cycles) = 4) candidate DAG structures represented by the
consensus model. In this case all four represented valid (i.e., acyclic) DAGs, though
that is not guaranteed.
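The enumeration of candidate DAGs from a consensus model with bidirectional edges can be sketched as follows. The function names and the node labels used to exercise it are hypothetical; the sketch only resolves each bidirectional pair in both directions and keeps the acyclic results, and it does not perform the subsequent equivalence class enumeration.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Check acyclicity by repeatedly removing nodes with no incoming edges."""
    indegree = {node: 0 for node in nodes}
    for _, child in edges:
        indegree[child] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return removed == len(nodes)


def candidate_dags(nodes, directed, bidirectional):
    """All acyclic graphs obtained by orienting each bidirectional pair one way.

    With c bidirectional pairs there are 2**c orientations before filtering.
    """
    dags = []
    for choice in product((0, 1), repeat=len(bidirectional)):
        oriented = [
            (a, b) if flip == 0 else (b, a)
            for (a, b), flip in zip(bidirectional, choice)
        ]
        edges = list(directed) + oriented
        if is_acyclic(nodes, edges):
            dags.append(edges)
    return dags
```

With two bidirectional pairs the loop visits four orientations, matching the count above; the acyclicity filter only matters when an orientation closes a loop through the directed edges.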
For each of those four DAG structures, all DAG structures in its equivalence class
were enumerated using the Bayesian Network Structure Learning toolbox. Among
all the DAGs in those four equivalence classes, only the subset of DAGs containing
a directed edge from p-Src(Y416) to p-EGFR(Y845) were considered. Only those
edges that were consistent (compelled) across all DAGs in all four equivalence classes
containing that edge were considered directed in the final consensus Bayesian network
model (Fig. 2-6). Any edges that did not have a consistent directionality across all
DAGs in all four equivalence classes containing the Src to EGFR edge were shown as
undirected in the final consensus Bayesian network model.
The same procedure was repeated when finding the Bayesian network that was
most similar to the ARACNe and CLR results. In that case, the Bayesian network
edge weight threshold was reduced below 0.3, inducing even more bidirectional edges
between two nodes. In that case, only those candidate DAG structures (among the
possible 2^(#cycles) candidate DAG structures) that were acyclic were considered for the
subsequent equivalence class enumeration. Any edges that did not have consistent
directionality across all DAGs containing the Src to EGFR edge in all equivalence
classes were shown as undirected in the final Bayesian network model (Fig. 2-8).
Additionally, if one is considering assuming directionality for particular edges that
are undirected in Fig. 2-6 or Fig. 2-8, it is important to note that, in order for a graph
to represent a valid Bayesian network, the directionality chosen for any particular
undirected edge cannot induce cycles.
2.4.5 Parent constraint analysis for Bayesian network algorithm
The maximum number of allowable parents was varied from one to eight and the
resultant adjacency matrices were plotted (Fig. 2-12). This was tested to show that,
as one allowed the algorithm to consider more parents, significant edge weights were
given to the higher order parent interactions. Though the edge weights at higher
parent limits are dominated by the 1-parent interactions (i.e., the adjacency matrix
obtained when allowing a maximum of 1 parent dominates the edge weights when
considering higher parent limits), there are nonetheless higher order parent interactions that begin to appear as high-scoring edges. In particular, there are distinct
increases in the number of high-scoring edges when moving from the 1-parent limit
to the 2-parent limit, from the 2-parent limit to the 3-parent limit, and then slightly
when moving from the 3-parent limit to the 4-parent limit. Beyond the 4-parent
limit (i.e., using a maximum of 5 parents to a maximum of 8 parents), there were
only minor changes in edge weights.
Thus, increasing the maximum allowable parents N from one to four produces a
commensurate increase in the number of N-parent interactions showing up as having
significant edge weights in the Bayesian network. But, after a certain point (here
N = 4), the algorithm no longer produces significant additional connections having
N > 4 parents, either because there is not sufficient data to support the presence
of those edges, or those edges are indeed not biologically significant regardless of
how much data one may collect. Thus we see that, while strictly pairwise (1-parent)
interactions dominate the Bayesian network edge weights, there are additional higher
order interactions that are inferred by the algorithm.
These results suggest that
inference methods that attempt to model only pairwise (though not necessarily just
1-parent) interactions (like ARACNe and CLR) will likely capture a portion of the
network interactions, but will also miss some higher-order interactions. For this work,
apart from this figure, all Bayesian network inference was performed using a maximum
of three parents per node. The results here thus suggest that limiting the maximum
number of parents to three captures the majority of the joint probability distribution.
Additionally, to give a concept of the Bayesian network algorithm running time,
it took 7, 8, 12, 21, 54, 186 (3m 6s), 634 (10m 34s), and 2012 (33m 32s) seconds to
run the inference algorithm on one data set (i.e., 17 nodes with 20 data points) when
allowing a maximum of 1, 2, 3, 4, 5, 6, 7, and 8 parents, respectively. The algorithm
was run in MATLAB using a desktop computer with a 3.06 GHz Intel Core 2 Duo
processor and 4 GB memory.
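One ingredient of this growth in running time is the number of candidate parent sets that must be scored for each node. The short calculation below counts them for a 17-node network; it is offered only as an illustration of the combinatorics, not as a model of the timings reported above, and the function name is hypothetical.

```python
from math import comb


def candidate_parent_sets(n_nodes, max_parents):
    """Number of parent sets of size 0..max_parents available to a single node."""
    others = n_nodes - 1
    return sum(comb(others, size) for size in range(max_parents + 1))


# Candidate parent sets per node in a 17-node network for parent limits 1 to 8.
per_node = [candidate_parent_sets(17, limit) for limit in range(1, 9)]
```

For 17 nodes, the count per node rises from 17 at a one-parent limit to 697 at a three-parent limit and 39,203 at an eight-parent limit.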
Chapter 3
Signaling network state predicts Twist-mediated effects on breast cell migration across diverse growth factor contexts
Note: Sections 3.1, 3.2.1, 3.2.2, 3.3.1, 3.4.1, and portions of 3.2.3 in this chapter were
previously published in Kim et al. (2011) [70]. The author contributions for that paper are
as follows: H.-D.K., A.W., F.B.G., and D.A.L. designed the research plan; H.-D.K., S.K.A.,
and A.S.M. performed the experiments; H.-D.K. and J.P.W. performed the computational
modeling; H.-D.K., A.S.M., J.P.W., and D.A.L. wrote the paper. The remaining sections of
this chapter were written by J.P.W., based on computational research designed by J.P.W.
and D.A.L. and computational research performed by J.P.W.
3.1 Introduction
In the phenomenon of epithelial-mesenchymal transition (EMT) [71], polarized epithelial cells loosen their cell-cell junctions and acquire the ability to migrate through
extracellular matrices as single cells in a mesenchymal manner [71, 72].
Although
great progress has been made on identifying and understanding components and
mechanisms involved in the process of EMT induction (e.g., [73, 74]), the "before"
versus "after" consequences of this transition for signaling pathway control of cell
migration has not yet been investigated from a multipathway, network-wide perspective. Cell migration results from a set of carefully orchestrated biophysical processes
regulated by numerous key signaling pathways whose activities can be influenced
downstream of a range of growth factor receptors. It is appreciated that these growth
factor receptor-elicited signaling activities may be modulated in a "before" versus
"after" manner by EMT induction [75], whether by TGFβ or other developmental
cues or inflammation-related stimuli [76, 77].
However, a current challenge is to
characterize this likely complex modulation from a multipathway network perspective and to establish an approach for predictive understanding of how the multiple
pathway activities integrate to yield different migration behavior in post-induction
compared with pre-induction conditions. This challenge is especially important for,
among other motivations, gaining insights concerning how prospective targeted drug
effects are influenced by whether tumor cells are in epithelial or mesenchymal state
[78].
As one currently clinically urgent application example, the epidermal growth factor receptor (EGFR) is commonly overexpressed or mutated in epithelium-derived
tumors, and its activation is linked to progression and poor prognosis [79]. Therefore,
EGFR has been the target of many small molecule inhibitors and monoclonal antibody antagonists, which have met with limited clinical success [80, 81, 82]. Recent
studies exploiting EMT markers and gene expression signatures suggest that cells with
low levels of epithelial markers, such as E-cadherin, and high levels of mesenchymal
protein expression, such as N-cadherin and vimentin, display resistance against these
inhibitors [83, 84]. Therefore, the decreased sensitivity of mesenchymal-like tumors to
EGFR antagonists argues for an ability to bypass EGFR dependence to activate the
downstream signaling pathways necessary for cell migration and survival [85]. Cell
activation through other receptors including the insulin-like growth factor-1 receptor (IGF-1R), fibroblast growth factor receptor (FGFR), and platelet-derived growth
factor receptor (PDGFR), has been suggested to play a role in resistance to EGFR
antagonists [84, 86]. Thus, improved understanding of how EMT-mediated changes in
multiple growth factor signaling networks contribute to cell invasion may necessarily
shift investigational focus toward the design of novel therapeutics targeting tangential tyrosine kinase pathways or intracellular signaling nexi for use in treating EGFR
inhibition-resistant carcinomas.
As a first multipathway network level study of how signaling pathway activities
governing cell migration downstream of receptor tyrosine kinase stimulation differ between "before EMT" and "after EMT" conditions, we use here an established human
mammary epithelial cell line (hMLE) immortalized and transformed via introduction
of a minimal set of oncogenes [87] and focus on EMT induction by Twist1 [88], via
its ectopic expression in hMLEs as previously characterized [89]. Twist expression
has been demonstrated in multiple studies in vitro, in mouse models, and in human
patients, to be associated with breast tumor invasiveness, metastasis, and poor disease prognosis (e.g. [89, 90, 91, 92]), and thus represents a pathophysiologically and
clinically important system for analysis. It also may be as simple an induction process as can be examined, because other EMT inducers such as TGFβ and TNFα
act via multiple transcription factors including Twist along with others [77], so our
initial study here may indicate basic signaling network modulation insights that can
be expanded upon in future analogous investigations of the more pleiotropic EMT
inducers.
In this basic study, we quantitatively characterize the migration characteristics
of hMLEs before and after Twist-mediated induction in both monolayer (indicative
of epithelial mode) and single cell (indicative of mesenchymal mode) migration assays under stimulation by a panel of growth factors present in carcinoma environments including EGF, HRG, IGF, and HGF [86, 93, 94, 95].
Across this broad
landscape of extracellular treatment conditions, we measured phosphorylation states
of 14 signaling pathway nodes to ascertain how Twist-mediated changes in many of these signals may be associated with consequent changes in the cell motility
behaviors. Computational modeling with a partial least-squares regression (PLSR)
framework demonstrated that quantitative combinations of multiple signals can account
for the various motility behaviors across all growth factor treatments in both
epithelial and mesenchymal migration modes and, in fact, can successfully predict
a priori the motility behavior for epithelial and mesenchymal modes in a new growth
factor context, PDGF stimulation.
We then constructed a complementary computational
model, using a correlative topology framework, to identify influences among
the signaling nodes that were modulated by the Twist-mediated EMT induction.
3.2 Results
3.2.1 Diverse cell motility behavior and growth factor treatment responses in epithelial versus mesenchymal mode
hMLEs ectopically expressing a vector control or Twist1, a transcription factor previously shown to induce EMT, were used as a model of EMT-induced phenotypic switch
(called "epithelial" or "pre-Twist", versus "mesenchymal" or "post-Twist" cells hereafter). Cells were cultured in serum-free medium upon seeding to assess growth factor-stimulated cell migration. The cells in epithelial and mesenchymal modes maintained
their respectively appropriate EMT markers in this medium (Figs. 3-1, 3-2).
Although invasive carcinomas and cells of mesenchymal developmental origins
may invade as single cells, epithelial cells can also migrate but do so within established monolayers. To consider both types of migration, we seeded cells labeled with
whole-cell tracking dye either sparsely to achieve single-cell migration or in a confluent monolayer with unlabeled cells for migration with cell-cell contact (Fig. 3-4A,
B). Upon serum-starvation, cells were treated with saturating levels of EGF. As anticipated, sparse post-Twist cells migrated significantly, whereas pre-Twist cells that
were maintained as single cells throughout the experiment exhibited little movement
(Fig. 3-4A). Pre-Twist cells with intact cell-cell contacts (Fig. 3-4A) or in a confluent monolayer (Fig. 3-4B) displayed significant locomotion, consistent with previous
reports of mammary epithelial cells
[97, 98]. In contrast, post-Twist cells exhibited
a contact-mediated reduction in motility. Moreover, consistent with clinical observa-
[Figure 3-1 image: Western blots with Epithelial and Mesenchymal lanes. (A) E-cadherin, N-cadherin, Vimentin, GAPDH. (B) EGFR, HER2, IGF1-R, Met, PDGFR, each with GAPDH.]
Figure 3-1: EMT markers and receptor levels for the human mammary epithelial cell model.
(A) Western Blot for E-cadherin as an epithelial marker and N-cadherin and vimentin as
mesenchymal markers. (B) Western Blot for total levels of EGFR, HER2, IGF1-R, Met,
and PDGFRβ. Cells were seeded overnight and incubated in complete media (Serum) or
serum-free media (Serum-free) for 24 hours before cell lysis. GAPDH is used as loading
control.
[Figure 3-2 image: immunofluorescence panels labeled E-cadherin and Actin.]
Figure 3-2: Mesenchymal cells in monolayer lack E-cadherin junctions. Immunofluorescence
images of epithelial (top) or mesenchymal (bottom) hMLER cells stained with an antibody
against E-cadherin (left) and phalloidin (middle). Cells were seeded on coverslips in serum media
for 24 hours. After a wash with PBS, cells were fixed with 4% paraformaldehyde and
permeabilized with 0.2% Triton-X. Cells were blocked with 10% BSA and incubated with
an antibody against E-cadherin (BD Biosciences, San Jose, CA) in a 1% BSA solution.
After three washes with PBS, cells were incubated with an AlexaFluor 488 secondary
antibody and phalloidin (Invitrogen, Carlsbad, CA). Mounted coverslips were imaged on a
Deltavision (Applied Precision, Issaquah, WA).
[Figure 3-3 image: box-and-whisker plots of individual cell speeds for Epithelial and Mesenchymal cells under SF, EGF, HRG, IGF, HGF, and PDGF treatments.]
Figure 3-3: Individual cell speed distributions of human mammary epithelial cells under
various growth factor treatments. Box-and-whisker plots of individual cell speeds of hMLER
cells under various growth factor treatments. Grey dots indicate measured average cell speed
for individual cells. Edges of the boxes indicate the 25th and 75th percentiles and the whiskers
indicate the 10th and 90th percentiles. The line in the box indicates the median and the cross
the mean of the distribution. Fig. 3-4C, D contain the summary figure depicting mean ±
S.E.
[Figure 3-4 image: panels A-E. Panel titles: Sparse Migration, Monolayer Migration. Legend: Epithelial, Mesenchymal. Treatments: SF, EGF, HRG, IGF, HGF. Inhibitor axis: AG1478 Concentration.]
Figure 3-4: EMT and growth factor-dependent human mammary epithelial cell migration
is contingent on its context, which is recapitulated by other human breast cancer cell lines.
(A,B) DIC and epifluorescence overlay and cell tracks of epithelial (left) and mesenchymal
(right) cells in the sparse (A) and monolayer (B) migration assay. Cells were labeled with
a whole-cell dye CMFDA and either seeded sparsely (A) or mixed with unlabeled cells and
seeded in confluence (B) before a 24-hour serum-starvation and treatment with saturating
levels of EGF. After 1 hour of stimulation, migration tracks over 18 hours (red) were generated via semi-automated tracking of centroids (grey circles) of labeled cells. Time-lapse
movies are provided under Movie S1 (in ref. [70]). (C,D) Cell speeds of epithelial (black)
or mesenchymal (red) cells under stimulation of various growth factors quantified from the
sparse (C) or monolayer (D) migration tracks. Cell speeds were calculated from cell tracks
after 7 to 19 hours of stimulation. (E) Cell speeds of epithelial cells in monolayer migration
assay (black) and mesenchymal cells in sparse migration assay (red) in presence of varying
levels of an EGF receptor kinase inhibitor AG1478. AG1478 was added simultaneously
to EGF. Cell speeds are normalized to their respective no inhibitor control cell speeds.
p < 0.0001 via two-way ANOVA between cell lines. All data is shown as mean ± S.E.
Box-and-whisker plots of individual cell speeds are shown in Fig. 3-3. N = 269-390 (C,D)
and N = 109-175 (E) cells for monolayer migration and N = 16-117 (C,D) and N = 31-66
(E) cells for sparse migration obtained from 3 independent biological replicates. *p < 0.05,
**p < 0.01, ***p < 0.0001 compared to serum-free condition (C,D) or sparse condition (E)
(see Experimental Procedures in ref. [70] for details on statistical analyses).
[Figure 3-5 image: (A) phase-contrast images labeled 0 h, 2 h, and 4 h. (B) Cell speeds in Sparse and Mono (monolayer) assays for BT549, T47D, MDA-MB-453, and MDA-MB-231.]
Figure 3-5: Migratory potentials of different epithelial-like versus mesenchymal-like cell
types. (A) EGF-stimulated mesenchymal cells are highly migratory in three-dimensional
collagen I matrix. Epithelial (top) and mesenchymal (bottom) cells were seeded in a neutralized 2.0 mg/mL collagen I solution. Upon gelation of collagen I, cells were serum-starved
for 24 h and stimulated with saturating levels of EGF. Cells were imaged via phase-contrast
microscopy over 6 h. Arrows indicate migratory mesenchymal cells in the three-dimensional
collagen I matrix. Dashed lines are provided as a reference. Details of methodology can be
found in ref. [96]. (B) Cell speeds of human breast cancer cell lines migrating in complete
medium in a sparse or monolayer migration assay. Cell lines are color-coded according to
their widely accepted EMT state; red for mesenchymal and black for epithelial. All data
are shown as mean ± S.E.
tions [85], post-Twist cells displayed resistance to inhibition of invasion via inhibition
of EGF signaling (Fig. 3-4E). Similar differences in motility behavior were observed
with respect to invasion into a three-dimensional collagen I matrix (Fig. 3-5A); post-Twist cells invaded to a significant extent whereas pre-Twist cells did not.
We also considered whether this differential behavior might be generalized to
other breast tumor cell lines and similarly examined motility behavior of a panel of
breast carcinoma cell lines in both confluence and sparse conditions. We found that
lines representing the luminal subtype (T47D, MDA-MB-453) showed an epithelial
pattern of migration, whereas lines representing the basal subtype (BT549, MDA-MB-231) diverged in their pattern of migration (Fig. 3-5B). However, the respective
levels of Twist expression can explain the latter divergence: the MDA-MB-231 cells,
which exhibited an epithelial-like migration pattern similar to the T47D and MDA-MB-453 cells, likewise express Twist at only low levels, whereas the BT549 cells, which
exhibited a mesenchymal-like migration pattern, express Twist at a high level [99, 100].
Taken together, these findings indicate that motility behavior of the mammary
epithelial cells is substantively altered by Twist expression and that insights gained
in our model system may apply in at least some clinically relevant contexts.
To determine whether migration in response to carcinoma-related growth factors
was altered upon EMT, we measured steady-state migration of epithelial and mesenchymal cells in response to EGF, HRG-β1, IGF-1, and HGF to activate the ErbB
family, IGF1-R, and Met, respectively. Epithelial cells migrated very little as singular
cells for all stimuli, whereas single mesenchymal cells migrated robustly in response
to select growth factors, notably EGF (Fig. 3-4C, Fig. 3-3). Conversely, epithelial
cells moved rapidly within monolayers, at or above the speeds attained by singular
mesenchymal cells, even in the absence of exogenous stimuli, with only modest enhancement by some of the growth factors (Fig. 3-4D, Fig. 3-3). Within monolayers,
mesenchymal cells exhibited very low cell speeds that were enhanced only slightly by
growth factor treatments. These results suggest that the degree of motility in both
migratory modes is highly growth factor- and EMT-dependent. Each type of cell responded differentially to growth factor treatments based on its phenotype, indicating
distinct processing of growth factor-elicited signals in pre-Twist versus post-Twist
cells.
3.2.2 Quantitative analysis of growth factor-elicited multiple-pathway signaling network dynamics
We hypothesized that the changes in Twist-related gene expression could induce alteration of multiple pathways in the signaling network downstream of growth factor
cues, leading to the observed EMT-dependent migratory responses. An exciting previous study has reported measurement of more than 1,000 biomolecular species at
the mRNA, protein, phosphopeptide, or phosphoprotein level in tumors with epithelial and mesenchymal phenotypes to generate annotated molecular network graphs
[74],
but our focus here is a quantitative analysis of changes in multipathway signaling
network activities in a comparative manner from before to after EMT induction in
a particular cell line with the goal of constructing computational model-based prediction of signaling pathway relationships to motility behavior. To achieve this, we
assessed the early activation kinetics of 14 proteins downstream of receptor tyrosine
kinase activation (Fig. 3-6A) in confluent pre-Twist cells and sparse post-Twist cells
(at 0, 5, 10, 30, or 60 min after growth factor activation). Measurements of 14 phosphosites over five time points, five growth factor treatments, two cell lines, and two
to three technical replicates resulted in greater than 1,800 data points (Fig. 3-7).
Interestingly, total receptor expression levels were not readily correlated with their
activity in the context of EMT. For example, although EGFR activation was much
greater in epithelial cells (Figs. 3-6B, 3-7) and the total EGF receptor levels were
comparable or only slightly higher in epithelial cells (Fig. 3-1B), mesenchymal cells
were strikingly more responsive to EGF treatment. This is not necessarily surprising,
because it is appreciated that receptor expression changes (whether at mRNA or
protein level) alone are typically not predictive of associated activity or inhibitor
effectiveness; a prominent instance of this is the lack of correlation of EGFR expression
in patient tumors with anti-EGFR kinase inhibitor efficacy (e.g., [101]). Thus, assays
[Figure 3-6 image: (A) schematic of the signaling network downstream of HGF, EGF, PDGF-BB, HRG, and IGF-1, with phosphorylation assessed by quantitative Western Blot or bead-based ELISA. (B) Ratio of basal phosphorylation.]
Figure 3-6: Basal phosphorylation changes in key migration signaling nodes in epithelial versus
mesenchymal state. (A) Simplified schematic of receptor tyrosine kinase-activated signaling network
involved in cell migration and the candidates for which phosphorylation was assessed via quantitation. Arrows indicate direct binding of the growth factors used in this study to their respective
receptors. Solid lines indicate direct interactions between proteins whereas dashed lines indicate
demonstrated indirect interactions. Some candidates have been grouped based on their demonstrated involvement in various biophysical processes of cell migration. This figure is intended as
an illustration of the complexity of the signaling network and does not fully account for all components or interactions assessed. Phosphorylation of candidates with green or red 'P' symbols has
been measured via a quantitative multiplexed bead-based ELISA assay or quantitative Western Blot
assay, respectively. (B) Ratio of basal epithelial and mesenchymal phosphorylation, demonstrating
changes in signaling before growth factor stimulation. Data is shown as mean ± S.E., based on error
of measurements in each context. N = 2-3 biological replicates per context.
[Figure 3-7 image: phosphorylation time courses from 0 to 60 min for each measured phosphosite; panels include Erk1/2, JNK (pT183/pY185), HSP27 (pS78), and PKCδ (pT505). Legend: EGF, HRG, IGF, HGF; Epithelial Monolayer, Mesenchymal Sparse.]
Figure 3-7: Early activation profiles of key regulators of cell migration exhibit altered signaling pathway activities upon Twist-induced EMT. Sixty-minute time courses of phosphorylation
after stimulation of hMECs with various growth factors. Confluent epithelial cells (solid, circle) or
sparsely seeded mesenchymal cells (dashed, diamond) were lysed at various times after stimulation
with EGF (red), HRG (black), IGF-1 (blue), and HGF (green), and subjected to a high-throughput
multiplexable bead-based ELISA or quantitative Western Blot using antibodies against various phosphorylation sites. Assay wells were loaded with equal mass of protein as assessed by a quantitative
bicinchoninic acid assay. Mean fluorescence intensities (MFI) for Western Blots were obtained via densitometry.
MFI values were normalized to the 0 min epithelial value within each phosphosite. N =
2-3 biological replicates. Data is shown as mean ± S.E.
that focus on receptor expression levels may not by themselves effectively identify key
targets for therapeutic intervention.
The resulting activation profiles showed diverse kinetics across individual signals
that were growth factor- and EMT state-dependent.
Basal phosphorylation levels
were dependent on the EMT state, with EGFR, Met, Erk, Src, β-catenin, HSP27,
and IRS-1 displaying significantly higher initial phosphorylation in epithelial cells,
but Akt, GSK3α/β, PKCδ, PLCγ, and JNK displaying higher phosphorylation levels in mesenchymal cells (Fig. 3-6B). Dynamic changes in JNK, IRS-1, Src, HSP27,
GSK3α/β, and β-catenin phosphorylation after growth factor treatment were cell-state specific and correlated with their initial phosphorylation levels (Fig. 3-7). However, activation of PKCδ and PLCγ along with EGFR canonical pathways Erk and
Akt was relatively growth factor-dependent and insensitive to EMT state in most
cases (Fig. 3-7). Visual inspection of signal differences across the diverse treatments
and contexts offered little insight into which signals contribute most significantly to
the profoundly different EMT-dependent migratory responses. The consequent implication is that cells must quantitatively integrate the activities of multiple signaling
pathways to generate robust decisions concerning context- and treatment-dependent
migration responses.
3.2.3 Node-to-node correlation topology model reveals quantitatively different signaling relationships between epithelial and mesenchymal states
Based on the inability of receptor expression to explain changes in growth factor responsiveness, striking changes in observed signaling, and the retained ability of cells
to migrate in all contexts, we hypothesized that differences in downstream signaling
might arise from quantitatively different signal-signal relationships downstream of
receptor tyrosine kinase (RTK) activation in the epithelial vs. mesenchymal state.
This is in addition to likely signaling-independent changes. In order to investigate
Twist-induced differences in node communication downstream of receptor signaling,
[Figure 3-8 image: correlation network panels labeled Epithelial and Mesenchymal.]
Figure 3-8: Correlative topological modeling, comparing epithelial (left) and mesenchymal
(right) situations, suggests quantitatively dominant nodes may arise from quantitatively different node-to-node influences. Edges between phosphorylation sites indicate statistically
significant positive (black) or negative (red) Pearson correlation (Storey multiple hypothesis
correction, ~1 false positive edge). Edge end annotation indicates literature evidence for detected correlation (listed in Table 3.1), including direct phosphosite-specific or pathway-level
evidence (arrowheads), protein-level evidence (diamond ends) and complex-level evidence
(dot ends). Nodes with red font indicate phosphosites that signal for inhibition or degradation of the protein. Dashed edges have the highest first-order partial correlation p-value
within each three-node triplet.
correlative topology modeling was performed for the epithelial and mesenchymal contexts as described in section 3.4.1. Separate network topology inference models were
constructed for each EMT cell state.
Results using the Storey method are shown in Fig. 3-8, whereas the Bonferroni
and Benjamini methods' results are shown in Fig. 3-9. In the context of network
inference, each significant correlation value represents one undirected edge in the inferred network. In the epithelial state, using the Storey method with a false discovery
rate (FDR) of 0.08 (p < 0.046) provides for 12 significant correlation values, resulting
in an estimated 0.08 × 12 edges ≈ 1 false positive edge. In the mesenchymal state,
using the Storey method with an FDR of 0.11 (p < 0.028) provides for 10 significant
correlation values, resulting in an estimated 0.11 × 10 edges ≈ 1 false positive edge.
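The estimate of roughly one false positive edge per network follows from multiplying the FDR by the number of significant edges. The Python sketch below illustrates that arithmetic, together with a generic Benjamini-Hochberg step-up rule and the Bonferroni cutoff for the 55 pairwise correlations among 11 phosphosites. It is a minimal illustration under stated assumptions, not the analysis code used in this work: all function names are hypothetical, and the Storey procedure (which additionally estimates the proportion of true null hypotheses) is not implemented here.

```python
from math import comb


def benjamini_hochberg(p_values, fdr):
    """Return indices of p-values declared significant by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is at or below its BH threshold.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def expected_false_positives(fdr, n_significant):
    """Expected number of false discoveries among the significant results."""
    return fdr * n_significant


def bonferroni_cutoff(alpha, n_tests):
    """Per-test p-value cutoff under Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # FDR values and edge counts quoted in the text for the Storey networks.
    print("epithelial:", expected_false_positives(0.08, 12))
    print("mesenchymal:", expected_false_positives(0.11, 10))
    # 11 phosphosites give comb(11, 2) = 55 pairwise correlations.
    print("Bonferroni cutoff:", bonferroni_cutoff(0.05, comb(11, 2)))
```

Because the step-up rule keeps every p-value ranked at or below the largest one that passes its threshold, a set of individually borderline p-values can all be retained, which is one reason FDR-based networks contain more edges than the Bonferroni networks.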
Not surprisingly, the greatest number of node-to-node influence arcs were found in the
[Figure 3-9 image: pre-Twist and post-Twist correlation networks. Node labels: Src (Tyr416), FAK (Tyr397), PLCγ (Tyr771), JNK (Thr183/Tyr185), PKCδ (Thr505), Erk1/2, Akt (Ser473), GSK3α/β (Ser21/Ser9), β-catenin (Ser33/Ser37/Thr41), HSP27 (Ser78), IRS-1 (Ser636/Ser639).]
Figure 3-9: Correlative topological modeling suggests adjustment of node-to-node influence.
This figure is intended as an alternative to Fig. 3-8 using more rigorous multiple hypothesis
testing. Using the Benjamini method for multiple hypothesis correction, in the epithelial
state a false discovery rate of 0.15 (p < 0.02) was used, giving an estimated 0.15 × 8
edges = 1.2 false positive edges. In the mesenchymal state a false discovery rate of 0.10
(p < 0.015) was used, giving an estimated 0.10 × 9 edges = 0.9 false positive edges. With
the exception of a now missing Akt-GSK3 edge, this mesenchymal state result matches the
mesenchymal result using the Storey method shown in Fig. 3-8. A Bonferroni-corrected p <
0.05 (corresponding to p < 0.05/55, or p < 0.001, given the 55 correlation coefficients being
considered in one cell state's network) was used to generate the epithelial and mesenchymal
states' Bonferroni networks.
Table 3.1: Edge-specific literature evidence for epithelial and mesenchymal state network
models in Figs. 3-8 and 3-9. Evidence is listed as site-specific (for evidence of the upstream
node phosphorylating the downstream node at the measured phosphorylation site), protein-level (for evidence of the upstream node phosphorylating the same type of amino acid,
either pY or pS/pT, but at a different location on the protein), complex-level (for evidence
of the two proteins represented by two correlated nodes being found in the same protein
complex), or pathway-level (only true for PLCγ → PKCδ, via diacylglycerol).
Edge                                 Interaction Type   Reference

Epithelial State
PKCδ - Akt (S473)                    Site-specific      [102]
Src → β-catenin (pY)                 Protein-level      [103]
JNK → IRS-1 (pS)                     Protein-level      [104]
FAK-PLCγ                             Complex-level      [105]
PLCγ → PKCδ                          Pathway-level      [106]

Mesenchymal State
PKCδ → GSK3α/β (S21/S9)              Site-specific      [107]
PKCδ → β-catenin (S33/S37/S45)       Site-specific      [108]
Src → PLCγ (pY)                      Protein-level      [109]
PKCδ → IRS-1 (pS)                    Protein-level      [110]
FAK-Erk                              Complex-level      [111]

Epithelial and Mesenchymal States
Akt → GSK3α/β (S21/S9)               Site-specific      [112]
Erk2 → IRS-1 (S636/S639)             Site-specific      [113]
Akt-HSP27                            Complex-level      [114]
Storey models, and the smallest number in the Bonferroni models, given the more
conservative nature of the latter algorithm for assigning significance. Accordingly,
all arcs found in the Bonferroni models were found in the corresponding Benjamini
models and all arcs found in the Benjamini models were found in the corresponding
Storey models.
First-order partial correlation determines if the correlation between two nodes
may be explained because of their mutual correlation with a third node [25]. First-order Pearson partial correlations were calculated for all three-node triplets present
in the networks. A three-node triplet occurs when three nodes, A, B, and C form a
complete subnetwork, whereby significant pairwise correlation exists between nodes
A and B, A and C, and B and C. Using the Storey method's resultant networks, in the
epithelial network a three-node triplet exists between JNK, Erk1/2, and IRS-1. In
the mesenchymal network, two three-node triplets exist: Akt, GSK3α/β, and HSP27;
and PKCδ, GSK3α/β, and IRS-1. For each triplet, the partial correlation edge with
the highest p-value is shown as a dashed edge in Fig. 3-8. These results suggest that
the dashed edge in each triplet may exist because of mutual correlation with the third
node.
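For reference, the first-order partial correlation between nodes A and B given a third node C can be computed from the three pairwise Pearson coefficients as r_AB.C = (r_AB - r_AC r_BC) / sqrt((1 - r_AC^2)(1 - r_BC^2)). The Python sketch below implements this standard formula; the function names and the numerical inputs are hypothetical and chosen only for illustration, and the associated p-value calculation used to select the dashed edges is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_order_partial(r_ab, r_ac, r_bc):
    """Correlation between A and B after removing the linear effect of C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


if __name__ == "__main__":
    # Hypothetical coefficients: A and B each correlate 0.8 with C.
    # If r_ab equals r_ac * r_bc, nothing is left once C is accounted for.
    print(first_order_partial(0.64, 0.8, 0.8))
    # A stronger direct A-B correlation survives conditioning on C.
    print(first_order_partial(0.90, 0.8, 0.8))
```

In a three-node triplet, the edge whose partial correlation is weakest (highest p-value) is the one most plausibly explained by the shared third node.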
Striking differences in the set of significant edges suggest network modulation occurs upon Twist-induced EMT, wherein information is processed via changes in the
influence signaling pathway nodes have upon one another. Key similarities and differences will be noted, in context of available literature information, in the Discussion
section.
3.2.4 PLSR model-reduction analysis reveals quantitatively different pathway emphases between epithelial and mesenchymal modes
We next sought to identify which measured phosphosite signals were most predictive
of cell migration speed in the epithelial versus mesenchymal state. To relate signals
to cell speed, because only average cell speed was experimentally determined for
[Figure 3-10 image: heatmaps titled "Epithelial monolayer state" and "Mesenchymal sparse state". Rows: AKT, ERK1/2, GSK3A/B, SRC, JNK, HSP27, IRS1, PLCG, PKCD, FAK, BCATENIN. Columns: SF, EGF, HRG, IGF, HGF, PDGF. Color scale: 0 to 1.]
Figure 3-10: Heatmaps indicate the relative signal levels, quantified using the area under the
curve (AUC) of the signaling timecourse trajectory, of the 11 phosphorylation sites across
the 6 growth factor contexts for the two cell states (epithelial monolayer and mesenchymal
sparse). For display purposes only, the signal AUC values here were normalized relative
to the maximum value of each signal. SF represents the serum-free condition; EGF, HRG,
IGF, HGF, and PDGF represent RTK ligands.
each growth factor condition (serum-free, EGF, HRG, IGF, HGF, PDGF), whereas
signaling data was measured at five time points (0, 5, 10, 30, and 60 min) in each
growth factor condition, it was necessary to summarize the signaling data in some
manner for each growth factor condition. To do this, we calculated the area under
the curve (AUC) of each signal's 0-60 min time course. In this manner, we could now
calculate "summarized signal"-phenotype relationships, given the 6 growth factor
conditions, a metric for the "quantity" of signal (AUC value) for each of the 11
signals in each growth factor condition, and the average cell speed values in each
growth factor condition (Fig. 3-10).
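As an illustration of this summarization step, the Python sketch below computes the AUC of a time course sampled at 0, 5, 10, 30, and 60 min. The trapezoidal rule is an assumption made for this sketch, since the integration scheme is not specified above, and the function name and signal values are hypothetical.

```python
def trapezoid_auc(times, values):
    """Area under a piecewise-linear curve through (time, value) points."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (values[i] + values[i - 1]) / 2.0
    return area


if __name__ == "__main__":
    time_points = [0, 5, 10, 30, 60]  # minutes, as sampled in this study
    # Hypothetical normalized phosphorylation values, for illustration only.
    signal = [1.0, 2.0, 3.0, 2.0, 1.0]
    print(trapezoid_auc(time_points, signal))
```

Because the sampling is uneven, the later intervals (10 to 30 and 30 to 60 min) contribute most of the area, so an AUC summary weights sustained signaling more heavily than a brief early peak.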
Partial least squares regression (PLSR) offers a method for taking a collection
of signals and reducing the signals to create a smaller number of so-called latent
variables, which are also orthogonal (independent, uncorrelated) to one another, that
can be used to predict the value(s) of an output variable(s) [115].
In our case we
sought to use PLSR to relate the measured signals' AUC values to the measured
average cell speed values in each growth factor condition. It is possible to use
all 11 measured signals to predict cell speed and then use the variable importance
in the projection (VIP) score [116] to gain some insight into how important each
signal is to the phenotype prediction. However, because there were only 11 total signals in
the data set, we instead used exhaustive feature selection to test each signal's
importance in predicting cell speed. To do this, PLSR models of varying size were
constructed (Figs. 3-11, 3-12, and 3-13), using every combinatorial 3-, 4-, and 5-site
subset of the 11 signaling measurements. Models in which the test and training error
were both reduced relative to the full 11-site model were selected as high-scoring
models. We focus on the results for the 4-site reduced models in Fig. 3-12, as they
were the simplest models (simpler than 5-site models) that still had enough N-site
models for enrichment analysis (the 3-site models did not). The statistical significance
of observed phosphosite frequencies in the reduced PLSR models using 3, 4 and 5
phosphosites and different multiple hypothesis correction methods is summarized in
Fig. 3-14.
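The exhaustive search can be sketched in Python as follows. The sketch enumerates every 3-, 4-, and 5-site subset of the 11 signals and keeps those whose training and test errors are both lower than those of the full model. The `evaluate` callable is a hypothetical stand-in for fitting and cross-validating a PLSR model on a subset, which is not implemented here; the function name and site labels are likewise chosen for illustration.

```python
from itertools import combinations

# The 11 measured signals, labeled as in the Fig. 3-10 heatmaps.
SITES = ["AKT", "ERK1/2", "GSK3A/B", "SRC", "JNK", "HSP27",
         "IRS1", "PLCG", "PKCD", "FAK", "BCATENIN"]


def high_scoring_subsets(sites, size, evaluate, full_errors):
    """Return subsets whose training and test errors both beat the full model.

    evaluate(subset) must return a (training_error, test_error) pair; it is
    a placeholder for building a PLSR model from that subset of signals.
    """
    full_train, full_test = full_errors
    selected = []
    for subset in combinations(sites, size):
        train_err, test_err = evaluate(subset)
        if train_err < full_train and test_err < full_test:
            selected.append(subset)
    return selected


if __name__ == "__main__":
    for size in (3, 4, 5):
        n_models = sum(1 for _ in combinations(SITES, size))
        print(f"{size}-site models: {n_models}")
```

With 11 signals the search spaces contain 165, 330, and 462 candidate models, which is small enough to evaluate every subset rather than rely on a heuristic ranking.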
Given m high-scoring reduced models and the expected frequency of each site in
[Figure 3-11 image: scatter plots of test error versus Mean Training Error [μm/h], each with a Reduced / Full Model Error panel, for the epithelial (top) and mesenchymal (bottom) states.]
Figure 3-11: 3-site reduced PLSR predictions for epithelial monolayer vs. mesenchymal
sparse cell migration speed. (left) Test and training error for all possible three-variable, two-component combinations of signals in epithelial (top) and mesenchymal (bottom) situations.
Red and blue spots respectively indicate the full model and reduced models with reduced
error. (right) Variables selected for inclusion (blue) in models with reduced error, and test
error relative to that of the full model. The small subplot with the 11 site names represents
the fraction of high-scoring reduced PLSR models containing a given phosphosite.
[Figure 3-12 image: scatter plots of test error versus Mean Training Error [μm/h], each with a Reduced / Full Model Error panel, for the epithelial (top) and mesenchymal (bottom) states.]
Figure 3-12: 4-site reduced PLSR predictions for epithelial monolayer vs. mesenchymal
sparse cell migration speed. (left) Test and training error for all possible four-variable, two-component combinations of signals in epithelial (top) and mesenchymal (bottom) situations.
Red and blue spots respectively indicate the full model and reduced models with reduced
error. (right) Variables selected for inclusion (blue) in models with reduced error, and test
error relative to that of the full model. The small subplot with the 11 site names represents
the fraction of high-scoring reduced PLSR models containing a given phosphosite.
[Figure 3-13 image: scatter plots of test error versus Mean Training Error [μm/h], each with a Reduced / Full Model Error panel, for the epithelial (top) and mesenchymal (bottom) states.]
Figure 3-13: 5-site reduced PLSR predictions for epithelial monolayer vs. mesenchymal
sparse cell migration speed. (left) Test and training error for all possible five-variable, two-component combinations of signals in epithelial (top) and mesenchymal (bottom) situations.
Red and blue spots respectively indicate the full model and reduced models with reduced
error. (right) Variables selected for inclusion (blue) in models with reduced error, and test
error relative to that of the full model. The small subplot with the 11 site names represents
the fraction of high-scoring reduced PLSR models containing a given phosphosite.
[Figure 3-14 image (table): number of reduced models scoring better than the full 11-site model. Epithelial: 3 of 165 (3-site), 18 of 330 (4-site), 44 of 462 (5-site). Mesenchymal: 10 of 165 (3-site), 18 of 330 (4-site), 36 of 462 (5-site). Benjamini thresholds: 3-site FDR = 0.12 (p < 0.0473), 4-site FDR = 0.02 (p < 0.003), 5-site FDR = 0.06 (p < 0.0331). Bonferroni threshold: p < 0.0011. Enriched and depleted sites are listed per state and model size.]
Figure 3-14: Site enrichment in reduced PLSR models. The enrichment or depletion of
individual phosphosites within the high-scoring subset of reduced PLSR models (i.e., models
with mean test error and mean training error lower than the full 11-site PLSR model) was
quantified using a two-tailed hypergeometric test. Enrichment was assessed in 3-, 4-, and
5-site reduced PLSR models for both the epithelial and mesenchymal states. Two multiple
hypothesis correction methods were applied: the Benjamini false discovery rate, and the
more stringent Bonferroni method. Akt and JNK were consistently enriched in the epithelial
and mesenchymal models, respectively.
those m models (each of the 11 sites appears in 120 of the 330, i.e., 11 choose 4,
possible 4-site models, yielding an expected frequency of 0.36), we could estimate the
likelihood of the observed frequency of each phosphosite in the set of high-scoring
reduced models.
The Benjamini [117] and Bonferroni [118] methods for multiple
hypothesis correction were applied to the list of 22 p-values (11 phosphosites × 2
states = 22 p-values). Using a FDR of 0.02 (p < 0.003), in the epithelial state Akt
and Src are enriched in the 4-site models, whereas GSK3α/β is depleted; and in the
mesenchymal state JNK is enriched while HSP27 is depleted. This provides for an
estimated 5 × 0.02 = 0.1 false positives. If we instead use a less conservative FDR of
0.06 (p < 0.013), in the epithelial state Akt and Src are enriched, whereas Erk, IRS-1, and GSK3α/β are depleted; and in the mesenchymal state JNK is enriched while
Src and HSP27 are depleted. This less conservative FDR provides for an estimated
8 × 0.06 ≈ 0.5 false positives.
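The enrichment calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in this work: the two-tailed convention (summing all outcomes no more probable than the observed one), the Benjamini-Hochberg step-up form of the FDR procedure, and the observed count `k_observed` are all assumptions made for the example. Only the counts 330, 120, and 18 come from the text and Fig. 3-14.

```python
from math import comb


def hypergeom_pmf(k, population, marked, draws):
    """P(exactly k marked items among `draws` items drawn without replacement)."""
    return comb(marked, k) * comb(population - marked, draws - k) / comb(population, draws)


def two_tailed_p(k_obs, population, marked, draws):
    """Two-tailed p-value: total probability of outcomes no more likely than k_obs."""
    lo = max(0, draws - (population - marked))
    hi = min(draws, marked)
    p_obs = hypergeom_pmf(k_obs, population, marked, draws)
    total = 0.0
    for k in range(lo, hi + 1):
        p_k = hypergeom_pmf(k, population, marked, draws)
        if p_k <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p_k
    return min(1.0, total)


def benjamini_hochberg(p_values, fdr):
    """Indices of hypotheses rejected by the step-up procedure at the given FDR."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    m = len(p_values)
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


n_sites, model_size = 11, 4
n_models = comb(n_sites, model_size)                  # 330 possible 4-site models
models_per_site = comb(n_sites - 1, model_size - 1)   # 120 models contain a given site
expected_freq = models_per_site / n_models            # about 0.36

n_high_scoring = 18   # high-scoring 4-site models (Fig. 3-14)
k_observed = 15       # hypothetical number of those models containing one site
p = two_tailed_p(k_observed, n_models, models_per_site, n_high_scoring)
```

With 22 such p-values (11 sites in each of two states), `benjamini_hochberg(p_values, fdr)` would return the positions of the sites called enriched or depleted at the chosen FDR; the direction is then read from whether the observed frequency lies above or below `expected_freq`.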
Tests for the likelihood of observed phosphosite frequencies in the reduced 3-site
and 5-site PLSR models were also performed. For the 3-site models using the two-tail Bonferroni-corrected p < 0.05, JNK was enriched in the mesenchymal state,
whereas no sites appeared significantly differently than expected by chance in the
epithelial state. Using the Benjamini method with a FDR of 0.12 (p < 0.0473), JNK
was enriched and HSP27 was depleted in the mesenchymal state. No sites appeared
significantly differently than expected by chance in the epithelial state. This provides
for an estimated 2 × 0.12 = 0.24 false positives.
For the 5-site models using the two-tail Bonferroni-corrected p < 0.05, in the
mesenchymal state JNK and β-catenin were enriched, while no sites appeared less
often than expected by chance. In the epithelial state, Akt, Src, and PLCγ were
enriched, while no sites appeared less often than expected by chance.
Using the
Benjamini method with a FDR of 0.06 (p < 0.0331), in the epithelial state Akt,
Src, and PLCγ were enriched, while IRS-1 and GSK3α/β were depleted. In the
mesenchymal state JNK, β-catenin, Erk, and HSP27 were enriched, while Src was
depleted. This provides for an estimated 0.06 × 10 = 0.6 false positives.
It should be noted that while depletion of a phosphosite indicates decreased predictive ability of a site, it does not necessarily indicate a poor correlation of that site
with cell speed. Rather, depletion may indicate redundancy between the predictive
ability of phosphosites because two sites are themselves correlated. This can be explored by comparing the enrichment results in Fig. 3-14 to the correlation p-values
shown in Fig. 3-15, which summarizes the most well correlated (p < 0.1) signal pairs
among the epithelial and mesenchymal signals' AUC values. These signal-signal correlations are different than those previously summarized in Figs. 3-8 and 3-9, which
calculated the correlations among signals' fold-change values from the four individual
nonzero time points across the five ligand conditions, i.e., 20 data points. In Fig. 3-15
the signal-signal correlations are calculated using the area under the curve (AUC) of
each signal's entire time course, for each of the six treatment conditions (five ligands
plus serum-free), as shown in Fig. 3-10.
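As a minimal sketch of the AUC-based signal-signal correlation described above, the following Python computes one AUC per condition and then a Pearson coefficient across the six conditions. The trapezoidal rule, the sampling times, and every numerical value are assumptions for illustration; they are not the measured data. The p-value attached to each coefficient is omitted here.

```python
from math import sqrt


def trapezoid_auc(times, values):
    """Area under a time course using the trapezoidal rule."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical sampling times (minutes) and fold-change time courses.
times = [0, 5, 15, 30, 60]
conditions = ["SF", "EGF", "HRG", "IGF", "HGF", "PDGF"]
signal_a = {
    "SF": [1.0, 1.0, 1.0, 1.0, 1.0],
    "EGF": [1.0, 3.0, 2.5, 2.0, 1.5],
    "HRG": [1.0, 2.0, 2.2, 1.8, 1.2],
    "IGF": [1.0, 1.5, 1.6, 1.4, 1.1],
    "HGF": [1.0, 2.8, 2.6, 2.1, 1.6],
    "PDGF": [1.0, 1.2, 1.1, 1.0, 1.0],
}
signal_b = {
    "SF": [1.0, 1.0, 1.1, 1.0, 1.0],
    "EGF": [1.0, 1.8, 2.0, 1.7, 1.3],
    "HRG": [1.0, 1.4, 1.9, 1.5, 1.1],
    "IGF": [1.0, 2.2, 1.3, 1.2, 1.0],
    "HGF": [1.0, 2.0, 2.3, 1.9, 1.4],
    "PDGF": [1.0, 1.1, 1.0, 1.2, 1.0],
}

# One AUC per condition, then one correlation across the six conditions.
auc_a = [trapezoid_auc(times, signal_a[c]) for c in conditions]
auc_b = [trapezoid_auc(times, signal_b[c]) for c in conditions]
r = pearson_r(auc_a, auc_b)
```

Because each signal contributes only six AUC values here, rather than the 20 fold-change values used for Figs. 3-8 and 3-9, a given correlation coefficient corresponds to a weaker p-value in this setting.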
Looking at Fig. 3-14, in the epithelial state none of the signal pairs between
the enriched and depleted phosphosites are correlated (p < 0.1), indicating that
the depleted sites are not redundant with the enriched signals.
Further, none of
the signals that are enriched are correlated (p < 0.1) with one another, indicating
[Figure 3-15 plots: Pearson correlations among signals' AUC values in the epithelial state (top) and mesenchymal state (bottom); x-axes, p-value ranking. Labeled pairs include PLCG-FAK and ERK1/2-IRS1 (epithelial) and JNK-HSP27 (mesenchymal).]
Figure 3-15: The p-values of the Pearson correlations among signals' AUC values in the
epithelial and mesenchymal states. The log10 of the p-values for all signal pairs with p < 0.1
are shown. These represent correlations among the signals' AUC values shown in Fig. 3-10.
Depletion of signals in Fig. 3-14 could occur because the enriched and depleted signals are
correlated. These results show this to be the case for the AUC values of JNK and HSP27,
the most well correlated signal pair in the mesenchymal state data.
that the enriched signals are not redundant.
In the mesenchymal state, JNK is
enriched whereas HSP27 is depleted, and these two signals are also the most well
correlated in the mesenchymal data. This is consistent with HSP27 being depleted in
the high-scoring reduced PLSR models because it is redundant with JNK. Src, which
is depleted in the 5-site mesenchymal reduced PLSR models, is poorly correlated
(p > 0.1) with the enriched signals JNK and HSP27.
Among the four enriched
signals in the 5-site mesenchymal reduced PLSR models, only one of the six signal
pairs (JNK and HSP27) is correlated (p < 0.1).
Thus, depletion of a signal in the high-scoring reduced PLSR models may be
due to redundancy with the enriched signal(s), but it is not always the case; and the
enriched signals here are not redundant with one another, except for JNK and HSP27
in the mesenchymal 5-site reduced PLSR model. These results are consistent with
notions of "minimum redundancy, maximum relevance" in identifying useful features
for prediction [119], whereby features that are well correlated with the output but
not well correlated with one another (i.e., not redundant) are useful for prediction.
It should also be noted that the leave-one-out cross-validation performed here can
be sensitive to the case in which the data from two or more conditions are correlated
and numerically similar. For example, if Condition A in the training set is sufficiently
similar to Condition B in the test set, then the test error would be lower than it
would be if Conditions A and B were dissimilar.
To account for this effect, one
could implement some type of stratified cross-validation [120], in which one tries to
explicitly account for the potential similarities across conditions in the training and
test sets.
In our case here, we can inspect the similarity across signals' AUC values in the
different growth factor conditions (Fig. 3-10). For example, the serum-free and PDGF
conditions in the epithelial monolayer state, and the serum-free and the HRG conditions in the mesenchymal sparse state have some similarity in the signals' values. We
have not accounted for these similarities in the cross-validation procedure, but one
could implement a method for doing so. These two pairs of conditions (serum-free
and PDGF, and serum-free and HRG) have the lowest average cell speeds in the epithelial and mesenchymal states, suggesting that the similar signal values correspond
to similar signaling network states that produce similar and low cell migration speeds.
This suggests we have measured enough signals that govern cell migration speed. In
contrast, if two growth factor conditions had similar values across the 11 measured
signals but produced very different cell speeds, it would suggest that we were not
measuring the signals most important for governing cell migration speed.
3.2.5 Linear regression predicts cell speed more accurately than PLSR models
Given that many of the reduced PLSR models had better prediction accuracy (both in
terms of mean training error and mean test error) than the full 11-site PLSR models,
including using as few as three phosphosites in the reduced model, we next sought to
determine how accurately even simpler models could predict cell speed. To test this
approach, we used linear regression. Just as was done for the reduced PLSR models,
the area under the curve (AUC) values of the signals' time courses were used as input
(as a metric of signal quantity). Importantly, this AUC approach was required to
model the phenotypic data because we only had one phenotypic data point (i.e., one
average cell speed value) for each condition. Thus we had to summarize all the time
points' data into one "condition-specific" signal value. If, instead, we had phenotypic
data available at each time point, then we could incorporate the signaling data from
each corresponding time point into the prediction task.
In this case, there were 6 conditions (5 growth factor treatments and one serum-free condition) and 11 phosphosite signals. The system would be underdetermined
(more unknowns than equations) if we were to try and assess the full multiple linear
regression solution by including all 11 sites in the model. To make the system solvable,
we must select 6 or fewer phosphosites (5 or fewer if a zero-order constant term is
included in the linear regression model) to include in the multiple linear regression
model. To provide the simplest analysis possible, we only considered linear regression
models using one or two phosphosites as predictors. In other words, the models took the form

\[ \text{Cell speed}_i = m_1 x_{1,i} + \beta \]
\[ \text{Cell speed}_i = m_1 x_{1,i} + m_2 x_{2,i} + \beta \]

where $x_{j,i}$ represents the time course AUC value for phosphosite $j$ under condition $i$,
$m_j$ represents the linear regression coefficient associated with phosphosite $j$, and $\beta$
represents the model's constant term (equivalent to the y-intercept in a model with
one input variable).
Because the number of phosphosites is small (11 sites), all linear regression models
containing one or two phosphosites as predictors could easily be exhaustively considered. In other words, all 11 one-site models were considered, and all "11 choose
2"=55 2-site models were considered. To include more sites in the model, one could
also exhaustively search all N-site models, or use a non-exhaustive feature selection
procedure (e.g., stepwise regression [121], Lasso [122], elastic net [123]). Models were
scored based on their mean training error and mean test error from leave-one-out
cross-validation. In this case, because there were 6 conditions, this amounted to 6-fold cross-validation. Models were built using the regress function in MATLAB.
Training and test errors were quantified using the absolute difference between the
predicted and observed cell speeds, just as was done for the reduced PLSR models'
errors.
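The exhaustive search with leave-one-out cross-validation can be sketched as follows. This is a plain-Python stand-in written for illustration, not the MATLAB regress-based code used here; the helper names and the three-site data set are hypothetical, and the fit uses the normal equations, which is adequate only for such small, well-conditioned problems.

```python
from itertools import combinations


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, y):
    """Least-squares coefficients [m_1, ..., m_k, beta] from the normal equations."""
    design = [list(r) + [1.0] for r in rows]  # last column is the constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    return sum(c * v for c, v in zip(coef, row)) + coef[-1]


def loocv_errors(rows, y):
    """Mean absolute training error and mean absolute test error over all folds."""
    train_err, test_err = [], []
    for held in range(len(y)):
        tr = [i for i in range(len(y)) if i != held]
        coef = fit([rows[i] for i in tr], [y[i] for i in tr])
        train_err.append(sum(abs(predict(coef, rows[i]) - y[i]) for i in tr) / len(tr))
        test_err.append(abs(predict(coef, rows[held]) - y[held]))
    return sum(train_err) / len(y), sum(test_err) / len(y)


def score_all_models(auc, speed, max_sites=2):
    """Score every model with 1..max_sites predictors; auc maps site -> AUC per condition."""
    results = {}
    for size in range(1, max_sites + 1):
        for sites in combinations(sorted(auc), size):
            rows = [[auc[s][i] for s in sites] for i in range(len(speed))]
            results[sites] = loocv_errors(rows, speed)
    return results


# Hypothetical data: 3 sites, 6 conditions; site "A" is exactly linear with speed.
speed = [10.0, 14.0, 18.0, 22.0, 26.0, 30.0]
auc = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "C": [2.0, 7.0, 1.0, 8.0, 2.0, 8.0],
}
results = score_all_models(auc, speed)
```

Each entry of `results` maps a tuple of site names to its (mean training error, mean test error) pair, which is the pair of coordinates plotted for each model. With 11 sites the same loop would score 11 one-site and 55 two-site models.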
Mean training and test errors for all one- and two-site linear regression models,
for the epithelial monolayer and mesenchymal sparse migration modes separately, are
shown in Fig. 3-16. These results show that even one-site linear regression models
offer training and test error levels that are comparable to or even lower than the full
11-site PLSR models' errors. Expanding to two-site models generates more models
that provide comparable or lower error rates compared to the 11-site PLSR models.
The best one-site predictors (i.e., lowest training and test errors) for epithelial
and mesenchymal cell states are Akt and JNK, respectively. If we zoom in on the
axes (Fig. 3-17), we can see that there are not stand-out winners for best two-site
[Figure 3-16 plots: Epithelial and Mesenchymal panels for one-site (top row) and two-site (bottom row) models; x-axes, Mean Training Error [µm/hr].]
Figure 3-16: Prediction accuracy using 1- and 2-site linear regression models. (top row)
One-site model errors of epithelial (blue) and mesenchymal (red) cell speeds. (bottom row)
Two-site model errors of epithelial (blue) and mesenchymal (red) cell speeds. All subplots'
axes are drawn to the same scale.
[Figure 3-17 plots: Epithelial and Mesenchymal panels for one-site (top row) and two-site (bottom row) models; x-axes, Mean Training Error [µm/hr]; data points labeled with phosphosite names.]
Figure 3-17: The same data as shown in Fig. 3-16, but with zoomed in axes and labels for
the signals used in the 1- and 2-site models. (top row) One-site model errors of epithelial
(blue) and mesenchymal (red) cell speeds. (bottom row) Two-site model errors of epithelial
(blue) and mesenchymal (red) cell speeds. Data points are labeled with the phosphosite(s)
used for prediction. Each subplot has its own axes scale.
predictors. For the epithelial state, Akt-FAK offers the lowest test error, but Akt-PLCγ offers a slightly lower training error. Akt-β-catenin has a comparable test error
to Akt-PLCγ, but a higher training error. For the mesenchymal state, most two-site
predictors that include JNK perform comparably well, except when JNK is paired
with FAK, PKCδ, or Src. Thus, the best performing two-site predictors contain the
best one-site predictors (Akt for epithelial, JNK for mesenchymal).
The best two-site predictors for the epithelial and mesenchymal states provide
about 0.5 µm/hr mean training error, whereas the full 11-site PLSR model provides
[Figure 3-18 plots: Epithelial and Mesenchymal panels for one-site (top row) and two-site (bottom row) models; x-axes, Mean Training Error [%].]
Figure 3-18: The same linear regression models as shown in Fig. 3-16, but now quantified
using percent error instead of absolute error. (top row) One-site model errors of epithelial
(blue) and mesenchymal (red) cell speeds. (bottom row) Two-site model errors of epithelial
(blue) and mesenchymal (red) cell speeds. This shows that the average percent training
and average percent test errors for the best 1- and 2-site models were about 5% and 10%,
respectively, for both the epithelial and mesenchymal states. All subplots' axes are drawn
to the same scale.
a mean training error of about 0.3 and 0.6 µm/hr for the epithelial and mesenchymal states, respectively. Thus the two-site linear regression and 11-site PLSR models
provide comparable training errors. However, the best two-site predictors for the epithelial and mesenchymal states provide about 1-1.5 µm/hr mean test error, whereas
the full 11-site PLSR model provides a mean test error of about 2 and 7 µm/hr
for the epithelial and mesenchymal states, respectively. Thus the two-site linear regression models have much lower mean test errors than the 11-site PLSR models.
These absolute error values for the linear regression models correspond to about 5%
average training error and about 10% average test error for the best-scoring models
(Fig. 3-18).
The training errors for the high-scoring 4-site reduced PLSR models are lower than
or comparable to the training errors associated with the best two-site linear regression
models. For the epithelial state, only the best 4-site PLSR models offer test errors
comparable to the best two-site linear regression models. For the mesenchymal state,
the best two-site linear regression models offer lower test error than the best 4-site
reduced PLSR models. Thus, the two-site linear regression models generally provide
lower test error than either the full 11-site or reduced 4-site PLSR models, for both
epithelial and mesenchymal cell states.
The most accurate one-site linear regression predictors for the epithelial and mesenchymal states, Akt and JNK, respectively, are also the two sites that were most frequently observed in the high-scoring 3-, 4-, and 5-site reduced PLSR models (Figs. 3-11, 3-12, and 3-13).
Indeed, JNK was significantly (p < 0.0011) enriched in the
high-scoring 3-, 4-, and 5-site reduced PLSR models for the mesenchymal state, while
Akt was significantly (p < 0.0011) enriched in the high-scoring 4- and 5-site reduced
PLSR models for the epithelial state (Fig. 3-14).
To gain a better understanding of why certain phosphosites were being included or
excluded from the high-scoring reduced PLSR and one- and two-site linear regression
models, we plotted the signals' AUC values versus cell speed in a univariate fashion
(Fig. 3-19). This is the simplest visual representation for the task of predicting cell
phenotype (here cell migration speed) from signaling data (here phosphosite time
course AUC values). Inspecting these plots, it becomes clear why some phosphosites
were useful one-site predictors and some were not. For example, in the epithelial
state, the Akt signal is reasonably linear with cell speed, whereas in the mesenchymal
state, the JNK signal is very linear with cell speed. This plot also allows one to
understand why some phosphosites performed poorly. For example, in the epithelial
state, HSP27 signal is correlated well with cell speed with the exception of the IGF
growth factor condition. Similarly, in the mesenchymal state, PKCδ is correlated well
with cell speed with the exception of the HGF growth factor condition.
The analyses discussed so far have developed separate models for the epithelial
and mesenchymal cell states. This is based on the notion that the signals governing cell migration may be different between the epithelial and mesenchymal states,
particularly since the epithelial signaling data were measured in the monolayer state,
whereas the mesenchymal signaling data were measured in the sparse state. This
notion is further supported by the fact that different signals do correlate well with
cell speed in the epithelial and mesenchymal states. However, it is also possible to
combine the signaling and migration data from the epithelial and mesenchymal states,
and in doing so create a sort of "pan-EMT" model. This approach assumes that, even though the signaling and migration data were collected across
different cell types (epithelial vs. mesenchymal) and in different contexts (monolayer
vs. sparse), the signals governing cell migration may be universal, in some sense, and
therefore be maintained across these diverse biological settings.
By plotting the signaling and cell speed data from the epithelial and mesenchymal
states on the same axes (Fig. 3-19, bottom two rows), we can see how signals relate to
cell speed if we combine the data. Such a pan-EMT model, which used all 11 signals in
a PLSR model, was presented in Figures 4 and 5 in Kim et al. [70]. When combining
data, the relationship between JNK signal and cell speed that was so strong in the
mesenchymal state is now lost, because the functional relationship between JNK and
cell speed in the epithelial state is fundamentally different than in the mesenchymal
state. Further, the relationship between Akt signal and cell speed that was strong in
the epithelial state is now much weaker in the combined state, because the relationship
between Akt and cell speed in the mesenchymal state is weak. Combining the data
can also change the sign of the signal-cell speed relationship (i.e., positive vs. negative
slope). In the epithelial state, Src and β-catenin are both negatively correlated with
cell speed (i.e., increasing the phosphorylation of those sites on Src and β-catenin
decreases cell speed). When combining data, both of these negative slopes are lost;
however, the combined data may highlight a biphasic relationship between Src and
β-catenin signal level and pan-EMT cell migration speed. In spite of these differences,
this chapter has focused on the biological and computational implications of separate
epithelial and mesenchymal state models, not a combined state model.
3.3 Discussion
3.3.1 Excerpt from Discussion in Kim et al.
Our objective in this report has been to investigate how activities in multiple signaling pathways downstream of a range of receptor tyrosine kinases are changed between
pre-EMT condition and post-EMT condition, especially with respect to their contributions to regulation of cell motility. We emphasize that we have not aimed to
investigate signaling pathway activities involved in or responsible for the act of EMT
induction per se, for that question has been addressed with great effectiveness by
several laboratories during the past decade (e.g., [71, 72, 73, 74, 75, 76]). We also
note that our analysis focuses on EMT induced by ectopic expression of the Twist1
transcription factor, for two reasons: first, Twist expression is strongly implicated
in clinical tumor biology; and, second, it may represent a relatively tightly defined
dysregulation, compared with extracellular inducers such as TGFβ and TNFα, which
alter expression of multiple EMT-associated transcription factors along with Twist
[77].
We have found that although both epithelial- and mesenchymal-like cells possess
migratory potential, their growth factor-elicited behavior is substantively distinct
with respect to contexts under which vigorous motility is exhibited. Epithelial cells
[Figure 3-19 plots: one subplot per phosphosite (AKT, ERK1/2, FAK, BCATENIN, SRC, GSK3A/B, HSP27, PKCD, JNK, IRS1, PLCG); x-axis, Signal Timecourse Integral; legend, Epithelial and Mesenchymal; data points labeled by growth factor condition.]
Figure 3-19: Signals plotted versus cell speed in a univariate fashion. Each subplot represents a different phosphosite's data. The x-data points represent the area under the curve
of each signal's time course for one condition. The y-data represent cell migration speed.
Blue and red data points represent epithelial and mesenchymal cell states respectively. The
first two subplot rows show only the epithelial data; the middle two subplot rows show only
the mesenchymal data; and the bottom two rows show both the epithelial and
mesenchymal data on the same axes. The y-axes are the same within each pair of rows
because the same cell speed data is plotted with each signal. EGF, HRG, IGF, HGF, and
PDGF represent RTK ligands; SF represents the serum-free condition.
are predominantly motile only within confluent monolayers in which cell-cell contacts
are maintained (consistent with previous findings [97, 98]), whereas mesenchymal-like cells are motile mainly as individual cells and exhibit this best when sparsely
distributed (Fig. 3-4). The responsiveness to any particular growth factor depends
on whether the cells are in an epithelial or mesenchymal-like state. With respect
to EMT-associated alterations in growth factor-induced signaling network activities
downstream of the stimuli/cues that might be critically involved in disparate motility
responses, we showed that quantitative and dynamic properties of numerous phosphoprotein signaling nodes were comparatively modulated from pre- to post-Twist conditions across the different growth factor treatments (Figs. 3-6, 3-7). PLSR analysis
successfully demonstrated that multipathway signaling information can be quantitatively integrated to account for motility behavior across all observed contexts (see Fig. 4 in [70]), and can even predict a priori the motility responses in both epithelial and
mesenchymal situations to treatment by an additional growth factor, PDGF (see Fig. 5 in [70]).
Finally, we analyzed each EMT-condition separately in order to identify
differences in signaling. Correlative topological modeling suggested Twist-dependent
differences in terms of a network-level explanation for disparate motility responses
(Fig. 3-8). Specifically, we propose a concept of "operational rewiring," in which the
dominance of particular nodes on motility is altered by quantitative modulation of
node-to-node influences.
3.3.2 Additional discussion
The additional computational analyses presented here that were not presented in Kim
et al. [70] have focused on building separate predictive models for cell migration in
the epithelial (pre-Twist) and mesenchymal (post-Twist) cell states. By quantifying
enrichment of phosphosites in high-scoring reduced PLSR models, and quantifying
the errors associated with all one- and two-site linear regression models, we have explicitly addressed which phosphosites are most predictive of cell speed in these two
cell states. These efforts both converged on the same conclusion: Akt and JNK are
most associated with cell migration in the epithelial and mesenchymal states, respectively. Neither of these signals is well correlated with cell speed when combining
epithelial and mesenchymal data in a pan-EMT model.
Regarding the pre-Twist versus post-Twist network models (Figs. 3-8 and 3-9),
an alternative visualization strategy for the differences in the node-to-node influences
between the two states is to simply plot the node-to-node correlation values in the
pre-Twist condition versus the same values in the post-Twist condition (Fig. 3-20).
The advantage of this visualization, compared to the graphically displayed network
models with nodes and edges drawn, is that it does not require one to select a threshold
and lose information related to correlation values that did not exceed the threshold.
Further, this visualization also allows one to immediately see which node-to-node
influences were most changed across the two conditions, versus which influences were
generally maintained across conditions. The downside of this visualization strategy
is that it allows visualization of only two conditions' node-to-node influences at one
time (although one could show additional conditions' dimensions using 3-dimensional
plotting, or by incorporating additional node size and/or node color schemes in a 2-D
plot). In this experiment only two conditions were considered, so it was sufficient;
however, in other experiments with n conditions to compare, one would need to
display n x (n - 1)/2 pairwise plots. From a biological perspective, perhaps most
interesting are node-to-node influences that changed sign from pre-Twist to post-Twist (e.g., β-catenin-GSK3α/β is negatively correlated pre-Twist, but positively
correlated post-Twist; and FAK-Src is weakly positively correlated pre-Twist, but
weakly negatively correlated post-Twist), which may highlight fundamental biological
changes during EMT.
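The comparison underlying this visualization can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the correlation values are invented for illustration, chosen only so that their signs match the two examples named above.

```python
from itertools import combinations


def sign_changes(pre, post):
    """Signal pairs whose correlation coefficient has opposite sign in the two conditions."""
    return sorted(pair for pair in pre if pair in post and pre[pair] * post[pair] < 0)


def largest_shifts(pre, post, top=3):
    """Signal pairs ranked by the absolute change in correlation between conditions."""
    shared = [pair for pair in pre if pair in post]
    return sorted(shared, key=lambda pair: abs(post[pair] - pre[pair]), reverse=True)[:top]


def pairwise_condition_plots(conditions):
    """All condition pairs needing their own scatter plot: n * (n - 1) / 2 of them."""
    return list(combinations(conditions, 2))


# Hypothetical Pearson coefficients, keyed by signal pair.
pre_twist = {
    ("BCATENIN", "GSK3A/B"): -0.5,
    ("FAK", "SRC"): 0.2,
    ("PLCG", "SRC"): 0.7,
}
post_twist = {
    ("BCATENIN", "GSK3A/B"): 0.6,
    ("FAK", "SRC"): -0.3,
    ("PLCG", "SRC"): 0.9,
}

flipped = sign_changes(pre_twist, post_twist)
most_changed = largest_shifts(pre_twist, post_twist, top=1)
plots_for_four_conditions = pairwise_condition_plots(["c1", "c2", "c3", "c4"])
```

No threshold is applied at any point, so every pair keeps its full pre-Twist and post-Twist values, and a pair that flips sign is found by a single product test.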
From a computational perspective, these reduced PLSR and linear regression results show that feature selection (i.e., using only subsets of the measured signals) can
significantly improve model accuracy when trying to predict phenotype from signaling
data. Both the reduced PLSR models and the two-site linear regression models can
be more accurate (lower mean test error) than PLSR models that use all measured
phosphorylation sites. The lower prediction accuracy of the full PLSR models stems
[Figure 3-20 plot: "Comparing Pre- and Post-Twist Correlation Coefficients"; x-axis, Pre-Twist Correlation Coefficient; legend: Post-Twist Only (p < 0.05), Pre-Twist Only (p < 0.05), Pre- and Post-Twist (p < 0.05); data points labeled by signal pair.]
Figure 3-20: Raw signal-signal correlation values in pre-Twist vs. post-Twist plotted against
one another. The x-data represent the Pearson correlation value between pairs of measured
signals in the pre-Twist state, whereas the y-data represent the Pearson correlation value
between the same pairs in the post-Twist state. Dashed boxes indicate borders of significance, including p < 0.1, p < 0.05, and p < 0.01, such that data points outside of a given
dashed box ("outside" in the horizontal direction for pre-Twist, and the vertical direction
for post-Twist) are significant to that level. Data points are colored to indicate significance
(p < 0.05) in the post-Twist state only (orange), pre-Twist state only (cyan), both states
(red), or neither (white).
from a fundamental feature of PLSR: the algorithm tries to not only capture variance between the measured signals and the phenotypic output(s), but it also tries to
capture the variance within the measured signals themselves. As such, signals with
high variance will be given more weight by the PLSR algorithm, even if that signal
does not correlate well with the phenotypic output. Here 'high variance' indicates the
variance associated with the mean-centered and unit variance-scaled values. Thus, in
the case when a signal has small variance in the original data prior to scaling, this
signal will subsequently have high variance in the unit variance-scaled data (because
dividing the original data by a small variance value will inflate the unit variance-scaled values). Thus, as a side point, one should have sufficient variability in the
original data values to reduce the likelihood of inflated unit variance-scaled values.
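The scaling effect described above can be illustrated with a short Python sketch. The two signals below are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption about the scaling convention.

```python
from statistics import mean, stdev


def autoscale(values):
    """Mean-center and scale to unit variance (sample standard deviation)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def scale_factor(values):
    """Factor by which each raw deviation from the mean is multiplied."""
    return 1.0 / stdev(values)


# Hypothetical AUC values across six conditions.
wide = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]          # varies strongly across conditions
narrow = [5.00, 5.01, 4.99, 5.02, 4.98, 5.00]   # nearly constant across conditions

wide_scaled = autoscale(wide)
narrow_scaled = autoscale(narrow)
amplification = scale_factor(narrow) / scale_factor(wide)
```

After scaling, both signals have unit variance, so the nearly constant signal contributes as much spread to the model as the strongly varying one; its raw fluctuations, which may be little more than measurement noise, have been multiplied by a much larger factor.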
It should also be noted that feature selection has generally not been explored in
published PLSR models (one exception is ref. [124], wherein the PLSR models were
implemented using plsregress in MATLAB). This is in part due to the underlying
methods used to build the PLSR models. In some software (e.g., SIMCA-P), building
many different PLSR models (e.g., using different sets of N-site models) is a laborious
process involving a lot of input from the user. In contrast, building PLSR models in a
more computer language-type environment (e.g., MATLAB) enables facile exploration
of thousands of different PLSR models automatically after entering a few lines of code.
Such computing barriers should be kept in mind when choosing modeling software.
From an experimental perspective, given that just two phosphosites can provide
reasonable predictions for cell speed, these results suggest that one could measure
these few sites in a signaling experiment in lieu of actually performing the migration
experiment. Such an approach could be useful if one wanted to obtain cell speed estimates across many different growth factor treatment conditions, for which obtaining
cell speed estimates would take an undesirably long time. To be even more practical
from an experimental perspective, one could repeat the model-building exercises using time point-specific signaling data instead of signal time course AUC values. That
way, one could simply measure signaling values at an individual time point(s) instead of needing to obtain measurements from an entire time course. This could save
substantial amounts of time compared to performing the cell migration experiments
directly.
From a biological perspective, these results indicate that separate signals are likely
driving cell migration when cells are in an epithelial monolayer state (3-4B, left)
versus a mesenchymal sparse (isolated single cells) state (3-4A, right). The importance of Akt in epithelial monolayer cell migration and JNK in mesenchymal sparse
cell migration is consistent with literature regarding epithelial versus mesenchymal
cell biology. It has been shown in Madin-Darby canine kidney (MDCK) epithelial
cells that the engagement of E-cadherins is necessary and sufficient for the induction of Akt activity upon adherens junction assembly [125]. E-cadherins are integral
membrane glycoproteins that serve as adhesion receptors and promote homophilic
calcium-dependent cell-cell interactions. They are found within adherens-type junctions in epithelia [125]. Adherens junctions are dynamic structures that physically
connect neighboring epithelial cells, and also couple intercellular adhesive contacts to
the cytoskeleton [126].
JNK is known to phosphorylate paxillin and regulate sparse cell migration; a small
molecule inhibitor of JNK, SP600125, inhibited the directed movement of sparsely
distributed keratocyte cells [127]. Paxillin is a focal adhesion-associated, tyrosine-phosphorylated adapter protein [128] that is also involved in signaling from integrins
[129]. Phosphorylated JNK is known to localize at focal adhesions [130]. Focal
adhesions are macromolecular protein complexes that transmit the effects of the extracellular matrix to the actin cytoskeleton through integrins [131].
These observations draw links between Akt signaling and cell-cell adhesion, and
JNK signaling and cell-substrate adhesion. These results are consistent with the experimental design used here: the pre-Twist epithelial cells were observed in a monolayer state in which cell-cell contacts were maintained, while the post-Twist mesenchymal cells were observed in a single cell sparse state in which cell-cell contacts
were not maintained, but cell-substrate contacts were. Further, the observation that
pre-Twist cells in the monolayer condition moved in a sheet-like manner (for a discussion of sheet-like, or collective, cell migration see ref. [132]), while post-Twist cells in
the monolayer condition moved individually (Aaron Meyer, personal communication,
August 8, 2010), suggests that post-Twist cells do not form cell-cell adhesions even
when the cells are near one another in a monolayer. The observations are consistent
with the results showing that pre-Twist cells in a monolayer state maintain their
E-cadherin junctions, but that post-Twist cells in a monolayer state do not (Fig. 3-2).
Additionally, further experimental testing has validated the computational predictions about the importance of JNK signaling in mesenchymal cell migration. Each of
three different small molecule inhibitors of JNK (JNK-IN-8, EMD Millipore; TCS-6o,
Tocris Bioscience; and SP600125, Selleck Chemicals) significantly
reduced the migration of 12Z endometrial cells. Further, SP600125 significantly reduced the migration of MDA-MB-231 "triple-negative" breast cancer cells; the other
two inhibitors were not tested against the MDA-MB-231 cells (Miles Miller, personal
communication, January 8, 2013). While these cell lines are derived from the epithelium,
both lines exhibit essentially post-EMT mesenchymal features.
Taken together, these results suggest that pre-Twist epithelial cell migration is
governed by the adhesivity of cell-cell contacts mediated by adherens junctions, while
post-Twist mesenchymal cell migration is governed by the adhesivity of cell-substrate
contacts mediated by focal adhesions. In this manner, the pre-Twist epithelial monolayer migration results may actually reflect non-pathological, sheet-like migration
phenomena associated with wound healing and development, whereas the post-Twist
mesenchymal sparse migration results may reflect the pathophysiological, invasive
type of migration more relevant in cancer. Thus, the transition from physiological
to pathophysiological cell migration may be driven by a transition from cell-cell to
cell-substrate dependent adhesion and migration.
The results presented here, through a combination of cell signaling and migration
experimental data, computational analyses, validation of computational predictions,
and literature review, demonstrate that JNK is a key mediator of mesenchymal cell
migration. Further, there is growing evidence for the role of JNK in not just migration,
but in mediating EMT itself [133, 134, 135]. To strengthen the hypothesis about EMT
as a transition from cell-cell to cell-substrate adhesion, future experiments would have
to quantify the presence of focal adhesions in the epithelial and mesenchymal states.
It remains unclear whether epithelial cell migration depends on both
cell-cell and cell-substrate adhesion, with EMT eliminating cell-cell adhesion
while cell-substrate adhesion remains, or whether cell-substrate adhesion is weak in the
epithelial state, such that EMT represents a transition from primarily cell-cell adhesion to
primarily cell-substrate adhesion.
3.4 Methods
For a full description of the experimental materials and methods, see Kim et al. [70].
3.4.1 Correlation network modeling
Pairwise Pearson correlation was used to quantify the relatedness between signaling
nodes in the epithelial and mesenchymal cell states. First, the geometric mean of the
phosphorylation fold-change relative to time zero from two to three biological replicates was calculated for each nonreceptor phosphosite time course. Using only the
four nonzero time points across five growth factor treatments, this gave 20 data points
per phosphosite per cell state. Given 11 nonreceptor phosphosites, the Pearson correlation was then calculated between each pair of phosphosites using this 11 x 20 data
matrix. The p-values for nonzero correlation were calculated using a Student's t distribution for a transformation of the correlation. This provided (11 x 10)/2 = 55 unique
pairwise correlation coefficients and p-values, neglecting self-correlations. Three separate methods were applied to account for multiple hypothesis testing: Bonferroni
[118], Benjamini [117], and Storey [136]. Bonferroni is the most conservative, and
Storey the least conservative, of these alternative methods, with respect to assigning
statistical significance to correlations.
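The correlation workflow above can be sketched as follows, using only the Python standard library. This is an illustrative sketch rather than the code used in the thesis: the function names are hypothetical, the "Benjamini" method is assumed here to be the Benjamini-Hochberg step-up procedure, the Storey method is not implemented, and converting the t statistic to a p-value is left to a statistics library.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """Transformation of r that follows a Student's t distribution with
    n - 2 degrees of freedom under the null hypothesis of zero correlation."""
    return r * sqrt((n - 2) / (1 - r * r))

def bonferroni_reject(pvals, alpha=0.05):
    """Reject when p is below alpha divided by the number of tests."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]

def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= k/m * alpha
    and reject the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

# 11 phosphosites give (11 x 10)/2 unique pairs, neglecting self-correlations.
pairs = list(combinations(range(11), 2))
print(len(pairs))

# Hypothetical p-values showing that Bonferroni is the more conservative choice.
example_p = [0.001, 0.02, 0.03, 0.5]
print(bonferroni_reject(example_p))
print(benjamini_hochberg_reject(example_p))
```

With 20 data points per phosphosite, `t_statistic(r, 20)` would be compared against a t distribution with 18 degrees of freedom.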
3.4.2 Reduced PLSR models
To determine subsets of phosphorylation sites, out of the 11 non-receptor sites measured, that were most predictive of cell speed, reduced PLSR models were created in
which N = 3, 4, or 5 of the 11 sites were used in the model. In each case, all combinations
of N sites were considered. For example, there are "11 choose 4", or 330, 4-site
models to create; separate models were analyzed for the epithelial and mesenchymal
cell states. The data used to create the reduced PLSR models were the integral values
of the time courses from the 11 non-receptor phosphosites across the six experimental
conditions (serum-free, EGF, HRG, IGF, HGF, and PDGF treatment), providing for
66 data points. Data values were mean-centered and scaled to unit variance for each
phosphosite prior to building the PLSR models. All reduced PLSR models used two
principal components and were implemented in MATLAB (MathWorks, Natick, MA)
using the plsregress function.
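The number of reduced models per cell state follows directly from the binomial coefficient. The short sketch below enumerates the site subsets; the site names are hypothetical placeholders, not the measured phosphosites.

```python
from itertools import combinations
from math import comb

# Hypothetical placeholder names for the 11 non-receptor phosphosites.
sites = [f"site_{i}" for i in range(1, 12)]

# Enumerate every N-site subset for N = 3, 4, and 5.
model_counts = {n: len(list(combinations(sites, n))) for n in (3, 4, 5)}

for n, count in model_counts.items():
    # The enumerated count must equal "11 choose n".
    assert count == comb(11, n)
    print(f"{n}-site models per cell state: {count}")
```

Each enumerated subset would then be passed to the model-building routine, once for each cell state.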
Given the six experimental conditions, each subset of sites was used to train a
model on five of the experimental conditions and use the resultant model to predict
the cell speed in the sixth left out condition. For each subset of sites, the arithmetic
mean training error and arithmetic mean test error were calculated across the six experimental conditions. The errors were the absolute difference between the predicted
and observed cell speeds. High-scoring reduced PLSR models were denoted by their
ability to have both a lower mean training error and a lower mean test error than
the full 11-site PLSR model. To compare different high-scoring reduced models, their
distance from the origin in a plot of training error versus test error was calculated.
These distances relative to the distance from the origin of the 11-site model's errors
were then plotted to compare the quality of the reduced models to the 11-site model
(Reduced/Full Model Error, Figs. 3-11, 3-12, and 3-13).
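The leave-one-condition-out evaluation and the Reduced/Full Model Error ratio can be sketched as below. This is a sketch under stated assumptions: the function names are hypothetical, the mean training error is taken as the average over folds of the per-fold mean absolute training error, and the model is a trivial stand-in that predicts the mean training output (it is not PLSR, which the thesis implemented with plsregress in MATLAB).

```python
from math import hypot

def leave_one_out_errors(X, y, fit, predict):
    """Hold out each condition once; return (mean training error,
    mean test error), both as absolute differences."""
    train_errs, test_errs = [], []
    for held_out in range(len(y)):
        X_train = [row for i, row in enumerate(X) if i != held_out]
        y_train = [v for i, v in enumerate(y) if i != held_out]
        model = fit(X_train, y_train)
        fold = [abs(predict(model, row) - v) for row, v in zip(X_train, y_train)]
        train_errs.append(sum(fold) / len(fold))
        test_errs.append(abs(predict(model, X[held_out]) - y[held_out]))
    return sum(train_errs) / len(train_errs), sum(test_errs) / len(test_errs)

def is_high_scoring(reduced, full):
    """Reduced model beats the full model on both training and test error."""
    return reduced[0] < full[0] and reduced[1] < full[1]

def relative_error_distance(reduced, full):
    """Distance from the origin in (training error, test error) space,
    expressed relative to the full model."""
    return hypot(*reduced) / hypot(*full)

# Stand-in model (NOT PLSR): always predict the mean of the training outputs.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, row):
    return model

# Hypothetical data: six conditions, one dummy feature, made-up cell speeds.
X = [[0.0]] * 6
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_err, test_err = leave_one_out_errors(X, y, fit_mean, predict_mean)
print(train_err, test_err)
print(relative_error_distance((3.0, 4.0), (6.0, 8.0)))
```

A ratio below 1 together with `is_high_scoring` returning True corresponds to a reduced model that lies closer to the origin than the 11-site model.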
The significance of observed phosphosite frequencies in the reduced PLSR models
using 3, 4 and 5 phosphosites is summarized in Fig. 3-14. A two-tailed hypergeometric test was performed to determine if certain sites appeared in the high-scoring
models more or less often than expected by chance. For the Bonferroni correction,
maximum and minimum frequency thresholds were set such that the cumulative likelihood
of observing a site more often than the maximum frequency, or less often than the minimum
frequency, corresponded to a Bonferroni-corrected p < 0.05 (p < 0.025/22,
or p < 0.0011, for each tail). The Bonferroni correction of 22 was chosen given
the 11 phosphosites across the two cell states. False discovery rate corrections were
calculated based on the Benjamini method [117].
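A minimal sketch of the two-tailed hypergeometric test is given below for the 4-site case. The counts of high-scoring models are hypothetical, the function names are assumptions, and the tails are computed inclusively (at least or at most the observed count), which may differ slightly from the exact convention used in the thesis.

```python
from math import comb

def hypergeom_pmf(k, M, K, n):
    """P(X = k) when drawing n items without replacement from M items,
    K of which are successes."""
    return comb(K, k) * comb(M - K, n - k) / comb(M, n)

def hypergeom_tails(k_obs, M, K, n):
    """Return (P(X <= k_obs), P(X >= k_obs))."""
    lo, hi = max(0, n - (M - K)), min(n, K)
    lower = sum(hypergeom_pmf(k, M, K, n) for k in range(lo, k_obs + 1))
    upper = sum(hypergeom_pmf(k, M, K, n) for k in range(k_obs, hi + 1))
    return lower, upper

n_sites, subset_size = 11, 4
M = comb(n_sites, subset_size)          # all possible 4-site models
K = comb(n_sites - 1, subset_size - 1)  # models containing one given site
per_tail_alpha = 0.025 / 22             # 11 phosphosites x 2 cell states

# Hypothetical counts: 40 high-scoring models, 30 of which contain the site.
n_high_scoring, n_with_site = 40, 30
lower_p, upper_p = hypergeom_tails(n_with_site, M, K, n_high_scoring)
enriched = upper_p < per_tail_alpha
depleted = lower_p < per_tail_alpha
print(M, K, per_tail_alpha, enriched, depleted)
```

Under the null hypothesis, a given site is expected in about 40 x 120/330, or roughly 15, of the 40 high-scoring models, so the hypothetical count of 30 falls far in the upper tail.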
Chapter 4
Receptor tyrosine kinases fall into distinct classes based on their inferred signaling networks
Note: This chapter forms the basis of a manuscript that has been submitted for publication,
Wagner and Wolf-Yadlin et al. (2013) [137]. The author contributions for that manuscript
are as follows: A.W.-Y., M.S., and G.M. designed experimental research. A.W.-Y. and
M.S. performed experimental research. J.K.G. and D.E.R. contributed shRNA reagents
and support. J.P.W. and D.A.L. designed computational research. J.P.W. performed all
computational research and data analysis following extraction of the microarray data (including data pre-processing, quantifying shRNA effects, network inference, application of
CCLE data, and study of receptor-intrinsic properties). J.P.W. designed all figures except
portions of Fig. 4-1. J.P.W., A.W.-Y., M.S., D.A.L., and G.M. wrote the paper.
4.1 Introduction
Receptor tyrosine kinases (RTKs) are critical effectors of cell fate that are expressed
ubiquitously during development and throughout the adult body. Fifty-eight RTKs
are encoded within the human genome, belonging to 20 subfamilies as defined by
genetic phylogeny [1]. RTKs initiate intracellular signaling events that elicit diverse
cellular responses such as survival, proliferation, differentiation, and motility [138].
Dysregulation of RTK-activated pathways, often a consequence of receptor overexpression, gene amplification, and/or genetic mutation, is a causal factor underlying
numerous cancers, leading to an increasing number of FDA-approved RTK-targeted
therapies [1].
It has become increasingly clear that co-activation of multiple RTKs limits the
efficacy of RTK-targeted therapies [52] and can serve as a mechanism of acquired
resistance (e.g., [139, 140, 141]). Recent work has also shown that stimulation of
tumor cells with certain RTK ligands can rescue cells from therapies targeting other
RTKs [142, 143]. Thus, it seems that certain RTKs have sufficient redundancy to
compensate for other RTKs upon targeted inhibition. Exactly which RTKs exhibit
this redundancy and why remains unclear. Here, using a set of engineered isogenic cell
lines, we measured the dynamic signaling networks of six RTKs while simultaneously
perturbing thirty-eight different signaling nodes singly and in combination using RNA
interference (RNAi). Applying multiple computational network inference approaches
to the data, we found that certain RTKs exhibit functional redundancy because they
are able to induce similar downstream signaling networks. The six RTKs studied here
fall into three classes based on their inferred networks, and these classes are consistent
with clinically observed modes of resistance to RTK-targeted therapies.
4.2 Results
4.2.1 A systematic perturbation-based approach to uncover RTK-specific signaling networks
Reverse engineering of biological networks is an attempt to infer the underlying structure of regulatory networks from gene expression or signal transduction data using
computational network inference algorithms [19]. Although these approaches often
uncover important regulatory interactions, spurious correlations in gene expression
or protein activation levels make it difficult to isolate direct, causal interactions. To
[Figure 4-1 schematic. Six isogenic, RTK-specific cell lines (EGFR, FGFR1, c-Met, IGF-1R, NTRK2, PDGFRβ) are infected with lentiviral shRNA expression vectors targeting signaling nodes; each cell line is stimulated with its RTK-specific ligand (0 min to 256 min); lysate microarrays are printed and probed with PTM-specific antibodies; data processing yields RTK-specific network-level data in biological quadruplicate.]
Figure 4-1: Data-rich, perturbation-based profiling uncovers RTK-specific signaling networks. Six isogenic cell lines expressing either EGFR, FGFR1, c-Met, IGF-1R, NTRK2
or PDGFRβ were treated with lentiviral shRNA expression vectors to modulate the cellular abundance of 38 downstream signaling proteins. Upon stimulation with RTK-specific
ligands, time-dependent signaling events were monitored using high-throughput lysate microarrays. The resulting compendium of signaling measurements, consisting of over half a
million individual data points, served as a starting point for computational analysis, allowing insight into the mechanisms underlying RTK specificity.
circumvent this limitation, a number of efforts have used targeted perturbations [16],
sometimes in conjunction with dynamic measurements [144, 145], to constrain network topology and to infer directionality between nodes.
Here, we used this strategy to infer the topology of RTK-activated signaling networks by systematically perturbing network nodes using RNAi and broadly measuring
network dynamics under each perturbation condition using high-throughput lysate
microarrays (Fig. 4-1). We focused on a representative subset of six phylogenetically
diverse RTKs: epidermal growth factor receptor (EGFR or ErbB1), fibroblast growth
factor receptor 1 (FGFR1), insulin-like growth factor 1 receptor (IGF-1R), hepatocyte
growth factor receptor (c-Met), neurotrophic tyrosine kinase receptor type 2 (NTRK2
or TrkB), and platelet-derived growth factor receptor beta (PDGFRβ). To isolate the
unique features of each RTK from potentially confounding differences in gene expression levels, we used a set of six otherwise isogenic cell lines, each of which expresses
one of the six RTKs at comparable levels, and in which downstream signaling can be
activated by treatment with cognate ligand [146]. Thirty-eight proteins within these
cell lines were systematically perturbed by lentivirus-mediated RNAi [147], individually (Table S1 in ref. [137]) or in pools (Table S2 in ref. [137]), using a total of 88
short hairpin RNA (shRNA) interventions with a median of 77% knockdown
efficiency. To account for possible off-target reactivity of the RNAi reagents, two
different shRNA clones, targeting different regions of the same transcript, were used
for each gene.
Our perturbations broadly covered the pathways downstream of RTKs, notably
the PI3K/Akt, Ras/MAPK and PLCγ/PKC/Ca2+ pathways [148], as well as a number
of phosphatases, cytoskeletal components, and receptor-proximal adaptor proteins. For each cell line and each RNAi intervention, we followed signaling activity
by treating the cell lines with RTK ligands for 10 different durations ranging from
1 to 256 minutes (plus a zero time point control for each). Performing all experiments in biological quadruplicate, over 24,000 unique lysates were collected in our
study. To query the state of activation across key signaling pathways in each lysate
in a multiplex fashion, we used lysate ("reverse-phase") microarray technology [40].
Using antibodies we had validated in a previous study, and using methodology developed therein [149], we quantified the relative levels of 22 phosphorylation sites on
21 signaling proteins in each of the >24,000 lysates (Table S3 in ref. [137]). Collectively, these signaling measurements, comprising over half a million independent data
points, report on the state of the receptor/adaptor layer of signaling, the MAPK, Akt,
PKC and calcium signaling cascades, and a variety of transcription factors. To our
knowledge, this constitutes the largest signaling data set recorded to date. It is our
hope that these data prove useful to the signaling and systems biology communities.
Processed raw data and discretized median data are provided in Tables S4 and S5 (in
ref. [137]), respectively.
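The stated data set size can be checked with simple arithmetic. The sketch below assumes that the lysate count includes the three control shRNAs described in Section 4.2.2 in addition to the 88 network-targeted shRNAs; this assumption is not stated explicitly in the text, but without the controls the product would fall below 24,000.

```python
n_rtks = 6
n_network_shrnas = 88
n_control_shrnas = 3   # assumption: controls are counted among the lysates
n_time_points = 11     # 10 stimulation durations plus the zero time point
n_replicates = 4       # biological quadruplicate
n_phosphosites = 22

n_lysates = (n_rtks * (n_network_shrnas + n_control_shrnas)
             * n_time_points * n_replicates)
n_data_points = n_lysates * n_phosphosites

print(f"lysates: {n_lysates:,}")
print(f"data points: {n_data_points:,}")
```

Under this assumption the totals are consistent with "over 24,000 unique lysates" and "over half a million" data points.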
4.2.2 RNAi perturbations reveal conserved Akt, MAPK, and PKC pathways across six RTKs
To assess the effect of each RNAi intervention on each measured network node, shRNA
knockdown effects were first quantified for all 88 network-targeted shRNAs (shRNAs
targeting signaling proteins) relative to three control shRNAs (two shRNAs targeting
GFP and one empty-vector control). For every shRNA perturbation, we calculated
the area under the curve (AUC) across the 11-point time courses; AUC was calculated
separately for each RTK, each signal, and each biological replicate. A hairpin was
considered to affect a signal only if its AUC was significantly greater than or less
than the AUC values of all three controls (1% false discovery rate (FDR), two-sample
t-test). Example hairpin effects are shown in Fig. 4-2A.
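The AUC calculation can be sketched as follows. The trapezoidal rule is an assumption (the integration method is not specified in the text), the time points are those listed in Section 4.2.3, and the helper `direction_vs_controls` is a hypothetical, purely directional screen: it does not perform the two-sample t-test or the FDR correction required by the actual criterion.

```python
# Time points in minutes: zero time point plus 10 stimulation durations.
TIME_POINTS_MIN = [0, 1, 2, 4, 8, 16, 32, 64, 96, 128, 256]

def auc_trapezoid(times, values):
    """Area under a time course using the trapezoidal rule."""
    return sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))

def direction_vs_controls(hairpin_auc, control_aucs):
    """Directional screen only: is the hairpin AUC above or below ALL
    controls? Statistical significance must be assessed separately."""
    if all(hairpin_auc > c for c in control_aucs):
        return "increase"
    if all(hairpin_auc < c for c in control_aucs):
        return "decrease"
    return "none"

# Hypothetical flat time course with signal 1.0 at every time point.
flat_auc = auc_trapezoid(TIME_POINTS_MIN, [1.0] * 11)
print(flat_auc)
print(direction_vs_controls(5.0, [1.0, 2.0, 3.0]))
```

Because the time points are unevenly spaced, late time points span wider intervals and therefore contribute more to the AUC than early ones.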
We confirmed consistency among the biological replicates: 75% of RTK-signal
pairs had a coefficient of variation < 10% (Fig. 4-2B). Additionally, Pearson correlation coefficients between the network-wide responses for pairs of shRNAs targeting
the same gene exceeded 0.85 for 95% of the measured signals (Fig. 4-2C). shRNA
pairs exhibiting lower correlation values generally did so in multiple cell lines (e.g.,
MEK2 has low correlation values for multiple cell lines), suggesting biological, rather
than technical, origins for the variability. A conservative summary of network-wide
[Figure 4-2, panels A-E. Legible labels: "Minutes of IGF-1 stimulus" (time-course axis), "shRNA Target" (heatmap axis), and a legend giving the number of RTKs affected, from "No RTKs" to "6 RTKs", for increased and decreased signals. Heatmap rows list the measured phosphosites.]
Figure 4-2: Perturbations reveal specificity in RTK-induced signal transduction. (A) Time
courses showing example shRNA perturbations that increase, decrease, and have no effect
on p-ERK1/2 signaling in the IGF-1R cell line. Values shown are averages ± standard
deviations of four biological replicates at each time point. Solid and empty squares of
the same color represent data from two different shRNAs. (B) The coefficient of variation
(c.v.) across four biological replicates was calculated at each time point under each shRNA
perturbation for each measured signal. The median of the resultant c.v. values is shown.
(C) The Pearson correlation between measured signals resulting from two shRNAs targeting
the same gene, when considering all signals and time points together. (D) All six RTKs
induce the canonical downstream pathways Ras/MAPK, PI3K/Akt and PKC/Ca2+, but
each RTK activates its own complement of non-canonical signaling events. Phosphosite-perturbation pairs are categorized into increased (red shades) or decreased (blue shades)
measured signals relative to shRNA controls, based on a threshold of statistical significance.
Yellow outlines denote data points where the protein products of shRNA targets contain measured
phosphosites. (E) The large number of shRNA-induced increases in signaling across 1-3
RTKs indicates that negative regulatory interactions may play a key role in conferring
specificity to RTK signaling networks.
Figure 4-3: shRNA effects for individual shRNAs using a 1% Storey FDR correction (less
conservative). The representation chosen here contrasts with Fig. 4-2, in which the heatmap
only indicated cases where both shRNAs gave significant and consistent effects. Stacked bars
indicate binary increase/decrease effects. For a given phosphosite row, stacked bars above or
below the horizontal line indicate increased or decreased signal effects, respectively. The six
colors represent the six RTKs by the same color scheme as used in the main text, as initially
introduced in Fig. 4-1: cyan, EGFR; purple, FGFR1; yellow, IGF-1R; red, c-Met; green,
NTRK2; blue, PDGFRβ. Asterisks indicate shRNAs used in pools. Phosphosite rows and
shRNA columns are organized by approximate pathway membership. Thin vertical
lines separate different shRNA pools. Thick vertical lines separate pathways.
Figure 4-4: shRNA effects for individual shRNAs using a 1% Benjamini FDR correction
(more conservative).
[Figure on this page: the graphic and most of the caption are not legible. Recoverable caption fragments: values summarized across four biological replicates; AUC of control; asterisks indicate shRNAs used in pools; phosphosite rows and shRNA columns are organized by approximate pathway membership; thin vertical lines separate different shRNA pools.]
shRNA effects, showing only significant effects (1% FDR) that were observed consistently with both hairpin clones, is shown in Fig. 4-2D. Receptor-specific shRNA
effects, including effects that were observed for only one of the two shRNA clones
targeting each gene, are provided in Figs. 4-3, 4-4, and 4-5.
Tallying the number of significant perturbation effects across RTKs revealed that
network connections within the Akt, MAPK, and PKC pathways are uniquely conserved across RTKs. Perturbations within each of these pathways, specifically (i)
PI3K → PDPK1 → Akt → GSK3, (ii) Raf → MEK → ERK → p90RSK, and (iii) PLCγ → PKCβ/δ → MARCKS, are propagated throughout the entire pathway in the majority
of cases across all six RTKs. In contrast, most other perturbation effects are observed
only across small subsets of RTKs. Additionally, the directionality of relationships
between signaling nodes is highly conserved across RTKs. Considering only shRNA
effects that are consistent across both hairpin clones, all 107 of 107 shRNA effects observed in at least two cell lines affect the signal in the same direction (either increased
or decreased) across cell lines. When pooled shRNAs are included, more than 96%
(184/191) of shRNA effects show consistent directional effects. Even when pairs of
shRNA clones that are not consistent with each other are included, 92% (565/612) of
shRNA effects are directionally consistent. These results indicate that perturbation
sensitivities across RTK-activated signaling networks, if they are present, are generally conserved (i.e., no reversal in directionality), but that RTKs use distinct subsets
of the available RTK connectivity space.
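The directional-consistency tally can be sketched as below. The data structure and the shRNA and signal names are hypothetical and serve only to show the counting logic; the final loop reproduces the percentages quoted above from the reported counts.

```python
def directional_consistency(effects_by_pair):
    """effects_by_pair maps (shRNA, phosphosite) to {cell line: +1 or -1}.
    Among pairs with an effect in at least two cell lines, count those
    whose effects all point in the same direction."""
    shared = {k: v for k, v in effects_by_pair.items() if len(v) >= 2}
    consistent = sum(1 for v in shared.values() if len(set(v.values())) == 1)
    return consistent, len(shared)

# Hypothetical toy input (names and values are illustrative only).
toy_effects = {
    ("shA", "p-ERK1/2"): {"EGFR": -1, "c-Met": -1},  # consistent decrease
    ("shB", "p-Akt"): {"EGFR": +1, "FGFR1": -1},     # direction reverses
    ("shC", "p-Akt"): {"EGFR": +1},                  # one cell line: ignored
}
print(directional_consistency(toy_effects))

# Percentages for the counts reported in the text.
for consistent, total in [(107, 107), (184, 191), (565, 612)]:
    print(f"{consistent}/{total} = {100 * consistent / total:.1f}%")
```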
In addition to many reduced signals, some phosphorylation sites exhibit increased
levels following shRNA-mediated perturbations. These increases, however, tend to
be conserved across fewer RTKs. Only one increased signal is observed across all
six RTKs (p-MEK1/2 increases in response to ERK knockdown by the ERK shRNA
pool), and only three increases are observed across five RTKs (p-MEK1/2 and pERK1/2 both increase in response to the GSK3 shRNA pool, and p-Akt increases in
response to the PTPN pool). Notably, robust feedback within the MAPK pathway
and crosstalk between the Akt and MAPK pathways are observed across all six RTKs.
Although these regulatory events have been observed by others (feedback from Erk
to Sos and from Erk to c-Raf [150], and negative regulation of MEK/ERK
by GSK3 [151]), the extent to which they are conserved across RTKs was not
previously appreciated. It is also notable that knockdowns resulting in increased
signals across five or six RTKs are observed only with shRNA pools but not with
single shRNAs, suggesting that protein isoform-specific roles across RTKs can be
overcome by concurrently perturbing multiple isoforms.
The specificity of shRNA effects is quantified in Fig. 4-2E, summarizing the number of each type of colored square in Fig. 4-2D. This summary shows that most
perturbations affect only 1-3 RTKs. Most pan-specific effects (affecting five or six
RTKs) are decreases in signals at phosphosites within the MAPK, Akt, and PKC
pathways. The observed effects across RTKs and across the measured signals are
significant relative to a model that assumes shRNA effects occur randomly (Fig. 4-6,
χ² test, p = 0). Thus, although the RTKs share many of the same pathways, they
exhibit different levels of sensitivity to targeted perturbations.
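The random model referred to above (Fig. 4-6) can be sketched as a simple simulation. This is an illustrative reconstruction rather than the thesis code: the function name, the random seed, and the reduced number of repetitions (50 instead of 2,500) are assumptions made to keep the example short. Only the tally across RTKs for the 1,232 increased-signal effects is shown.

```python
import random
from collections import Counter

N_RTKS, N_SHRNAS, N_SITES = 6, 88, 22
N_TRIPLES = N_RTKS * N_SHRNAS * N_SITES  # 11,616 RTK-shRNA-phosphosite slots

def simulate_rtk_tally(n_effects, rng):
    """Place n_effects at random among all slots, then count, for every
    shRNA-phosphosite pair, how many of the six RTKs are affected.
    Returns a Counter {number of RTKs affected: number of pairs}."""
    chosen = rng.sample(range(N_TRIPLES), n_effects)
    # Slot index = pair index * 6 + RTK index, so integer division by 6
    # recovers the shRNA-phosphosite pair.
    per_pair = Counter(idx // N_RTKS for idx in chosen)
    tally = Counter(per_pair.values())
    tally[0] = N_SHRNAS * N_SITES - len(per_pair)
    return tally

rng = random.Random(0)
n_sims = 50  # the thesis reports 2,500 repetitions
runs = [simulate_rtk_tally(1232, rng) for _ in range(n_sims)]
average = {k: sum(run[k] for run in runs) / n_sims for k in range(N_RTKS + 1)}
print(average)
```

Comparing the observed tally with the distribution of simulated tallies gives an empirical p-value for each number of RTKs affected.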
Changes in phosphorylation signals following shRNA perturbations can arise from
one or more of the following phenomena: (1) reduction in the concentration of a
kinase or phosphatase, directly affecting the phosphorylation levels of its substrate;
(2) transcriptional, translational, or post-translational feedback or compensation in
the network; and (3) modulation of scaffold or protein complex stoichiometries [152],
including decreases in the concentrations of proteins to which phosphatases dock. We
expect the non-specific effect of lentiviral infection itself to be minimal because, rather
than being dominated by a possible "infection signal", many hairpins exhibit different
effects on the signaling network compared to empty vector and GFP-targeted control
hairpins.
Post-translational feedback in the network may function through feedback reactions (such as Erk phosphorylating inhibitory sites on c-Raf/Raf-1, ref. [150]), or
through more indirect effects. As an example of the latter, competitive inhibitiontype effects may occur when there is an increase in the amount of enzyme available
for a given substrate following a reduction in the concentration of one of that enzyme's other substrates [153]. This effect has been called retroactivity. For example,
[Figure 4-6 panels. (A) Number of increased-signal and decreased-signal effects versus number of RTKs affected. (B) The same versus number of sites affected. Series shown: 2,500 simulations, simulation average, theoretical (hypergeometric), and observed.]
Figure 4-6: Observed shRNA-induced effects across (A) RTKs and (B) phosphosites are
not consistent with a model in which shRNA effects are randomly distributed. The total
number of increased (1,232) and decreased (1,346) signal effects resulting from the Storey
1% FDR correction were randomly distributed in silico among the 11,616 RTK-shRNA-phosphosite pairs (6 RTKs x 88 shRNAs x 22 phosphosites). The number of increased
and decreased signal effects was then tallied across the RTKs and across the phosphosites.
This simulation was repeated 2,500 times. The simulation average converged to the hypergeometric distribution. Because the distributions of randomly distributed shRNA effects
are not consistent with the distributions of observed shRNA effects, often with empirical
p << 1/2,500 (4 x 10^-4) when comparing individual values in the distributions, we conclude that the distributions of shRNA effects across RTKs and across phosphosites are
significantly non-random.
if a kinase has multiple substrates, reducing the concentration of one substrate may
increase phosphorylation of its other substrates. Similarly, if a phosphatase has multiple substrates, reducing the concentration of one may decrease phosphorylation of
its others. This concept has been considered theoretically in the context of kinase
inhibitors [154], where inhibiting a kinase can turn on a quiescent parallel pathway.
To our knowledge, however, this has not been considered in the context of RNAi perturbations. While demonstrating specific cases of this effect is not our goal here, we
simply submit that some shRNA perturbation effects may stem from this nonintuitive
indirect phenomenon.
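One simple way to see how such an indirect effect could arise is the textbook rate law for two substrates competing for the same enzyme under Michaelis-Menten kinetics. The sketch below is an illustration under that assumption, with arbitrary parameter values; it is not the model analyzed in refs. [153, 154].

```python
def phosphorylation_rate(s1, s2, vmax1=1.0, km1=1.0, km2=1.0):
    """Rate of phosphorylation of substrate 1 when substrate 2 competes
    for the same kinase:
        v1 = vmax1 * (s1/km1) / (1 + s1/km1 + s2/km2)
    """
    return vmax1 * (s1 / km1) / (1 + s1 / km1 + s2 / km2)

# Arbitrary concentrations: knocking down substrate 2 from 4.0 to 1.0
# frees kinase for substrate 1.
before = phosphorylation_rate(s1=1.0, s2=4.0)
after = phosphorylation_rate(s1=1.0, s2=1.0)
print(before, after)
```

In this toy setting, reducing the competing substrate increases the phosphorylation rate of the remaining substrate even though the two substrates never interact directly.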
Lastly, observed shRNA effects are not just a function of which proteins are phosphorylated by a kinase (or dephosphorylated by a phosphatase), but rather which
residues are phosphorylated or dephosphorylated. Thus, the absence of an shRNA
effect at a particular phosphosite does not necessarily imply that the corresponding
proteins are not connected, as they may functionally relate through a phosphosite
that is not measured in our study.
4.2.3
Data-driven network inference reveals three RTK classes
To better understand the signaling network topology and dynamics underlying these
perturbation-induced effects, network inference was performed using each cell line's
data separately. The zero minute time point was separated to represent the basal,
unstimulated network state, and the remaining ten time points were grouped into
three time scales based on k-means clustering of the temporal data across all RTKs.
Although the algorithm did not require it, the resulting four time scales contained
only contiguous time points: basal (0 min), early (1, 2 min), intermediate (4, 8, 16
min), and late (32, 64, 96, 128, 256 min). This provided four time scales for each of
the six RTKs, yielding 24 different data subsets.
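The resulting partition of the data can be written out explicitly. The sketch below only encodes the grouping stated above and checks its contiguity; it does not reproduce the k-means clustering itself, and the helper name `is_contiguous` is hypothetical.

```python
TIME_POINTS_MIN = [0, 1, 2, 4, 8, 16, 32, 64, 96, 128, 256]

TIME_SCALES = {
    "basal": [0],
    "early": [1, 2],
    "intermediate": [4, 8, 16],
    "late": [32, 64, 96, 128, 256],
}

RTKS = ["EGFR", "FGFR1", "c-Met", "IGF-1R", "NTRK2", "PDGFRβ"]

def is_contiguous(group, ordered_times):
    """True if the group occupies consecutive positions in ordered_times."""
    idx = sorted(ordered_times.index(t) for t in group)
    return idx == list(range(idx[0], idx[0] + len(idx)))

# One data subset per (RTK, time scale) combination.
data_subsets = [(rtk, scale) for rtk in RTKS for scale in TIME_SCALES]
print(len(data_subsets))
print(all(is_contiguous(g, TIME_POINTS_MIN) for g in TIME_SCALES.values()))
```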
The use of complementary network inference methods can improve confidence by
circumventing the biases inherent in any single algorithm [155]. We therefore used
five different network inference algorithms: Bayesian networks [50], mutual information [156], context likelihood of relatedness (CLR) [27], Spearman correlation, and
[Figure 4-7: multidimensional scaling plots, one per inference method (panel titles legible for Mutual Information, CLR, Spearman Correlation, and Pearson Correlation). Horizontal axis: Eigenvalue 1. Legend: receptor cell lines, four time scales, and Clusters 1-3.]
Figure 4-7: Clustering RTK-specific network models reveals three RTK classes. Connectivities of RTK signaling networks were derived from our large-scale signaling data using five
different network inference algorithms. Relationships between RTK-specific networks were
then visualized in two dimensions using multidimensional scaling. Marker color denotes
receptor cell line. Marker size denotes the four time scales from basal (smallest markers)
to late (largest markers). Dashed outlines and marker shapes represent k-means clustering
assignments.
[Figure 4-8 image: grid of multidimensional scaling plots, each annotated with the variance explained and the edge weight threshold applied.]
Figure 4-8: Network model clusters are robust to an intermediate range of applied edge weight
thresholds. Results for clustering the network model structures, here visualized using the
first two eigenvalues from multidimensional scaling, for each of the five inference methods
across a range of edge weight thresholds. Thresholds were determined by varying the
percentile ranking of edge weights for each method. Percentiles increase from left to right,
and are indicated at the top of each row. Plot marker shapes indicate cluster assignments,
while marker size indicates time scale (basal = smallest, late = largest). The six colors
represent the six RTKs by the same color scheme as used in the main text, as initially
introduced in Fig. 4-1: cyan, EGFR; purple, FGFR1; yellow, IGF-1R; red, c-Met; green,
NTRK2; blue, PDGFRβ. The following percentile ranges provide robust clustering of the
three RTK network classes for each inference method: Spearman (50-70), Pearson (30-70),
mutual information (50-70), CLR (30-70), and Bayesian (40-50).
Pearson correlation. Each of these five methods was applied to the 24 different data
subsets across RTKs and time scales, yielding 24 different network states per method.
To visualize differences in the inferred network structures across RTKs and time
scales, adjacency matrices describing the topology of the inferred networks were analyzed using multidimensional scaling [157] (Fig. 4-7). Remarkably, the five inference
methods consistently revealed three distinct RTK classes: EGFR/FGFR1/c-Met,
IGF-1R/NTRK2, and PDGFRβ, regardless of the time scale. These three RTK
classes are robust, as they are maintained across a wide range of network model edge
weight thresholds (Fig. 4-8). This indicates that the signaling networks downstream
of these six RTKs operate according to three identifiable programs, and that the majority of variation in the inferred network structures arises from differences between
RTKs, rather than between time scales. Inferred signal-signal relationships are largely
maintained across time for a given RTK.
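The comparison of inferred network structures can be illustrated with a minimal sketch. It assumes that each network is binarized by keeping edges at or above a chosen percentile of its edge weights (as in Fig. 4-8) and that pairs of networks are then compared by Jaccard distance between their edge sets, which is an assumption here rather than a statement of the exact metric used. The node names, edge weights, and function names below are hypothetical placeholders, not values from this study. The resulting distance matrix is the kind of input that multidimensional scaling would embed in two dimensions.

```python
from itertools import combinations

def binarize_by_percentile(weights, percentile):
    """Keep edges whose weight is at or above the given percentile of all weights."""
    ranked = sorted(weights.values())
    # Nearest-rank style cutoff: index into the sorted weights.
    k = min(len(ranked) - 1, int(len(ranked) * percentile / 100))
    cutoff = ranked[k]
    return {edge for edge, w in weights.items() if w >= cutoff}

def jaccard_distance(edges_a, edges_b):
    """1 - |intersection| / |union| of two edge sets (0 = identical, 1 = disjoint)."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(union)

# Hypothetical edge weights (e.g., absolute correlations) for three inferred networks.
networks = {
    "RTK_A": {("Shc", "Erk"): 0.9, ("Akt", "S6"): 0.8, ("Src", "PXN"): 0.2, ("PKC", "Erk"): 0.1},
    "RTK_B": {("Shc", "Erk"): 0.7, ("Akt", "S6"): 0.9, ("Src", "PXN"): 0.3, ("PKC", "Erk"): 0.2},
    "RTK_C": {("Shc", "Erk"): 0.1, ("Akt", "S6"): 0.2, ("Src", "PXN"): 0.9, ("PKC", "Erk"): 0.8},
}

# Binarize every network at the 50th percentile, then compare all pairs.
binary = {name: binarize_by_percentile(w, 50) for name, w in networks.items()}
distances = {
    (a, b): jaccard_distance(binary[a], binary[b])
    for a, b in combinations(sorted(binary), 2)
}
```

In this toy case RTK_A and RTK_B retain the same edges and RTK_C retains a disjoint set, so the first two would fall into one cluster and the third into another. Repeating the calculation over a range of percentiles mirrors the robustness check in Fig. 4-8.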
4.2.4
Consensus across inference methods reveals RTK class-specific signaling
To determine which edges account for the differences in network topology across
the three observed RTK classes, a consensus network was developed using all five
inference methods and all four time scales for each RTK. This approach identified
edges consistently observed within one, two, or all three RTK classes (Fig. 4-9).
Because variation in network structure arises primarily from different RTKs rather
than different time scales, this consensus approach highlights RTK class-specific edges
conserved across most time scales.
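The consensus calculation can be sketched as an edge frequency count across network models. This is an illustrative sketch only: each model is assumed to be a set of directed edges, the example edges are hypothetical, and the 0.7 cutoff is an arbitrary choice for the example rather than the value used to build the consensus network.

```python
from collections import Counter

def edge_frequency(models):
    """Fraction of network models (given as edge sets) that contain each edge."""
    counts = Counter(edge for model in models for edge in set(model))
    return {edge: n / len(models) for edge, n in counts.items()}

def consensus_edges(models, min_fraction):
    """Edges present in at least min_fraction of the models."""
    freq = edge_frequency(models)
    return {edge for edge, f in freq.items() if f >= min_fraction}

# Hypothetical models for one RTK class; in the text each RTK contributes
# five inference methods x four time scales.
models = [
    {("Raf", "Mek"), ("Mek", "Erk"), ("Cbl", "Raf")},
    {("Raf", "Mek"), ("Mek", "Erk")},
    {("Raf", "Mek"), ("Mek", "Erk"), ("Cbl", "Raf")},
    {("Raf", "Mek"), ("Shc", "Erk")},
]

freq = edge_frequency(models)
core = consensus_edges(models, min_fraction=0.7)
```

Computing `freq` separately for each RTK class and comparing the resulting edge sets is one way to separate edges shared by all classes from class-specific edges.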
The consensus network reveals a striking pan-RTK signaling core shared by all
six RTKs, along with sets of RTK class-specific edges (Fig. 4-10A). Notably, the
IGF-1R/NTRK2 and PDGFRβ networks both contain fewer edges than the EGFR/FGFR1/c-Met
network, with all edges in the IGF-1R/NTRK2 network and all except one edge in the
PDGFRβ network also present in the EGFR/FGFR1/c-Met network (Fig. 4-10B).
This suggests that, among the measured phosphosites, the EGFR, FGFR1, and c-Met
Figure 4-9: Identifying RTK class-specific edges through consensus network edge frequency.
Heatmap values indicate the fraction of network models (when considering all five inference
methods and all four time scales) containing the indicated edge.
receptors exhibit a greater degree of coordination in their responses to growth factor
stimulation, as a denser network implies more highly correlated signal-signal relationships compared to the sparser IGF-1R/NTRK2 and PDGFRβ networks. The
pan-RTK backbone identified through our network modeling approach contains the
conserved MAPK, Akt, and PKC pathways, as previously highlighted by our direct
analysis of shRNA-induced effects. In addition, it contains a variety of other conserved directional edges. The majority of RTK class-specific edges are signals related
to c-Cbl, Shc, paxillin, and calmodulin, suggesting that these receptor-proximal signaling influences may play a central role in mediating RTK class-specific responses.
Some nodes in the RTK class network models have no inputs, and thus have no
directed path from the receptor phosphorylation site (labeled 'RTK' in Fig. 4-10A)
to the node. This can occur across all RTKs (for example, Akt has no inputs in
any of the RTK class models), or within an RTK class (for example, the MAPK
cascade beginning with c-Raf has no inputs in the PDGFRβ class network, but has
c-Cbl as an input in the other RTK classes' models). In the latter case, this does not
Figure 4-10: Network models' consensus reveals core RTK signaling backbone and RTK
class-specific interactions. (A) RTK backbone edges are shown in thick black edges, while
class-specific edges are colored. Nodes are colored according to their approximate biological
function. Tyrosine and serine/threonine-containing phosphorylation epitopes are shown as
ovals and boxes, respectively. (B) A Venn diagram showing shared and class-specific edges
across the three RTK classes. All IGF-1R/NTRK2 edges and all but one of the PDGFRβ
edges are present in the EGFR/FGFR1/c-Met network. (C) Median signal values (across
all time points, shRNA conditions, and biological replicates) for each phosphosite relative
to the EGFR cell line.
necessarily imply that the node without an input is unphosphorylated in the RTK
cell line(s) from that class compared to the other cell lines. For example, the median
phosphorylation level of c-Raf in the PDGFRβ cell line is comparable to or higher than
in the other cell lines (Fig. 4-10C). Instead, nodes lacking inputs for some or all RTK
classes are likely under the influence of unmeasured signal(s), termed hidden nodes,
or under the influence of other measured signal(s), but in a potentially complex way
not captured by the model.
To determine if clustering the raw data directly could recapitulate the network
model clusters, the raw data were clustered using (1) median signal values across all
time points, (2) signal values from all time points, (3) signal values from each time
scale, or (4) signal values from each time point (Fig. 4-11). Data from all shRNA
perturbations were used in each case. The network model clusters were recapitulated
using the raw data in only two of 16 clustering scenarios (late time scale and 256
min.), and one of the three clusters was recapitulated in an additional three scenarios
(0, 2, and 4 min.). That all five network inference methods highlighted the same
three RTK classes, but clustering of the raw data generally did not, suggests that
inferred network topologies contain information not accessible by clustering the raw
data directly.
To explore this notion further, we generated synthetic data from networks with
four different known topologies (see section 4.4.16). We simulated five sample data
sets per network, and then attempted to classify the resultant twenty data sets according to their underlying network using either the raw data or the network topologies
inferred from the raw data. The inferred topologies clearly segregated according to
their underlying network, whereas the raw data did not (Fig. 4-12). This further
supports the notion that inferred network topologies, in which relationships between
measured signals are explicitly quantified, provide insight into the multivariate structures underlying raw data beyond what can be observed by clustering the raw data
directly. This strengthens our case for using the identified RTK network model clusters as relevant indicators of signaling network differences among the six RTKs.
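The logic of this simulation can be illustrated with a deliberately simplified sketch. It does not reproduce the procedure of section 4.4.16: the two four-node topologies, the noise level, the use of Pearson correlation with a fixed 0.5 cutoff, and all function names are assumptions chosen for brevity. Each child node copies its parent with small added noise, so every node has nearly the same marginal distribution in both networks and only the pairwise relationships differ.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(edges, n_nodes, n_conditions, rng):
    """Root nodes are independent noise; each child copies its parent plus small noise."""
    data = [[rng.gauss(0, 1) for _ in range(n_conditions)] for _ in range(n_nodes)]
    for parent, child in edges:
        data[child] = [v + rng.gauss(0, 0.1) for v in data[parent]]
    return data

def inferred_edges(data, cutoff=0.5):
    """Undirected edges between node pairs whose absolute correlation exceeds the cutoff."""
    n = len(data)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(pearson(data[i], data[j])) > cutoff}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
topologies = {"net1": [(0, 1), (2, 3)], "net2": [(0, 2), (1, 3)]}

# Three simulated data sets per topology, 200 conditions each.
samples = [(name, simulate(edges, 4, 200, rng))
           for name, edges in topologies.items() for _ in range(3)]
recovered = [(name, inferred_edges(data)) for name, data in samples]
```

Data sets generated from the same topology yield the same inferred edge set, while data sets from different topologies yield different ones, even though the node-level signal distributions are nearly indistinguishable by construction.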
[Figure 4-11 image: principal component scatter plots of the raw data; panel titles include Basal (0 min.), Early (1, 2 min.), Late (32, 64, 96, 128, 256 min.), and individual time points; axis label: Prin. Comp. 1.]
Figure 4-11: Clustering the raw data directly. The raw data were clustered using (A) the
median signal values, (B) all time points together, (C) each time scale separately, or (D)
individual time points. Cases where one of three raw data clusters matched the network
model clusters are shown with blue titles, while cases where all three raw data clusters
matched the network model clusters are shown with red titles. Marker shapes indicate
cluster assignments. The six marker colors represent the six RTKs by the same color
scheme as used in the main text, as initially introduced in Fig. 4-1: cyan, EGFR; purple,
FGFR1; yellow, IGF-1R; red, c-Met; green, NTRK2; blue, PDGFRβ.
[Figure 4-12 image: (A) Networks #1 to #4; (B) heatmaps for Samples #1 to #5, labeled "Signal values across 200 conditions"; (C) panels titled Raw Data (PCA), Normalized Raw Data (PCA), Continuous Correlation Matrix (PCA), and Binary Correlation Matrix (Jaccard Distance); axis label: Eigenvalue 1.]
Figure 4-12: Clustering network topologies inferred from simulated data reveals underlying network differences but clustering raw data does not. (A) Four synthetic networks used to simulate
data. Network structures are defined based on edges between individually numbered nodes, whose
positions vary from network to network. (B) Simulated data sets for five independent samples from
each of the four networks. Rows of each heatmap represent nodes and columns represent conditions.
All heatmaps are shown using the same colorbar scale. (C) Using principal component analysis to
visualize the raw data, normalized raw data, and Spearman correlation matrices, and multidimensional scaling to visualize the binary Spearman correlation matrix (correlation values exceeding the
60th percentile). Marker colors indicate which of the four synthetic networks the data were generated from. The inferred network topologies (i.e., correlation matrices) cluster according to their
underlying networks, but the raw data do not.
[Figure 4-13 image: histogram of gene expression values; x-axis: mRNA expression level [RMA].]
Figure 4-13: Observed distribution of gene expression values in the CCLE. All gene expression values in the CCLE were included (when considering all 18,926 genes across all 967
cell lines, i.e., 18,926 × 967 ≈ 18.3 million gene expression values). The bimodal nature of
the plot suggested a natural range over which to consider genes to be expressed versus not
expressed.
4.2.5
RTKs and ligands are co-expressed in cancer cell lines
and enriched in certain solid tumor types
To determine the degree of expression of the six RTKs and ligands used in this study
in relevant cancer cell lines, we analyzed the Cancer Cell Line Encyclopedia (CCLE)
data set, which includes mRNA expression values for ~19,000 genes in 967 cancer cell
lines [158]. Expression values exceeding five on an RMA (robust multi-chip average)
scale were used to define expressed genes. This threshold was chosen based on the
observed bimodal distribution of RMA values across all genes in the CCLE (Fig. 4-13).
EGFR, FGFR1, MET, and IGF1R were widely expressed (97, 96, 81, and 95%
of cell lines, respectively), whereas PDGFRB and NTRK2 were only expressed in
23% and 4% of cell lines, respectively.
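The thresholding step amounts to counting the cell lines whose RMA value exceeds the cutoff, and the same count can be extended to pairs of genes. The sketch below is illustrative only: the function names are hypothetical and the RMA values are invented for five cell lines rather than taken from the CCLE.

```python
def fraction_expressed(values, threshold=5.0):
    """Fraction of cell lines whose RMA value exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)

def fraction_coexpressed(receptor, ligand, threshold=5.0):
    """Fraction of cell lines in which both genes exceed the threshold."""
    both = sum(r > threshold and l > threshold for r, l in zip(receptor, ligand))
    return both / len(receptor)

# Hypothetical RMA values for one receptor and one ligand in five cell lines.
receptor_rma = [9.1, 7.4, 4.2, 8.8, 6.0]
ligand_rma = [3.9, 5.6, 4.8, 6.1, 4.4]

receptor_fraction = fraction_expressed(receptor_rma)
ligand_fraction = fraction_expressed(ligand_rma)
pair_fraction = fraction_coexpressed(receptor_rma, ligand_rma)
```

Passing a different `threshold` value reruns the same count at another RMA cutoff, which is how sensitivity to the choice of five could be checked.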
The degree of co-expression for RTK and ligand pairs varied across the six RTKs
(Fig. 4-14A). Because of the nature of the CCLE experiment design, any observed
gene expression would be limited to expression in tumor cells and not stromal cells.
We used co-expression of receptor and ligand as an indicator of potential autocrine
[Figure 4-14 image: (A) receptor-ligand pairs EGFR/EGF, FGFR1/FGF1, IGF-1R/IGF1, c-Met/HGF, NTRK2/BDNF, and PDGFRβ/PDGFB; (B) EGFR with EGF or HBEGF, FGFR1 with FGF1, 2, 4, 5, or 6, NTRK2 with BDNF or NTF3, and PDGFRβ with PDGFB or D; (C) one panel per RTK mRNA; (E, F) panels titled Carcinoma, Glioma, Malignant melanoma, Hematopoietic neoplasm, Lymphoid neoplasm, and Neuroblastoma; axis label: Prin. Comp. 1.]
Figure 4-14: RTK and ligand expression in CCLE cell lines. (A) Co-expression of receptors
(black) and the ligands used in this study (red) across 967 cell lines in the CCLE. (B)
Considering co-expression of multiple cognate ligands in the CCLE increases the number of
cell lines co-expressing receptor and at least one ligand. (C) Gene expression levels of the
six RTKs displayed in principal component space. (D) mRNA expression values for EGFR,
MET, and FGFR1 plotted against one another. Red circles indicate cell lines with greater
than median expression values of the three RTKs. Tumor histologies (E) enriched or (F)
depleted for co-expression of EGFR/FGFR1/MET. Red markers indicate cell lines derived
from the indicated tumor histology type.
activation of these RTKs. The low co-expression of some receptors and the ligands
used in our study may be partly explained by the fact that some receptors can be activated by multiple ligands. For example, if in addition to considering the ligands used
in our experiments we consider additional ligands that can activate EGFR (ref. [159]),
FGFR1 (ref. [160]), NTRK2 (ref. [161]) and PDGFRβ (ref. [162]), we see that most
cell lines expressing an RTK express at least one cognate ligand (Fig. 4-14B). We
estimate that the signaling networks induced by these other family ligands would be
similar to those induced by the ligands used in our study. c-Met is only known to
be activated by HGF, and data for IGF2 (another ligand for IGF-1R) and NTF4
(another ligand for NTRK2) were not available in the CCLE.
Low co-expression of some receptors and ligands may also be partly explained
because some receptors are more commonly activated in a paracrine manner, and thus
we would not expect high receptor and ligand co-expression in tumor cells alone. For
example, c-Met is often activated by HGF secreted from stromal cells [163], although
HGF-independent activation may also occur [164]. Thus we observe that the RTKs
used in this study and one or more of their cognate ligands are co-expressed across
many cell lines in the CCLE. Co-expression of the receptors and ligands is robust
to the RMA threshold used to define expression (Fig. 4-15).
Having established widespread co-expression of receptor and ligands in the CCLE,
we next sought to determine which tumor types were co-expressing the RTKs used
in our study. To provide a two-dimensional visual representation in which cell lines
segregate according to global differences in their gene expression levels, principal component analysis (PCA) was applied to the matrix of ~19,000 gene expression values
across all 967 cell lines. The resulting layout of the 967 cell lines in principal component space is shown in Fig. 4-14C. PC1 and PC2 explain 8.1% and 4.6% of the variance
across cell lines, respectively. The color of each circle represents the expression level
of the RTK in each cell line. These PCA results show that many cancer cell lines
express multiple RTKs. Further, these data show that EGFR, FGFR1, MET, and
PDGFRB are expressed at high levels only in particular cell types, whereas IGF1R
is expressed at high levels in nearly all cell types, and NTRK2 is only expressed in a
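[Figure 4-15 image: Venn diagrams arranged by RMA threshold (rows) and RTK (columns); (A) EGFR/EGF, FGFR1/FGF1, MET/HGF, IGF1R/IGF1, NTRK2/BDNF, and PDGFRB/PDGFB; (B) EGFR with EGF or HBEGF, FGFR1 with FGF1, 2, 4, 5, or 6, NTRK2 with BDNF or NTF3, and PDGFRB with PDGFB or D.]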
Figure 4-15: Co-expression of the receptors and ligands for multiple RMA thresholds. The co-expression of receptors and cognate ligands, as shown by Venn diagrams, is robust to the RMA
threshold used to define expression when considering (A) only ligands used in this study, or (B)
multiple RTK-activating ligands. Different RMA thresholds are shown down the rows, while different
RTKs are shown across columns.
small subset of cell types.
Given the number of cell lines co-expressing EGFR, FGFR1, and MET, and the
similarities of the EGFR, FGFR1, and c-Met network models, we sought to identify
which tumor types were co-expressing these three RTKs. Plotting the expression levels of these three RTKs on the same axes indicates that they are indeed co-expressed
at high levels in many cell lines (Fig. 4-14D). Using information in the CCLE about
the original tumor histology of each cell line, we calculated which tumor histologies had more or less EGFR/FGFR1/MET co-expression than expected by chance.
Carcinoma, glioma, and melanoma cell lines were enriched for EGFR/FGFR1/MET
co-expression ($p = 8.7 \times 10^{-26}$, $p = 3.0 \times 10^{-6}$, and $p = 1.9 \times 10^{-3}$, respectively, Fig. 4-14E),
whereas hematopoietic neoplasm, lymphoid neoplasm, and neuroblastoma cell
lines were depleted ($p = 2.0 \times 10^{-30}$, $p = 1.7 \times 10^{-24}$, and $p = 7.8 \times 10^{-?}$, respectively, Fig. 4-14F). These results are robust to the RMA threshold used to define
the co-expression signature, with the exception of the melanoma cell lines. At higher
expression thresholds, melanoma cell lines are actually depleted for RTK co-expression, suggesting that the three RTKs are not all expressed at high levels in the
same melanoma cell lines. A full assessment of EGFR/FGFR1/MET co-expression
enrichment or depletion across 20 tumor histologies in the CCLE, as a function of
expression level threshold, is shown in Fig. 4-16.
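The enrichment calculation can be sketched with the hypergeometric distribution, which is the distribution named in the caption of Fig. 4-16. The sketch assumes one-sided tail probabilities for enrichment and depletion, which may differ from the exact test applied; the function names and the cell line counts in the example are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing `draws` items without replacement."""
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)

def expected_count(population, successes, draws):
    """Expected number of co-expressing cell lines within one histology."""
    return draws * successes / population

def enrichment_pvalue(k, population, successes, draws):
    """P(X >= k): probability of at least k co-expressing lines by chance."""
    upper = min(successes, draws)
    return sum(hypergeom_pmf(i, population, successes, draws) for i in range(k, upper + 1))

def depletion_pvalue(k, population, successes, draws):
    """P(X <= k): probability of at most k co-expressing lines by chance."""
    lower = max(0, draws - (population - successes))
    return sum(hypergeom_pmf(i, population, successes, draws) for i in range(lower, k + 1))

# Hypothetical counts: 20 cell lines in total, 8 with the co-expression
# signature, and a histology of 5 cell lines of which 4 carry the signature.
expected = expected_count(20, 8, 5)
p_enriched = enrichment_pvalue(4, 20, 8, 5)
```

Here `population` is the total number of cell lines, `successes` the number with the co-expression signature, and `draws` the number of cell lines of the histology being tested.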
Overall, these results suggest that specific patterns of RTK co-expression (e.g.,
EGFR/FGFR1/MET) are overrepresented in certain tumor types. We propose that
the pre-existing co-expression of multiple RTKs from the same network model class
within many cell lines in the CCLE is consistent with notions of primary (or intrinsic)
resistance to RTK-targeted inhibitors. This redundancy of RTK networks may also
be at play in the development of longer term acquired resistance to RTK inhibition,
through selection of subpopulations of cells with higher levels of compensatory RTKs,
and/or feedback within the same cells that increases expression of compensatory
RTKs.
[Figure 4-16 image: one subplot per tumor histology; x-axis: RMA threshold used to define EGFR, FGFR1, and c-Met co-expression signature; legend: significantly enriched or depleted for EGFR/FGFR1/c-Met co-expression.]
Figure 4-16: Cell line histology enrichment results for multiple RMA thresholds. The
enrichment or depletion of cell lines with EGFR/FGFR1/MET co-expression in cell lines
with different histological origins are shown. The values on the x-axis indicate the RMA
threshold used to first identify which cell lines were co-expressing the three RTKs. Using the
hypergeometric distribution, grey data points indicate the expected number of cell lines from
each histology type that should also co-express these three RTKs. Red data points indicate
the actual observed number of cell lines from each histology type that also co-express these
three RTKs. Note that subplots have different y-axis scales. Significance (indicated by
black circles) was determined using a 5% FDR (p < 0.0191) with the Benjamini method.
4.2.6
RTK network class genes are correlated with responses
to RTK-targeted therapies
Given the results that certain RTKs fall into classes with shared inferred network
topologies, that these RTKs are frequently co-expressed in cancer cell lines, and that
certain tumor types are enriched for this co-expression, we next sought to assess if
RTK co-expression had implications for response to RTK-targeted therapies. We
first asked whether RTKs within a network class were correlated with resistance to
therapies targeting RTKs in that same class. This is based on the notion that RTKs
within the same class appear to have shared underlying signaling network topologies,
and thus these RTKs may be more capable of compensating for inhibition of other
RTKs in that class. For example, resistance to EGFR inhibitors may be mediated
more effectively by FGFR1 and c-Met than by IGF-1R, NTRK2, or PDGFRβ.
To test this hypothesis, we again turned to the CCLE data set. In addition to
gene expression data, the CCLE contains cell growth inhibition data across 500 cell
lines for 24 anticancer compounds, including EGFR, FGFR1, c-Met, and IGF-1R kinase inhibitors (erlotinib, TKI258, PHA-665752, and AEW541, respectively). There
are caveats associated with using the CCLE data to assess drug resistance mechanisms. First, the kinase inhibitors' off-target effects may complicate interpretation
of these resistance profiles [165]. In the case of the EGFR inhibitor erlotinib, the
c-Met inhibitor PHA-665752, and the multi-kinase inhibitor TKI258 that also targets
FGFR1, we see that these compounds have many off-target effects compared to, for
example, the EGFR/HER2 inhibitor lapatinib (Fig. 4-17A). These off-target effects
likely induce their own minor resistance mechanisms in concert with the resistance
mechanisms for each compound's primary target(s). Regarding TKI258, although it
binds 18 kinases with greater affinity than it binds FGFR1, many of these other genes
are expressed at low levels across the CCLE cell lines (Fig. 4-18). A second caveat is
that the receptors and their ligands are not always co-expressed, as noted in Fig. 4-14,
so ligand-mediated receptor activation is less likely for some receptors. Nevertheless,
we believe that meaningful conclusions can be drawn from a comparison of network
[Figure 4-17 image: (A) kinase affinity bar plots; (B) scatter plots of RTK expression versus drug response, each annotated with r and p values; x-axis labels include FGFR1 expression and IGF1R expression.]
Figure 4-17: RTK class genes are correlated with anti-RTK therapy response. (A) Affinity
of kinase inhibitors lapatinib, erlotinib, PHA-665752, and TKI258 for 442 kinases (data from
ref. [165]). On-target effects are shown in red bars. (B) Correlating RTK gene expression
with responses to EGFR, FGFR1, c-Met, and IGF-1R inhibitors across hundreds of cancer
cell lines. Cell lines with RMA expression values >5 or <5 are shown in blue or grey,
respectively. Values above subplots are Spearman correlation coefficients and p-values. Red
lines indicate linear fits to the data. Genes significantly correlated (1% FDR) with drug
response are shown with red font titles.
[Figure 4-18 image: principal component plots, one per kinase mRNA (including FLT3, KIT, and MAP2K2), each annotated with a binding affinity value.]
Figure 4-18: Gene expression values of tightest TKI258 kinase binders. The gene expression
values of the 24 kinases that bind TKI258 most tightly are shown, as indicated by the color of
each marker in principal component space (analogous to the plots in Fig. 4-14C). Kinases are
sorted in order of decreasing affinity from upper left to lower right. Each marker corresponds
to one cell line in the CCLE. Although TKI258 binds to 18 kinases more tightly than it
binds to FGFR1, many of these other binders are expressed at low levels in the CCLE. All
subplots are shown on the same color scale.
structure similarity and small molecule growth inhibition data.
To quantify the relationship between gene expression and response to a given drug,
we calculated the Spearman correlation coefficient between each gene's expression
values and the activity area for that drug in each drug-treated cell line. Activity
area is a metric for growth inhibition: greater activity area implies more growth
inhibition, and thus increased sensitivity to the drug. Gene expression values that
positively correlate with activity area thus denote sensitivity to the drug, whereas
negatively correlated genes denote resistance. In this manner, we used the diversity
across ~500 cell lines (irrespective of their histological origin, copy number variation,
or mutation status) to understand how variation in gene expression correlated with
variation in drug response.
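The Spearman correlation coefficient used here is the Pearson correlation of the rank-transformed values. A minimal sketch follows, assuming average ranks for ties; the function names are hypothetical, and the expression and activity area values are invented for five cell lines.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical gene expression (RMA) and drug activity area per cell line.
expression = [4.0, 6.5, 7.2, 9.0, 10.1]
activity_area = [3.0, 2.1, 2.5, 0.9, 0.4]
rho = spearman(expression, activity_area)
```

In this toy example `rho` is negative, which under the convention above would mark the gene as associated with resistance to the drug.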
We considered significant correlations among expression levels of each of the six
RTKs and six cognate ligands and the four relevant inhibitors (1% FDR, p < 0.0054).
Consistent with expectations, we saw that EGFR and IGF1R expression were correlated with sensitivity to the EGFR and IGF-1R inhibitors, respectively. More interestingly, EGFR expression was correlated with resistance to the c-Met and FGFR1
inhibitors; FGFR1 expression was correlated with resistance to the EGFR inhibitor;
and MET expression was correlated with resistance to the FGFR1 inhibitor. NTRK2
and PDGFRB expression were not significantly correlated with responses to any of
the four inhibitors (Fig. 4-17B). Among the six cognate ligands, BDNF (the ligand
for NTRK2) was correlated with resistance to the IGF-1R inhibitor, and IGF1 was
correlated with resistance to the EGFR inhibitor. None of the other ligands (EGF,
FGF1, HGF, PDGFB) were significantly correlated with responses to any of the four
inhibitors (see Fig. S12 in ref. [137]). These correlations were calculated using only
cell lines that expressed a given gene with RMA > 5.
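The false discovery rate control can be sketched as follows, assuming that the "Benjamini method" refers to the Benjamini-Hochberg step-up procedure. The function name and the p-values in the example are hypothetical. Because the procedure compares each sorted p-value to a rank-dependent bound, the resulting p-value cutoff depends on the set of tests performed, which is why a different cutoff accompanies each FDR level reported in this chapter.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return (significance flags, p-value cutoff) under the step-up rule."""
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= fdr * rank / m:
            cutoff = p  # largest p-value so far that passes its rank-specific bound
    flags = [cutoff is not None and p <= cutoff for p in p_values]
    return flags, cutoff

# Hypothetical p-values from five gene-drug correlations.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
flags, cutoff = benjamini_hochberg(p_values, fdr=0.05)
```

With these inputs only the two smallest p-values pass, and `cutoff` is the larger of the two.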
Using correlations is notable because, rather than just considering a binary relationship between the presence/absence of RTK expression and drug response, they
capture the graded relationship between the continuous level of RTK expression and
drug response. In other words, it is the quantitative level of gene expression that can
be considered relevant, not just the presence or absence of an RTK. To confirm this
finding, the same analysis was performed using different RMA thresholds (> 0, 4, 4.5,
5.5, 6) for both receptors and ligands (see Fig. S12 in ref. [137]). All noted receptor-drug correlations are robust at low thresholds, whereas only the erlotinib-EGFR and
AEW541-IGF1R correlations are maintained for RMA > 5.5 and > 6. Additionally,
at low thresholds MET expression is correlated with resistance to the c-Met inhibitor,
and PDGFRB expression is correlated with resistance to the EGFR inhibitor. Regarding ligands, the AEW541-BDNF correlation is maintained at all low thresholds, but
because fewer cell lines strongly express the ligands as compared to the receptors, the
ligand-drug correlations are less robust to the RMA threshold.
Thus, despite the caveats associated with the CCLE, we see several cases of
intra-class genes associated with resistance (EGFR with c-Met and FGFR1 inhibitors,
FGFR1 with EGFR inhibitor, MET with FGFR1 inhibitor, and BDNF with IGF-1R
inhibitor), and only one case of an inter-class resistance mechanism (IGF1 with the
EGFR inhibitor) using the RMA > 5 threshold. This supports the notion that co-expression of same-class RTKs may contribute to resistance to RTK-targeted therapies.
To strengthen the argument that expression of an individual RTK correlates with
drug response, we calculated the Spearman partial correlation coefficients [25] between each RTK gene and each drug, and each ligand gene and each drug, while
controlling for the expression of the remaining five RTK genes or five ligand genes
(Fig. 4-19). These results indicate that most correlations identified in Fig. 4-17B are
maintained for the corresponding partial correlations, providing stronger evidence
that the correlation between an RTK and drug response is due to that RTK individually and not because of correlation between RTKs' gene expression values. The
significant (5% FDR) receptor partial correlation values are constant for all RMA
thresholds up to RMA >4.5, whereas the ligand partial correlation values are only
constant for the two lowest RMA thresholds. The receptor partial correlations are
consistent with the three RTK network classes, with the exception of PDGFRB correlating with resistance to erlotinib. However, the ligand partial correlations are not
as consistent with the RTK network classes, perhaps because multiple ligands can
[Figure 4-19 image: heatmaps for Receptors and Ligands at several RMA thresholds; color scale from -0.2 (Resistant) to 0.2 (Sensitive).]
Figure 4-19: The Spearman partial correlation coefficients between each receptor gene and
each drug, and each ligand gene and each drug, while controlling for the expression of the
remaining five RTK genes or remaining five ligand genes. Each subplot shows a different
RMA threshold that was applied to the gene of interest. For each RMA threshold a 5%
Benjamini false discovery rate was applied, with p-values noted.
153
activate some receptors and therefore analyzing one ligand is insufficient.
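As a minimal sketch of the partial correlation calculation described above, the following code rank-transforms all variables and then derives the partial correlation from the inverse of the rank correlation matrix. This is one standard way to compute a Spearman partial correlation and is not necessarily the implementation used for Fig. 4-19; the function names are illustrative, and only the Python standard library is assumed.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_spearman(x, y, covariates):
    """Spearman correlation of x and y, controlling for each covariate series."""
    ranked = [average_ranks(v) for v in [x, y] + list(covariates)]
    corr = [[pearson(a, b) for b in ranked] for a in ranked]
    precision = invert(corr)
    return -precision[0][1] / math.sqrt(precision[0][0] * precision[1][1])
```

In the setting of this chapter, `x` would hold the expression values of one RTK gene across cell lines, `y` the corresponding drug response values, and `covariates` the expression values of the remaining five RTK genes. With an empty covariate list the function reduces to the ordinary Spearman correlation.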
4.3 Discussion
In this study, we integrated pathway-level phosphorylation measurements, RNAi
perturbations, and computational network inference to quantify signaling network
specificity across six RTKs. The shRNA perturbations revealed a core set of Akt,
MAPK, and PKC pathways conserved across all six RTKs, which were recapitulated
in RTK-specific network inference models, along with additional RTK-specific signaling relationships. Importantly, the six RTK network models clustered into three
classes: EGFR/FGFR1/c-Met, IGF-1R/NTRK2, and PDGFRβ.
Using gene expression data from the CCLE, we showed co-expression of RTK and ligand pairs across
many cancer cell lines, along with enrichment for EGFR/FGFR1/MET co-expression
in carcinoma, glioma, and malignant melanoma cell types. Using corresponding anticancer drug response data from the CCLE, we showed evidence for intra-class resistance mechanisms prevailing over inter-class mechanisms, whereby expression of
EGFR was correlated with resistance to c-Met and FGFR1 inhibition, expression of
FGFR1 was correlated with resistance to EGFR inhibition, expression of MET was
correlated with resistance to FGFR1 inhibition, and expression of BDNF (the ligand
for NTRK2) was correlated with resistance to IGF-1R inhibition. The relationships
between EGFR and c-Met inhibition, FGFR1 and EGFR inhibition, and MET and
FGFR1 inhibition were also maintained for partial correlation calculations.
The novel application of systematic RNAi perturbations in concert with pathway-wide signaling measurements, made feasible by the use of lysate microarray technology, enabled the inference of the most comprehensive RTK-specific signaling network
models to date. Because network inference is fundamentally a question of quantifying correlation among measured signals, and because the primary driver of variability
in the signaling data used here was RNAi perturbation responses, we conclude that
RTKs with similar inferred networks have downstream signals that respond similarly
to perturbations. Based on this notion, we propose that RTKs within the same network
model class are more capable of promoting resistance to therapies targeting
RTKs in that class than are RTKs in a different class.
There is extensive literature evidence consistent with the notion of intra-class
drug resistance mechanisms among the six RTKs studied here. The most comprehensive evidence comes from two recent studies that measured the ability of different
growth factors to rescue cells from various RTK inhibitors [143, 142]. Harbinski et
al. observed (i) the ability of EGF family ligands and FGF family ligands to rescue
c-Met-dependent cell lines from c-Met inhibition; (ii) the ability of EGF family ligands
and HGF to rescue FGFR2- and FGFR3-amplified cell lines from FGFR inhibition;
and (iii) synergistic growth inhibition with combined FGFR1 and c-Met inhibition
both in vitro and in vivo. Wilson et al. observed that (i) FGF2 (FGF-basic) and HGF
can each at least partially rescue the four tested EGFR mutant cell lines included
in their study from EGFR inhibition by erlotinib; (ii) EGF and NRG1 can each at
least partially rescue all three tested MET amplified cell lines from c-Met inhibition
by crizotinib, and FGF2 can at least partially rescue two of them; and (iii) EGF and
NRG1 can at least partially rescue three of four tested FGFR amplified cell lines from
FGFR inhibition, and HGF can rescue one of the four cell lines. Strikingly, PDGF-AB
ligand never rescued any cell line and IGF-1 only partially rescued three of the 41
tested cancer cell lines from any drug. Although our study used PDGF-BB and FGF1
(FGF-acidic) ligands, these results nonetheless indicate that the rescue potential of
growth factors mimics the RTK classes extracted from our network models: EGF family, FGF family, and HGF ligands have generally similar rescue potential across cell
types, whereas IGF-1 and PDGF family ligands have sparse to non-existent rescue
potential.
Beyond these ligand rescue experiments, additional evidence exists that is consistent with the network model classes identified here.
The link between EGFR
and c-Met is well established: c-Met can compensate following anti-EGFR therapy
(e.g., [139]), and, conversely, EGFR can compensate following anti-c-Met therapy
[140, 166]. Additional evidence linking FGFR1 and EGFR signaling also exists. Combining EGFR and FGFR family kinase inhibitors has been shown to exhibit additive
[167] or synergistic [168] growth inhibition, and combining dominant-negative forms
of both EGFR and FGFR1 resulted in synergistic increases in cell death [168]. Additionally, FGFR1/FGF2 autocrine signaling has been observed in non-small cell lung
cancer (NSCLC) cell lines that do not respond to gefitinib [169], and the induction of
FGFR2 and FGFR3 expression has been observed in response to gefitinib in gefitinib-sensitive NSCLC and head and neck squamous cell carcinoma cell lines [170].
There has also been evidence for resistance to EGFR inhibitors by the de-repression
of IGF-1R signaling [171].
Later work by the same group, however, showed that
IGF-1R compensates poorly for EGFR loss because IGF-1R only strongly maintains
activation of the Akt pathway, whereas c-Met activates both the Akt and MAPK
pathways [139]. Consistent with these reports, our results show that IGF-1R exhibits
the lowest median Mek, ERK, and p90RSK phosphorylation among the six RTKs,
whereas c-Met exhibits similar levels to EGFR (Fig. 4-10C). These observations are
consistent with the weak rescue potential of IGF-1 noted above. All non-EGFR cell
lines exhibit comparable p-Akt and p-GSK3 signals that are actually higher than
the EGFR cell line, suggesting that activation of Akt is not a distinguishing feature
among these six RTKs, at least in the context of saturating doses of growth factor in
our isogenic system.
The evidence is less clear for therapies against IGF-1R, NTRK2, and PDGFRβ,
in part because there are fewer studies of targeted therapies against these RTKs.
EGFR has been cited as a reason for primary but not acquired resistance to anti-IGF-1R therapy [172], whereas others suggest the insulin receptor is the primary
driver of resistance to anti-IGF-1R therapy [173]. Resistance to anti-PDGFRβ therapy (in the form of imatinib, which targets Abl, c-Kit, and PDGFRα/β) seems to
involve mutations of the targeted proteins and amplification of Src family kinases,
rather than compensation by other RTKs [174]. To our knowledge, no studies have
addressed resistance to any anti-NTRK2 therapies.
In addition to RTK network phenocopying, there may be other mechanisms driving co-activation of particular RTKs, especially chromosomal structure processes
[175]. Notably, EGFR and MET are both present on chromosome 7, and all genes on
chromosome 7 are significantly amplified in sets of glioma (FDR < $10^{-10}$) and lung
(FDR < 0.25) tumors [176]. Further, MET is located at a fragile site on chromosome
7 that makes it prone to amplification [177]. We found no literature evidence that other RTKs
are present at fragile sites. Thus, amplification of MET may be an
especially prevalent mechanism for resistance to EGFR inhibitors because not only
does c-Met phenocopy the EGFR/FGFR1 network, it is also prone to co-amplification
with EGFR.
Although same-class RTKs have similar network models and are capable of rescuing cells from and causing resistance to inhibition of other RTKs in that class, these
same-class RTKs are not fully redundant. Simply inspecting the RTK-specific shRNA
effects shows that, although there are similarities in shRNA effects across same-class
RTKs, these effects are not identical (Figs. 4-3, 4-4, and 4-5). Further, it may be that
if we observed and/or perturbed different or additional signaling nodes beyond those
studied here, we may see these same-class RTKs' network models diverge from one
another. That multiple RTKs exist, and that they are co-expressed in cancer cells,
suggests that these RTKs are not fully redundant. Thus, while the network models
we developed here are sufficient to classify ligand rescue and drug resistance patterns,
the similarity of same-class RTKs is nevertheless a relative concept.
The evidence for the importance of RTK network phenocopying in drug resistance is strong, but the exact mechanism enabling this behavior is unclear. Given
that the six cell lines differ predominantly by a single variable (the identity of the
expressed RTK), we speculate that receptor-intrinsic biophysical properties may cause the RTKs to group into the three identified network classes. To this end,
we compared the sequences of the six RTKs' cytoplasmic and kinase domains. We
also compared previously published data about kinase inhibitors' binding affinities
for these receptors [165], hypothesizing that using small molecule binding profiles as
a proxy for kinase substrate specificity may provide an explanation for the observed
RTK classes. However, none of these three properties, when clustered, produced
clusters identical to the network models (Fig. 4-20). The inhibitor profile clusters
did match the kinase domain clusters, suggesting that the kinase domain sequence
largely explains the RTKs' differential sensitivities to kinase inhibitors.
[Figure 4-20 panels: Cytoplasmic domain sequence; Kinase domain sequence; Kinase inhibitor binding profiles. Axis label: Eigenvalue 1.]
Figure 4-20: Clustering RTK biophysical properties does not reveal RTK network model
clusters. The cytoplasmic sequences and kinase domain sequences were each clustered across
the six RTKs, but did not reveal the same clusters as the RTK network models. Data
concerning the affinity of each RTK for numerous small molecule kinase inhibitors was also
clustered, but again did not match the network model clusters. Dotted ellipses and marker
shapes represent k-means clustering assignments. The six marker colors represent the six
RTKs by the same color scheme as used in the main text, as initially introduced in Fig. 4-1:
cyan, EGFR; purple, FGFR1; yellow, IGF-1R; red, c-Met; green, NTRK2; blue, PDGFRβ.
Some previous work has explored the notion that receptor recruitment interactions
define specificity in receptor-activated signaling. Using chimeric EGF and insulin receptors, early work showed that RTK cytoplasmic domains encode kinase specificity,
mitogenic and transforming potential, and receptor routing [178]. Others have shown
in yeast that kinase domains encode limited intrinsic discriminatory specificity, and
that the functional identity of a kinase is instead largely determined by its recruitment interactions [179]. These observations are consistent with our results showing
that RTK-proximal edges in the network models tend to be RTK-specific, whereas
downstream edges tend to be conserved across all RTKs. Thus, although we are not
certain how these three RTK clusters emerge, it is unlikely to be driven purely by their
kinase specificity, and instead is likely to emerge from specificity in receptor-proximal
protein recruitment.
In conclusion, the RTK signaling classes identified in this study are consistent
with clinically observed mechanisms of resistance to targeted therapies in cancer.
The limited efficacy of single-agent RTK-directed therapies may therefore be due in
part to the pre-existing co-expression of same-class RTKs across a diverse spectrum
of tumor types. In this scenario, these tumors are primed to compensate for the
loss of RTK function following therapy. We submit that classifying RTKs by their
inferred networks and then therapeutically targeting same-class receptors, either in
combination or sequentially, may provide clinical benefit by delaying or preventing
the onset of resistance.
4.4 Materials and methods
4.4.1 Cell culture
Isogenic HEK-293 cells expressing EGFR, FGFR1, IGF-1R, c-Met, NTRK2 or PDGFRβ
were described previously [146]. All cell lines were cultured in Dulbecco's Modification of Eagle's Medium (DMEM; Mediatech, Manassas, VA) supplemented with
10% fetal bovine serum (FBS; HyClone, Logan, UT), 2 mM glutamine (Mediatech),
100 I.U./mL penicillin and 100 µg/mL streptomycin (Mediatech). Additionally, cell
culture media contained 150 µg/mL Hygromycin B (Invitrogen, Carlsbad, CA) to
maintain stable integration of RTK expression cassettes.
Lentiviral shRNA expression vectors were produced using a three-plasmid system
as described previously [147, 180]. Briefly, HEK293T cells were co-transfected with
plasmid pLKO.1 containing the shRNA expression cassette of interest, as well as packaging plasmids pCMV-dR8.91 (containing HIV gag, pol and rev genes) and pMD2.G
(coding for VSV-G envelope protein). Medium was replaced after 24 hours, and viral
supernatants were harvested 48 and 72 hours post-transfection. Viral stocks were
centrifuged and decanted to remove cellular debris, and stored in aliquots at -80 °C.
Relative virus titers were determined by transducing A549 lung carcinoma cells at
low multiplicity of infection, selecting for viral integrants with puromycin (Invitrogen)
and measuring relative cell densities by Resazurin viability assay. All viral stocks were
then diluted to match the lowest-titer individual virus. Viral pools were generated by
mixing equal volumes of the titer-normalized component viruses. The total viral titer
of each pool thus matched the titer of each component virus. A complete list of all
76 individual shRNA constructs used in this study is given in Table S1 (in ref. [137]),
and a list of all 12 shRNA pools used in this study is given in Table S2 (in ref. [137]).
For gene knockdown experiments, RTK-expressing HEK293 cells were first plated
onto D-lysine coated 96-well plates (BD Biosciences, Franklin Lakes, NJ) at a density
of 20,000 cells/cm². After 24 hours, medium was replaced with medium containing
lentiviral particles and 10 µg/mL polybrene (Sigma-Aldrich, Saint Louis, MO), and
plates were centrifuged at 1,178 g for 30 minutes at 37 °C for enhanced infection
efficiency. For single and pooled shRNAs targeting signaling proteins (test shRNAs),
cells were infected in biological quadruplicates per cell line and time point. Cells
were also treated in parallel with non-targeting shRNA vectors (control shRNAs)
shGFP49 (8 replicates), shGFP477 (8 replicates) and pLKO.1empty (4 replicates).
Mock-infected cells (not treated with virus) were included as an additional control
for infection efficiency (12 replicates). 24 hours post-infection, medium was replaced
with medium containing 1.5 µg/mL of puromycin (Invitrogen) to select for virally
infected cells.
We observed complete cell death of mock-infected cells within 48
hours, while no sign of cell death was evident for any virally infected cells. Ninety-six
hours post-infection, at which time cells were 70-80% confluent, cells were washed
once with phosphate-buffered saline (PBS) and incubated in serum-free medium for
an additional 24 hours. To initiate RTK signaling, cells were then stimulated with
the cognate ligands of each RTK: EGF (EGFR), FGF1/FGF-acidic (FGFR1), IGF1
(IGF-1R), HGF (c-Met), BDNF (NTRK2) and PDGF-BB (PDGFRβ) (all Peprotech,
Rocky Hill, NJ). After 1, 2, 4, 8, 16, 32, 64, 96, 128 or 256 minutes cells were washed
with ice-cold PBS and lysed in 2% SDS buffer as described previously [181, 18].
Lysates of cells not treated with RTK ligands served as the 0 minutes time point.
Cell lysates were cleared by filtration through 0.2-µm filter plates (Pall Corporation,
East Hills, NY) and stored at -80 °C until microarraying.
4.4.2 Microarray fabrication
Custom lysate microarrays were printed by Aushon Biosystems (Billerica, MA) on
11.5 cm x 7.5 cm single-pad nitrocellulose-coated glass slides. Slides were custom-manufactured
by Grace Bio-Labs (Bend, OR) and were generously provided as a gift.
Lysates were arrayed at a spot-to-spot spacing of 400 µm using 8 depositions with solid
110 µm pins, which resulted in an average feature diameter of 180 µm when visualizing
spot protein content. Each lysate in our experiment, including lysates of cells treated
with control shRNAs and lysates of mock-infected cells, was initially spotted once
on each microarray slide. A small number of microarray source plates were then
re-printed onto the same slides in cases where spots were missed due to instrument
errors, as assessed visually under a microscope. Each microarray ultimately contained
a total of 26,496 microarray features, 25,344 of which represented biologically unique
lysates. Following microarray printing, slides were stored dry, in the dark, and at
room temperature until further processing.
4.4.3 Microarray probing
To remove the buffer and detergent contained in each microarray spot, slides were
washed three times for 5 min each with 1X PBS/0.1% Tween-20 (PBST), incubated
in Tris/HCl (pH 9) for 72 h with daily replenishment, washed again with PBST,
and centrifuged dry. Slides were then blocked with 5% BSA/PBST for 1 h at 4 °C.
Microarrays were incubated in a pool of 1:1,000 anti-β-actin antibody (Sigma-Aldrich)
and 1:1,000 phosphospecific antibody (Table S3) in 5% BSA/PBST at 4 °C for 24 h.
Following washing, slides were incubated in a pool of 1:1,000 680 nm-dye-labeled anti-rabbit and 1:1,000 800 nm-dye-labeled anti-mouse antibodies (LI-COR, Lincoln, NE)
in 5% BSA/PBST for 24 h at 4 °C. Slides were washed again three times for 5 min
each with 1X PBS/0.1% Tween-20 (PBST), and centrifuged dry. Microarrays were
scanned in the 680 nm and 800 nm channels using the Odyssey™ imaging system
(LI-COR) at 21 µm resolution.
4.4.4 Extraction of microarray data
Slides were visually inspected and initial feature finding and spot centering were
performed using the ArrayPro™ software package (MediaCybernetics, Bethesda,
MD). Spots with morphological defects, notably spots of non-circular shape, spots
affected by lint or scratches, and spots overlapping with neighboring spots, were
manually flagged and excluded from our data set. We then used custom-built code
for MATLAB® 7.4 (The Mathworks, Natick, MA) to refine the positioning of the
circular areas over which the ArrayPro™ software would integrate the microarray
spots to derive signal intensities. Signal intensities from both target proteins and
β-actin were then integrated accordingly, and target protein signals were normalized
to their respective β-actin signal intensities to account for any differences in lysate
concentration or spotting. Normalized signal was used in all subsequent data analysis
steps.
4.4.5 Data pre-processing
To remove data outliers that were not detected by visual inspection of the microarrays,
a smoothing window approach was applied. For a time point $t_i$ within a time course
from a particular phosphosite, RTK cell line, and shRNA condition, the data from
the three time points $t_{i-1}$, $t_i$, and $t_{i+1}$ across all biological replicates were grouped
together in a vector $x$. An upper bound was defined as $Q_3(x) + 1.5 \times \mathrm{IQR}(x)$ and a
lower bound was defined as $Q_1(x) - 1.5 \times \mathrm{IQR}(x)$, where $\mathrm{IQR}(x)$ is the interquartile
range of $x$, and $Q_1(x)$ and $Q_3(x)$ are the first and third quartiles of $x$, respectively. Any
data replicates at time $t_i$ that were above the upper bound or below the lower bound
were flagged. This procedure was applied to time points sequentially, starting with
the first time point in each time series. When applied to the first time point in each
time series, only the first and second time points were used. When applied to the last
time point in each time series, only the penultimate and last time points were used.
This time window approach allowed us to take advantage of the temporal dependence
of the data, as phosphorylation levels at adjacent time points were expected to have
approximately similar values. Data for a given time point could only be flagged by
smoothing if there were at least three replicate data points initially present in the
vector x.
In total, less than 2.1% (11,644/564,960) of all collected data points were flagged
either because of poor spot morphology or using the smoothing window approach.
After flagging outliers, the flagged data point(s) at $t_i$ (for a given RTK, phosphosite,
and shRNA condition) were replaced with the mean value of the remaining data
replicates at time $t_i$.
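The smoothing window procedure can be sketched as follows. This is a minimal illustration rather than the original MATLAB code, and it rests on several assumptions: quartiles are computed with the inclusive interpolation method of Python's `statistics.quantiles` (MATLAB's quartile convention may differ slightly), values flagged at one time point are not removed from the windows of later time points, and the function names are hypothetical.

```python
import statistics

def flag_outliers(replicates_by_time):
    """Return the set of (time_index, replicate_index) pairs flagged as outliers.

    replicates_by_time[i] holds the replicate values measured at time point i.
    """
    flagged = set()
    last = len(replicates_by_time) - 1
    for i, current in enumerate(replicates_by_time):
        # Window of adjacent time points, truncated at either end of the series.
        window = range(max(0, i - 1), min(last, i + 1) + 1)
        x = [v for j in window for v in replicates_by_time[j]]
        if len(x) < 3:
            continue  # at least three data points are required in x
        q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        for r, value in enumerate(current):
            if value < lower or value > upper:
                flagged.add((i, r))
    return flagged

def replace_flagged(replicates_by_time, flagged):
    """Replace each flagged value with the mean of the unflagged replicates at that time."""
    cleaned = [list(row) for row in replicates_by_time]
    for i, row in enumerate(replicates_by_time):
        keep = [v for r, v in enumerate(row) if (i, r) not in flagged]
        if not keep:
            continue  # nothing left to average; leave the row unchanged
        for r in range(len(row)):
            if (i, r) in flagged:
                cleaned[i][r] = sum(keep) / len(keep)
    return cleaned
```

Calling `flag_outliers` on one time course (a list of replicate lists, one per time point) and passing the result to `replace_flagged` yields a cleaned time course of the same shape.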
Each test shRNA had four biological replicates associated with each RTK, phosphosite and time point. Because a small number of microarray source plates were
printed more than once onto each slide, additional technical replicates were available
in some instances. In addition, several control shRNAs had 8 or 12 biological replicates associated with each RTK, phosphosite and time point. In these cases, every
fourth replicate was averaged together to condense the replicates into only four replicates per shRNA, RTK, phosphosite and time point. For example, if there were 12
replicate data points, they would be condensed into four data points based on the
following scheme: (1, 5, 9), (2, 6, 10), (3, 7, 11), and (4, 8, 12). This condensing step
was done after any individual replicates were replaced in the flagging step. The processed replicate data for all RTKs, phosphosites, time points and shRNA conditions
are available in Table S4.
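The condensing scheme described above amounts to striding through the replicate list. A minimal sketch, with a hypothetical function name and assuming the replicate count is a multiple of four:

```python
def condense_replicates(values, n_out=4):
    """Average every n_out-th replicate so that exactly n_out values remain."""
    if len(values) % n_out != 0:
        raise ValueError("replicate count must be a multiple of n_out")
    # Group k collects replicates k, k + n_out, k + 2 * n_out, ...
    groups = [values[start::n_out] for start in range(n_out)]
    return [sum(group) / len(group) for group in groups]
```

For 12 replicates numbered 1 to 12, the groups are (1, 5, 9), (2, 6, 10), (3, 7, 11), and (4, 8, 12), matching the scheme in the text.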
4.4.6 Quantifying the consistency of biological replicates and shRNA pairs
To quantify the consistency across biological replicate measurements for each phosphosite in each RTK cell line, the coefficient of variation (c.v.) across the four biological replicates at each time point (across 11 time points) in each shRNA time course
(across 91 shRNA conditions) was calculated, producing 91 × 11 = 1,001 c.v. values.
For each of the six RTKs and 22 phosphosites, the median of those 1,001 values is
shown in Fig. 4-2B.
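A short sketch of this summary statistic follows. It assumes the c.v. is the sample standard deviation (n - 1 denominator) divided by the mean; the text does not state which denominator was used, and the function names are illustrative.

```python
import statistics

def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of one replicate set."""
    return statistics.stdev(replicates) / statistics.mean(replicates)

def median_cv(replicate_sets):
    """Median c.v. over many replicate sets (here, 91 shRNA conditions x 11 time points)."""
    return statistics.median(coefficient_of_variation(r) for r in replicate_sets)
```

For one RTK and phosphosite, `replicate_sets` would contain 1,001 lists of four replicate values each.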
To quantify the consistency across pairs of shRNAs directed at the same gene,
for each pair of shRNAs targeting one of 38 unique genes, the median signal values
(calculated across the four biological replicates) were compared across all phosphosites
and all time points. Thus, for each shRNA pair, the Pearson correlation coefficient
between two vectors, each containing 22 phosphosites x 11 time points = 242 median
data values, was calculated. These correlation values across the six RTKs and 38
unique genes are shown in Fig. 4-2C.
4.4.7 Quantifying shRNA effects
To quantify shRNA-induced effects on measured signals, area under the curve (AUC)
values were compared between time courses of test shRNAs (shRNAs targeting signaling proteins) and control shRNAs (pLKO.1empty, shGFP477 and shGFP49). We
first assembled four time series vectors for each phosphosite, RTK cell line and shRNA
by randomly assigning each of the four replicate measurements at each time point
into one of the four time series vectors. We then calculated the four AUC values associated with each of the time series by the trapezoid method (using the trapz function
in MATLAB R2009a), accounting for the non-uniform intervals between time points
in the time series. Thus, each replicate time series was represented by a single AUC
value.
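The AUC step can be illustrated with a plain trapezoid rule over the non-uniform sampling times. This is a sketch in Python rather than the MATLAB `trapz` call used in the study; the time grid is taken from the stimulation protocol above, and the function names and the use of a seeded random generator are assumptions made for illustration.

```python
import random

# Sampling times in minutes, including the unstimulated 0-minute point.
TIME_POINTS = [0, 1, 2, 4, 8, 16, 32, 64, 96, 128, 256]

def trapezoid_auc(times, signal):
    """Area under a time course by the trapezoid rule on a non-uniform grid."""
    area = 0.0
    for k in range(1, len(times)):
        area += (times[k] - times[k - 1]) * (signal[k] + signal[k - 1]) / 2
    return area

def replicate_aucs(times, replicates_by_time, rng):
    """Randomly assign replicates at each time point to series, then integrate each series."""
    shuffled = [rng.sample(row, len(row)) for row in replicates_by_time]
    n_series = len(shuffled[0])
    return [trapezoid_auc(times, [row[s] for row in shuffled])
            for s in range(n_series)]
```

With four replicates per time point, `replicate_aucs(TIME_POINTS, data, random.Random(0))` returns four AUC values, one per assembled time series.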
For each test shRNA, we then compared its four AUC values to the four AUC
values of each of the three control shRNAs in turn. Using a two-tailed, two-sample t-test assuming equal sample variances, this yields p-values $p_{pLKO}$, $p_{GFP477}$, and $p_{GFP49}$.
Performing this procedure on all 88 test shRNAs, 22 phosphosites, and 6 RTKs generated three lists of 11,616 p-values. Using each list of p-values separately, we used
the Storey method [136] to determine significance levels for each of the three control
shRNAs. At a 1% false discovery rate (FDR), the significance levels were calculated
to be $\alpha_{pLKO} = 0.02871$, $\alpha_{GFP477} = 0.01625$, and $\alpha_{GFP49} = 0.02515$.
shRNA-induced effects on measured phosphosites were considered significant only
if all three p-values were below the FDR-corrected levels of significance (i.e.,
$p_{pLKO} < \alpha_{pLKO}$, $p_{GFP477} < \alpha_{GFP477}$, and $p_{GFP49} < \alpha_{GFP49}$),
and if the shRNA-induced change
in AUC value was either an increase over all three control shRNAs or a decrease over
all three control shRNAs. To impose additional stringency, only instances where a
measured signal was significantly affected (as defined above) by both shRNAs targeting each gene are shown in Fig. 4-2D. Using the alternative Benjamini method [117]
to calculate levels of significance, we obtained $\alpha_{pLKO} = 0.00399$, $\alpha_{GFP477} = 0.00315$,
and $\alpha_{GFP49} = 0.00377$. shRNA-induced effects that pass the significance level using
this alternative method are shown in Fig. 4-4.
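The decision rule for a single test shRNA, phosphosite, and RTK can be written compactly. In this sketch the thresholds are the values reported above, while the dictionary layout, the function name, and the representation of each effect as a signed AUC difference (test minus control) are assumptions made for illustration.

```python
# FDR-corrected significance levels reported in the text, per control shRNA.
ALPHA_STOREY = {"pLKO": 0.02871, "GFP477": 0.01625, "GFP49": 0.02515}
ALPHA_BENJAMINI = {"pLKO": 0.00399, "GFP477": 0.00315, "GFP49": 0.00377}

def is_significant(p_values, auc_changes, alpha=ALPHA_STOREY):
    """Apply the three-control significance rule.

    p_values[c] is the t-test p-value against control c, and auc_changes[c]
    is the test AUC minus the AUC of control c (sign gives the direction).
    """
    all_below = all(p_values[c] < alpha[c] for c in alpha)
    all_up = all(auc_changes[c] > 0 for c in alpha)
    all_down = all(auc_changes[c] < 0 for c in alpha)
    return all_below and (all_up or all_down)
```

Passing `alpha=ALPHA_BENJAMINI` applies the stricter alternative thresholds instead of the Storey thresholds.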
4.4.8 shRNA effects simulations
We performed simulations to determine whether the observed pattern of shRNA-induced effects was consistent with a model where shRNA effects are randomly distributed across RTKs or across phosphosites. First, the total number of significantly
decreased and increased signal effects across the 6 RTKs, 22 phosphosites, and 88
test shRNAs (11,616 cases in total) were tallied as 1,346 (11.6%) and 1,232 (10.6%),
respectively. This was computed using the 1% Storey false discovery rate method as
described above, with the exception that we did not require that the same significant
effect be observed for both test shRNAs targeting each gene of interest, as was conservatively required for Fig. 4-2D and Fig. 4-2E. Next, the same number of increases
and decreases in signal were randomly distributed in silico among the 11,616 total
RTK-phosphosite-shRNA combinations. We then tallied the total number of RTKs (0
to 6) exhibiting an increased or decreased signal effect for each phosphosite-shRNA
pair. This analysis considered how the shRNA effects were distributed across the
number of RTK cell lines. Additionally, we tallied the total number of phosphosites
(0 to 22) exhibiting significantly increased or decreased signal for each RTK-shRNA
pair. This analysis considered how the shRNA effects were distributed across the
number of measured phosphosites. This simulation was repeated for 2,500 different
random assignments of the decreased and increased signal effects.
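One way to sketch this random-assignment simulation is shown below. It handles one effect type (increases or decreases) at a time and indexes the 11,616 combinations so that each consecutive block of indices forms one group; both choices are simplifying assumptions, and the function names are hypothetical rather than taken from the original analysis code.

```python
import random
from collections import Counter

TOTAL = 6 * 22 * 88  # RTKs x phosphosites x test shRNAs = 11,616 combinations

def simulate_group_counts(n_effects, group_size, rng):
    """Scatter n_effects at random and tally how many fall in each group.

    Each consecutive block of group_size indices is one group: 6 RTKs per
    phosphosite-shRNA pair, or 22 phosphosites per RTK-shRNA pair. Entry k
    of the returned histogram is the number of groups with exactly k effects.
    """
    chosen = rng.sample(range(TOTAL), n_effects)
    hits = Counter(index // group_size for index in chosen)
    histogram = [0] * (group_size + 1)
    for count in hits.values():
        histogram[count] += 1
    histogram[0] = TOTAL // group_size - len(hits)  # groups with no effect
    return histogram

def average_histogram(n_effects, group_size, n_runs, seed=0):
    """Average the histogram over n_runs independent random assignments."""
    rng = random.Random(seed)
    totals = [0] * (group_size + 1)
    for _ in range(n_runs):
        for k, value in enumerate(simulate_group_counts(n_effects, group_size, rng)):
            totals[k] += value
    return [t / n_runs for t in totals]
```

For example, `average_histogram(1346, 6, 2500)` would estimate the expected number of phosphosite-shRNA pairs showing a decreased signal in 0 to 6 RTKs under the random model.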
To corroborate the results of our simulation, we also derived analytical estimates
of the expected distribution of shRNA-induced effects across RTKs and across phosphosites when assuming a random hypergeometric distribution. For the distribution
of effects across RTKs we assumed drawings of six samples out of 11,616 at a time,
while for the distribution of effects across phosphosites we assumed drawings of 22
samples out of 11,616 at a time. Both our simulations and analytical results showed
that the observed distributions of shRNA effects across either RTKs or phosphosites
are not consistent with this model that assumes randomly distributed hairpin effects.
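The analytical counterpart can be sketched with the hypergeometric probability mass function. Scaling the probabilities by the number of groups to obtain expected counts is an assumption about how the estimates were formed, and the function names are illustrative.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """Probability of k successes in `draws` draws without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def expected_group_counts(n_effects, group_size, population=11616):
    """Expected number of groups containing k = 0..group_size random effects."""
    n_groups = population // group_size
    return [n_groups * hypergeom_pmf(k, population, n_effects, group_size)
            for k in range(group_size + 1)]
```

`expected_group_counts(1346, 6)` gives the expected distribution of decreased-signal effects across zero to six RTKs, and `expected_group_counts(1346, 22)` the corresponding distribution across zero to 22 phosphosites.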
The significance of this comparison was measured using a chi-squared goodness-of-fit test. In the tests, we compared the number of increases or decreases in signal
observed across zero to six RTKs with that expected by chance. In both cases, the
distribution of observed effects was significantly different than the distribution of
random effects (p = 0, using 3 degrees of freedom given 7 bins and 3 parameters in
the hypergeometric distribution).
Similarly, we compared the number of
increases or decreases in signal observed across zero to 22 phosphosites with that expected by chance. In both cases, the distribution of observed effects was significantly
different than the distribution of random effects (p = 0, using 19 degrees of freedom
given 23 bins and 3 parameters in the hypergeometric distribution).
4.4.9 Identifying signaling time scales
To facilitate analysis of dynamic changes in signaling network structure, we wished
to aggregate the 11 time points in our data set into broader time scales representing
basal, early, intermediate, and late signaling events. First, the time zero data were
taken to represent the basal network state. To determine which of the remaining 10
time points in our data set correspond to early, intermediate, and late time scales, we
subjected our data to k-means clustering (k = 3) using the squared Euclidean distance
metric and 200 replicates of each cluster assignment (using the kmeans function in
MATLAB R2009a). For each time point, data were first compiled across all 6 RTKs,
22 phosphosites, 91 shRNAs (88 test shRNAs + 3 control shRNAs) and 4 biological
replicates into a vector of 6 × 22 × 91 × 4 = 48,048 data points.
The input for
the clustering algorithm then consisted of a matrix of 10 time points x 48,048 data
points. This pan-RTK approach identified time scales that were indicative of signaling
dynamics across all RTKs.
4.4.10 Data discretization
The Bayesian network, mutual information, and CLR algorithms we employed in our
study require discrete data as their input. Because our experimental phosphorylation
data were continuous in nature, we discretized all time course data into four levels,
with 1 indicating the lowest phosphorylation values and 4 indicating the highest phosphorylation values. To further increase data robustness, the median data value was
calculated across the biological replicates at each time point (for each RTK, phosphosite, and shRNA condition), following the previously described data pre-processing
step. The median data were subsequently discretized. For each phosphosite, data
were discretized separately for each RTK and time scale. Within each data subset
(for a particular phosphosite, RTK, and time scale), the Z scores of the raw data were
calculated. Those data points with Z > 4 were set to discrete value 4. Those data
points with Z < -4 were set to discrete value 1. The remaining data points were discretized according to 4-level k-means clustering with the squared Euclidean distance
metric and 100 replicates of each cluster assignment (using the kmeans function in
MATLAB R2009a). The ordinality of the discrete data was always maintained, such
that 1 and 4 consistently represented the low and high raw signal values, respectively.
The discrete data for all RTKs, phosphosites, and shRNA conditions are available in
Table S5.
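The discretization of one data subset can be sketched as follows. This is a simplified stand-in for the MATLAB `kmeans` call: it uses a deterministic one-dimensional Lloyd's algorithm with evenly spaced starting centers instead of 100 random restarts, assumes a symmetric Z-score cutoff (above 4 maps to the top level, below -4 to the bottom level), and uses hypothetical function names.

```python
import statistics

def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm in one dimension with evenly spaced starting centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iterations):
        # Assign each value to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels, centers

def discretize(values, levels=4, z_cut=4.0):
    """Map continuous values to ordered levels 1..levels."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    z = [(v - mean) / sd for v in values]
    inner = [v for v, s in zip(values, z) if abs(s) <= z_cut]
    labels, centers = kmeans_1d(inner, levels)
    # Rank clusters by center so that level 1 is always the lowest signal.
    order = sorted(range(levels), key=lambda j: centers[j])
    level_of = {cluster: rank + 1 for rank, cluster in enumerate(order)}
    inner_levels = iter(level_of[lab] for lab in labels)
    result = []
    for s in z:
        if s > z_cut:
            result.append(levels)
        elif s < -z_cut:
            result.append(1)
        else:
            result.append(next(inner_levels))
    return result
```

Ranking the clusters by their centers is what preserves ordinality, so that 1 and 4 always correspond to the lowest and highest raw signal values in the subset.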
4.4.11 Network inference algorithms
The core Bayesian network inference algorithm was implemented as previously described [37], using a modified version of the Bayesian Network Structure Learning
toolbox in MATLAB R2009a [182] based on the algorithm of Koivisto and Sood [50].
Here the equivalent sample size (ESS) in the Dirichlet parameter prior was varied
for each time scale to help normalize for varying sample size across time scales. ESS
values of 20, 1, 1, and $3.4 \times 10^4$ were used for the basal, early, intermediate, and
late time scales, respectively. shRNA perturbations were modeled as perfect interventions. That is, when a measured phosphosite (e.g., c-Raf Ser289, Ser296, Ser301) was
present on the protein product of a transcript targeted by an individual shRNA (e.g.,
CRAF) or shRNA pool (e.g., RAF pool), then these phosphosite data were considered
to be under the influence of that shRNA intervention. In such cases the discrete data
were not modified from their previously determined values, but the network scoring
function was modified to take the intervention into account.
Prior knowledge was used to restrict viable Bayesian network structures. The
RTK phosphosite was not allowed to have any parent nodes (i.e., no incoming edges),
and the transcription factor sites (c-Jun, NF-κB, STAT1, STAT3) were not allowed
to have any child nodes (i.e., no outgoing edges), except if those child nodes were
other transcription factor sites. Nodes were restricted to a maximum of three parents. That is, when computing the posterior edge probabilities, consensus networks
containing all possible one-, two-, and three-parent node-node interactions were considered.
Higher-order parent-child relationships, beyond three-parent interactions,
were not considered.
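To make the structural restrictions concrete, the candidate parent sets for a node can be enumerated as in the sketch below. It is an illustration only, not the Bayesian Network Structure Learning toolbox code; the function names and the inclusion of the empty parent set are assumptions made here.

```python
from itertools import combinations


def allowed_parents(child, nodes, rtk, tf_sites):
    """Nodes permitted as parents of `child` under the prior-knowledge rules."""
    if child == rtk:
        return []  # the RTK phosphosite is a root node: no incoming edges
    candidates = [n for n in nodes if n != child]
    if child not in tf_sites:
        # transcription factor sites may only be parents of other TF sites
        candidates = [n for n in candidates if n not in tf_sites]
    return candidates


def parent_sets(child, nodes, rtk, tf_sites, max_parents=3):
    """All parent sets of size 0..max_parents (empty set included by assumption)."""
    candidates = allowed_parents(child, nodes, rtk, tf_sites)
    return [ps for size in range(max_parents + 1)
            for ps in combinations(candidates, size)]
```

With 20 nodes and no restrictions other than the three-parent cap, each node has 1 + 19 + 171 + 969 = 1,160 candidate parent sets, which is why the cap keeps the search tractable.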
The directionality of the edges shown in Fig. 4-10 was based on the consensus
directionality observed in the 24 Bayesian networks inferred across the six RTKs and
four time scales, along with the prior knowledge assumptions. Because the RTK phosphosite was assumed in the prior knowledge to be a root node, all nodes connected to
it were required to be child nodes. Similarly, because the phosphosites on transcription factors were required to have no child nodes, except for other transcription
factor phosphosites, all nodes connected to transcription factor sites were required
to be parents of the transcription factor nodes. Edges inferred between transcription
factor phosphosites were left undirected, under the assumption that an edge between
two transcription factors likely represented mutual coordination by an unmeasured
node(s), rather than the action of one phosphosite on another. The edges from the
calmodulin phosphosite were the most uncertain in the consensus directionality analysis. As such, the directions of these three edges (to PKCδ, paxillin, and RSK3) are
the least certain.
CLR was implemented in MATLAB R2009a using code provided by Faith et al.
[27], with Z scores (edge weights) calculated using the plos method. The mutual
information matrix was calculated using a simple histogram method within the CLR
code. Spearman and Pearson correlation networks were calculated using the median
data (median across the biological replicates) and the corr function in MATLAB
R2009a. For the CLR, mutual information, Spearman, and Pearson networks, all 22
measured phosphosites were used as input for the algorithms. Due to algorithmic
memory constraints, we were able to use only 20 out of the 22 measured phosphosites
(p-S6 (Ser240, Ser244) and p-CREB (Ser133) were left out) for Bayesian network inference. However, the same discretized data for the 20 nodes in the Bayesian network
inference were used for those 20 of 22 nodes present in the other algorithms.
It should be emphasized that the network inference results are consensus models
across all shRNA perturbations in a given time scale. Thus these networks provide a
representation of the dominant signal-signal relationships across all 91 shRNA conditions. As a result, perturbation effects observed in only one or two shRNA conditions
are likely washed out and do not appear in the consensus networks. This is likely
one explanation for why certain shRNA effects (such as GSK3 shRNA pool effects on
MEK and ERK signals) are not seen in the network models.
Because there are many signaling nodes that were not measured in our data set,
it is possible that two sites connected by an edge in the network inference models are
actually under the mutual regulation of one or more unmeasured signaling nodes. To
the extent that such hidden variables exert influence on our measured phosphoproteins, our inferred networks represent coarse approximations of the actual network
topologies.
4.4.12
Comparison of RTKs by inferred network structures
through dimensionality reduction
To compare inferred network structures across RTKs, multidimensional scaling (MDS)
was used as a dimensionality reduction technique. To enable comparisons across different network inference methods, each network structure's adjacency matrix was first
converted into a binary vector describing the presence or absence of each edge. For
each network inference method, pairwise distances between all 24 networks' binary
vectors were calculated using the pdist function in MATLAB R2009a with the Jaccard distance metric. We focused on the Jaccard distance as our metric for network
structure comparison because it considers binary features, and it does not consider
cases where two observations (networks) both have a value of zero (are both missing a
particular edge). The Jaccard distance matrices were then used as input for classical
MDS using the cmdscale function in MATLAB R2009a.
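The Jaccard comparison of binary edge vectors can be sketched as follows. This is a plain-Python illustration rather than MATLAB's pdist; the function names are hypothetical, and returning 0 for two networks with no edges at all is a convention chosen here.

```python
def jaccard_distance(u, v):
    """1 - (shared edges) / (edges present in at least one network).

    Positions where both vectors are zero (edge absent from both networks)
    do not enter the calculation.
    """
    shared = sum(1 for a, b in zip(u, v) if a and b)
    present = sum(1 for a, b in zip(u, v) if a or b)
    if present == 0:
        return 0.0  # assumption: two empty networks are treated as identical
    return 1.0 - shared / present


def pairwise_distances(vectors):
    """Symmetric distance matrix, suitable as input to classical MDS."""
    n = len(vectors)
    return [[jaccard_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
```

Because jointly absent edges are ignored, two sparse networks are not scored as similar merely because most possible edges are missing from both.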
All of the resultant MDS eigenvalue features were then clustered using k-means
clustering (k = 3) to identify groups of similar network structures.
For all five
network inference methods, clustering was performed using the squared Euclidean
distance metric and 200 replicates of each cluster assignment (using the kmeans function in MATLAB R2009a).
Notably, if the Euclidean distance metric were used
instead of the Jaccard distance metric, the multidimensional scaling procedure would
be identical to principal component analysis.
For the Bayesian networks, the MDS input networks had an edge weight threshold
of > 0.1 applied. For the CLR and mutual information networks, the MDS input networks had an edge weight threshold of Z > 1 and MI > 0.3 applied, respectively. For
the Spearman and Pearson correlation networks, the 60th percentile of the absolute
value of the correlation coefficients was calculated across the 24 correlation networks,
corresponding to |correlation coefficient| > 0.35 and |correlation coefficient| > 0.30,
respectively. For the Bayesian networks, the adjacency matrix vectors contained 400
directed edge features (self-edges were excluded). For the four undirected network inference methods, the adjacency matrix vectors contained 231 undirected edge features
(again excluding self-edges).
It should be noted that because the Bayesian networks are directed network structures, while the other four methods are undirected network structures, this provides
more edge features to capture in the Bayesian networks' MDS analysis. Additionally,
the Bayesian networks only contain 20 of the full 22 phosphosite nodes, while all 22
are included in the other four inference methods. These two aspects may explain
why two of the 24 networks in the Bayesian network MDS cluster analysis are not
assigned to the same clusters as in the other four methods' MDS clustering results.
4.4.13
Network model edge weight threshold robustness
To determine the robustness of the network model clusters (EGFR/FGFR1/c-Met,
IGF-1R/NTRK2, and PDGFRβ) to the edge weight threshold applied to each network
inference method's result, the edge weight threshold was varied over a range of values
and then clustering was repeated at each value. The range spanned the 10th to the 90th
percentile of the edge weight values, at 10-percentile increments. For the case of
Spearman and Pearson correlation, the percentile was calculated using the absolute
values of the correlation coefficients. The other three inference methods (mutual
information, CLR, Bayesian) have strictly nonnegative edge weights, so no absolute
value was needed. For the Bayesian network edge weights, to increase the dynamic
range of the sensitivity analysis, edge weights < 0.02 and > 0.98 were removed before
calculating the 10-percentile increments. This is because, by the algorithm's design,
most of the resultant edge weights are near zero and several are unity.
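The threshold sweep can be sketched as below. This is an illustrative Python version, not the original MATLAB analysis: it assumes the "inclusive" interpolation of Python's statistics.quantiles, which may differ slightly from MATLAB's percentile convention, and the function names are hypothetical.

```python
from statistics import quantiles


def threshold_grid(weights, use_abs=False, trim=None):
    """Return the 10th, 20th, ..., 90th percentiles of the edge weights.

    use_abs: take absolute values first (correlation networks).
    trim:    optional (low, high) pair; weights outside it are dropped before
             the percentiles are computed (e.g. (0.02, 0.98) for Bayesian weights).
    """
    values = [abs(w) for w in weights] if use_abs else list(weights)
    if trim is not None:
        low, high = trim
        values = [w for w in values if low <= w <= high]
    return quantiles(values, n=10, method="inclusive")


def binarize(weights, threshold, use_abs=False):
    """Binary edge vector: 1 where the (absolute) weight exceeds the threshold."""
    return [int((abs(w) if use_abs else w) > threshold) for w in weights]
```

Each threshold in the grid yields one binary edge vector per network, and the distance, MDS, and clustering steps are then repeated on those vectors.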
4.4.14
Generating receptor class-specific consensus networks
across inference methods
The frequency of each edge in the five inference methods and four time scales was
calculated for each RTK. The same edge thresholds used for the dimensionality reduction were applied. To directly compare the five inference methods, the Bayesian
networks were converted to an undirected form. Further, because the Bayesian networks included only 20 of 22 measured nodes, while the four other inference methods
contained all 22 nodes, edges were normalized to the total number of instances they
were considered across the five inference methods (i.e., 4 time scales x 5 inference
methods = 20, versus 4 timescales x 4 inference methods = 16 for the edges connecting nodes excluded from the Bayesian networks). This provided a scale between
0 and 1 for each edge, representing that edge's frequency within a particular RTK
across four or five inference methods and four time scales (shown in Fig. 4-9).
To generate class-specific networks, it was required that an edge appeared with a
frequency > 0.5 for each RTK within an RTK set and < 0.25 for each RTK outside the
RTK set. RTK sets included (1) each individual RTK class, (2) two of the three RTK
classes, and (3) all three RTK classes. For example, the c-Cbl-CaM edge appeared
with a frequency of 0.85, 0.85, 0.8, 0, 0, and 0.1 for the EGFR, FGFR1, c-Met,
IGF-1R, NTRK2, and PDGFRβ receptors, respectively. As such, the c-Cbl-CaM
edge was considered to be specific to the EGFR/FGFR1/c-Met class, but absent
from the IGF-1R/NTRK2 and PDGFRβ classes. Pan-RTK backbone edges were
required to have a frequency > 0.5 across all six RTKs.
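The frequency normalization and the class-specificity rule can be written compactly. The sketch below is illustrative: the function names are hypothetical, and the example frequencies are the c-Cbl-CaM values quoted in the text.

```python
def edge_frequency(count, in_bayesian_networks, n_time_scales=4):
    """Normalize an edge count by the number of networks that could contain it."""
    n_methods = 5 if in_bayesian_networks else 4
    return count / (n_time_scales * n_methods)


def is_class_specific(freq_by_rtk, rtk_set, inside_min=0.5, outside_max=0.25):
    """True if the edge is frequent in every RTK of the set and rare elsewhere."""
    inside = all(f > inside_min for r, f in freq_by_rtk.items() if r in rtk_set)
    outside = all(f < outside_max for r, f in freq_by_rtk.items() if r not in rtk_set)
    return inside and outside


# Frequencies of the c-Cbl-CaM edge reported in the text
cbl_cam = {"EGFR": 0.85, "FGFR1": 0.85, "c-Met": 0.8,
           "IGF-1R": 0.0, "NTRK2": 0.0, "PDGFRβ": 0.1}
```

For instance, `is_class_specific(cbl_cam, {"EGFR", "FGFR1", "c-Met"})` evaluates to True, whereas the same call with `{"IGF-1R", "NTRK2"}` evaluates to False.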
4.4.15
Clustering the raw data
When clustering the median signal values, each signal's median value was first calculated across all biological replicates, shRNA perturbations, and time points. This
gave an indication of the typical level of phosphorylation for each phosphosite in
each RTK cell line. The resulting matrix of 22 phosphosites by 6 RTKs was then
mean-centered and unit variance scaled across each phosphosite. This matrix was
then clustered using the kmeans function in MATLAB R2009a with k = 3 and 100
replicates with random initial centroid assignments.
When clustering signals from all time points together, data matrices representing
the data for each RTK and all 22 phosphosites were first constructed (representing a
matrix with 22 rows and 11 x 91 x 4 = 4,004 columns). The data were then mean-centered and unit variance scaled for each signal separately (i.e., across rows). This
process was repeated for all six RTKs. These normalized matrices were then converted
into vectors to form a new matrix of 6 rows (one per RTK) and 22 x 4,004 = 88,088
columns. This matrix was then clustered using the kmeans function in MATLAB
R2009a with k = 3 and 100 replicates with random initial centroid assignments.
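The normalization and vectorization steps can be sketched as follows. This is a minimal Python illustration with hypothetical function names; it assumes the sample standard deviation (n - 1 denominator) for the unit variance scaling.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Mean-center and unit-variance scale each row (one row per phosphosite)."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)  # sample standard deviation (assumption)
        scaled.append([(v - mu) / sd for v in row])
    return scaled


def flatten(matrix):
    """Concatenate the rows into one long feature vector for a single RTK."""
    return [v for row in matrix for v in row]


def build_multi_rtk_matrix(matrices_by_rtk):
    """One row per RTK; each row is that RTK's normalized, flattened data."""
    return [flatten(zscore_rows(m)) for m in matrices_by_rtk]
```

Normalizing within each RTK before flattening means that the subsequent clustering compares patterns of variation across conditions rather than absolute signal levels.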
A similar approach was taken to cluster signals from each time scale, and signals
from each time point.
In each case, signals from the relevant time point(s) were
first mean-centered and unit variance scaled for each RTK separately, and then the
resultant matrices were converted to vectors and compiled into a multi-RTK matrix.
This matrix was then clustered using the kmeans function in MATLAB R2009a with
k = 3 and 100 replicates with random initial centroid assignments.
The plots shown in Fig. 4-11 represent the first two components of the PCA
loadings from each clustered data subset. These plots are simply for visualization
purposes to show approximate relative similarity between the RTKs' data. The actual
clustering was done using the full data matrix, not just the first two components. The
marker shapes in Fig. 4-11 indicate which cluster each data point belongs to.
4.4.16
Generating synthetic data for network inference
Directed acyclic networks were randomly generated allowing only one parent node per
child node and containing only one root node (i.e., source signal). The signal levels for
the root node were 200 points randomly sampled from a uniform distribution between
values 1 and 6. The signal levels for all downstream nodes were specified based on
the signal level of its input parent node, namely y_output = y_input.
Data were simulated in a step-wise fashion, such that the only input to the simulation process was the signal levels of the root node. Then, at each step in the
simulation from parent to child node, heritable variation was added to each node's
data. This variation was drawn from a random normal distribution with mean zero
and a 10% coefficient of variation. This variation-added signal was then used as input
for the node's child node in the network. Heritable variation was also added to the
terminal nodes in the network, even though they have no child nodes. Once all nodes
were simulated, non-heritable variation was added to the simulated data. This
variation was drawn from a uniform distribution over the range ±1, and this variation
was added independently for each node.
As an example, in the simple case of a two-node network A -> B, the 200 signal
values for A are drawn from a uniform distribution, and then those values have random
normally distributed heritable variation added to them. The subsequent values, A',
are then used as input for node B. The values of node B are then based directly on its
input node, so the values for B are equal to A'. Then random normally distributed
heritable variation is added, creating B'. After the simulation, random uniformly
distributed noise is added independently to both A' and B', creating A" and B",
which are the final output from the simulation. To generate the results in Fig. 4-12,
four synthetic networks were generated, each containing 22 nodes. This is the same
number of nodes measured in our experimental signaling data. For each network, five
independent data sets were simulated. Because the input levels and heritable and
non-heritable variation are all stochastic, this generated five different data sets per
network.
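The simulation procedure can be sketched in Python as below. This is an illustration under stated assumptions, not the original code: the function names are hypothetical, the random tree is built by giving each node a uniformly chosen earlier node as its parent, and the "10% coefficient of variation" is interpreted as a standard deviation equal to 10% of each signal value.

```python
import random


def random_tree(n_nodes, rng):
    """parent[i] is the single parent of node i; node 0 is the only root."""
    return [None] + [rng.randrange(i) for i in range(1, n_nodes)]


def simulate(parent, n_points=200, cv=0.10, noise_half_width=1.0, rng=None):
    """Simulate one data set (list of rows, one row of n_points per node)."""
    rng = rng or random.Random()
    data = [None] * len(parent)
    for node in range(len(parent)):  # parents always have a smaller index
        if parent[node] is None:
            clean = [rng.uniform(1, 6) for _ in range(n_points)]  # root signal
        else:
            clean = data[parent[node]]  # y_output = y_input
        # Heritable variation: passed on to all descendants of this node
        data[node] = [v + rng.gauss(0, cv * abs(v)) for v in clean]
    # Non-heritable variation: added independently to every node at the end
    return [[v + rng.uniform(-noise_half_width, noise_half_width) for v in row]
            for row in data]
```

Calling `simulate(random_tree(22, rng), rng=rng)` several times with the same tree gives independent data sets for one network, mirroring the five data sets per network described above.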
To analyze the data by PCA, for each of the twenty data sets the matrix of 22
nodes x 200 conditions was converted into a vector, providing a final input matrix
for PCA of 20 data sets x 4,400 data points. In the case of the normalized raw data,
data for each node was first mean-centered and unit variance scaled before putting
the data set into vector format. Spearman correlation was used to represent inferred
network topologies. For the binary case, a threshold of the 60th percentile correlation
value (> 0.8198) was used (the same correlation percentile used in Fig. 4-7). The
percentile was calculated based on the correlation values across all 20 data sets. PCA
was used for the raw data, normalized raw data, and continuous correlation values,
while multidimensional scaling was used for the binary correlation values.
4.4.17
Cancer Cell Line Encyclopedia mRNA expression principal component analysis
CCLE mRNA data were downloaded at the CCLE web site (http://www.broadinstitute.org/ccle)
from the file CCLE_Expression_Entrez_2012-04-06.gct. The data were
analyzed using the principal components analysis function princomp in MATLAB
R2009a. The input matrix for this function was 18,926 genes' robust multi-chip average (RMA) gene expression values in 967 cell lines. The matrix was entered such
that the genes were considered 'observations' and the cell lines considered 'variables'.
Before PCA was applied, the gene expression values were mean-centered and unit
variance scaled for each gene across all cell lines. The PCA results plotted in Fig. 4-14 represent the first two components of the resultant PCA coefficients, or loadings.
4.4.18
Tumor histology enrichment/depletion
Enrichment and depletion of EGFR, FGFR1, and MET mRNA co-expression in
particular tumor histologies was assessed using the cell line information provided in
Barretina et al. [158]. First, it was determined if each cell line did or did not co-express
EGFR, FGFR1, and MET given a particular RMA threshold defining expressed genes
(e.g., RMA > 5). Next, it was determined if cell lines originally derived from tumors
of particular histologies exhibited EGFR, FGFR1, and MET co-expression either
more or less often than expected by chance, given the total number of cell lines co-expressing these genes, the total number of cell lines of each histology, and the overlap
of the two sets. This was quantified using the hypergeometric test as implemented
by the hygepdf function in MATLAB R2009a.
The probability of observing as many or more cases of overlap (N) between
EGFR/FGFR1/MET co-expressing cells and cells of a given histology was obtained
by summing the probability density function from the number of cell lines with overlap N to the total number of cell lines (967). Conversely, the probability of observing
as many or fewer cases of overlap N was obtained by summing the probability density function from zero cases to N cases. Cell lines with histology "other" or with no
histology information provided were not considered for enrichment. For each RMA
threshold and each histology type, the lower p-value between enrichment versus depletion was selected. Given the 20 histologies and 11 tested RMA thresholds, this
provided a list of 20 x 11 = 220 p-values. Applying the Benjamini method with a 5%
FDR to this list of p-values corresponded to p < 0.0191.
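The enrichment and depletion tail probabilities can be sketched with the hypergeometric probability mass function. This is a standard-library Python illustration rather than MATLAB's hygepdf; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, total, n_coexpressing, n_histology):
    """P(overlap = k) when n_histology lines are drawn from `total` lines,
    of which n_coexpressing co-express the three genes."""
    return (comb(n_coexpressing, k)
            * comb(total - n_coexpressing, n_histology - k)
            / comb(total, n_histology))


def enrichment_p(k_obs, total, n_coexpressing, n_histology):
    """Probability of observing k_obs or more overlapping cell lines."""
    k_max = min(n_coexpressing, n_histology)
    return sum(hypergeom_pmf(k, total, n_coexpressing, n_histology)
               for k in range(k_obs, k_max + 1))


def depletion_p(k_obs, total, n_coexpressing, n_histology):
    """Probability of observing k_obs or fewer overlapping cell lines."""
    k_min = max(0, n_histology - (total - n_coexpressing))
    return sum(hypergeom_pmf(k, total, n_coexpressing, n_histology)
               for k in range(k_min, k_obs + 1))
```

Summing up to the smaller of the two set sizes is equivalent to summing to the total number of cell lines, because the probability of any larger overlap is zero. The smaller of the two tail probabilities would then be carried forward for each histology and RMA threshold.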
4.4.19
Correlating gene expression and drug activity area
Pharmacological profiling data were downloaded at the CCLE web site (http://www.broadinstitute.org/ccle)
from the file CCLE_NP24.2009_profiling_2012.02.20.csv.
Spearman correlation was calculated between gene expression values and drug activity
area.
The six RTKs and six cognate ligands used in this study were considered
together for multiple hypothesis correction. That is, at each RMA threshold, the
Spearman correlation and associated p-values were calculated across the 12 genes
(six RTKs and six ligands) and 4 drugs, providing 48 p-values. These p-values were
then corrected for a 1% FDR using the Benjamini method. The 1% FDR p-values
for the > 0, 4, 4.5, 5, 5.5, and 6 RMA thresholds were p < 5.93 x 10^-3, 3.72 x 10^-3,
5.93 x 10^-3, 5.41 x 10^-3, 3.76 x 10^-3, and 2.80 x 10^-3, respectively. Cell lines with
values of zero for the measured activity area were not included in the correlation
calculations, because there was no indication of how insensitive to a drug a cell line
with zero activity area may be.
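The way an FDR level translates into a p-value cutoff can be sketched as below. The sketch assumes that the "Benjamini method" refers to the Benjamini-Hochberg step-up procedure, which the text does not state explicitly; the function name is hypothetical.

```python
def bh_cutoff(pvalues, fdr):
    """Benjamini-Hochberg step-up rule (assumed procedure).

    Returns the largest sorted p-value p_(i) with p_(i) <= (i / m) * fdr,
    or None if no p-value qualifies. All p-values at or below the returned
    cutoff are called significant at the chosen FDR.
    """
    m = len(pvalues)
    cutoff = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank / m * fdr:
            cutoff = p
    return cutoff
```

Because the cutoff depends on the whole list of p-values, it differs between RMA thresholds even when the FDR level is held fixed, which is consistent with the threshold-specific cutoffs listed above.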
4.4.20
Partial correlation between genes and drug response
The Spearman partial correlation values were calculated between each receptor gene
and each drug while controlling for the expression of the remaining five receptors, and
between each ligand gene and each drug while controlling for the expression of the
remaining five ligands, using the partialcorr function in MATLAB R2010b. Only
the individual gene that was being considered for the partial correlation calculation
had to exceed the applied RMA threshold. For example, when calculating the partial
correlation between EGFR and erlotinib and using a threshold of RMA >5, only cell
lines with EGFR >5 were considered, but the expression levels of the remaining five
RTKs in those cell lines could be below five. The 5% Benjamini false discovery rate
was applied for each RMA threshold separately based on the list of 24 p-values (6
genes x 4 drugs), which were calculated using the partialcorr function.
4.4.21
Comparison of RTKs by receptor-intrinsic properties
through dimensionality reduction
To compare different receptor-specific intrinsic properties, multidimensional scaling
and principal components analysis were used as dimensionality reduction techniques.
We extracted Kd values describing the affinities between 72 kinase inhibitor drugs
and our six RTKs from the data set by Davis et al. [165]. Of the 72 inhibitors, 61
bound to at least one RTK with Kd < 10 µM. The Kd values were converted to log10(Ka)
values, and, to ensure that non-measurable docking interactions would not
numerically dominate the clustering results, those inhibitor-receptor interactions that
were not measured to bind were set to log10(Ka) = 3 (i.e., Kd = 1 mM). This matrix
of 6 RTKs x 31 inhibitor compounds was then used as input for principal components
analysis.
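The affinity transformation can be sketched in a few lines. The function name is hypothetical, and representing a non-binding interaction as None is an assumption made for this illustration.

```python
import math


def log10_ka(kd_molar, floor=3.0):
    """Convert a dissociation constant (in molar) to log10 of the association
    constant, Ka = 1 / Kd. Interactions with no measured binding (None) are
    assigned the floor value, equivalent to Kd = 1 mM."""
    if kd_molar is None:
        return floor
    return -math.log10(kd_molar)
```

A 1 nM binder maps to log10(Ka) = 9, so the floor of 3 places non-binders well below any measured interaction without letting an arbitrarily extreme value dominate the analysis.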
The amino acids comprising the cytoplasmic domains of the six RTKs were accessed from http://www.uniprot.org and defined as follows: EGFR (aa669-1210),
FGFR1 (aa398-822), IGF-1R (aa960-1367), c-Met (aa956-1390), NTRK2 (aa455-822),
and PDGFRβ (aa557-1106). The amino acids comprising the kinase domains of the
six RTKs were defined as follows: EGFR (aa712-979), FGFR1 (aa478-767), IGF-1R
(aa999-1274), c-Met (aa1078-1345), NTRK2 (aa538-807), and PDGFRβ (aa600-962).
For both the kinase and cytoplasmic domains, in each case the domains were aligned
across RTKs using the multialign function in MATLAB R2009a with the Gonnet
scoring matrix. Pairwise distances between all aligned sequences were then calculated
using the seqpdist function in MATLAB R2009a, also using the Gonnet scoring matrix. This distance matrix was then used as input for classical multidimensional
scaling using the cmdscale function in MATLAB R2009a.
For the kinase inhibitor data, kinase domain sequences, and cytoplasmic domain
sequences, all five eigenvalues resulting from MDS were used for subsequent k-means
clustering. For all receptor-intrinsic properties, k-means clustering was performed
with the city block distance metric and 200 replicates of each cluster assignment.
Chapter 5
Quality versus quantity:
Identifying features of biological
data for making better models
Note: This work will be submitted for publication. It is based on computational research
designed by J.P.W. and D.A.L. and performed by J.P.W. The authors thank David Heckerman (Microsoft Research), William Chen (Harvard University), and Brian Joughin (M.I.T.)
for helpful discussions.
5.1
Introduction
Within the realm of computational models applied to biological systems, there is a
general lack of understanding of what features of biological data make useful models.
While there are heuristics, like high signal-to-noise and the collection of multiple biological replicates, that generally guide experimentalists when planning studies and
collecting data, these notions have generally not been quantitatively explored with regard to subsequent model accuracy. To begin to address this need, here we generate
data from simple synthetic models and derive insights from their analysis. Using a
simple two-variable toy model, as well as a more realistic although still simplified multivariate network model, we highlight explicit features of data that improve model
accuracy. In the two-variable case, "accuracy" refers to the relative error between
the data points produced from a linear model, and the predictions of a linear model
inferred from noisy manifestations of the same data. In the multivariate case, "accuracy" refers to the similarity between the synthetic network topology and the inferred
Bayesian network topology.
Modeling the two-variable system with linear regression and the multivariate system with Bayesian networks, we show that prediction accuracy is a function of data
quantity and also features related to data quality. In the linear regression case,
increasing the range over which the data are sampled improves accuracy. In the
Bayesian network case, increasing the range over which the data are sampled can also
improve accuracy, but only if the data are discretized in a manner that corresponds
to biologically meaningful variation in the data. Further, the Bayesian network results highlight the necessity of the propagation of variation within the network, here
termed heritable variation, for causal inference. An algorithm is developed for the
automatic identification of a discretization scheme for each signaling node, using only
information about the variation across biological replicates and the technical precision
of the measurements, that improves the accuracy of causal Bayesian network inference. While existing literature has discussed nonuniform discretization approaches
(e.g., [183, 184]), the results here cast the problem in terms of parameters familiar to
experimental biologists. These results, for the first time to our knowledge, provide a
simple method for identifying a discretization strategy on a data set-specific basis.
5.2
Results
5.2.1
A simple two-variable toy model
We began our analysis by exploring the simplest possible model of input-output behavior, a two-variable linear toy model of the form y = mx. In other words, the
output, y, is a direct linear function of the input, x. First, x-data are generated
by randomly sampling N data points from a uniform distribution over the interval
between $x_{\min}$ and $x_{\max}$, namely, $x \sim U(x_{\min}, x_{\max})$. A uniform distribution was chosen to
represent a biological signal distribution that may result from a set of experimental
measurements performed across multiple sufficiently biologically different conditions.
In contrast, a set of signaling measurements taken from one biological condition may
follow a normal distribution. In this manner, a biological condition would represent
one instantiation of the biological network state; for example, as measured at a single
time point under one growth factor concentration with no external perturbations.
Sufficiently biologically different states would result from diverse experimental conditions: stimulating cells with different concentrations of growth factor, stimulating
cells with different growth factors, stimulating cells with combinations of growth factors, perturbing the concentrations of proteins using RNAi, small molecule inhibitors,
or antibody inhibitors, measuring signals at different time points, or any combination
of the above. While we do not prove here that such a uniform distribution could be
attainable, it is an assumption we use for our modeling efforts based on the study of
experimental data gathered under many of the different conditions just outlined.
Given the x-data, the corresponding y-data are set by y = mx. Randomly distributed noise is then added to the x-data and y-data separately, such that
\[ x_{\text{noisy}} = \mathcal{N}(x, \sigma_x) \]
\[ y_{\text{noisy}} = \mathcal{N}(y, \sigma_y) \]
where $\mathcal{N}(\mu, \sigma)$ represents data from a normal distribution with mean $\mu$ and standard
deviation $\sigma$. Then, a new fitted slope parameter, $m_{\text{noisy}}$, is inferred from $x_{\text{noisy}}$ and $y_{\text{noisy}}$
using linear regression. This parameter is then used to predict the expected
output given the originally observed x-data, such that $y_{\text{pred}} = m_{\text{noisy}} x$. The mean
absolute percent error (MPE) was then used to quantify the prediction accuracy:
\[ \text{MPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i - y_{\text{pred},i}}{y_i} \right| \]
Thus, as the inferred parameter $m_{\text{noisy}}$ approaches the original parameter m, the
MPE will approach zero. To simplify the model, the following parameters were used:
m = 1, $x_{\min} = 1$, and $\sigma_x = \sigma_y = 1$. We then sought to explore the prediction
accuracy as a function of N, the number of data points used to train the model, and
$x_{\max}$, the range over which the x-data were sampled. The value of N was varied from
10 to $10^4$, and the value of $x_{\max}$ was varied from 1.1 to 50. Because the parameter
fit was dependent on randomly generated input data and noise, for each value of
N and $x_{\max}$ the procedure was repeated 1,000 times. The mean values from these
1,000 simulations are plotted in Fig. 5-1. Note that a relative error measure (MPE)
was used instead of an absolute error measure (e.g., mean squared error) because
increasing the value of xmax itself increases the absolute error.
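The numerical experiment can be sketched as follows. This is an illustration, not the original simulation code: the function names are hypothetical, and the slope is fitted by least squares through the origin, which is an assumption (the text specifies only "linear regression" and a prediction of the form y = mx).

```python
import random


def fit_slope_through_origin(x, y):
    """Least-squares slope for the no-intercept model y = m * x (assumed form)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def simulate_mpe(n, x_max, x_min=1.0, m=1.0, sigma_x=1.0, sigma_y=1.0, rng=None):
    """One simulation: sample x, add noise to x and y, refit, and score."""
    rng = rng or random.Random()
    x = [rng.uniform(x_min, x_max) for _ in range(n)]
    y = [m * v for v in x]
    x_noisy = [rng.gauss(v, sigma_x) for v in x]
    y_noisy = [rng.gauss(v, sigma_y) for v in y]
    m_noisy = fit_slope_through_origin(x_noisy, y_noisy)
    y_pred = [m_noisy * v for v in x]  # predict from the original x-data
    return 100.0 / n * sum(abs((yi - pi) / yi) for yi, pi in zip(y, y_pred))


def mean_mpe(n, x_max, n_repeats=1000, rng=None):
    """Average MPE over repeated simulations for one (N, x_max) pair."""
    rng = rng or random.Random()
    return sum(simulate_mpe(n, x_max, rng=rng) for _ in range(n_repeats)) / n_repeats
```

Since both the true and the predicted outputs are proportional to x, the MPE in this setup reduces to 100 |m - m_noisy| / |m|, which makes explicit that the error measure reflects only how well the slope is recovered.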
These results show that the prediction error is a function both of the quantity
of data, N, used to train the model, and the range over which the training data
were sampled, $x_{\max}/x_{\min}$. As one increases the quantity of data, the error decreases,
although only up to about 1,000 data points. And, as one increases the range over
which the x-data were sampled, the error also decreases, although only appreciably
up to $x_{\max}/x_{\min} \approx 10$. Further, these results show that a high $x_{\max}/x_{\min}$ ratio can
compensate for small quantities of data; and similarly, large quantities of data can
compensate for a low $x_{\max}/x_{\min}$ ratio. For example, as seen in the inset of Fig. 5-1, 20 data points with $x_{\max}/x_{\min} \approx 12$ provide about the same average prediction
accuracy (MPE $\approx$ 4%) as 10,000 data points with $x_{\max}/x_{\min} \approx 8$. Thus, in this simple
two-variable toy model, prediction accuracy is a function of both data quantity and
$x_{\max}/x_{\min}$ ratio.
5.2.2
Analytical estimates for prediction accuracy as a function of data range in the two-variable toy model
The two-variable toy model just introduced has a number of parameters involved
in the data generation, including m, $x_{\min}$, $x_{\max}$, $\sigma_x$, and $\sigma_y$. Because of this, the
numerical results presented in Fig. 5-1 are dependent on the parameters used, namely
m = 1, $x_{\min} = 1$, and $\sigma_x = \sigma_y = 1$. It is possible, however, to derive dimensionless
analytical estimates of data quality that are not dependent on individual numerical
simulations.
[Figure 5-1 plot; x-axis: range of sampled data, $x_{\max}/x_{\min}$]
Figure 5-1: Prediction accuracy in a two-variable system as a function of data quantity
and the range over which the data were sampled, $x_{\max}/x_{\min}$. The mean absolute percent
error is plotted across different data set sizes (different color markers) as a function of the
range of the input data used to build the linear regression model. The error is quantified
between the underlying linear model and the data predicted from a linear model fit to noisy
manifestations of the underlying data. These results show that a small data set with wide
range can be as predictive as a large data set with narrow range. (inset) Zoomed in plot
from the dashed region of the larger plot.
[Figure 5-2 schematic]
Figure 5-2: Schematic for the two-variable toy model outlining the conditions required to
accurately infer the slope of a line. Red points indicate the original data, $(x_{\min}, y_{\min})$
and $(x_{\max}, y_{\max})$, whereas black points indicate the positions of those same data while
accounting for the characteristic error values, $\epsilon_x$ and $\epsilon_y$.
If we have two points on a line, $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$, whereby
$x_{\max} > x_{\min}$, $y_{\max} > y_{\min}$, and $x_{\min} > 0$, and we have some characteristic error
associated with the measurement of both variables, $\epsilon_x > 0$ and $\epsilon_y > 0$, then we
can ask under which conditions the two points remain distinguishable given their
errors. In other words,
\[ x_{\max} - \epsilon_x > x_{\min} + \epsilon_x \]
\[ y_{\max} - \epsilon_y > y_{\min} + \epsilon_y \]
This is motivated by the observation that, if the two points are distinguishable
beyond their associated error values, then the slope of the line between those two
points can be inferred (Fig. 5-2).
An analogous result can be derived for the case
where the line has a negative slope (y = -mx), whereby $y_{\max} < y_{\min}$ and it is
required that $y_{\max} + \epsilon_y < y_{\min} - \epsilon_y$. Using the definition for the slope of the line,
$m = (y_{\max} - y_{\min})/(x_{\max} - x_{\min})$, these equations can be rearranged to show that
the slope of the line can be well identified if
\[ \frac{x_{\max}}{x_{\min}} > 1 + \frac{2}{x_{\min}} \max\left( \epsilon_x, \frac{\epsilon_y}{|m|} \right) \]
Here the error terms represent the degree of uncertainty about the position of each
data point. The terms within the max operator represent the error introduced to the
data points by the error in the x-variable versus the y-variable. The max operator
is used because the separation between the two data points must exceed the error in
both dimensions for the points to be distinguishable. The term $\epsilon_y/|m|$ represents the
range of error in the x-variable introduced by the error in the y-variable, given the
slope of the line, m. Thus we can see that in the case of zero error in both variables
($\epsilon_x = \epsilon_y = 0$), the slope of the line between the two points can be determined simply
when $x_{\max}/x_{\min} > 1$.
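The rearrangement that leads to this condition is not written out in the text; one way to obtain it, using only the two distinguishability requirements and the definition of the slope, is sketched here. The symbols $\epsilon_x$ and $\epsilon_y$ denote the characteristic errors in x and y.

```latex
\begin{align*}
x_{\max} - \epsilon_x > x_{\min} + \epsilon_x
  &\;\Longrightarrow\; x_{\max} - x_{\min} > 2\epsilon_x \\
|y_{\max} - y_{\min}| = |m|\,(x_{\max} - x_{\min}) > 2\epsilon_y
  &\;\Longrightarrow\; x_{\max} - x_{\min} > \frac{2\epsilon_y}{|m|} \\
\text{both conditions together}
  &\;\Longrightarrow\; x_{\max} - x_{\min} > 2\max\!\left(\epsilon_x, \frac{\epsilon_y}{|m|}\right) \\
\text{dividing by } x_{\min} > 0
  &\;\Longrightarrow\; \frac{x_{\max}}{x_{\min}} > 1 + \frac{2}{x_{\min}}\max\!\left(\epsilon_x, \frac{\epsilon_y}{|m|}\right)
\end{align*}
```

Using the absolute value of the y-separation covers the positive-slope and negative-slope cases at once, which is why only $|m|$ appears in the final expression.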
For the data generated in Fig. 5-1, normally distributed noise with mean zero
and $\sigma_x = \sigma_y = 1$ was added to the data. For such a normal distribution, let us
approximate the characteristic error as $\approx 2$ standard deviations away from the mean
(corresponding to the $\approx 5\%$ tails of the distribution). Thus, in this case, $\epsilon_x \approx 2\sigma_x = 2$
and $\epsilon_y \approx 2\sigma_y = 2$. Using the slope m = 1, we estimate that the slope of the line
will be well specified when $x_{\max}/x_{\min} > 5$. These analytical estimates are in good
agreement with the numerical simulation results shown in Fig. 5-1, namely that when
$x_{\max}/x_{\min}$ is greater than $\approx 5$, we expect good separation between $x_{\max}$ and $x_{\min}$
relative to the noise in the data and thus reasonable prediction accuracy.
We can now inspect the analytical expression for general trends about inference
quality as a function of the relevant parameters. If we reduce the slope to $m = 0.5$, but
keep the other parameters constant, we see that we need $x_{\max}/x_{\min} > 9$. However, if
we increase the slope to $m = 2$ while keeping the other parameters constant, we see
that the requirement is still $x_{\max}/x_{\min} > 5$. This is because if $\epsilon_x = \epsilon_y$, then the error introduced by
$y_{\text{error}}$ will only dominate the max operator when $|m| < 1$. More generally, if we say
$\epsilon_y = a\epsilon_x$, then the error introduced by $y_{\text{error}}$ will only dominate the max operator
when $|m| < a$. This is because as the slope of the line increases and $|m| > a$, small
changes in x result in greater changes in y, which will tend to extend beyond the
error introduced by $\epsilon_y$.
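The thresholds quoted above can be checked with a short calculation. The sketch below assumes that the identifiability condition takes the form $x_{\max}/x_{\min} > 1 + (2/x_{\min})\max(\epsilon_x, \epsilon_y/|m|)$ and that $x_{\min} = 1$; the function name `min_range_ratio` and the value of $x_{\min}$ are illustrative assumptions chosen because they reproduce the values 5, 9, and 5 stated in the text.

```python
def min_range_ratio(x_min, eps_x, eps_y, m):
    """Smallest x_max/x_min ratio at which the two end points are separable.

    Assumed form: 1 + (2 / x_min) * max(eps_x, eps_y / |m|).
    """
    if x_min <= 0 or m == 0:
        raise ValueError("x_min must be positive and m must be nonzero")
    return 1.0 + (2.0 / x_min) * max(eps_x, eps_y / abs(m))


# x_min = 1 is an assumption; eps_x = eps_y = 2 follows the ~2 standard
# deviation approximation used in the text.
for slope in (1.0, 0.5, 2.0):
    ratio = min_range_ratio(x_min=1.0, eps_x=2.0, eps_y=2.0, m=slope)
    print(f"m = {slope}: x_max/x_min must exceed {ratio}")
```

Under these assumptions the y-error term controls the threshold only for the shallow slope ($m = 0.5$); for $m = 1$ and $m = 2$ the x-error term sets it.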
Thus we see that, for a simple two-variable linear problem, increasing the range
over which the input x-data are sampled increases the prediction accuracy. The
analytical expression shows how prediction accuracy is a function of $x_{\min}$, $|m|$, $\epsilon_x$,
and $\epsilon_y$. The numerical results show how this relationship is also a function of the
number of sampled data points (based on the normal error distribution applied to
the simulated data). This is a simple case of inference, in which we are inferring the
magnitude of the slope of a line. This shows that inference quality is a function of
(1) data quality, here defined as the range of the sampled data relative to the noise in
the sampled data and the functional relationship between the input and output (i.e.,
the line's slope), and (2) data quantity. Next we sought to show that these insights
could be extended to more relevant multivariate network models.
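The dependence of slope inference on the sampled range can also be illustrated numerically. The following is a minimal sketch, not the procedure used to generate Fig. 5-1: it assumes ordinary least-squares fitting, noise with unit standard deviation in both variables, and hypothetical function names (`fit_slope`, `mean_slope_error`).

```python
import random
import statistics


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def mean_slope_error(x_min, x_max, n_points, true_slope=1.0, sigma=1.0,
                     n_trials=200, seed=0):
    """Mean absolute error of the fitted slope over repeated noisy data sets."""
    rng = random.Random(seed)
    step = (x_max - x_min) / (n_points - 1)
    x_true = [x_min + i * step for i in range(n_points)]
    errors = []
    for _ in range(n_trials):
        # Noise is added independently to the x- and y-variables.
        xs = [x + rng.gauss(0.0, sigma) for x in x_true]
        ys = [true_slope * x + rng.gauss(0.0, sigma) for x in x_true]
        errors.append(abs(fit_slope(xs, ys) - true_slope))
    return statistics.fmean(errors)


narrow = mean_slope_error(x_min=1.0, x_max=2.0, n_points=50)
broad = mean_slope_error(x_min=1.0, x_max=20.0, n_points=50)
print(f"narrow range error: {narrow:.3f}, broad range error: {broad:.3f}")
```

With the same number of points and the same noise, the broad range should give a much smaller slope error than the narrow range, which is the qualitative trend described above.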
5.2.3 Simulating data from multivariate linear regression networks
To extend the results comparing data quality and data quantity into a more relevant
setting, we next generated simulated data from network models in which the relationships between nodes in the network were defined by linear regression functions.
Directed acyclic networks were randomly generated based on the maximum number of
allowable parents (inputs) per node, while containing only one root node (i.e., source
signal). The signal levels for the root node were generated from N linearly equally
spaced points over the range $\text{input}_{\min}$ to $\text{input}_{\max}$. The signal levels for all nodes
downstream of the root node were specified based on the signal levels of their input parent node(s) using multiple linear regression. For a node $j$ with a set of parent nodes $P$, the data for $j$,
termed $x_j$, is set according to the sum of its inputs,
$$x_j = \sum_{i \in P} m_{ij} x_i$$
where $m_{ij}$ represents the linear regression coefficient specifying the relationship between node $i$ and node $j$. This coefficient is analogous to the slope of the linear
regression line used in the two-variable toy model.
Data were simulated in a step-wise fashion, such that the only input to the simulation process was the signal levels of the root node. Then, at each step in the
simulation from parent to child nodes, heritable variation was added to each node's
data. This variation was drawn from a random normal distribution with a standard
deviation equal to $\epsilon_{\text{biological}}$ multiplied by the absolute value of the node's pre-variation
levels. In other words, normally distributed values with a coefficient of variation of
$(100 \times \epsilon_{\text{biological}})\%$ were randomly added to each node's data at each step in the step-wise
simulation. This variation-added signal was then used as input for the node's
child node(s) in the network. Heritable variation was also added to the terminal nodes
in the network, even though they have no child nodes. Once all nodes were simulated, non-heritable variation was added to the simulated data. This variation
was drawn from a uniform distribution over the range $\pm\epsilon_{\text{technical}}$, and this variation
was added independently for each node. Note that, as defined here, $\epsilon_{\text{biological}}$ is a
dimensionless quantity, but that $\epsilon_{\text{technical}}$ has dimensions identical to the measured
values $x_i$.
In biological terms, $\epsilon_{\text{biological}}$ is intended to represent some degree of stochastic
fluctuation in the signals' levels across different biological conditions. As described in
Section 5.2.1, a biological condition would represent one instantiation of the biological
network state. The source of these fluctuations may be variation in mRNA synthesis
or degradation rates, protein synthesis or degradation rates, or other sources, but their
end effect would be fluctuations in signal levels across conditions that are consistent
with the quantitative signal-signal biochemical relationships underlying the biological
network measured in those conditions, i.e., heritable variation.
Understanding how the data are simulated is a vital step for subsequent understanding, so we will discuss an example here. Consider a simple network of three
nodes with the structure A→B→C, in which the two interaction parameters are both
unity ($m_{AB} = 1$, $m_{BC} = 1$), $\epsilon_{\text{biological}} = 0.1$, and $\epsilon_{\text{technical}} = 1$. The simulation would
first generate data for A, $x_A$, based on a specified range from $\text{input}_{\min}$ to $\text{input}_{\max}$.
Heritable variation would then be added to $x_A$ based on a normal distribution with
10% coefficient of variation. Thus the heritable variation is proportional to the signal
level, and added to each data point individually,
$$x_{A,\text{heritable}} \sim \mathcal{N}(x_A,\ 0.1 \times |x_A|)$$
This variation-added version of node A's data, $x_{A,\text{heritable}}$, would then be used as
input for calculating the levels of node B. Because $m_{AB} = 1$, this simply implies that
$x_B = x_{A,\text{heritable}}$. Then heritable variation is added to $x_B$, just as it was for $x_A$,
$$x_{B,\text{heritable}} \sim \mathcal{N}(x_B,\ 0.1 \times |x_B|)$$
And again, this variation-added version of node B's data would then be used as input
for calculating the levels of node C. And again, because $m_{BC} = 1$, this simply implies
that $x_C = x_{B,\text{heritable}}$. Heritable variation is then added to $x_C$, even though it is the
terminal node,
$$x_{C,\text{heritable}} \sim \mathcal{N}(x_C,\ 0.1 \times |x_C|)$$
After the simulation is complete, non-heritable variation is added to each node's data
independently. Each simulated data point for each node has non-heritable noise added
to it independently. This variation is drawn randomly from a uniform distribution
over the range $\pm\epsilon_{\text{technical}}$, and thus this non-heritable variation is not proportional to
the signal level,
$$x_{A,\text{final}} = x_{A,\text{heritable}} + \mathcal{U}(-1, +1)$$
$$x_{B,\text{final}} = x_{B,\text{heritable}} + \mathcal{U}(-1, +1)$$
$$x_{C,\text{final}} = x_{C,\text{heritable}} + \mathcal{U}(-1, +1)$$
Using this procedure, data can be simulated for directed acyclic networks of arbitrary
size and complexity, given there is only one root node. These simulated data contain
node-specific heritable variation but also node-specific non-heritable variation. Given
a network structure, the parameters required for this simulation are the number of
simulated data points ($N$), the range of signal values used for the root node ($\text{input}_{\min}$
to $\text{input}_{\max}$), the interaction parameters ($m_{ij}$), the magnitude of heritable variation
($\epsilon_{\text{biological}}$), and the magnitude of non-heritable variation ($\epsilon_{\text{technical}}$).
This procedure provided a method for simulating data from networks of known
structure. Using the simulated data, and the known network structure as a benchmark, we could then assess the quality of network inference results as a function of
the parameters used to generate the simulated data.
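The step-wise simulation described in this section can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used in this work: the function name `simulate_network`, the dictionary representation of the network, and the requirement that nodes be supplied in parent-before-child order are all hypothetical choices.

```python
import random


def simulate_network(parents, order, root_values, eps_biological,
                     eps_technical, seed=0):
    """Simulate data for a single-root directed acyclic network.

    parents: dict mapping node -> {parent: coefficient m_ij}; the root maps to {}.
    order: nodes listed so that every parent precedes its children (assumed).
    """
    rng = random.Random(seed)
    heritable = {}
    for node in order:
        if not parents[node]:
            base = list(root_values)
        else:
            # Child level is the coefficient-weighted sum of its parents' levels.
            base = [
                sum(m * heritable[p][k] for p, m in parents[node].items())
                for k in range(len(root_values))
            ]
        # Heritable variation: normal, standard deviation proportional to level.
        heritable[node] = [rng.gauss(x, eps_biological * abs(x)) for x in base]
    # Non-heritable variation: uniform, independent of the signal level,
    # added only after the whole network has been simulated.
    return {
        node: [x + rng.uniform(-eps_technical, eps_technical) for x in values]
        for node, values in heritable.items()
    }


n_points = 50
root = [1.0 + 9.0 * k / (n_points - 1) for k in range(n_points)]  # 1 to 10
chain = {"A": {}, "B": {"A": 1.0}, "C": {"B": 1.0}}
data = simulate_network(chain, ["A", "B", "C"], root,
                        eps_biological=0.1, eps_technical=1.0)
```

The key design point is that each child is computed from its parents' variation-added values, so heritable fluctuations propagate downstream, whereas the uniform noise is applied last and never propagates.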
5.2.4 Inferring Bayesian networks using simulated network data
Having a method for simulating multivariate network-level data, we next sought to
assess our ability to infer the underlying network structures from these data using
Bayesian network inference. Data were simulated from regression networks of varying
complexity by changing the maximum allowable number of parent nodes per child
node. The number of parents for each node was chosen randomly from a discrete
uniform distribution.
The three networks used in this study, each containing 15
nodes but allowing different maximum numbers of parents, are shown in Fig. 5-3.
To simplify the simulation of these networks, all interaction parameters $m_{ij}$ were
set to unity. The heritable variation parameter $\epsilon_{\text{biological}}$ was set to 0.1, and the
non-heritable variation parameter $\epsilon_{\text{technical}}$ was set to unity. To explore the effects of
sampled data range on inference, three different ranges were used as inputs for the root
node in each network. These ranges were 1.9-2.1 ("low narrow" range), 8.9-9.1 ("high
narrow" range), and 1-10 ("broad" range). Given the lower and upper bounds of each
range, linearly equally spaced data were sampled to generate input data containing
50, 100, 200, or 1,000 data points. Thus, as more data points were sampled, the
density of sampling increased but the range of sampling did not. Heritable variation
proportional to the input signal level was then added, and these data were then used
as input to simulate the entire network, as described in Section 5.2.3. Because the
heritable and non-heritable variation were randomly generated, the simulation was
repeated independently three times for each condition. Thus, in total, 108 data sets
were simulated (3 network structures × 3 input ranges × 4 data set sizes × 3 replicates
= 108).
Figure 5-3: Three different directed acyclic graphs were used to simulate synthetic data
from which Bayesian network models were inferred. Each network contains 15 nodes, but
varies in the maximum allowable number of parent nodes per child node. Each network
contains only one root node. Data were simulated by modeling the edges in each network
using linear regression relationships. (Panels: 1 parent max.; 2 parents max.; 3 parents max.)
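One way to generate single-root networks of the kind used here is sketched below. The construction is an assumption, since the text specifies only that the number of parents per node was drawn from a discrete uniform distribution up to a maximum: in this sketch, nodes are numbered, node 0 is the root, and each later node draws its parents from earlier nodes, which guarantees acyclicity. The function name `random_single_root_dag` is hypothetical.

```python
import random


def random_single_root_dag(n_nodes, max_parents, seed=0):
    """Return {node: [parents]} for a random DAG whose only root is node 0."""
    rng = random.Random(seed)
    parents = {0: []}
    for node in range(1, n_nodes):
        # Number of parents: discrete uniform on 1..max_parents, capped by the
        # number of earlier nodes available.
        k = rng.randint(1, min(max_parents, node))
        # Parents come only from earlier nodes, so no cycle can form.
        parents[node] = sorted(rng.sample(range(node), k))
    return parents


dag = random_single_root_dag(n_nodes=15, max_parents=3)

# Size of the simulation design described in the text:
# structures x input ranges x data set sizes x replicates.
n_data_sets = 3 * 3 * 4 * 3
```

Because every non-root node receives at least one parent, node 0 remains the only root, matching the single-source requirement of the simulation.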
Before further considering the simulated network data, let us preface it with a
discussion of the role of $\epsilon_{\text{biological}}$ and $\epsilon_{\text{technical}}$ in causal network inference. If $\epsilon_{\text{biological}} = 0$
and $\epsilon_{\text{technical}} = 0$, the data generated for any given node would be exactly identical
to the root node's input data except for slope terms $m_{ij}$ that would in effect scale
each node's data, and in cases of multiple parent inputs the child's values would be
higher because of summing the parents' values. This case would represent an
unsolvable problem because many causal models would equally explain the data, in
effect because there was no causal information in the network. If we have $\epsilon_{\text{technical}} > 0$
but $\epsilon_{\text{biological}} = 0$, the variation in the signals would be only technical in nature,
and any causal model inferred from these data would represent random signal-signal
relationships in the data. To derive causal influences that actually represent node-to-node
signal propagation, one needs $\epsilon_{\text{biological}} > 0$. Having $\epsilon_{\text{technical}} > 0$ is not required
for causal inference; indeed, the greater its magnitude the more it confounds causal
inference.
For each Bayesian network model, prior knowledge was applied such that the
root node from the simulated data was not allowed to have any parent nodes in
the Bayesian network. Further, the maximum number of parents allowed in each
Bayesian network model was restricted to the maximum number of parents found
in each synthetic network. None of the simulated data conditions were treated as a
perturbation or intervention by the model (i.e., none of the nodes were ever 'clamped'
when inferring the Bayesian network). And to clarify, when referring to the different
numbers of simulated data points (50, 100, 200, or 1,000), the term "data points"
refers to the number of instances (or conditions) in which the entire network state was
observed. In other words, 50 data points refers to 50 observations of the entire 15-node
network state, which therefore actually corresponds to 15 x 50 = 750 unique numerical
values. In an experimental biology context, these 50 data points would represent the
total number of conditions in which the network's signals were measured, and may
therefore constitute a range of multiplexed experimental scenarios (e.g., 5 time points
x 10 RNAi perturbations, 2 time points x 25 RNAi perturbations, 5 growth factor
concentrations x 2 time points x 5 RNAi perturbations, etc.).
The main decisions one makes in Bayesian network inference are (1) the application of prior knowledge (in the network structure and/or the parent-child interaction
parameters), (2) the explicit modeling of interventions, and, if one is using Bayesian
network algorithms dependent on discrete data and not continuous data, then (3) how
to discretize the continuous data values. By 'discretize' we mean to transform continuous data values into binned values. For example, discretizing continuous values into
three states may correspond to something like "low", "medium", and "high" signal
values. Historically, early applications of Bayesian networks in biology analyzed gene
expression data. As such, because the classification of genes that were "underexpressed", "normal", or "overexpressed" was such a common scheme in the analysis
of gene expression data, it was natural to consider three-level discretization because
of its similarity to this scheme [22]. Subsequent application of Bayesian networks to
phosphoprotein data also used three-level discretization [16]. Other notable applications of Bayesian networks in a biological context used values ranging from two to
four [185, 186], four [187], or twenty-five [188]. Applications of mutual information,
another method using discrete data that generally only quantifies pairwise relationships, have also reported use of multiple discretization levels, including six [189], ten
[156, 27], or values ranging from seven to seventeen [190]. (In ref. [190] the number
of discretization levels used was not stated explicitly, but instead was calculated ex
post facto based on their citation of ref. [191], which recommends discretizing
data measured across n conditions into $\sqrt{n}$ bins.)
Given the lack of guidelines or consensus regarding an appropriate number of
discretization levels for applying Bayesian network inference to a given data set, we
chose to test multiple discretization levels, including 2, 4, 6, 8, and 10 levels. Quantile
discretization was used to separate the data into bins (meaning that an equal number
of data points were put into each bin, although the boundaries of those bins in
the continuous space may not be equally spaced).
This is in contrast to interval
discretization, in which the bins are equally spaced in the continuous data space,
but the number of data points per bin may not be equal. Quantile discretization
was chosen to ensure that changing the number of discretization levels would change
which data points were in each bin. In contrast, if interval discretization had been
used, changing the number of discretization levels may not have changed which data
points were in each bin (for example, if all the data points were near the minimum
and maximum observed values, then increasing the number of discretization levels
may just create more empty bins in between those values).
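The difference between the two discretization schemes can be seen in a small example. The sketch below uses hypothetical helper names (`quantile_discretize`, `interval_discretize`) and assumes that ties in quantile discretization are broken by rank; it is intended only to illustrate the empty-bin behavior described above.

```python
def quantile_discretize(values, n_bins):
    """Label each value 0..n_bins-1 so that bins hold (nearly) equal counts."""
    ranked = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(ranked):
        labels[idx] = rank * n_bins // len(values)
    return labels


def interval_discretize(values, n_bins):
    """Label each value using equal-width bins between observed min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Values clustered near the minimum and maximum observed values.
values = [1.0, 1.1, 1.2, 1.3, 9.7, 9.8, 9.9, 10.0]
print(quantile_discretize(values, 4))  # every bin is occupied
print(interval_discretize(values, 4))  # the two middle bins stay empty
```

For these clustered values, quantile discretization spreads the points across all four states, whereas interval discretization uses only the lowest and highest states regardless of how many intermediate bins are requested.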
The area under the receiver operating characteristic (ROC) curve (AUROC) was
used to quantify the accuracy of the inferred Bayesian network models. The ROC
curve represents the trade-off between true positive rate (also called sensitivity or
recall) and false positive rate, as determined by comparing the edges in the inferred
Bayesian network to the edges in the synthetic network used to generate the data from
which the Bayesian network was inferred. Because the Bayesian network inference
algorithm employed here [50] uses exact Bayesian model averaging to derive a consensus model with probabilities for the likelihood of each edge feature given the data
[65], one can vary the threshold at which an edge is considered 'significant' to derive
different network structures. This allows one to traverse the space of true positive rate
versus false positive rate. The edge weight threshold, $P$, was varied between the minimum
and maximum observed edge weight values. Thus, at each value of the observed
edge weights, the Bayesian network structure generated by only considering edges
with weight $p \geq P$ was compared to the synthetic network structure. The networks
were scored based on the presence of directed edges, so that a high AUROC score
reflects an inferred network that not only properly detected a relationship between
two nodes (e.g., A-B), but also properly detected the directionality (i.e., causality)
of that relationship (i.e., A→B).
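The thresholding procedure can be sketched as follows. This is a minimal illustration rather than the scoring code used in this work: the function names are hypothetical, every ordered node pair supplied in `edge_weights` is assumed to count as a candidate edge, the area is computed with the trapezoidal rule, and the example weights are invented.

```python
def roc_points(edge_weights, true_edges):
    """ROC points obtained by thresholding directed edge weights.

    edge_weights: dict mapping (parent, child) -> weight for candidate edges.
    true_edges: set of (parent, child) pairs in the reference network.
    Requires at least one true and one false candidate edge.
    """
    n_pos = sum(1 for e in edge_weights if e in true_edges)
    n_neg = len(edge_weights) - n_pos
    points = [(0.0, 0.0)]
    # Use each observed edge weight as a threshold, from strictest to loosest.
    for threshold in sorted(set(edge_weights.values()), reverse=True):
        called = [e for e, w in edge_weights.items() if w >= threshold]
        tp = sum(1 for e in called if e in true_edges)
        fp = len(called) - tp
        points.append((fp / n_neg, tp / n_pos))
    points.append((1.0, 1.0))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical edge weights for the network A -> B -> C.
true_edges = {("A", "B"), ("B", "C")}
weights = {("A", "B"): 0.9, ("B", "A"): 0.2, ("B", "C"): 0.8,
           ("C", "B"): 0.3, ("A", "C"): 0.4, ("C", "A"): 0.1}
score = auroc(roc_points(weights, true_edges))
```

Because (A, B) and (B, A) are scored as different edges, a model that finds the right pair of nodes but the wrong direction is penalized, which is the property emphasized above.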
Earlier work by Margolin et al. [28] generated ROC curves for Bayesian network
models by varying the equivalent sample size (ESS) parameter in the parameter prior
(what Margolin et al. call the "Dirichlet pseudocount"). Although the details of
their implementation of the LibB software (Friedman and Elidan, http://compbio.cs.huji.ac.il/LibB/programs.html) are not clear from the paper [28], it is likely
that they inferred a single high-scoring Bayesian network structure, not a consensus
model, and thus did not have edge weight scores to threshold for generating ROC
curve data. Given the subsequently studied sensitivity of the inference process to
the ESS value [192], and that modifying the ESS shifts the weighting between prior
and observed data, it is not clear that varying ESS is an appropriate method for
generating ROC curve data for Bayesian networks.
5.2.5 Bayesian network inference accuracy is a function of data range and discretization level
The mean AUROC values across the three replicates are shown in Fig. 5-4. The
values plotted above the dashed line in each subplot will be discussed later. These
results reveal two striking trends: inference accuracy is a function of (1) the range of
the input data and (2) the number of bins used to discretize the data. Further, how
inference accuracy varies as a function of the discretization scheme depends on the
range of the input data. Additionally, although less surprisingly, increasing the size of
the data set increases inference accuracy (regardless of input data range or synthetic
network complexity, i.e., maximum parents allowed), and increasing the synthetic
network complexity generally decreases inference accuracy (for a given data set size).
While these latter trends are not unexpected, how they varied as a function of data
set size and network complexity was not a priori known.
To begin to understand why inference accuracy is a function of the input data
range and the number of discretization levels applied, we can consider the origins of
the synthetic data. In all cases, heritable variation was added to the data that was
proportional to the signal level, but non-heritable variation that was not proportional
to the signal level was also added. That non-heritable variation was drawn from a
uniform distribution between -1 and +1, resulting in an expected average magnitude
of 0.5. Thus, in the "low narrow" range case, which varied from 1.9-2.1, the non-heritable variation was about 0.5/2 or 25% of the average input signal level. In the
"high narrow" range case, which varied from 8.9-9.1, the non-heritable variation was
about 0.5/9 or about 5% of the average input signal level. And in the "broad" range
case, which varied from 1-10, the non-heritable variation was about 0.5/5.5 or about
9% of the average input signal level. Thus, in part, the inference accuracy varied
across the input ranges because of a signal-to-noise-type issue, where in this case
'noise' refers to non-heritable variation, because the "low narrow" range case had an
especially poor signal-to-noise ratio.
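The relative noise levels quoted above follow from a short calculation, sketched here with a hypothetical helper `relative_noise`. It assumes that the average input signal level is the midpoint of each range and uses the fact that the expected absolute value of a draw from a uniform distribution on $(-\epsilon, +\epsilon)$ is $\epsilon/2$.

```python
def relative_noise(range_min, range_max, eps_technical=1.0):
    """Expected |uniform noise| divided by the midpoint of the input range."""
    expected_magnitude = eps_technical / 2.0  # E|U(-e, +e)| = e / 2
    midpoint = (range_min + range_max) / 2.0
    return expected_magnitude / midpoint


ranges = {"low narrow": (1.9, 2.1), "high narrow": (8.9, 9.1), "broad": (1.0, 10.0)}
for name, (lo, hi) in ranges.items():
    print(f"{name}: {100 * relative_noise(lo, hi):.1f}% of the average input level")
```

The three ratios are 0.5/2, 0.5/9, and 0.5/5.5, so the "low narrow" range carries several times more relative non-heritable noise than either of the other two ranges.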
However, such notions do not explain why the inference accuracy varies as a function of discretization level, nor why that variation depends on the range of the input
data. To understand these behaviors, we must consider the heritable variation that
was added to the data. Heritable variation, in which fluctuations in the value of
a parent node are essentially passed on to its child node(s), is key to causal inference. To extract the causal dependencies between measured nodes, we must be able
to faithfully identify these fluctuations across conditions. In the simplest sense, if
the non-heritable variation and heritable variation have a similar magnitude, then
the causal node-to-node fluctuations will be washed out by the non-causal (i.e., non-heritable) fluctuations. However, even when the heritable variation is of a greater
magnitude than the non-heritable variation, there is still another requirement for
accurate inference: the resolution, or granularity, of the discrete data must be sufficiently fine such that fluctuations due to heritable variation coincide with different
discrete states.
Figure 5-4: Bayesian network inference accuracy is a function of data range and discretization level. The mean AUROC values across the three replicates are shown. Vertical error
bars indicate the standard deviation of the AUROC values across the three replicates. Each
subplot shows the AUROC values across the five discretization schemes (x-axis; 2, 4, 6, 8,
and 10 quantile levels) for a given synthetic network and data set size. Within each subplot, values are plotted for the three simulated ranges ("low narrow", "high narrow", and
"broad"), each shown in a different color. The square markers plotted above the dashed line
represent the predicted number of discretization levels for each data range, as determined
by the algorithm presented in Section 5.2.6.
(Panels are arranged by synthetic network (1, 2, or 3 parents max.) and data set size (50, 100, 200, or 1,000 data points). x-axis: number of bins used to discretize data; y-axis: AUROC.)
As an example, if we have signal A that takes values from 1 to 10 and has some
constant heritable variation of magnitude 5, then discretizing that signal into two
bins may be sufficient to capture heritable variation in the discrete data. However, if
we consider another signal B with range 1 to 100 with the same heritable variation
of magnitude 5, discretizing that signal into just two bins (e.g., values 1-50 as "low"
and values 51-100 as "high") will generally not capture in the discrete data most
of the heritable variation fluctuations. Only data fluctuations near value 50 will be
translated into changes in discrete states. In other words, changes that occur in
values approximately 1-40 and 60-100, for example, will all be considered identical
according to the discrete data. Thus, discretizing the data from signal B into only
two bins would underutilize the causal information in the data. Instead, one will
likely need to discretize signal B into more bins than signal A in order to extract the
fluctuations in the data that correspond to heritable variation.
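The signal A versus signal B example can be made concrete by counting how often a fluctuation of magnitude 5 moves a value into a different discrete state. The sketch below is illustrative only: it assumes equal-width bins (so the two-bin boundary for signal B falls at the midpoint of its range), evaluates evenly spaced values, and uses a hypothetical function name `fraction_crossing`.

```python
def fraction_crossing(lo, hi, n_bins, fluctuation, n_grid=1000):
    """Fraction of evenly spaced values whose bin changes when shifted upward."""
    width = (hi - lo) / n_bins

    def bin_of(value):
        clipped = min(max(value, lo), hi)  # keep shifted values inside the range
        return min(int((clipped - lo) / width), n_bins - 1)

    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    changed = sum(1 for v in grid if bin_of(v) != bin_of(v + fluctuation))
    return changed / n_grid


signal_a = fraction_crossing(lo=1.0, hi=10.0, n_bins=2, fluctuation=5.0)
signal_b = fraction_crossing(lo=1.0, hi=100.0, n_bins=2, fluctuation=5.0)
signal_b_fine = fraction_crossing(lo=1.0, hi=100.0, n_bins=20, fluctuation=5.0)
print(signal_a, signal_b, signal_b_fine)
```

Under these assumptions roughly half of the values of signal A change state, but only about 5% of the values of signal B do when two bins are used; increasing the number of bins for signal B lets most fluctuations register as state changes.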
At the same time, one cannot simply arbitrarily increase the number of bins used
to discretize data for two reasons. First, at a sufficiently fine level of discretization, the
differences in raw data values that are being placed into different discrete bins will no
longer correspond to heritable variation, and will instead correspond to non-heritable
variation. In other words, discretizing too finely can begin to ascribe heritable value
(i.e., placing data points into different discrete bins) to data that does not represent
heritable variation, and thus is akin to overfitting the original data. Second, increasing the number of discretization levels also increases the number of parameters in the
conditional probability tables of the Bayesian network (when using discrete Bayesian
networks based on multinomial local conditional probability distributions [66]). The
number of parameters required to characterize the local conditional probability distribution of a node with $p$ parents, assuming both the parent and child nodes are
discretized to the same number of levels $C$, is $C^{p+1} - C^p$ [193]. The number of parameters
for 2, 4, 6, 8, and 10 discrete levels given 1, 2, or 3 parents is shown in
Table 5.1.
Table 5.1: This table describes the number of parameters in the local conditional probability
table of a Bayesian network for the case in which a node has 1, 2, or 3 parent nodes, and
assuming the child node and its parent(s) have the same number of discrete states.
Number of                  Number of parents
discretization levels      1         2         3
2                          2         4         8
4                          12        48        192
6                          30        180       1,080
8                          56        448       3,584
10                         90        900       9,000
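The entries of Table 5.1 follow directly from the expression $C^{p+1} - C^p$. A short sketch (the function name `n_parameters` is hypothetical) reproduces them:

```python
def n_parameters(levels, n_parents):
    """Free parameters in a multinomial conditional probability table.

    Uses C^(p+1) - C^p, where C is the number of discrete levels shared by
    the child and its parents and p is the number of parents.
    """
    return levels ** (n_parents + 1) - levels ** n_parents


for levels in (2, 4, 6, 8, 10):
    counts = [n_parameters(levels, p) for p in (1, 2, 3)]
    print(levels, counts)
```

The count grows geometrically with the number of parents, which is why moving from 2 to 10 levels with 3 parents raises the requirement from 8 to 9,000 parameters.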
This table therefore provides guidelines for approximately how many data points
one should have to parameterize a Bayesian network with a given degree of complexity
and given number of discrete states in the data: in general, one should have more
data points than parameters.
As such, some of the decreased inference accuracy
shown in Fig. 5-4 may result from having too few data points to parameterize the
Bayesian network models. However, in certain cases the inference accuracy is still high
even though the number of data points is less than the number of parameters in the
table. This may be because the linear relationships between nodes (which underlie the
synthetic data) are still captured sufficiently well by an under-parameterized model,
and/or the fact that the synthetic networks with a maximum of 2 or 3 parents also
contain 1- and 2-parent relationships. This latter effect may further reduce the data
requirements necessary to parameterize the full joint distribution across all signals (in
contrast to just the local conditional distributions as given in Table 5.1). In summary,
using too many discretization levels may decrease inference accuracy because of fitting
to non-heritable variation in the data, and/or having insufficient data for the more
complex parameterizations induced by using many discretization levels.
With these conceptual insights in mind, we now have a better understanding
of why the inference accuracy varies as a function of discretization level, and why
that variation depends on the range of the input data. Inference accuracy using
the "low narrow" range data, likely because it has the poorest signal-to-noise ratio,
is generally not strongly affected by the number of discretization levels. The "high
narrow" range data generally achieves the highest accuracy using 2 or 4 discretization
levels, with sometimes drastic decreases in accuracy as one increases up to 10 discrete
levels. The "broad" range data generally achieves the highest accuracy using about
6 discretization levels, and sometimes even exhibits a biphasic relationship between
discretization level and accuracy.
These changes highlight the fact that more discrete levels are generally required
to exploit the heritable variation present in the "broad" range data compared to the
"high narrow" range data. If one discretizes the "broad" range data into too few
states (e.g., 2), the inference accuracy is often less than the "high narrow" range
data discretized to 2 states. In contrast, if one discretizes the "broad" range data
to 6 states, the inference accuracy is often better than the "high narrow" range data
discretized to 6 states. Overall, these simulation results provide a rationale for the
importance of choosing a discretization strategy appropriate for each data set.
5.2.6 An a priori discretization strategy based on experimental measurement parameters
While the previous section provides rationale for the importance of discretization,
what is needed is an a priori method for estimating the most useful number of discretization levels for a data set given its experimental parameters, in particular the
magnitudes of the heritable and non-heritable variation. To pursue such a method, we
developed the notion of heritable variation windows; namely, how many significantly
biologically different sub-ranges are present in between the minimum and maximum
observed values for a given signal. By "biologically different" we mean signal differences on the same order of magnitude as the heritable variation. Similar to notions
just discussed, the general concept is that a signal with range 1-100 and heritable
variation 5 has more "heritable variation windows" within that large range than a
signal with the same heritable variation but a range 1-10.
To quantify the appropriate width of a heritable variation window, we utilize the
knowledge of how the synthetic data were generated. The heritable variation used to
generate the synthetic data was drawn from a normal distribution with a mean value
equal to the original signal and a given coefficient of variation $\epsilon_{\text{biological}}$, such that for
a given signal A,
$$x_{A,\text{heritable}} \sim \mathcal{N}(x_A,\ \epsilon_{\text{biological}} \times |x_A|)$$
The non-heritable variation was drawn from a uniform distribution with range $\pm\epsilon_{\text{technical}}$.
Using this information, we can proceed in a manner similar to that described in Section 5.2.2.
The basic notion is that the heritable variation window must be wide
enough to account for both the heritable variation and the non-heritable variation,
but no wider. Let us begin with the assumption that the minimum observed value,
$x_{\min} > 0$, for a given signal represents the lower boundary of a heritable variation
window, such that:
$$x_{\min} = B_1 - (Z \times \epsilon_{\text{biological}}) B_1 - \epsilon_{\text{technical}}$$
where $B_1$ represents the center of the first heritable variation window, and $Z$ represents a scaling factor contributing to the width of the heritable variation window.
In other words, by starting at the center of the first heritable variation window,
subtracting some degree of heritable variation contributed by $(Z \times \epsilon_{\text{biological}}) B_1$, and
subtracting some degree of non-heritable variation contributed by $\epsilon_{\text{technical}}$, one arrives
at the minimum observed value, $x_{\min}$.
This approach can be repeated to identify the center of the second heritable variation window, $B_2$:
$$B_2 = B_1 + (Z \times \epsilon_{\text{biological}}) B_1 + (Z \times \epsilon_{\text{biological}}) B_2 + 2\epsilon_{\text{technical}}$$
In other words, $B_2$ must be far enough from $B_1$ to account for the heritable variation
associated with $B_1$, $(Z \times \epsilon_{\text{biological}}) B_1$, the heritable variation associated with $B_2$,
$(Z \times \epsilon_{\text{biological}}) B_2$, and the non-heritable variation associated with $B_1$ and $B_2$, namely
$2\epsilon_{\text{technical}}$.
Figure 5-5: Schematic for a priori discretization algorithm. Red arrows indicate the portion
of each window attributable to heritable variation, $\epsilon_{\text{biological}} B_i$, whereas blue arrows indicate
the portion of each window attributable to non-heritable variation, $\epsilon_{\text{technical}}$.
In this manner, the centers of all heritable variation windows can be determined,
up until the point at which a window center exceeds the maximum observed value,
$x_{\max}$ (Fig. 5-5). The centers of the windows can be solved for recursively, first by
solving for $B_1$ as a function of $x_{\min}$,
$$B_1 = \frac{x_{\min} + \epsilon_{\text{technical}}}{1 - Z \times \epsilon_{\text{biological}}}$$
and then for any subsequent window $B_n > B_1$,
$$B_n = \frac{(1 + Z \times \epsilon_{\text{biological}}) B_{n-1} + 2\epsilon_{\text{technical}}}{1 - Z \times \epsilon_{\text{biological}}}$$
while $B_n < x_{\max}$. We can then use the number of identified heritable variation
windows between $x_{\min}$ and $x_{\max}$ as a metric for the number of significantly biologically
different sub-ranges into which we can discretize our data. Therefore, this recursive
formula provides an a priori method to choose a discretization scheme using only
$x_{\min}$, $x_{\max}$, $\epsilon_{\text{biological}}$, $\epsilon_{\text{technical}}$, and $Z$ as inputs.
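The recursion can be sketched in a few lines. The sketch assumes the closed forms $B_1 = (x_{\min} + \epsilon_{\text{technical}})/(1 - Z\epsilon_{\text{biological}})$ and $B_n = [(1 + Z\epsilon_{\text{biological}})B_{n-1} + 2\epsilon_{\text{technical}}]/(1 - Z\epsilon_{\text{biological}})$, obtained by rearranging the window definitions, and counts window centers that fall below $x_{\max}$. The function name `heritable_window_centers` is hypothetical, and how edge cases (for example, no center below $x_{\max}$) are handled is not specified in the text, so the behavior here is an assumption.

```python
def heritable_window_centers(x_min, x_max, eps_biological, eps_technical, z):
    """Centers B_1, B_2, ... of heritable variation windows lying below x_max."""
    shrink = 1.0 - z * eps_biological
    if x_min <= 0 or shrink <= 0:
        raise ValueError("requires x_min > 0 and z * eps_biological < 1")
    centers = []
    center = (x_min + eps_technical) / shrink  # B_1
    while center < x_max:
        centers.append(center)
        # B_n from B_(n-1); each step moves strictly upward, so the loop ends.
        center = ((1.0 + z * eps_biological) * center
                  + 2.0 * eps_technical) / shrink
    return centers


# Illustrative inputs only: the noiseless bounds of the "broad" range.
centers = heritable_window_centers(1.0, 10.0, eps_biological=0.1,
                                   eps_technical=0.5, z=1.64)
print(len(centers), [round(c, 2) for c in centers])
```

Because the heritable term scales with the window center, the windows widen as the signal increases, so a broad range starting at a low value accumulates more windows than a narrow range of the same noise characteristics.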
Using this formula, we then re-analyzed the synthetic data underlying the results
shown in Fig. 5-4 to estimate the number of discretization states for each of the 108
data sets. Because a 10% coefficient of variation was used to generate the heritable
variation, we set $\epsilon_{\text{biological}}$ to 0.1. And because the non-heritable variation was drawn
from a uniform distribution $\mathcal{U}(-1, +1)$, whose expected absolute
value is 0.5, we set $\epsilon_{\text{technical}}$ to 0.5. Because the non-heritable variation was drawn from a uniform
distribution, using 0.5 as its characteristic value only captures 50% of that distribution
(while a characteristic value of 1 would capture 100% of that distribution). Therefore,
one may want a more conservative estimate for the characteristic value of $\epsilon_{\text{technical}}$.
The Z term represents the scaling factor for the heritable variation. In this case,
because the heritable variation was drawn from a normal distribution with coefficient
of variation $\epsilon_{\text{biological}}$, if $\epsilon_{\text{technical}}$ were zero then Z represents the number of standard
deviations from the window center to the window edge. However, because the width
of each window relies on both heritable and non-heritable variation, the distance from
a given window center, $B_n$, to its edge is given by $(Z \times \epsilon_{\text{biological}})B_n + \epsilon_{\text{technical}}$. Here
we used Z = 1.64, corresponding to the 90th percentile tails of a normal distribution.
5.2.7
Predicted discretization corresponds strongly with best-performing discretization
The results for the number of discretization states derived using the recursive formula
outlined in the previous section are plotted in Fig. 5-4 above the dotted line in each
subplot. The horizontal error bars represent the standard deviations associated with
these predictions.
Discretization predictions were made for each of the 15 nodes
separately and for each replicate data set. The mean predicted discretization level
across the three replicates was calculated for each node.
The standard deviation
among these resultant 15 mean values is what is shown in the horizontal error bars.
The color of each prediction matches the color of the data range ("low narrow",
"high narrow", "broad") it is associated with. Because the predicted number of
discretization states was simply a function of the raw data, it is not restricted only to
the levels tested using the synthetic data (2, 4, 6, 8, and 10 levels), and may therefore
be any nonzero value.
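The error-bar calculation described above can be made concrete with a short sketch. It assumes the predictions are stored as one list of replicate values per node, that the plotted point is the mean of the per-node means, and that the sample standard deviation is the one intended; none of these details is stated explicitly in the text, and the function name and the numbers are hypothetical.

```python
from statistics import mean, stdev


def prediction_error_bar(predictions):
    """predictions[node] is a list of predicted state counts, one per replicate.

    Returns (center, spread): the mean of the per-node means and the
    sample standard deviation among those per-node means.
    """
    # Average across replicates for each node first.
    node_means = [mean(replicates) for replicates in predictions]
    # Then summarize the variation among nodes.
    return mean(node_means), stdev(node_means)


# Hypothetical predictions for three nodes with three replicates each.
example = [[4, 5, 4], [6, 6, 7], [5, 5, 5]]
center, spread = prediction_error_bar(example)
print(center, spread)
```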
With few exceptions, the range of the predicted number of discretization levels
aligns very well with the number of discretization states that provided the most
accurate Bayesian network inference result in the synthetic data. This demonstrates
that the a priori discretization algorithm, using only features of the simulated data
itself and only one user-defined parameter Z as inputs, is a useful tool for predicting
how many bins one should use to discretize these data. Inspecting the results, the
error bars associated with the prediction increase as the complexity of the network
increases. This is an artifact of the method used to simulate the data. Because the
data for nodes with multiple parent inputs were determined by simply summing the
values of each of the parents, this meant that nodes with multiple parents took on
a higher range of values, and thus were discretized by the algorithm into more bins.
Further, even if a given node only had one parent, but was downstream of a node
that at one point had multiple parents, then its values too would take on a higher
range. However, because not all nodes in the maximum 2- and 3-parent synthetic
networks were downstream of a multi-parent node, not all the nodes experienced
this increased range effect.
Thus, some nodes in the multi-parent networks were
predicted to be discretized to more states than other nodes, causing the increase in
error bars.
To consider cases without this summation effect, one can inspect the
algorithm predictions for the maximum 1-parent synthetic network. Here, because
the discretization algorithm is deterministic for a given data set, the variation shown
in the predictions' horizontal error bars reflects variation from the simulated data
sets' replicates.
To translate these results into more practical experimental biology terms, $\epsilon_{\text{biological}}$
would correspond to the coefficient of variation measured across biological replicates,
and $\epsilon_{\text{technical}}$ would correspond to the precision of the measurement for a given experimental procedure. A key assumption employed in all analyses here was that the heritable variation was proportional to the measured signal, but the non-heritable variation was not. Thus, the assumption implies that $\epsilon_{\text{biological}}$ reflects signal-proportional
experimental error, but $\epsilon_{\text{technical}}$ reflects signal-independent experimental error. Additionally, although the results using Z = 1.64 were satisfactory, the algorithm could
be further tuned by changing the value of Z: increasing its magnitude will result in
wider heritable variation windows and therefore fewer predicted discretization bins,
whereas decreasing its magnitude will result in narrower heritable variation windows,
and therefore more predicted discretization bins. Lastly, because the discretization
algorithm is performed on a node-specific basis, in practice each node could be discretized into its own specific number of bins. This was not explored explicitly here,
but it is likely that discretizing each node to its value determined by the algorithm
would improve inference accuracy, compared to using the same number of bins for all
nodes as was done here.
5.3
Discussion
Here we have quantitatively explored, using synthetic data from both a two-variable
toy model and more realistic 15-node networks, how prediction accuracy varies as
a function of data quantity and features related to data quality. We can now see
that the notions explored in the two-variable toy model are elemental corollaries of
the lessons learned from the multivariate network models.
The data range term $x_{\max}/x_{\min}$
explored in Fig. 5-1 is analogous to the heritable variation term in the
network data. That is, one must have a sufficiently high value of $x_{\max}/x_{\min}$, which
will drive the output y based on the "heritable" system behavior y = mx, to overcome
the non-heritable variation present in $x_{\text{noisy}}$ and $y_{\text{noisy}}$. The inference of the slope
parameter m is aided by a larger $x_{\max}/x_{\min}$; but even if one has a small $x_{\max}/x_{\min}$,
the accuracy of the inferred parameter will increase as one increases the data set size,
because by using more samples the prediction will converge to the expected slope
according to the law of large numbers.
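A small simulation can illustrate both effects. The sketch below is not the simulation underlying Fig. 5-1. It assumes a true slope of m = 2, samples x uniformly between x_min and x_max, adds U(-1, +1) noise to y only (a simplification that keeps the ordinary least-squares slope unbiased), and reports the mean absolute slope error over repeated trials. All function names and numerical settings are illustrative.

```python
import random


def fit_slope(xs, ys):
    """Ordinary least-squares slope of y on x (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def mean_abs_slope_error(x_min, x_max, n_samples, m=2.0, n_trials=200, seed=0):
    """Average |m_hat - m| over repeated noisy data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        xs = [rng.uniform(x_min, x_max) for _ in range(n_samples)]
        # Assumption: signal-independent noise is added to y only.
        ys = [m * x + rng.uniform(-1.0, 1.0) for x in xs]
        total += abs(fit_slope(xs, ys) - m)
    return total / n_trials


print("narrow range, 10 samples:  ", mean_abs_slope_error(10.0, 11.0, 10))
print("wide range, 10 samples:    ", mean_abs_slope_error(10.0, 100.0, 10))
print("narrow range, 1000 samples:", mean_abs_slope_error(10.0, 11.0, 1000))
```

Under these assumptions, either widening the range or increasing the sample size should reduce the slope error relative to the narrow-range, small-sample case.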
However, there is a key difference between the two-variable toy model and the
multivariate network model: causality. In the two-variable toy model, the modeling task was simply to infer the slope of the relationship between the two signals:
whether x was upstream of y or y upstream of x was not considered. In contrast,
in the multivariate network models, the entire task was centered on identifying the
causal "upstream-downstream" relationships between signals. In fact, the heritable
variation introduced in the network models' synthetic data is actually antagonistic to
the process of trying to quantify the slope between signals. If there were no heritable
variation induced in the data ($\epsilon_{\text{biological}} = 0$), the slope parameter could actually be
inferred more easily, if that were a goal, because the remaining variation from $\epsilon_{\text{technical}}$
was typically small compared to the heritable variation. However, without the heritable variation there would be no node-specific variation introduced into the data that
was passed from parent to child node, and thus there could be no causality inferred
from the data. The necessity of the propagation of variation from signal to signal
through the network for enabling causal inference was recently discussed, albeit in
the context of quantitative trait loci in metabolic pathways, by Blair et al. [194].
The propagation of heritable variation is a requirement for causal inference regardless of whether the model relies on continuous data (as in ref. [194]) or on discrete
data, as discussed here. The additional requirement in the discrete case, quantified
by our work here, is that the differences in the raw data underlying different discrete
states must be on the same order of magnitude as the heritable variation propagated
from signal to signal. The quality of inference will suffer if continuous data are split
into too many bins, so that differences between bins largely reflect non-heritable variation; but it will also suffer if too few bins are used, so that the data within any given bin
actually span differences in the continuous data that are biologically (i.e.,
causally) meaningful.
It should be noted that an agglomerative discretization method, in which an effort
is made to preserve the total mutual information between pairs of measured signals,
is outlined in ref. [193]. While this approach does explicitly calculate a metric for
the degree of information loss as a function of the number of discretization levels
used, and generally advises one to choose a discretization scheme that does not result
in substantial loss of information, it does not provide an explicit estimate of a discretization level. Further, it does not explicitly consider experimental parameters in
the algorithm. That is, it does not consider the degree of heritable and non-heritable
variation in the data. We sought to frame the discretization problem in terms of
experimentally measurable parameters, and not just the numerical features (e.g., mutual
information content) of the data set. Nonetheless, it remains unclear how
that method would compare to the method outlined here.
One aspect missing from this work is determining, given a predicted discretization
scheme, which edge weight threshold to apply to the inferred Bayesian network model.
That is, while the algorithm corresponds well with discretization schemes providing
high AUROC scores in the synthetic data, because the ROC curve calculation considers all edge weights, the algorithm does not actually suggest which edge weight is
best for each discretization scheme. Future work could consider, for each ROC curve,
which edge weight threshold provides a balance between true positive rate and false
positive rate, and this could be an additional output for the algorithm. Short of a
predicted edge weight threshold, one could always use p > 0.5, which will ensure that
no cycles are present in the thresholded consensus Bayesian network.
Another notion not explored here is the impact of explicitly modeling perturbations in the synthetic data. For example, one could simulate a portion of the synthetic
data to mimic an RNAi knockdown and thus the signal level would be greatly reduced
(or perhaps increased) in response. Explicitly modeling such perturbations would
likely reduce the amount of data required to properly infer causal relationships in the
data. However, the fact that no perturbations were simulated in the synthetic data,
and thus no perturbations were explicitly modeled in the Bayesian network algorithm,
is actually encouraging: it indicates that correct causal relationships can nonetheless
be inferred, even when explicit perturbations are not present, if one has sufficient
range and heritable variation in the data. In this manner, the range of the signal
data is actually likely a mimetic of the effects of perturbations, which themselves
typically "push" signals to extremes of their natural physiological range.
Notions related to the data quality features explored quantitatively in this work
have been discussed previously in the literature, but to our knowledge only in qualitative terms. The novelty of our work here is that it explores these notions in quantitative terms, using both analytical expressions and simulated network-level data. For
example, from Basso et al. [189]:
"...[G]enetic interactions are best inferred when the genes explore a substantial
dynamical range. Traditionally, this has been achieved by systematic perturbations in simple organisms (e.g., by large-scale gene knockouts
or exogenous constraints), which are not easily obtained in more complex
cellular systems. We show here that an equivalent dynamic richness can
be efficiently achieved by assembling a considerable number of naturally
occurring and experimentally generated phenotypic variations of a given
cell type [emphases added]."
And from Hartemink [195] commenting on Basso et al. [189]:
"Basso et al. demonstrate that as long as the available data explore a
wide range in the 'expression space' of the system, biologically meaningful
interactions can be recovered by computational algorithms."
The conclusions discussed here actually relate to a more generalizable concept for
discrete models, namely, nonuniform discretization (e.g., [183, 184]). The notion is
to finely discretize regions of the functional space that one knows with high confidence, and to coarsely discretize regions that one knows with low confidence. While
Kozlov and Koller [183] discuss the application of their method to hybrid Bayesian
networks, Reshef et al. [184] only consider pairwise relationships between signals and
do not attempt to infer causality. Importantly, neither method incorporates tangible
experimental parameters, such as measures of heritable and non-heritable variation
as described here, into its algorithm. As such, we believe our results provide novel
insights not only into the features of data useful for building causal models, but also
how those features can be expressed in terms familiar to experimentalists.
Chapter 6
Conclusion
This thesis has focused on improving our understanding of receptor tyrosine kinase
signaling in cancer using multivariate computational methods paired with experimental cell signaling and phenotype measurements. By collaborating with numerous
experimental colleagues who used a variety of technologies to measure signaling in a
range of biological settings, much has been learned about biological modeling that
may not have been learned had this thesis focused exclusively on one type of experimental data, or one particular biological topic. In addition, given the variety of
experimental data, numerous modeling methods were explored and applied during the
thesis. By becoming familiar with multiple modeling methods, generalizable modeling lessons could be learned and insights gained that may not have been realized had
this thesis focused heavily on only one modeling method. This broad spectrum of
experience has been a great asset for this thesis.
The results from Chapter 2 highlighted the possible role of signaling relationships
existing between receptor tyrosine kinases, even though only one receptor's ligand was
used to stimulate the cells, in a manner that was not appreciated previously. This has
led to continued experimental study on receptor-to-receptor signaling mechanisms. In
Chapter 3, fundamental differences in the receptor tyrosine kinase signaling networks
and migration modes between epithelial versus mesenchymal cells were highlighted.
In Chapter 4, similarities in signaling across six receptor tyrosine kinases were identified and subsequently linked to possible roles in cancer drug resistance. In Chapter
5, analytical, numerical, and conceptual arguments were proposed for identifying features of experimental data that produce more accurate models.
6.1
Emergent biological and computational insights
There are emergent connections between the chapters' biological conclusions. Notions
of receptor tyrosine kinase crosstalk highlighted in Chapter 2 may also relate to the receptor network classes identified in Chapter 4. For example, because of the shared
underlying signaling networks across same-class receptors, they may also share intracellular activation mechanisms (e.g., by receptor-proximal docking proteins). As
such, if one receptor is activated, it may activate intracellular proteins that could potentially interact with same-class receptors even without ligand-dependent activation
of the other receptor. Regarding the epithelial-to-mesenchymal transition (EMT)
studied in Chapter 3, the same-class receptors EGFR, FGFR1, and c-Met studied
in Chapter 4 have demonstrated roles in EMT [196], including suggested switching
from EGFR signaling in an epithelial state to FGFR1 signaling in a mesenchymal
state [84]. This potential ability of FGFR1 to compensate for EGFR in an EMT-dependent manner, combined with the knowledge from Chapter 4 that EGFR and
FGFR1 belong to the same network class, may also help explain how cells that have
undergone EMT are less sensitive to EGFR inhibitors [84]. In this manner, FGFR1
may function as a sort of "mesenchymal version" of EGFR.
This thesis has also provided novel insights from a computational perspective.
While it has thoroughly explored arguably complex methods, like Bayesian networks,
mutual information, and partial least squares regression, it has also demonstrated the
power of simple methods. In Chapter 3, it was shown that linear regression models
using only one or two phosphorylation sites as predictors of cell speed could be more
accurate than a partial least squares regression model using 11 phosphorylation sites.
Chapter 3 also discussed network models derived using Pearson correlation, arguably
the simplest measure of similarity between two signals. In Chapter 4 it was demonstrated that network topologies derived using Pearson and Spearman correlation were
as accurate, when used as multivariate classifiers of receptor signaling networks, as
Bayesian networks and methods based on mutual information. And lastly, in Chapter
5, conceptual insights into data quality features were first obtained using a system of
just two variables. Thus, importantly, these results show that biological models do
not have to be complicated to be useful. Indeed, unjustified complexity can obscure
biological insight.
6.2
Guidelines for analysis of large data sets
In the course of working on multiple projects all involving the analysis of relatively
large, relatively involved data sets, some lessons have emerged that may serve as useful guidelines for future study of large data sets. First, plot the data. Although what
one plots may vary based on what type of model one is constructing, visualizing the
data in some manner will almost always be helpful to understanding it and therefore
to modeling it. If one seeks to build a network model quantifying relationships between measured signals, one should absolutely always plot all pairwise signal-signal
relationships. For example, given three measured signals A, B, and C, one should
generate plots of the data from A vs. B, A vs. C, and B vs. C.
When seeking to use measured signals to predict a given output of interest (e.g.,
some phenotypic quantity), always plot each signal individually versus the output.
Additionally, one should start modeling efforts in this case by simply calculating
the Spearman correlation between each signal and the output. One could also use
the Pearson correlation; but the Spearman correlation can capture nonlinear (but
monotonic) relationships and, because it is rank-based, is also typically more robust
to outlier data points than the Pearson correlation.
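The difference between the two correlation measures can be seen on a small monotonic but nonlinear example. The sketch below uses only the Python standard library; the helper names `pearson`, `ranks`, and `spearman` and the exponential test data are illustrative choices, not part of the analyses in this thesis. Ties are handled by average ranks.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


signal = list(range(1, 11))
output = [math.exp(x) for x in signal]  # monotonic but strongly nonlinear
print("Pearson: ", pearson(signal, output))
print("Spearman:", spearman(signal, output))
```

Because the relationship is strictly increasing, the rank-based measure reports a perfect association, whereas the Pearson coefficient is pulled down by the curvature.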
For any given modeling task, the null hypothesis (i.e., the starting point) should
always be one of simplicity rather than one of complexity. If one builds a simple
model and it is not sufficiently predictive, only at that point should one consider a
more complex modeling approach. Thus, added complexity should be justified. If
one only considers complex modeling approaches to analyze biological data, then the
notion that biology is complex becomes a self-fulfilling prophecy.
Along these lines, one common concept cited in biological models is nonlinearity.
What is not always realized is that the relationship between two nonlinear signals can
itself be linear, and thus sometimes linear methods are sufficient. Modeling methods
that promote their utility for capturing nonlinear relationships (for example, literature using mutual information-based methods [28, 156], Bayesian networks
[185, 195], and fuzzy logic [21]) generally have not provided evidence that the underlying data exhibit nonlinear relationships, and have generally not compared the
predictive capacity of their nonlinear methods to the predictive capacity of simpler
linear methods. Notably, Faith et al. [27] compared CLR and mutual information to
Pearson correlation, finding that Pearson correlation could outperform mutual information and perform comparably to CLR; but unfortunately they never considered
Spearman (nonlinear, monotonic) correlation. Linear methods should be used first;
and if proven insufficient, then nonlinear methods should be used. This is especially
true for nonlinear methods relying on discrete data given the results in Chapter 5,
in which the sensitivity of Bayesian network inference to discretization level was described.
6.3
Limitations of methods
While the methods discussed in this thesis have provided substantial insights into
receptor tyrosine kinase signaling, they also have limitations. One strong limitation
is the availability of data. Inferring network models from data, in the most basic
sense, requires a sufficient number of data points to faithfully describe the functional
relationships between measured signals. To develop models that are arguably causal,
one typically needs even more data points. In the treatment of cancer, it is becoming
increasingly clear that variability in tumor composition between patients, and even
within the same patient, is an important factor in determining which patients will
respond to drugs. This suggests efforts to build patient-specific models for understanding treatment strategies. However, patient-specific protein signaling data is not
available in great quantities. And it is not clear that the amounts available are sufficient for constructing patient-specific signaling network models at this time. Thus,
while signaling network models will certainly remain relevant for in vitro cell line-based studies, and likely even in vivo animal models [197], it is not clear they will be
readily applicable to patient-specific data.
Another limitation with Bayesian network inference in particular is the difficulty
inferring large networks. Other methods that do not argue for causal interpretations
or calculate conditional independence relationships are not as limited by large networks. The core Bayesian network inference algorithm used throughout this thesis
[50], because it performs exact Bayesian model averaging by scoring all possible network
structures, is limited to inferring networks containing about 20 nodes. While
the computational complexity of the problem can be reduced by limiting the number
of parents per child node, and limiting the number of bins used to discretize the data,
these may not be sufficient, or desirable, in all situations. To consider networks with
more than 20 nodes, but no longer exhaustively score all networks, one could use
search methods like Markov chain Monte Carlo (MCMC) [65]. Thus, one could analyze a much larger network with MCMC-type approaches, but as with optimization
problems more generally, increasing the dimensionality of the optimization problem
(i.e., the number of nodes in the network in this case) may make it more difficult to
identify high-scoring regions of the search space.
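To give a sense of scale for the structure space, the number of directed acyclic graphs on n labeled nodes can be counted with Robinson's recurrence. This is a standard combinatorial result and not part of the inference algorithm used in this thesis; exact model-averaging algorithms may avoid enumerating every graph explicitly. The sketch below, with the hypothetical function name `count_dags`, only illustrates how quickly the space grows.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of directed acyclic graphs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in (2, 5, 10, 20):
    print(n, count_dags(n))
```

The count grows super-exponentially with the number of nodes, which is one way to see why both exhaustive scoring and stochastic search become harder as networks get larger.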
Another limitation of signaling network inference is the presence of so-called hidden variables [198], namely variables that are present in the system but not measured
in the experiment.
Two measured signals that appear correlated in a given data
set may exhibit a functional biological relationship, or they may appear correlated
because of mutual regulation by a third but unmeasured signal. In protein signaling data sets, about 15 to 100 signals are typically measured; while this is a
prodigious improvement over previous experimental methodologies, there are still
vast multitudes of signals not being measured. These hidden variables confound network inference results even when causal methods are not used. When causal methods
are used, any interpretation of causality must be tempered by the possibility that any
given network relationship is due to the unaccounted for influence of an unmeasured
signal.
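The effect of a hidden variable can be reproduced with a toy simulation. In the sketch below, two measured signals have no direct relationship but are both driven by a third signal; the Gaussian distributions, noise levels, sample size, and names are all assumptions chosen for illustration. The partial correlation, which would only be computable if the hidden signal had been measured, shows that the apparent relationship vanishes once the common driver is accounted for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(a, b, h):
    """Correlation between a and b after linearly accounting for h."""
    r_ab = pearson(a, b)
    r_ah = pearson(a, h)
    r_bh = pearson(b, h)
    return (r_ab - r_ah * r_bh) / math.sqrt((1 - r_ah ** 2) * (1 - r_bh ** 2))


rng = random.Random(0)
# Hypothetical unmeasured regulator driving both measured signals.
hidden = [rng.gauss(0.0, 1.0) for _ in range(2000)]
signal_a = [h + rng.gauss(0.0, 0.5) for h in hidden]
signal_b = [h + rng.gauss(0.0, 0.5) for h in hidden]

print("correlation of A and B:        ", pearson(signal_a, signal_b))
print("partial correlation given H:   ", partial_corr(signal_a, signal_b, hidden))
```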
Lastly, using a signaling network model to identify drug targets is still a challenge.
While Chapter 4 described results using the entire inferred network topology as a
multivariate classifier of receptor function, how to use the inferred relationships on
an edge-specific basis is less clear. Further, if one has phenotypic data, it is not
clear that having an inferred signaling network "upstream" of the signal-phenotype
predictions is going to be useful. It may be that simply trying to predict phenotype
directly will yield the greatest insights into what may be a useful drug target, namely
the signals most predictive of phenotype.
6.4
Future work
Given these limitations, there are nonetheless prospects for future research. It may
be that trying to argue causality by using Bayesian networks-given the data requirements to describe causal and higher order parent-child relationships, the limits
on network size, and the presence of hidden variables-is not always necessary to
gain insight. As such, one could consider using simpler methods, like correlation
and pairwise mutual information approaches, to gain insights into gross differences in
network topology across different conditions of interest. In this manner, one would
derive consensus networks across these simpler methods, and then compare the consensus networks between conditions of interest to identify similarities and differences
in the network structures. The most striking differences could be researched against
known signaling mechanisms and literature, and then followed up experimentally to
determine if the observed network differences, while not arguably causal from an
algorithmic perspective, nonetheless reflect real biological differences between those
conditions.
However, if one wants to try and argue causal interpretations of the network
inference result using Bayesian networks, I suggest a modified approach. One can
begin by inferring a Bayesian network structure as has been used throughout this
thesis. But then for each inferred network structure, one could fit the identified signal
relationships to linear and/or nonlinear functions using the continuous data. In other
words, if a Bayesian network result identifies a hypothesized link A->B, then return to
the continuous data and fit the data for B as a function of A. One can then determine,
in the continuous data space, how well A predicts B. This procedure could be repeated
at different edge weight thresholds to identify a threshold that corresponds to high
prediction accuracy in the continuous data space. These continuous functions could
then be used to perform sensitivity analysis for evaluating the effects of perturbations
to the system. In this manner, one would have an underlying network topology derived
using causal semantics, but the functions then applied to those network interactions
would be continuous and potentially nonlinear. This may ease interpretation of the
prediction results compared to operating in the discrete data space, in which inference
predictions made using Bayesian networks are presented in probabilistic terms [199].
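The first steps of this proposed procedure could look like the following sketch. It assumes that inferred edge weights are available as a dictionary keyed by (parent, child) pairs, that the continuous data are stored as one list per signal, and that a straight-line fit scored by the coefficient of determination is an acceptable first choice of function. The names `linear_r2` and `mean_r2_at_threshold`, the data, and the weights are hypothetical.

```python
def linear_r2(xs, ys):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        return 0.0  # a constant signal cannot predict or be predicted
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


def mean_r2_at_threshold(edge_weights, data, threshold):
    """Average R^2 of child-on-parent fits over edges with weight >= threshold."""
    kept = [edge for edge, weight in edge_weights.items() if weight >= threshold]
    if not kept:
        return None  # no edges survive this threshold
    return sum(linear_r2(data[p], data[c]) for p, c in kept) / len(kept)


# Hypothetical continuous data and inferred edge weights.
data = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 1, 4, 2, 3]}
edge_weights = {("A", "B"): 0.9, ("A", "C"): 0.3}
for threshold in (0.2, 0.5):
    print(threshold, mean_r2_at_threshold(edge_weights, data, threshold))
```

Scanning the threshold and tracking the average fit quality is one simple way to look for a threshold that retains well-supported edges; nonlinear functions could be substituted for the straight line without changing the structure of the loop.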
Prior knowledge about the signaling network structure was applied to the Bayesian
networks inferred in Chapters 2, 4, and 5; and prior knowledge about the parameters
of the conditional probability tables was introduced, by varying the equivalent sample
size parameter to account for different data set sizes, in Chapter 5. Most literature
related to applying prior knowledge to Bayesian networks in biology has focused on
modifying the structure prior [200]. However, to my knowledge, no one has discussed
use of the parameter prior for incorporating prior knowledge. Modifying the equivalent sample size (ESS) value in particular should allow facile incorporation of prior
knowledge on a parent-child specific basis. To begin, the ESS value could be altered
for just the one-parent interactions. If this were insufficient, one could consider encoding higher order prior knowledge (e.g., about mutual regulation of one child node
by two parent nodes) using ESS values for two-parent interactions.
Methods could also be developed to improve the inference of large networks. Initial results in this thesis explored results obtained by breaking a large network into
subnetworks, inferring a Bayesian network for each subnetwork, and then piecing the
subnetworks back together by normalizing against how many times a given interaction was considered across the subnetworks. Other work in this thesis also considered
large, directed, linear and nonlinear regression-based networks as an alternative to
the Bayesian network approach. In that approach, one or more root nodes would be
specified. Then, the signal best predicted by the root nodes would be determined
and added to the network. Then, given the set of root nodes and the new node, the
signal best predicted by that set would be added to the network. This process would
be repeated until all the nodes were incorporated into the network. Multiple parents
could be specified for each child node. Additionally, the accuracies of linear and
nonlinear interaction terms were compared, and only if the nonlinear method provided
significantly improved accuracy would it be used in place of the linear function. This
type of approach would allow one to "forward-simulate" the entire network based
only on the values of the root nodes.
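The greedy construction of the regression-based network described above can be sketched as follows. This is a deliberately simplified illustration and not the implementation explored in this thesis: it assumes a single parent per child, uses only linear fits scored by the squared Pearson correlation, and omits the comparison against nonlinear terms. The function names and example data are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def grow_network(data, roots):
    """Add signals one at a time, each attached to its best-predicting parent.

    Returns a list of (parent, child, score) tuples in order of addition.
    """
    in_network = list(roots)
    remaining = [name for name in data if name not in roots]
    edges = []
    while remaining:
        best = None
        # Find the (parent, child) pair with the highest fit quality,
        # where the parent is already in the network and the child is not.
        for child in remaining:
            for parent in in_network:
                score = r_squared(data[parent], data[child])
                if best is None or score > best[0]:
                    best = (score, parent, child)
        score, parent, child = best
        edges.append((parent, child, score))
        in_network.append(child)
        remaining.remove(child)
    return edges


# Hypothetical signals: B closely tracks the root A, D does not.
data = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12.5],
    "D": [3, 1, 4, 1, 5, 9],
}
for parent, child, score in grow_network(data, roots=["A"]):
    print(parent, "->", child, round(score, 3))
```

Because every non-root node ends up with a fitted function of a node added earlier, the resulting network can be evaluated in order of addition starting from the root values alone.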
Lastly, modeling temporal data remains a challenge. Dynamic Bayesian networks
[62], while often cited as a solution to modeling dynamics and cycles in the context of
Bayesian networks, really just represent a multi-layered Bayesian network. Further,
it is not well established how to split the time points from a given time course into
those multiple layers [201]. Thus there is still work to be done to determine how to
implement dynamic Bayesian networks in a biologically relevant manner. But more
importantly, signals that are correlated temporally are still not necessarily causal.
For example, the signaling data from different receptor tyrosine kinases
discussed in Chapter 4 comprise signals collected across 11 time points and also across 91
perturbation conditions. If one calculates the Spearman correlation between signals
across time points for a given perturbation condition (i.e., a measure of the similarity
between signals' time courses), and compares it to the Spearman correlation between
signals across perturbations for a given time point (i.e., a measure of the similarity
between signals' perturbation responses), the two can be very different. Some signals
have similarly shaped time courses, but respond very differently to perturbations; and
some signals have very differently shaped time courses, but nonetheless respond very
similarly to perturbations. The implications of these phenomena should be further
studied, particularly if models calculate signaling relationships across temporal and
perturbation data combined together.
Bibliography
[1] M.A. Lemmon and J. Schlessinger. Cell signaling by receptor tyrosine kinases.
Cell, 141(7):1117, 2010.
[2] T. Hunter. Why nature chose phosphate to modify proteins. Philosophical
Transactions of the Royal Society B: Biological Sciences, 367(1602):2513-2516,
2012.
[3] W.A. Lim and T. Pawson. Phosphotyrosine signaling: evolving a new cellular
communication system. Cell, 142(5):661-667, 2010.
[4] T. Hunter. Tyrosine phosphorylation: thirty years and counting. Current
Opinion in Cell Biology, 21(2):140-146, 2009.
[5] D. Hanahan and R.A. Weinberg. Hallmarks of cancer: the next generation.
Cell, 144(5):646-674, 2011.
[6] T. Ideker and D. Lauffenburger. Building with a scaffold: emerging strategies
for high- to low-level cellular modeling. Trends in Biotechnology, 21(6):255-262,
2003.
[7] K.A. Janes and D.A. Lauffenburger. A biological approach to computational
models of proteomic networks. Current Opinion in Chemical Biology, 10(1):73-80,
2006.
[8] P.A. DiMilla, K. Barbee, and D.A. Lauffenburger. Mathematical model for the
effects of adhesion and mechanics on cell migration speed. Biophysical Journal,
60(1):15-37, 1991.
[9] K. Nagata, I. Izawa, and M. Inagaki. A decade of site- and phosphorylation
state-specific antibodies: recent advances in studies of spatiotemporal protein
phosphorylation. Genes to Cells, 6(8):653-664, 2001.
[10] B. Schoeberl, C. Eichler-Jonsson, ED Gilles, and G. Muller. Computational
modeling of the dynamics of the MAP kinase cascade activated by surface and
internalized EGF receptors. Nature Biotechnology, 20(4):370, 2002.
[11] F. Hua, M.G. Cornejo, M.H. Cardone, C.L. Stokes, and D.A. Lauffenburger.
Effects of Bcl-2 levels on Fas signaling-induced caspase-3 activation: molecular
genetic tests of computational model predictions. The Journal of Immunology,
175(2):985-995, 2005.
[12] K.A. Janes, J.G. Albeck, L.X. Peng, P.K. Sorger, D.A. Lauffenburger, and M.B.
Yaffe. A high-throughput quantitative multiplex kinase assay for monitoring
information flow in signaling networks: application to sepsis-apoptosis. Molecular
& Cellular Proteomics, 2(7):463-473, 2003.
[13] U.B. Nielsen, M.H. Cardone, A.J. Sinskey, G. MacBeath, and P.K. Sorger. Profiling receptor tyrosine kinase activation by using Ab microarrays. Proceedings
of the National Academy of Sciences, 100(16):9330-9335, 2003.
[14] K.A. Janes, J.G. Albeck, S. Gaudet, P.K. Sorger, D.A. Lauffenburger, and
M.B. Yaffe. A systems model of signaling identifies a molecular basis set for
cytokine-induced apoptosis. Science, 310(5754):1646-1653, 2005.
[15] J.M. Irish, R. Hovland, P.O. Krutzik, O.D. Perez, Ø. Bruserud, B.T. Gjertsen,
and G.P. Nolan. Single cell profiling of potentiated phospho-protein networks
in cancer cells. Cell, 118(2):217-228, 2004.
[16] K. Sachs, O. Perez, D. Pe'er, D.A. Lauffenburger, and G.P. Nolan. Causal
protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523, 2005.
[17] Y. Zhang, A. Wolf-Yadlin, P.L. Ross, D.J. Pappin, J. Rush, D.A. Lauffenburger,
and F.M. White. Time-resolved mass spectrometry of tyrosine phosphorylation
sites in the epidermal growth factor receptor signaling network reveals dynamic
modules. Molecular & Cellular Proteomics, 4(9):1240-1250, 2005.
[18] M. Sevecka and G. MacBeath. State-based discovery: a multidimensional screen
for small-molecule modulators of EGF signaling. Nature Methods, 3(10):825-831, 2006.
[19] M. Bansal, V. Belcastro, A. Ambesi-Impiombato, and D. Di Bernardo. How to
infer gene networks from expression profiles. Molecular Systems Biology, 3(1),
2007.
[20] R. Bonneau, D.J. Reiss, P. Shannon, M. Facciotti, L. Hood, N.S. Baliga,
V. Thorsson, et al. The Inferelator: an algorithm for learning parsimonious
regulatory networks from systems-biology data sets de novo. Genome Biology,
7(5):R36, 2006.
[21] M.K. Morris, J. Saez-Rodriguez, D.C. Clarke, P.K. Sorger, and D.A. Lauffenburger. Training signaling pathway maps to biochemical data with constrained
fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli.
PLoS Computational Biology, 7(3):e1001099, 2011.
[22] N. Friedman, M. Linial, I. Nachman, and D. Pe'er. Using Bayesian networks
to analyze expression data. Journal of Computational Biology, 7(3-4):601-620,
2000.
[23] J. Saez-Rodriguez, L.G. Alexopoulos, J. Epperlein, R. Samaga, D.A. Lauffenburger, S. Klamt, and P.K. Sorger. Discrete logic modelling as a means to
link protein signalling networks with functional analysis of mammalian signal
transduction. Molecular Systems Biology, 5(1), 2009.
[24] K. Wang, M. Saito, B.C. Bisikirska, M.J. Alvarez, W.K. Lim, P. Rajbhandari,
Q. Shen, I. Nemenman, K. Basso, A.A. Margolin, et al. Genome-wide identification of post-translational modulators of transcription factor activity in human
B cells. Nature Biotechnology, 27(9):829-837, 2009.
[25] A. De La Fuente, N. Bing, I. Hoeschele, and P. Mendes. Discovery of meaningful
associations in genomic data using partial correlation coefficients. Bioinformatics, 20(18):3565-3574, 2004.
[26] J. Krumsiek, K. Suhre, T. Illig, J. Adamski, and F.J. Theis. Gaussian graphical
modeling reconstructs pathway reactions from high-throughput metabolomics
data. BMC Systems Biology, 5(1):21, 2011.
[27] J.J. Faith, B. Hayete, J.T. Thaden, I. Mogno, J. Wierzbowski, G. Cottarel,
S. Kasif, J.J. Collins, and T.S. Gardner. Large-scale mapping and validation
of Escherichia coli transcriptional regulation from a compendium of expression
profiles. PLoS Biology, 5(1):e8, 2007.
[28] A.A. Margolin, I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R.D. Favera, and A. Califano. ARACNE: an algorithm for the reconstruction of gene
regulatory networks in a mammalian cellular context. BMC Bioinformatics,
7(Suppl 1):S7, 2006.
[29] D. Pe'er. Bayesian network analysis of signaling networks: a primer. Science
Signaling, 2005(281):pl4, 2005.
[30] D. Heckerman. A tutorial on learning with Bayesian networks. Innovations in
Bayesian Networks, pages 33-82, 2008.
[31] K.B. Korb and A.E. Nicholson. Bayesian Artificial Intelligence, volume 1.
Chapman & Hall/CRC, 2003.
[32] D. Heckerman. Probabilistic similarity networks. Networks, 20(5):607-636,
1990.
[33] G.F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309-347, 1992.
[34] B. Schoeberl, E.A. Pace, J.B. Fitzgerald, B.D. Harms, L. Xu, L. Nie, B. Linggi,
A. Kalra, V. Paragas, R. Bukhalid, et al. Therapeutically targeting ErbB3: a
key node in ligand-induced activation of the ErbB receptor-PI3K axis. Science
Signaling, 2(77):ra31, 2009.
[35] A.L. Hopkins. Network pharmacology: the next paradigm in drug discovery.
Nature Chemical Biology, 4(11):682-690, 2008.
[36] T.H. Keller, A. Pichota, and Z. Yin. A practical view of 'druggability'. Current
Opinion in Chemical Biology, 10(4):357-361, 2006.
[37] M.F. Ciaccio, J.P. Wagner, C.P. Chuu, D.A. Lauffenburger, and R.B. Jones.
Systems analysis of EGF receptor signaling dynamics with microwestern arrays.
Nature Methods, 7(2):148-155, 2010.
[38] W. Burnette. "Western blotting": electrophoretic transfer of proteins from
sodium dodecyl sulfate polyacrylamide gels to unmodified nitrocellulose and
radiographic detection with antibody and radioiodinated protein A. Analytical
Biochemistry, 112:195-203, 1981.
[39] C.P. Paweletz, L.A. Liotta, and E.F. Petricoin. New technologies for biomarker
analysis of prostate cancer progression: Laser capture microdissection and tissue
proteomics. Urology, 57(4):160-163, 2001.
[40] C.P. Paweletz, L. Charboneau, V.E. Bichsel, N.L. Simone, T. Chen, J.W. Gillespie, M.R. Emmert-Buck, M.J. Roth, EF Petricoin, L.A. Liotta, et al. Reverse
phase protein microarrays which capture disease progression show activation of
pro-survival pathways at the cancer invasion front. Oncogene, 20(16):1981-1989,
2001.
[41] K. Rikova, A. Guo, Q. Zeng, A. Possemato, J. Yu, H. Haack, J. Nardone, K. Lee,
C. Reeves, Y. Li, et al. Global survey of phosphotyrosine signaling identifies
oncogenic kinases in lung cancer. Cell, 131(6):1190-1203, 2007.
[42] J.V. Olsen, B. Blagoev, F. Gnad, B. Macek, C. Kumar, P. Mortensen, and
M. Mann. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell, 127(3):635-648, 2006.
[43] A. Wolf-Yadlin, N. Kumar, Y. Zhang, S. Hautaniemi, M. Zaman, H.D. Kim,
V. Grantcharova, D.A. Lauffenburger, and F.M. White. Effects of HER2 overexpression on cell signaling networks governing proliferation and migration.
Molecular Systems Biology, 2(1), 2006.
[44] R. Tibes, Y.H. Qiu, Y. Lu, B. Hennessy, M. Andreeff, G.B. Mills, and S.M. Kornblau. Reverse phase protein array: validation of a novel proteomic technology
and utility for analysis of primary leukemia specimens and hematopoietic stem
cells. Molecular Cancer Therapeutics, 5(10):2512-2521, 2006.
[45] R.B. Jones, A. Gordus, J.A. Krall, and G. MacBeath. A quantitative protein
interaction network for the ErbB receptors using protein microarrays. Nature,
439(7073):168-174, 2005.
[46] H. Sunada, B.E. Magun, J. Mendelsohn, and C.L. MacLeod. Monoclonal antibody against epidermal growth factor receptor is internalized without stimulating receptor phosphorylation. Proceedings of the National Academy of Sciences,
83(11):3825-3829, 1986.
[47] G.N. Gill and C.S. Lazar. Increased phosphotyrosine content and inhibition of
proliferation in EGF-treated A431 cells. Nature, 1981.
[48] A. Wolf-Yadlin, S. Hautaniemi, D.A. Lauffenburger, and F.M. White. Multiple
reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proceedings of the National Academy of Sciences, 104(14):5860-5865, 2007.
[49] F. Chang, JT Lee, PM Navolanic, LS Steelman, JG Shelton, WL Blalock,
RA Franklin, and JA McCubrey. Involvement of PI3K/Akt pathway in cell
cycle progression, apoptosis, and neoplastic transformation: a target for cancer
chemotherapy. Leukemia, 17(3):590-603, 2003.
[50] M. Koivisto and K. Sood. Exact Bayesian structure discovery in Bayesian
networks. Journal of Machine Learning Research, 5:549-573, 2004.
[51] D. Eaton and K. Murphy. Exact Bayesian structure learning from uncertain
interventions. In AI & Statistics, volume 2, pages 107-114, 2007.
[52] J.M. Stommel, A.C. Kimmelman, H. Ying, R. Nabioullin, A.H. Ponugoti,
R. Wiedemeyer, A.H. Stegh, J.E. Bradner, K.L. Ligon, C. Brennan, et al.
Coactivation of receptor tyrosine kinases affects the response of tumor cells
to targeted therapies. Science, 318(5848):287, 2007.
[53] P.A. Bromann, H. Korkaya, and S.A. Courtneidge. The interplay between Src
family kinases and receptor tyrosine kinases. Oncogene, 23(48):7957-7968, 2004.
[54] D.M. Chickering. Learning equivalence classes of Bayesian-network structures.
Journal of Machine Learning Research, 2:445-498, 2002.
[55] J. Downward, P. Parker, and MD Waterfield. Autophosphorylation sites on the
epidermal growth factor receptor. Nature, 311(5985):483-485, 1984.
[56] Y. Saito, J. Haendeler, Y. Hojo, K. Yamamoto, and B.C. Berk. Receptor heterodimerization: essential mechanism for platelet-derived growth factor-induced
epidermal growth factor receptor transactivation. Molecular and Cellular Biology, 21(19):6387-6394, 2001.
[57] L. Duchesne, B. Tissot, T.R. Rudd, A. Dell, and D.G. Fernig. N-glycosylation
of fibroblast growth factor receptor 1 regulates ligand and heparan sulfate coreceptor binding. Journal of Biological Chemistry, 281(37):27178-27189, 2006.
[58] S. Ekman, A. Kallin, U. Engstroem, C.H. Heldin, and L. Roennstrand. SHP-2 is involved in heterodimer specific loss of phosphorylation of Tyr771 in the
PDGF β-receptor. Oncogene, 21:1870-1875, 2002.
[59] K.L. Gould and T. Hunter. Platelet-derived growth factor induces multisite
phosphorylation of pp60c-src and increases its protein-tyrosine kinase activity.
Molecular and Cellular Biology, 8(8):3345-3356, 1988.
[60] R.C. Taylor, G. Acquaah-Mensah, M. Singhal, D. Malhotra, and S. Biswal.
Network inference algorithms elucidate Nrf2 regulation of mouse lung oxidative
stress. PLoS Computational Biology, 4(8):e1000166, 2008.
[61] I. Cantone, L. Marucci, F. Iorio, M.A. Ricci, V. Belcastro, M. Bansal, S. Santini,
M. di Bernardo, D. di Bernardo, M.P. Cosma, et al. A yeast synthetic network
for in vivo assessment of reverse-engineering and modeling approaches. Cell,
137(1):172, 2009.
[62] D. Husmeier. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics, 19(17):2271-2282, 2003.
[63] Y.W. Lou, Y.Y. Chen, S.F. Hsu, R.K. Chen, C.L. Lee, K.H. Khoo, N.K. Tonks,
and T.C. Meng. Redox regulation of the protein tyrosine phosphatase PTP1B
in cancer cells. FEBS Journal, 275(1):69-88, 2008.
[64] W. Lu, K. Shen, and P.A. Cole. Chemical dissection of the effects of tyrosine
phosphorylation of SHP-2. Biochemistry, 42(18):5461-5468, 2003.
[65] N. Friedman and D. Koller. Being Bayesian about network structure: A
Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50(1):95-125, 2003.
[66] D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning,
20(3):197-243, 1995.
[67] D. Madigan, J. York, and D. Allard. Bayesian graphical models for discrete data. International Statistical Review/Revue Internationale de Statistique,
pages 215-232, 1995.
[68] P.E. Meyer, F. Lafitte, and G. Bontempi. minet: A R/Bioconductor package
for inferring large transcriptional networks using mutual information. BMC
Bioinformatics, 9(1):461, 2008.
[69] R. Steuer, J. Kurths, C.O. Daub, J. Weise, and J. Selbig. The mutual information: detecting and evaluating dependencies between variables. Bioinformatics,
18(suppl 2):S231-S240, 2002.
[70] H.D. Kim, A.S. Meyer, J.P. Wagner, S.K. Alford, A. Wells, F.B. Gertler, and
D.A. Lauffenburger. Signaling network state predicts Twist-mediated effects
on breast cell migration across diverse growth factor contexts. Molecular &
Cellular Proteomics, 10(11), 2011.
[71] J.P. Thiery. Epithelial-mesenchymal transitions in development and pathologies. Current Opinion in Cell Biology, 15(6):740-746, 2003.
[72] R. Kalluri and R.A. Weinberg. The basics of epithelial-mesenchymal transition.
Journal of Clinical Investigation, 119(6):1420, 2009.
[73] J.P. Thiery and J.P. Sleeman. Complex networks orchestrate epithelial-mesenchymal transitions. Nature Reviews Molecular Cell Biology, 7(2):131-142,
2006.
[74] S. Thomson, F. Petti, I. Sujka-Kwok, P. Mercado, J. Bean, M. Monaghan,
S.L. Seymour, G.M. Argast, D.M. Epstein, and J.D. Haley. A systems view of
epithelial-mesenchymal transition signaling states. Clinical and Experimental
Metastasis, 28(2):137-155, 2011.
[75] A. Moustakas and C.H. Heldin. Signaling networks guiding epithelial-mesenchymal
transitions during embryogenesis and cancer progression. Cancer
Science, 98(10):1512-1520, 2007.
[76] J. Xu, S. Lamouille, and R. Derynck. TGF-β-induced epithelial to mesenchymal
transition. Cell Research, 19(2):156-172, 2009.
[77] J.M. López-Novoa and M.A. Nieto. Inflammation and EMT: an alliance towards
organ fibrosis and cancer progression. EMBO Molecular Medicine, 1(6-7):303-314, 2009.
[78] A. Singh and J. Settleman. EMT, cancer stem cells and drug resistance: an
emerging axis of evil in the war on cancer. Oncogene, 29(34):4741-4751, 2010.
[79] R.I. Nicholson, J.M. Gee, and M.E. Harper. EGFR and cancer prognosis. European Journal of Cancer (Oxford, England: 1990), 37:S9, 2001.
[80] E.M. Bublil and Y. Yarden. The EGF receptor family: spearheading a merger
of signaling and therapeutics. Current Opinion in Cell Biology, 19(2):124-134,
2007.
[81] J.R. Grandis and J.C. Sok. Signaling through the epidermal growth factor
receptor during the development of malignancy. Pharmacology & Therapeutics,
102(1):37-46, 2004.
[82] Y. Yarden and M.X. Sliwkowski. Untangling the ErbB signalling network. Nature Reviews Molecular Cell Biology, 2(2):127-137, 2001.
[83] B.A. Frederick, B.A. Helfrich, C.D. Coldren, D. Zheng, D. Chan, P.A. Bunn,
and D. Raben. Epithelial to mesenchymal transition predicts gefitinib resistance
in cell lines of head and neck squamous cell carcinoma and non-small cell lung
carcinoma. Molecular Cancer Therapeutics, 6(6):1683-1691, 2007.
[84] S. Thomson, F. Petti, I. Sujka-Kwok, D. Epstein, and J.D. Haley. Kinase switching in mesenchymal-like non-small cell lung cancer lines contributes to EGFR
inhibitor resistance through pathway redundancy. Clinical and Experimental
Metastasis, 25(8):843-854, 2008.
[85] S. Barr, S. Thomson, E. Buck, S. Russo, F. Petti, I. Sujka-Kwok, A. Eyzaguirre,
M. Rosenfeld-Franklin, N.W. Gibson, M. Miglarese, et al. Bypassing cellular
EGF receptor dependence through epithelial-to-mesenchymal-like transitions.
Clinical and Experimental Metastasis, 25(6):685-693, 2008.
[86] A. Chakravarti, J.S. Loeffler, and N.J. Dyson. Insulin-like growth factor receptor I mediates resistance to anti-epidermal growth factor receptor therapy
in primary human glioblastoma cells through continued activation of phosphoinositide 3-kinase signaling. Cancer Research, 62(1):200-207, 2002.
[87] B. Elenbaas, L. Spirio, F. Koerner, M.D. Fleming, D.B. Zimonjic, J.L. Donaher,
N.C. Popescu, W.C. Hahn, and R.A. Weinberg. Human breast cancer cells
generated by oncogenic transformation of primary mammary epithelial cells.
Genes & Development, 15(1):50-65, 2001.
[88] J.H. Taube, J.I. Herschkowitz, K. Komurov, A.Y. Zhou, S. Gupta, J. Yang,
K. Hartwell, T.T. Onder, P.B. Gupta, K.W. Evans, et al. Core epithelial-to-mesenchymal transition interactome gene-expression signature is associated
with claudin-low and metaplastic breast cancer subtypes. Proceedings of the
National Academy of Sciences, 107(35):15449-15454, 2010.
[89] J. Yang, S.A. Mani, J.L. Donaher, S. Ramaswamy, R.A. Itzykson, C. Come,
P. Savagner, I. Gitelman, A. Richardson, and R.A. Weinberg. Twist, a master
regulator of morphogenesis, plays an essential role in tumor metastasis. Cell,
117(7):927-939, 2004.
[90] T.A. Martin, A. Goyal, G. Watkins, and W.G. Jiang. Expression of the transcription factors snail, slug, and twist and their clinical significance in human
breast cancer. Annals of Surgical Oncology, 12(6):488-496, 2005.
[91] M.A. Eckert, T.M. Lwin, A.T. Chang, J. Kim, E. Danis, L. Ohno-Machado, and
J. Yang. Twist1-induced invadopodia formation promotes tumor metastasis.
Cancer Cell, 19(3):372-386, 2011.
[92] Y. Soini, H. Tuhkanen, R. Sironen, I. Virtanen, V. Kataja, P. Auvinen, A. Mannermaa, and V.M. Kosma. Transcription factors zeb1, twist and snail in breast
carcinoma. BMC Cancer, 11(1):73, 2011.
[93] M.G. Ponzo, R. Lesurf, S. Petkiewicz, F.P. O'Malley, D. Pinnaduwage, I.L.
Andrulis, S.B. Bull, N. Chughtai, D. Zuo, M. Souleimanova, et al. Met induces mammary tumors with diverse histologies and is associated with poor
outcome and human basal breast cancer. Proceedings of the National Academy
of Sciences, 106(31):12903-12908, 2009.
[94] J. Ma, M.C. DeFrances, C. Zou, C. Johnson, R. Ferrell, and R. Zarnegar.
Somatic mutation and functional polymorphism of a novel regulatory element in
the HGF gene promoter causes its aberrant expression in human breast cancer.
Journal of Clinical Investigation, 119(3):478, 2009.
[95] I.R. Hutcheson, J.M. Knowlden, S.E. Hiscox, D. Barrow, JM Gee, J.F. Robertson, I.O. Ellis, R.I. Nicholson, et al. Heregulin β1 drives gefitinib-resistant
growth and invasion in tamoxifen-resistant MCF-7 breast cancer cells. Breast
Cancer Research, 9(4):R50, 2007.
[96] H.D. Kim, T.W. Guo, A.P. Wu, A. Wells, F.B. Gertler, and D.A. Lauffenburger.
Epidermal growth factor-induced enhancement of glioblastoma cell migration
in 3D arises from an intrinsic increase in speed but an extrinsic matrix- and
proteolysis-dependent increase in persistence. Molecular Biology of the Cell,
19(10):4249-4259, 2008.
[97] E.J. Joslin, L.K. Opresko, A. Wells, H.S. Wiley, and D.A. Lauffenburger.
EGF-receptor-mediated mammary epithelial cell migration is driven by sustained ERK signaling from autocrine stimulation. Journal of Cell Science,
120(20):3688-3699, 2007.
[98] C. Hidalgo-Carcedo, S. Hooper, S.I. Chaudhry, P. Williamson, K. Harrington,
B. Leitinger, and E. Sahai. Collective cell migration requires suppression of actomyosin at cell-cell contacts mediated by DDR1 and the cell polarity regulators
Par3 and Par6. Nature Cell Biology, 13(1):49, 2011.
[99] R.M. Neve, K. Chin, J. Fridlyand, J. Yeh, F.L. Baehner, T. Fevr, L. Clark,
N. Bayani, J.P. Coppe, F. Tong, et al. A collection of breast cancer cell lines for
the study of functionally distinct cancer subtypes. Cancer Cell, 10(6):515-527,
2006.
[100] T. Blick, E. Widodo, H. Hugo, M. Waltham, ME Lenburg, RM Neve, and
EW Thompson. Epithelial mesenchymal transition traits in human breast cancer cell lines. Clinical and Experimental Metastasis, 25(6):629-642, 2008.
[101] A. De Luca and N. Normanno. Predictive biomarkers to tyrosine kinase inhibitors for the epidermal growth factor receptor in non-small-cell lung cancer.
Current Drug Targets, 11(7):851-864, 2010.
[102] L. Li, K. Sampat, N. Hu, J. Zakari, and S.H. Yuspa. Protein kinase C negatively regulates Akt activity and modifies UVC-induced apoptosis in mouse
keratinocytes. Journal of Biological Chemistry, 281(6):3237-3243, 2006.
[103] M. Guarino. Src signaling in cancer invasion. Journal of Cellular Physiology,
223(1):14-26, 2010.
[104] V. Aguirre, T. Uchida, L. Yenush, R. Davis, and M.F. White. The c-Jun NH2-terminal kinase promotes insulin resistance during association with insulin receptor substrate-1 and phosphorylation of Ser307. Journal of Biological
Chemistry, 275(12):9047-9054, 2000.
[105] X. Zhang, A. Chattopadhyay, Q. Ji, J.D. Owen, P.J. Ruest, G. Carpenter,
and S.K. Hanks. Focal adhesion kinase promotes phospholipase C-γ1 activity.
Proceedings of the National Academy of Sciences, 96(16):9021-9026, 1999.
[106] M.P. Wymann and R. Schneiter. Lipid signalling in disease. Nature Reviews
Molecular Cell Biology, 9(2):162-176, 2008.
[107] X. Fang, S. Yu, J.L. Tanyi, Y. Lu, J.R. Woodgett, and G.B. Mills. Convergence
of multiple signaling cascades at glycogen synthase kinase 3: Edg receptor-mediated phosphorylation and inactivation by lysophosphatidic acid through
a protein kinase C-dependent intracellular pathway. Molecular and Cellular
Biology, 22(7):2099-2110, 2002.
[108] J. Gwak, M. Cho, S.J. Gong, J. Won, D.E. Kim, E.Y. Kim, S.S. Lee, M. Kim,
T.K. Kim, J.G. Shin, et al. Protein kinase C-mediated β-catenin phosphorylation negatively regulates the Wnt/β-catenin pathway. Journal of Cell Science,
119(22):4702-4709, 2006.
[109] F. Liao, HS Shin, and SG Rhee. In vitro tyrosine phosphorylation of PLC-γ1
and PLC-γ2 by SRC family protein tyrosine kinases. Biochemical and Biophysical Research Communications, 191(3):1028-1033, 1993.
[110] S.V. del Rincón, Q. Guo, C. Morelli, H.Y. Shiu, E. Surmacz, and W.H. Miller.
Retinoic acid mediates degradation of IRS-1 by the ubiquitin-proteasome pathway, via a PKC-dependant mechanism. Oncogene, 23(57):9269-9279, 2004.
[111] S. Ishibe, D. Joly, Z.X. Liu, and L.G. Cantley. Paxillin serves as an ERK-regulated scaffold for coordinating FAK and Rac activation in epithelial morphogenesis. Molecular Cell, 16(2):257-267, 2004.
[112] R.H. Alvarez, V. Valero, and G.N. Hortobagyi. Emerging targeted therapies
for breast cancer. Journal of Clinical Oncology, 28(20):3366-3379, 2010.
[113] M. Luo, P. Langlais, Z. Yi, N. Lefort, E.A. De Filippis, H. Hwang, C.Y.
Christ-Roberts, and L.J. Mandarino. Phosphorylation of human insulin receptor substrate-1 at serine 629 plays a positive role in insulin signaling. Endocrinology, 148(10):4895-4905, 2007.
[114] R. Wu, H. Kausar, P. Johnson, D.E. Montoya-Durango, M. Merchant, and
M.J. Rane. Hsp27 regulates Akt activation and polymorphonuclear leukocyte
apoptosis by scaffolding MK2 to Akt signal complex. Journal of Biological
Chemistry, 282(30):21598-21608, 2007.
[115] S. de Jong. SIMPLS: an alternative approach to partial least squares regression.
Chemometrics and Intelligent Laboratory Systems, 18(3):251-263, 1993.
[116] I.-G. Chong and C.-H. Jun. Performance of some variable selection methods
when multicollinearity is present. Chemometrics and Intelligent Laboratory Systems, 78(1):103-112, 2005.
[117] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical
and powerful approach to multiple testing. Journal of the Royal Statistical
Society. Series B (Methodological), pages 289-300, 1995.
[118] R.G. Miller. Simultaneous Statistical Inference. Springer-Verlag, 1981.
[119] C. Ding and H. Peng. Minimum redundancy feature selection from microarray
gene expression data. Journal of Bioinformatics and Computational Biology,
3(02):185-205, 2005.
[120] U.M. Braga-Neto and E.R. Dougherty. Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3):374-380, 2004.
[121] R.B. Bendel and A.A. Afifi. Comparison of stopping rules in forward "stepwise" regression. Journal of the American Statistical Association, 72(357):46-53, 1977.
[122] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society. Series B (Methodological), pages 267-288, 1996.
[123] H. Zou and T. Hastie. Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society: Series B (Statistical Methodology),
67(2):301-320, 2005.
[124] B.D. Cosgrove, B.M. King, M.A. Hasan, L.G. Alexopoulos, P.A. Farazi, B.S.
Hendriks, L.G. Griffith, P.K. Sorger, B. Tidor, J.J. Xu, et al. Synergistic drug-cytokine induction of hepatocellular death as an in vitro approach for the study
of inflammation-associated idiosyncratic drug hepatotoxicity. Toxicology and
Applied Pharmacology, 237(3):317-330, 2009.
[125] S. Pece, M. Chiariello, C. Murga, and J.S. Gutkind. Activation of the protein
kinase Akt/PKB by the formation of E-cadherin-mediated cell-cell junctions: Evidence for the association of phosphatidylinositol 3-kinase with the E-cadherin
adhesion complex. Journal of Biological Chemistry, 274(27):19347-19351, 1999.
[126] B. Baum and M. Georgiou. Dynamics of adherens junctions in epithelial establishment, maintenance, and remodeling. Journal of Cell Biology, 192(6):907-917, 2011.
[127] C. Huang, Z. Rajfur, C. Borchers, M.D. Schaller, and K. Jacobson. JNK phosphorylates paxillin and regulates cell migration. Nature, 424(6945):219-223,
2003.
[128] M.D. Schaller. Paxillin: a focal adhesion-associated adaptor protein. Oncogene,
20(44):6459, 2001.
[129] D.S. Harburger and D.A. Calderwood. Integrin signalling at a glance. Journal
of Cell Science, 122(2):159-163, 2009.
[130] E.A.C. Almeida, D. Ilić, Q. Han, C.R. Hauck, F. Jin, H. Kawakatsu, D.D.
Schlaepfer, and C.H. Damsky. Matrix survival signaling from fibronectin via focal adhesion kinase to c-Jun-NH2-terminal kinase. The Journal of Cell Biology,
149(3):741-754, 2000.
[131] M.A. Wozniak, K. Modzelewska, L. Kwong, and P.J. Keely. Focal adhesion
regulation of cell behavior. Biochimica et Biophysica Acta (BBA)-Molecular
Cell Research, 1692(2):103-119, 2004.
[132] P. Friedl and D. Gilmour. Collective cell migration in morphogenesis, regeneration and cancer. Nature Reviews Molecular Cell Biology, 10(7):445-457, 2009.
[133] J.F. Santibáñez. JNK mediates TGF-β1-induced epithelial mesenchymal
transdifferentiation of mouse transformed keratinocytes. FEBS Letters,
580(22):5385-5391, 2006.
[134] Q. Liu, H. Mao, J. Nie, W. Chen, Q. Yang, X. Dong, and X. Yu. Transforming
growth factor β1 induces epithelial-mesenchymal transition by activating the
JNK-Smad3 pathway in rat peritoneal mesothelial cells. Peritoneal Dialysis
International, 28(Supplement 3):S88-S95, 2008.
[135] J. Wang, I. Kuiatse, A.V. Lee, J. Pan, A. Giuliano, and X. Cui. Sustained c-Jun-NH2-kinase activity promotes epithelial-mesenchymal transition, invasion,
and survival of breast cancer cells by regulating extracellular signal-regulated
kinase activation. Molecular Cancer Research, 8(2):266-277, 2010.
[136] J.D. Storey and R. Tibshirani. Statistical significance for genomewide studies.
Proceedings of the National Academy of Sciences, 100(16):9440-9445, 2003.
[137] J.P. Wagner, A. Wolf-Yadlin, M. Sevecka, J.K. Grenier, D.E. Root, D.A. Lauffenburger, and G. MacBeath. Receptor tyrosine kinases fall into distinct classes
based on their inferred signaling networks. Submitted, 2013.
[138] S.R. Hubbard and J.H. Till. Protein tyrosine kinase structure and function.
Annual Review of Biochemistry, 69(1):373-398, 2000.
[139] A.B. Turke, K. Zejnullahu, Y.L. Wu, Y. Song, D. Dias-Santagata, E. Lifshits,
L. Toschi, A. Rogers, T. Mok, L. Sequist, et al. Preexistence and clonal selection
of MET amplification in EGFR mutant NSCLC. Cancer Cell, 17(1):77-88,
2010.
[140] J. Qi, M.A. McTigue, A. Rogers, E. Lifshits, J.G. Christensen, P.A. Jänne, and
J.A. Engelman. Multiple mutations and bypass mechanisms can contribute
to development of acquired resistance to MET inhibitors. Cancer Research,
71(3):1081-1091, 2011.
[141] Z. Zhang, J.C. Lee, L. Lin, V. Olivas, V. Au, T. LaFramboise, M. Abdel-Rahman, X. Wang, A.D. Levine, J.K. Rho, et al. Activation of the AXL kinase
causes resistance to EGFR-targeted therapy in lung cancer. Nature Genetics,
44(8):852-860, 2012.
[142] T.R. Wilson, J. Fridlyand, Y. Yan, E. Penuel, L. Burton, E. Chan, J. Peng,
E. Lin, Y. Wang, J. Sosman, et al. Widespread potential for growth-factor-driven resistance to anticancer kinase inhibitors. Nature, 2012.
[143] F. Harbinski, V.J. Craig, S. Sanghavi, D. Jeffery, L. Liu, K.A. Sheppard,
S. Wagner, C. Stamm, A. Buness, C. Chatenay-Rivauday, et al. Rescue screens
with secreted proteins reveal compensatory potential of receptor tyrosine kinases in driving cancer growth. Cancer Discovery, 2012.
[144] J. Tegner, M.K.S. Yeung, J. Hasty, and J.J. Collins. Reverse engineering gene
networks: integrating genetic perturbations with dynamical modeling. Proceedings of the National Academy of Sciences, 100(10):5944-5949, 2003.
[145] R.J. Prill, J. Saez-Rodriguez, L.G. Alexopoulos, P.K. Sorger, and
G. Stolovitzky. Crowdsourcing network inference: the DREAM predictive signaling network challenge. Science Signaling, 4(189):mr7, 2011.
[146] A. Gordus, J.A. Krall, E.M. Beyer, A. Kaushansky, A. Wolf-Yadlin, M. Sevecka,
B.H. Chang, J. Rush, and G. MacBeath. Linear combinations of docking affinities explain quantitative differences in RTK signaling. Molecular Systems Biology, 5(1), 2009.
[147] J. Moffat, D.A. Grueneberg, X. Yang, S.Y. Kim, A.M. Kloepfer, G. Hinkle,
B. Piqani, T.M. Eisenhaure, B. Luo, J.K. Grenier, et al. A lentiviral RNAi
library for human and mouse genes applied to an arrayed viral high-content
screen. Cell, 124(6):1283-1298, 2006.
[148] M.D. Marmor, K.B. Skaria, Y. Yarden, et al. Signal transduction and oncogenesis by ErbB/HER receptors. International Journal of Radiation Oncology,
Biology, Physics, 58(3):903, 2004.
[149] M. Sevecka, A. Wolf-Yadlin, and G. MacBeath. Lysate microarrays enable
high-throughput, quantitative investigations of cellular signaling. Molecular &
Cellular Proteomics, 10(4), 2011.
[150] O.E. Sturm, R. Orton, J. Grindlay, M. Birtwistle, V. Vyshemirsky, D. Gilbert,
M. Calder, A. Pitt, B. Kholodenko, and W. Kolch. The mammalian
MAPK/ERK pathway exhibits properties of a negative feedback amplifier. Science Signaling, 3(153):ra90, 2010.
[151] Q. Wang, Y. Zhou, X. Wang, and B.M. Evers. Glycogen synthase kinase-3 is a
negative regulator of extracellular signal-regulated kinase. Oncogene, 25(1):43-50, 2005.
[152] J.E. Ferrell Jr. What do scaffold proteins really do? Science Signaling,
2000(52):pe1, 2000.
[153] Y. Kim, Z. Paroush, K. Nairz, E. Hafen, G. Jimenez, and S.Y. Shvartsman.
Substrate-dependent control of MAPK phosphorylation in vivo. Molecular Systems Biology, 7(1), 2011.
[154] M.L. Wynn, A.C. Ventura, J.A. Sepulchre, H.J. Garcia, and S.D. Merajver.
Kinase inhibitors can produce off-target effects and activate linked pathways
by retroactivity. BMC Systems Biology, 5(1):156, 2011.
[155] D. Marbach, J.C. Costello, R. Küffner, N.M. Vega, R.J. Prill, D.M. Camacho,
K.R. Allison, M. Kellis, J.J. Collins, G. Stolovitzky, et al. Wisdom of crowds
for robust gene network inference. Nature Methods, 2012.
[156] A.J. Butte and I.S. Kohane. Mutual information relevance networks: functional
genomic clustering using pairwise entropy measurements. In Pacific Symposium
on Biocomputing, volume 5, pages 418-429, 2000.
[157] G.A.F. Seber. Multivariate Observations. John Wiley and Sons, 1984.
[158] J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A.A. Margolin, S. Kim,
C.J. Wilson, J. Lehár, G.V. Kryukov, D. Sonkin, et al. The Cancer Cell Line
Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483(7391):603-607, 2012.
[159] R.C. Harris, E. Chung, and R.J. Coffey. EGF receptor ligands. Experimental
Cell Research, 284(1):2-13, 2003.
[160] X. Zhang, O.A. Ibrahimi, S.K. Olsen, H. Umemori, M. Mohammadi, and D.M.
Ornitz. Receptor specificity of the fibroblast growth factor family. Journal of
Biological Chemistry, 281(23):15694-15700, 2006.
[161] S.P. Squinto, T.N. Stitt, T.H. Aldrich, S. Davis, SM Bianco, C. Radziejewski,
D.J. Glass, P. Masiakowski, M.E. Furth, D.M. Valenzuela, et al. trkB encodes
a functional receptor for brain-derived neurotrophic factor and neurotrophin-3
but not nerve growth factor. Cell, 65(5):885, 1991.
[162] J. Andrae, R. Gallini, and C. Betsholtz. Role of platelet-derived growth factors
in physiology and medicine. Genes & Development, 22(10):1276-1312, 2008.
[163] R. Straussman, T. Morikawa, K. Shee, M. Barzily-Rokni, Z.R. Qian, J. Du,
A. Davis, M.M. Mongare, J. Gould, D.T. Frederick, et al. Tumour microenvironment elicits innate resistance to RAF inhibitors through HGF secretion.
Nature, 487(7408):500-504, 2012.
[164] A.K. Mitra, K. Sawada, P. Tiwari, K. Mui, K. Gwin, and E. Lengyel. Ligand-independent activation of c-Met by fibronectin and α5β1-integrin regulates
ovarian cancer invasion and metastasis. Oncogene, 30(13):1566-1576, 2011.
[165] M.I. Davis, J.P. Hunt, S. Herrgard, P. Ciceri, L.M. Wodicka, G. Pallares,
M. Hocker, D.K. Treiber, and P.P. Zarrinkar. Comprehensive analysis of kinase inhibitor selectivity. Nature Biotechnology, 29(11):1046-1051, 2011.
[166] S. Corso, E. Ghiso, V. Cepero, J.R. Sierra, C. Migliore, A. Bertotti, L. Trusolino,
P.M. Comoglio, and S. Giordano. Activation of HER family members in gastric
carcinoma cells mediates resistance to MET inhibition. Molecular Cancer, 9,
2010.
[167] M.E. Marshall, T.K. Hinz, S.A. Kono, K.R. Singleton, B. Bichon, K.E. Ware,
L. Marek, B.A. Frederick, D. Raben, and L.E. Heasley. Fibroblast growth
factor receptors are components of autocrine signaling networks in head and
neck squamous cell carcinoma cells. Clinical Cancer Research, 17(15):5016-5025, 2011.
[168] H. Fischer, N. Taylor, S. Allerstorfer, M. Grusch, G. Sonvilla, K. Holzmann,
U. Setinek, L. Elbling, H. Cantonati, B. Grasl-Kraupp, et al. Fibroblast
growth factor receptor-mediated signals contribute to the malignant phenotype
of non-small cell lung cancer cells: therapeutic implications and synergism with
epidermal growth factor receptor inhibition. Molecular Cancer Therapeutics,
7(10):3408-3419, 2008.
[169] L. Marek, K.E. Ware, A. Fritzsche, P. Hercule, W.R. Helton, J.E. Smith, L.A.
McDermott, C.D. Coldren, R.A. Nemenoff, D.T. Merrick, et al. Fibroblast
growth factor (FGF) and FGF receptor-mediated autocrine signaling in non-small-cell lung cancer cells. Molecular Pharmacology, 75(1):196-207, 2009.
[170] K.E. Ware, M.E. Marshall, L.R. Heasley, L. Marek, T.K. Hinz, P. Hercule, B.A.
Helfrich, R.C. Doebele, and L.E. Heasley. Rapidly acquired resistance to EGFR
tyrosine kinase inhibitors in NSCLC cell lines through de-repression of FGFR2
and FGFR3 expression. PLoS One, 5(11):e14117, 2010.
[171] M. Guix, A.C. Faber, S.E. Wang, M.G. Olivares, Y. Song, S. Qu, C. Rinehart,
B. Seidel, D. Yee, C.L. Arteaga, et al. Acquired resistance to EGFR tyrosine
kinase inhibitors in cancer cells is mediated by loss of IGF-binding proteins.
Journal of Clinical Investigation, 118(7):2609, 2008.
[172] F. Huang, A. Greer, W. Hurlburt, X. Han, R. Hafezi, G.M. Wittenberg,
K. Reeves, J. Chen, D. Robinson, A. Li, et al. The mechanisms of differential
sensitivity to an insulin-like growth factor-1 receptor inhibitor (BMS-536924)
and rationale for combining with EGFR/HER2 inhibitors. Cancer Research,
69(1):161-170, 2009.
[173] C. Garofalo, MC Manara, G. Nicoletti, MT Marino, PL Lollini, A. Astolfi,
G. Pandini, JA Lopez-Guerrero, KL Schaefer, A. Belfiore, et al. Efficacy of and
resistance to anti-IGF-IR therapies in Ewing's sarcoma is dependent on insulin
receptor signaling. Oncogene, 30(24):2730-2740, 2011.
[174] D. Milojkovic and J. Apperley. Mechanisms of resistance to imatinib and
second-generation tyrosine inhibitors in chronic myeloid leukemia. Clinical Cancer Research, 15(24):7519-7527, 2009.
[175] K. Inaki and E.T. Liu. Structural mutations in cancer: mechanistic and functional insights. Trends in Genetics, 28(11):550-559, 2012.
[176] R. Beroukhim, G. Getz, L. Nghiemphu, J. Barretina, T. Hsueh, D. Linhart,
I. Vivanco, J.C. Lee, J.H. Huang, S. Alexander, et al. Assessing the significance
of chromosomal aberrations in cancer: methodology and application to glioma.
Proceedings of the National Academy of Sciences, 104(50):20007-20012, 2007.
[177] A. Hellman, E. Zlotorynski, S.W. Scherer, J. Cheung, J.B. Vincent, D.I. Smith,
L. Trakhtenbrot, and B. Kerem. A role for common fragile site induction in
amplification of human oncogenes. Cancer Cell, 1(1):89-97, 2002.
[178] H. Riedel, TJ Dull, AM Honegger, J. Schlessinger, and A. Ullrich. Cytoplasmic domains determine signal specificity, cellular routing characteristics and
influence ligand binding of epidermal growth factor and insulin receptors. The
EMBO Journal, 8(10):2943, 1989.
[179] A.P. Won, J.E. Garbarino, and W.A. Lim. Recruitment interactions can override catalytic interactions in determining the functional identity of a protein
kinase. Proceedings of the National Academy of Sciences, 108(24):9809-9814,
2011.
[180] L. Naldini, U. Blomer, P. Gallay, D. Ory, R. Mulligan, FH Gage, IM Verma,
and D. Trono. In-vivo gene delivery and stable transduction of nondividing cells
by a lentiviral vector. Science, 272(5259):263-267, 1996.
[181] S.M. Chan, J. Ermann, L. Su, C.G. Fathman, and P.J. Utz. Protein microarrays for multiplex analysis of signal transduction pathways. Nature Medicine,
10(12):1390-1396, 2004.
[182] D. Eaton and K. Murphy. Bayesian structure learning using dynamic programming and MCMC. In Proceedings of the 23rd Conference on Uncertainty in
Artificial Intelligence, 2007.
[183] A.V. Kozlov and D. Koller. Nonuniform dynamic discretization in hybrid networks. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial
Intelligence, pages 314-325. Morgan Kaufmann Publishers Inc., 1997.
[184] D.N. Reshef, Y.A. Reshef, H.K. Finucane, S.R. Grossman, G. McVean, P.J.
Turnbaugh, E.S. Lander, M. Mitzenmacher, and P.C. Sabeti. Detecting novel
associations in large data sets. Science, 334(6062):1518-1524, 2011.
[185] J. Yu, V.A. Smith, P.P. Wang, A.J. Hartemink, and E.D. Jarvis. Advances to
Bayesian network inference for generating causal networks from observational
biological data. Bioinformatics, 20(18):3594-3603, 2004.
[186] K. Sachs. Bayesian network models of biological signaling pathways. PhD thesis,
Massachusetts Institute of Technology, 2006.
[187] K. Sachs, S. Itani, J. Fitzgerald, L. Wille, B. Schoeberl, MA Dahleh, and
GP Nolan. Learning cyclic signaling pathway structures while minimizing data
requirements. In Pacific Symposium on Biocomputing, pages 63-74, 2009.
[188] P.J. Woolf, W. Prudhomme, L. Daheron, G.Q. Daley, and D.A. Lauffenburger.
Bayesian analysis of signaling networks governing embryonic stem cell fate decisions. Bioinformatics, 21(6):741-753, 2005.
[189] K. Basso, A.A. Margolin, G. Stolovitzky, U. Klein, R. Dalla-Favera, and A. Califano. Reverse engineering of regulatory networks in human B cells. Nature
Genetics, 37(4):382-390, 2005.
[190] C. Olsen, P.E. Meyer, and G. Bontempi. On the impact of entropy estimation
on transcriptional regulatory network inference based on mutual information.
EURASIP Journal on Bioinformatics and Systems Biology, 2009(1):308959,
2009.
[191] Y. Yang and G. Webb. On why discretization works for naive-Bayes classifiers.
AI 2003: Advances in Artificial Intelligence, pages 440-452, 2003.
[192] T. Silander, P. Kontkanen, and P. Myllymaki. On sensitivity of the MAP
Bayesian network structure to the equivalent sample size parameter. arXiv
preprint arXiv:1206.5293, 2012.
[193] A.J. Hartemink. Principled computational methods for the validation and discovery
of genetic regulatory networks. PhD thesis, Massachusetts Institute of Technology, 2001.
[194] R.H. Blair, D.J. Kliebenstein, and G.A. Churchill. What can causal networks
tell us about metabolic pathways? PLoS Computational Biology, 8(4):e1002458,
2012.
[195] A.J. Hartemink. Reverse engineering gene regulatory networks. Nature Biotechnology, 23(5):554-555, 2005.
[196] R. Kalluri and E.G. Neilson. Epithelial-mesenchymal transition and its implications for fibrosis. Journal of Clinical Investigation, 112(12):1776-1784, 2003.
[197] K.S. Lau, V. Cortez-Retamozo, S.R. Philips, M.J. Pittet, D.A. Lauffenburger,
and K.M. Haigis. Multi-scale in vivo systems analysis reveals the influence of
immune cells on TNF-α-induced apoptosis in the intestinal epithelium. PLoS
Biology, 10(9):e1001393, 2012.
[198] G. Elidan and N. Friedman. Learning hidden variable networks: the information
bottleneck approach. Journal of Machine Learning Research, 6(1):81, 2006.
[199] B. D'Ambrosio. Inference in Bayesian networks. AI Magazine, 20(2):21, 1999.
[200] S. Mukherjee and T.P. Speed. Network inference using informative priors. Proceedings of the National Academy of Sciences, 105(38):14313-14318, 2008.
[201] J.W. Robinson and A.J. Hartemink. Learning non-stationary dynamic Bayesian
networks. Journal of Machine Learning Research, 11:3647-3680, 2010.