BRAIN RELATED ISB RESEARCH
Aitchison lab (ALS, and Jennifer Smith/David Dilworth, peroxisomes in neuropathologies)
Project 1: Development of a cellular response assay for blood biomarker discovery and detection of amyotrophic
lateral sclerosis
Discovery and detection of blood biomarkers hold promise to revolutionize medicine. However, hurdles such as
low abundance of disease markers, high levels of noise, and potential complexity of disease signatures have
impeded the development of reliable biomarker assays. To overcome many of these barriers, we are developing
an in vitro assay that uses cultured responder cells to translate the multi-component variations in the composition
of blood into a complex response that is reproducible and predictive of disease state. This assay has strong
potential to revolutionize biomarker discovery and early detection of an extensive range of diseases, and can
help enable the practice of predictive and personalized medicine on a broad scale. The assay does not involve
direct identification of molecules in blood. Instead, it involves incubating cultured cells with serum from diseased
or normal subjects, capturing their multi-component response profiles, and identifying a differential response
pattern that can serve as a disease-specific response signature using computational classification tools.
We are establishing the principle of the approach by developing an assay for detection of incipient amyotrophic
lateral sclerosis (ALS) from serum. We have developed the assay to detect incipient ALS in mouse serum from
pre-symptomatic mice of a SOD1 mouse model (using non-carriers from the same colony as controls). For this
application, motor neurons (in embryoid bodies differentiated from stem cells) were used as responder cells (Fig. 1)
and transcriptional cellular responses were measured using Affymetrix exon arrays. Analysis of combined data from
20 different experiments, using disease classification tools with cross validation, identified a disease response
signature composed of 10 genes that appear to be differentially spliced in detector cells in response to
disease-incipient versus normal serum. In addition, although the assay was developed as a biomarker detection
platform, the disease signature can also provide insights into ALS disease mechanisms, as it reflects motor neuron
responses to extrinsic signals of incipient ALS. Therefore, we will also leverage the disease response profile to
gain insight into cellular mechanisms of the ALS disease process.
Fig. 1. EB expressing motor neuron marker (HB9-eGFP). Bar = 100 µm.
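The classification step described above (deriving a response signature and checking it with cross validation) can be illustrated with a minimal sketch. It is not the assay's actual pipeline: the nearest-centroid classifier, the leave-one-out scheme, the function names, and the toy profiles are all assumptions chosen for brevity.

```python
from math import dist

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_predict(train_x, train_y, sample):
    """Assign sample to the class whose centroid is closest (Euclidean)."""
    centroids = {
        label: centroid([x for x, y in zip(train_x, train_y) if y == label])
        for label in set(train_y)
    }
    return min(centroids, key=lambda label: dist(centroids[label], sample))

def leave_one_out_accuracy(x, y):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)

# Toy response profiles (hypothetical values): rows are experiments,
# columns are features such as exon-level expression scores.
profiles = [[1.0, 0.2], [0.9, 0.1], [1.1, 0.3], [0.2, 1.0], [0.1, 0.9], [0.3, 1.1]]
labels = ["disease", "disease", "disease", "normal", "normal", "normal"]
accuracy = leave_one_out_accuracy(profiles, labels)
```

Holding out each experiment in turn keeps the accuracy estimate from being inflated by samples the classifier has already seen, which matters when only a few dozen experiments are available.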
Ranish lab/Hood Lab (glioblastoma-proteomics of brain cancer)
Wei Yan
I have been working on proteomics analysis of glioblastoma as part of the NSBCC grant in the Hood lab (I work 50% each
in the Ranish and Hood labs). So far I do not have any publications on this project. What I can provide
is the annual NSBCC progress report for 2012, which I wrote in December 2012. This is the most recent
report I have on my brain-related project; please see the attached progress report.
Please check whether it is useful for you, and feel free to let me know if there is anything else that I can do for
you.
To investigate changes in the levels of a set of GBM-specific proteins in a patient-derived GBM stem cell line (SN143)
during differentiation, and whether hypoxia affects protein levels during such differentiation, we
applied a targeted quantitative proteomics approach developed in house to quantify proteins/peptides from
this cell line grown in proliferation culture media (with EGF and FGF-2), differentiation media (without EGF and
FGF-2), and media with serum, under both hypoxia (7% O2) and normoxia (21% O2) (Fig. 1).
Fig. 1: GBM stem cell line from a patient (SN143) grown at proliferation (P) and differentiation (D), under
normoxia (N) and hypoxia (H); 119 proteins of interest (160 peptides) from genomics studies by TCGA & Hopkins and
the literature; heavy (H) and light (L) labels; iMSTIQ.
A total of 160 peptides (corresponding to 119 proteins of interest) were synthesized with heavy isotopic Lys or Arg
amino acids. Peptides were then labeled with heavy and light mTRAQ reagents to generate HH and LH peptides,
which were used as triggering peptides for CID analysis and as reference peptides for quantification, respectively
(Fig. 1). We successfully quantified 118 proteins (from 158 peptides) in at least 4 of 6 measurements. Data analysis
using PCA indicates different expression patterns for these proteins under all 6 analyzed states (Fig. 2).
MAP2K2 and IDH1 show increased levels in differentiation and serum media relative to the proliferation state, but
this increase occurred only under normoxia, not hypoxia. GAS1, on the other hand, shows an increase under hypoxia
but not normoxia. Interestingly, PTEN and MET, two proteins involved in GBM-related signaling pathways, appear to
be decreased in differentiation and serum media in comparison with proliferation media, and this decrease occurs
only under hypoxia.
Fig. 2: PCA of protein levels (axes PC1 and PC2). Labels: Proliferation (P); Normoxia; N: D/P; H: D/P.
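The completeness criterion mentioned above (a protein counts as quantified when it has values in at least 4 of the 6 measurements) can be sketched as follows. The function name, the use of None for a missing value, and the example numbers are assumptions made for illustration, not part of the iMSTIQ workflow.

```python
def filter_by_completeness(measurements, min_valid=4):
    """Keep proteins with at least min_valid non-missing values."""
    return {
        protein: values
        for protein, values in measurements.items()
        if sum(v is not None for v in values) >= min_valid
    }

# Hypothetical abundance ratios across the six states; None marks a missing value.
measurements = {
    "PROTEIN_A": [1.2, 0.8, None, 1.1, 0.9, 1.0],     # 5 of 6 quantified
    "PROTEIN_B": [None, None, None, 0.7, 1.3, None],  # 2 of 6 quantified
    "PROTEIN_C": [1.0, None, None, 1.4, 0.6, 1.1],    # 4 of 6 quantified
}
kept = filter_by_completeness(measurements)
```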
Further data analysis suggested that under normoxia (21% O2), many proteins were down-regulated
when cells were differentiated from the proliferation state (with GF) into either the noGF or Serum state (Fig. 3,
shown in green). Some proteins were up-regulated when cells were differentiated (shown in red). On the
other hand, under hypoxia (7% O2), most proteins were up-regulated (shown in red). Therefore, the
GBM stem cell line (SN143) regulated the levels of its GBM signaling proteins during differentiation in significantly
different ways under hypoxia and normoxia. Oxygen level appears to significantly affect protein expression in
this cell line. We need to be very careful when interpreting data from cultured cells, which are normally cultured
at 21% oxygen.
Fig. 3
Since the O2 level in the human brain is about 7%, I further examined proteins that were quantified under this
condition. 30 proteins in the glioma pathway (according to KEGG) were successfully quantified during
differentiation (Fig. 4). Proteins with more than 2-fold up-regulation are shown in red, while proteins that
were down-regulated more than 2-fold are shown in green. These include many signaling molecules such as PDGF,
PDGFR, IGF-1, PKC, PI3K, PTEN, Raf, MEK, etc. Further assays with more biological and technical replicates are
needed for validation. However, as a proof of principle, this pilot experiment indicates that we can dissect key
signaling pathways in brain cancer stem cells using targeted proteomics methods such as the iMSTIQ methodology
we developed in house.
Fig. 4
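The 2-fold rule used for the red/green coloring can be written as a short sketch. The function name, the use of log2 ratios, and the example abundances are assumptions for illustration; they are not measured values from this experiment.

```python
from math import log2

def classify_fold_change(differentiated, proliferating, threshold=2.0):
    """Label a protein 'up', 'down', or 'unchanged' from its D/P ratio."""
    log_ratio = log2(differentiated / proliferating)
    cutoff = log2(threshold)
    if log_ratio > cutoff:
        return "up"      # more than threshold-fold increase (red in Fig. 4)
    if log_ratio < -cutoff:
        return "down"    # more than threshold-fold decrease (green in Fig. 4)
    return "unchanged"

# Hypothetical abundances (arbitrary units) as (differentiated, proliferating).
examples = {"P1": (8.0, 2.0), "P2": (1.0, 4.0), "P3": (3.0, 2.0)}
calls = {name: classify_fold_change(d, p) for name, (d, p) in examples.items()}
```

Working on the log scale makes the cutoff symmetric, so a 2-fold increase and a 2-fold decrease sit at the same distance from zero.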
Price lab (glioblastoma)
Julie Bletz
Molecular signatures for brain regions
Spatial expression patterns of neuron-specific genes in the adult mouse brain show remarkably clear, spatially
contiguous, transcriptionally distinct clusters. Over the past year, we have been quantifying the relationships
between these spatial expression patterns and known brain regions. To do this, we are taking advantage of a newly
released API from the Allen Institute that makes available for the first time three-dimensional grids for the
expression of 20,000 genes in 200 µm^3 voxels of the adult mouse brain, registered to a common reference atlas with
annotations to fine-scale brain structures. We downloaded the grid-level expression data, performed clustering of
the new dataset, and are developing algorithms to quantify clustering relative to brain regions, for neuron-specific
genes and for thousands of randomly generated gene sets. We have found that the combined spatial expression
patterns of 170 neuron-specific transcripts revealed strikingly clear and symmetrical signatures for most of the
brain’s major subdivisions in the mouse brain. Moreover, the brain expression spatial signatures correspond to
anatomical structures, possibly reflecting developmental ontogeny. Spatial expression profiles of astrocyte- and
oligodendrocyte-specific genes also revealed regional differences that are less distinct, but still symmetrical. The
regional differences revealed by neuron-specific genes related to individual genes with highly restricted expression
patterns, functionally-related groups of genes with enriched or depleted expression across brain regions, and
regional differences in neuronal cell density. Products from some of these neuron-specific genes are present in
peripheral blood, raising the possibility that they could collectively function as biomarkers for clinical disease
diagnostics.
Figure: Transcriptionally distinct, spatially contiguous regions in the adult mouse brain revealed by clustering of
neuron-, astrocyte-, and oligodendrocyte-specific genes. Expression data were assembled for sections through the
center of the mouse brain. A, B. Atlases for regions visible on the coronal (A) and sagittal (B) planes. C. k-means
clusters derived from the expression of neuron-, astrocyte-, or oligodendrocyte-specific genes on the coronal plane
(k = 3-20). D. k-means clustering based on the expression of these genes on the sagittal plane. E, F. Clustering of
neuron-specific genes for larger numbers of clusters (k = 30-60), using expression on the coronal (E) and sagittal
(F) planes, respectively.
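The k-means clustering of voxels by expression profile can be sketched in a few lines. This is a plain implementation of Lloyd's algorithm with a deterministic start, written only for illustration; the function name, the initialization from the first k points, and the toy voxels are assumptions and do not reproduce the analysis behind the figure.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; centers start at the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each voxel joins its nearest center.
        assignment = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

# Hypothetical voxels, each described by the expression of three genes.
voxels = [
    [5.0, 0.1, 0.2], [0.2, 4.8, 0.1], [4.9, 0.2, 0.1],
    [0.1, 5.1, 0.3], [5.2, 0.0, 0.2], [0.3, 5.0, 0.2],
]
clusters, centers = kmeans(voxels, k=2)
```

Mapping each voxel's cluster label back to its grid coordinates is what would allow the clusters to be compared with annotated brain regions.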
Publication:
Younhee Ko, Seth A. Ament, James A. Eddy, John C. Earls, Juan Caballero, Leroy Hood, and Nathan D. Price, “Cell type-specific genes show striking and distinct patterns of spatial expression in the mouse brain” (2013), PNAS
Elucidating Gene Networks in Bipolar Disorder
The Hood and Price Labs are investigating gene networks underlying bipolar disorder (BPD) in collaboration with
researchers at UCSD and NIMH. We have completed genome sequencing for 205 individuals from 43 pedigrees,
and analysis of family-specific variants for the most recently sequenced genomes is ongoing. Family-specific lists of
candidate variants for the previously sequenced genomes have been completed. Gene-level and network-level
meta-analyses of these data revealed commonly mutated networks of genes with synaptic functions, including
both previously observed and novel candidate genes. We are currently finalizing a list of the top 10 candidate
genes from these meta-analyses for re-sequencing in 3400 cases and 1000 controls that have been collected by
our collaborator, John Kelsoe (UCSD). We will focus on calcium signaling genes, a top-scoring network in our meta-analysis, and our goal is to achieve a network-level validation for mutations influencing bipolar disorder. In parallel,
we will knock down these same 10 genes in the SH-SY5Y neuroblastoma cell line, and monitor both functional
effects on Ca++ flux and transcriptome responses. We will also continue to develop Ca++ flux assays in SH-SY5Y,
including protocols to differentiate intracellular vs. extracellular Ca++ and baseline measurements for
responsiveness to neurotransmitter receptor agonists. We are pursuing both experimental and computational
approaches to understand how these genes contribute to BPD susceptibility. We are encouraged by these
preliminary results and are optimistic that they will continue to reveal novel insights into BPD gene networks.
Reconstruction of genome-scale brain region-specific metabolic models
Brain regions differ in their susceptibility to different neurodegenerative diseases. On the other hand, metabolic
dysfunction has been proposed as a common feature of various neurodegenerative diseases. The aim of this
project is to understand the metabolic characteristics of different brain regions using genome-scale metabolic
networks. We used human gene expression microarray data from the Allen Brain Atlas as input to the mCADRE
method, and built 94 brain region-specific metabolic models. In addition, we also used cerebrospinal fluid (CSF)
metabolomics data from the Human Metabolome Database, and ensured that all the 94 brain region-specific
metabolic models can take up or secrete metabolites known to be present in the CSF. Many regions involved in
neurodegenerative diseases are included in this collection of brain region-specific metabolic models: substantia
nigra (Parkinson’s), neostriatum (Huntington’s), and multiple subfields of the hippocampus (Alzheimer’s). These
region-specific metabolic models provide the framework for future integration with transcriptional regulatory
network and perturbation data (e.g., genetic mutations).
A systems approach to understanding glioblastoma:
Developing tools to study the effect of RNA editing on tumor development in glioblastoma. Expression of ADAR
has been shown to decrease in glioblastoma relative to healthy tissue. We are trying to understand the role of
ADAR-mediated RNA editing in the development of a tumor. We are interested in identifying instances where RNA
editing by enzymes in either the ADAR or APOBEC protein family may affect miRNA binding. As others have shown,
RNA editing occurs primarily in intronic and 3’UTR regions where Alu sequences combine to form double-stranded
RNA. We are interested in finding instances where this editing may alter the existing seed region to which miRNAs
bind, or create a novel seed-binding region. To this end, we have generated RNAseq data in which all mature
miRNAs are greatly reduced through knockdown of Dicer. With decreased total miRNAs, we are then able to
identify instances where the relative amount of RNA edits changes with Dicer knockdown relative to control.
In addition, we have developed an RNA-seq splice junction mapper using a new short read aligner called SNAP
(Scalable Nucleotide Alignment Program) developed through a collaboration between Berkeley and Microsoft. SNAP
makes use of a genome-wide hash table to speed up short read alignments to the genome, but does not by itself
handle splice junction mapping. Because a hash table is used, each seed lookup takes O(1) expected time;
therefore multiple ‘seeds’ can rapidly be searched for each read, improving alignment accuracy. We have also
developed a paired-end read filter for RNA-seq reads, reducing the error rate of read mapping to effectively zero
when tested on artificially generated reads from the human genome. Applying this read mapping algorithm to
RNA-seq reads derived from the transcriptome of the U87MG glioblastoma cell line, we have identified thousands
of RNA-DNA differences, most of which can be attributed to RNA editing catalyzed by the ADAR family of enzymes.
Over 80% of the edits identified are of the A-to-G type, which is indicative of ADAR editing.
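The idea of a genome-wide hash table queried with several seeds per read can be shown with a toy sketch. It is not SNAP's implementation or API: the function names, the seed length, the voting scheme, and the sequences are all assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict

def build_seed_index(genome, seed_len):
    """Map every seed_len-mer in the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - seed_len + 1):
        index[genome[pos:pos + seed_len]].append(pos)
    return index

def candidate_positions(read, index, seed_len):
    """Vote for read start positions using every seed in the read.

    Each dictionary lookup is O(1) on average, so trying many seeds per
    read stays cheap; the most-voted positions are candidates for a full
    alignment check (not shown).
    """
    votes = Counter()
    for offset in range(len(read) - seed_len + 1):
        for hit in index.get(read[offset:offset + seed_len], []):
            votes[hit - offset] += 1
    return votes.most_common()

# Toy reference and read (hypothetical sequences); the read has one mismatch.
genome = "ACGTTGCAAGGCTTACGATCGGATCCTA"
read = "GGCTTACGTTCG"  # genome[9:21] is GGCTTACGATCG
index = build_seed_index(genome, seed_len=4)
best_start, best_votes = candidate_positions(read, index, seed_len=4)[0]
```

Because seeds that overlap the mismatch simply fail to hit, the remaining seeds still agree on the correct start position, which is why searching several seeds per read helps accuracy.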
Understanding transcriptional and translational regulation in glioblastoma. In an effort to understand the role of
EGF signaling in glioblastoma, we have been working to establish an experimental approach to quantitatively
measure the translation efficiency of all coding mRNAs. Known as polysome profiling, this approach begins with
sucrose density gradient centrifugation to separate the 40S and 60S subunits, 80S monosomes, and mRNA transcripts
carrying two or more bound ribosomes (polysomes). Through this method, we will be able to create a highly detailed and comprehensive picture of
EGF signaling that follows the transcription and translation of essentially all EGF-mediated target genes within
glioblastoma tumor cells. This systems biology approach will provide a comprehensive understanding of the layers
of molecular variation and interactions underlying cancer. Specifically, we will gain insight into how genetic
differences between patients translate into different disease outcomes, which will help in the development of
better diagnostic tools and treatment plans for cancer.
Proteomic and phosphoproteomic analyses of glioblastoma. We are performing proteomic and
phosphoproteomic analyses of GBM by measuring 70 proteins involved in the proliferation and invasion of GBM in
six cell lines. The proteomic analysis of a GBM stem cell line is particularly interesting because this cell line has six
transcriptionally profiled clones, and protein measurements in each of these clone populations will provide
comparative proteomic profiles of the clones. We are utilizing quantitative mass spectrometry to measure protein
levels. Proteomic profiles of the cell lines will be compared through statistical analyses and the main drivers
contributing to proliferation and invasion will be identified. In addition, we are performing a phosphoproteomic
and metabolomic analysis of these cell lines to gain a comprehensive view of perturbations in glioblastoma.
Integrating and interpreting omics data with the U87MG metabolic network. GBM was one of the first cancers
studied by The Cancer Genome Atlas (TCGA), and remains a major focus of the program. TCGA has collected data
for several hundred GBM tumors, including sequence, copy number, methylation, and expression data, all of
which are publicly available (TCGA 2008). In addition to these data, which will primarily be used to investigate the
effects of mutations on GBM metabolism, our lab is also performing RNA-seq and metabolomics experiments on
several U87 GBM cell lines with known mutations; while this cell line data cannot be used to study population
distributions of mutations, it can provide insights into the effects of specific perturbations.
Integrating and analyzing these different omics data from TCGA and our own experiments in the context of the
genome-scale metabolic network will facilitate functional interpretation and elucidation of disease mechanism. For
example, the distribution of mutations in GBM has been shown to differ from patient to patient. By examining the
frequency of mutations within different biological networks, such as metabolic pathways in the genome-scale
model, we may be able to identify functional sets of genes that are consistently altered in the cancer. We will map
compiled mutational data onto corresponding enzymes in the GBM metabolic network, and apply statistical tests
for enrichment or over-representation to identify key subnetworks that tend to be perturbed in GBM. Similar
approaches will be applied to transcriptomics and metabolomics measurements to better understand how
patterns observed in high-throughput data relate to changes in network function within the cell.
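One common choice for the enrichment test mentioned above is a one-sided hypergeometric test, sketched here. The report does not say which statistical test will be used, so the choice of test, the function name, and all counts below are assumptions.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, mutated_genes, pathway_size, overlap):
    """P(X >= overlap) when pathway_size genes are drawn from total_genes,
    of which mutated_genes are mutated (one-sided hypergeometric test)."""
    upper = min(pathway_size, mutated_genes)
    tail = sum(
        comb(mutated_genes, k) * comb(total_genes - mutated_genes, pathway_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, pathway_size)

# Hypothetical counts: 1395 network genes, 100 of them mutated,
# and a 20-gene pathway that contains 6 of the mutated genes.
p_value = hypergeom_enrichment_p(1395, 100, 20, 6)
```

In practice the test would be repeated for every pathway in the network, so the resulting p-values would need a multiple-testing correction before any subnetwork is called enriched.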
Reconstruction of a draft genome-scale metabolic network for glioblastoma cell line U87MG. Building upon our
previous experience developing genome-scale metabolic network reconstructions and models through both
manual (Milne, Eddy et al. 2011; Benedict, Gonnerman et al. 2012; Heavner, Smallbone et al. 2012) and
algorithmic (Wang et al., 2012) approaches, we have constructed a draft metabolic model of the U87MG
glioblastoma cell line (Eddy et al., in review). This model, comprising 2134 reactions, 1527 metabolites, and 1395
genes, was derived from a previously published reconstruction of the human metabolic network (Duarte, Becker et
al. 2007) using an algorithm we developed called metabolic Context-specificity Assessed by Deterministic Reaction
Evaluation (mCADRE) (Wang et al., 2012). In short, the mCADRE algorithm reduces an organism-wide metabolic
model to a tissue- or cell-specific model using expression evidence. We generated the U87MG-specific metabolic
reconstruction from the human metabolic network using gene expression data from the NCBI GEO Database and
protein expression data from the Human Protein Atlas (Uhlen, Oksvold et al. 2010).
Evaluating a metabolic network reconstruction with flux balance analysis requires additional model refinement,
including the definition of an objective function to be optimized. We defined a U87MG-specific biomass equation by
modifying a generic cancer biomass equation developed by Shlomi et al. in their investigation of the Warburg
Effect (Shlomi, Benyamini et al. 2011) with literature values for in vivo composition of major macromolecule
groups in U87MG cells.
Figure: Overview of reconstruction of a draft genome-scale metabolic network for glioblastoma.
Publications: Yuliang Wang, James A. Eddy, Nathan D. Price, “Reconstruction of genome-scale metabolic models
for 126 tissues using mCADRE”, 2012, BMC Systems Biology.
U87MG cell growth simulation with flux balance analysis. Constraint-based analysis (Price, Reed et al. 2004)
restricts the functional states of a network according to physico-chemical laws, known stoichiometric relationships,
and other constraints. To assess the functionality of our draft U87MG metabolic network reconstruction, we used
a previously published implementation of the flux balance analysis algorithm (Becker, Feist et al. 2007) to simulate
cell growth in a variety of conditions—genetic and environmental. With our U87MG-specific biomass definition, we
used maximal growth as the objective function (this was assumed to be a reasonable objective for our model, as
this is a characteristic phenotype of most cancers (Hsu and Sabatini 2008; Kroemer and Pouyssegur 2008; Cairns,
Harris et al. 2011)). Experimentally measured values—collected as part of ongoing model validation and
refinement—serve as criteria for evaluating accuracy and highlight discrepancies in behavior including (i) growth
rates; (ii) gene knock-out effects, including altered growth rates and lethality; (iii) secretion rates of selected byproducts; and (iv) differential growth behavior on varied media. From these preliminary efforts, we are confident
in our ability to simulate growth of the U87MG cell line and find feasible metabolic fluxes using well-established
constraint-based approaches.
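For reference, flux balance analysis with growth as the objective is usually stated as a linear program. The formulation below is the standard textbook form; the symbols (S for the stoichiometric matrix, v for the vector of reaction fluxes, and the lower and upper flux bounds) are notation introduced here rather than taken from this report.

```latex
\begin{aligned}
\max_{v}\quad & v_{\text{biomass}} \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j.
\end{aligned}
```

The equality constraint expresses steady-state mass balance for every metabolite. Gene knock-outs and media changes enter through the bounds, for example by fixing the bounds of an affected reaction or uptake flux to zero.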
Moritz Lab (glioblastoma/biomarkers)
Ulrike Kusebauch
The overall goal of our collaborative research on brain cancer in the Moritz group is to characterize quantized
human glioblastoma cancer cells from excised tumors (discrete populations of cancer cells identified by single-cell
analyses) through complete genomic analyses (generally next-generation DNA sequencing) and our comprehensive
development of targeted proteomic analyses. This research in the Moritz group encompasses the application and
dissemination of a new, targeted mass spectrometry-based technology that supports the identification of all
proteins in the human proteome, with particular emphasis on the brain cancer genome. Through our
collaboration with the Hood group, we will characterize the complete genome sequences of the discrete quantized
cell populations (using as few as 10 cells) as well as the complete genome sequences from the families of several
patients to reduce the DNA sequencing error rates by 70% so as to more easily identify key cancer mutations. We
will conduct single-cell transcript analysis on as many as 500 selected transcripts and develop a list of high-value
target protein signatures that can distinguish several types of glioblastoma tumors through a blood-based
protein assay. We will use our highly sensitive and highly specific selected reaction monitoring (SRM) assay system,
built from over 170,000 separate proteotypic peptides, to detect essentially any human protein. Our SRM assay
system is unique to ISB, and our collaborative research in glioblastoma and the tumor biology of quantized tumor
cells will provide a biomarker panel to develop into an easy-to-use personalized glioblastoma monitoring system.
Publications:
A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis.
Picotti P, Clément-Ziza M, Lam H, Campbell DS, Schmidt A, Deutsch EW, Röst H, Sun Z, Rinner O, Reiter L, Shen Q,
Michaelson JJ, Frei A, Alberti S, Kusebauch U, Wollscheid B, Moritz RL, Beyer A, Aebersold R.
Nature. 2013 Jan 20. doi: 10.1038/nature11835
Reproducible quantification of cancer-associated proteins in body fluids using targeted proteomics.
Hüttenhain R, Soste M, Selevsek N, Röst H, Sethi A, Carapito C, Farrah T, Deutsch EW, Kusebauch U, Moritz RL,
Niméus-Malmström E, Rinner O, Aebersold R.
Sci Transl Med. 2012 Jul 11;4(142):142ra94.
PASSEL: the PeptideAtlas SRMexperiment library.
Farrah T, Deutsch EW, Kreisberg R, Sun Z, Campbell DS, Mendoza L, Kusebauch U, Brusniak MY, Hüttenhain R,
Schiess R, Selevsek N, Aebersold R, Moritz RL.
Proteomics. 2012 Apr;12(8):1170-5
Quantotypic properties of QconCAT peptides targeting bovine host response to Streptococcus uberis.
Bislev SL, Kusebauch U, Codrea MC, Beynon RJ, Harman VM, Røntved CM, Aebersold R, Moritz RL, Bendixen E.
J Proteome Res. 2012 Mar 2;11(3):1832-43
Chris Plaisier, glioblastoma (Baliga)
A. Research Plan:
I. Specific Aims
The goal of this proposal is to identify the transcriptional regulatory factors (TRFs) driving disease related
processes in the deadly brain cancer glioma. The specific aims are designed to identify the TRFs,
transcription factors (TFs) and microRNAs, from a global transcriptional regulatory network (GTRN)
constructed from glioma patient tumors, validate the downstream targets of these predicted TRFs, and
demonstrate effects on in vitro models of glioma disease related processes by perturbing the expression of
the predicted TRFs.
1. Construct Global Transcriptional Regulatory Network (GTRN) for Glioma – Use publicly available high
grade glioma patient tumor gene expression study samples to discover conditionally co-regulated gene
biclusters and the cis-regulatory motifs driving their co-regulation. The GTRN defines the functional and
operational relationships among the biclusters by linking the cis-regulatory motifs to the transcriptional
regulatory factors (TFs and microRNAs) discovered from literature or inferred through linear regression-based approaches.
2. Validate Effect of Predicted Transcriptional Regulatory Factors (TRFs) on Down-Stream co-Regulated
Genes in Glioma Transcriptional Regulatory Network – Perturb the functions of predicted TRFs in a
glioma derived cell line to validate their role in co-regulating downstream genes.
3. Identify Effect(s) of Predicted Transcriptional Regulatory Factors (TRFs) on Glioma Relevant
Phenotypes in vitro – Screen predicted TRFs for effects on glioma relevant phenotypes such as
proliferation, migration or invasiveness.
The successful construction of a GTRN for glioma in Specific Aim 1 is required for Specific Aims 2 and 3. I present
a preliminary GTRN that I have constructed using publicly available data for glioma. The experiments in Specific
Aims 2 and 3 are independent and complementary validations of different aspects of this GTRN using different
downstream assays (gene expression vs. in vitro phenotypic assays).
II. Background and Significance
Even with recent advances, progression of glial brain tumors (glioma) to WHO grade III or IV is nearly uniformly
fatal. The distinction between WHO grade III and IV glioma is subtle and prone to misdiagnosis 12,13. Studies of
the global gene expression signatures of glioma versus normal brain tissue have shown great potential in not
only elucidating the etiology of glioma, but in providing molecular markers discriminating between glioma
subtypes 13–41. Molecular phenotypes such as gene expression signatures could provide better classifiers than
the standard histology based classifiers. In my previous studies I found that two co-expression clusters from a
global transcriptional co-expression network accounted for 30% of trait variance, whereas genome-wide
association studies of the same trait could barely account for a total of 10% 3. For this proposal I will improve
upon this by incorporating transcriptional regulatory factors in the network to provide a global transcriptional
regulatory network. This global regulatory network leading to glioma will yield many novel insights that can be
used to improve prevention, diagnosis, prognosis, sub-classification and treatment of glioma.
Typical gene expression studies utilize differential expression to identify genes relevant to a disease of interest.
These studies underutilize a higher order complexity in the data, namely co-expression and co-regulation. The
underlying premise for co-expression is that transcript levels of functionally related genes are expressed in
overlapping spatiotemporal patterns and can be related back to the biology of the disease/process being
studied. Recent studies of glioma have utilized gene co-expression to dig deeper into global gene expression
signatures from patient samples 15,16. The next logical step is to discriminate within these signatures biologically
meaningful regulons of co-regulated genes by virtue of their association with conserved cis-regulatory motifs for
trans-acting transcription factors and microRNAs.
Two recent studies successfully identified transcriptional regulatory factors (TRFs) that drive expression of genes
associated with disease related processes 16,42. The first approach uses the Finding Informative Regulatory
Elements (FIRE) software package to identify TRFs using a mutual information-based de novo motif discovery
algorithm that finds DNA sequence motifs over-represented in the promoters and 3’ untranslated regions (UTRs)
of a predefined set of co-expressed genes 42,43. The second approach uses statistical inference to identify genes
annotated as transcription factors (TFs) whose expression correlates with a predefined set of co-expressed genes 16.
Both of these approaches define sets of co-expressed genes and then as a second step identify the TRFs. This
assumes that co-expression is essentially synonymous with co-regulation, which may be problematic in the
presence of complex regulation, genetic heterogeneity, and/or uncontrolled environmental factors. All of these
are known to exist to varying extents in eukaryotic regulation. Methods used to identify co-regulated genes
should be robust to these different forms of regulatory heterogeneity.
The biclustering approach to simultaneously cluster both genes and patient samples has been employed
specifically to solve this problem of regulatory heterogeneity 44. Biclustering specifically selects for genes and
patient samples with more coherent co-regulation and drops those representing heterogeneity or random noise 44.
In our lab we have developed the biclustering algorithm cMonkey which identifies co-regulated genes and
TRFs simultaneously by integrating gene co-expression with de novo motif detection and gene-gene interaction
networks 45. By integrating a biclustering algorithm with biologically relevant information we are able to identify
a bicluster of co-regulated genes in a subset of patients and suggest TRFs driving their co-regulation.
The successful completion of these studies will allow us to construct a global transcriptional regulatory network.
The network will serve as a framework for discovery of signatures that classify subtypes of disease and trends
that are characteristic of progression. Further, this network will also enable the discovery and characterization
of the underlying mechanisms by providing experimentally testable linkages of these signatures to
transcriptional regulatory factors. Iterative refinement of the regulatory network using this approach holds the
potential to provide a better understanding of the etiology of high grade glioma, and provide molecular markers
for diagnosis, prognosis, sub-classification, and potential therapeutic targets.
III. Preliminary Studies
There is a wealth of publicly available high grade glioma patient tumor gene expression microarray study
samples 12-14,16-39. In order to select appropriate gene expression microarray study samples for use in our
analyses, I took into consideration five different issues: 1) similar definitions of disease, 2) large enough to
provide ample power for statistical inferences, 3) appropriate controls available, 4) annotated with as much high
quality phenotypic information as possible; 5) study samples on the same or easily comparable platforms. By
following these guidelines I was able to select three different high grade glioma datasets (WHO grade III and IV)
with postmortem normal brain tissue as a control on the same platform, Affymetrix HG U133 Plus 2.0 (Table 1).
These study samples provide a strong base for our analyses and the other publicly available datasets will be used
for validation or replication in the future.
Table 1. High grade glioma patient tumor gene expression microarray study samples.
                            High Grade   Controls   Male   Age Of Onset (Years)   Survival Post-
Study Sample                Glioma (n)   (n)        (%)    Min.      Mean         Diagnosis (n)
Gravendeel et al. 2009 28   244          8          67     15        52           239
Madhavan et al. 2009 41     264          21         63     15-19     50-54        218
Murat, et al. 2008 21       76           4          72     26        51           76
A rigorous quality control pipeline was developed
to prepare gene expression data for analysis, and
has been described previously 2,3,9. Prior to entry
into the pipeline each dataset will be assessed by
Affymetrix quality control criteria, and will also be
compared to known glioma molecular signatures.
The pipeline first removes mis-targeted and non-specific probes by utilizing an alternate probe
mapping file 46 and then background subtracts and
normalizes the data. Probe-sets that are expressed
above background in at least 50% of the samples in
all study samples are retained. I then consolidate
redundant probe-sets that map to the same gene
(EntrezId) and have a high correlation (correlation
coefficient ≥ 0.8). Finally, I convert the gene
expression intensities to ratios by dividing by the
normal brain tissue controls. This pipeline has been
developed utilizing open source tools and can easily
be automated to allow processing of large numbers
of experiments. Applying this pipeline to the three
datasets in Table 1 resulted in a total of 10,786
genes.
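The filtering, consolidation and ratio steps of the pipeline can be sketched roughly as follows. This is a minimal illustration and not the pipeline itself: the function names, the single background threshold, the choice to use the first probe-set of each gene as the correlation reference, and the use of the mean of the normal controls as the denominator are all assumptions made for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def filter_expressed(probesets, background, min_fraction=0.5):
    """Keep probe-sets above background in at least min_fraction of samples."""
    return {
        name: values
        for name, values in probesets.items()
        if sum(v > background for v in values) / len(values) >= min_fraction
    }


def consolidate(probesets, probe_to_gene, min_corr=0.8):
    """Average probe-sets of one gene that correlate with its first probe-set.

    Probe-sets below min_corr are left out of the average (an assumption;
    the text does not say how discordant probe-sets are handled).
    """
    by_gene = {}
    for name, values in probesets.items():
        by_gene.setdefault(probe_to_gene[name], []).append(values)
    genes = {}
    for gene, profiles in by_gene.items():
        reference = profiles[0]
        merged = [p for p in profiles
                  if p is reference or pearson(reference, p) >= min_corr]
        genes[gene] = [mean(column) for column in zip(*merged)]
    return genes


def to_ratios(tumor_profile, control_profile):
    """Divide each tumor intensity by the mean of the normal controls."""
    baseline = mean(control_profile)
    return [value / baseline for value in tumor_profile]


# Hypothetical toy data: four probe-sets measured in four tumor samples.
probesets = {"p1": [10, 20, 30, 40], "p2": [12, 22, 28, 42],
             "p3": [1, 1, 1, 9], "p4": [40, 10, 30, 20]}
probe_to_gene = {"p1": "G1", "p2": "G1", "p3": "G2", "p4": "G1"}
genes = consolidate(filter_expressed(probesets, background=5), probe_to_gene)
```

In this toy example p3 is dropped by the expression filter, p1 and p2 are averaged into a single G1 profile, and p4 is excluded from the average because it does not correlate with p1.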
Figure 1. Framework we developed that utilizes miRvestigator to identify miRNAs driving co-expression of
transcripts. The final framework also included enrichment of predicted miRNA binding sites from target
prediction databases.
Figure 2. Sensitivity and specificity of methods used in framework for identifying miRNA regulators from
co-expression signatures.
Figure 3. GTRN constructed from glioma patient samples.
Our lab has previously demonstrated the utility of
biclustering algorithm cMonkey in a single celled
archaeal organism 45,47. I identified two main issues
that needed to be addressed in order to extend
cMonkey to eukaryotic species: 1) larger genome size
(more genes), 2) additional transcriptional regulatory
factors (microRNAs, etc.). We addressed the issue of
larger genome size by optimizing cMonkey to reduce
computation time, and updating cMonkey to write
objects to the hard drive when they become too large
to hold in memory. We have also developed a
framework to identify miRNA regulators from co-expression signatures using three methods: Weeder
coupled to miRvestigator 48 (Figure 1), and enrichment
of binding sites from the PITA 49 or TargetScan 50 miRNA
target prediction databases. These three methods were
chosen as they performed the best on a compendium
of 50 sets of experimentally validated miRNA target
genes (Figure 2).
All three methods were incorporated into cMonkey to
provide a more comprehensive miRNA regulatory
network. Including miRNAs is quite important as they
are known to also affect expression levels of transcripts
which is readily observable in transcriptome profiling
studies where a specific miRNA is perturbed 51–72.
These improvements have significantly extended and
optimized cMonkey for use with human datasets and provide a framework that can be used for other eukaryotic
species.
Based on the bias of conserved motifs to 500bp upstream and 200bp downstream of the annotated
transcriptional start site 73, we defined this region as the core promoter region 74. Using the UCSC genome
browser RefSeq gene annotation (HG18) I identified the median 3’ un-translated region size to be 831bp after
the annotated translational stop site, excluding introns. I therefore extracted the 700bp for the core promoter
region and 831bp for the 3’ un-translated region to be used as the cis-regulatory regions in de novo motif
detection. All sequences were repeat masked as defined by the RepeatMasker track from the UCSC genome
browser.
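As a rough illustration of how the cis-regulatory windows described above could be derived from gene annotation, the sketch below computes strand-aware coordinates. It assumes 0-based half-open genomic coordinates, uses hypothetical function names, and ignores introns in the 3' UTR, which the actual extraction excludes.

```python
def core_promoter(tss, strand, upstream=500, downstream=200):
    """Return (start, end) of the core promoter around a transcription start site.

    Coordinates are 0-based and half-open, so end - start is 700 by default.
    """
    if strand == "+":
        return (tss - upstream, tss + downstream)
    if strand == "-":
        # On the minus strand "upstream" lies at higher genomic coordinates.
        return (tss - downstream, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


def three_prime_utr(stop, strand, length=831):
    """Return (start, end) of a fixed-length window after the translational stop.

    Simplification: introns are not removed here.
    """
    if strand == "+":
        return (stop, stop + length)
    if strand == "-":
        return (stop - length, stop)
    raise ValueError("strand must be '+' or '-'")
```

For a plus-strand gene with an annotated start site at position 1000, `core_promoter(1000, "+")` gives the window from 500 to 1200.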
After constructing a GTRN from glioma patient tumors (Figure 3) I observed upstream promoter motifs that are
significantly similar to known TF binding sites (e.g. SP1, CCATT-box, and ISRE motifs). All of the biclusters shown
in Figure 3 are significantly associated with patient survival and also have significant enrichment for Gene
Ontology (GO) biological process terms linking them to one or more of the hallmarks of cancer. An
emergent property of the network is the separation into up and down regulated biclusters. By using a random
sampling approach I was able to remove certain motifs that are observed too often. The glioma GTRN that I have
constructed contains numerous hypotheses that can be tested, and my current efforts are focused on filtering
methods to select TRFs to move forward to specific aims 2 and 3.
IV. Research Design and Methods
Specific Aim 1. Construct Global Transcriptional Regulatory Network (GTRN) for Glioma – Use publicly
available high grade glioma patient tumor gene expression study samples to discover conditionally co-regulated
gene biclusters and the cis-regulatory motifs driving their co-regulation. The GTRN defines the functional and
operational relationships among the biclusters by linking the cis-regulatory motifs to the transcriptional
regulatory factors (TFs and microRNAs) discovered from literature or inferred through linear regression-based
approaches.
The goal of this specific aim is to build a global transcriptional regulatory network using the biclustering
algorithm cMonkey. A rigorous screening procedure will be used to identify transcriptional regulatory factors
(TRFs) that are driving high grade glioma associated co-regulated genes for follow-up in Specific Aims 2 and 3: 1)
select biclusters associated with post-diagnosis survival, 2) select biclusters whose co-expression is replicated
across independent study samples, 3) select biclusters with predicted TRFs.
Using the Gravendeel et al. 2009 28 study sample, which contains tumor and normal samples, I will construct a
global transcriptional regulatory network with the cMonkey biclustering algorithm using STRING 75 as the
network component and MEME 76 for de novo motif detection. The results of cMonkey are biclusters with
gene and patient membership and de novo predicted motifs. As I have access to multiple independent study
samples I want to implement a bicluster replication based framework. To do this the residual for the same set of
genes will be computed in each independent study sample and compared to the residual distribution of
randomly selected gene sets to obtain a permuted p-value. This provides independent replication of the co-expression of a bicluster and demonstrates that this is a robust co-expression signal, which in conjunction with a
predicted TRF provides strong evidence for the existence of a bicluster. If the de novo motif detection doesn’t
identify meaningful motifs, I will try optimizing the MEME parameters and masking out un-conserved sequences
according to the 17-way vertebrate track on UCSC genome browser. I am also in the process of implementing a
feature to mask out motifs that would be found at random, and potential ways to integrate these motifs so as to
not lose potential co-occurrence patterns with other motifs.
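The replication test described above can be sketched as a permutation procedure. The example below assumes the residual is the mean squared residue of Cheng and Church (the exact residual used by cMonkey may differ), and the function names, the number of permutations and the add-one correction are illustrative choices rather than the implemented framework.

```python
import random
from statistics import mean


def mean_squared_residue(rows):
    """Mean squared residue of a genes-by-patients expression submatrix."""
    row_means = [mean(r) for r in rows]
    col_means = [mean(c) for c in zip(*rows)]
    overall = mean(row_means)
    total = sum(
        (value - row_means[i] - col_means[j] + overall) ** 2
        for i, r in enumerate(rows)
        for j, value in enumerate(r)
    )
    return total / (len(rows) * len(col_means))


def replication_p_value(expression, bicluster_genes, n_permutations=1000, seed=0):
    """Permuted p-value: how often a random gene set is at least as coherent.

    expression maps gene -> list of values in the independent study sample.
    """
    rng = random.Random(seed)
    all_genes = sorted(expression)
    observed = mean_squared_residue([expression[g] for g in bicluster_genes])
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(all_genes, len(bicluster_genes))
        if mean_squared_residue([expression[g] for g in sample]) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

A low residual means the genes rise and fall together across patients, so a small permuted p-value indicates that the co-expression seen in the discovery study sample is reproduced in the independent one.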
As a first screening step I require that the median expression of a bicluster be significantly associated with post-diagnosis survival. Post-diagnosis survival is a quantitative phenotype that assesses the overall severity of disease
for each patient by integrating many different molecular effects into a combined systems level output. This
screening step is vital as it ensures that the clusters used in Specific Aims 2 and 3 are relevant to high grade
glioma, and specifically to transcriptional signatures that determine patient survival after diagnosis.
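One way such a survival screen could be carried out is sketched below. The text does not name the statistical test, so this example assumes a two-group log-rank test after splitting patients at the median of the bicluster's median expression; a Cox proportional hazards model on the continuous value would be an equally plausible choice. All function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def log_rank_p_value(times, events, groups):
    """Two-group log-rank test.

    events[i] is 1 for an observed death and 0 for a censored patient;
    groups[i] is 0 or 1.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({time for time, e in zip(times, events) if e}):
        at_risk = [g for time, g in zip(times, groups) if time >= t]
        n = len(at_risk)          # patients still at risk at time t
        n1 = sum(at_risk)         # of those, patients in group 1
        d = sum(1 for time, e in zip(times, events) if time == t and e)
        d1 = sum(1 for time, e, g in zip(times, events, groups)
                 if time == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi_square / 2))


def survival_screen(bicluster_medians, times, events):
    """Split patients at the median bicluster expression and compare survival."""
    cutoff = median(bicluster_medians)
    groups = [1 if value > cutoff else 0 for value in bicluster_medians]
    return log_rank_p_value(times, events, groups)
```

Biclusters whose p-value passes the chosen threshold after multiple testing correction would then be carried forward.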
After constructing a GTRN, I want to identify the
transcriptional regulatory factors binding to the predicted
motifs that drive the co-regulation of the genes in each
bicluster (Figure 3). I intend to leverage two different
approaches to identify these TRFs: 1) use similarity to
known transcription factor motifs from databases, 2) use
Elastic Net 77 to infer the transcription factors that are
associated with the expression of a bicluster. Both
approaches have been used in previous studies to predict
TRFs that were validated using in vitro or in vivo
experiments 16,42,43,78. These approaches are
complementary in that they both leverage different
transcriptionally relevant data sources: de novo motif
detection uses cis-regulatory sequences, and the inference
method uses the expression of the TRFs. Therefore using
both of these techniques I am maximizing my potential for
identifying TRFs relevant for glioma.
Figure 3. Identifying transcriptional regulatory factors for follow-up. HMM = hidden Markov model.
I am using the STAMP software package to identify the TFs
that bind cMonkey predicted motifs in the promoter
regions of co-regulated genes 79. The STAMP software
package compares the motif position specific scoring
matrix (PSSM) from the cMonkey run against databases of
known transcription factors (TRANSFAC, Jaspar, etc). To
identify the microRNAs binding to the motifs in 3’ UTRs I
developed a profile hidden Markov model (HMM) 80 that compares the PSSMs to all 7 base-pair microRNA seeds
from miRBase 48. Benjamini-Hochberg multiple testing correction will be used with an alpha value of 0.05. If no
significant results are identified this can be relaxed, but this will require more strict validation in Specific Aim 2.
Thus I have provided a framework to identify the transcription factors and microRNAs binding motifs overrepresented in the cis-regulatory regions of biclusters predicted by cMonkey.
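For reference, the Benjamini-Hochberg correction mentioned above can be written in a few lines. This is a generic sketch of the standard step-up procedure with a hypothetical function name; it is not taken from the analysis code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, significance flags) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [q <= alpha for q in adjusted]


adjusted, significant = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

With these four example p-values only the first remains significant at an alpha of 0.05, since its adjusted value is 0.04 while the next two adjust to about 0.053.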
A complementary approach to the above uses statistical inference to identify transcription factors whose
expression is associated with the expression of a bicluster. Similar to the approach of Carro et al. 16, I identified
644 probes (out of 10,786) that correspond to human transcription factors by integrating annotation information
from Gene Ontology (keyword = ‘transcription factor activity’) and TRANSFAC. I plan to use the Elastic Net
algorithm 77 which allows us to add all TFs as predictors in the regression model and will identify the
transcription factors predicting the expression of a particular bicluster. Given the complexity of regulation
observed in humans it is likely that interaction terms will also be required to properly model regulatory effects,
and I can start by using pair-wise interaction terms. Multiple testing will be controlled using Benjamini-Hochberg
as described above. The combination of this statistical approach with the similarity to known TRFs should
provide ample TRFs for follow up in Specific Aim 2.
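To make the inference step more concrete, the following is a small coordinate-descent sketch of elastic net regression. It assumes the TF expression columns and the bicluster expression have already been centered (no intercept is fitted), and the penalty settings `lam` and `alpha` are arbitrary illustrative values; in practice an established implementation with cross-validated penalties would be used.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * (alpha*|b|_1 + (1-alpha)/2 * |b|_2^2).

    X is a list of rows (patients) with one column per TF; y is the
    bicluster expression. Both are assumed to be centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Covariance of predictor j with the residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

TFs with non-zero coefficients are the candidate regulators of the bicluster. Pair-wise interaction terms can be added as extra columns holding the products of two centered TF columns.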
Specific Aim 2. Validate Effect of Predicted Transcriptional Regulatory Factors (TRFs) on Down-Stream Co-Regulated Genes in Glioma Transcriptional Regulatory Network – Perturb the functions of predicted TRFs in a
glioma derived cell line to validate their role in co-regulating downstream genes.
Figure 4. Validating transcriptional regulatory effects on down-stream genes.
In Specific Aim 1 we propose to identify a set of disease
relevant TRFs by constructing a global regulatory network from
high grade glioma patient tumors and applying a series of
filtering steps. In Specific Aim 2 we propose a systematic
approach to validate the effect of these predicted TRFs on the
co-regulated bicluster genes in the glioma derived cell line U87
(ATCC: HTB-14). First, the expression of putative TRFs will be
perturbed to identify both direct and indirect targets
(subsequent targets of direct targets). Second, direct targets of
the TRF will be determined using luciferase reporter gene
assays fusing either promoters or 3’UTRs of target genes and
again perturbing the expression of the putative TRFs. These
studies will validate the co-regulation of the genes in a bicluster
by a specific TRF.
I will identify the cascade of transcriptional regulatory effects by
perturbing the expression of a specific TRF and analyzing the
perturbation-induced global gene expression changes. TFs can
be over-expressed in a mammalian expression vector 3, or
knocked-down with siRNA 70 or a double-stranded decoy
oligonucleotide 42,74,81. MicroRNA over-expression can be
achieved with microRNA-mimics 82–87 and single-stranded decoy
oligonucleotides can be used to mimic knock-down 88–90. The
perturbing agent will be transiently transfected into glioma
derived U87 cell line and effects on global gene expression
signatures will be assayed using gene expression microarrays
(Affymetrix U133 Plus 2.0) 48 hours after transfection, as
described previously 3. Vehicle only will be used as a negative
control and positive controls will be used when appropriate.
Different methods of perturbation have different off-target and
non-specific regulatory effects, and the overlap between
perturbations can be useful in identifying genes that are
regulated by the TRF. The significance of enrichment between
genes perturbed by a TRF and the genes in a bicluster will be computed using a hypergeometric p-value.
Significant enrichment validates the regulation of a bicluster's genes by the perturbed TRF that was identified
through the GTRN.
Figure 4 abbreviations: OE = over-expression, KD = knock-down.
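The hypergeometric enrichment p-value for the overlap between perturbed genes and bicluster genes can be computed as in the sketch below. The function and argument names are hypothetical, and the gene universe is assumed to be the set of genes assayed on the array.

```python
from math import comb


def hypergeometric_p_value(n_universe, n_bicluster, n_perturbed, n_overlap):
    """Probability of an overlap of at least n_overlap by chance.

    n_perturbed genes are drawn from n_universe genes, of which
    n_bicluster belong to the bicluster.
    """
    upper = min(n_bicluster, n_perturbed)
    total = comb(n_universe, n_perturbed)
    tail = sum(
        comb(n_bicluster, k) * comb(n_universe - n_bicluster, n_perturbed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For example, with 10 genes in the universe, 4 in the bicluster and 3 perturbed, an overlap of 2 or more occurs by chance with probability 40/120, or one third.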
To demonstrate that the genes perturbed by the TRF are in fact direct targets of the TRF the cis-regulatory region
(promoter or 3’ UTR) will be fused to a luciferase gene reporter and the level of luciferase will be quantified after
perturbation of the TRF. We will use the Promega pGL4.12 firefly luciferase and pGL4.72 renilla luciferase
reporter gene vectors for promoter studies. The minimal promoter vector will be used and the putative cis-regulatory sequence with 20bp on either side will be cloned into this vector and co-transfected with the renilla
vector and the TRF perturbing agent. For miRNAs the Promega pmirGLO vector, which has both firefly and
renilla on the same vector, will have the putative miRNA binding site cloned in and transfected with the TRF
perturbing agent. The Promega Dual-Luciferase reporter assay system reagents will be used to quantify luciferase
activity in a luminometer. Firefly luciferase activity will be normalized by renilla luciferase activity and each
experimental condition will be done with technical duplicates and six biological replicates. If the TRF perturbing
agent significantly affects luciferase expression this demonstrates that the target gene is directly regulated by
the TRF.
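The normalization described above amounts to a ratio of ratios, which the following sketch makes explicit. The data layout (a list of biological replicates, each holding the firefly and renilla readings of its technical duplicates) and the function names are assumptions for illustration; the significance test on the replicate ratios is not shown.

```python
from statistics import mean


def replicate_ratios(wells):
    """Firefly/renilla ratio per biological replicate.

    wells is a list of biological replicates, each a list of
    (firefly, renilla) readings from its technical duplicates.
    """
    return [mean(firefly / renilla for firefly, renilla in replicate)
            for replicate in wells]


def fold_change(treated_wells, control_wells):
    """Mean normalized activity after TRF perturbation relative to vehicle."""
    return mean(replicate_ratios(treated_wells)) / mean(replicate_ratios(control_wells))


# Hypothetical luminometer readings: two biological replicates, two duplicates each.
control = [[(100, 50), (120, 60)], [(200, 100), (180, 90)]]
treated = [[(50, 50), (60, 60)], [(100, 100), (90, 90)]]
change = fold_change(treated, control)
```

Here the perturbation halves the normalized firefly signal, so `change` is 0.5.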
It is possible that the U87 cell line may not be the best model for every bicluster as the GTRN is constructed from
patient data. However, once the perturbation tools are created and tested it would be fairly straight-forward to
repeat the tests using another glioma cell line (e.g. A172 or T98G) and potentially, after getting appropriate IRB
approval, primary cell lines from patients with high grade glioma.
Specific Aim 3. Identify Effect(s) of Predicted Transcriptional Regulatory Factors (TRFs) on Glioma Relevant
Phenotypes in vitro – Screen predicted TRFs for effects on glioma relevant phenotypes such as proliferation,
migration or invasiveness.
The goal of this specific aim is to use in vitro models of glioma proliferation, migration and invasiveness to screen
for effects due to over-expression or knock-down of predicted TRFs. This approach requires that I have methods
to over-express and knock-down TRFs, which have been described in detail in Specific Aim 2: TF: over-expression using mammalian expression vector, knock-down using siRNA; microRNA: over-expression using
microRNA mimics, knock-down using single-stranded decoy oligonucleotides. Vehicle only will be used as a
negative control, and positive controls will be used when possible. These perturbing agents will be used to
modify the expression of the TRFs and effects on cell proliferation, cell migration and invasiveness will be
measured in vitro. At maximum I would expect to perform these assays for ~20 different TRFs (~10 TFs and ~10
microRNAs), however once the assays are established this could potentially be expanded as required.
Unbridled cell proliferation is one of the hallmarks of cancer 91,92 and can be measured as an increase in cell
number over time. After perturbing a specific TRF, the addition of a viable fluorescent dye will allow direct cell
counts to be obtained using the Cyntellect Laser-Enabled Analysis and Processing (LEAP) platform from 96 well
plates, and time points can be acquired as needed in an automated way
(http://www.cyntellect.com/content/applications/lan-009.cfm). This platform integrates imaging with a high
powered laser, and is capable of laser-mediated in situ elimination of a region of a cell monolayer to mimic
wounding in a consistent and reproducible manner. These operations can be performed at high speed and
throughput in a standard microtiter plate format. Running initial experiments for longer intervals will allow
optimization of the time points to best observe the proliferative effects and ensure that critical events are not
missed.
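Once cell counts are available at several time points, the proliferative effect of a perturbation can be summarized as a growth rate or doubling time. The sketch below assumes exponential growth over the sampled interval and fits a least-squares line to the log counts; the function names and the example counts are hypothetical.

```python
from math import log


def growth_rate(hours, counts):
    """Least-squares slope of ln(cell count) against time, per hour."""
    logs = [log(c) for c in counts]
    t_mean = sum(hours) / len(hours)
    y_mean = sum(logs) / len(logs)
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in zip(hours, logs))
    denominator = sum((t - t_mean) ** 2 for t in hours)
    return numerator / denominator


def doubling_time(hours, counts):
    """Time in hours for the population to double, assuming exponential growth."""
    return log(2) / growth_rate(hours, counts)


# Hypothetical counts from one well at 0, 24 and 48 hours.
example = doubling_time([0, 24, 48], [1000, 2000, 4000])
```

Comparing the doubling time of perturbed wells against vehicle-only wells gives a single number per TRF that can be tested across replicates.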
Metastatic potential of tumor cells has been modeled in two different ways: 1) wound healing response (cell
migration and motility); 2) invasiveness (cell migration and invasion). Both wound healing and invasion assays
with the glioma derived cell line U87 have been used effectively to model metastatic potential 87. The Cyntellect
LEAP also has the capability to ablate cells using a high powered laser, and has been used effectively in wound
healing assays 88. An overview of the approach: U87 cells will be
allowed to reach 70-90% confluency, the TRF will be perturbed, the
LEAP will ablate a specified pattern into the cell lawn, and imaging will
be used to observe the rate of closure. The size and shape of the
ablated region can be optimized to provide maximum information
from each experiment.
Figure 5. in vitro assays to model glioma related process.
Glioma invasiveness has been effectively modeled in vitro using the
U87 cell line in a Boyden chamber, in which two medium-filled
compartments are separated by a microporous membrane 93–96. An
overview of the approach: U87 cells are plated in the upper
compartment of the Boyden chamber, the plate is incubated for a
specified interval, and cells that pass through to the other chamber are
stained and counted. If possible an automated counting procedure will
be utilized. Incubation interval can be optimized to time points which
provide the most information about invasion.
As described in Specific Aim 2, the U87 cell line may not be the best
model for every bicluster as the global transcriptional regulatory
network is constructed based on patient data. However, once the
perturbation tools and in vitro phenotypic assays are created and
tested it would be fairly simple to repeat the approach using another
cell line, including primary cell lines from patients with high grade
glioma.
Timeline –
I am applying for two years of support. From the previous submission I
have essentially completed Specific Aim 1 by constructing a GTRN for
glioma and am in the process of choosing TRFs for follow-up. The
amount of time given for each different stage of the research should
be sufficient to allow for successful completion. Also currently I am
setting up the cell culture system and thoroughly testing with positive
and negative controls. The assays for Specific Aims 2 and 3 can also
begin to be worked out during this time. Then the assays described in Specific Aims 2 and 3 will be utilized, and
finally the results will be analyzed and published.
Aymeric d'Herouel, non-genetic memory in complex networks (Huang)
Here is a reference to a recent relevant article by our visiting scholar / grad student Xiaojie Qiu:
From Understanding the Development Landscape of the Canonical Fate-Switch Pair to Constructing a Dynamic
Landscape for Two-Step Neural Differentiation
X Qiu, S Ding, T Shi - PloS one, 2012
Greg Zornetzer, RNACapture for brain histology (Ozinsky)
Brady Bernard-Glioblastoma (Shmulevich)
PTSD Research Overview
The Hood lab has pioneered a myriad of systems approaches to disease. The fundamental
concept is that in a diseased state, one or more biological networks are perturbed in the diseased
organ. This perturbation alters the patterns of gene expression within the diseased organ—and
these dynamically changing altered patterns of gene and protein expression reflect the
progression of the disease and ultimately, explain the pathophysiology of the disease.
Post-Traumatic Stress Disorder (PTSD) is a severe anxiety disorder and poses a challenging
problem for contemporary medicine in that its primary effects are mediated through the brain.
Current medicine has limited tools to assess the onset, progression and mechanistic basis of this
disease/trauma because the brain is one of the most inaccessible human organs for analysis.
PTSD is currently diagnosed based on signs and symptoms and thorough psychological
evaluation and sensory and motor tests—that is, higher-level phenotypic assays with significant
uncertainty. There is an urgent need to develop clinically relevant biomarkers for PTSD. The
Hood lab is applying systems approaches involving two strategies —1) to assess across the
progression of the disease the changes in the informational molecules (e.g. mRNAs, microRNAs
or proteins) in the diseased tissues—and from these studies come to understand how the
dynamics of the inferred networks correlate with disease mechanism and 2) to assess in the blood
the reflections of the disease process by following the changes of blood protein (mainly brain-specific) and RNA spectra.