Novel Cancer Pathway Modeling using Boolean Implication and Drosophila Genetics

Specific Aims

Recent advances in DNA microarray technology that enable the simultaneous measurement of the expression of thousands of genes in a single experiment have revolutionized current molecular biology.

The 21st century is already witnessing an explosion in the amount of biological information on normal and disease processes. A large and exponentially growing volume of gene expression data from microarrays is now publicly available. We previously published a novel approach to discover Boolean implications between genes using these massive gene expression datasets, and we recently used Boolean implications to successfully predict genes in the B cell developmental pathway.

Discovery of genetic pathways in the fruit fly has led to the identification of many genes associated with human cancer. For example, the discovery of the RAS pathway in the fly identified genes that are often altered in tumors and whose alteration can cause cancer. Our understanding of the different pathways associated with cancer is nevertheless still limited. This study proposes to build on our successful prediction of human B cell developmental genes, using the same approach to predict pathways from human gene expression datasets. Conservation of these pathways can be shown computationally in humans, mice and rats. It is hard to extend this conservation computationally to fruit flies because of inadequate data. However, the function of the genes can be examined easily and rapidly using fly genetics. Once a successful model of the conserved pathways is established in fruit flies, it should dramatically improve our understanding of the underlying biology of cancer, as has been the case with the RAS, WNT, and hedgehog pathways.

Our specific aims are as follows. Aim 1 is to construct novel genetic pathways using Boolean analysis of massive amounts of gene expression data. Aim 2 is to test the genetic interactions and functions of these genes using the fruit fly (Drosophila melanogaster) as a model organism. Aim 3 is to verify these genetic interactions in human cell cultures.

Background and Significance

Genetic Pathways in the Fly

The genetics of eye development in Drosophila melanogaster led to the discovery of multiple genes that are essential for normal development of the ommatidia 1. One gene called sevenless, whose mutant fails to form the seventh cell in each ommatidium, encodes a protein similar to the EGF receptor. Enhancer and suppressor screens identified mutations in other genes. Genetic complementation tests put these genes in a linear pathway.

For example, downstream of sevenless is son of sevenless (sos), which was found to be a Ras guanine nucleotide exchange factor (GEF). This study, combined with other biochemical evidence, revealed a linear signaling cascade: tyrosine kinase receptor → Grb → Sos → Ras.

Cancer Pathways

The discovery of the src oncogene led to a remarkable understanding of cancer pathways 2. A viral gene, v-src, was known to cause cancer in chickens. A remarkable experiment revealed that v-src is a mutant form of a cellular gene, c-src 2. Later it was found that src encodes a protein tyrosine kinase 3, 4. Further biochemical experiments revealed that the SH2 domain is responsible for bringing the kinase to its substrate for oncogenic activity 5. These studies solved the mysteries of the receptor → Sos → Ras pathway. It became apparent that a kinase signaling cascade precedes oncogenic transformation 6. Together, the biochemical studies and the genetic characterization of eye development in Drosophila melanogaster portray a more complete picture of a critical cancer pathway:

GF → RTK → DRK → SOS → RAS → RAF → MEK → MAPK → ETS

Similarly, other cancer pathways were characterized in various studies, including the TGF-beta, PI3K, WNT, and RB pathways. Studies of the RB pathway led to the discovery of the first pathway with alternating tumor suppressors and oncogenes 7.

Boolean Implication

Previously, we downloaded all microarray data for the human Affymetrix U133 Plus 2.0 platform from NCBI’s GEO (Gene Expression Omnibus) database 8 and normalized them using the RMA (Robust Multi-chip Average) algorithm 9.

Within these datasets we identified expression relationships between pairs of genes (represented by probe sets on the arrays) that follow simple “if-then” rules such as “if gene X is high, then gene Y is low,” or more simply stated: “X high ⇒ Y low” (“X high implies Y low”). In this case gene X and gene Y are rarely “high” together. We call these relationships “Boolean implications” 10.

Figure 1. Scatter plots of 4,787 Affymetrix U133 Plus 2.0 human microarrays downloaded from NCBI’s Gene Expression Omnibus and normalized together. Each probe set is assigned a threshold t. Expression levels above t + 0.5 are classified as “high,” expression levels below t − 0.5 are classified as “low,” and values between t − 0.5 and t + 0.5 are classified as “intermediate.” The plots show six different types of Boolean implication relationships between a pair of genes. A Boolean implication is discovered by identifying a sparse quadrant in the scatter plot.

Figure 1 outlines the six different types of Boolean implications discovered among the probe sets within the human datasets. In these scatter plots, each point represents gene X’s expression versus gene Y’s expression within an individual microarray. Each plot is divided, based on thresholds, into four quadrants: (X low, Y low), (X low, Y high), (X high, Y low), and (X high, Y high). A Boolean implication exists when one or more quadrants are sparsely populated according to a statistical test and there are enough high and low values for each gene (to prevent the discovery of implications that follow from an extreme skew in the distribution of one of the genes). There are four asymmetric Boolean implications, each corresponding to one sparse quadrant. Two symmetric Boolean implications, “equivalent” and “opposite,” are discovered when two diagonally opposite sparse quadrants are identified. Boolean implications can also be extended to logical combinations of genes. For example, the Boolean implication “A ⇒ B” can be discovered, where A and B are either single-gene conditions (e.g., X high) or logical combinations of multiple genes (e.g., X high AND Y high).
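The quadrant analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the proposal does not specify the statistical test, so the sparsity statistic (observed count compared with the count expected if the two genes were independent), the error rate, the cutoffs `min_stat=3.0` and `max_error=0.1`, and all function names are hypothetical choices made for this example. The additional check that each gene has enough high and low values is reduced here to requiring non-empty rows and columns.

```python
from collections import Counter
from math import sqrt

def discretize(value, t, margin=0.5):
    """Classify one expression value relative to the probe set threshold t."""
    if value > t + margin:
        return "high"
    if value < t - margin:
        return "low"
    return "intermediate"

def quadrant_counts(xs, ys, tx, ty):
    """Count arrays per (X, Y) quadrant, skipping intermediate values."""
    counts = Counter()
    for x, y in zip(xs, ys):
        qx, qy = discretize(x, tx), discretize(y, ty)
        if "intermediate" not in (qx, qy):
            counts[(qx, qy)] += 1
    return counts

def sparse_quadrants(counts, min_stat=3.0, max_error=0.1):
    """Quadrants holding far fewer arrays than independence would predict.

    The statistic and both cutoffs are illustrative assumptions.
    """
    total = sum(counts.values())
    sparse = []
    for qx in ("low", "high"):
        for qy in ("low", "high"):
            observed = counts[(qx, qy)]
            n_x = counts[(qx, "low")] + counts[(qx, "high")]
            n_y = counts[("low", qy)] + counts[("high", qy)]
            if n_x == 0 or n_y == 0:
                continue  # gene never takes this state; nothing to test
            expected = n_x * n_y / total
            stat = (expected - observed) / sqrt(expected)
            error = 0.5 * (observed / n_x + observed / n_y)
            if stat > min_stat and error < max_error:
                sparse.append((qx, qy))
    return sparse

# Map the set of sparse (X, Y) quadrants to one of the six relationship types.
IMPLICATION_NAMES = {
    frozenset({("high", "high")}): "X high => Y low",
    frozenset({("high", "low")}): "X high => Y high",
    frozenset({("low", "high")}): "X low => Y low",
    frozenset({("low", "low")}): "X low => Y high",
    frozenset({("low", "high"), ("high", "low")}): "equivalent",
    frozenset({("low", "low"), ("high", "high")}): "opposite",
}

def implication(xs, ys, tx, ty):
    """Name the Boolean implication between X and Y, or None if there is none."""
    counts = quadrant_counts(xs, ys, tx, ty)
    return IMPLICATION_NAMES.get(frozenset(sparse_quadrants(counts)))
```

Given expression vectors for two probe sets and their thresholds, `implication(xs, ys, tx, ty)` returns one of the six relationship names, or `None` when no quadrant, or an unrecognized combination of quadrants, is sparse.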

Pathway Discovery using Boolean Implications

Previously, we developed a new method termed Mining Developmentally Regulated Genes (MiDReG) to predict genes whose expression is either activated or repressed as precursor cells differentiate 11. MiDReG bases its predictions on Boolean implications mined from large-scale microarray databases and requires two or more input markers in any given developmental pathway.

Figure 2. Genes in the B cell developmental pathway are discovered by Boolean interpolation between two known genes, KIT and CD19, that mark the endpoints. KIT is expressed early in B cell development and CD19 is expressed late. A robust Boolean implication, KIT high ⇒ CD19 low, is observed in a diverse collection of microarray datasets in both humans and mice. Genes that are expressed at an intermediate step and remain high until the end are discovered by identifying genes with the Boolean implications KIT high ⇒ X low and CD19 high ⇒ X high.

For example, in studies of B cell development, we used two known genes, KIT and CD19, that are expressed early and late, respectively, during B cell development (Figure 2). A conserved Boolean implication, KIT high ⇒ CD19 low, is observed in the microarray datasets. MiDReG searched for genes X that are expressed during development and satisfy the implications “KIT high ⇒ X low” and “CD19 high ⇒ X high” (Figure 2). These implications represent the pattern of expression we expect for genes that are not expressed early in development, when KIT is highly expressed (KIT high ⇒ X low), and are then upregulated later in development, when CD19 is also upregulated (CD19 high ⇒ X high). The predicted genes were successfully validated in collaboration with the Weissman lab at Stanford University.
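The search pattern described above can be illustrated with a short Python sketch. It assumes that each gene's expression has already been discretized into "low", "intermediate", or "high" calls per array, and it replaces the statistical test for a Boolean implication with a simple violation-rate check. The function names, the `max_violation` and `min_support` cutoffs, and the example genes `GENE_A`, `GENE_B`, and `GENE_C` are hypothetical and are not part of MiDReG itself.

```python
def implies(calls_a, state_a, calls_b, state_b,
            max_violation=0.05, min_support=10):
    """Check 'A state_a => B state_b' by counting violating arrays.

    A violation is an array where A is in state_a but B is in the state
    opposite to state_b. The cutoffs are illustrative assumptions.
    """
    opposite = "low" if state_b == "high" else "high"
    support = 0
    violations = 0
    for a, b in zip(calls_a, calls_b):
        if a == state_a and b in ("low", "high"):
            support += 1
            if b == opposite:
                violations += 1
    return support >= min_support and violations / support <= max_violation

def find_intermediate_genes(calls, early, late):
    """Genes X satisfying 'early high => X low' and 'late high => X high'."""
    # The two seed genes must themselves obey 'early high => late low'.
    if not implies(calls[early], "high", calls[late], "low"):
        return []
    return sorted(
        gene for gene, gene_calls in calls.items()
        if gene not in (early, late)
        and implies(calls[early], "high", gene_calls, "low")
        and implies(calls[late], "high", gene_calls, "high")
    )

# Toy data: 12 early-stage, 12 middle-stage, and 12 late-stage arrays.
calls = {
    "KIT": ["high"] * 12 + ["low"] * 24,
    "CD19": ["low"] * 24 + ["high"] * 12,
    "GENE_A": ["low"] * 12 + ["high"] * 24,  # switches on mid-way, stays on
    "GENE_B": ["high"] * 36,                 # always on
    "GENE_C": ["low"] * 36,                  # always off
}
candidates = find_intermediate_genes(calls, "KIT", "CD19")
```

In this toy dataset only `GENE_A` is low whenever KIT is high and high whenever CD19 is high, so it is the only candidate returned.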

MiDReG is a general method that can be applied to any genes of interest. The results can be interpreted based on the biological significance of the seed genes. Pathways can be constructed by using MiDReG repeatedly.

Summary of Research

Our hypothesis is that novel cancer pathways that are constructed using MiDReG could lead to discovery of new genes involved in tumorigenesis. These pathways, if they are conserved in humans, mice and rats, could lead to understanding of the basic underlying mechanism of the biology of cancer. We propose to explore the biological significance of these predictions using the power of Drosophila genetics, as well as the genetic perturbations of human cell culture. Our goal is to build a general genome-wide discovery tool for genes involved in cancer pathways that can be tested in human cell culture.

Preliminary Studies

Debashis Sahoo

Dr. Debashis Sahoo has a doctoral degree in Electrical Engineering from Stanford University. He and his thesis advisor, Prof. David Dill, have developed a set of tools to analyze large gene expression datasets. One of these tools, StepMiner, analyzes microarray time-course data, classifies the gene expression values as “low” or “high,” and measures the significance of the classification using an F-statistic 12. Another tool, BooleanNet, analyzes all publicly available microarray datasets to discover novel Boolean implication relationships between genes 10.

Recently, a third tool, MiDReG, used BooleanNet to predict novel B cell developmental genes and discovered a branchpoint between B and T cell development (a manuscript is under review in Science) 11. In addition, Dr. Sahoo has been working part-time in the Lipsick laboratory for the past three years to learn wet-bench biology. During this time he has performed mosaic analysis and conducted several other experiments involving genetics in fruit flies (Drosophila melanogaster). His experiments on the Lin-52 gene have revealed several new facts about eye development and about chorion gene amplification in ovarian follicle cells.

Joseph Lipsick

Prof. Joe Lipsick has dedicated his research career to understanding the function of the Myb oncogene family. As an independent investigator, his laboratory developed the first biological assay for molecular clones of the v-Myb oncogene and established its role in oncogenesis 13-16. More recently, his laboratory has focused on the fruit fly (Drosophila melanogaster) as a model organism to understand the function of the human Myb oncogene. The lab created the first null mutant of the Drosophila Myb gene and showed its role in regulating mitosis, chromosome condensation and spindle pole formation 17, 18. Most of these functions are consistent with the role of Myb as a human oncogene. Most recently, the lab has shown the role of Drosophila Myb, E2F2, and RBF in epigenetic regulation of the expression of key components of G2/M cell cycle progression, including Polo kinase and the spindle assembly checkpoint 19.

Research Design and Methods

Specific Aim 1: To identify novel cancer pathways using Boolean implication on human, mouse and rat gene expression datasets.

We will download all publicly available microarray data from the GEO (Gene Expression Omnibus) database for the most popular human, mouse and rat Affymetrix platforms 8. These datasets will be normalized using the RMA (Robust Multi-chip Average) algorithm to produce the gene expression values 9. Every probe set will be assigned a threshold to separate the “low” and “high” expression values using the StepMiner algorithm 12. Statistically significant Boolean implication relationships between pairs of probe sets will be discovered using the BooleanNet algorithm 10. After this, we will use MiDReG to construct novel pathways 11. For example, a preliminary study showed that a putative MYBL2 → PTEN → ATM pathway is conserved in both human and mouse gene expression datasets.
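The threshold assignment step can be illustrated with a minimal Python sketch of a step fit. It is not the StepMiner implementation: it assumes that the threshold is found by sorting a probe set's expression values, choosing the split into a "low" and a "high" level that minimizes the sum of squared errors around the two level means, and placing the threshold midway between those means. The function name `step_threshold` is hypothetical, and the F-statistic that StepMiner uses to assess significance is omitted.

```python
def step_threshold(values):
    """Fit one low-to-high step to the sorted values and return a threshold.

    Assumption: the threshold is the midpoint between the means of the two
    levels of the best-fitting (least squared error) step.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two expression values")
    total = sum(ordered)
    total_sq = sum(v * v for v in ordered)
    best_sse = None
    best_threshold = None
    left_sum = 0.0
    left_sq = 0.0
    for i in range(1, n):  # the first i sorted values form the low level
        v = ordered[i - 1]
        left_sum += v
        left_sq += v * v
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared errors around each level mean.
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best_sse is None or sse < best_sse:
            best_sse = sse
            best_threshold = (left_sum / i + right_sum / (n - i)) / 2
    return best_threshold

# Toy example: four clearly low and four clearly high expression values.
t = step_threshold([2.0, 2.2, 1.8, 2.1, 8.0, 8.3, 7.9, 8.1])
```

For the toy values the best step separates the four low values from the four high values, so the returned threshold falls between the two groups.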

Specific Aim 2: To validate the genetic interactions and functions of the genes using Drosophila melanogaster as a model organism.

Several powerful genetic tools are available in Drosophila melanogaster to test the functions of genes and the genetic interactions between them. The functions of the putative genes can be assayed using the already known phenotypes of their null mutants. In addition, these mutants can be recombined to construct double or triple mutants of the genes of interest. Using the “UAS-GAL4” binary system, genes can be expressed in a tissue-specific or time-specific manner 20. The “FLP-FRT” system can be used to induce mitotic recombination, thereby generating loss-of-function mosaics 21. These two techniques can be combined effectively to perform mosaic analysis, which allows phenotypic analysis of patches of genetically different cells that develop in a wild-type environment 22, 23. Conversely, using null mutants and the “FLP-FRT” system, “flip-out” clones can be generated to allow rescue in clusters of cells within an otherwise mutant environment 24. Further, using the UAS-RNAi lines from the Vienna Drosophila RNAi Center, knockdown phenotypic analysis can be performed on clones or different tissues 25. All of these tools have been used in the Lipsick lab for ongoing studies of the Drosophila Myb gene.

Using the above tools available in Drosophila, we will test the genetic interactions of the genes predicted by our computational approach. This study will validate the biological significance of the genes and model the cancer pathway in a model organism, facilitating the understanding of the causes and mechanisms of cancer.

Specific Aim 3: To validate the genetic interactions and functions of the genes using human cell culture as a model system.

We will test the biological significance of the predicted genes in human cell cultures using a variety of techniques available to manipulate genes in human cells. RNA interference (RNAi) techniques have been used to perturb genes in cultured human cells 26. Loss of function of our predicted genes can be tested using such RNAi techniques. Conversely, DNA transfection and retroviral infection in vitro have classically been used to test the dominant transforming activity of genes and viruses of interest 27. Transformation of human or mouse cells in culture is a popular tool to test putative oncogenes and tumor suppressor genes. For example, H-Ras and K-Ras have been used to transform human breast cancer cells 28. We will perturb these transformed human cells using RNAi techniques to test predicted genetic interactions in the cancer pathway.

We will also test the transforming ability of our predicted genes in human cells. Several properties are assayed in this process: focus formation 29, growth in low serum, growth in agar suspension cultures, and the ability to immortalize a cell line 30. We will perform the above experiments to test the function of our predicted genes. In the future, the transformed cells could be injected into immunocompromised mice to produce tumors 31.

In addition, the Lipsick lab has previously developed breast cancer cell lines with a “tet-off” inducible system for B-MYB expression. After the required gene perturbations, we will measure the expression of individual genes using quantitative RT-PCR, or measure gene expression on a genome-wide scale using microarrays.

References

1. Rubin, G. M. et al. Signal transduction downstream from Ras in Drosophila. Cold Spring Harb. Symp. Quant. Biol. 62, 347-352 (1997).

2. Stehelin, D., Varmus, H. E., Bishop, J. M. & Vogt, P. K. DNA related to the transforming gene(s) of avian sarcoma viruses is present in normal avian DNA. Nature 260, 170-173 (1976).

3. Collett, M. S. & Erikson, R. L. Protein kinase activity associated with the avian sarcoma virus src gene product. Proc. Natl. Acad. Sci. U. S. A. 75, 2021-2024 (1978).

4. Levinson, A. D., Oppermann, H., Levintow, L., Varmus, H. E. & Bishop, J. M. Evidence that the transforming gene of avian sarcoma virus encodes a protein kinase associated with a phosphoprotein. Cell 15, 561-572 (1978).

5. Waksman, G., Shoelson, S. E., Pant, N., Cowburn, D. & Kuriyan, J. Binding of a high affinity phosphotyrosyl peptide to the Src SH2 domain: crystal structures of the complexed and peptide-free forms. Cell 72, 779-790 (1993).

6. Yao, R. & Cooper, G. M. Regulation of the Ras signaling pathway by GTPase-activating protein in PC12 cells. Oncogene 11, 1607-1614 (1995).

7. Classon, M. & Harlow, E. The retinoblastoma tumour suppressor in development and cancer. Nat. Rev. Cancer 2, 910-917 (2002).

8. Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207-210 (2002).

9. Irizarry, R. A. et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31, e15 (2003).

10. Sahoo, D., Dill, D. L., Gentles, A. J., Tibshirani, R. & Plevritis, S. K. Boolean implication networks derived from large scale, whole genome microarray datasets. Genome Biol. 9, R157 (2008).

11. Sahoo, D. et al. A new Method of Mining Developmentally Regulated Genes Identifies a Branchpoint Between B and T Cell Development. Science (Under review).

12. Sahoo, D., Dill, D. L., Tibshirani, R. & Plevritis, S. K. Extracting binary signals from microarray timecourse data. Nucleic Acids Res. 35, 3705-3712 (2007).

13. Chen, R. H., Fields, S. & Lipsick, J. S. Dissociation of transcriptional activation and oncogenic transformation by v-Myb. Oncogene 11, 1771-1779 (1995).

14. Ibanez, C. E., Garcia, A., Stober-Grasser, U. & Lipsick, J. S. DNA-binding activity associated with the v-myb oncogene product is not sufficient for transformation. J. Virol. 62, 4398-4402 (1988).

15. Ibanez, C. E. & Lipsick, J. S. Trans Activation of Gene Expression by V-Myb. Mol. Cell. Biol. 10, 2285-2293 (1990).

16. Lane, T., Ibanez, C., Garcia, A., Graf, T. & Lipsick, J. Transformation by v-myb correlates with transactivation of gene expression. Mol. Cell. Biol. 10, 2591-2598 (1990).

17. Manak, J. R., Mitiku, N. & Lipsick, J. S. Mutation of the Drosophila homologue of the Myb protooncogene causes genomic instability. Proc. Natl. Acad. Sci. U. S. A. 99, 7438-7443 (2002).

18. Manak, J. R., Wen, H., Van, T., Andrejka, L. & Lipsick, J. S. Loss of Drosophila Myb interrupts the progression of chromosome condensation. Nat. Cell Biol. 9, 581-587 (2007).

19. Wen, H., Andrejka, L., Ashton, J., Karess, R. & Lipsick, J. S. Epigenetic regulation of gene expression by Drosophila Myb and E2F2-RBF via the Myb-MuvB/dREAM complex. Genes Dev. 22, 601-614 (2008).

20. Brand, A. H. & Perrimon, N. Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 118, 401-415 (1993).

21. Golic, K. G. & Lindquist, S. The FLP recombinase of yeast catalyzes site-specific recombination in the Drosophila genome. Cell 59, 499-509 (1989).

22. Duffy, J. B., Harrison, D. A. & Perrimon, N. Identifying loci required for follicular patterning using directed mosaics. Development 125, 2263-2271 (1998).

23. Perrimon, N. Creating mosaics in Drosophila. Int. J. Dev. Biol. 42, 243-247 (1998).

24. Struhl, G. & Basler, K. Organizing activity of wingless protein in Drosophila. Cell 72, 527-540 (1993).

25. Dietzl, G. et al. A genome-wide transgenic RNAi library for conditional gene inactivation in Drosophila. Nature 448, 151-156 (2007).

26. Berns, K. et al. A large-scale RNAi screen in human cells identifies new components of the p53 pathway. Nature 428, 431-437 (2004).

27. Copeland, N. G., Zelenetz, A. D. & Cooper, G. M. Transformation of NIH/3T3 mouse cells by DNA of Rous sarcoma virus. Cell 17, 993-1002 (1979).

28. Li, Q. & Mattingly, R. R. Restoration of E-cadherin cell-cell junctions requires both expression of E-cadherin and suppression of ERK MAP kinase activation in Ras-transformed breast epithelial cells. Neoplasia 10, 1444-1458 (2008).

29. Boettiger, D. & Temin, H. M. Light inactivation of focus formation by chicken embryo fibroblasts infected with avian sarcoma virus in the presence of 5-bromodeoxyuridine. Nature 228, 622-624 (1970).

30. Land, H., Parada, L. F. & Weinberg, R. A. Tumorigenic conversion of primary embryo fibroblasts requires at least two cooperating oncogenes. Nature 304, 596-602 (1983).

31. Jung, E. Y., Kang, H. K., Chang, J., Yu, D. Y. & Jang, K. L. Cooperative transformation of murine fibroblast NIH3T3 cells by hepatitis C virus core protein and hepatitis B virus X protein. Virus Res. 94, 79-84 (2003).
