Quantitative Isoform Profiling & Isoform Convergence

by

Chris Varma

B.S. Computer Science, Dept. of Computer Science, M.S. Computational Biology, Dept. of Computer Science, M.S. Management, Dept. of Management, Stanford University, 2001

Submitted to the Harvard-MIT Division of Health Sciences and Technology in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Medical Engineering at Harvard Medical School & Massachusetts Institute of Technology

May 2005

© 2005 Chris Varma. All rights reserved.

The author hereby grants to Harvard Medical School and to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.

Signature of Author: Harvard-MIT Division of Health Sciences & Technology, May 13, 2005

Certified by: Peter Szolovits, Ph.D., Professor of Computer Science & Engineering / Health Sciences & Technology, MIT, Thesis Chairman

Certified by: George M. Church, Ph.D., Professor of Genetics, Harvard Medical School, Thesis Supervisor

Accepted by: Martha L. Gray, Edward Hood Chaplin Professor of Medical and Electrical Engineering, Co-director, Harvard-M.I.T. Division of Health Sciences and Technology

Quantitative Isoform Profiling & Isoform Convergence

by Chris Varma

Submitted to the Harvard-MIT Division of Health Sciences & Technology on May 13, 2005 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Medical Engineering

Abstract

Alternative pre-messenger RNA splicing is a crucial step in eukaryotic gene expression, and therefore it is subject to tight regulation. Given its importance in conferring protein diversity, alternative splicing is sensitive to changes in cellular states, including malignancy.
We present a new paradigm by which to quantitatively study the alternative splicing of any molecule through the methods presented here, quantitative exon profiling and quantitative isoform profiling, which take advantage of a single-molecule-based technology [Mit99]. Furthermore, we extend this paradigm to include a novel unified platform, called Isoform Convergence, to qualify particular isoforms as candidate diagnostic markers, potential therapeutic targets, and perhaps even as precursor therapeutics themselves. We apply this paradigm to quantitatively investigate the alternative splicing of CD44 in two leukemias. CD44 is an alternatively spliced cell surface receptor that is generally implicated in cancer, though the specifics are mired in controversy. In this work, we suggest several corrections to previous claims about the presence of specific CD44 exons and specific CD44 isoforms in leukemia as well as in non-diseased cells. Furthermore, we provide not only the first comprehensive characterization of the alternative exon splicing of CD44 (or of any molecule) in human cells, but also the resulting quantities of its exons and isoforms, at an average resolution on the order of 10^6 molecules. Finally, we identify specific isoforms in each leukemia that may serve as candidate markers or possibly as therapeutic targets.

Thesis Supervisor: George M. Church, Professor of Genetics, Harvard Medical School

Varma, Chris 2 of 65 Ph.D. Thesis

Table of Contents

OVERVIEW
CONTRIBUTIONS OF THIS THESIS
CD44 CONTROVERSY
BACKGROUND
0.1. PRE-MRNA ALTERNATIVE SPLICING
Overview & Major Players
Alternative Splicing, Disease, & Therapeutic Intervention
0.2. HCD44
Structure & Function
CD44 Alternative Splicing
CD44 & Implications in Disease
CD44 in B-cell & T-cell Development
0.3. SPLICING-INSIGHTS DRIVEN THERAPY
0.4. LEUKEMIA
ALL
AML
Diagnosis of Leukemia
Treatment of Leukemia
CHAPTER 1: PROFILING OF ALTERNATIVE EXON SPLICING
1.1. INTRODUCTION
1.2. POLONY TECHNOLOGY FOR EXON PROFILING
1.3. OVERVIEW OF EXPERIMENTAL METHODS
1.4. PRIMER DESIGN
1.5. POLONY SLIDE CREATION & SUBSEQUENT PREPARATION
1.6. SINGLE BASE EXTENSIONS & SCANNING
1.7. REAL-TIME QPCR
1.8. CONCLUSION
CHAPTER 2: CONSTRUCTION OF EXPRESSION PROFILES
2.1. INTRODUCTION
2.2. OVERVIEW OF SOFTWARE
2.3. IMAGE PROCESSING
2.4. THRESHOLDING
2.5. POLONY IDENTIFICATION
2.6. ISOFORM CONSTRUCTION
2.7. SPECIAL CASES & OTHER SOFTWARE
2.8. PERFORMANCE OF SOFTWARE
2.9. CONCLUSION
CHAPTER 3: CHARACTERIZATION OF SAMPLES
3.1. INTRODUCTION
3.2. OBTAINING SAMPLES
3.3. SEMI-QUANTITATIVE ANALYSIS OF CD44
3.4. QUANTIFYING VARIANT EXONS
3.5. CONCLUSION
CHAPTER 4: EXON EXPRESSION PROFILES
4.1. INTRODUCTION
4.2. QUANTITATIVE COMPARISON OF AML TO NORMAL
4.4. QUANTITATIVE COMPARISON OF AML TO ALL
4.5. QUANTITATIVE COMPARISON OF ALL TO NORMAL
4.6. QUANTITATIVE COMPARISON OF CANCER TO NORMAL
4.7. CONCLUSION
CHAPTER 5: ISOFORM EXPRESSION PROFILES
5.1. INTRODUCTION
5.2. COMMON ISOFORMS
5.3. QUANTITATIVE COMPARISON OF AML TO NORMAL
5.4. QUANTITATIVE COMPARISON OF AML TO ALL
5.5. QUANTITATIVE COMPARISON OF ALL TO NORMAL
5.6. QUANTITATIVE COMPARISON OF CANCER TO NORMAL
5.7. CONCLUSION
CHAPTER 6: IN PURSUIT OF EXCLUSIVE CONVERGING-ISOFORMS
6.1. INTRODUCTION
6.2. INTRODUCING ISOFORM CONVERGENCE
6.3. FINDING CONVERGENCE
6.4. CONVERGING-ISOFORMS
6.5. EXCLUSIVE CONVERGING-ISOFORMS
6.6. IDENTIFICATION OF POSSIBLE CANDIDATE TARGETS
6.7. CONCLUSION
CONCLUSION
SUPPLEMENTARY RESULTS
S.1. TM-EXON SKIPPING
Initial Observation, Identification & Background
Isoform Expression Profiles
Universal Expression
S.2. EXON EXPRESSION PROFILE OF CELL LINES
Obtaining Samples
Quantitative Comparisons

Acknowledgments

I am indebted to George Church, Peter Szolovits, and Srini Devadas for their invaluable help, advice, support, and mentorship in all aspects of my thesis and graduate career. I would like to thank Jun Zhu for his significant guidance and support. In addition, I would like to acknowledge the contribution of members of the Church lab, particularly Jay Shendure and Kun Zhang, for providing useful feedback.

Overview

Contributions of This Thesis

The end goal of our research is to enable meaningful quantitation in biology that leads to the generation of new therapeutic targets and new markers for the diagnosis and prognosis of malignant disease. Since alternative splicing and general splicing are fundamental processes, it is conceivable that some diseases or abnormalities may be associated with changes or defects in the cellular splicing machinery [Kra97], resulting in transcripts that are inappropriately spliced given the cell type, physiologic conditions, environment, etc. It is also conceivable that consistent abnormalities in alternative splicing reflect general dysfunction of the cellular machinery due to a cell's compromised or cancerous state. In fact, (1) alternative splicing is a crucial step in gene expression; (2) alternative splicing is therefore subject to tight regulation; and (3) given its importance in conferring diversity, alternative splicing is very sensitive to changes in cellular states, including malignant or abnormal states. Therefore, alternative splicing provides a unique angle from which to study disease, particularly malignant disease. CD44 defects are strongly associated with malignant disease.
CD44 is a cell surface molecule involved in numerous cellular processes, including cellular communication and cell-matrix interaction, which are important for tumor progression. CD44 is alternatively spliced via 10 variant exons in most species (though only 9 in humans, not including the tail region), enabling 1024 (2^10) theoretical isoforms, and specific isoforms are correlated with particular malignant states, including leukemic states. In fact, peripheral blood lymphocytes (PBLs) of patients with AML, CML, and ALL have significantly elevated levels of alternatively spliced CD44 isoforms, which are undetected in normal patients [Kha96]. However, not even a semi-quantitative evaluation of these levels has been successfully obtained, which has led to significant controversy in the current CD44 literature. Therefore, CD44 serves as a model molecule with which to begin studying alternative splicing in malignant disease.

In this work, we present a new paradigm by which to quantitatively study the alternative splicing of any molecule in any cell type through a single-molecule-based technology [Mit99] and our methods of quantitative exon profiling and quantitative isoform profiling. Furthermore, we extend this paradigm to include a novel unified platform, called Isoform Convergence, to qualify particular isoforms as candidate diagnostic markers, potential therapeutic targets, and perhaps even as precursor therapeutics themselves. Thus our purpose and goals are the following:

1. To evaluate our proposed methods of quantitative exon profiling, quantitative isoform profiling, and Isoform Convergence for their ability to generate new and biologically meaningful discoveries.
2. To provide unprecedented quantitative insight into the splicing of CD44 in two human leukemias, while resolving controversy in the relevant literature, through novel methods of quantitative exon profiling and quantitative isoform profiling.
3. To investigate whether alternative splicing of CD44 confers a unique and consistent signature in two human leukemias, through a novel method called Isoform Convergence.

CD44 Controversy

According to [Her00], "The type of action of CD44 on a given cell will depend on the isoform pattern of CD44 expressed...despite a flood of more than 2,000 papers on CD44, the correlation of its detection with disease progression has remained controversial, and fundamental questions on the function of the CD44 proteins have not been answered."

Several studies on the correlation of CD44 isoforms with particular diseases have reported inconsistent results and conclusions. For example, [Hei93] and [Mul94] demonstrated that CD44v6 (exon-v6-containing CD44) correlated with colorectal carcinoma, whereas [Fin95] showed that CD44v6 does not correlate with colorectal carcinoma. As another example, [Kau95] reported that CD44v3, v5, and especially v6 epitopes were expressed in most primary breast cancer tumor samples but were not present in normal mammary ductal epithelial cells, and that expression of variant isoforms, especially CD44v6, was correlated with shorter survival. In direct contrast, [Foe99] reported that expression of CD44v6 may be a marker identifying patients with a relatively favorable prognosis.

We speculate that the controversy in the literature is mainly due to the semi-quantitative nature of the assays used and the inability to distinguish individual splicing isoforms. We believe that a single-molecule-based technique (i.e., Polony Technology), along with our methods of quantitative exon profiling, quantitative isoform profiling, and Isoform Convergence, can help to resolve such controversy by enabling direct quantification of splicing isoforms. Furthermore, by being able to consider the complete isoform expression of CD44 at the pre-mRNA level, we can elucidate statistical profiles of alternatively spliced transcripts in a quantitative manner that together implicate disease.
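The claim that exon-level assays cannot distinguish individual isoforms can be made concrete with a toy calculation. The Python sketch below uses two hypothetical transcript populations (the counts are invented for illustration and are not data from this work) and shows that they yield identical per-exon inclusion counts even though their isoform compositions differ, so only an isoform-resolving measurement can tell them apart. The helper name `exon_counts` is illustrative, not part of any published tool.

```python
from collections import Counter


def exon_counts(isoform_molecules):
    """Collapse isoform-level molecule counts into per-exon inclusion counts.

    `isoform_molecules` maps an isoform (a frozenset of included variant
    exons) to the number of molecules of that isoform.
    """
    counts = Counter()
    for isoform, n in isoform_molecules.items():
        for exon in isoform:
            counts[exon] += n
    return counts


# Two HYPOTHETICAL samples of 100 CD44 transcripts each (illustrative numbers only).
# Sample A: half the molecules carry both v3 and v6; half carry no variant exons.
sample_a = {frozenset({"v3", "v6"}): 50, frozenset(): 50}
# Sample B: half the molecules carry only v3; half carry only v6.
sample_b = {frozenset({"v3"}): 50, frozenset({"v6"}): 50}

print(exon_counts(sample_a) == exon_counts(sample_b))  # True: identical exon-level profiles
print(sample_a == sample_b)                             # False: different isoform populations
```

An assay that reports only how often each exon appears would score these two samples as identical, whereas counting whole molecules separates them.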
Therefore, this study is important not only as a basic biological study of alternative splicing regulation but also as a possible development program for the identification of potential diagnostic markers and therapeutic targets.

Background

0.1. Pre-mRNA Alternative Splicing

Pre-messenger RNA (pre-mRNA) splicing is a crucial step in mammalian gene expression. The removal of introns has to be highly precise in order to produce the appropriate message (i.e., mRNA) for protein production. Splicing can also be alternatively regulated, which is one of the major mechanisms giving rise to proteome diversity (see Figure 1). Given the importance of pre-mRNA splicing in gene expression, it is conceivable that splicing is under tight spatial and/or temporal control [Has01]. Alternative splicing refers to the splicing of variable exons at the pre-mRNA step, which generates different messages through exon inclusion or exon skipping (see Figure 1) and results in functionally diverse protein isoforms in a spatially and temporally regulated manner [Has01].

[Figure 1: mouse immunoglobulin μ heavy-chain pre-mRNA (constant exons C1 to C4 and membrane exons M) and the resulting secreted and membrane mRNAs]

Figure 1. Alternative splicing of the mouse immunoglobulin μ heavy chain results in two distinct mRNAs, one encoding the secreted form of the antibody and another encoding the membrane-bound form. (Note: introns are represented by the straight line in the pre-mRNA step; the mRNAs contain no introns. Figure from http://www.blc.arizona.edu/marty/411/Modules/altsplice.html.)

Overview & Major Players

Splicing occurs in the nucleus of cells and is executed by the spliceosome, which consists of five small nuclear ribonucleoprotein (snRNP) complexes, U1, U2, U4, U5, and U6, and many non-snRNP proteins [Mar00]. The spliceosome precisely excises introns and joins exons in sequential order (see Figure 2) through numerous RNA-RNA, RNA-protein, and protein-protein interactions [Has01].

Figure 2.
Cartoon depiction of the spliceosome as it splices out introns and splices in/out variable exons. This process is not well understood, and this depiction may not be accurate. Figure from http://www.blc.arizona.edu/marty/411/Modules/spliceo.html.

There are five major modes of alternative splicing, namely alternative 5' splice-site choice, alternative 3' splice-site choice, exon skipping, intron retention, and mutually exclusive exons. It has been estimated that at least 59% of human genes are alternatively spliced, and approximately 80% of these result in changes in the encoded protein [Wau03, Has01].

Alternative splicing has a plethora of functional effects on mammalian gene expression. Generating several mRNA variants from a single gene allows functionally diverse protein isoforms to be expressed according to different regulatory programs. The regulatory programs are cell specific, and the splicing pathways are modulated according to cell type, developmental stage, gender of the organism, external stimuli, etc. [Wau03]. Alternative splicing results in structural variation through the insertion or removal of amino acids, shifting of the reading frame, and introduction of termination codons. It also enables variability in gene expression through the removal or insertion of regulatory elements that control translation, mRNA stability, or localization [Wau03].

Alternative Splicing, Disease, & Therapeutic Intervention

Although alternative splicing accounts for much genetic variability by generating multiple protein isoforms, aberrant splicing (alternative splicing that has lost regulation due to a particular defect) can result in disease [Ble00]. For example, Thalassemia is an inherited condition that results in impaired production of either the alpha or the beta hemoglobin chain, causing severe anemia and possible organ and bone defects.
Patients with β-Thalassemia carry mutations in the HBB (beta-globin) gene that activate cryptic splice sites in beta-globin pre-mRNA, resulting in a deficiency of adult hemoglobin A [Has01]. There are several other cases of splicing defects leading to disease (see Table 1) [Ars00], [Liu01].

Disorder | Gene
Acute intermittent porphyria | Porphobilinogen deaminase
Breast and ovarian cancer | BRCA1
Carbohydrate-deficient glycoprotein syndrome type Ia | PMM2
Cerebrotendinous xanthomatosis | Sterol 27-hydroxylase
Cystic fibrosis | CFTR
Ehlers-Danlos syndrome type VI | Lysyl hydroxylase 1
Fanconi anemia | FANCG
Frontotemporal dementia (FTDP-17) | Tau
Hemophilia A | Factor VIII
HPRT deficiency | Hypoxanthine phosphoribosyltransferase
Leigh's encephalomyelopathy | Pyruvate dehydrogenase E1α
Marfan syndrome | Fibrillin
Metachromatic leukodystrophy (juvenile form) | Arylsulfatase A
Neurofibromatosis type I | NF1
OCT deficiency | Ornithine carbamoyltransferase
Porphyria cutanea tarda | Uroporphyrinogen decarboxylase
Sandhoff disease | Hexosaminidase
Severe combined immunodeficiency | Adenosine deaminase
Spinal muscular atrophy | SMN1
Spinal muscular atrophy | SMN2
Tyrosinemia type I | Fumarylacetoacetate hydrolase

Table 1. Defective alternative splicing leads to various disease states. Table from: Caceres JF and Kornblihtt AR, Trends Genet 18: 186-93 (2002).

In the case of Thalassemia, scientists reported successful treatment of these patients' erythroid progenitor cells using antisense oligonucleotides targeted to the cryptic splice sites, which restored appropriate splicing of the HBB gene and increased hemoglobin production to near-normal levels [Lac00]. There are numerous other recent examples of such therapeutic interventions in disease states induced by aberrant splicing, with similar beneficial effects [Car03], [Sko03].

0.2. hCD44

Human CD44 (hCD44) is a cell surface glycoprotein receptor for the glycosaminoglycan hyaluronan (HA), which is a major component of extracellular spaces. CD44 is expressed on many types of cells, including most hematopoietic cells (e.g., B-cells, T-cells, and myeloid cells), keratinocytes, chondrocytes, many epithelial cell types, and some endothelial and neural cells [Bio98-2]. The functional role of CD44 (in different cell types or developmental stages) is regulated by alternative splicing and by post-translational modifications such as glycosylation [Abb00].

Structure & Function

Human CD44 is an 80-250 kDa transmembrane glycoprotein [Tan94]. Its primary ligands are osteopontin, fibronectin, collagen (I, IV23), and hyaluronan (HA); HA has been CD44's most important ligand in terms of disease implication [Uku01]. More recently, however, osteopontin has also been implicated in disease progression. For example, [Kha02] demonstrated that binding of osteopontin to variant isoforms of CD44 has anti-apoptotic effects, which may help cancerous cells resist apoptosis.
CD44 is a multifunctional receptor involved in cell-cell and cell-ECM (extracellular matrix) interactions, cell trafficking, T-cell and B-cell adhesion, lymph node homing, presentation of chemokines and growth factors to traveling cells, transmission of growth signals, uptake and intracellular degradation of HA, and transmission of signals mediating hematopoiesis and apoptosis. The structure of CD44's HA-binding domain is shown in Figure 3a and 3b. (Crystal structures of complete CD44 are not available; see the Protein Data Bank.) The cytoplasmic domain of CD44 (approximately 70 amino acids long) is highly conserved (> 90%) in most of the CD44 isoforms [Bio98-2].

CD44 Alternative Splicing

The human CD44 gene contains 20 exons in total (see Figure 4). (There are two different nomenclatures in the hCD44 literature for designating exons; here we use the original designation.) Exons 1-5, 15, and 16 are constant exons that code for the extracellular domain. Exons 6a and 6b through 14 are variably spliced and are also designated V1 to V10. (Note: exon 6a, or V1, is thought not to be expressed in humans.) Exon 17 codes for the transmembrane segment (21 amino acids long); this exon thus enables CD44 to be expressed on the cell surface. Exons 18 and 19 code for the cytoplasmic tail (72 amino acids long) and are mutually exclusive.

[Figure 4: exon map of hCD44 across the extracellular, transmembrane, and intracellular domains, with the original exon numbering (1-5, 6a, 6b, 7-19) above and the new numbering (1-20) below]

Figure 4. Exons of hCD44 in the 5' to 3' direction. TM denotes transmembrane. Exon 6a includes a stop codon and is thought not to be expressed in humans. Note: hCD44 has two nomenclatures; the original nomenclature is on top and the new nomenclature is below.

The common or standard form of human CD44 is the hematopoietic form, designated CD44H or CD44s ('s' for standard).
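If each variant exon is treated as independently included or skipped, n variant exons allow 2^n combinations. The Python sketch below enumerates these combinations and names them in the CD44v notation used in the literature (for example, CD44v3,v6, with CD44s for no variant exons). It is a minimal illustration under stated assumptions: every inclusion/skipping pattern is assumed to be possible, and the mutually exclusive tail exons (18 and 19) are ignored. It reproduces the 1024 theoretical isoforms for V1 to V10 and the 512 combinations that remain if V1 is excluded in humans. The function names are illustrative only.

```python
from itertools import combinations

# Variant exons in 5'->3' order (original exons 6a, 6b-14, i.e., V1 to V10).
VARIANT_EXONS = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10"]


def isoform_label(included):
    """Name an isoform by its included variant exons; none included -> CD44s."""
    if not included:
        return "CD44s"
    return "CD44" + ",".join(included)


def enumerate_isoforms(variant_exons):
    """Yield a label for every inclusion/skipping pattern of the variant exons.

    combinations() preserves input order, so labels keep 5'->3' exon order.
    """
    for k in range(len(variant_exons) + 1):
        for included in combinations(variant_exons, k):
            yield isoform_label(included)


all_isoforms = list(enumerate_isoforms(VARIANT_EXONS))
# Assumption: V1 (exon 6a) is never included in human transcripts.
human_isoforms = list(enumerate_isoforms(VARIANT_EXONS[1:]))

print(len(all_isoforms))    # 1024
print(len(human_isoforms))  # 512
print(human_isoforms[:3])   # ['CD44s', 'CD44v2', 'CD44v3']
```

Only a small fraction of these theoretical combinations have actually been observed (at least 45 variants in human cells [Van93]), which is why counting which isoforms are present matters.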
CD44H does not contain any variant exons (see Figure 5), includes exon 19 as the tail segment, and is approximately 270 amino acids long. Besides the common form, at least 45 alternatively spliced variants have been found to exist [Van93]. In the murine Eph4 cell line, 69 distinct CD44 isoforms have been reported [Zhu03]. Variant exons are alternatively spliced into the membrane-proximal extracellular region of CD44 (see Figure 5), which is involved in ligand binding.

[Figure 5: cartoon of CD44 showing the variant-exon (alternative splice) region, the transmembrane domain, and the cytoplasmic domain]

Figure 5. Cartoon representation of CD44, including the location of variant exons.

The results of [Sle97] indicate that alternative splicing regulates the ligand-binding specificity of CD44 and suggest that structural changes in the CD44 protein have a profound effect on the range of ligands to which CD44 can bind, with potentially wide-ranging functional consequences. Specifically, they report that isoforms containing exons v6 and v7 enable direct binding to chondroitin sulfate, heparin, and heparan sulfate in addition to HA. It has also been established that splicing isoforms of CD44 affect the binding affinity of particular growth factors and growth-promoting proteins, because the larger isoforms better stabilize clusters of CD44 molecules [Her00]. In addition, there is evidence that CD44's function is enhanced or modulated by clustering of CD44 molecules [Yu99], and one way found to enhance CD44 clustering is the expression of alternatively spliced isoforms [Sle96]. Interestingly, the effect of including alternative isoforms on overall CD44 function is analogous to the effect of HA binding to and stimulating CD44 [Sle96].

CD44 & Implications in Disease

Both the standard form of CD44 and CD44 variants have been implicated in a plethora of human cancers.
The work of [Bou97] indicates that CD44s and p185-HER2 are physically linked to each other via interchain disulfide bonds on the surface of ovarian carcinoma cells. Further, HA stimulates CD44s-associated p185-HER2 tyrosine kinase activity, which leads to increased growth of the ovarian carcinoma [Bou97]. In [Shu01], the authors show that CD44v4-v10 confer enhanced in vitro rolling, enhanced in vivo local tumor growth, and lymph node invasion by lymphoma cells. Further, a site-directed point mutation at the HA-binding site of the variants resulted in loss of these enhanced functions. In another study [Kat99], the authors demonstrate that several different variants bind osteopontin (OPN), but CD44s does not. Moreover, the expression of OPN and of CD44 variants has been correlated with tumorigenesis and metastasis. Further investigation has shown that OPN binding by CD44 variants promotes cell spreading, motility, and chemotactic behavior.

The work of [Kha96] elucidates the expression of CD44 variants in lymphomas and leukemia. In the study, analysis of peripheral blood lymphocytes (PBLs) from 30 normal individuals and from 183 patients with hematologic disorders revealed that only in patients with malignant disorders did a measurable proportion of PBLs express CD44 variant isoforms, mostly containing exons v5, v6, and v7 and less frequently v10. Elevated levels of these variant isoforms were present in the following percentages of patients for each hematologic disorder: acute myeloid leukemia (AML) 16%, chronic myeloid leukemia (CML) 25%, acute lymphoid leukemia (ALL) 23%, Hodgkin's disease 17%, non-Hodgkin's lymphoma 54%, and multiple myeloma 22%. In addition, expression of CD44v in PBLs was not linked to the histological grading or clinical staging of disease.
CD44 in B-cell & T-cell Development

Antibodies to CD44 (isoform non-specific) block the development of precursor cells in the marrow, both myeloid and lymphoid, in culture, but it is not known how CD44 and hyaluronan (HA) function in the bone marrow [Jan96]. This suggests that CD44 is present throughout B-cell development. During T-cell development, progenitor T-cells migrate from the marrow to the thymus, where they are called thymocytes. Initially, thymocytes express CD44 for a short time, but as they mature they lose expression of CD44. Since this is a negative selection process, it is highly unlikely that these cells could become neoplastic [Jan96]. After development, both effector T-cells and memory T-cells express significant levels of CD44 [Abb00]. The ability of CD44 to bind HA is responsible for the retention of T-cells in extravascular tissues at sites of infection and for the binding of effector and memory T-cells to endothelium at sites of inflammation and in mucosal tissues [Abb00].

0.3. Splicing-Insights Driven Therapy

Although a CD44 antibody has been found that identifies an epitope that is aberrantly expressed (presumably through alternative exon splicing) and significantly upregulated on acute lymphoblastic leukemia (ALL) cells [Ben03], a rational method of target identification via insights gained from alternative splicing has not yet been developed. We theorize that this is due to the semi-quantitative nature of RT-PCR and antibody-based assays, as well as to the inability, until recently, to study exon combinatorics.

However, there are several cases where insights gained from studying the aberrant alternative-exon splicing of a particular gene in a disease state led to the re-establishment of correct splicing, which subsequently re-enabled normal function.
By using antisense oligonucleotides to correct splicing, normal gene expression has been established in cellular models of β-thalassemia [Suw02], cystic fibrosis [Fri99], and Duchenne muscular dystrophy [Wil99]. Antisense oligonucleotides can interact with mRNA or its precursors in a sequence-specific fashion, thereby affecting the expression of the transcript. As we have seen (see Table 1), disease-causing mutations frequently act by affecting pre-mRNA splicing, often resulting in unique transcripts. Antisense oligonucleotides can be targeted against these unique transcripts to restore correct splicing [Sie00, Sie99]. For example, patients with β-thalassemia carry mutations in the HBB (beta-globin) gene that activate cryptic splice sites in beta-globin pre-mRNA, resulting in a deficiency of adult hemoglobin A [Has01]. In [Suw02], correct human beta-globin mRNA was restored in erythroid cells from transgenic mice carrying the human gene by correcting the splicing of thalassemic human β-globin pre-mRNA via an oligonucleotide targeted to the aberrant 5' splice site. Aberrantly (ab) and correctly (c) spliced mRNAs are shown in Figure 6.

[Figure 6: human β-globin IVS2-654 pre-mRNA and oligonucleotide; aberrant (ab) and correct (c) products ab IVS2-654 (367), c IVS2-654 (294), c IVS2-654 (231), ab IVS2-654 (304).]

Figure 6. Correction of splicing of thalassemic human β-globin pre-mRNA by an oligonucleotide targeted to the aberrant 5' splice site [Suw02]. Aberrantly (ab) and correctly (c) spliced mRNAs are shown. Forward primers a and b were used for patient and murine RNA, respectively.

This was accomplished in a dose- and time-dependent manner by free uptake of a morpholino oligonucleotide antisense to the aberrant splice site at position 652 of intron 2 in beta-globin pre-mRNA.
Under optimal conditions of oligonucleotide uptake, the maximal levels of correct human beta-globin mRNA and hemoglobin A in patients' erythroid cells were 77% and 54%, respectively. These levels of correction were equal to, if not higher than, those obtained by syringe loading of the oligonucleotide into the cells. The effectiveness of the free antisense morpholino oligonucleotide in restoring correct splicing suggests the applicability of this or similar compounds in in vivo experiments and possibly in the treatment of thalassemia.

0.4. Leukemia

Leukemia is a disease of the reticuloendothelial system that involves uncontrolled proliferation of leukocytes (i.e. white blood cells). Leukemia is generally thought to be an acquired (i.e. non-inherited) cancer, though genetic abnormalities (e.g. the Philadelphia chromosome) may also play a role in its development [Hea02]. Leukemia originates in an early cell in the blood-forming marrow or in the portion of the lymphoid system in the marrow. The major forms of leukemia are divided into four categories. Myelogenous (i.e. myeloid) and lymphocytic (i.e. lymphoid) refer to the progenitor cell type involved. Myeloid cells differentiate into erythrocytes, granulocytes, macrophages, monocytes, and platelets [Jan96]. Lymphoid cells differentiate into two types of lymphocytes: B-cells and T-cells. Acute and chronic refer approximately to the rate of progression of the disease. Thus, the four major types of leukemia are acute myelogenous leukemia (AML), chronic myelogenous leukemia (CML), acute lymphocytic leukemia (ALL), and chronic lymphocytic leukemia (CLL). (In our work, we examine only AML and ALL.) The term acute lymphocytic leukemia is synonymous with acute lymphoblastic leukemia; the latter term is more frequently used to denote cases in children [Hea04]. Acute leukemia is a rapidly progressing disease that mostly affects cells that are not yet fully developed or differentiated.
These immature cells cannot carry out their normal functions. In addition, acute leukemia cells strongly impede the production of normal blood cells because their uncontrolled multiplication crowds out normal cells. In contrast, chronic leukemia progresses slowly and, at least initially, does not significantly impede the production and development of normal blood cells. In addition, these more mature cells can carry out some of their normal functions [Leu04].

ALL

Most lymphoid neoplasms, including ALLs, are of precursor B-cell origin (80 to 85%), whereas the remainder are primarily of T-cell origin [Cot99]. The majority of ALLs (85%) manifest as childhood leukemia with extensive bone marrow involvement [Cot99]. Clinically and morphologically, pre-B and pre-T lymphoblastic malignancies are indistinguishable; differentiating them therefore requires immunophenotyping [Cot99]. To further complicate the clinical picture, not only do pre-B and pre-T lymphoblastic malignancies present similarly, but they also present with clinical features similar to those of AML, even though ALL and AML are immunophenotypically and genotypically distinct diseases. This has been attributed to the fact that in both cases neoplastic blast cells accumulate in the bone marrow and suppress normal hematopoiesis by physical crowding and other mechanisms [Cot99]. This manifests as anemia, thrombocytopenia (i.e. low platelet count), and neutropenia (i.e. low neutrophil count), which are shared by both ALL and AML.

AML

Acute myelogenous leukemia is characterized by the rapid proliferation of precursor myeloid cells (i.e. blood-forming cells), resulting in the clinical picture shared with ALL: a deficiency of red cells, decreased numbers of platelets, and a reduced count of normal white cells (especially neutrophils) in the blood [Leu04].
Also, the rapid proliferation of these precursor myeloid cells, along with a reduction in their ability to undergo programmed cell death (apoptosis), results in their accumulation in various organs, most commonly the spleen and liver [Eme04].

Diagnosis of Leukemia

Diagnosis of a particular leukemia usually involves the following steps:
· Medical history and physical examination
· Complete blood counts
· Bone marrow examination for the presence of leukemic blast cells
· Cytogenetics: patient tissue is used to analyze the number and shape of the chromosomes of cells
· Immunophenotyping: a method that uses the reaction of antibodies with cell antigens to determine a specific type of cell in a sample of blood cells, marrow cells, or lymph node cells

To diagnose leukemia, including its particular type, the blood and marrow cells must be examined. In addition to low red blood cell and platelet counts, examination of stained (dyed) blood cells with a light microscope usually shows the presence of leukemic blast cells. This is confirmed by examination of the marrow, which usually shows leukemia cells. The blood and/or marrow cells are also used for studies of the number and shape of chromosomes (cytogenetic examination), immunophenotyping, and other special studies, if required. Blood and bone marrow samples are used to diagnose and classify the disease. The following tests are used in the further classification of the disease. Examination of leukemic cells by cytogenetic techniques permits identification of chromosome or gene abnormalities in the cells. The immunophenotype and chromosome abnormalities in the leukemic cells are very important guides in determining the approach to treatment and the intensity of the drug combinations to be used [Leu04].

CML (and, less frequently, ALL) is unusual in that a specific molecular test exists for its diagnosis.
The Philadelphia chromosome is a translocation between chromosomes 9 and 22 [t(9;22)(q34;q11)], which is diagnostic for 95% of chronic myelogenous leukemia (CML) and for a subset of acute leukemia (-l-13% of ALL). Molecular detection of the BCR-ABL translocation is performed using reverse transcription-polymerase chain reaction (RT-PCR) analysis. This is currently performed at hospital labs such as Barnes-Jewish Hospital in St. Louis, Missouri, which is associated with Washington University in St. Louis [Bar97]. In general, however, molecular techniques are just beginning to enter the clinical setting for diagnosing leukemia.

Treatment of Leukemia

Leukemia is usually treated by chemotherapy and radiation therapy, which aim to reduce the growth of the leukemic cells in the bone marrow, and by bone marrow transplantation [Med02]. For CML in particular, the drug imatinib (Gleevec) slows the proliferation of precursor myeloid cells by inhibiting the BCR-ABL tyrosine kinase. However, resistance to Gleevec has become a common problem due to approximately 20 different point mutations arising in BCR-ABL. There are currently several next-generation imatinib-like drugs in late-stage development for the treatment of CML.

Chapter 1: Profiling of Alternative Exon Splicing

Note: a subset of the CD44 SBE and polony amplification primers shown in Table 1 were originally designed and ordered by Jun Zhu, formerly a postdoc in the laboratory of George Church at Harvard Medical School.

1.1. Introduction

In this chapter, we discuss the experimental methods developed and used for profiling alternative exon splicing: quantitative exon profiling and quantitative isoform profiling. However, this is just the first part of creating such profiles; Chapter 2 introduces the computational algorithms and methods necessary to provide quantification.
To profile exons, we present a method for in situ polymerase chain reaction in acrylamide gels followed by querying with single base extensions specific for each exon of interest. The enabling wetlab technology for this method was previously developed by [Mit99] and improved by [Zhu03]. Our method, taken as the sum of both the wetlab and computational components (as presented in Chapter 2), significantly improves on previous work and enables a higher level of robust quantitation.

1.2. Polony Technology for Exon Profiling

Polymerase colony (polony) technology enables parallel amplification of millions of DNA or RNA molecules by performing polymerase chain reaction (PCR) within an acrylamide gel on a thin glass slide. This process of solid-phase PCR results in each template giving rise to a unique and distinct colony of amplified products known as a polony, and thus each polony is monoclonal [Mit99, Mit03]. During PCR amplification, when two polonies come into contact they tend to form a distinct border, excluding each other rather than overlapping or invading each other; thus they are spatially distinct [Aac04]. Therefore, each polony is effectively an independent PCR reaction on the order of nanoliters to femtoliters in volume. Furthermore, because an acrydite modification is included at the end of one of the amplification primers, one strand of each amplicon is covalently linked to the acrylamide matrix of the gel and serves as a template for single base extensions (SBEs) and for probe hybridizations (see Figure 1). For more information on polony technology, please visit the polony website at http://arep.med.harvard.edu/Polonator/

Figure 1-1. The core of Polony Technology.
[Figure 1-1 panels: pour acrylamide gel with DNA and PCR reagents; single DNA molecule; in-gel PCR amplification; polony (PCR colony) after imaging.]

It has been shown that combinatorial patterns of exon inclusion or exclusion can be determined across multiple polonies in parallel through an exon profiling method [Zhu03]. Since each polony arises from a single molecule, and because of the digital nature of polony technology, interrogation of polony slides enables sensitive and accurate quantification of individual mRNA isoforms. In fact, in cases of complex alternative splicing (i.e. many alternative exons), polony technology is the only practical method to quantify specific isoforms. Furthermore, it is theorized that polony technology has even less bias than traditional PCR because 1) it is solid-phase, and thus analogous to thousands or millions of separate femtoliter reactions, and 2) in traditional PCR, amplicons of different sizes compete for primers and amplify with different efficiencies, creating a bias.

1.3. Overview of Experimental Methods

Figure 1-2. Simple conceptual example of creating a polony slide and querying it by single base extensions (SBEs).

A polony slide is created by first polymerizing acrylamide in a solution containing standard PCR reagents and a very low concentration of cDNA template; this forms the gel [Mit99]. Before the gel is polymerized, the solution is added to a slide that has been treated with Bind-Silane so that the gel can attach to the glass. Additionally, the reverse primer is Acrydite (Ac) modified on one end so that it can be covalently attached to the gel matrix.
Thus, after the slide is thermal cycled, a single cDNA template has given rise to a PCR colony, or 'polony', because each cDNA template was immobilized by virtue of being bound to the Acrydite-modified reverse primer, and the products of the reaction remain localized near their respective templates. Each polony can now be queried for its signature by single base extensions in parallel (see Figure 2). This involves designing primers that bind uniquely to each exon of interest. A fluorescently labeled deoxynucleotide (dNTP) is incorporated at the base immediately following the 3' end of the primer, so each exon can be uniquely queried by imaging the slide with an integrated array scanner.

[Figure 1-2 panels: assume the following pre-mRNA (5' to 3'); patient cDNA; 1. pour gel with PCR reagents & DNA (single molecule); 2. in-gel PCR amplification; 3. query exon C2 by SBE; 4. query V1 & V2 by SBE; 5. aggregate profile.]

1.4. Primer Design

All primers were designed using the Primer3 software package [Roz00]. Primers, along with their names and brief purposes, are listed in Table 1. Primers to quantify CD44s and CD44v (L_5,17 and L_5,14, respectively) were designed such that approximately 1/4 of the primer's bases correspond to the end of exon 5 and the rest correspond to the beginning of exon 17 or exon 14, respectively. These junction-spanning primers were designed to reduce false positives during real-time QPCR experiments. All primers were checked for the appropriate PCR product.
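The junction-spanning design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Primer3 workflow actually used: the helper names `junction_primer` and `binds_only_junction` are hypothetical, the exon sequences in the usage example are made-up placeholders rather than CD44 sequence, and the default 1/4 split simply mirrors the proportion stated in the text.

```python
def junction_primer(upstream_exon: str, downstream_exon: str,
                    length: int = 20, upstream_fraction: float = 0.25) -> str:
    """Build a sense-strand primer spanning the upstream/downstream exon junction.

    About `upstream_fraction` of the bases come from the 3' end of the upstream
    exon and the rest from the 5' start of the downstream exon, so the primer
    matches the spliced product but neither exon on its own.
    """
    k = round(length * upstream_fraction)  # bases taken from the upstream exon
    if not 0 < k < length:
        raise ValueError("primer must contain bases from both exons")
    if len(upstream_exon) < k or len(downstream_exon) < length - k:
        raise ValueError("exon sequences are too short for the requested primer")
    return upstream_exon[-k:] + downstream_exon[:length - k]


def binds_only_junction(primer: str, upstream_exon: str, downstream_exon: str) -> bool:
    """Exact-match check that the primer is absent from each exon alone."""
    return primer not in upstream_exon and primer not in downstream_exon


# Hypothetical placeholder sequences (not CD44).
exon_a = "AAAAACCCCCGGGGGTTTTT"
exon_b = "ACGTACGTACGTACGTACGT"
primer = junction_primer(exon_a, exon_b)
print(primer, binds_only_junction(primer, exon_a, exon_b))
```

The exact-match check only captures the idea behind reducing false positives (a primer that cannot anneal fully to unspliced or differently spliced template); a real design would also weigh melting temperature, 3' end stability, and secondary structure, which Primer3 handles.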
NAME             SEQUENCE                            PURPOSE
L_5,17           TCACTGTTCCTGATTGCTCA                Quantify CD44s
L_5,14           ATCACTGCTGATTCCACCTC                Quantify CD44v
R_19(s,v)        AGCACAAAAGGTGAAGATCG                Quantify CD44s/CD44v
hCD44ETE19R2     TTTCCTGAGACTTGCTGGCCTCTCC           Quantify total CD44
hCD44ETE4F       ACAGACCTGCCCAATGCCTTTGATG           Quantify total CD44
BACT F           TCACCCACACTGTGCCCATCTACGA           Quantify B-actin
BACT R           CAGCGGAACCGCTCATTGCCAATGG           Quantify B-actin
hCD44ETE19Rac2   TTTCCTGAGACTTGCTGGCCTCTCC           Polony amplification
hCD44ETE18Rac1   ACAGCCCATGTGTCAGTTCTAGCGA           Polony amplification
hCD44ETE4F       ACAGACCTGCCCAATGCCTTTGATG           Polony amplification
hCD44SBE5F       ACACCCCATCCCAGACGAAGACAG(tccc)      Single Base Extension
hCD44SBE6F       GAGGCAAGAAACCTGGGATTGGTTT(tca)      Single Base Extension
hCD44SBE7AF      GTACGTCTTCAAATACCATCTCAGCAGG(ct)    Single Base Extension
hCD44SBE7BF      TGGATCAGGCATTGATGATGATGAAG(attt)    Single Base Extension
hCD44SBE8F       CCACGGGCTTTTGACCACACAAA(aca)        Single Base Extension
hCD44SBE9F       GAAGCACACCCTCCCCTCATTCAC(cat)       Single Base Extension
hCD44SBE10F      AGAAGGAACAGTGGTTTGGCAACAGA(tgg)     Single Base Extension
hCD44SBE11F      AGGACAACACCAAGCCCAGAGGAC(agtt)      Single Base Extension
hCD44SBE12F      TCCAAACACAGGTTTGGTGGAAGATT(tgg)     Single Base Extension
hCD44SBE13F      TACATCACATGAAGGCTTGGAAGAAGA(taaa)   Single Base Extension
hCD44SBE14F      GCAGGACCTTCATCCCAGTGACCT(cag)       Single Base Extension
hCD44SBE15F      GGGGGTCCCATACCACTCATGGA(tct)        Single Base Extension
hCD44SBE17F      CTTGGCCTTGGCTTTGATTCTTGC(agt)       Single Base Extension

Table 1-1. Primers used for quantification of CD44v, CD44s, total CD44, and B-actin via real-time PCR, and primers used for polony amplification and single base extensions.

1.5. Polony Slide Creation & Subsequent Preparation

Polony amplification was performed similarly to [Mit99, Mit01, Zhu03], but with significant modifications.
5 ul of cDNA sample was added to gel mix (7.5% acrylamide, 0.35% DATD, 0.035% bis-acrylamide, 0.71% each of two 100 uM acrydite-modified reverse primers, 3.5% Rhinohide PCR gel strengthener, 0.2% BSA) along with ammonium persulfate (APS) and TEMED to a final concentration of 0.087%. 18 ul of this solution was added to a Bind-Silane (Pfizer) treated, partially Teflon-coated (Erie Scientific) glass microscope slide (Fisher Scientific). The partial Teflon coat was designed to leave an oval-shaped recessed center where the solution was added. The slide was covered with a glass coverslip (No. 2, Fisher Scientific). The gel polymerized under argon gas for 24 minutes. The slide was washed without its coverslip for 18 minutes in dH2O, then dried under the hood for 16 minutes. 28 ul of polony amplification mix (0.67% BSA, 1.0% 100 uM forward primer, 2.5% 10 mM dNTP, 10% 10x Jumpstart Taq buffer, 6.67% Jumpstart Taq) was added to the slide, which was then covered with a coverslip. The slide was covered with a 65 ul frame-seal chamber (MJ Research). The chamber was filled with 550 ul of mineral oil. The slides were cycled as follows: denaturation (94°C for 3 minutes), then 59 cycles (94°C for 30 s, 56°C for 30 s, 72°C for 2 minutes). Subsequently, the polony slide was denatured in 70% formamide pre-heated to 70°C for 15 minutes (to remove the excess template) and washed three times in wash buffer 1E (10 mM Tris-HCl pH 7.5, 50 mM KCl, 2 mM EDTA, 0.01% Triton X-100).

1.6. Single Base Extensions & Scanning

A frame-seal chamber (MJ Research) was attached to each slide, and 100 ul of annealing mix (0.5% 100 uM SBE primer and 10% 10x Jumpstart Taq buffer) was added to the gel. The slide was heated at 94°C for 6 minutes and cooled to 60°C for 15 minutes. The slide was washed three times in wash buffer 1E and then equilibrated with Klenow extension buffer (50 mM Tris-HCl pH 7.3, 5 mM MgCl2, 0.01% Triton X-100).
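The gel, amplification, and annealing mixes above specify each component as a percentage of a stock solution. Assuming these percentages are volume/volume fractions of the final mix (an interpretation the protocol does not state explicitly), each working concentration is simply the stock concentration times that fraction. The helper name `working_concentration` below is hypothetical; a minimal sketch:

```python
def working_concentration(stock: float, percent_v_v: float) -> float:
    """Final concentration of a component added as `percent_v_v` % of the final volume.

    The result has the same units as `stock` (uM in, uM out; 10x in, x out).
    Assumes the protocol's percentages are volume/volume fractions.
    """
    return stock * percent_v_v / 100.0


# (stock concentration, percent of final volume), taken from the mixes above.
components = {
    "gel mix, each acrydite reverse primer (uM)": (100.0, 0.71),
    "amplification mix, forward primer (uM)": (100.0, 1.0),
    "amplification mix, dNTP (mM)": (10.0, 2.5),
    "amplification mix, Jumpstart Taq buffer (x)": (10.0, 10.0),
    "annealing mix, SBE primer (uM)": (100.0, 0.5),
}

for name, (stock, pct) in components.items():
    print(f"{name}: {working_concentration(stock, pct):g}")
```

Under this reading, each reverse primer sits at 0.71 uM in the gel, the forward primer at 1 uM, the dNTP stock is diluted to 0.25 mM, the 10x buffer ends up at 1x, and the SBE primer is annealed at 0.5 uM.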
Single base extension reactions were conducted by adding fluorescence-labeled dNTP, Klenow polymerase (NEB), Klenow buffer, and single-stranded binding protein (SSB). The slide was incubated at room temperature for 2 minutes. After washing with wash buffer 1E, the slide was scanned using a GenePix 4100B Integrated Array Scanner (Axon Instruments) at 10 um resolution using 635 nm (Cy5 detection) and 532 nm (Cy3 detection) lasers, yielding 16-bit values per pixel. Each slide was queried by single base extension for each of 14 exons (5 - 14, 17 - 19), with a scan after each extension. After each cycle of single base extension, the slide was denatured and scanned again to obtain a valid background image. This results in 7 cycles of single base extensions, 14 scans on two channels, and 28 images per slide (including a background image for each exon queried). (Note: exon 7a was also queried here; data not reported.)

1.7. Real-Time QPCR

Real-time PCR was used to quantify total CD44, a specific variant isoform of CD44 (the isoform containing only variant exon 14), and B-Actin in each sample. Experiments were done in duplicate. An Opticon 2 real-time QPCR machine (Bio-Rad Laboratories) and SYBR Green dye were used to perform the real-time PCR experiments. Each sample was first diluted 10x. Serial dilutions of 1x, 5x, and 25x were then prepared for each sample and each primer pair. Each well contained 20 ul of solution (2.5% appropriately diluted sample, 50% SYBR Green real-time PCR mix, 3% 10 uM reverse primer, 3% 10 uM forward primer). A blank well (i.e. dH2O substituted for sample), associated with each sample for each primer pair, was used for background subtraction.
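The text does not state how the 1x, 5x, and 25x dilution series was analyzed. One common use of such a series is a standard curve: regressing the threshold cycle (Ct) against log10 of the relative template amount gives a slope from which amplification efficiency follows as E = 10^(-1/slope) - 1. The sketch below assumes Ct values have already been exported from the instrument; the function names are hypothetical and the Ct values are synthetic, not measurements from this work.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(dilution_factors, ct_values):
    """Estimate PCR efficiency from a dilution series.

    Ct is regressed on log10 of the relative template amount (1 / dilution).
    For perfect doubling each cycle the slope is about -3.32 and E = 1.0 (100%).
    """
    xs = [math.log10(1.0 / d) for d in dilution_factors]
    slope, _ = fit_line(xs, ct_values)
    return 10 ** (-1.0 / slope) - 1.0, slope


# Synthetic Ct values for an ideal reaction: each 5x dilution adds log2(5) cycles.
dilutions = [1, 5, 25]
cts = [20.0 + math.log2(d) for d in dilutions]
efficiency, slope = amplification_efficiency(dilutions, cts)
print(f"slope = {slope:.3f}, efficiency = {efficiency:.1%}")
```

With real duplicate wells, one would first subtract the blank-well background and average the duplicates before fitting; an efficiency well below 100% for one primer pair would bias comparisons between that target and B-Actin.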
The plates were cycled as follows: denaturation (94°C for 6 minutes), 47 cycles (94°C for 30 s, 58°C for 30 s, plate read, 72°C for 1:30 minutes, plate read), then a melting curve (72°C for 8 minutes, melting curve from 65°C to 90°C, read every 0.2°C, hold s).

1.8. Conclusion

The experimental methods of polony slide creation and single base extension described here were used to generate images that capture the digital representation of the exons present on a slide. Chapter 2 discusses how we obtain data from this large set of images.

Chapter 2: Construction of Expression Profiles

2.1. Introduction

Figure 2-1. Summary of raw data obtained (~15 M polonies, >11 GB of data).

We introduce a software program for the computational processing of all sample slide images acquired in Chapter 1 and the subsequent construction of quantitative exon profiles and quantitative isoform profiles. Computational processing is necessary because of the large amount of data to be processed in this work (see Figure 1). Furthermore, computational processing standardizes the analysis, which reduces inconsistencies among the images analyzed.

2.2. Overview of Software

The goal of the software is simple (see Figure 2): to determine the number and location of the polonies present on each image. The numbers by themselves serve as initial exon counts, which, when further analyzed, are used to create quantitative exon profiles. To obtain the specific isoforms present, we record the location of each polony on each image of a sample and string these images together (in the correct order) to construct the isoform signatures.

Figure 2-2.
Simple conceptual example of the goal of the software: to determine the number of exons on each polony image of each slide and to construct an isoform profile from this information. This figure is consistent with Figure 1 of Chapter 1.

[Figure 2-2 example: pre-mRNA 5' C1 C2 V1 V2 C3 3'; SBE exon counts C1 N/A, C2 5, V1 2, V2 3, C3 N/A; assuming C2 -> C1 -> C3, the isoforms are V1+V2 (count 1), V2 (count 2), V1 (count 1), and no variant exons (count 1).]

These isoform signatures are then used to create the quantitative isoform profiles. Image processing of wetlab-generated gels, recognizing the true structure of a polony, and correctly constructing an isoform signature are significant challenges that require sophisticated techniques and algorithms.

2.3. Image Processing

Figure 2-3. Original image before preprocessing.

The raw 16-bit images obtained by scanning in Chapter 1 (see Figure 3) were first masked with a predefined mask that removed the pixels residing on the non-gel portions of the image. The images (along with their associated background images) were then filtered with a 3 x 3 median filter. Each background image was then subtracted from its corresponding image. The border of each image was then cleaned and smoothed to reduce the noise often found at image borders. The images were then preprocessed using a specific combination of morphological openings and closings, including a top-hat transformation.

2.4. Thresholding

Next, the images were thresholded. Because polony fluorescence on an image lies within roughly the same order of magnitude, and differs from the fluorescence of particulate debris (e.g. dust) and remaining background, we can usually apply a fully automated, means-based iterative thresholding method (Figure 4):

```matlab
% Iterative means-based global threshold; f is the preprocessed image
T = 0.5*(double(min(f(:))) + double(max(f(:))));  % initial guess: midpoint of intensity range
done = false;
while ~done
    g = f >= T;                                    % pixels currently classed as polony
    Tnext = 0.5*(mean(f(g)) + mean(f(~g)));        % midpoint of the two class means
    done = abs(T - Tnext) < 0.5;                   % converged when T moves by less than 0.5
    T = Tnext;
end
bw = f >= T;                                       % final thresholded (binary) image
```

Figure 2-4. Primary threshold method applied.

Figure 2-5.
Image after thresholding.

In some cases, this thresholding method did not yield good results across a sample. For these cases, we either applied the MATLAB function 'thresh' or used a semi-automated technique. The resulting thresholded images (see Figure 5) were used for subsequent processing.

2.5. Polony Identification

Candidate polonies were derived via MATLAB functions that yield the connected components of the thresholded image. These candidates were evaluated as true polonies based on their size and shape. The size range was determined by area (i.e. number of pixels). Shape was determined by a range of allowed eccentricity (roughly from an oval to a circle) and by a bounding box configuration. These minimal yet comprehensive parameters enabled a single parameter set to be used for the successful processing of all images of all samples, which was a major goal of our work. True polonies were then saved into a pre-initialized structure.

2.6. Isoform Construction

In the previous section, we knew the number of exon types to expect, so memory allocation was simply handled by pre-initialization. In isoform construction, since the number of isoforms cannot be predicted, more advanced memory allocation methods were used to minimize runtime. The key portion of the novel algorithm used to construct isoforms is shown in Figure 6.

```matlab
% Determine the isoforms
for i = 1:(NUM_SLIDES - 1)  % comparing the last slide to itself makes no sense; it is checked separately
    if (Exons(i).ThereArePolonies == 1)  % save computation by not looping when not required
        for j = 1:Exons(i).NumPolonies
            if (Exons(i).Polonies(j).AlreadyCounted == 0)
                % polony j on slide i starts a new isoform signature
                NumIsoforms = NumIsoforms + 1;
                WorkingIsoform(i) = 1;
                Exons(i).Polonies(j).AlreadyCounted = 1;
                for k = i+1:NUM_SLIDES  % compare only against slides numbered greater sequentially
                    if (Exons(k).ThereArePolonies == 1)
                        for l = 1:Exons(k).NumPolonies
                            if (Exons(k).Polonies(l).AlreadyCounted == 0)
                                if (comparepolonies(Exons(i).Polonies(j).Pixels, Exons(i).Polonies(j).NumPixels, ...
                                                    Exons(k).Polonies(l).Pixels, Exons(k).Polonies(l).NumPixels) == 1)
                                    WorkingIsoform(k) = 1;  % exon k is present in this molecule
                                    Exons(k).Polonies(l).AlreadyCounted = 1;
                                    break  % only one polony can overlap per slide
                                end
                            end
                        end
                    end
                end
                % record the completed signature, then reset it to all zeros
                [Isoforms, NumUniqueIsoforms] = addisoform(Isoforms, NumUniqueIsoforms, WorkingIsoform);
                WorkingIsoform = zeros(1, NUM_SLIDES);
            end
        end
    end
end
```

Figure 2-6. The main algorithm by which isoforms were constructed. The function comparepolonies was written to tolerate a small amount of gel movement between images, because a gel tends to shrink over repeated SBEs, which may slightly displace polonies. The results of the algorithm were checked by hand on randomly selected cases.

2.7. Special Cases & Other Software

To determine exon 17 skipping, the image of exon 17 was subtracted from the image of exon 5, followed by a series of image post-processing steps that included morphological opening and closing, top-hat transformations, and Gaussian filtering. Software was also written for the processing of quantitative exon and isoform profiles, as well as for the novel process of Isoform Convergence (introduced in Chapter 5).

2.8. Performance of Software

The software was developed in MATLAB (R12.5) with extensive use of Image Processing Toolbox functions. A single sample is represented by 28 images: one for each alternative exon (including exons 18 and 19), exon 5, and exon 17, plus one background image for each. Each 16-bit .tiff image is on average 7.5 MB in size; variation in size is primarily due to the number of polonies and gel variation. Completely processing one sample takes on average 15 minutes (assuming no semi-automated intervention is required) on a Pentium 4 machine with 1 GB of RAM. When semi-automated intervention is required, the time can increase significantly.

2.9.
Conclusion

The software described in this chapter was an integral component of quantitative exon and isoform profiling. In fact, it is unlikely that these profiles could be constructed without the algorithms developed, given the large quantity of data, the need to keep parameters rigidly consistent across all images of all samples, and the requirement of fidelity.

Chapter 3: Characterization of Samples

3.1. Introduction

In this chapter, we conduct both a semi-quantitative analysis of CD44 and a detailed variant exon quantitation. Since much of the previous work on alternative splicing of CD44 in leukemia has attempted only semi-quantitative variant exon quantitation, we compare our findings to this previous work. In several cases, we identify exons that were not thought to be expressed in certain cell types, as well as previously unreported findings.

3.2. Obtaining Samples

Samples of blasts from human peripheral blood and from human bone marrow, derived from patients with acute myelogenous leukemia (AML) and B-cell acute lymphocytic leukemia (ALL), were kindly provided by Dr. Linda Bendall at The Westmead Institute for Cancer Research, Westmead Millennium Institute, University of Sydney, Westmead, NSW, Australia. Each sample was obtained from a different individual and was provided as 20 to 50 ul of cDNA. The percentage of leukemic cells in all samples was greater than 90%. Samples of human peripheral blood purified B cells, human cord blood purified B cells, and human adult whole bone marrow cells derived from normal (i.e. non-diseased) individuals were purchased from ALLCELLS, LLC, Berkeley, California, USA. Each sample was obtained from a different individual and was provided as 20 ul of cDNA. A sample of human breast tissue cells derived from a breast tumor was used as a positive control. This sample was provided by Jun Zhu, Assistant Professor, Duke University, Durham, NC, USA.
The samples analyzed are listed in Table 1.

Sample No.                     Designation   Source
318                            Normal (NM)   Human cord blood purified B cells
794                            Normal (NM)   Human adult bone marrow cells
984                            Normal (NM)   Human adult bone marrow cells
1072                           Normal (NM)   Human peripheral blood purified B cells
1139                           Normal (NM)   Human peripheral blood purified B cells
397                            AML           Human peripheral blood
505                            AML           Human adult bone marrow
656                            AML           Human adult bone marrow
735                            AML           Human adult bone marrow
1601                           AML           Human peripheral blood
391                            ALL           Human peripheral blood
572                            ALL           Human adult bone marrow
596                            ALL           Human adult bone marrow
616                            ALL           Human adult bone marrow
0 (originally 'breast tumor')  B.Tumor       Human solid breast tumor

Table 3-1. Samples analyzed in this work. Sample No. designations come from the sources. Normal implies non-diseased.

3.3. Semi-Quantitative Analysis of CD44

All samples were quantified for total CD44, total B-Actin, and CD44v10 (an abundant variant isoform of CD44) via real-time QPCR (as discussed previously). The quantified values of CD44v10 (not shown) were then used as a surrogate for overall variant CD44 expression in order to derive the total CD44 present on each polony sample slide. This was necessary because large quantities of molecules (> 1.E+04) on a polony slide cannot be accurately quantified via computational methods due to signal saturation. Additionally, serial dilutions of starting template for polony amplification were used to verify the derived values of total CD44 (results not shown), and the %RNA template ratio correlates strongly (r = 0.99, 95% confidence interval) with the %polony count ratio [Zhu03]. Total CD44 had the following ranges for each type of sample: AML 4.E+4 to 5.E+5, ALL 5.E+3 to 1.E+5, NM 7.E+3 to 2.E+5 (see Table 2).

Desig.  Sample No.  #CD44    #CD44s   #CD44v  CD44v/CD44  CD44/Bactin
AML     397         2.E+05   2.E+05   431     2.E-03      3.E-01
AML     505         4.E+05   4.E+05   977     2.E-03      3.E-01
AML     656         4.E+04   4.E+04   272     8.E-03      2.E-01
AML     735         4.E+05   4.E+05   1005    2.E-03      1.E-01
AML     1601        5.E+05   5.E+05   519     9.E-04      3.E-01
        SUM         2.E+06   2.E+06   3204
        AVERAGE     3.E+05   3.E+05   641     3.E-03      2.E-01
ALL     391         6.E+04   6.E+04   290     4.E-03      2.E-01
ALL     572         3.E+04   3.E+04   541     2.E-02      3.E-01
ALL     596         1.E+05   1.E+05   361     3.E-03      1.E-01
ALL     616         5.E+03   5.E+03   49      9.E-03      2.E-01
        SUM         2.E+05   2.E+05   1241
        AVERAGE     1.E+05   1.E+05   310     3.E-03      2.E-01
NM      318         7.E+03   7.E+03   199     3.E-02      4.E-01
NM      794         8.E+03   8.E+03   520     6.E-02      4.E-02
NM      984         9.E+04   9.E+04   329     4.E-03      4.E-02
NM      1072        5.E+04   5.E+04   89      2.E-03      4.E-01
NM      1139        2.E+05   2.E+05   340     2.E-03      6.E-02
        SUM         3.E+05   3.E+05   1477
        AVERAGE     6.E+04   6.E+04   295     2.E-02      2.E-01

Table 3-2. #CD44 represents the total number of CD44 molecules on a slide of each sample; this value was obtained using the value for CD44v10 (not shown) for each sample as a surrogate for overall CD44v expression, which enabled linking real-time PCR data to polony technology data. #CD44s represents the total number of standard (i.e. without alternative exons) CD44 molecules. #CD44v represents the total number of CD44 molecules that include at least one alternative exon; this value was obtained by alternative exon profiling. Note that CD44v includes isoforms that are exon 5 cryptically spliced and isoforms that express exon 17 skipping, and these are analyzed separately. The counts for exon 5 cryptically spliced isoforms for AML, ALL, and NM are 362, 440, and 337, respectively. The counts for exon 17-skipping isoforms for AML, ALL, and NM are 300, 151, and 203, respectively. CD44v/CD44 represents the ratio of the number of CD44 molecules that include at least one alternative exon to the total number of CD44 molecules. CD44/Bactin represents the ratio of the number of CD44 molecules to the number of molecules of the housekeeping gene B-Actin in each sample.
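The text does not give the formula used to link the real-time PCR and polony measurements. One plausible reading, offered here only as an assumption, is that the polony count of variant molecules is scaled by the qPCR ratio of total CD44 to CD44v10, with CD44v10 standing in for all variants. The sketch below shows that scaling with made-up inputs, then reproduces two quantities that can be checked directly against Table 3-2: the AML SUM and AVERAGE rows for #CD44v and the CD44v/CD44 ratio for sample 397. The function names are hypothetical.

```python
from statistics import mean


def total_cd44_on_slide(variant_polonies: int, qpcr_total: float, qpcr_v10: float) -> float:
    """Hypothetical surrogate scaling (an assumption, not the documented method).

    Treats CD44v10 measured by real-time PCR as a stand-in for all variant CD44,
    so total/variant on the slide is taken to equal total/CD44v10 in the qPCR data.
    """
    if qpcr_v10 <= 0:
        raise ValueError("surrogate measurement must be positive")
    return variant_polonies * (qpcr_total / qpcr_v10)


def variant_fraction(n_variant: float, n_total: float) -> float:
    """The CD44v/CD44 column: variant CD44 molecules over total CD44 molecules."""
    return n_variant / n_total


# #CD44v for the five AML samples, copied from Table 3-2.
aml_cd44v = [431, 977, 272, 1005, 519]
print(sum(aml_cd44v), round(mean(aml_cd44v)))   # SUM and AVERAGE rows
print(f"{variant_fraction(431, 2e5):.0E}")      # sample 397, CD44v/CD44
```

Because the table rounds #CD44 to one significant figure, ratios recomputed from the printed columns can differ slightly from the printed ratios (e.g. sample 656: 272 / 4.E+04 is about 7.E-03, printed as 8.E-03).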
Based on the averaged values of #CD44 (see Table 3-2) for the various sets of samples, total CD44 is up-regulated approximately one order of magnitude in both AML and ALL compared to Normal (NM). However, the ratio of total CD44 to total B-Actin is constant among AML, ALL, and NM. Since B-Actin is a well-known house-keeping gene [Alb94] and may be applied as a surrogate for overall gene transcription, it is likely that overall gene transcription (including that of CD44) is up-regulated by one order of magnitude in the leukemia samples as compared to Normal. By contrast, the differences in the averaged ratios of total variant CD44 (obtained via alternative exon profiling) to total CD44 in the samples suggest that expression of the variant exons of CD44 is up-regulated by approximately one order of magnitude in AML and ALL versus normal.

3.4. Quantifying Variant Exons
Variant exons were queried via alternative exon profiling (as previously described in Chapter 1) using appropriate SBE primers. Samples were aggregated by sample type. In Normal samples, ALL samples, and AML samples (see Figure 3-1), counts of exons generally increase in the 5' to 3' direction, as observed by [Zhu03].

[Figure 3-1: Exon Counts of Samples Aggregated by Designation. x-axis: Variant Exon (V2 to V10); y-axis: exon count.]
Figure 3-1. Exons are represented by their variant exon designation. Counts of each variant exon were aggregated within like designations of AML, ALL, and Normal (NM).

In normal samples (see Table 3-3), we found that all samples expressed variant exon 7 (V3), all samples expressed variant exon 10 (V6), and all samples expressed exons 12 (V8) through 14 (V10), consistent with previous work [Aks02, Ben00, Ben04]. We also found that no normal sample expressed variant exon 11 (V7). Other work has also found a lack of CD44v7 mRNA expression via RT-PCR and Southern blotting methods [Ben00].

We also detected a small quantity (0.4%) of both exon 8 (V4) and exon 9 (V5) in 2 of 5 normal samples (one of which is human whole bone marrow purified cells and the other human peripheral blood purified B-cells). This differs from previous work in which these mRNA variants were not found and were thought not to be expressed in normal peripheral blood lymphocytes or normal whole bone marrow [Ben00]. We also found low expression (0.5%) of exon 6b (V2) in 3 of 5 normal samples, which to the author's knowledge has not previously been reported.

Variant Exon | AML % of Total | AML # of Samples | ALL % of Total | ALL # of Samples | NM % of Total | NM # of Samples
Exon 6b (V2) | 0.7% | 2 of 5 | 1.5% | 4 of 4 | 0.5% | 3 of 5
Exon 7 (V3) | 2.9% | 4 of 5 | 5.8% | 4 of 4 | 3.3% | 5 of 5
Exon 8 (V4) | 0.4% | 3 of 5 | 0.3% | 2 of 4 | 0.4% | 2 of 5
Exon 9 (V5) | 0.4% | 5 of 5 | 2.6% | 3 of 4 | 0.4% | 2 of 5
Exon 10 (V6) | 8.1% | 5 of 5 | 14.9% | 4 of 4 | 7.2% | 5 of 5
Exon 11 (V7) | 1.1% | 3 of 5 | 4.6% | 3 of 4 | 0.0% | 0 of 5
Exon 12 (V8) | 21.6% | 4 of 5 | 24.6% | 4 of 4 | 25.1% | 5 of 5
Exon 13 (V9) | 36.2% | 5 of 5 | 25.1% | 4 of 4 | 33.4% | 5 of 5
Exon 14 (V10) | 28.7% | 5 of 5 | 20.7% | 4 of 4 | 29.7% | 5 of 5
Totals | 100.0% | | 100.0% | | 100.0% |

Table 3-3. Percentage of the aggregated variant exon counts accounted for by each variant exon, and number of samples expressing each variant exon. % of Total is the percent of polonies identified for each variant exon from the summation of all samples of each sample type; thus these results are pooled from data acquired on independent samples.

In AML samples (see Table 3-3), we found that a majority of samples expressed variant exons 7 (V3) and 10 (V6) to 14 (V10), consistent with previous work [Leg98, Aks02, Ben00]. However, we also detected a small quantity (0.4%) of both exon 8 (V4) and exon 9 (V5) in 3 of 5 and 5 of 5 samples, respectively. This differs from previous work in which these variants were reported not to be found [Ben00]. We also found low expression (0.7%) of exon 6b (V2) in 2 of 5 AML samples, which to the author's knowledge has not previously been reported.
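The two kinds of entries in Table 3-3 can be derived from per-sample exon counts: % of Total pools the counts of all samples of one type before dividing, and # of Samples counts the samples with at least one polony for the exon. The sketch below illustrates this under a hypothetical input format (one dictionary of exon counts per sample); it is not the profiling pipeline used in this work, and the counts in the example are invented for illustration.

```python
from collections import Counter


def exon_summary(samples):
    """Summarize variant exon counts for one sample type.

    `samples` is a list of dicts mapping a variant exon label (e.g. "V3")
    to its polony count in one sample (hypothetical input format).
    Returns {exon: (percent_of_pooled_total, number_of_samples_expressing)}.
    """
    pooled = Counter()
    for counts in samples:
        pooled.update(counts)  # pool counts across independent samples
    total = sum(pooled.values())
    summary = {}
    for exon, count in pooled.items():
        # A sample "expresses" an exon if it has at least one polony for it.
        n_expressing = sum(1 for counts in samples if counts.get(exon, 0) > 0)
        summary[exon] = (100.0 * count / total, n_expressing)
    return summary


# Invented counts for three samples of one type.
samples = [
    {"V3": 4, "V8": 10, "V9": 6},
    {"V3": 0, "V8": 5, "V9": 5},
    {"V8": 8, "V9": 2},
]
for exon, (pct, n) in sorted(exon_summary(samples).items()):
    print(f"{exon}: {pct:.1f}% of total, {n} of {len(samples)} samples")
```

Pooling before dividing means that samples with more polonies weigh more heavily in % of Total, which is why that column and # of Samples can tell different stories about the same exon.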
In ALL samples (see Table 3-3), we found that 4 of 4 samples expressed exons 7 (V3), 10 (V6), and 12 (V8) to 14 (V10), which are expressed in higher ratios than previously found [Ben04, Aks02, Ben00]. Further, we found that exon 8 (V4), exon 9 (V5), and exon 11 (V7) were present in 2 of 4, 3 of 4, and 3 of 4 of the samples in small to medium amounts: 0.3%, 2.6%, and 4.6%, respectively. This differs from previous work in which these mRNA variants were reported not to be found [Ben04]. We also found some expression (1.5%) of exon 6b (V2) in all ALL samples, which to the author's knowledge has not previously been reported.

3.5. Conclusion
Although this is just the beginning of the quantitative profiling ability of our methods, we have already corrected several incorrect claims in the published CD44-leukemia literature and gained insights not previously reported.

Chapter 4: Exon Expression Profiles
Note that in order to derive more robust and universally accepted results, we compare the leukemias against a heterogeneous mix of normal samples (containing independent samples of human peripheral blood purified B-cells, human cord blood purified B-cells, and human adult whole bone marrow purified cells) designated as NM.

4.1. Introduction
We now apply quantitative methods to look for significant differences between the exon profiles of the various sample types; we are specifically interested in statistically significant differences between AML and normal, ALL and normal, and AML and ALL. Therefore, samples of each type were aggregated. Here we only consider the variant exons 6 to 14 and the mutually exclusively spliced exons of the CD44 tail: exon 18 and exon 19. We compare our results to published work when available and provide new findings.
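The p-values in the tables of this and the next chapter come from a standard large-sample test for the difference between two proportions [Dev91]. The sketch below shows the pooled form of that test; the function name is ours, and the one-sided alternative is an assumption inferred from the tables rather than stated in the text. Under that assumption it reproduces, for instance, the p-values reported for exons 13 (8.E-05) and 14 (8.E-06) in Table 4-3, using that table's column totals of 744 ALL and 1091 NM exon counts.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Large-sample z-test for a difference between two proportions.

    Compares x1 of n1 counts in population 1 with x2 of n2 in population 2,
    using the pooled proportion under H0: p1 == p2. Returns (z, p), where p
    is the one-sided upper-tail probability at |z| (an assumption; see text).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution at |z|.
    p_one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_one_sided


# Exon 14 (V10): 154 of 744 ALL exon counts vs. 324 of 1091 NM exon counts.
z, p = two_proportion_z_test(154, 744, 324, 1091)
print(f"z = {z:.2f}, one-sided p = {p:.1e}")
```

A negative z means the first population (here ALL) has the lower proportion, matching the negative C.f. Ratio convention used in the tables.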
The level of quantitative analysis provided here substantially exceeds that of previously published variant exon profiling of CD44 in the leukemias.

4.2. Quantitative Comparison of AML to Normal
Variant exon counts of AML and normal samples were aggregated and compared (see Table 4-1). After comparison of the ratios of each variant exon to the total number of counts for all variant exons of each sample type, we found that AML is down-regulated with respect to exon 12 (p < .009) and up-regulated with respect to exon 13 (p < .05). Since NM does not express exon 11, AML was up-regulated with respect to this exon (p < .0004). Exon 11 (V7) enables direct binding to chondroitin sulfate, heparin, and heparan sulfate in addition to HA [Sle97].

Exon | AML | AML Ratio | NM | NM Ratio | P-value
EXON 6B (V2) | | | | |
EXON 7 (V3) | | | | |
EXON 8 (V4) | | | | |
EXON 9 (V5) | | | | |
EXON 10 (V6) | | | | |
EXON 11 (V7) | | | | |
EXON 12 (V8) | | | | |
EXON 13 (V9) | | | | |
EXON 14 (V10) | | | | |

Table 4-1. Individual exon counts were based on the aggregation of 5 independent samples of AML and 5 independent samples of normal. NM = normal samples. X Ratio (where X = sample type) = ratio between the count of each exon and the total count of exons. C.f. Ratio = ratio between AML and NM (negative values indicate down-regulation in AML). Statistically significant up-regulation of AML is highlighted in light blue and down-regulation is highlighted in light green. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

4.4. Quantitative Comparison of AML to ALL
Variant exon counts of AML and ALL were aggregated and compared (see Table 4-2). After comparison of the ratios of each variant exon to the total number of counts for all variant exons of each sample type, we found that ALL is up-regulated vs. AML for a majority of the exons (6 of 9): 6b (p < .02), 7 (p < .00006), 9 (p << .05), 10 (p << .05), 11 (p << .05), and 12 (p < .04).
Exon 10 (V6) has been reported to be preferentially expressed on ALL cells [Mag01, Ben04], and our data are consistent with this result. AML was up-regulated as compared to ALL for only 2 of 9 exons: 13 (p << .05) and 14 (p < .000006). AML expressed almost 10x as much exon 13 as ALL.

Exon | AML | AML Ratio | ALL | ALL Ratio | C.f. Ratio | P-value
EXON 6B (V2) | 20 | 0.0070 | 11 | 0.0148 | -2.10 | 2.E-02
EXON 7 (V3) | 82 | 0.0288 | 43 | 0.0578 | -2.01 | 6.E-05
EXON 8 (V4) | 12 | 0.0042 | 2 | 0.0027 | 1.57 | 3.E-01
EXON 9 (V5) | 10 | 0.0035 | 19 | 0.0255 | -7.27 | <<5.E-02
EXON 10 (V6) | 230 | 0.0808 | 111 | 0.1492 | -1.85 | <<5.E-02
EXON 11 (V7) | 32 | 0.0112 | 34 | 0.0457 | -4.06 | <<5.E-02
EXON 12 (V8) | 614 | 0.2158 | 183 | 0.2460 | -1.14 | 4.E-02
EXON 13 (V9) | 1029 | 0.3617 | 187 | 0.2513 | 1.44 | <<5.E-02
EXON 14 (V10) | 816 | 0.2868 | 154 | 0.2070 | 1.39 | 6.E-06

Table 4-2. Individual exon counts were based on the aggregation of 5 independent samples of AML and 4 independent samples of ALL. X Ratio (where X = sample type) = ratio between the count of each exon and the total count of exons. C.f. Ratio = ratio between AML and ALL (negative values indicate down-regulation in AML, i.e. up-regulation in ALL). Statistically significant up-regulation of AML is highlighted in light blue and up-regulation of ALL is highlighted in light green. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

4.5. Quantitative Comparison of ALL to Normal
Similar to Section 4.2, variant exon counts of ALL and normal samples were aggregated and compared (see Table 4-3). After comparison of the ratios of each variant exon to the total number of counts for all variant exons of each sample type, we found that ALL is up-regulated with respect to exons 6b (p < .02) and 7 (p < .005), and significantly up-regulated with respect to exons 9 (p < .00002) and 10 (p << .05).
Exon 10 (V6) has been reported to be preferentially expressed on ALL cells [Mag01, Ben04], and our data are consistent with this result. Conversely, ALL is significantly down-regulated with respect to exons 13 (p < .00008) and 14 (p < .000008). Since NM does not express exon 11 and ALL expresses it in almost 5% of total CD44v (see Table 3-3, Chapter 3), ALL was significantly up-regulated with respect to this exon (p << .05). Exon 11 (V7) enables direct binding to chondroitin sulfate, heparin, and heparan sulfate in addition to HA [Sle97].

Exon | ALL | ALL Ratio | NM | NM Ratio | C.f. Ratio | P-value
EXON 6B (V2) | 11 | 0.0148 | 6 | 0.0055 | 2.69 | 2.E-02
EXON 7 (V3) | 43 | 0.0578 | 36 | 0.0330 | 1.75 | 5.E-03
EXON 8 (V4) | 2 | 0.0027 | 4 | 0.0037 | -1.36 | 4.E-01
EXON 9 (V5) | 19 | 0.0255 | 4 | 0.0037 | 6.97 | 2.E-05
EXON 10 (V6) | 111 | 0.1492 | 79 | 0.0724 | 2.06 | <<5.E-02
EXON 11 (V7) | 34 | 0.0457 | 0 | 0.0000 | N/A | <<5.E-02
EXON 12 (V8) | 183 | 0.2460 | 274 | 0.2511 | -1.02 | 4.E-01
EXON 13 (V9) | 187 | 0.2513 | 364 | 0.3336 | -1.33 | 8.E-05
EXON 14 (V10) | 154 | 0.2070 | 324 | 0.2970 | -1.43 | 8.E-06

Table 4-3. Individual exon counts were based on the aggregation of 4 independent samples of ALL and 5 independent samples of normal. NM = normal samples. X Ratio (where X = sample type) = ratio between the count of each exon and the total count of exons. C.f. Ratio = ratio between ALL and NM (negative values indicate down-regulation in ALL). Statistically significant up-regulation of ALL is highlighted in light blue and down-regulation is highlighted in light green. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

4.6. Quantitative Comparison of Cancer to Normal
Variant exon counts of AML and of ALL were aggregated and designated as "Cancer." They were then compared to aggregated values of normal samples (see Table 4-4).
After comparison of the ratios of each variant exon to the total number of counts for all variant exons of each sample type, we find that no exon is statistically up-regulated in Cancer vs. normal except for exon 11 (p < .000003), which was found not to be expressed in normal cells (Chapter 3). This is interesting in regard to [Kha96], whose authors report significant expression of CD44v7 (i.e. any isoform containing V7) in 10 healthy volunteers via staining with anti-CD44v7. However, such expression would not be possible without CD44v7 mRNA. Other work has also found a lack of CD44v7 mRNA expression via RT-PCR and Southern blotting methods [Ben00].

Exon | Cancer | Cancer Ratio | NM | NM Ratio | C.f. Ratio | P-value
EXON 6B (V2) | 31 | 0.0086 | 6 | 0.0055 | 1.57 | 2.E-01
EXON 7 (V3) | 125 | 0.0348 | 36 | 0.0330 | 1.06 | 4.E-01
EXON 8 (V4) | 14 | 0.0039 | 4 | 0.0037 | 1.06 | 5.E-01
EXON 9 (V5) | 29 | 0.0081 | 4 | 0.0037 | 2.20 | 6.E-02
EXON 10 (V6) | 341 | 0.0950 | 79 | 0.0724 | 1.31 | 1.E-02
EXON 11 (V7) | 66 | 0.0184 | 0 | 0.0000 | N/A | 3.E-06
EXON 12 (V8) | 797 | 0.2221 | 274 | 0.2511 | -1.13 | 2.E-02
EXON 13 (V9) | 1216 | 0.3388 | 364 | 0.3336 | 1.02 | 4.E-01
EXON 14 (V10) | 970 | 0.2703 | 324 | 0.2970 | -1.10 | 4.E-02

Table 4-4. Individual exon counts were based on the aggregation of 5 independent samples of AML and of 4 independent samples of ALL, versus the aggregation of 5 independent samples of normal. NM = normal samples. X Ratio (where X = sample type) = ratio between the count of each exon and the total count of exons. C.f. Ratio = ratio between (AML+ALL) and NM (negative values indicate down-regulation in the leukemias). Up-regulation of Cancer is highlighted in light blue. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

4.7. Conclusion
From this early analysis, we can begin to consider several exons as potentially important transcriptional elements that may help to distinguish AML and ALL from normal based on their statistically significant up- or down-regulation. Exon 11 (V7) may be the most interesting of these, since normal cells from various sources (peripheral blood, bone marrow, and cord blood) were found not to express it at an average resolution on the order of 1.E+06 molecules. However, looking only at the exons present or absent in transcripts does not provide an accurate or comprehensive view of the actual transcripts present, since these data do not account for combinatorial diversity; we explore that in the next chapter.

Chapter 5: Isoform Expression Profiles
Note that in order to derive more robust and universally accepted results, we compare the leukemias against a heterogeneous mix of normal samples (containing independent samples of human peripheral blood purified B-cells, human cord blood purified B-cells, and human adult whole bone marrow purified cells) designated as NM.

5.1. Introduction
We now apply quantitative methods to look for significant differences between the isoform profiles of the various sample types; we are specifically interested in statistically significant differences between AML and normal, ALL and normal, and AML and ALL. Here we only consider exons 6 to 14 and the mutually exclusively spliced exons of the CD44 tail: exon 18 and exon 19. We compare our results to published work when available, although such work is sparse in the case of isoform profiles and the identification of particular isoforms. Therefore, the results provided here represent an unprecedented level of quantitative detail into the exact nature of CD44's (or any protein's) alternative exon splicing in AML, ALL, and normal cells of the immune system; many of the isoforms presented here have never previously been identified.
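The C.f. Ratio columns of the exon tables in Chapter 4 and of the isoform tables below follow a signed fold-change convention: the larger of the two proportions divided by the smaller, with a negative sign when the second population is higher. A minimal sketch of that convention is given here (the function name is hypothetical); it reproduces the exon 6b (2.69) and exon 13 (-1.33) entries of Table 4-3 and returns None where the tables report N/A because one proportion is zero.

```python
def signed_fold_change(x1, n1, x2, n2):
    """Signed ratio of proportion x1/n1 to proportion x2/n2.

    Positive values mean population 1 is higher; negative values give the
    reciprocal fold change when population 2 is higher. Returns None
    (reported as N/A) when either proportion is zero.
    """
    r1, r2 = x1 / n1, x2 / n2
    if r1 == 0 or r2 == 0:
        return None
    return r1 / r2 if r1 >= r2 else -(r2 / r1)


# ALL vs. NM, using the column totals of Table 4-3 (744 and 1091).
print(round(signed_fold_change(11, 744, 6, 1091), 2))     # exon 6b (V2)
print(round(signed_fold_change(187, 744, 364, 1091), 2))  # exon 13 (V9)
print(signed_fold_change(34, 744, 0, 1091))               # exon 11 (V7)
```

Because the sign carries the direction, magnitudes stay comparable across up- and down-regulation (a two-fold drop reads as -2.00 rather than 0.50).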
5.2. Common Isoforms
Quantitative isoform profiling (as described in Chapter 1 and Chapter 2) was used to determine the common isoforms in aggregated samples of AML, ALL, and NM. Common isoforms are identified by the following procedure:
1) Aggregate the isoforms (and their associated occurrences) of all samples, including those of different sample types.
2) Order the resulting isoforms from greatest occurrence to least.
3) Select isoforms such that the number of occurrences (of the aggregated alternatively spliced isoforms) is greater than 10 and the last isoform selected represents at least 1% of the number of alternatively spliced isoforms in an average sample of the aggregated set.
In our case, we identify 17 common isoforms, since the 18th candidate common isoform has only 8 occurrences (data not shown), whereas the 17th candidate has 11 occurrences with an expression of 3.36% in the average sample of the aggregated alternatively spliced isoforms. This simple process gives us some degree of qualitative confidence that the isoforms are "real" and are very unlikely to be due to errors from polony gel creation, SBEs, or computational profile construction.

Looking at Table 5-1, which shows the 17 common isoforms identified, AML clearly expresses many more total isoforms, as well as more total different isoforms (data not shown), than either ALL or NM. Interestingly, ALL expresses fewer total isoforms than NM, even when correcting for the difference in the number of samples used for the aggregated values.

Desig. | Isoform Signature | AML | ALL | NM
#1 | ---------------13------19 | 840 | 126 | 282
#2 | ------------------14---19 | 701 | 122 | 249
#3 | ------------12---------19 | 410 | 137 | 185
#4 | ------10---------------19 | 222 | 81 | 72
#5 | ------------1213------19 | 110 | 14 | 23
#6 | ------------12---14---19 | 65 | 13 | 35
#7 | -7---------------------19 | 63 | 29 | 16
#8 | ---------------1314---19 | 27 | 8 | 10
#9 | -7------------13------19 | 16 | 9 | 16
#10 | ---------11------------19 | 10 | 26 | 0
#11 | ------------121314---19 | 0 | 8 | 26
#12 | 6----------------------19 | 13 | 11 | 6
#13 | -----9-----------------19 | 7 | 19 | 3
#14 | ------10------13------19 | 1 | 19 | 4
#15 | ----------------------18-- | 9 | 11 | 1
#16 | ---8-------------------19 | 11 | 2 | 3
#17 | ---------111213------19 | 11 | 0 | 0
Other | | 26 | 15 | 7
True Totals | | 2542 | 650 | 938
Theorized Totals | | 2542 | 813 | 938

Table 5-1. Counts of the 17 most common CD44 alternatively spliced isoforms in aggregated samples of AML, ALL, and NM were obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. Isoform Signatures describe the known expressed variant exons, where '6' refers to '6b'; we include exons 18 and 19, which are the mutually exclusive tails. Note that "Other" only contains other isoforms that are NOT exon 5 cryptically spliced and are NOT exon 17-skipping; these are analyzed separately. The counts for exon 5 cryptically spliced isoforms for AML, ALL, and NM are 362, 440, and 337, respectively. The counts for exon 17-skipping isoforms for AML, ALL, and NM are 300, 151, and 203, respectively. Adding these values back into the "True Totals" yields the values presented in Table 3-2. Also note that whereas AML and NM are each a composite of 5 samples, ALL is a composite of only 4 samples. Thus, we provide a theorized total for ALL, which assumes that a fifth sample would express a number of isoforms approximated by the average of the 4 reported samples, or 163 isoforms.
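The three-step procedure for selecting common isoforms can be sketched as follows. The helper is hypothetical and assumes that "an average sample" means the aggregated total divided by the number of samples. Because candidates are visited in order of decreasing occurrence, stopping at the first one that fails either criterion is equivalent to requiring the last selected isoform to satisfy both.

```python
from collections import Counter


def common_isoforms(samples, min_count=10, min_fraction=0.01):
    """Select common isoforms from per-sample isoform counts.

    `samples` is a list of Counters mapping an isoform signature to its
    number of occurrences in one sample (all sample types together).
    Assumption: the "average sample" size is the aggregated total divided
    by the number of samples.
    """
    # Step 1: aggregate occurrences across all samples and sample types.
    aggregated = sum(samples, Counter())
    avg_sample_size = sum(aggregated.values()) / len(samples)
    selected = []
    # Step 2: walk the isoforms from greatest occurrence to least.
    for signature, count in aggregated.most_common():
        # Step 3: keep isoforms with more than `min_count` occurrences that
        # also reach `min_fraction` of an average sample; stop at the first
        # failure, since every later isoform has an equal or smaller count.
        if count <= min_count or count < min_fraction * avg_sample_size:
            break
        selected.append(signature)
    return selected


# Invented counts for three samples, with simplified signature labels.
samples = [
    Counter({"13-19": 30, "14-19": 20, "7-19": 6}),
    Counter({"13-19": 25, "14-19": 15, "8-19": 4}),
    Counter({"13-19": 10, "7-19": 6, "8-19": 4}),
]
print(common_isoforms(samples))
```

With these invented counts, the isoform with 8 aggregated occurrences is rejected by the greater-than-10 rule, mirroring the 18th candidate in this work.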
There are several orders of magnitude of difference in the expression of the various alternatively spliced isoforms of CD44 (see Figure 5-1). Also, the fall-off between the most prevalent and less prevalent isoforms is steep. AML encompasses the most expansive range between the most common and the rarest alternatively spliced isoform.

[Figure 5-1: Isoform Counts of Samples Aggregated by Designation (AML, ALL, NM). x-axis: Alternatively Spliced Isoform (#1 to #17, Other); y-axis: isoform count.]
Figure 5-1. Counts of the 17 most common alternatively spliced CD44 isoforms in aggregated samples of AML, ALL, and NM were obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2.

5.3. Quantitative Comparison of AML to Normal
Aggregated alternative splicing isoforms of AML samples were compared to those of NM samples (see Table 5-2). 4 of the 17 most common isoforms were found to be up-regulated in AML with statistical significance (from p < .05 to p < .005). 5 of the 17 isoforms were found to be down-regulated in AML with statistical significance (from p < .03 to p << .05). AML was found to express 2 common isoforms that are not present in NM samples: #10 (p < .03) and #17 (p < .02). Both of these isoforms contain exon 11 (V7), which was found not to be expressed in NM samples (see Table 3-3, Chapter 3). Exon 11 (V7) enables direct binding to chondroitin sulfate, heparin, and heparan sulfate in addition to HA [Sle97]. Interestingly, NM was found to significantly express one common isoform that was not expressed by AML: #11 (p << .05). This isoform contains all three of the most prevalent exons: 12 (V8), 13 (V9), and 14 (V10).
However, upon further investigation of less common alternative splicing isoforms (data not shown), we identified three isoforms in the aggregated AML samples that expressed these exons (12, 13, 14), but always along with exon 11 (V7).

Desig. | Isoform Signature | AML | AML Ratio | NM | NM Ratio | C.f. Ratio | P-value
#1 | ---------------13------19 | 840 | 0.3304 | 282 | 0.3006 | 1.10 | 5.E-02
#2 | ------------------14---19 | 701 | 0.2758 | 249 | 0.2655 | 1.04 | 3.E-01
#3 | ------------12---------19 | 410 | 0.1613 | 185 | 0.1972 | -1.22 | 6.E-03
#4 | ------10---------------19 | 222 | 0.0873 | 72 | 0.0768 | 1.14 | 2.E-01
#5 | ------------1213------19 | 110 | 0.0433 | 23 | 0.0245 | 1.76 | 5.E-03
#6 | ------------12---14---19 | 65 | 0.0256 | 35 | 0.0373 | -1.46 | 3.E-02
#7 | -7---------------------19 | 63 | 0.0248 | 16 | 0.0171 | 1.45 | 9.E-02
#8 | ---------------1314---19 | 27 | 0.0106 | 10 | 0.0107 | -1.00 | 5.E-01
#9 | -7------------13------19 | 16 | 0.0063 | 16 | 0.0171 | -2.71 | 2.E-03
#10 | ---------11------------19 | 10 | 0.0039 | 0 | 0.0000 | N/A | 3.E-02
#11 | ------------121314---19 | 0 | 0.0000 | 26 | 0.0277 | 0.00 | <<5.E-02
#12 | 6----------------------19 | 13 | 0.0051 | 6 | 0.0064 | -1.25 | 3.E-01
#13 | -----9-----------------19 | 7 | 0.0028 | 3 | 0.0032 | -1.16 | 4.E-01
#14 | ------10------13------19 | 1 | 0.0004 | 4 | 0.0043 | -10.84 | 4.E-03
#15 | ----------------------18-- | 9 | 0.0035 | 1 | 0.0011 | 3.32 | 1.E-01
#16 | ---8-------------------19 | 11 | 0.0043 | 3 | 0.0032 | 1.35 | 3.E-01
#17 | ---------111213------19 | 11 | 0.0043 | 0 | 0.0000 | N/A | 2.E-02
Other | | 26 | | 7 | | |
Totals | | 2542 | | 938 | | |

Table 5-2. Quantitation of the 17 most common CD44 alternatively spliced isoforms in aggregated samples of AML and NM was obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. Isoform Signatures describe the known expressed variant exons, where '6' refers to '6b'; we include exons 18 and 19, which are the mutually exclusive tails. X Ratio (where X = sample type) = ratio between the count of each isoform and the total count of alternatively spliced isoforms. C.f. Ratio = ratio between AML and NM (negative values indicate down-regulation in AML). Statistically significant up-regulation of AML is highlighted in light blue and down-regulation is highlighted in light green. We report the following novel isoforms for AML (highlighted in yellow): #1 - #3, #5 - #10, and #12 - #17. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Taking into account the less common alternative splicing isoforms of AML (data not shown), we can account for almost all of the isoforms found in previous work [Leg98, Aks02, Ben00, Kha96]. However, we have also identified previously unreported and in fact unexpected isoforms. We report the following novel isoforms for AML: #1 - #3, #5 - #10, and #12 - #17. Through a comprehensive RT-PCR-based analysis of CD44 transcripts expressed in 70 AML patient samples, [Leg98] concluded that exon 13 (V9) was always associated with exon 12 (V8). In contrast, we have found four common isoforms where this is not the case: one that is quite prevalent (33%), #1, and three that are not: #8, #9, and #14. [Leg98] also reported that the following exons tend to be found in the same isoform together: 12 (V8), 13 (V9), and 14 (V10). However, we found no instance of this to an average resolution of 1.E+06 molecules (CD44 mRNA transcripts), even though isoform #11 is present in NM. In addition, [Leg98] reported identifying isoforms with the following exons: 10 (V6) and 11 (V7); 10 (V6) and 12 (V8) to 14 (V10); and 10 (V6) to 14 (V10). However, we found no instance of these isoforms. We concur with [Leg98] in finding that isoforms with exon 10 (V6) exist in directly spliced versions; see isoform #4. There is controversy as to the existence of isoforms containing exon 8 (V4) and exon 9 (V5) in AML. [Leg98] reported that exons 8 (V4) and 9 (V5), although rare, are expressed.
In contrast, after performing an RT-PCR-based analysis (and Southern blotting) of 24 AML patient samples, [Ben00] reported not finding these variant exons expressed. Here we concur with the results of [Leg98]; see isoforms #16 and #13. As another example, [Ben00] found that exon 7 (V3) was not detected in combination with any other variant exon. However, we found that isoform #9 expresses exon 7 (V3) as well as exon 13 (V9).

5.4. Quantitative Comparison of AML to ALL
Aggregated alternative splicing isoforms of AML samples were compared to those of ALL samples (see Table 5-3). 4 of the 17 common isoforms were found to be up-regulated in AML with statistical significance (from p < .05 to p < .005). 10 of the 17 isoforms were found to be down-regulated in AML (or up-regulated in ALL) with statistical significance (from p < .03 to p << .05).

Desig. | Isoform Signature | AML | AML Ratio | ALL | ALL Ratio | C.f. Ratio | P-value
#1 | ---------------13------19 | | | | | |
#2 | ------------------14---19 | | | | | |
#3 | ------------12---------19 | | | | | |
#4 | ------10---------------19 | | | | | |
#5 | ------------1213------19 | | | | | |
#6 | ------------12---14---19 | | | | | |
#7 | -7---------------------19 | | | | | |
#8 | ---------------1314---19 | | | | | |
#9 | -7------------13------19 | | | | | |
#10 | ---------11------------19 | | | | | |
#11 | ------------121314---19 | | | | | |
#12 | 6----------------------19 | | | | | |
#13 | -----9-----------------19 | | | | | |
#14 | ------10------13------19 | | | | | |
#15 | ----------------------18-- | | | | | |
#16 | ---8-------------------19 | 11 | 0.0043 | 2 | 0.0031 | 1.41 | 3.E-01
#17 | ---------111213------19 | 11 | 0.0043 | 0 | 0.0000 | N/A | 5.E-02
Other | | 26 | | 15 | | |
Totals | | 2542 | | 650 | | |

Table 5-3. Quantitation of the 17 most common CD44 alternatively spliced isoforms in aggregated samples of AML and ALL was obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. Isoform Signatures describe the known expressed variant exons, where '6' refers to '6b'; we include exons 18 and 19, which are the mutually exclusive tails. X Ratio (where X = sample type) = ratio between the count of each isoform and the total count of alternatively spliced isoforms. C.f. Ratio = ratio between AML and ALL (negative values indicate up-regulation in ALL). Statistically significant up-regulation of AML is highlighted in light blue and down-regulation is highlighted in light green. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

AML was found to express one common isoform that was not present in (common or less common) ALL samples: #17 (p < .05). This isoform contains exon 11 (V7) as well as exons 12 (V8) and 13 (V9). ALL was found to express one common isoform that was not expressed by AML: #11 (p << .05). This isoform contains all three of the most prevalent exons: 12 (V8), 13 (V9), and 14 (V10). Upon further investigation of less common alternative splicing isoforms (data not shown), we identified three isoforms in the aggregated AML samples that expressed these exons (12, 13, 14), but always along with exon 11 (V7).

5.5. Quantitative Comparison of ALL to Normal
Aggregated alternative splicing isoforms of ALL samples were compared to those of NM samples (see Table 5-4). 7 of the 17 most common isoforms were found to be up-regulated in ALL with statistical significance (from p < .02 to p << .05). 4 of the 17 isoforms were found to be down-regulated in ALL with statistical significance (from p < .02 to p < .0000009). ALL was found to express one common isoform that was not present in NM samples: #10 (p << .05). This isoform contains exon 11 (V7), which was found not to be expressed in NM samples (see Table 3-3). Exon 11 (V7) enables direct binding to chondroitin sulfate, heparin, and heparan sulfate in addition to HA [Sle97]. NM was not found to express any common isoform that was not expressed by ALL.

Desig. | Isoform Signature | ALL | ALL Ratio | NM | NM Ratio | C.f. Ratio | P-value
#1 | ---------------13------19 | 126 | 0.1938 | 282 | 0.3006 | -1.55 | 9.E-07
#2 | ------------------14---19 | 122 | 0.1877 | 249 | 0.2655 | -1.41 | 2.E-04
#3 | ------------12---------19 | 137 | 0.2108 | 185 | 0.1972 | 1.07 | 3.E-01
#4 | ------10---------------19 | 81 | 0.1246 | 72 | 0.0768 | 1.62 | 7.E-04
#5 | ------------1213------19 | 14 | 0.0215 | 23 | 0.0245 | -1.14 | 3.E-01
#6 | ------------12---14---19 | 13 | 0.0200 | 35 | 0.0373 | -1.87 | 2.E-02
#7 | -7---------------------19 | 29 | 0.0446 | 16 | 0.0171 | 2.62 | 6.E-04
#8 | ---------------1314---19 | 8 | 0.0123 | 10 | 0.0107 | 1.15 | 4.E-01
#9 | -7------------13------19 | 9 | 0.0138 | 16 | 0.0171 | -1.23 | 3.E-01
#10 | ---------11------------19 | 26 | 0.0400 | 0 | 0.0000 | N/A | <<5.E-02
#11 | ------------121314---19 | 8 | 0.0123 | 26 | 0.0277 | -2.25 | 2.E-02
#12 | 6----------------------19 | 11 | 0.0169 | 6 | 0.0064 | 2.65 | 2.E-02
#13 | -----9-----------------19 | 19 | 0.0292 | 3 | 0.0032 | 9.14 | 6.E-06
#14 | ------10------13------19 | 19 | 0.0292 | 4 | 0.0043 | 6.85 | 2.E-05
#15 | ----------------------18-- | 11 | 0.0169 | 1 | 0.0011 | 15.87 | 2.E-04
#16 | ---8-------------------19 | 2 | 0.0031 | 3 | 0.0032 | -1.04 | 5.E-01
#17 | ---------111213------19 | 0 | 0.0000 | 0 | 0.0000 | N/A | N/A
Other | | 15 | | 7 | | |
Totals | | 650 | | 938 | | |

Table 5-4. Quantitation of the 17 most common CD44 alternatively spliced isoforms in aggregated samples of ALL and NM was obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. Isoform Signatures describe the known expressed variant exons, where '6' refers to '6b'; we include exons 18 and 19, which are the mutually exclusive tails. X Ratio (where X = sample type) = ratio between the count of each isoform and the total count of alternatively spliced isoforms. C.f. Ratio = ratio between ALL and NM (negative values indicate down-regulation in ALL). Statistically significant up-regulation of ALL is highlighted in light blue and down-regulation is highlighted in light green. We report the following novel isoforms for ALL (highlighted in yellow): #1 - #3, #5 - #10, and #12 - #17.
The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Taking into account the less common alternative splicing isoforms of ALL (data not shown), we can account for all of the isoforms found in previous work [Ben04]. However, we have also identified previously unreported, and in fact unexpected, isoforms. For example, as discussed in Chapter 3, we identified exon 8 (V4), exon 9 (V5), and exon 11 (V7), none of which had previously been found in ALL. The isoforms that include them are #16, #13, and #10, respectively. We report the following novel isoforms for ALL: #1 - #3, #5 - #10, and #12 - #17.

5.6. Quantitative Comparison of Cancer to Normal

Alternative splicing isoforms of the ALL and AML samples were aggregated and designated as 'Cancer.' They were then compared to the aggregated isoforms of the NM samples (see Table 5-5). 15 of the 17 most common isoforms were found to be up-regulated in Cancer with statistical significance (p << .05). 1 of the 17 isoforms was found to be down-regulated in Cancer (i.e., up-regulated in NM) with statistical significance (p < .000002). Cancer was found to express 2 common isoforms that are not present in NM samples: #10 (p << .05) and #17 (p << .05). Both of these isoforms contain exon 11 (V7), which was not found to be expressed in NM samples (see Table 3, Chapter 3). NM was not found to express any common isoform that was not also expressed by Cancer.

Desig.  Isoform signature   Cancer  Cancer Ratio   NM   NM Ratio  C.f. Ratio  P-value
#1      13, 19                 966     0.3026     282    0.3006      1.01     5.E-01
#2      14, 19                 823     1.2662     249    0.2655      4.77     <<5E-02
#3      12, 19                 547     0.8415     185    0.1972      4.27     <<5E-02
#4      10, 19                 303     0.4662      72    0.0768      6.07     <<5E-02
#5      12, 13, 19             124     0.1908      23    0.0245      7.78     <<5E-02
#6      12, 14, 19              78     0.1200      35    0.0373      3.22     <<5E-02
#7      7, 19                   92     0.1415      16    0.0171      8.30     <<5E-02
#8      13, 14, 19              35     0.0538      10    0.0107      5.05     <<5E-02
#9      7, 13, 19               25     0.0385      16    0.0171      2.25     <<5E-02
#10     11, 19                  36     0.0554       0    0.0000      N/A      <<5E-02
#11     12, 13, 14, 19           8     0.0123      26    0.0277     -2.25     2.E-06
#12     6, 19                   24     0.0369       6    0.0064      5.77     <<5E-02
#13     9, 19                   26     0.0400       3    0.0032     12.51     <<5E-02
#14     10, 13, 19              20     0.0308       4    0.0043      7.22     <<5E-02
#15     18                      20     0.0308       1    0.0011     28.86     <<5E-02
#16     8, 19                   13     0.0200       3    0.0032      6.25     <<5E-02
#17     11, 12, 13, 19          11     0.0169       0    0.0000      N/A      <<5E-02
Other                           41                  7
Totals                        3192                938

Table 5-5. Quantitation of the 17 most common CD44 alternatively spliced isoforms in aggregated samples of (ALL + AML) versus aggregated samples of NM, obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. We define Cancer = ALL + AML. Isoform signatures list the known expressed variant exons, where '6' refers to '6b'; we also include exons 18 and 19, which are the mutually exclusive tails. X Ratio (where X = sample type) = ratio between the count of each isoform and the total count of isoforms. C.f. Ratio = ratio between Cancer and NM (negative values indicate up-regulation of NM). Statistically significant up-regulation of Cancer is highlighted in light blue and up-regulation of NM in light green. We report the following novel isoforms for NM (highlighted in yellow): #1 - #3, #5 - #9, #12 - #14, and #16. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Regarding NM samples, we have identified (see Chapter 3.4) both exon 8 (V4) and exon 9 (V5) in NM samples (one of human adult whole bone marrow purified cells and the other of human peripheral blood purified B-cells); neither exon had previously been found in normal bone marrow cells or in normal peripheral blood cells [Ben00]. The isoform signatures that include them are #16 and #13, respectively. Here we report the following novel isoforms for NM: #1 - #3, #5 - #9, #12 - #14, and #16.

The results of comparing Cancer to NM are particularly interesting in the context of the results of quantitative exon profiling in Chapter 3.4. There we saw that in Cancer no exon except exon 11 (V7) was up-regulated with statistical significance, and exon 11 (V7) was up-regulated (with statistical significance) only because it was not found to be expressed in NM. In stark contrast, almost all of the common isoforms are up-regulated (with statistical significance) compared to NM. Therefore, it is clear that looking only at variant exon inclusion and exclusion is not enough; analysis of the combinatorics of the exons in transcripts is crucial.

5.7. Conclusion

In this chapter we have provided the first comprehensive quantitative analysis of isoforms performed on human cells, and indeed on almost any organism or cell type. The developed methods of quantitative isoform profiling have made this possible. We have found many statistically significant up-regulations and down-regulations of particular isoforms, as well as several cases, even among the common isoforms, where isoforms were not present in a particular sample type (usually NM). We have also reported the specific novel isoforms for each sample type. Uncommon isoforms (i.e., rare isoforms that, by definition, may be expressed in only a few samples) were not compared here.
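The p-values above come from the large-sample test for a difference between two proportions [Dev91], and the C.f. Ratio is a signed ratio of two proportions. The sketch below is illustrative only: it assumes the one-sided, pooled-variance form of the test (our assumption, not stated in the text), computes the C.f. Ratio from raw counts, and uses hypothetical function names. Under that assumption it gives p of about 0.013 for 0 of 300,000 versus 1 of 60,000 molecules and about 0.037 for 2 of 300,000 versus 2 of 60,000, consistent with the 1.E-02 and 4.E-02 reported for those rows of Supplementary Table 1.

```python
import math
from typing import Optional


def two_proportion_pvalue(x1: int, n1: int, x2: int, n2: int) -> Optional[float]:
    """One-sided p-value of the large-sample z-test for p1 - p2 (pooled proportion).

    Returns None (reported as N/A) when neither population has any counts.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return None
    z = (p1 - p2) / se
    # Tail area beyond |z| under the standard normal: 1 - Phi(|z|) = erfc(|z| / sqrt(2)) / 2.
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


def cf_ratio(x1: int, n1: int, x2: int, n2: int) -> Optional[float]:
    """C.f. Ratio: p1 / p2 when p1 >= p2, otherwise -(p2 / p1); None (N/A) if either is 0."""
    p1, p2 = x1 / n1, x2 / n2
    if p1 == 0 or p2 == 0:
        return None
    return p1 / p2 if p1 >= p2 else -(p2 / p1)


# First isoform of Supplementary Table 1: AML 341 of 300,000 vs NM 333 of 60,000 molecules.
print(round(cf_ratio(341, 300000, 333, 60000), 2))  # -4.88
print(two_proportion_pvalue(341, 300000, 333, 60000))
```

Working from counts rather than rounded ratios matters for small counts: 7 of 300,000 versus 1 of 60,000 gives exactly 1.40, whereas dividing the rounded ratios 0.000023 and 0.000017 would not.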
In the next chapter we use the quantitative results obtained here to generate statistically significant findings of isoforms that may serve as potential diagnostic markers or may be considered for further analysis as therapeutic targets.

Chapter 6: In Pursuit of Exclusive Converging-Isoforms

Note that in order to derive more robust and universally accepted results, we compare the leukemias against a heterogeneous mix of normal samples (containing independent samples of human peripheral blood purified B-cells, human cord blood purified B-cells, and human adult whole bone marrow purified cells), designated as NM.

6.1. Introduction

In this chapter we develop a new paradigm for analyzing the isoforms identified in Chapter 5 in order to qualify them for further evaluation as potential diagnostic markers, therapeutic targets, and perhaps even therapeutics themselves.

6.2. Introducing Isoform Convergence

In order to robustly compare isoforms of AML, ALL, and NM, especially for a small number of total samples, we need a unified framework that captures the qualities that most effectively and flexibly differentiate between sample types through the comparison of isoforms. Those qualities include: the presence of a particular isoform that is consistently expressed in almost all of the samples of a sample type, with the flexibility to consider less consistently expressed isoforms; the consideration and prioritization of rare isoforms as a function of complex splicing; and the ability to identify, with a certain degree of confidence, those isoforms that are present in one sample type but not in another.

We begin by introducing the concept of Isoform Convergence. We first assume that for each member of a population, its unique isoforms (i.e., the different isoform signatures expressed) have been characterized.
In addition, we require that each unique isoform of a member has an associated number of occurrences, that is, how many times that unique isoform was detected in the member. This also enables us to determine the total number of alternatively spliced isoforms expressed in each member, which is the sum of the occurrences of its unique isoforms.

The Convergence Value of a population is then determined as follows. Members of a population are ordered according to the following rule: a member with a larger number of unique isoforms is listed before a member with a smaller number of unique isoforms. If two members have an equal number of unique isoforms, we order them by the total number of alternatively spliced isoforms. If two members also have equal numbers of total alternatively spliced isoforms, we order them by coin toss. We then intersect the sets of unique isoforms of the members in order, starting by intersecting the first member in the ordered list with itself (for appropriate indexing of intersections). After each intersection, each remaining unique isoform is associated with its least occurrence value (i.e., min(occurrence in member 1, occurrence in member 2)). We continue to intersect members until we have intersected at least 3/4 of the members and the number of remaining unique isoforms Converges, that is, until an intersection (n) and its subsequent intersection (n + 1) result in the same number of remaining unique isoforms. Intersection n is then identified as the Converging Intersection, and the members of the population intersected in reaching the Converging Intersection are known as the Converging Members. This list of unique isoforms (including their occurrences) is assigned a Convergence Value of 1.
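The ordering and intersection steps just described can be sketched in Python as follows. This is an illustrative reading of the procedure, not the author's implementation: each member is assumed to be a dictionary mapping an isoform signature to its occurrence count, "at least 3/4 of the members" is read as at least ceil(3N/4) intersections, the coin toss is a seeded pseudo-random tie-break, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

Profile = Dict[str, int]  # isoform signature -> number of occurrences in one member


def order_members(members: List[Profile], rng: random.Random) -> List[Profile]:
    """More unique isoforms first, then more total isoforms, then a coin toss."""
    return sorted(members, key=lambda m: (-len(m), -sum(m.values()), rng.random()))


def intersection_series(ordered: List[Profile]) -> List[Profile]:
    """Element j - 1 is intersection j: isoforms shared by the first j members,
    each kept with its least occurrence (intersection 1 = member 1 with itself)."""
    series = [dict(ordered[0])]
    for member in ordered[1:]:
        previous = series[-1]
        series.append({iso: min(count, member[iso])
                       for iso, count in previous.items() if iso in member})
    return series


def converging_intersection(series: List[Profile]) -> Optional[int]:
    """Smallest n >= ceil(3N/4) whose isoform count equals that of intersection n + 1."""
    first = math.ceil(3 * len(series) / 4)
    for n in range(first, len(series)):  # series[n] is intersection n + 1
        if len(series[n - 1]) == len(series[n]):
            return n
    return None  # the population does not converge


# Hypothetical population of five members with isoforms labelled A to H.
members = [
    {"A": 3, "B": 3, "H": 1},
    {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1},
    {"A": 1, "B": 1},
    {"A": 6, "B": 2, "C": 2, "F": 1},
    {"A": 2, "B": 5, "C": 4},
]
series = intersection_series(order_members(members, random.Random(0)))
print([len(s) for s in series], converging_intersection(series))
```

In this toy population intersections 2 and 3 already hold the same three isoforms, but the 3/4 rule (ceil(3 x 5 / 4) = 4) postpones the check, so the Converging Intersection is 4.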
Assume that k intersections were required to reach a Convergence Value of 1. Then a list of unique isoforms selected from the previous intersection (k - 1) has a Convergence Value equal to the ratio (k - 1)/k. Thus, each isoform of a set of unique isoforms resulting from an intersection can be assigned a Convergence Value. We further define a unique isoform that has a Convergence Value (derived from the process of Isoform Convergence) greater than 2/3 (approximately 0.667) as a Converging Isoform of its population. A unique isoform that has a Convergence Value of 1 is designated as Maximally Convergent. Furthermore, we rank isoforms from most converging to least converging, prioritizing first by Convergence Value and second by number of occurrences.

Lastly, we introduce the following additional concepts as part of our unified framework: Exclusive and Maximally Exclusive, as applied to alternatively spliced isoforms. A Converging Isoform i of a population m is said to be Exclusive against another population n if it does not occur in n's set of Converging Isoforms. More stringently (assuming the same comparison population n), a Converging Isoform i of a population m is said to be Maximally Exclusive against n if it does not occur in any member of n. The presented unified framework is fundamentally based on the mathematical operators of intersection, difference, and union; it applies these operators through a flexible algorithm that enables rational identification of isoforms that are consistently exclusive to a sample type.
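Given an intersection series and its Converging Intersection k, the Convergence Values, the Converging and Maximally Convergent Isoforms, the ranking, and the two exclusivity tests can be sketched as below. Several details are our assumptions rather than statements of the text: intersection j (j <= k) is given the value j/k, generalizing the (k - 1)/k rule; each isoform takes the value of the deepest intersection up to k in which it survives; and its occurrence is its least occurrence at that intersection. Function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Iterable, List, Set, Tuple

Profile = Dict[str, int]                 # isoform signature -> occurrences
Conv = Dict[str, Tuple[Fraction, int]]   # isoform -> (Convergence Value, occurrence)


def converging_isoforms(series: List[Profile], k: int) -> Conv:
    """Isoforms whose Convergence Value exceeds 2/3, with their least occurrence."""
    result: Conv = {}
    for j in range(k, 0, -1):  # deepest intersection first
        value = Fraction(j, k)
        if value <= Fraction(2, 3):  # Converging requires a value strictly above 2/3
            break
        for iso, occurrence in series[j - 1].items():
            result.setdefault(iso, (value, occurrence))
    return result


def maximally_convergent(conv: Conv) -> Set[str]:
    return {iso for iso, (value, _) in conv.items() if value == 1}


def rank(conv: Conv) -> List[str]:
    """Most converging first: by Convergence Value, then occurrences (name breaks ties)."""
    return sorted(conv, key=lambda iso: (-conv[iso][0], -conv[iso][1], iso))


def exclusive(conv_m: Conv, conv_n: Conv) -> Set[str]:
    """Converging Isoforms of m absent from n's Converging Isoforms (set difference)."""
    return set(conv_m) - set(conv_n)


def maximally_exclusive(conv_m: Conv, members_n: Iterable[Profile]) -> Set[str]:
    """Converging Isoforms of m that occur in no member of n (difference with a union)."""
    seen_in_n: Set[str] = set()
    for member in members_n:
        seen_in_n |= set(member)
    return set(conv_m) - seen_in_n


# Hypothetical intersection series that converges at k = 4.
series = [
    {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1},
    {"A": 5, "B": 2, "C": 2},
    {"A": 2, "B": 2, "C": 2},
    {"A": 2, "B": 2},
    {"A": 1, "B": 1},
]
conv = converging_isoforms(series, k=4)
print(rank(conv), maximally_convergent(conv))
```

With k = 4, isoform C survives intersection 3 (value 3/4) and is therefore still Converging, while isoforms lost at intersection 2 (value 1/2) are not.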
Thus, Converging (or Maximally Convergent) Isoforms that are Exclusive (or Maximally Exclusive) have the following properties: 1) they occur in a majority of the samples of a sample type, with the flexibility to range from a minor majority to a significant majority, and are assigned a level of confidence based on preponderance of occurrence; 2) they can be rare, because the unified platform prioritizes the evaluation of samples with more complex splicing; 3) they can be used to distinguish between different sample types with a certain degree of consistency (without requiring absolute consistency). Furthermore, the process of Isoform Convergence guarantees that an isoform present after an intersection was also present in all prior intersections.

We will now apply these concepts to our samples of AML, ALL, and NM.

6.3. Finding Convergence

We first apply the concept of Isoform Convergence to each of AML, ALL, and NM (see Figure 6-1). The Converging Intersection for AML, ALL, and NM is the 4th, 3rd, and 4th, respectively. Between the first intersection and the Converging Intersection we lose 87.5% of unique isoforms for AML, 61.1% of unique isoforms for ALL, and 76.5% of unique isoforms for NM.

[Figure 6-1 (plot): Convergence of Isoforms Upon Series of Intersections. x-axis: Number of Intersections (1st to 5th); y-axis scale 0 to 30; series: AML, ALL, NM.]

Figure 6-1. The process of Isoform Convergence is used to analyze each set of samples, AML, ALL, and NM. Each set of samples has been found to reach Convergence.

6.4. Converging-Isoforms

We determine the set of Converging Isoforms for each of AML, ALL, and NM by applying the process of Isoform Convergence. Here we use all isoforms found for each sample type, not just the common isoforms used previously. In AML, we have identified 7 Converging Isoforms (see Table 6-1). 3 of the 7 isoforms are Maximally Convergent.
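The percentages quoted in Section 6.3 are a simple function of the number of unique isoforms remaining after each intersection, the quantity plotted in Figure 6-1. A minimal sketch, assuming a list `sizes` (hypothetical input) in which `sizes[j - 1]` holds the number of unique isoforms after intersection j; the example sizes below are hypothetical, chosen only to show the arithmetic.

```python
from typing import List


def fraction_lost(sizes: List[int], k: int) -> float:
    """Share of the unique isoforms after intersection 1 that are gone after intersection k.

    sizes[j - 1] is the number of unique isoforms remaining after intersection j.
    """
    if not 1 <= k <= len(sizes) or sizes[0] == 0:
        raise ValueError("k must index an intersection and intersection 1 must be non-empty")
    return 1 - sizes[k - 1] / sizes[0]


# Hypothetical population: 24 unique isoforms after intersection 1, 3 after intersection 4.
print(f"{fraction_lost([24, 12, 7, 3, 3], k=4):.1%}")  # 87.5%
```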
Counts of Converging Isoforms for each pair (intersections, Convergence Value)

Converging Isoform (AML)   3, 0.75   4, 1   5, 1
7, 13, 19                        2      0      0
7, 19                           10      0      0
10, 19                          10     10      9
11, 19                           1      0      0
12, 19                          49      0      0
13, 19                          77     77     57
14, 19                         115    115     19
Totals                         264    202     85

Table 6-1. Converging Isoforms for AML as determined by the process of Isoform Convergence on samples of AML. Convergence was reached after 4 intersections.

In ALL, we have also identified 7 Converging Isoforms (see Table 6-2). All 7 of the 7 isoforms are Maximally Convergent. Note that since we only had 4 samples of ALL, it does not make sense to report values for the pair (2, 0.67), since any new isoforms identified there would not be Converging Isoforms. This is because a Converging Isoform must have a Convergence Value greater than 2/3.

Counts of Converging Isoforms for each pair (intersections, Convergence Value)

Converging Isoform (ALL)   3, 1   4, 1
6, 19                         1      1
7, 13, 19                     1      1
7, 19                         6      1
10, 19                        6      3
12, 19                        8      5
13, 19                       11      5
14, 19                       32      8
Totals                       65     24

Table 6-2. Converging Isoforms for ALL as determined by the process of Isoform Convergence on samples of ALL. Convergence was reached after 3 intersections. Note that the values for (2, 0.67) were not reported, as these do not meet the definition of a Converging Isoform and thus would not be valid.

In NM, we have identified 6 Converging Isoforms (see Table 6-3). 5 of the 6 isoforms are Maximally Convergent.

Counts of Converging Isoforms for each pair (intersections, Convergence Value)

Converging Isoform (NM)   3, 0.75   4, 1   5, 1
Isoforms: 7, 13, 19 | 10, 19 | 12, 19 | 12, 14, 19 | 13, 19 | 14, 19
Counts: 1 8 19 0 8 18 0 26 0 8 2 0 11 24 24 7 39 30
Totals                        104     76     45

Table 6-3.
Converging Isoforms for NM as determined by the process of Isoform Convergence on samples of NM. Convergence was reached after 4 intersections.

6.5. Exclusive Converging-Isoforms

Here we determine and identify the Exclusive and Maximally Exclusive alternatively spliced unique isoforms from each sample type's set of Converging Isoforms (see Table 6-4) by comparing different sample types. Each type of sample is found to have at least one Exclusive Converging Isoform. Both AML and ALL have two Exclusive Converging Isoforms, and for AML one of these is Maximally Exclusive. Both of ALL's Exclusive Converging Isoforms are Maximally Convergent (see previous section).

Counts of Exclusive Isoforms for each sample type and pair (intersections, Convergence Value)

                 AML (3, 0.75)        ALL (3, 1)          NM (3, 0.75)
AML (3, 0.75)    N/A                  None                12, 14, 19 (Count: 7)
ALL (3, 1)       None                 N/A                 12, 14, 19 (Count: 7)
NM (3, 0.75)     7, 19 (Count: 10)    6, 19 (Count: 1)    N/A
                 11, 19 (Count: 1)    7, 19 (Count: 6)
NM (4, 1)        7, 19 (Count: 10)    6, 19 (Count: 1)    N/A
                 11, 19 (Count: 1)    7, 19 (Count: 6)
NM UNION         11, 19 (Count: 1)    None                N/A

Table 6-4. Exclusive Converging Isoforms and Maximally Converging Isoforms (if any) of each sample type, determined by the process of Isoform Convergence and subject to the concepts of Exclusive and Maximally Exclusive. Count indicates the least occurrence of the converging isoform in the set of Converging Members.

6.6. Identification of Possible Candidate Targets

Finally, we compare our results for the identification of the Exclusive Converging Isoforms with our previous results for isoform profiles (Chapter 5) and for exon profiles (Chapter 4) to determine which of our Exclusive Converging Isoforms are also up-regulated with statistical significance (see Table 6-5).
Exon Profile | Isoform Profile | Exclusive Converging Isoform | Occurrence Ratio | Up-regulated vs. NM | P-value | Occurrence Ratio | Up-regulated vs. NM | P-value
AML#1 .09 2.9% No N/A 7, 19 2.5% Yes Yes .03 Yes .0002 0.4% AML#2 1.1% 11, 19 .02 .02 1.5% Yes ALL#1 6, 19 2.7% Yes Yes .0006 Yes .005 ALL#2 5.8% 7, 19 4.5% <.05 NM#1 12, 14, 19 3.7% Yes*

Table 6-5. Isoforms for both AML and ALL that were identified as Exclusive Converging Isoforms were evaluated for statistically significant up-regulation vs. NM (from Chapter 4). In addition, the single variant exon present in each Exclusive Converging Isoform was evaluated for statistically significant up-regulation vs. NM (from Chapter 3). *NM's Exclusive Converging Isoform was compared to both AML and ALL separately, and its up-regulation was found to be statistically significant. The Occurrence Ratio refers to the ratio of the occurrences of the Exclusive Converging Isoform (or exon) to all alternative splicing isoforms (or variant exons) identified for each sample type.

Therefore, we propose the following Exclusive Converging Isoforms as potential candidates for further study as targets for diagnostic probing or therapeutic intervention, since the included variant exons are expressed on the extracellular proximal domain of CD44, which is involved in ligand binding: AML#2, ALL#1, and ALL#2. AML#2 may be particularly relevant, as exon 11 has not been found to be present in normal cells in our studies or in others [Ben00, Leg98, Ben04, Kha96]. Furthermore, inclusion of exon 11 (V7) enables direct binding to chondroitin sulfate, heparin, and heparan sulfate in addition to HA [Sle97], which could confer additional functionality that may be exploited in malignancy.

6.7. Conclusion

In this chapter we proposed a process of Isoform Convergence as a method to select robustly expressed isoforms from the population of all unique isoforms expressed by a particular sample type. The method of Isoform Convergence first requires that Convergence be obtained within a population of samples. This allows for the identification of Converging Isoforms. The process of Isoform Convergence guarantees that an isoform present after a particular intersection was also present in all prior intersections. Finally, we determine the exclusivity of a Converging Isoform by comparing it against the Converging Isoforms of other relevant populations. Furthermore, we propose three Converging Isoforms (1 of AML and 2 of ALL) that are Exclusive and are up-regulated with statistical significance (compared to NM). These isoforms may serve as potential diagnostic probes or therapeutic targets, though significant further studies need to be completed.

Conclusion

We have presented a new paradigm by which to quantitatively study the alternative splicing of any molecule in any clinical sample through Polony Technology [Mit99] and our methods of quantitative exon profiling and quantitative isoform profiling. Furthermore, we extended this paradigm to include Isoform Convergence, a process by which we can potentially qualify particular isoforms as candidate diagnostic markers, potential therapeutic targets, and perhaps even as precursor therapeutics themselves. We applied this paradigm to quantitatively investigate the alternative splicing of CD44 in two leukemias: acute myeloid leukemia and acute lymphoblastic leukemia. To address some of the controversy in the CD44 leukemia literature, we suggested several corrections to previously made claims about the presence of specific CD44 exons and of specific CD44 isoforms.
Furthermore, we provided not only the first comprehensive characterization of CD44's (or any molecule's) alternative exon splicing in human cells, but also the resulting quantities of the exons and of the exact isoforms present, to a resolution of 1.E+06 molecules (of CD44). Through this process, we identified a plethora of novel isoforms of CD44 expressed in acute myeloid leukemia and in acute lymphoblastic leukemia. Finally, we identified specific isoforms in each leukemia that may serve as candidate markers or possibly as therapeutic targets.

In future work, we hope to establish our paradigm as a new and rational method for the identification of therapeutic targets that result in the generation of successful therapies. In order to do this, we need to identify Exclusive Converging Isoforms that are able to demonstrate therapeutic intervention through intracellular techniques such as antisense oligonucleotides and small interfering RNAs (siRNAs), or through extracellular techniques such as monoclonal antibodies (for example, chemoimmunotherapy). Intracellular techniques would attempt to knock down defective alternative splicing and restore normal splicing patterns in targeted cells. Extracellular techniques would target the Exclusive Converging Isoforms expressed on the targeted cell's surface in order to induce apoptosis, to reduce cell viability, or to induce an inflammatory response against the cell.

Supplementary Results

S.1. TM-Exon Skipping

Note: A portion of the results of this section after our initial observation of TM-exon skipping, not including the quantitative comparisons, was completed in collaboration with Jun Zhu, formerly a post-doc in the laboratory of George Church at Harvard Medical School and now an Assistant Professor at Duke University.

Initial Observation, Identification & Background

During our initial studies of isoform profiling, we found that the transmembrane exon of CD44 (i.e.
exon 17) was spliced out in some isoforms when compared to the constant exon 5 (see Figure 1).

[Figure 1 (polony slide images): panels Cy3-dA (Exon-17), Cy5-dU (Exon-5), and Merge.]

Figure 1. Identification of TM-skipping in human CD44 on polony slides. Arrows point to the TM-skipped isoforms in the merged image. Note also that the merged image shows cases where the constant exon 5 is not present; this is due to cryptic splicing, as previously identified.

In order to determine whether the identified TM-skipping polonies expressed a true alternatively spliced isoform, a polony with such expression was cut out of the polony gel and sequenced (this was also confirmed by gel electrophoresis) (see Figure 2). The resulting sequence (Figure 2) clearly demonstrates that exon 17 has been alternatively spliced out, as the entire TM exon is absent. Furthermore, the flanking sequence provides a U1-binding 5'ss signal (GGT) that is consistent with 80% of exon skipping [Ast04].

[Figure 2 (sequencing chromatogram): label Exon 16; visible base calls ACCCCAAATTCCAGGTGTGGGCAGAAG.]

Figure 2. Confirmation of TM-exon skipping and subsequent sequencing.

It is theorized that TM-skipping would cause the expressed TM- CD44 protein to be secreted. CD44 is known to be present in soluble form in quantities of µg/ml in human blood; however, this was considered to be due to the process of CD44 shedding (see Figure 3), caused by proteolytic cleavage of a post-translated CD44 protein on its extracellular domain. This form of soluble CD44 is thus regulated at the post-translational level and does not express the tail region. In stark contrast, our identified soluble CD44 is regulated at the pre-mRNA level and does express the tail region; this may confer special properties upon this form of soluble CD44 as well as represent a different, highly regulated function.
[Figure 3 (schematic): full-length CD44 (NH2 to COOH) spanning the extracellular, membrane, and intracellular regions, with the alternative splicing site marked; proteolytic cleavage of the CD44 ectodomain yields soluble CD44 and a CD44 cleavage product; binding sites of anti-CD44ecto and anti-CD44cyto antibodies (Ab) are indicated.]

Figure 3. CD44 shedding occurs via proteolytic cleavage of a portion of the TM exon, which confers soluble CD44.

In mice, soluble CD44 has been associated with several functions [Yu96]. Soluble CD44 has been shown to block endogenous CD44 from binding and internalizing its primary ligand, HA, acting as a decoy. Soluble CD44 has been shown to inhibit TA3 cell invasion of an HA-producing cell monolayer and to inhibit tumor formation when intravenously injected into mice. Furthermore, soluble CD44 has been found to induce apoptosis of invading tumor cells.

Isoform Expression Profiles

Alternative splicing of exon 17 (the TM exon) was analyzed via quantitative isoform profiling as described previously (Chapters 1 and 2). Counts of each identified TM- isoform were obtained for AML vs. NM, AML vs. ALL, ALL vs. NM, and Cancer (AML and ALL aggregated) vs. NM (see Tables 1 to 4). Ratios of each type were obtained over total CD44 counts as provided in Table 2 of Chapter 3. All samples expressed TM- CD44.

Isoform            AML   AML Ratio    NM   NM Ratio   C.f. Ratio   P-value
( ), 19            341   0.001137    333   0.005550      -4.88     <<5E-02
13, ( ), 19          7   0.000023      0   0.000000       N/A      1.E-01
14, ( ), 19          7   0.000023      1   0.000017       1.40     4.E-01
12, ( ), 19          2   0.000007      2   0.000033      -5.00     4.E-02
10, ( ), 19          4   0.000013      0   0.000000       N/A      2.E-01
7, ( ), 19           0   0.000000      0   0.000000       N/A      N/A
13, 14, ( ), 19      1   0.000003      0   0.000000       N/A      3.E-01
12, 13, ( ), 19      0   0.000000      1   0.000017       N/A      1.E-02
Totals          300000             60000

Table 1.
Quantitation of the 8 most common CD44 TM-skipped isoforms in aggregated samples of AML and NM, obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. AML Ratio and NM Ratio are calculated over the total number of respective CD44 molecules as reported in Table 2 of Chapter 3. C.f. Ratio = ratio between AML and NM (negative values indicate down-regulation in AML). Statistically significant up-regulation of AML is highlighted in light blue and down-regulation in light green. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Isoform            AML   AML Ratio    ALL   ALL Ratio   C.f. Ratio   P-value
( ), 19            341   0.001137     436   0.004360      -3.84     <<5E-02
13, ( ), 19          7   0.000023       1   0.000010       2.33     2.E-01
14, ( ), 19          7   0.000023       0   0.000000       N/A      6.E-02
12, ( ), 19          2   0.000007       1   0.000010      -1.50     4.E-01
10, ( ), 19          4   0.000013       0   0.000000       N/A      1.E-01
7, ( ), 19           0   0.000000       2   0.000020       N/A      7.E-03
13, 14, ( ), 19      1   0.000003       0   0.000000       N/A      3.E-01
12, 13, ( ), 19      0   0.000000       0   0.000000       N/A      N/A
Totals          300000             100000

Table 2. Quantitation of the 8 most common CD44 TM-skipped isoforms in aggregated samples of AML and ALL, obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. AML Ratio and ALL Ratio are calculated over the total number of respective CD44 molecules as reported in Table 2 of Chapter 3. C.f. Ratio = ratio between AML and ALL (negative values indicate down-regulation in AML). Statistically significant up-regulation of ALL is highlighted in light blue and down-regulation in light green.
The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Isoform            ALL   ALL Ratio    NM   NM Ratio   C.f. Ratio   P-value
( ), 19            436   0.004360    333   0.005550      -1.27     4.E-04
13, ( ), 19          1   0.000010      0   0.000000       N/A      2.E-01
14, ( ), 19          0   0.000000      1   0.000017       N/A      1.E-01
12, ( ), 19          1   0.000010      2   0.000033      -3.33     1.E-01
10, ( ), 19          0   0.000000      0   0.000000       N/A      N/A
7, ( ), 19           2   0.000020      0   0.000000       N/A      1.E-01
13, 14, ( ), 19      0   0.000000      0   0.000000       N/A      N/A
12, 13, ( ), 19      0   0.000000      1   0.000017       0.00     1.E-01
Totals          100000             60000

Table 3. Quantitation of the 8 most common CD44 TM-skipped isoforms in aggregated samples of ALL and NM, obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. ALL Ratio and NM Ratio are calculated over the total number of respective CD44 molecules as reported in Table 2 of Chapter 3. C.f. Ratio = ratio between ALL and NM (negative values indicate down-regulation in ALL). Statistically significant up-regulation of ALL is highlighted in light blue and down-regulation in light green. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Isoform         Cancer   Cancer Ratio    NM   NM Ratio   C.f. Ratio   P-value
( ), 19            777     0.001943     333   0.005550      -2.86     <<5E-02
13, ( ), 19          8     0.000020       0   0.000000       N/A      N/A
14, ( ), 19          7     0.000018       1   0.000017       1.05     5.E-01
12, ( ), 19          3     0.000008       2   0.000033      -4.44     4.E-02
10, ( ), 19          4     0.000010       0   0.000000       N/A      N/A
7, ( ), 19           2     0.000005       0   0.000000       N/A      N/A
13, 14, ( ), 19      1     0.000003       0   0.000000       N/A      N/A
12, 13, ( ), 19      0     0.000000       1   0.000017       0.00     5.E-03
Totals          400000                60000

Table 4. Quantitation of the 8 most common CD44 TM-skipped isoforms in aggregated samples of (AML + ALL) and aggregated samples of NM, obtained by quantitative isoform profiling as described in Chapter 1 and Chapter 2. Cancer = AML + ALL. Cancer Ratio and NM Ratio are calculated over the total number of respective CD44 molecules as reported in Table 2 of Chapter 3. C.f. Ratio = ratio between Cancer and NM (negative values indicate down-regulation in Cancer). Statistically significant up-regulation of Cancer is highlighted in light blue and down-regulation in light green. The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Universal Expression

We additionally characterized the expression of TM- CD44 in other cell types beyond leukemic and normal cells through a comprehensive tissue panel. TM- CD44 expression was found in all tissues and was most significantly expressed in the tissues of the lung and salivary gland (Figure 4).

[Figure 4 (tissue panel): TM+ and TM- CD44 expression across tissues.]

Figure 4. Expression of TM- CD44 in all tissues queried via a tissue panel.

S.2. Exon Expression Profiles of Cell Lines

Obtaining Samples

The following EBV-transformed human cell lines were purchased from The Coriell Institute: 3638 (B-cell ALL) and 3797 (normal B-cell lymphoblast). The following EBV-transformed human cell line RNA was purchased from Ambion: KG-1 (AML).

Quantitative Comparisons

Exon             AML_LN   AML Ratio   NM_LN   NM Ratio   C.f. Ratio   P-value
EXON 6B (V2)          0     0.0000        0    0.0000       N/A        N/A
EXON 7 (V3)          20     0.0070       97    0.0889     -12.65      1.E-04
EXON 8 (V4)           0     0.0000        0    0.0000       N/A        N/A
EXON 9 (V5)           0     0.0000        2    0.0018       N/A       3.E-01
EXON 10 (V6)         10     0.0035       42    0.0385     -10.95      1.E-02
EXON 11 (V7)          0     0.0000        1    0.0009       N/A       3.E-01
EXON 12 (V8)        153     0.0538        0    0.0000       N/A       1.E-02
EXON 13 (V9)        180     0.0633       73    0.0669      -1.06      5.E-01
EXON 14 (V10)       229     0.0805      147    0.1347      -1.67      5.E-02

Table 1. NM_LN = normal cell line. C.f. Ratio = ratio between AML_LN and NM_LN (negative values indicate down-regulation in AML_LN). The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Exon             AML_LN   AML Ratio   ALL_LN   ALL Ratio   C.f. Ratio   P-value
EXON 6B (V2)          0     0.0000         0     0.0000       N/A        N/A
EXON 7 (V3)          20     0.0070        34     0.0457      -6.50      8.E-03
EXON 8 (V4)           0     0.0000         0     0.0000       N/A        N/A
EXON 9 (V5)           0     0.0000         3     0.0040       N/A       2.E-01
EXON 10 (V6)         10     0.0035         4     0.0054      -1.53      4.E-01
EXON 11 (V7)          0     0.0000         0     0.0000       N/A        N/A
EXON 12 (V8)        153     0.0538         1     0.0013      40.01      2.E-02
EXON 13 (V9)        180     0.0633        22     0.0296       2.14      1.E-01
EXON 14 (V10)       229     0.0805       264     0.3548      -4.41     <<5.E-02

Table 2. C.f. Ratio = ratio between AML_LN and ALL_LN (negative values indicate down-regulation in AML_LN). The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Exon             ALL_LN   ALL Ratio   NM_LN   NM Ratio   C.f. Ratio   P-value
EXON 6B (V2)          0     0.0000        0    0.0000       N/A        N/A
EXON 7 (V3)          34     0.0457       97    0.0889      -1.95      7.E-02
EXON 8 (V4)           0     0.0000        0    0.0000       N/A        N/A
EXON 9 (V5)           3     0.0040        2    0.0018       2.20      4.E-01
EXON 10 (V6)          4     0.0054       42    0.0385      -7.16      4.E-02
EXON 11 (V7)          0     0.0000        1    0.0009       N/A       4.E-01
EXON 12 (V8)          1     0.0013        0    0.0000       N/A       3.E-01
EXON 13 (V9)         22     0.0296       73    0.0669      -2.26      8.E-02
EXON 14 (V10)       264     0.3548      147    0.1347       2.63     <<5.E-02

Table 3. NM_LN = normal cell line. C.f. Ratio = ratio between ALL_LN and NM_LN (negative values indicate down-regulation in ALL_LN).
The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

Exon             Cancer_LN   Cancer Ratio   NM_LN   NM Ratio   C.f. Ratio   P-value
EXON 6B (V2)             0       0.0000          0    0.0000       N/A        N/A
EXON 7 (V3)             54       0.0150         97    0.0889      -5.91      1.E-04
EXON 8 (V4)              0       0.0000          0    0.0000       N/A        N/A
EXON 9 (V5)              3       0.0008          2    0.0018      -2.19      4.E-01
EXON 10 (V6)            14       0.0039         42    0.0385      -9.87      3.E-03
EXON 11 (V7)             0       0.0000          1    0.0009       N/A       3.E-01
EXON 12 (V8)           154       0.0429          0    0.0000       N/A       2.E-02
EXON 13 (V9)           202       0.0563         73    0.0669      -1.19      3.E-01
EXON 14 (V10)          493       0.1374        147    0.1347       1.02      5.E-01

Table 4. Cancer_LN = aggregation of AML_LN and ALL_LN exon counts. NM_LN = normal cell line. C.f. Ratio = ratio between Cancer_LN and NM_LN (negative values indicate down-regulation in Cancer_LN). The p-values were calculated by a standard large-sample test procedure for evaluating the difference between proportions obtained in two different populations [Dev91].

References

[Aac04] Aach J, Church GM: Mathematical models of diffusion-constrained polymerase chain reactions: basis of high-throughput nucleic acid assays and simple self-organizing systems. J. Theoret. Biol. 2004 May 7;228(1):31-46.
[Abb00] Abbas AK, Lichtman AH, Pober JS: Cellular and Molecular Immunology, 4th Ed. W.B. Saunders Company. 2000.
[Aks02] Aksk E, Bavbek S, Dalay N: CD44 variant exons in leukemia and lymphoma. Path. Onc. Res. 2002;8(1):36-40.
[Alb94] Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson J: Molecular Biology of the Cell, 3rd Ed. Garland Publishing. 1994.
[Apa03] Apaydin MS, Brutlag DL, Guestrin C, Hsu D, Latombe JC, Varma C: Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. of Comp. Biology. 2003;10:257-281.
[Ars00] Ars E, Serra E, Garcia J, Kruyer H, Gaona A, Lazaro C, Estivill X: Mutations affecting mRNA splicing are the most common molecular defects in patients with neurofibromatosis type 1. Hum Mol Genet 2000, 9:237-247.
[Ast04] Ast G: How did Splicing Evolve? Nat. Gen. Oct. 2004; 5:777.
[Bar97] http://www.surgery.wustl.edu/bicmdl/mdl.htm
[Ben00] Bendall LJ, Bradstock KF, Gottlieb DJ: Expression of CD44 variant exons in acute myeloid leukemia is more common and more complex than that observed in normal blood, bone marrow or CD34+ cells. Leukemia. 2000. 14:1239-1246.
[Ben04] Bendall L: Role of CD44 variant exon 6 in acute lymphoblastic leukemia: association with altered bone marrow localisation and increased tumor burden. Leukemia (correspondence). 2004. Online Pub doi: 10.1038/sj.leu.2403393.
[Bio98-1] http://www.bioscience.org/1998/v3/d/lesley/2.htm
[Bio98-2] http://www.bioscience.org/1998/v3/d/bourguig/4.htm
[Ble00] Blencowe BJ: Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 2000, 25:106-110.
[Bou97] Bourguignon L, Zhu H, Chu A, Iida N, Zhang L, Hung M: Interaction between adhesion receptor, CD44, and the oncogene product, p185-HER2, promotes human ovarian tumor cell activation. J. Bio. Chem. 1997. 272:27913-27918.
[Car03] Cartegni L, Krainer AR: Correction of disease-associated exon skipping by synthetic exon-specific activators. Nat. Struct. Biol. 2003. 10(2):120-5.
[Cot99] Cotran RS, Kumar V, Collins T: Robbins Pathologic Basis of Disease, 6th Ed. W.B. Saunders Company. 1999.
[Dev91] Devore JL: Probability and Statistics for Engineering and the Sciences, 3rd Ed. Wadsworth Publishing, Inc. 1991.
[Eme04] www.emedicine.com
[Fer99] Fersht A: Structure and Mechanism in Protein Science. W.H. Freeman & Company. 1999.
[Fie99] Friedman KJ, Kole J, Cohn JA, Knowles MJ, Silverman MJ, Kole R: Correction of aberrant splicing of CFTR gene by antisense oligonucleotides. J. Biol. Chem. 1999. 274:36193-36199.
[Fin95] Finke LH, Terpe HJ, Zorb C, Haensch W, Schlag PM: Colorectal cancer prognosis and expression of exon-v6-containing CD44 proteins. Lancet 1995. 345:583.
[Foe99] Foekens JA, Dall P, Klijn JG, Skroch PA, Claassen CJ, Look MP, Ponta H, van Putten WL, Herrlich P, Henzen-Logmans SC: Prognostic value of CD44 variant expression in primary breast cancer. Int. J. Cancer 1999. 84:209-215.
[Gha96] Ghaffari S, Dougherty GJ, Eaves AC, Eaves CJ: Altered patterns of CD44 epitope expression in human chronic and acute myeloid leukemia. Leukemia. 1996. 10:1773.
[Has01] Hastings ML, Krainer AR: Pre-mRNA splicing in the new millennium. Current Opinion in Cell Biology 2001, 13:302-309.
[Hea02] http://health.allrefer.com
[Hea04] http://www.healthcentral.com/mhc/top/000570.cfm
[Hei93] Heider KH, Hofmann M, Horst E, Van Den Berg F, Ponta H, Herrlich P, Pals ST: A human homologue of the rat metastasis-associated variant of CD44 is expressed in colorectal carcinomas and adenomatous polyps. J. Cell Biol. 1993. 120:227-233.
[Hen96] Henke C, Bitterman P, Roongta U, Ingbar D, Polunovsky V: Induction of fibroblast apoptosis by anti-CD44 antibody: implications for the treatment of fibroproliferative lung disease. Am J Pathol. 1996 Nov;149(5):1639-50.
[Her00] Herrlich P, Morrison H, Sleeman J, Rousseau VO, Konig H, Remers SW, Ponta H: CD44 Acts Both as a Growth- and Invasiveness-Promoting Molecule and as a Tumor-Suppressing Cofactor. Annals New York Academy of Sciences 2000, 106-120.
[Jan96] Janeway CA, Travers P: Immunobiology: The Immune System in Health and Disease, 2nd Ed. Garland Publishing Inc. 1996.
[Kat99] Katagiri Y, Sleeman J, Fujii H, Herrlich P, Hotta H, Tanaka K, Chikuma S, Yagita H, Okumura K, Murakami M, Saiki I, Chambers A, Uede T: CD44 variants but not CD44s cooperate with β1-containing integrins to permit cells to bind to osteopontin independently of arginine-glycine-aspartic acid, thereby stimulating cell motility and chemotaxis. Cancer Res. 1999. 59:219-226.
[Kau95] Kaufmann M, Heider KH, Sinn HP, von Minckwitz G, Ponta H, Herrlich P: CD44 variant exon epitopes in primary breast cancer and length of survival. Lancet 1995. 345:615-619.
[Kha96] Khaldoyanidi S, Achtnich M, Hehlmann R, Zoller M: Expression of CD44 variant isoforms in peripheral blood leukocytes in malignant lymphoma and leukemia: inverse correlation between expression and tumor progression. Leukemia Research. 1996. 20:839-851.
[Kha02] Khan SA, Lopez-Chua CA, Zhang J, Fisher LW, Sorensen ES, Denhardt DT: Soluble osteopontin inhibits apoptosis of adherent endothelial cells deprived of growth factors. J Cell Biochem. 2002;85(4):728-36.
[Kin99] Kincade PW: Blasting away leukemia. Nature Med. 1999; 5(6):619-620.
[Kra97] Krainer AR: Eukaryotic mRNA Processing. Oxford University Press. 1997.
[Lac00] Lacerra G, Sierakowska H, Carestia C, Fucharoen S, Summerton J, Weller D, Kole R: Restoration of hemoglobin A synthesis in erythroid cells from peripheral blood of thalassemic patients. Proc Natl Acad Sci USA 2000, 97:9591-9596.
[Leg98] Legras S, Gunthert U, Stauder R, Curt F, Oliferenko S, Kluin-Nelemans HC, Marie JP, Proctor S, Jasmin C, Smadja-Joffe F: A strong expression of CD44-v6 correlates with shorter survival of patients with acute myeloid leukemia. Blood. 1998; 91(9):3401-3413.
[Leu04] http://www.leukemia-lymphoma.org
[Liu01] Liu H-X, Cartegni L, Zhang MQ, Krainer AR: A mechanism for exon skipping caused by nonsense or missense mutations in BRCA1 and other genes. Nat Genet 2001, 27:55-58.
[Mag01] Magyarosy E, Sebestyen A, Timar J: Expression of metastasis associated proteins, CD44v6 and NM23-H1, in pediatric acute lymphoblastic leukemia. Anticancer Res. 2001. 21:819-823.
[Mar00] Maroney PA, et al: Functional recognition of the 5' splice site by U4/U6.U5 tri-snRNP defines a novel ATP-dependent step in early spliceosome assembly. Mol Cell 2000, 6:317-328.
[Med02] http://www.nlm.nih.gov/medlineplus/ency/article/000570.htm
[Mit99] Mitra RD, Church GM: In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res. 1999. 27:e34.
[Mit03] Mitra RD et al.: Digital Genotyping and Haplotyping with Polymerase Colonies. Proc. Natl. Acad. Sci. U.S.A. 2003. 100:5926.
[Mul94] Mulder JWR, Kruyt PM, Sewnath M, Oosting J, Seldenrijk CA, Weidema WF, Offerhaus GJA, Pals ST: Colorectal cancer prognosis and expression of exon-v6-containing CD44 proteins. Lancet 1994. 344:1470-1472.
[Yu96] Yu Q, Toole B: A new alternatively spliced exon between v9 and v10 provides a molecular basis for synthesis of soluble CD44. JEM. 1997. 18:1985.
[Roz00] Rozen S, Skaletsky HJ: Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ. 2000. pp 365-386.
[Sie99] Sierakowska H, Sambade MJ, Kole R: Sensitivity of splice sites to antisense oligonucleotides. RNA. 1999. 5:369-377.
[Sie00] Sierakowska H, Agrawal S, Kole R: Antisense oligonucleotides as modulators of pre-mRNA splicing. Methods Mol Biol. 2000. 133:223-33.
[Sko03] Skordis LA, Dunckley MG, Yue B, Eperon IC, Muntoni F: Bifunctional antisense oligonucleotides provide a trans-acting splicing enhancer that stimulates SMN2 gene expression in patient fibroblasts. Proc Natl Acad Sci U S A. 2003. 100(7):4114-9.
[Sle96] Sleeman J, Rudy W, Hofmann M, Moll J, Herrlich P, Ponta H: Regulated clustering of variant CD44 proteins increases their hyaluronate binding capacity. J. Cell Biol. 1996. 135:1139-1150.
[Sle97] Sleeman J, Kondo K, Moll J, Ponta H, Herrlich P: Variant exons v6 and v7 together expand the repertoire of glycosaminoglycans bound to CD44. J. of Bio. Chem. 1997. 272:31837-31844.
[Str95] Stryer L: Biochemistry, 4th Ed. W.H. Freeman & Company. 1995.
[Suw02] Suwanmanee T, Sierakowska H, Fucharoen S, Kole R: Restoration of human beta-globin gene expression in murine and human IVS2-654 thalassemic erythroid cells by free uptake of antisense oligonucleotides. Mol Pharmacol. 2002 Sep;62(3):545-53.
[Tan94] Tanabe KK, Saya H: Crit. Rev. Oncog. 1994, 5:201.
[Uku01] http://www.uku.fi/laitokset/anat/PG/ha_funct.htm
[Van93] van Weering DHJ et al: PCR Methods Appl 1993, 3:100.
[Wau03] http://www.neuro.wustl.edu/neuromuscular/pathol/spliceosome.htm
[Wil99] Wilton SW, Lloyd F, Carville K, Fletcher S, Honeyman K, Agrawal S, Kole R: Specific removal of nonsense mutation from the mdx dystrophin mRNA using antisense oligonucleotides. Neuromuscular Disorders. 1999. 9:330-338.
[Yu99] Yu Q, Stamenkovic I: Localization of matrix metalloproteinase 9 to the cell surface provides a mechanism for CD44-mediated tumor invasion. Genes Dev. 1999. 13:35-48.
[Zhu03] Zhu J, Shendure J, Mitra RD, Church GM: Single Molecule Profiling of Alternative Pre-mRNA Splicing. Science. 2003. 301:836-838.
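The C.f. Ratio and P-value columns in the appendix tables above can be recomputed from raw exon counts. The following is a minimal Python sketch, not the author's code: the function names are hypothetical, the per-line totals (2845 molecules for AML_LN, 744 for ALL_LN, 1091 for NM_LN) are assumptions back-calculated from the count and ratio columns (e.g., 229 / 0.0805 ≈ 2845), and the test shown is the textbook pooled two-proportion z-test, one common form of the large-sample procedure cited as [Dev91].

```python
import math
from typing import Optional, Tuple


def signed_ratio(x1: int, n1: int, x2: int, n2: int) -> Optional[float]:
    """Fold difference between two exon proportions, signed like the
    'C.f. Ratio' column: positive if group 1 is higher, negative if lower.
    Returns None (reported as N/A) when either proportion is zero."""
    p1, p2 = x1 / n1, x2 / n2
    if p1 == 0 or p2 == 0:
        return None
    return p1 / p2 if p1 >= p2 else -(p2 / p1)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled large-sample z-test of H0: p1 == p2.
    Returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Exon absent (or present in every molecule) in both groups.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed totals, back-calculated from the tables (e.g. 229 / 0.0805 ~ 2845).
N_AML, N_ALL, N_NM = 2845, 744, 1091

# Exon 7 (V3): AML_LN vs NM_LN, then pooled AML_LN + ALL_LN (CancerLN) vs NM_LN.
print(signed_ratio(20, N_AML, 97, N_NM))
print(two_proportion_z_test(20, N_AML, 97, N_NM))
print(signed_ratio(20 + 34, N_AML + N_ALL, 97, N_NM))
```

With these assumed totals, `signed_ratio` gives -12.65 for Exon 7 (V3) in AML_LN versus NM_LN and -5.91 once the AML_LN and ALL_LN counts are pooled, in line with the tables. The z-test p-values depend on the true totals and on whether a one- or two-sided test was used, so they are not expected to reproduce the P-value column exactly.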