Multi-scale imaging and informatics pipeline for in situ pluripotent stem cell analysis ARCHIVES MASSACHUSETTS INSTITUTE OF TECHNOLOLGY by Bryan Robert Gorman APR 14 2015 B.Eng., Vanderbilt University, 2006 M.S., Vanderbilt University, 2007 LIBRARIES Submitted to the Harvard-MIT Division of Health Sciences and Technology in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY FEBRUARY 2015 C Massachusetts Institute of Technology 2015. All rights reserved. ... ? /7 Signature redacted Author...........................%. i4irvard-MI ivision of Health Sciences and Technology February 2, 2015 Certified by ................................... . Signature redacted -.................... Paul H. Lerou, MD Assistan t Professor of Pediatrics, Harvard Medical School Thesis Supervisor Accepted by......Signature r edacted ..................................... Emery N. Brown, MD, PhD Director, Harvard-MIT Program in Health Sciences and Technology Professor of Computational Neuroscience and Health Sciences and Technology Multi-scale imaging and informatics pipeline for in situ pluripotent stem cell analysis by Bryan Robert Gorman Submitted to the Harvard-MIT Division of Health Sciences and Technology on February 6, 2015, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics and Integrative Genomics Abstract Human pluripotent stem (hPS) cells have the ability to reproduce indefinitely and differentiate into any cell type of the body, making them a potential source of cells for medical therapy and an ideal system to study fate decisions in early development. However, hPS cells exhibit a high degree of heterogeneity, which may be an obstacle to their clinical translation. Heterogeneity is at least partially induced as an artifact of removing the cells from the embryo and culturing them on a plastic dish. hPS cells grow in spatially patterned colony structures, which necessitates in situ quantitative single-cell image analysis. This dissertation offers a tool for analyzing the spatial population context of hPS cells that integrates automated fluorescent microscopy with an analysis pipeline. It enables high-throughput detection of colonies at low resolution, with single-cellular and sub-cellular analysis at high resolutions, generating seamless in situ maps of single-cellular data organized by colony. We demonstrate the tool's utility by analyzing inter- and intra-colony heterogeneity of hPS cell cycle regulation and pluripotency marker expression. We measured the heterogeneity within individual colonies by analyzing cell cycle as a function of distance. Cells loosely associated with the outside of the colony are more likely to be in G1, reflecting a less pluripotent state, while cells within the first pluripotent layer are more likely to be in G2, possibly reflecting a G2/M block. Our analysis tool can group colony regions into density classes, and cells belonging to those classes have distinct distributions of pluripotency markers and respond differently to DNA damage induction. Our platform also enabled noninvasive texture analysis of live hPS colonies, which was applied to monitoring subtle changes in differentiation state. Lastly, we demonstrate that our pipeline can robustly handle high-content, high-resolution single molecular mRNA FISH data by using novel image processing techniques. Overall, the imaging informatics pipeline presented offers a novel approach to the analysis of hPS cells, which includes not only single cell features but also spatial configuration across multiple length scales. Thesis Supervisor: Paul H. Lerou, MD Title: Assistant Professor of Pediatrics, Harvard Medical School 3 4 Acknowledgements I wish to thank my thesis advisor, Professor Paul Lerou, for taking me under his wing and guiding me and supporting this work. I also thank Dr. Rami Mangoubi, who advised the technical aspects of this project. I am grateful to my other committee members, Professor Leonid Mirny and Professor Suneet Agarwal, for tremendous advice and direction. Additionally, I thank Professor Jeremy Purvis at UNC and Dr. Nathan Lowry at Draper Laboratory for many invaluable discussions. Thank you to all current and past members of the Lerou Lab: Anna, Anthony, Faisal, Faith, Hilary, Jeanne, Junjie, Junning, Julia, Promise, Sean, Sutini, Vishesh, and Xiao. I am especially indebted to Anna Baccei, who assisted me with all of the projects described in this document by testing and debugging software, and generating most of the samples that I analyzed; and Dr. Junjie Lu, who assisted me with editing and revising the manuscript. Thank you to those that provided financial support for this work, especially the National Institutes of Health through the Bioinformatics and Integrative Genomics training grant, the Harvard-MIT Division of Health 5 Sciences and Technology (HST), and the HST IDEA-squared program. I am indebted to Dr. Julie Greenberg of HST, who went above and beyond in guiding me through the doctoral process. I am also grateful to the entire HST support staff, especially Laurie Ward, Traci Anderson, and my academic advisor, Professor Pete Szolovits. Most of all, I thank my wife Emily, my daughter Daphne, and my parents, for everything. 6 Contents M otivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.2 Related Work ...................... . . . . . . . . . 18 1.3 Approach and contributions . . . . . . . . . . . . . . . . . . 20 1.4 Organization . . . . . . . . . 21 . . 1.1 . . . . . . . . . . . . . . . . . . Biological background 25 2.1 Pluripotent stem cells . . . . . . . . . . . . . . . . . . . . . . 25 2.2 Stem cell heterogeneity . . . . . . . . . . . . . . . . . . . . . 26 2.3 Cell cycle regulation in pluripotent stem cells . . . . . . . . . 28 . 2 16 Introduction . 1 3 Image acquisition 30 H ardware . . . . . . . . . . . . . . . . . . . 30 3.2 Main acquisition graphical user interface . . 30 3.3 Pre-scan graphical user interface . . . . . . . 31 3.4 Manual and automated colony selection . . . 34 . . . . 3.1 7 38 Colony segmentation and identification . . . . . . 38 4.2 Segmentation of cells and feature extraction . . . . . . 39 4.3 Laser Scanning Cytometry . . . . . . . . . . . . . . . 39 4.4 Data visualization with FCS Express . . . . . . . . . 40 4.5 Construction of a virtual slide . . . . . . . . . . . . . 40 4.6 Single molecule FISH processing . . . . . . . . . . . . 41 4.6.1 . . . . . . 45 . . . . 4.1 . Threshold selection . . . . . . . . . Data pipeline and informatics 49 Spatial analysis . . . . . . . . . . . . . . . 49 5.2 Texture analysis . . . . . . . . . . . . . . . 51 5.3 DAPI normalization . . . . . . . . . . . . 52 5.4 Ensuring smFISH spot detection consistency across adjacent . 5.1 . 5 Image processing: colonies to cells . 4 fields . . . . . . . . . . . . . . . . . . . . . . 57 6 Results and discussion 59 6.1 Cell cycle heterogeneity as a function of location within colonies 59 6.2 Monitoring differentiation progress through texture analysis 68 6.3 Regional heterogeneity in cellular response to DNA damage. 72 6.4 High resolution quantitative analysis of early endoderm differ. entiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Conclusion 75 79 8 D iscussion . . . . . . . . . . . . . . . . . . . . . . . . . . 79 7.2 Suggestions for future work . . . . . . . . . . . . . . . . . 81 . 81 . 7.1 Improving and streamlining program operation 7.2.2 Extension to other tissue types . . . . . . . . . . 81 7.2.3 Modeling hES colony formation . . . . . . . . . . 82 7.2.4 Modeling in vitro differentiation patterning . . . . 84 . . . . 7.2.1 98 98 A.2 Immunofluorescent protein labeling . . . . . . . . . . . . 99 A.3 Single molecule FISH (smFISH) labeling of gene expression 99 . A .1 C ell culture . . . . . . . . . . . . . . . . . . . . . . . . . . A Wet lab protocols B Cellular features list 102 C Software manual 109 C. 1 Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 C.2 Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 C.3 Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 C.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 C.5 Image Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . 110 C.6 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 9 List of Figures 1.1 Overview of research approach. 1.2 A workflow to obtain image-derived features from single cells, . . . . . . . . . . . . . . . . . 21 while placing them in the context of the colony they belong to. 22 3.1 The main window of the acquisition graphical user interface. . 32 3.2 The prescan sub-interface. . . . . . . . . . . . . . . . . . . . . 33 3.3 Selection of regions for high-magnification imaging based on a low-magnification prescan. . . . . . . . . . . . . . . . . . . . . 35 3.4 High-magnification imaging of region selected in Figure 3.3. Maximum projection images are automatically generated from z-stacks and titled into a regional array. The array is composed of 9 tiff stacks in a 3x3 grid. . . . . . . . . . . . . . . . . . . . 36 4.1 Heuristic method for obtaining seamless segmentation fields using adjacent overlapping regions without stitching the raw im age data directly. . . . . . . . . . . . . . . . . . . . . . . . . 42 4.2 Quantification of Oct4 mRNA level with smFISH in a hES cell. 43 10 4.3 Segmentation of high resolution stacks of hES cells with nuclear boundaries. 4.4 . . . . . . . . . . . . . . . . . . . . . . . . . 44 Overlay of detected Oct4 transcripts on the segmented cellular field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 5 4.5 Distribution of mRNA test statistic resulting from repeatedly re-imaging the same field of view, six times total. 4.6 . . . . . . . 46 Repeated imaging of the same field of view and applying the same smFISH spot detection threshold to each iteration. The number of spots detected decreases rapidly. . . . . . . . . . . . 47 4.7 Repeated imaging of the same field and adjusting the smFISH spot detection threshold such that each iteration has the same number of detected spots. detected spots is reduced. The accuracy of the individual . . . . . . . . . . . . . . . . . . . . 47 . . . . 50 5.1 Spatial Poisson method for analyzing colony structure. 5.2 Overview of the wavelet-based texture analysis method. . . . . 53 5.3 Illustration of intra- and inter-class texture differences in hES cells. The textures of three pluripotent patches and three mildly differentiated patches were calculated and compared in a pairwise fashion. Generally, the inter-class KLD was substantially larger than the intra-class KLD. 5.4 . . . . . . . . . . . 54 DAPI normalization method to correct for background variation. 56 11 5.5 Method for setting smFISH spot detection thresholds to ensure spot detection consistency across adjacent fields in a large region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 6.1 (A) Density scatter plot of integrated DAPI intensity versus mean EdU intensity. Three gates were chosen to categorize cells based on their cell cycle state: G1, S, and G2/M. (B) Normalized histograms of mean Nanog intensity for the G1, S, and G2/M subpopulations. 6.2 . . . . . . . . . . . . . . . . . . 60 Illustration of the distance transform applied to cells in all colonies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 6.3 Frequencies of cells as a function of distance from edge for the GI, S, G2, and M subpopulations. . . . . . . . . . . . . . . . . 63 6.4 (A) A cellular map illustrating the periphery, Cell Layer 1, Cell Layer 2, and Cell Layer 3 subpopulations. (B) Distributions of cell cycle phases for each cell layer. . . . . . . . . . . . . . . 64 6.5 Histogram distribution of mean Nanog intensity of cells on periphery and cell layers 1-3. . . . . . . . . . . . . . . . . . . . 65 12 6.6 Inter-colony heterogeneity of cell cycle distribution. (A) Histogram of cellular distance from edge in cells belonging to differently-sized colonies. The maximum cellular distance from the edge of the colony to the center was used to separate colonies into small (< 150 pm), medium (150 - 300 pm) and large (> 300 pm) sizes. (B) Distributions of cell cycle phases for each cell layer in small, medium, and large-sized colonies. The inter-colony heterogeneity is not significant except for a slight enrichment in S-phase cells in Cell Layer 1 of small colonies. Error bars represent 95% of bootstrap samples. Adapted from Gorman et al. [1] . . . . . . . . . . . . . . . . . . . . . . 67 6.7 Quantifying colony morphology change upon addition of retinoic acid to culture media. A square grid is superimposed onto colonies, dividing them into boxes, which are then processed. All boxes not completely contained within the colony borders are excluded. 6.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Pairwise Kullback-Liebler Divergence (KLD) values between all patches at every time point, 0-7 hours of differentiation. . . 70 6.9 Comparing distributions of KLD values. . . . . . . . . . . . . 71 6.10 Phenotypic changes of hES cells during retinoic acid-induced differentiation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.11 Regional heterogeneity in response to DNA damage. . . . . . . 74 13 6.12 Oct4 distribution of cells in low density, mixed density, and high density subpopulations. . . . . . . . . . . . . . . . . . . . 75 6.13 Single-cellular levels of Oct4 and cPARP in low density, mixed density, and high density subpopulations. . . . . . . . . . . . . 76 6.14 Reconstruction of Oct4 mRNA levels across 6 contiguous fields of view using the spot matching procedure. . . . . . . . . . . . 77 6.15 Sox17 vs Oct4 transcript counts in immediate vs. delayed early endoderm differentiation. . . . . . . . . . . . . . . . . . . 78 14 List of Tables 6.1 Differential spatial distribution of Nanog positive and negative cells in hES colonies. . . . . . . . . . . . . . . . . . . . . . . . 66 B. 1 Cellular and colony features derived by the data workflow discussed in Chapter 4. . . . . . . . . . . . . . . . . . . . . . . . 103 15 Chapter 1 Introduction 1.1 Motivation Ever since human embryonic stem cells (hES) cells were first isolated from the inner cell mass of a human blastocyst [2], they have been viewed as a 'holy grail' of medical promise. Because they have the ability to self- renew indefinitely and differentiate into any cell type of the body, they are potentially an unlimited source of cells for patients in need of cellular therapy [3]. Moreover, due to their provenance, hES cells are an ideal system to study cellular fate decisions in early human development. More recently, Yamanaka and colleagues devised a method to convert fully differentiated somatic cells into an embryonic-like state, known as induced pluripotent stem (iPS) cells, through the over-expression of certain transcription factors [4,5]. Collectively, we refer to hES cells and iPS cells as human pluripotent stem (hPS) cells. 16 A major branch of therapeutic stem cell research is aimed at understanding how pluripotent cells acquire their ultimate fate as a defined tissue. Considerable effort has gone into developing directed differentiation protocols by empirically adding or removing inductive signals to the differentiating cell population in order to progressively enrich specific cell subsets that will yield the cell of interest [6], however current directed differentiation protocols are often low yield and highly variable. Compounding the complexity of in vitro differcntiation is that hPS cells are inherently highly heterogeneous [7, 8]. Heterogeneity (cell-to-cell phenotypic variation) is a consistent and necessary feature of hES cells [9, 10]. Lineage-biased progenitor cells, identified by expression of specific cell-surface markers, can be isolated from a clonal population of undifferentiated hES cells [11]. This inherent heterogeneity is thought to contribute to the ability of hES cells to differentiate into multiple lineages [9]. Nevertheless, it poses problems for the clinical use of pluripotent stem cells by biasing subsets of cells to different lineages [12]. An additional source of heterogeneity is induced as an artifact of the cell culture micro-environment, and includes such features as proximity to other cells, density, and gradients in growth factors and other cytokines [13]. hES cells share direct cell-to-cell contacts in the form of gap junctions [14]; are maintained through diffusible autocrine and paracrine signaling [15]; display high rates of apoptosis when plated as single cells [14]; and undergo anoikis [16]. The colony is a feature of standard hES culture conditions [17]. Standard hES cultures exhibit a wide diversity of colony and cellular phe17 notypes. Cells in large, dense, colonies receive a different set of chemical and mechanical signals than cells residing in smaller, sparser, colonies [18]. Moreover, within any given colony, there may emerge cellular subsets that spontaneously differentiate from hES cells and help to support the growing colony [19]. 1.2 Related Work Population context has been shown to correlate with heterogeneous cellular states in other cell types [20, 21], and we hypothesize that it is vital to understanding hPS cell heterogeneity. Studies have started to explore the structure of the hES cell niche and how ES cells self-organize into subpopulations [19, 22, 23], but a complete understanding of cellular dynamics within the colony structure remains elusive. Traditional molecular biology approaches such as immunoblotting or gene expression analysis average cells over the population, which can conceal the true cellular dynamics [24, 25]. And, while flow cytometry provides single cell data, it requires breaking up colonies to create cellular suspensions. Thus, novel imaging approaches applied to stem cells are necessary. Some previously published methods have focused on live cellular analysis. Li et al. [26] developed an in vitro cellular tracking tool using time-lapse phase contrast images. Their method integrates spatiotemoporal contextual information and constructs cell lineages. Alternatively, Padfield et al. [27] used 18 live cell cycle fluorescent phase markers to monitor cell cycle progression in live cells. However, it is difficult to scale single cellular tracking methods up to sufficiently track individual cells within a structure as dense as a human embyronic stem cell colony. One possibility is to create a diluted mixture of labeled cells in non-labeled cells. However, there is also much information to be gained from imaging fixed cells (including dynamic information, assuming ergodicity [28]), or a combination of imaging live cells pre- and post-fixation and linking the information together. Both of the latter approaches are described in this thesis. Two recent studies have developed methods focusing specifically on analyzing stem cell colonies. Paduano et al. (2013) reported an image analysis pipeline for segmenting stem cell colonies and performing location analysis on specifically-labeled cells [29]. Alternatively, Scherf et al. (2012) developed an automated analysis and tracking system to monitor the growth and morphology of live colonies over time, using phase contrast light [30]. Another pair of recent studies have used micropatterning technology to force hPS colonies into defined shapes and sizes. Nazareth et al. [31] demonstrated that using microcontact printing to control colony densities and shapes can reduce cellular heterogeneity and improve differentiation efficiency. Warmflash et al. [32] found that constraining colony size triggered self-organization, particularly causing differentiation on the outer edges of the colonies. Those two studies look at single cellular properties in the context of the colony microenvironment. However, we have endeavoured to study hPS 19 cells in standard culture conditions and attempt to measure the emergent heterogeneity in that system, which includes accounting for colonies of all different shapes and sizes in any arrangement. 1.3 Approach and contributions Our approach presented here differs from these methods for analyzing stem cell colonies by focusing on generating in situ maps of all cells and their locations relative to each other, as well as within the structure of the colony, allowing comparison across multiple length scales (Figure 1.1). The method we describe integrates automated fluorescent microscopy with an analysis pipeline that enables high-throughput detection of colonies at low resolution, combined with single-cellular and sub-cellular analysis at high resolutions, generating seamless cell maps and single-cellular data organized by colony and neighborhood. This rigorous informatics approach will lay the foundation for understanding the influence of spatial population context in early differentiation. The workflow of our tool is illustrated in Figure 1.2. We developed a system for investigating in situ heterogeneity that evaluates individual cells in the context of their micro-environments. This system extracts a vector of image-based features for each cell, such as fluorescent intensity, texture, and morphology, as well as the cell's position within the colony. It also constructs a virtual slide through an automated process of image mosaicking, colony 20 System for rapidly analyzing intercellular dynamics Single cell properties Cell-cell Colony-colony Cell-colony Figure 1.1: Our system enables researchers to analyze intercellular dynamics in hES cells by structuring relationships between cells within a colony; between cells and the colony they belong to; and from one colony to another. Adapted from Gorman et al. [1] segregation, and cropping. Finally, it exports the results in software that allows visual manipulation and gating on features or space. 1.4 Organization The thesis is organized as follows: Chapter 2 contains a brief review of the necessary biology that motivates the experimental research described the following pages. An overview of basic pluripotent stem cell biology is presented, as well as a discussion of stem cell heterogeneity, and cell cycle regulation in pluripotent stem cell biology. Chapter 3 walks through the acquisition software we developed for rapidly imaging single cells with the context of their colony microenviron21 2a. Automatic Colony 3. High Mag. Colony Scanning Detection 4. High Content Feature Extraction Cellular Properties - Expression - Size - Shape - Global Location -- + - Relative Location / 1. Low Mag. Prescan Colony Properties - Size - Shape - Density - Texture 2b. Manual Re Selection Figure 1.2: A workflow to obtain image-derived features from single cells, while placing them in the context of the colony they belong to. Thus, the pipeline involves a multi-scale segmentation of the colonies within the sample, and the cells within the colonies. Each cell not only has a physical address within the colony, but is linked with colony-wide properties, such as the size and shape of the colony it is derived from. Adapted from Gorman et al. [1] 22 ments. The strength of our system is that it integrates image acquisition with image analysis, using a prescan to segment colony borders and selecting them for high-resolution imaging either manually or automatically. Chapter 4 details the techniques employed and the software used for segmenting cells and colonies, generating a 'virtual slide' representation of the data, processing image stacks to detect single molecule FISH probes, quantitating image spots per cell, and thresholding the smFISH signal using Gaussian Mixture Model-Expectation Maximization. Chapter 5 describes the data pipeline and methods for processing and analyzing these data. The chapter begins with a discussion of spatial analysis. Then, we delve into wavelet-based texture analysis. To solve for regional DAPI staining variability, we introdouce a method for normalizing DAPI levels on a per-colony basis. Finally, we describe a novel method for adjusting mRNA FISH spot detection thresholds to ensure consistent measurements across adjacent fields. Chapter 6 contains the results of applying the tool to a variety of biological problems. We present data on cell cycle heterogeneity as a function of colony position (distance from the edge). We also present data showing that phase-contrast-based texture analysis may be used to monitor the hourto-hour progression of retinoic acid differentiation. Then, we demonstrate that we can classify different regions based on their macro-scale properties, and correlate those regions with the underlying cellular behavior. Finally, we present preliminary results from the application of the pipeline to single 23 molecule FISH measurements over wide areas. Chapter 7 concludes with a discussion of the contributions of this thesis and suggestions for future work. 24 Chapter 2 Biological background 2.1 Pluripotent stem cells Ever since human embryonic stem cells (hES) cells were first isolated from the inner cell mass of a human blastocyst in 1998 by Thomson et al. [2], they have been viewed as a 'holy grail' of medical promise. Because they have the ability to self-renew indefinitely and differentiate into any cell type of the body, they are potentially an unlimited source of cells for patients in need of cellular therapy [3]. Moreover, due to their provenance, hES cells are an ideal system to study cellular fate decisions in early human development. More recently, Yamanaka and colleagues devised a method to convert fully differentiated somatic cells into an embryonic-like state through the over-expression of certain transcription factors [4, 5]. These cells, known as induced pluripotent stem cells (iPS cells), have the advantage of being 25 immuno-matched to the individual patient. In most important respects, hES cells and iPS cells are identical; however, hES cells remain the 'gold standard' due in part to an incomplete understanding of the stochastic process of reprogramming. Collectively, we refer to hES cells and iPS cells as human pluripotent stem (hPS) cells. A major branch of therapeutic stem cell research is aimed at understanding how pluripotent cells acquire their ultimate fate as a defined tissue. Considerable effort has gone into developing directed differentiation protocols by empirically adding or removing inductive signals to the differentiating cell population in order to progressively enrich specific cell subsets that will yield the cell of interest [6]. Most of these studies are carried out in tissue culture and thus fail to achieve the degree of spatiotemporal regulation found in vivo; as a result, current directed differentiation protocols are inefficient and highly variable [33]. 2.2 Stem cell heterogeneity Heterogeneity is a consistent and necessary feature of human embryonic stem cells (hESCs). Lineage-biased progenitor cells, identified by expression of specific cell-surface markers, can be isolated from a clonal population of undifferentiated hESCs [11, 34]. This inherent heterogeneity is thought to be necessary for the ability of ESCs to differentiate into any cell type. Nevertheless, it poses vexing problems for the clinical use of pluripotent stem cells. 26 Meanwhile, stem cells cannot be made more homogeneous by population fractionation; if a subset is identified, expanded in culture, and re-assayed, the original population heterogeneity is generally recovered [35]. There are several proposed mechanisms explaining heterogeneity in hES cells. One such mechanism is thought to a consequence of chromatin regulation via the NuRD complex [36]. NuRD inhibits the transcription of key pluripotency regulators; its dynamic interaction with pro-transcription signals results in heterogeneity. Another likely source of heterogeneity is induced as an artifact of the cell culture micro-environment. Standard hESC cultures typically exhibit a wide diversity of colony and cellular phenotypes [37,38]. At any given snapshot in time, there exists a large spectrum of colony sizes, cellular density, and differentiation status. For instance, cells residing in large, dense, colonies receive a completely different set of chemical and mechanical signals than cells residing in smaller, sparser, colonies [18]. Moreover, within any given colony, there appears to be cellular subsets that spontaneously differentiate from hESCs and help to support the growing colony [19]. Because most assays involve either examination of cells in aggregate, or breaking up colonies to create cellular suspensions, to address this source of heterogeneity, it is vital to use imaging. 27 2.3 Cell cycle regulation in pluripotent stem cells Cell cycle regulation is fundamental to the identity of embryonic stem cells and their differentiation [39-41]. One characteristic of human pluripotent stem cells (hPSCs) is a short G1 phase that generally lasts about 3 hours out of a total cell cycle length of 16 hours. Although it varies by cell type and lineage, G1 phase is generally much longer (about 6-10 hours) in somatic (i.e., fully differentiated) cells. Hence, a greater percentage of an unsynchronized population of differentiated cells will be in G1, relative to an unsynchronized population of undifferentiated cells. Although artificially lengthening G1 in mESCs by independently targeting several key cell cycle regulators (CDK inhibitors p21 and p27, Rb, and E2F) was recently shown not to affect differentiation propensity [42], changes in the activities of these cell cycle regulators are nonetheless tightly linked to differentiation status [43]. Imaging and image analysis are powerful tools in cell cycle analysis. In addition to cell cycle analysis based on DNA content (DAPI staining) and DNA synthesis (BrdU/EdU staining), individual cellular image-based features such as size, shape, and texture, can be used to distinguish mitotic cells. Moreover, imaging provides the location of cells in the context of their in vitro microenvironment (the colony). As inter- and intra-colony heterogeneity is thought to be critically related to differentiation status and mitotic activity of embryonic stem cells, quantitative image analysis provides a rig28 orous way to address questions about how colonies develop and cells are distributed throughout them. For example, a recent study claimed that Sphase cells were distributed as clusters throughout the colony [44]. However, in the absence of rigorous statistical analysis, it is not clear whether such clusters are significantly different than would be expected through random dispersal of S-phase cells. This dissertation investigates how the cell cycle varies as a function of spatial heterogeneity. 29 Chapter 3 Image acquisition 3.1 Hardware We utilized a Nikon Eclipse Ti inverted microscope equipped with a Prior motorized stage and a high-resolution CoolSNAP HQ2 CCD camera. We wrote custom software in MATLAB to control the microscope, camera, shutters, and stage hardware. 3.2 Main acquisition graphical user interface We constructed a program that allows multi-scale imaging of ESCs in different regions across the slide. In order to facilitate usage by people with varying computer backgrounds, we also constructed a graphical user interface (GUI) to control this program. The main window of the GUI is shown in 30 Figure 3.1. The main GUI window contains UI elements that allow the user to adjust all necessary parameters of the program. This includes parameters controlling the microscope nosepiece (objective lens to be used), the camera (binning, gain, exposure, and region of interest properties), the z-drive (number and interval of z-stack slices), and the program I/O (save directory, notification of completion). 3.3 Pre-scan graphical user interface The second major element of the program is region selection. In selecting regions, the user has the option of generating a lower-magnification pre-scan imaged with transmitted phase light and/or fluorescent channels. The prescan sub-GUI is illustrated in Figure 3.2. Once the pre-scan is executed, the user may manually select specific regions of interest by drawing boxes onto macroscopic image of the entire slide (Figure 3.3). Alternatively, bounding boxes can be automatically drawn around colonies through integrated processing and segmentation of the colonies from the pre-scan image. Selected regions can then be flexibly reviewed and edited before the region positional data is transferred into the main program for execution of the final fullcontent image acquisition. The software offers many flexible options for the high-resolution scan. Users may select any level of magnification, including both dry and oilimmersion objective lenses; any combination of fluorescent filters and expo- 31 mRNA FISH Acquisition tion-DoThis -z FrsE - 1. kwlakze F formautin Inhiahz Micro-Afnager GUW Diecor (-no p-c) Person to toxt I 111111a 1111:10,41PWcore D:\Annd\2D1 3-4-23_DE7-mRNA-FISH-Covemipa None Iniazed Intialized &.hiti"-toge pecn No Prescan Update stage upper left poaton jate ObjAce Obijcve lans PApdx stage owert on 4Use Prescan Z-ksck____________ 4 OpenPftU05i GUI 45 Number of elos iterval between skees (um) x -82456.5, y - 2001.3 ...TO ... xx -6377?2.8. - -40785.7 .. TO .. x- .25 -8 I m Overlap (%) Movestage to upper left poelon Camara E] Rol x1 1 %(1 xSv e 2 Use conslant focal center Use variable focal center Region 1: /Size1C4C 1603.1 of colors 1 Color *1: DAP, Gain: 2 2 Region 4: 1599 4 3 ".2 Focal center (um) Region 2: Region 3: 1816.5 1607 Region S: Region 6: 1602 cowo O2: 1Cys 100 Geb: 1000 Gm: Coor 3: Cy3 Gain: poelon ilt r Focus Binning -Number Move stage to lower 2 800 Cobr#4: Figure 3.1: The main window of the acquisition graphical user interface (GUI). Adapted from Gorman et al. [1] 32 Array Prescan - Generate a pre-scan of the side with 10x Phase. 1. Move Stage to Upper Let of Cover Slip x - -87399.5, y - -46896.4 IUpdaft stop upper " 2. Move Stage to Lower Right of Cover Slip IUpdate gap lower f ght posinI x - -81067.8, y - -40074.6 3. Choose a region, enable PFS and adjust offset to focus (lOx phase). Offsdt 119.3S Update PFS offet 5. Save Directory na 20 4. Set prescan exposure. Dii: 1 D:Anna\2013-04-23_DE7-mRNA-FISH-Cove Execte scan Draw regions to scan at higher resolution. Add current region to Not dregaln from flt Remove x - -2015.2, y - 42520.1 ... TO ... x - -81535.2 y - 42132.9 x--84872.3, y - - 4 411 5 .1 ... TO ... x - -84392.2 y - -43727.9 x - 82456.5. Y - 42001.3... TO ... x - -81976.5 y - -41614.2 x - -82619.1, y - -40994.8 ... TO ... x - -82092.6 y - -40576.7 PANIC BUTrON - Press and hold to cancel scan I Figure 3.2: The prescan daughter window. Adapted from Gorman et al. [1] 33 sure times; and may opt for an overlap between adjacent fields. Additionally, we implemented the ability to image Z-stacks, which may be useful for many applications. Because the focus level may vary across a slide, users have the option of automatic focusing using the microscope hardware (e.g. Nikon Perfect Focus), or manually defining a constant or variable focal plane for each selected region. As the program scans the selected regions, tiff stacks are saved in a specified directory, along with data files recording the imaging parameters and each stage position. Finally, a composite array of each region is constructed, by tiling together maximum projections of each tiff stack. This system provides powerful and flexible controls to image and analyze the heterogeneous population context of hPS cell culture at multiple scales. Because it seamlessly integrates imaging with analysis, including automatic region selection based on colony segmentation, our setup is an improvement over currently available solutions to study long-range relationships of single hPS cells in culture. 3.4 Manual and automated colony selection Once the scan is executed, the user is presented with a macroscopic field of view of the entire slide, as shown in Figure 3.3, which the user can then draw rectangles on to select regions. Once the user has selected a region by drawing a rectangle around it, the user then adds the selection to a list by clicking a button on the window. Users may see and edit previously selected 34 Figure 3.3: Selection of regions for high-magnification imaging based on a low-magnification prescan. 35 Figure 3.4: High-magnification imaging of region selected in Figure 3.3. Maximum projection images are automatically generated from z-stacks and titled into a regional array. The array is composed of 9 tiff stacks in a 3x3 grid. 36 region by highlighting selections in the textbox. Upon selecting all regions of interest, the user clicks an update button on the main GUI window to transfer the data into the main program for execution. Finally, because the focus level may change across a slide, users have the option of either using a constant focal center, or defining one for each region they select. Figure 3.4 shows an example of the high-resolution image derived from the pre-scan in Figure 3.3 37 Chapter 4 Image processing: colonies to cells 4.1 Colony segmentation and identification We mosaicked constituent field images into complete regions, taking advantage of the submicron accuracy of our Prior ProScan III stage. Using a combination of Gaussian blurring and thresholding, we identified the boundaries of the colonies. Those colonies were then cropped from the mosaic and sent to the segmentation module. Due to slight discontinuities at the borders between regions, cells on the borders were discarded from the analysis. 38 4.2 Segmentation of cells and feature extraction Field images were loaded into the segmentation module for processing and feature extraction. CellProfiler [45, 46] was called directly from MATLAB, with the resulting segmentation label matrices and feature sets automatically reimported. Within CellProfiler and MATLAB, we extracted numerous features from cells, including integrated intensity of multiple stains, simple measures of texture, and shape measurements, such as area, perimeter, and eccentricity. Additional regional and global properties were calculated and appended to the data structure. These include regional and global indices for each cell, regional and global virtual addresses and physical addresses, and levels of DAPI and other relevant staining signals. 4.3 Laser Scanning Cytometry The analysis and informatics pipeline is easily portable. As an example, images acquired with Laser Scanning Cytometry (CompuCyte; Cambridge, MA), a non-confocal laser scanning microscope, can also be processed. 39 4.4 Data visualization with FCS Express We wrote custom scripts to export data into the FCS Express 4 Image Cytometry format (De Novo Software), allowing flexible manipulation of image and feature data. 4.5 Construction of a virtual slide The virtual slide abstracts the important information about cellular space without having to deal with massive, cumbersome image files. The segmentation label matrix for each field is loaded into the region label matrix. Field labels are converted to global labels. For each field, the feature data is loaded and collated in to a data array structure addressed by the region number, field coordinates, and feature of interest. The centroid of each cell is derived from the segmentation results. Those values were combined with the colony identification step to yield an "address" of the cell within the context of its neighbors and the colony in which it resides. This allowed us to achieve our goal of analyzing images of single stem cells at multiple scales. The lowest resolution of analysis is comparing cells aggregated over one colony to another, e.g. the properties of cells in large colonies versus small colonies. The next level of resolution is comparing regions of the colonies, e.g. the center of the colony to the periphery. The next level beyond that is evaluating cells within the context of their immediate cellular neighborhood. The final level is subcellular resolution, wherein we 40 analyze the texture of individual cells. Our system incorporates a method for generating seamless maps of cellular fields (Figure 4.1). We apply a heuristic based on the accuracy of our robotic stage, which is that the same object existing in adjacent fields should overlap in pixel space. As long as the overlapped margin is at least twice as large as the diameter of a large cell, all cells will completely exist in at least one field. Corresponding cellular objects are identified across image fields, and a consensus seamless segmentation is arrived at by generally taking the larger of the two objects. 4.6 Single molecule FISH processing To test if the platform is amenable to subnuclear resolution image data, we adapted the acquisition and analysis pipelines to single-molecule mRNA FISH (smFISH) data where each FISH spot corresponds to the diffractionlimited image of a RNA molecule with a size less than the resolution of the light microscope. RNA spots were detected by imaging Z-stacks of 50 slices at 0.3 plm apart with a 60x Plan Apo objective. To segment RNA spots computationally, we adapted an algorithm that uses a morphological opening filter to remove out-of-focus light, followed by a Laplacian of Gaussian filter to detect candidate foci [47]. A test statistic was then derived for each foci based on a three-dimensional Hessian matrix to detect curvature of the spot (how much it resembles a point source of 41 Incomplete cell Complete cell retained discarded from in combined label label matrix matrix Field 1 Field 2 Combined Label Matrix of Fields 1 and 2 Figure 4.1: Heuristic method for obtaining seamless segmentation fields using adjacent overlapping regions without stitching the raw image data directly. Adapted from Gorman et al. [1] 42 Figure 4.2: Quantification of Oct4 mRNA level with smFISH in a hES cell. Left: raw image; Right: segmented FISH spots. Adapted from Gorman et al. [1] light) and the intensity of the Laplacian of Gaussian. This statistic was then thresholded to separate real foci from random noise. Figure 4.2 presents examples of processed images of hES cells hybridized for Oct4 mRNA. Additionally, a colony mask was generated using Otsu's method [48] to threshold the maximum projection of the RNA background staining. A Veronoi tessellation was used as a first-order approximation for the cellular boundaries, generated by seeding the nuclear outlines and expanding them until touching. An example of the segmentation of the nuclei, approximate segmentation of the cell boundaries, and detection of spots in a colony is presented in Figures 4.3 and 4.4. 43 Figure 4.3: Segmentation of high resolution stacks of hES cells with nuclear boundaries in red, cell boundaries in yellow, and colonies boundaries in green. Adapted from Gorman et al. [1] 44 Figure 4.4: Overlay of detected Oct4 transcripts on the segmented cellular field. Adapted from Gorman et al. [1] 4.6.1 Threshold selection After repeated exposure of the same field, the signal peak begins to deteriorate. This loss of signal has a profound effect on individual cellular measurements. Since our objective is to image a wide area, some fields inevitably overlapped and are more exposed relative to others. It also poses a practical problem, as the same threshold for separating background and foreground noise is not equally applicable to every field. We tested this signal degradation by repeatedly imaging Z-stacks of the same field of view on a sample. We then applied the method in Section 4.6 to detect foreground spots from the background noise. The distribution of the test statistic is shown in Figure 4.5. The degradation of the signal peak 45 - t=1 --5 024 02 2 0n 3 0, r 02" 02 t 4 se Ieds - 02 0 5 ,to 5 5 D 6 ' 02 02 01 015 01- ON -f 0 S to 16 20 Figure 4.5: Distribution of mRNA test statistic resulting from repeatedly re-imaging the same field of view, six times total. is observable between the first and second exposure sets. Moreover, by the fourth exposure set, the signal peak was no longer visible by eye. Applying the same threshold to each field resulted in Figure 4.6. Applying separate thresholds to equalize the number of spots in each field resulted in Figure 4.7. Thus, we needed to incorporate a method for automatically thresholding fields that was robust. Since the background and foreground signal distributions approximate Gaussians, we used Gaussian Mixture Model ExpectationMaximization (GMM-EM) to determine the threshold between between background and foreground. This involves modeling the background and foreground distributions as Gaussians with means po, p1i and standard deviations o, (-1. For each data point x, we wish to classify each as Z(x) = 0 or Z(x) = 1. We define qz(x) to be the probability that data point x belongs 46 T=1 3 2 I 4 5 6 Figure 4.6: Repeated imaging of the same field of view and applying the same smFISH spot detection threshold to each iteration. The number of spots detected decreases rapidly. T=1 2 3 4 5 6 Figure 4.7: Repeated imaging of the same field and adjusting the smFISH spot detection threshold such that each iteration has the same number of detected spots. The accuracy of the individual detected spots is reduced. 47 to class Z, Nz to be the number of data points in class Z, and 7z to be the proportion of the points in class Z as a fraction of the total population. Lowry [49] derived the following parameter equations, which are re-estimated at each iteration: qz(x), 7TZ= (4.1) N Nz 1qz(x), x=1 (4.2) INqz(x)Y(x), Jz = (43) and N 2 qz(x)(Y(x) = - Pz)2. (4.4) x=1 Then, the probability vector is updated [49]: _z pizN(Y(x); pz, cz c,= 0, 11 7cN (Y (x); p,, a-, We found that a random initial condition would typically not converge on the optimal threshold. Thus, for an initial condition, we set qO = 1 and qi = 0 for all points less than 1 standard deviation above the background peak and qO = 1 and qi = 0 for all points greater than 1 standard deviation above the background peak. 48 Chapter 5 Data pipeline and informatics 5.1 Spatial analysis In addition to measuring the distance-to-edge dependence of cell cycle variation, we can use a spatial statistical approach to analyze whether a certain cells residing in a given cell cycle phase are clustered together, the presence of which suggests local synchrony. If a cell state distribution is uniform and continuous, then it may be modeled as a spatial poisson process. Thus, if a random point is chosen in a colony and a circle of radius r is extended outward, the probability of k cells belonging to a given cell state contained in the circle, X(S), is [50]: [w2]k e-Arr 2 P{X(S) = k} = k! . (5.1) Thus, in analyzing whether the distribution of certain subsets of cells, 49 Figure 5.1: Spatial Poisson method for analyzing colony structure. Random points in the colony are chosen and a circle is expanded outwards. The distribution of arrivals of consecutive cells is compared to an exponential distribution. such as S-phase cells, is uniform in a population, we can statistically compare it to a spatial Poisson process. We did this by choosing random points in the colony and expanding a circle outwards (Figure 5.1). Each additional cell added as the circle expands is an arrival, and the increase in area between consecutive cells added is the inter-arrival 'time'. In Poisson processes, the distribution of inter-arrival times is exponential. Therefore, the hypothesis of uniformity can be tested using a one-sample Kolmogorov-Smirnov test, where the observed distribution of inter-arrival times is compared against an exponential distribution. 50 5.2 Texture analysis A major challenge impeding the utility of human pluripotent stem (hPS) cells is heterogeneity in gene expression and differentiation potential. This heterogeneity is caused in part by culturing embryo-derived cells on a plastic dish. Within a culture, there exists a wide spectrum of colony sizes and cellular density. Thus, we sought to examine colony heterogeneity by developing a multi-resolution analysis, linking a macro-property phase contrast microscopy - colony texture via to the underlying cellular behavior. Texture is broadly defined as spatial variation in pixel intensity, and due to texture manifesting over multiple spatial scales, wavelets are convenient for analyzing it. Following work by Mangoubi and colleagues [51-57], we developed methods for characterizing colony texture and correlating it with single-cellular properties. The method involves splitting colony images into small square patches. The pixel size of the squares is a trade-off between texture information and resolution. The discrete wavelet transform is then used to decompose these patches into subbands. Coefficients of each subband are modeled as a generalized Gaussian distribution [51,52]: p(x; a, f) = 2aF'(1/#) e-Cl"|/" , (5.2) where the estimated parameters a and 3 collectively form a feature vector. The statistical dissimilarity of two textures is calculated using the Kullback- 51 Liebler divergence (KLD) [52, 581, DKL(1, 2) P1(X) In (x) dx, (5.3) which can be conveniently simplified by assuming that the wavelet coefficient distributions are independent. With that simplification, the KLDs may be summed across subbands [51]. Figure 5.3 illustrates the application of this methodology to different colonies within a putatively undifferentiated sample of hES cells in standard culture conditions. 5.3 DAPI normalization DAPI (4',6-diamidino-2-phenylindole) is a fluorescent stain that binds to DNA and allows visualization of cell nuclei. It is also useful for quantifying the amount of DNA in a cell, which is approximately proportional to the integrated fluorescent intensity of the nuclear signal [59, 60]. Because many cells are stationary at 2N or 4N DNA, distributions of cellular DNA show peaks corresponding to those amounts, with 4N cells approximately double the integrated intensity of 2N cells. When we combined cellular DAPI levels across an entire coverslip, we observed an uncharacteristically noisy distribution (Figure 5.4C). When we looked at the DNA profiles for individual colonies, we discovered that they were far less noisy (Figure 5.4D). We eventually discovered that the artifact was due to regional variation in background and staining that particularly affected the DAPI signal due to 52 A Subdivide image into square patches. B Apply Discrete Wavelet Transform to decompose patches into subbands. H - C Model subband coefficients as Generalized Gaussian Distribution (GGD). v ~DU H Calculate KullbeckLiebler Divergence Histogram of subband coefficients. (KLD) between GGDs. Figure 5.2: The wavelet-GGD texture analysis method developed by Do and Vetterli [58] and originally applied to the analysis of stem cell data by Mangoubi et al [52]. A) A stem cell colony imaged using light microscopy is subdivided into square patches. The pixel size of the squares is a trade-off between texture information and resolution. B) The patches are decomposed using the Discrete Wavelet Transform. The top inset shows visual representations of the decomposition, and the bottom inset contains histograms of the sub-band coefficients. C) Each set of coefficients may be modeled as belonging to a Generalized Gaussian Distribution (GGD), which has two parameters, a and 3. Collectively, the parameters from each subband form a feature vector whose divergence is calculated using the Kullback-Liebler Divergence (KLD). 53 Pluripotent 1 2 3 4 5 6 Mildly Differentiated 3 1 2.5 Pluripotent 2 2 1.5Q 4 Mildly Differentiated 5 0.5 6 2 3 Pluripotent 4 5 6 0 Mildly Differentiated Figure 5.3: Illustration of intra- and inter-class texture differences in hES cells. The textures of three pluripotent patches and three mildly differentiated patches were calculated and compared in a pairwise fashion. Generally, the inter-class KLD was substantially larger than the intra-class KLD. 54 mounting media fluorescence in the blue wavelength (Figure 5.4A). To address this problem, we leveraged our data pipeline, which segregates cells into local colony structures. Using a Gaussian kernel density estimation, the DNA distribution for each colony was modeled, and 2N and 4N peaks were identified. Each colony then became an internal control with its own cellular DNA distribution containing peaks at 2N and 4N DNA. Plotting the 2N and 4N peaks, we found that most colonies indeed had 4N peaks at approximately twice the DAPI intensity of the 2N peaks (Figure 5.4B). Colonies that fell outside a narrow linear range were found to be small colonies (less than 100 cells) which were poorly modeled, and were discarded from further analysis. Because the peaks define DNA content at an absolute level, we normalized the DAPI values of every cell by globally aligning the colony peaks to values of X and 2X: D' = (D + D4N - 2D2N) (D4N D2N (5.4) Where D is the intensity of the DAPI stain, D' is the rescaled intensity, and D 2 N and D4N are the intensities of the identified 2N and 4N peaks. Applying this method, we normalized the cellular DAPI values to align the individual colony peaks (Figure 5.4F) and found that resulted in a much less noisy total DAPI distribution (Figure 5.4E). 55 A Estimated G1 Peak vs Estimated G2 Peak Each Do, Reresents One Coln B OAtoldeof ewue fegow Aeeunled to Be Poorly Modeled: Boclude From Fuelhednkeyels 201Evury#at C04, 004 1002000 410 12 003 0 ot 00 10 50 300 20 0.015 0.0 0.02 0.015 0 2W sA . E 100 DA PI C20te t (A 00 wF 0.025012D 0.02 1000.0 a, 50s 100 0 1AP 0 CQ2AU)050 00 DAPI Conln (A.V,) 0 Figure 5.4: DAPI normalization method. (A) A laser-scanned DAPI image of an entire coverslip, displayed using a colormap where red is high and blue is low. The range is truncated in order to see the variation in the background levels. (B) To correct for DAPI variation, estimated G1 and G2 peaks are found for each colony. Generally, the G2 peak should be approximately 2 times greater than the GI peak; colonies that deviate from that specification are discarded. (C) Without applying any correction to the data, the DNA distribution is poorly defined. (D) By examining cell cycle profiles for individual colonies, we find that they are resolved into well-defined peaks, but have offsets at different levels due to background variation. (E) Applying the correction method resolves the poorly defined DNA distri-bution. (E) Alignment of colony cell cycle peaks Wto a single distribution. Ensuring smFISH spot detection consis- 5.4 tency across adjacent fields Due to heterogeneity from one field to another, as well as potential diminution of the signal from adjacent fields due to imaging wide area, we found that it was difficult to ensure consistent spot detection from field to field. Therefore, we developed a method for adaptively adjusting the spot thresholds s(x,y) at each grid x, y, using adjacent overlapped fields (Figure 5.5). We subdivided each image space into top, bottom, left, and right margins. We then minimized an error function based on the sum squared differences in the number of detected spots at the overlapping margins between fields, M-1 N-1 i Z{(f x=2 y=2 y(sxy) - f +1(Sxy+1))2+ (f ,,(sx'Y) (frY(sx'Y) - fI+1,Y(sx+1,y))2 + (fx,,(sx'Y) f_ - - 1 ,Y(sx-1,y)) 2 + min fx_1(sx'Y-1))21 (5.5) where M and N are the size of the regions in number of image fields and fx,y is the number of spots in the (x, y) image field calculated in the top (t), bottom (b), left (1), or right (r) margins using the threshold level sx'Y. Minimizing the error function across all grids by locally or adaptively adjusting thresholds provides continuity and consistency as we move from one grid to another. The objective function uses a 2-norm, but a 1-norm may 57 Calibrating spot detection across large regions field x,y right field x+1,y left 4 0 Figure 5.5: Method for setting smFISH spot detection thresholds to ensure spot detection consistency across adjacent fields in a large region. Adapted from Gorman et al. [1] also be used for greater robustness to outliers. The minimization provided a matrix of local or grid specific threshold values S = [s,,]. An initial estimate for S, So, was obtained by thresholding the combined data for all fields and setting as constant. With this equalized spot detection, we are able to quantitate the FISH spots in each cell with high confidence, and follow the dynamic down regulation of Oct4 levels during differentiation of hES cells (Figure 5.5). This approach, which allows tiling of individual mRNA signals over multiple fields of view, allows large-scale analysis of heterogeneity in Oct4 levels of individual stem cells. 58 6 70 20 X1 Chapter 6 Results and discussion We analyzed quantitative relationships of individual cell properties to their location within the colony and colony-wide features such as size, shape, and texture. These properties include cell cycle state, expression of pluripotency markers such as Oct4 and Nanog, and shape and size of the individual nuclei. In addition, we used this tool to examine relationship of cell cycle variation as a function of colony properties and cellular position. 6.1 Cell cycle heterogeneity as a function of location within colonies Cell cycle regulation and pluripotency are closely linked; changes in pluripotency are coupled to changes in cell cycle progression [61]. We used our platform to investigate whether cell cycle varies as a function of cellular lo59 A B C - 031 0 w00 0 aosG1 7S0 G2/M 1%0 2iSO 3(O Integrated DAPI Intensity 0 O1s35 . '(.s (114 Mean Nanog Intensity Figure 6.1: (A) Density scatter plot of integrated DAPI intensity versus mean EdU intensity. Three gates were chosen to categorize cells based on their cell cycle state: G1, S, and G2/M. (B) Normalized histograms of mean Nanog intensity for the G1 (red), S (magenta), and G2/M (blue) subpopulations. Adapted from Gorman et al. [1] 60 cation within a colony. 80 CHB8 hES colonies across two coverslips were imaged for DAPI, EdU (S phase), phospho-H3 (M phase), and the pluripotency marker Nanog. Taking the processed data, we generated a scattered density plot of single-cell values of mean intensity of the EdU stain versus integrated intensity of DAPI and categorized cells into GI, S, and G2/M subpopulations corresponding to their progression through the cell cycle (Figure 6. 1A). The mean Nanog intensity distribution of these subpopulations showed that the GI subpopulation was enriched for cells with low Nanog levels. Approximately 10% of GI cells were Nanog negative, versus just 1% of S and G2/M cells (Figure 6.1B). This finding is consistent with the observation that pluripotent cells transition through GI phase quickly, and transition more slowly as they differentiate [40]. One of the features derived from our informatics pipeline is the distance of a cell from the edge of the colony it resides in. Using the distance transform illustrated in Figure 6.2, cells outside or peripheral to the colony have a distance of 0 from the edge of the colony, and the measurement increases from 0 to a maximum value equal to the shortest radial distance from the center of the colony to the edge. Our hypothesis for applying this particular transform is that cells on the outer edges of colonies experience a similar set of stimuli such that grouping them in this fashion is reasonable. First, we partitioned the data set into intervals along the distance-fromedge axis with approximately equal number of cells in each bin. Logical gates were specified for cells in the GI, S, G2, and M subpopulations, as well as 61 Figure 6.2: Illustration of the distance transform applied to cells in all colonies. Outside of the boundaries of the colony, the distance from edge is '0', and within the border of the colony, the distance ranges between 0 (blue) and the maximum radius of the colony (red). Adapted from Gorman et al. [1] for each distance interval. The frequency of the G1, S, G2, and M subpopulations belonging to each distance interval was calculated. For each distance interval, we then resampled the data (with replacement) from the population 1000 times, and calculated the G1, S, and G2/M frequencies at that interval. Lilifors and Jaque-Bera tests were used to verify that the resampled frequencies had a normal distribution, whose mean was approximately equal to the population frequency. Confidence intervals were calculated from 95% of the bootstrap samples. We uncovered an interesting pattern (Figure 6.3): the frequency of G1, S, G2, and M subpopulations was largely stable over the span of cellular distances, except at the periphery of the colony (distance of 0), where GI 62 0.1 1 2G3 0 ClLaes50 100 150 3 1 2 Distance from edge (urn) Figure 6.3: Frequencies of cells as a function of distance from edge for the Gi (red), S (magenta), G2 (blue), and M (green) subpopulations. Error bars indicate the 95% confidence interval based on 1000 bootstrap samples. Abscissa tick marks indicate the distance intervals from which the population and bootstrap frequencies were calculated. Each distance interval contains approximately 1500 cells. The data points are located halfway in between each interval. The frequencies at each point add up to 1. Of note is the statistically significant peak in the G2/M subpopulation at 25 microns. Adapted from Gorman et al. [1] 63 A B :: G G2 - 0.5 0.4.3 0.3- i0.20.10 Peiphery Cell Layer 1 Cell Layer 2 Cell Layer 3 Figure 6.4: (A) A cellular map illustrating the periphery (blue), Cell Layer 1 (green), Cell Layer 2 (red), and Cell Layer 3 (blue) subpopulations. (B) Distributions of cell cycle phases for each cell layer. There is a significant enrichment of G2 cells in Cell Layer 1 relative to Cell Layers 2 and 3. Adapted from Gorman et al. [1] 64 450-- Pephery Can Layr I -C Layer 2 -- C*d Layer 3 400350 300 250 L 200 150 100 50 0 0.02 0.04 0.08 0.06 Moan Nanog hitensity 0.1 0.12 0.14 Figure 6.5: Histogram distribution of mean Nanog intensity of cells on periphery (green), cell layer 1 (yellow), cell layer 2 (cyan), and cell layer 3 (black) of the colonies. Adapted from Gorman et al. [1] cells were enriched; and within the first cell layer from the edge, where G2/M cells were enriched. We then gated on the peripheral population with the GI peak (Gate Periphery), and the populations within a distance of 1-3 cell diameters (Cell Layers 1-3). Figure 6.4A showed that these gates correctly classified cells into the desired subpopulations. The peripheral subpopulation displayed a substantially enriched G1 peak (Figure 6.4B) as well as substantially enriched Nanog negative levels of nearly 63% (Figure 6.5 and Table 6.1). In Cell Layer 1, we found a significant enrichment of the G2 population compared to Cell Layers 2-3. However, M phase cells were not enriched in Cell Layer 1, possibly suggesting a G2/M block. Nearly 100% of cells were Nanog positive in Cell Layers 1-3. To check whether this was primarily a phenomenon of cells belong to 65 Periphery Nanog Neg. Nanog Pos. Total Layer 1 Nanog Neg. Nanog Pos. Layer 2 Layer 3 Total Nanog Nanog Total Nanog Nanog Total Neg. Pos. Neg. Pos. # cells 1867 1421 3288 13 4764 4777 6 4374 4380 0 4456 4456 percent 56.78% 43.22% 0.27% 99.73% 0.14% 99.86% 0.00% 100.00% Table 6.1: Differential spatial distribution of Nanog positive and negative cells in hES colonies. colonies of a certain size, we divided cells into groups on the basis of they belonged to a small, medium, or large-sized colony. We then similarly assessed the cell cycle percentages of those different colony types in each of the identified Cell Layers. We found that cells of each colony size were enriched in S-phase in Cell Layer 1 relative to Cell Layers 2 and 3, though the enrichment was slightly stronger in small colonies (Figure 6.6), probably a result of rapid proliferation in particular for small colonies. The enrichment of G2 phase cells in cell layer 1 is similarly present in all small, medium, and large colonies. This analysis demonstrated the utility of the system in analyzing inter-colony heterogeneity. Further understanding of the biological meaning of the enrichment of G2 phase cells at colony edge would require use of other methodologies, and measurement across different hES cell lines and culture conditions. However, 66 A 90 Histogram of cellular distance from edge - -6- 8000 -'- Ceab in Smel Colonis (N = 12009) Cel in Medium Colnies (N = 59612) Cells in Lwpg Colonis (N =69662) 7000 S6000 04000 3000 2000 10 Distance from edge (um) B G1 G1S MG2 OA6 - - CL 0.5 0.5 e04 C 0 02- U0 small med. large Cell Layer 1 small med. large Cell Layer 2 small med. large Cell Layer 3 Figure 6.6: Inter-colony heterogeneity of cell cycle distribution. (A) Histogram of cellular distance from edge in cells belonging to differently-sized colonies. The maximum cellular distance from the edge of the colony to the center was used to separate colonies into small (< 150 pm), medium (150 - 300 pm) and large (> 300 pm) sizes. (B) Distributions of cell cycle phases for each cell layer in small, medium, and large-sized colonies. The inter-colony heterogeneity is not significant except for a slight enrichment in S-phase cells in Cell Layer 1 of small colonies. Error bars represent 95% of bootstrap samples. Adapted from Gorman et al. [1] 67 it is a useful demonstration of the applicability of the tool we have described herein. Cellular decision-making is inherently stochastic and influenced by the cellular micro-environment, necessitating a high-throughput system capable of measuring and abstracting those relationships. 6.2 Monitoring differentiation progress through texture analysis The work of Mangoubi et al [52] demonstrated that texture analysis could be used to distinguish between different differentiated tissue types and different stages of differentiation, but we were interested in extending that work to see if changes in texture could be distinguishable at at much finer gradations in time scale (hours vs. days), such that the progress of a differentiation protocol could be monitored noninvasively, correlating macro-scale behaviors of cells in culture with their underlying physiological state. We differentiated Hi hES cells with retinoic acid-based culture media. Cells were imaged at lOx magnification with phase contrast microscopy over seven 1-hour intervals. An example of one such colony changing in morphology over time is illustrated in Figure 6.7. A mask segmenting the colony was determined at each time point. A grid was then arbitrarily imposed over the colony space, and only boxes within that space were analyzed as described in Section 5.2. Figure 6.8 contains the pairwise Kullback-Liebler Divergence (KLD) values between all patches at every time point. The di68 Figure 6.7: Quantifying colony morphology change upon addition of retinoic acid to culture media. A square grid is superimposed onto colonies, dividing them into boxes, which are then processed. All boxes not completely contained within the colony borders are excluded. agonal (upper-left to bottom-right) contains pairwise comparisons between patches within the same time point, and the off-diagonal contains pairwise comparisons at different time points. The progression of textures becoming more dissimilar over time is demonstrated with generally higher KLD values in the off-diagonals. We then defined a series of distributions, fxy, which is the set of pairwise KLD values between all patches in time points X and Y. Figure 6.9A plots the series of distributions relative to time point 0, foo through fo7, and demonstrates that the distribution shifts to increasing median KLD and variance over time, values of which are tabulated in Figure 6.9B. Figure 6.9C shows the results of Kolmogorov-Smirnov hypothesis tests, demonstrating that the fox distributions are significantly different from one another at the p = .001 significance threshold. 69 tO ti t2 t3 t8 t5 t4 t7 to t1 1 100 t3 wo t4 ,0 t5 6a o OA G7 02 1100 11M 200 10 AMP 00 600 7W0 Soo 300 MW0 I100 Figure 6.8: Pairwise Kullback-Liebler Divergence (KLD) values between all patches at every time point, 0-7 hours of differentiation. 70 A ".7 f : time 0 vs. time 0 -VI2 05 ID7 0.3 time 0 0.2f: vs. time 7 Kulbac-L&Ibw Divgence B fOO fOl f02 f03 f04 f05 f06 f07 Median IVarian ce 1 .12 0.90 1 .48 1.20 N 1540 4760 1.25 2 .61 5824 1.24 1. 1.5C 1.7d 1.61 2 .30 3 .25 5 .09 5 .05 4 .07 7728 8568 10248 12040 14168 %ed:St*.of cUs~ilcuA. p' 001A Figure 6.9: Comparing distributions of KLD values. (A) Distributions of time 0 vs. time x KLD values. (B) Table of median and variance values for the distributions of KLD values. (C) Pairwise hypothesis tests between distributions demonstrate significant differences at the p = .001 threshold. 71 6.3 Regional heterogeneity in cellular response to DNA damage Liu et al. previously reported that cellular differentiation state has a profound impact on DNA damage regulation [62]. Undifferentiated cells are highly sensitive to DNA damage, and will undergo apoptosis more readily. Differentiated cells, by contrast, are less sensitive to DNA damage. We hypothesized that colony-level heterogeneity may play a role in determining cell fate in response to DNA damage induction. To test this, we differentiated cells with retinoic acid (RA) for 0, 2 or 4 days. As expected, cells treated with RA exhibited the anticipated reduction in Oct4 (a pluripotency regulatory factor) and increased proportion of cells in GI phase (Figure 6.10). After RA differentiation, wells were treated with neocarzinostatin (NCS), a radiomimetic drug that induces DNA double-strand breaks, for 0, 1, or 2 hours, and stained with cPARP antibodies to detect cells' apoptotic response. Looking at the colony-level behavior of the cells within a single condition (after 1 day of RA differentiation), we found that instead of cells in all colonies responding in a uniform fashion to the external signals, there were clear regional behaviors from one colony to another and sometimes different patches within the same colony. Figure 6.11A-6.11C is an example of one such colony we observed. This colony was a mosaic of undifferentiated and differentiated patches. Moreover, the undifferentiated patches were positive for Oct4 and more sensitive to DNA damage (had greater expression 72 A Loss of Oct4 after RA diff 354 Change in cell cycle after RA 1085 472 0 B Black: Od RA Light Gray: 1d RA Dark Gray: 2d RA Black: Od RA Light Gray: 1d RA Dark Gray: 2d RA 814 236 543 118 271 DAPI (au) Oct4 (au) Figure 6.10: Phenotypic changes of hES cells during retinoic acid-induced differentiation. Oct4 level goes down (A) and more cells are in Gi-phase (B). Adapted from Gorman et al. [1] of cPARP), while the differentiated patches were negative for Oct4 and less sensitive to DNA damage (lower cPARP). In addition, we observed that the differentiated patches generally had a higher cellular density (possibly early stages of embryoid body formation), suggesting that gating cells according to cell density might reveal regional differences in expression of cPARP. The cell density classifier from our image analysis tool did a reasonable job at separating cells into subpopulations. The high-density cell regions had a lower Oct4 distribution than the low-density cell regions (Figure 6.12), and the intermediate, or mixed subpopulation had Oct4 enrichment in between the two. Furthermore, the cPARP distribution was plotted and the cPARP positive subpopulation was gated (Figure 6.13). As expected, the lower density (undifferentiated subpopulations) had a greater proportion of 73 Figure 6.11: Regional heterogeneity in response to DNA damage. (A-C) In order to investigate regional heterogeneity, colonies of differentiated and NCS treated cells was computationally divided into windows which were then classified according to the density of cells contained. As demonstrated, high cell density regions (purple) tend to hqv* low Oct4 (B), and reduced induction of cPARP (C) upon DNA damage, than low (blue) and mixed (red) density regions. Adapted from Gorman et al. [1] Oct4 distribution 2281 - 171 - 0 C 114 -- High density Mixe d density Low density Total S7 0i 0.02M 0.0575 0.0863 0.115 Oct4 (au) Figure 6.12: Histogram of Oct4 distribution of the cells in the above classified subregions. Black line shows the distribution of all cells. Adapted from Gorman et al. [1] cPARP+ cells (5%) than the mixed (2%) and higher density subpopulation (1%). Taken together, these results again underscore the utility of our tool to employ regional analysis of single-cell measurements that can reveal new physiological relationships. 6.4 High resolution quantitative analysis of early endoderm differentiation With the equalized spot detection method described in Section 5.4, we were able to quantitate the FISH spots in each cell with high confidence, and follow the dynamic down-regulation of Oct4 levels during differentiation of hES cells (Figure 6.14). This approach, which allows tiling of individual mRNA signals 75 100% - 80% - Proportion of cells Oct4+ B 6% - A Proportion of cells cPARP+ - 40% - - 4% 60% 0% 0% - Low Mixed High I Low Mixed . 20% - 2% High Figure 6.13: Single-cellular levels of Oct4 and cPARP in low density, mixed density, and high density subpopulations. (A) Proportion of Oct4 positive cells in each subpopulation. (B) Proportion of cPARP positive cells. Adapted from Gorman et al. [1] 76 1400 n C. 0 700 0 500 Soo CL I 250 0 Day 0 Day 1 Day 2 Figure 6.14: Reconstruction of Oct4 mRNA levels across 6 contiguous fields of view using the spot matching procedure detailed in the main text. Each circle represents an individual cell and circle color denotes the expression level of Oct4. Overall, Oct4 levels diminish through differentiation, although small clusters of cells with elevated Oct4 levels can be observed after 2 days of differentiation. Adapted from Gorman et al. [1] over multiple fields of view, allows large-scale analysis of heterogeneity in Oct4 levels of individual stem cells. In order to better understand the dynamics of differentiation heterogeneity, we conducted an experiment where we delayed the induction of early endoderm differentiation by 12 hours in one sample and measured the changes in Oct4 and Sox17 transcripts over time, relative to the control sample, which followed the standard protocol. We observed a 'drift' in Oct4 levels on Day 1 in the delayed induction, compared to the immediate induction (Figure 6.15). The delayed induction corresponded with poorer differentiation, i.e., 77 Immediate 0. 10 * 0 Delayed 10 Day 0 Day I 10 - C. 2 S 4 0 10 E 0.6 Day 2 U. z S * . CL , :. 'go 101 10 10 1012 10 10 10 10 Number of SOX17 transcripts Figure 6.15: Sox17 vs Oct4 transcript counts in immediate vs. delayed early endoderm differentiation. The delayed induction corresponded with poorer differentiation (more heterogeneity in Sox17). more heterogeneity in Sox17 at Day 2. These preliminary data may hint that the change in Oct4 level measured in the delayed sample causes bias in lineage selection, a hypothesis we are currently working on testing further. 78 Chapter 7 Conclusion 7.1 Discussion We present a bioimage informatics platform designed to rapidly assay single pluripotent stem cell behavior in the context of their colony microenvironment. Other individuals in our lab with varying computer backgrounds were able to use the software successfully. The platform enables integrated imaging and analysis of pluripotent stem cell colonies and their constituent cells through manual or automated means. An integrated and portable data pipeline measures single-cell properties and places them within a hierarchical data structure corresponding to different scales of cellular micro-environment, in particular their spatial properties. Unlike most available commercial systems, where images are first stitched into a single large image before processed - drastically increasing the computational and memory requirements - our 79 system creates consensus segmentation label matrices at the analysis stage, after applying segmentation algorithms to the individual fields. Subsequently, we demonstrated the utility of this platform to analyze both inter- and intra-colony heterogeneity of human pluripotent stem cells. We found a link between cellular location within the colony and cell cycle regulation and pluripotency marker expression, which potentially signifies a novel finding that cells on the outer edge of the colony have slightly greater mitotic activity and contribute relatively more to colony growth. Additionally, we looked at regional heterogeneity of colonies varying by local cell density. When we separated subpopulations of cells belonging to different regional density classes, we found a stronger correlation between differentiation and DNA damage response than we had observed by simply aggregating all cells in the population. Discovery of these relationships would not be possible without in situ imaging and analysis of many colonies simultaneously. With the ability to precisely quantify single-cell gene expression across many colonies, this approach may be used to discover subtle relationships between cells that explain the observed heterogeneity. For example, non-cell autonomous signaling between cells may generate certain expression patterns and spatial configurations within and between colonies. A better understanding of these factors will help guide perturbation strategies to control directed differentiation of pluripotent cells. 80 7.2 Suggestions for future work Below, we detail specific suggestions for future work improving upon and extending the methods developed in this study. 7.2.1 Improving and streamlining program operation The program source code is freely available to the public online at https: //github. com/bgorm/lerou-stemcell and will be incrementally improved over time. However, there are a number of aspects where the software can still be improved in usability and portability. First, it runs on MATLAB and thus requires a MATLAB installation on the microscope computer. Given the expense of MATLAB licenses and the general difficulty that MATLAB poses to non-technical users, enabling the program to run independently of MATLAB by compiling in into an executable binary would potentially increase access to the software. Second, the imaging and analysis steps could be further coupled together, and a better interface for using the analysis portion could be devised. Finally, adding additional experimental metadata to the images and image folders, and indexing that metadata to enable largerscale experimental comparisons would be useful. 7.2.2 Extension to other tissue types We have created a tool for advanced tissue analysis that enables one to accurately measure cellular relationships over multiple length scales and res81 olutions of tissue morphology. In situ approaches are needed because they do not destroy the non-cell autonomous context of individual cells. This tool could be further applied to analyze other types of heterogeneous tissue, e.g. cancer cell aggregates. Further applications include analysis of heterogeneity at the transcriptional level and protein level during cell fate changes, and using data collected from live-cell imaging prior to fixation - thus establishing a framework for generation of high-quality data for modeling cellular decision-making in heterogeneous tissues. 7.2.3 Modeling hES colony formation Human embryonic stem cell colonies generally grow radially outward in a flat monolayer. However, in larger, more advanced colonies, cells in the center begin to grow on top of each other. Presently, it is unclear what factors account for the unique growth pattern of hES cell colonies, and to what extent this patterning is related to heterogeneity. Those factors may include migration of cells from the center outward and variable mitosis rates of heterogenous regions. Using data derived from our tool, combined with other software developed for tracking live cells with fluorescent reporters, the cell motility forces that interact in the growth of colonies can be measured and quantified. In order to aid individual cell tracking within a dense embryonic stem cell colony, a dilution of reporter-positive to reporter-negative cells could possibly be used. Using H2B-GFP or FUCCI cell lines [63] one can track individual 82 cell trajectories over time, within a growing colony and recording the timing of cell divisions and the identity of the progeny. Mitoses can be detected manually, or using machine-learning-based classification methods described in the literature [64]. Additionally, texture-based level-set segmentation algorithms, such as those developed by Lowry and Mangoubi [49], can be usefully applied to segment the colony area from the background, a generally difficult task. This data could then be used to investigate mechanics underlying colony formation by generating an increasingly complex series of models, and by comparing them to live-cell data obtained using the methods described above. The simplest model may involve only cellular division and cellular diffusion, based on the assumption that cells are uniformly dividing a with random distribution of division times based on an empirically-determined average of 16 hours per cell division, and that cells are randomly walking throughout the colony at an empirically determined diffusion rate. An error function based on differences in size and spatial density between measured and modeled colony growth can be integrated over the area of the colony to define a goodness-of-fit. Subsequently, cellular motility factors that assume intercellular forces are greatest when the colony is most dense, and that these intercellular forces drive cellular migration in a non-random direction, may be introduced. Then, mitosis rates that vary as a function of cellular density may be considered. The goodness-of-fit obtained from the additional mechanism would then be compared to the simpler model, to determine whether 83 the data is explained substantially better. Should explanatory value significantly increase, it would imply that cells experiencing the highest density are stimulated to express a molecular marker inhibiting cell cycle division, potentially driving them to an alternate cell fate. 7.2.4 Modeling in vitro differentiation patterning We have observed that early-stage definitive endoderm (DE) differentiation of hES cells results in a pattern formation with pockets of Oct4-positive (presumably undifferentiated) cells surrounded by differentiated cells. Though we hypothesize that this phenomenon may be due to a acquired resistance to differentiation that is then inherited by cellular progeny, it is not currently understood how this patterning evolves in culture. In order to elucidate the mechanisms underlying this patterning, modeling may be necessary. One possible modeling methodology would discretize time and/or space, such that single cells are associated with a specific location in space and have properties such as their position in the cell cycle, and each an associated probability of differentiating, dividing, and moving from one location to another within a certain time interval. The initial state of a population would be derived from imaging a culture based on a defined starting condition. Such modeling would potentially help confirm a mechanistic explanation for the observed spatial configurations. Our method can then be extended to tracking livecell reporters pre-fixation and linking that cellular data to the post-fixation smFISH. 84 Furthermore, it is possible that patterning cells with constrained geometry or volume may drive cells to a particular fate. Previous efforts have constrained geometry such that cells are not allowed to expand outwards [32]. It would be interesting to see how cells respond to being intially placed in geometries but then being allowed to expand. Finally, microfluidic gradient generators [65] could also be used to investigate the impact of limited vol- ume on cellular differentiation, with and without flow through, and with and without gradients of specific factors. 85 Bibliography [1] Gorman, B. R. et al. Multi-scale imaging and informatics pipeline for in situ pluripotent stem cell analysis. PloS One 9, e116037 (2014). [2] Thomson, J. A. et al. Embryonic stem cell lines derived from human blastocysts. Science (New York, N. Y.) 282, 1145-1147 (1998). [3] Daley, G. Q. Stem cells: roadmap to the clinic. The Journal of Clinical Investigation 120, 8-10 (2010). [4] Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663-676 (2006). [5] Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861-872 (2007). [6] Lerou, P. H. & Daley, G. Q. Therapeutic potential of embryonic stem cells. Blood Reviews 19, 321-331 (2005). 86 [7] Masaki, H. et al. Heterogeneity of pluripotent marker gene expression in colonies generated in human iPS cell induction culture. Stem Cell Research 1, 105-115 (2007). URL http://www.ncbi.nlm.nih.gov/ pubmed/19383391. [8] Osorno, R. & Chambers, I. Transcription factor heterogeneity and epiblast pluripotency. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 366, 2230-2237 (2011). URL http://www.ncbi.nlm.nih.gov/pubmed/21727128. [9] Martinez Arias, A. & Brickman, J. M. Gene expression heterogeneities in embryonic stem cell populations: origin and function. Current Opinion in Cell Biology (2011). [10] Graf, T. & Stadtfeld, M. Heterogeneity of embryonic and adult stem cells. Cell Stem Cell 3, 480-483 (2008). [11] Drukker, M. et al. Isolation of primitive endoderm, mesoderm, vascular endothelial and trophoblast progenitors from human pluripotent stem cells. Nature biotechnology (2012). [12] Huang, S. Cell lineage determination in state space: a systems view brings flexibility to dogmatic canonical rules. PLoS Biology 8, e1000380 (2010). URL http: //www.ncbi.nim.nih.gov/pubmed/20520792. 87 [13] Underhill, G. H. & Bhatia, S. N. High-throughput analysis of signals regulating stem cell fate and function. Current Opinion in Chemical Biology 11, 357-366 (2007). [14] Wong, R. C., Pera, M. F. & Pebay, A. Role of gap junctions in embryonic and somatic stem cells. Stem Cell Rev 4, 283-92 (2008). [15] Blagovic, K., Kim, L. Y. & Voldman, J. Microfluidic perfusion for regulating diffusible signaling in stem cells. PloS One 6 (2011). [16] Wang, X., Lin, G., Martins-Taylor, K., Zeng, H. & Xu, R. H. Inhibition of caspase-mediated anoikis is critical for basic fibroblast growth factorsustained culture of human pluripotent stem cells. J Biol Chem 284, 34054-64 (2009). [17] Stewart, M. H., Bendall, S. C. & Bhatia, M. Deconstructing human embryonic stem cell cultures: niche regulation of self-renewal and pluripotency. Journal of Molecular Medicine (Berlin, Germany) 86, 875-886 (2008). URL http: //www.ncbi.n1m.nih.gov/pubmed/18521556. [18] Bauwens, C. L. et al. Control of human embryonic stem cell colony and aggregate size heterogeneity influences differentiation trajectories. Stem Cells (Dayton, Ohio) 26, 2300-2310 (2008). URL http://www.ncbi. nlm.nih.gov/pubmed/18583540. [19] Moogk, D., Stewart, M., Gamble, D., Bhatia, M. & Jervis, E. Human esc colony formation is dependent on interplay between self-renewing 88 hescs and unique precursors responsible for niche generation. Cytometry A 77, 321-7 (2010). Moogk, Duane Stewart, Morag Gamble, Darik Bhatia, Mickie Jervis, Eric eng 2010/03/11 06:00 Cytometry A. 2010 Apr;77(4):321-7. doi: 10.1002/cyto.a.20878. [20] Snijder, B. et al. Population context determines cell-to-cell variability in endocytosis and virus infection. Nature 461, 520-523 (2009). [21] Snijder, B. et al. Single-cell analysis of population context advances rnai screening at multiple levels. Mol Syst Biol 8, 579 (2012). [22] Wagers, A. J. The stem cell niche in regenerative medicine. Cell stem cell 10, 362-369 (2012). URL http://www.ncbi.nlm.nih.gov/pubmed/ 22482502. [23] Giobbe, G. G. et al. Confined 3d microenvironment regulates early differentiation in human pluripotent stem cells. Biotechnology and bioengineering (2012). URL http://www.ncbi.nlm.nih.gov/pubmed/ 22674472. [24] Lahav, G. et al. Dynamics of the p53-mdm2 feedback loop in individual cells. Nature Genetics 36, 147-150 (2004). URL http://www.ncbi. nlm.nih.gov/pubmed/14730303. [25] Sakaue-Sawano, A. et al. Visualizing spatiotemporal dynamics of multicellular cell-cycle progression. Cell 132, 487-98 (2008). 89 [26] Li, K. et al. Cell population tracking and lineage construction with spatiotemporal context. Medical Image Analysis 12, 546-566 (2008). [27] Padfield, D., Rittscher, J., Thomas, N. & Roysam, B. Spatio-temporal cell cycle phase analysis using level sets and fast marching methods. Medical Image Analysis 13, 143-155 (2009). [28] Kafri, R. et al. Dynamics extracted from fixed cells reveal feedback linking cell growth to cell cycle. Nature 494, 480-3 (2013). [29] Paduano, V., Tagliaferri, D., Falco, G. & Ceccarelli, M. Automated identification and location analysis of marked stem cells colonies in optical microscopy images. PLoS One 8, e80776 (2013). [30] Scherf, N. et al. Imaging, quantification and visualization of spatiotemporal patterning in mesc colonies under different culture conditions. Bioinformatics 28, i556-i561 (2012). [31] Nazareth, E. J. P. et al. High-throughput fingerprinting of human pluripotent stem cell fate responses and lineage bias. Nature methods 10, 1225-1231 (2013). [32] Warmflash, A., Sorre, B., Etoc, F., Siggia, E. D. & Brivanlou, A. H. A method to recapitulate early embryonic spatial patterning in human embryonic stem cells. Nature Methods 11, 847-854 (2014). URL http: //www. nature. com/nmeth/journal/vll/n8/full/nmeth. 3016 .html. 90 [33] Loh, K. et al. Efficient endoderm induction from human pluripotent stem cells by logically directing signals controlling lineage bifurcations. Cell Stem Cell 14, 237-252 (2014). URL http: //www. sciencedirect. com/science/article/pii/S1934590913005560. [34] Hong, S.-H. et al. Cell fate potential of human pluripotent stem cells is encoded by histone modifications. Cell Stem Cell 9, 24-36 (2011). [35] Huang, S. Non-genetic heterogeneity of cells in development: more than just noise. Development (Cambridge, England) 136, 3853-3862 (2009). [36] Reynolds, N. et al. Nurd suppresses pluripotency gene expression to promote transcriptional heterogeneity and lineage commitment. Cell Stem Cell 10, 583-94 (2012). Reynolds, Nicola Latos, Paulina Hynes-Allen, Antony Loos, Remco Leaford, Donna O'Shaughnessy, Aoife Mosaku, Olukunbi Signolet, Jason Brennecke, Philip Kalkan, Tuzer Costello, Ita Humphreys, Peter Mansfield, William Naka- gawa, Kentaro Strouboulis, John Behrens, Axel Bertone, Paul Hendrich, Brian eng 079249/Wellcome Trust/United Kingdom 098021/Wellcome Trust/United Kingdom G0600275/Medical Research Coun- cil/United Kingdom G0800784/Medical Research Council/United Kingdom G1000847/Medical Research Council/United Kingdom Biotechnology and Biological Sciences Research Council/United Kingdom Cancer Research UK/United Kingdom Medical Research Council/United Kingdom Wellcome Trust/United Kingdom Research Support, Non-U.S. 91 Gov't 2012/05/09 06:00 Cell Stem Cell. 2012 May 4;10(5):583-94. doi: 10. 1016/j.stem.2012.02.020. [37] Allegrucci, C. & Young, L. E. Differences between human embryonic stem cell lines. Human Reproduction Update 13, 103-120 (2007). URL http://www.ncbi.nlm.nih.gov/pubmed/16936306. [38] Bock, C. et al. Reference maps of human ES and iPS cell variation enable high-throughput characterization of pluripotent cell lines. Cell 144, 439-452 (2011). URL http://www.ncbi.nlm.nih.gov/pubmed/ 21295703. [39] Momeilovid, 0., Navara, C. & Schatten, G. Cell cycle adaptations and maintenance of genomic integrity in embryonic stem cells and induced pluripotent stem cells. Results and Problems in Cell Differentiation 53, 415-458 (2011). [40] Filipczyk, A. A., Laslett, A. L., Mummery, C. & Pera, M. F. Differentiation is coupled to changes in the cell cycle regulatory apparatus of human embryonic stem cells. Stem Cell Res 1, 45-60 (2007). [41] Ballabeni, A. et al. Cell cycle adaptations of embryonic stem cells. Proceedings of the National Academy of Sciences of the United States of America 108, 19252-19257 (2011). 92 [42] Li, V. C., Ballabeni, A. & Kirschner, M. W. Gap 1 phase length and mouse embryonic stem cell self-renewal. Proceedings of the National Academy of Sciences of the United States of America (2012). [43] Calder, A. et al. Lengthened gi phase indicates differentiation status in human embryonic stem cells. Stem Cells and Development (2012). [44] Jin, Q., Duggan, R., Dasa, S. S. K., Li, F. & Chen, L. Random mitotic activities across human embryonic stem cell colonies. Stem Cells and Development 19, 1241-1248 (2010). [45] Carpenter, A. E. et al. Cellprofiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biology 7 (2006). [46] Kamentsky, L. et al. Improved structure, function, and compatibility for cellprofiler: modular high-throughput image analysis software. Bioinformatics (Oxford, England) (2011). & [47] Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. Tyagi, S. Imaging individual mrna molecules using multiple singly labeled probes. Nat Methods 5, 877-9 (2008). [48] Otsu, N. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics 9, 62-66 (1979). [49] Lowry, N. C. Bayesian level sets and texture models for image seg- rnentation and classification with application to non-invasive stem cell 93 monitoring. Thesis, Massachusetts Institute of Technology (2013). URL http://dspace.mit.edu/handle/1721.1/82500. Thesis (Sc. D.)-Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2013. [50] Larson, R. C. & Odoni, A. R. Urban OperationsResearch (Prentice-Hall, Englewood Cliffs, N.J., 1981). [51] Jeffreys, C. G. Support vector machine and parametric wavelet-based texture classification of stem cell images. Thesis (s.m.), Massachusetts Institute of Technology (2004). [52] Mangoubi, R., Jeffreys, C., Copeland, A., Desai, M. & Sammak, P. Non-invasive image based support vector machine classification of human embryonic stem cells. In 2007 4th IEEE International Sympo- sium on Biomedical Imaging: From Nano to Macro, 284-287 (Arlington, VA, USA, 2007). URL http://ieeexplore.ieee.org/lpdocs/ epic03/wrapper . htm?arnumber=4193278. [53] Sammak, P. J. et al. High content analysis of human embryonic stem cell growth and differentiation. In High content screening: science, techniques and applications (John Wiley & Sons, Inc., 2007). URL http://dx.doi.org/10.1002/9780470229866.ch9. [54] Mangoubi, R., Desai, M., Lowry, N. & Sammak, P. Performance evaluation of multiresolution texture analysis of stem cell chromatin. In 2008 94 5th IEEE InternationalSymposium on Biomedical Imaging: From Nano to Macro, 380-383 (Paris, France, 2008). URL http: //ieeexplore. ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4541012. [55] Mangoubi, R., Desai, M. & Sammak, ods in biomedical imaging. In P. Non-gaussian meth- 2008 37th IEEE Applied Im- agery Pattern Recognition Workshop, 1-6 (Washington, DC, USA, 2008). URL http: //ieeexplore. ieee. org/lpdocs/epic3/wrapper. htm?arnumber=4906453. [56] Erb, T. M. et al. Paracrine and epigenetic control of trophectoderm differentiation from human embryonic stem cells: the role of bone morphogenic protein 4 and histone deacetylases. Stem Cells and Devel- opment 20, 1601-1614 (2011). URL http://www.ncbi.nlm.nih.gov/ pubmed/21204619. [57] Lowry, N., Mangoubi, R., Desai, M., Marzouk, Y. & Sammak, P. A unified approach to expectation-maximization and level set segmentation applied to stem cell and brain MRI images. 1446-1450 (IEEE, 2011). URL http: //ieeexplore. ieee. org/lpdocs/epic03/wrapper. htm?arnumber=5872672. [58] Do, M. N. & Vetterli, M. Wavelet-based texture retrieval using generalized gaussian density and kullback-leibler distance. IEEE Transactions on Image Processing 11, 146-158 (2002). 95 [59] Coleman, A. W., Maguire, M. J. & Coleman, J. R. Mithramycin- and 4'6-diamidino-2-phenylindole (DAPI)-DNA staining for fluorescence microspectrophotometric measurement of DNA in nuclei, plastids, and virus particles. Journal of Histochemistry & Cytochemistry 29, 959968 (1981). URL http: //jhc. sagepub. com/content/29/8/959. [60] Tarnowski, B. I. et al. Automatic quantitation of cell growth and determination of mitotic index using DAPI nuclear staining. Pediatric Pathology / Affiliated with the International Paediatric Pathology Association 13, 249-265 (1993). URL http://www.ncbi. nlm.nih. gov/ pubmed/8385326. [61] Maciorowski, Z. et al. Comparison of fixation procedures for fluorescent quantitation of dna content using image cytometry. Cytometry 28, 1239 (1997). [62] Liu, J. C. et al. High mitochondrial priming sensitizes hESCs to DNAdamage-induced apoptosis. Cell Stem Cell 13, 483-491 (2013). URL http://www.cell.com/article/S1934590913003639/abstract. [63] Sakaue-Sawano, A. et al. Visualizing spatiotemporal dynamics of multicellular cell-cycle progression. URL Cell 132, 487-98 (2008). file: //localhost/Users/paullerou/Document s/Papers/ 2008/Sakaue-Sawano/Cell/Cell%202008%2OSakaue-Sawano. pdf. 18267078; S0092-8674(08)00054-8. 96 [64] Conrad, C. et al. Automatic identification of subcellular phenotypes on human cell arrays. Genome Research 14, 1130-1136 (2004). URL http://www.ncbi.nlm.nih.gov/pubmed/15173118. [65] Gorman, B. R. & Wikswo, J. P. Characterization of transport in microfluidic gradient generators. Microfluidics and Nanofluidics 4, 273285 (2008). URL http://www.springerlink.com/index/10. 1007/ s10404-007-0169-0. 97 Appendix A Wet lab protocols The following experimental protocols used to generate the samples analyzed in this work. A.1 Cell culture The hESCs were maintained on irradiated CF-1 MEFs (GlobalStein) in DMEM/F12 supplemented with 20% KnockOut Serum Replacement (Gibco), 0.1mM 2-mercaptoethanol (Gibco), 13 GlutaMAX (Gibco), non-essential amino acids (Gibco), Beta-mercaptoethanol, and 10 ng/ml FGF2 (Sigma), or in feeder-free conditions on Matrigel in mTeSR medium with 5X supplement (Stem Cell Technologies). Human iPSCs were passaged using 50 U/mL type IV collagenase (Gibco) (for feeder culture) or dispase (for feeder-free culture) approximately every 5-7 days. For imaging experiments, cells were typically 98 plated on Matrigel-coated glass bottom culture dishes (MatTek). A.2 Immunofluorescent protein labeling Cells were rinsed in PBS and fixed for 20 minutes at room temperature in fresh 4% paraformaldehyde (Electron Microscopy Sciences). They were blocked/permeablized for 1.5 hours in a combined fashion in PBS containing 3% BSA and 0.2% Triton X-100. Primary antibodies for Oct4 (Abcam) and Nanog (R&D) were diluted at 1:400 (Oct4) and 1:100 (Nanog) in the same BSA/triton buffer, and applied to the cells overnight at 4*C. We found that placing the dish on a shaker resulted in the most even antibody distribution. The next day, cells were rinsed 3x in PBS and stained for 1.25 hours at room temperature with fluorescent secondary antibodies (Life Technologies) diluted 1:1000 in BSA buffer. They were stained with DAPI (Life Technologies), rinsed, and slide-mounted (if coverslip format was used) using ProLong Gold anti-fade mounting medium (Life Technologies). If cells were grown and stained in a dish, they were covered in lmL PBS during imaging. A.3 Single molecule FISH (smFISH) labeling of gene expression Custom Stellaris mRNA probes were obtained from Biosearch Technologies. The dried oligonucleotide blend was dissolved in 200 pL of TE buffer at 99 pH 8.0 to create a probe stock at a total oligo concentration of 25 11M. Additional working dilutions of the Oct4-Cy5 and Sox17-Cy3 probes were prepared by diluting 1:1 in TE buffer. This stock was diluted 1:100 in 37*C pre-warmed hybridization buffer (10% formamide/10% dextran sulfate in 2xSSC made with nuclease-free water) and added to samples grown on glass coverslips or glass-bottom dishes, that had been previously fixed with fresh 4% paraformaldehyde 10 minutes at room temperature, followed by overnight (minimum) permeablization in cold 75% ethanol at 4"C. A brief (2-5 minute) wash step in 10% formamide/2x SSC buffer was carried out directly prior to adding the probe-containing hybridization solution. Dishes were wrapped tightly in parafilm, blocked from light, and incubated at 37"C for 4 hours. Hybridization buffer was rinsed out in a series of wash steps with the formamide wash buffer (first time quick, second time 30 minutes, third time 45 minutes), all at 37"C. Solution changes were carried out on a metal heat block warmed to 37"C to improve stringency. Finally, the samples were counterstained with DAPI (5 ng/mL in formamide wash buffer) for 12 minutes in the dark and rinsed twice with 2xSSC. They were stored in 2xSSC at 4"C overnight to help remove fluorescent background; then rinsed once in distilled water to prevent salt crystals before slide-mounting (coverslips only) on a small drop of pre-warmed Prolong Gold anti-fade mounting medium (Life Technologies). Dishes, alternatively, were rinsed and imaged after the addition of approximately 200 pL GLOX anti-fade buffer: 10% glucose in water, 1 M Tris-HCl, pH 8.0, 20X SSC, nuclease-free water, glucose oxidase 100 pre-diluted to 3.7 mg/mL stock (Sigma), and Bovine Catalase (Sigma). 101 Appendix B Cellular features list The following table lists the cellular, neighborhood, and colony-based features generated by the tool. 102 Table B.1: Cellular and colony features derived by the data workflow discussed in Chapter 4. Category Feature name Description Colony Location ColonyLocationX ColonyLocationY Location of colony centroid in Cartesian space. Location of colony centroid in Cartesian space. Colony Size and Shape colonyMaxRadius The maximum distance from any point in the colony to the edge. colonyArea The actual number of pixels in the region. colonyPerimeter The distance around the boundary of the region. colonyEquivDiameter The diameter of a circle with the same area as the region. colonyEccentricity The eccentricity of the ellipse that has the same secondmoments as the region. eccentricity is the ratio of the distance between the foci of the ellipse and its major axis length. The value is between 0 and 1, where 0 is the eccentricity of a circle and 1 is the eccentricity of a line segment. colonyMajorAxisLengthi Scalar specifying the length (in pixels) of the major axis of the ellipse that has the same normalized second central moments as the region. colonyMinorAxisLength Scalar; the length (in pixels) of the minor axis of the ellipse that has the same normalized second central moments as the region. colonySolidarity colonyOrientation colonyNumCells Colony Density colonyCellDensity colonyEulerNumber colonyMaxIntensity colonyMeanIntensity colonyMinIntensity LocationCenterX Location-CenterY Cell Location Location Cell based) (colony- Distance-to-edge Distance-to-center Neighborhood Cellular (based on DNA stain) NumberOfNeighbors Scalar specifying the proportion of the pixels in the convex hull that are also in the region. The angle (in degrees ranging from -90 to 90 degrees) between the x-axis and the major axis of the ellipse that has the same second-moments as the region Number of cells in the colony. Number of cells in the colony divided by the area of the colony. The number of objects in the region minus the number of holes in those objects. Maximum intensity of specified stain in colony. Mean intensity of specified stain in colony. Minimum intensity of specified stain in colony. Location of cell centroid in Cartesian space. Location of cell centroid in Cartesian space. Distance from cell to the nearest edge of the colony it belongs to. Distance from cell center to colony center. Number of neighboring nuclei defined by three possible criteria: nuclei adjacent, nuclei adjacent if expanded, nuclei within specified distance. PercentTouching FirstClosestObjectNun FirstClosestDistance SecondClosestObjectNi SecondClosestDistance AngleBetweenNeighbor Local Cellular Density Local Staining Density c~1 Intensity measurement (lx per stain) Percent of the nucleus's boundary pixels that touch neighbors, after nuclei have been expanded to the specified distance. bThe index of the closest object. The distance to the closest object. [rilhfrindex of the second closest object. The distance to the second closest object. 3 The angle formed with the object center as the vertex and the first and second closest object centers along the vectors. Number of nuclei within a specificed distance, divided by area Amount of stain intensity within a specified distance, divided by area. May be more robust than nuclei/area and ratio of local cellular density/local staining density could help identify breakdown in segmentation due to clumped cells. IntegratedIntensity The sum of the pixel intensities within an object. MeanIntensity StdIntensity The average pixel intensity within an object. The standard deviation of the pixel intensities within an object. The maximal pixel intensity within an object. The minimal pixel intensity within an object. MaxIntensity MinIntensity Rescaled DNA Intensity Nucleus Size and Shape (based on DNA stain) IntegratedlntensityEdgb The sum of the edge pixel intensities of an object. The average edge pixel intensity of an object. MeanlntensityEdge The standard deviation of the edge pixel intensities of StdlntensityEdge an object. The maximal edge pixel intensity of an object. MaxIntensityEdge The minimal edge pixel intensity of an object. MinIntensityEdge The distance between the centers of gravity in the grayMassDisplacement level representation of the object and the binary representation of the object. LowerQuartileIntensity The intensity value of the pixel for which 25% of the pixels in the object have lower values. The median intensity value within the object. MedianIntensity UpperQuartileIntensity The intensity value of the pixel for which 75% of the pixels in the object have lower values. Integratedlntensityres aldegrated DAPI levels rescaled based on internal colony control. Area The number of pixels in the region. Perimeter The total number of pixels around the boundary of each region in the image. Calculated as 4*?*Area/Perimeter2. Equals 1 for a perfectly circular object. FormFactor Solidity Extent EulerNumber Eccentricity MajorAxisLength -'1 MinorAxisLength Orientation Compactness Nucleus Texture (based on DNA stain, but also could be done Ix per stain) FracAtD The proportion of the pixels in the convex hull that are also in the region. The proportion of the pixels in the bounding box that are also in the region. Computed as the Area divided by the area of the bounding box. The number of objects in the region minus the number of holes in those objects, assuming 8-connectivity. The eccentricity of the ellipse that has the same secondmoments as the region. The eccentricity is the ratio of the distance between the foci of the ellipse and its major axis length. The value is between 0 and 1. The length (in pixels) of the major axis of the ellipse that has the same normalized second central moments as the region. The length (in pixels) of the minor axis of the ellipse that has the same normalized second central moments as the region. The angle (in degrees ranging from -90 to 90 degrees) between the x-axis and the major axis of the ellipse that has the same second-moments as the region. The variance of the radial distance of the object's pixels from the centroid divided by the area. Fraction of total stain in an object at a given radius. MeanFrac RadialCV Haralick Texture Features 00A H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 Gabor Features Mean fractional intensity at a given radius; calculated as fraction of total intensity normalized by fraction of pixels at a given radius. Coefficient of variation of intensity within a ring, calculated over 8 slices. Haralick texture features are derived from the cooccurrence matrix, which contains information about how image intensities in pixels with a certain position in relation to each other occur together. Angular Second Moment Contrast Correlation Sum of Squares: Variation Inverse Difference Moment Sum Average Sum Variance Sum Entropy Entropy Difference Variance Difference Entropy Information Measure of Correlation 1 Information Measure of Correlation 2 Applies Gabor filters to object image. The Gabor filters measure the frequency content in different orientations, similar to wavelets. Appendix C Software manual C.1 Availability Please visit https: //github. com/bgorm/lerou-stemcell for updated in- formation on usage and access to source code. C.2 Installation Download the MATLAB files into a directory as specified on the GitHub page. Download and install the CellProfiler 2.0 binaries at: http: //www. cellprof iler. org/previousReleases .shtml. Download and install the Micro-Manager 1.4 binaries at: https : //micro-manager. org/wiki/Micro-Manager VersionArchive. Run the configuration wizard in Micro-Manager 1.4 in order to initialize all camera hardware. Then, edit the MATLAB adapter files 109 as specified online to correctly call each microscope function from within MATLAB. C.3 Compatibility The software has been tested and found compatible with all versions of MATLAB 2011a and later, CellProfiler 2.0, and Micro-Manager 1.4. The software was tested on a Windows 7 64-bit computer. C.4 Motivation Human embryonic stem cells (hESCs) exhibit heterogeneity on multiple scales. Many high-magnification imaging applications such as single molecule mRNA FISH (smFISH) of hESCs are sufficiently time-consuming and labor-intensive that it may be difficult to capture enough data to get a sufficiently clear picture of the macro-scale heterogeneity that exists in the cell culture. To address this problem, we built a software program that allows high-magnification imaging within the context of the heterogeneity existing on the dish with smFISH as a working example. C.5 Image Acquisition We utilized a Nikon Ti 2000 microscope, with a Prior ProScan III motorized stage, and a CoolSNAP-HQ2 high resolution CCD camera. Custom scripts 110 were written in MATLAB to control the control the microscope, camera, and stage hardware. The Micro-Manager API was used to control the microscope and camera drivers. The stage was independently controlled with commands issued through the serial port. We constructed a program that allows multi-resolution imaging of embryonic stem cells in different regions across the slide. In order to facilitate usage by people with varying computer backgrounds, we also constructed a GUI to control this program. The main window of the GUI is shown in Figure 3.1. The main GUI window contains UI elements that allow the user to adjust all necessary parameters of the program. This includes parameters controlling the microscope nosepiece (objective lens to be used), the camera (binning, gain, exposure, and region of interest properties), the z-drive (number and interval of z-stack slices), and the program I/O (save directory, notification of completion). The second major element of the program is region selection. In selecting regions, the user has the option of utilizing a lower-magnification prescan imaged with phase light. The prescan sub- GUI is illustrated in Figure 3.2. To generate a prescan, the program firsts asks the user to define the upperleft and lower-right positions of the slide, and then asks the user to enable perfect focus and set it to the correct offset to maintain proper focus, and finally the exposure level. Once the scan is executed, the user is presented with a macroscopic field of view of the entire slide, as shown in Figure 3.3, which the user can then draw rectangles on to select regions. Alternatively, 111 if fluorescent imaging is used, the rectangles outlining colony areas can be drawn automatically using an integrated segmentation algorithm. In order to optimize this to a particular sample, the user adjusts the following four parameters: hsize and sigma (properties of the Gaussian filter), threshold correction factor (a number that is multiplied by the Otsu threshold, such that > 1 is a higher threshold and < 1 is a lower threshold), and the minimum colony size (chosen such that randomly dispersed cells or debris are discarded). Once the user has selected a region by drawing a rectangle around it, the user then adds the selection to a list. Users may see and edit previously selected region by highlighting selections in the textbox. Upon selecting all regions of interest, the user clicks an update button on the main GUI window to transfer the data into the main program for execution. Finally, because the focus level may change across a slide, users have the option of either using a constant focal center, or defining one for each region they select. As the program is scanning the selected regions, tiff stacks are saved in specified directory. Additionally a comma-separated value (CSV) file is updated at each new stage location with the spatial coordinates of the stage. Finally, upon the successful conclusion of the program, a CSV file containing all parameters is written to the save folder. Additionally, a composite array of each region is constructed, by tiling together maximum projections of each tiff stack. 112 C.6 Analysis Further analysis scripts take the output of the image scanning stage and segment cells (using CellProfiler binaries modularly; cellprofiler.org), extract features of interest, detect smFISH spots, and output a nested MATLAB data structure or CSV file containing cellular properties linked with colony properties. An additional script outputs the files to a format that may be used with FCS Express (denovosoftware.com). Alternatively, the analysis files may be used to process arrays of images captured independently of the imaging stage using off-the-shelf software, or other imaging platforms entirely, such as Laser Scanning Cytometry if the files are named appropriately. 113