Multi-scale imaging and informatics pipeline for in situ
pluripotent stem cell analysis
ARCHIVES
MASSACHUSETTS INSTITUTE
OF TECHNOLOLGY
by
Bryan Robert Gorman
APR 14 2015
B.Eng., Vanderbilt University, 2006
M.S., Vanderbilt University, 2007
LIBRARIES
Submitted to the Harvard-MIT Division of Health Sciences and Technology
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
FEBRUARY 2015
C Massachusetts Institute of Technology 2015. All rights reserved.
...
?
/7
Signature redacted
Author...........................%.
i4irvard-MI
ivision of Health Sciences and Technology
February 2, 2015
Certified by ...................................
.
Signature redacted
-....................
Paul H. Lerou, MD
Assistan t Professor of Pediatrics, Harvard Medical School
Thesis Supervisor
Accepted by......Signature
r edacted
.....................................
Emery N. Brown, MD, PhD
Director, Harvard-MIT Program in Health Sciences and Technology
Professor of Computational Neuroscience and Health Sciences and Technology
Multi-scale imaging and informatics pipeline for in situ
pluripotent stem cell analysis
by
Bryan Robert Gorman
Submitted to the Harvard-MIT Division of Health Sciences and Technology on February 6, 2015,
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Bioinformatics and Integrative Genomics
Abstract
Human pluripotent stem (hPS) cells have the ability to reproduce indefinitely and differentiate into any
cell type of the body, making them a potential source of cells for medical therapy and an ideal system to
study fate decisions in early development. However, hPS cells exhibit a high degree of heterogeneity,
which may be an obstacle to their clinical translation. Heterogeneity is at least partially induced as an
artifact of removing the cells from the embryo and culturing them on a plastic dish. hPS cells grow in
spatially patterned colony structures, which necessitates in situ quantitative single-cell image analysis.
This dissertation offers a tool for analyzing the spatial population context of hPS cells that integrates
automated fluorescent microscopy with an analysis pipeline. It enables high-throughput detection of
colonies at low resolution, with single-cellular and sub-cellular analysis at high resolutions, generating
seamless in situ maps of single-cellular data organized by colony. We demonstrate the tool's utility by
analyzing inter- and intra-colony heterogeneity of hPS cell cycle regulation and pluripotency marker
expression. We measured the heterogeneity within individual colonies by analyzing cell cycle as a
function of distance. Cells loosely associated with the outside of the colony are more likely to be in G1,
reflecting a less pluripotent state, while cells within the first pluripotent layer are more likely to be in G2,
possibly reflecting a G2/M block. Our analysis tool can group colony regions into density classes, and
cells belonging to those classes have distinct distributions of pluripotency markers and respond differently
to DNA damage induction. Our platform also enabled noninvasive texture analysis of live hPS colonies,
which was applied to monitoring subtle changes in differentiation state. Lastly, we demonstrate that our
pipeline can robustly handle high-content, high-resolution single molecular mRNA FISH data by using
novel image processing techniques. Overall, the imaging informatics pipeline presented offers a novel
approach to the analysis of hPS cells, which includes not only single cell features but also spatial
configuration across multiple length scales.
Thesis Supervisor: Paul H. Lerou, MD
Title: Assistant Professor of Pediatrics, Harvard Medical School
3
4
Acknowledgements
I wish to thank my thesis advisor, Professor Paul Lerou, for taking me under
his wing and guiding me and supporting this work. I also thank Dr. Rami
Mangoubi, who advised the technical aspects of this project. I am grateful to
my other committee members, Professor Leonid Mirny and Professor Suneet
Agarwal, for tremendous advice and direction. Additionally, I thank Professor Jeremy Purvis at UNC and Dr. Nathan Lowry at Draper Laboratory for
many invaluable discussions.
Thank you to all current and past members of the Lerou Lab: Anna,
Anthony, Faisal, Faith, Hilary, Jeanne, Junjie, Junning, Julia, Promise, Sean,
Sutini, Vishesh, and Xiao. I am especially indebted to Anna Baccei, who
assisted me with all of the projects described in this document by testing
and debugging software, and generating most of the samples that I analyzed;
and Dr. Junjie Lu, who assisted me with editing and revising the manuscript.
Thank you to those that provided financial support for this work, especially the National Institutes of Health through the Bioinformatics and
Integrative Genomics training grant, the Harvard-MIT Division of Health
5
Sciences and Technology (HST), and the HST IDEA-squared program.
I am indebted to Dr.
Julie Greenberg of HST, who went above and
beyond in guiding me through the doctoral process. I am also grateful to the
entire HST support staff, especially Laurie Ward, Traci Anderson, and my
academic advisor, Professor Pete Szolovits.
Most of all, I thank my wife Emily, my daughter Daphne, and my parents,
for everything.
6
Contents
M otivation . . . . . . . . . . . . . . . . . . .
. . . . . . . . . 16
1.2
Related Work ......................
. . . . . . . . . 18
1.3
Approach and contributions . . . . . . . . .
. . . . . . . . . 20
1.4
Organization
. . . . . . . . . 21
.
.
1.1
.
. . . . . . . . . . . . . . . . .
Biological background
25
2.1
Pluripotent stem cells . . . . . . . . . . . . .
. . . . . . . . . 25
2.2
Stem cell heterogeneity . . . . . . . . . . . .
. . . . . . . . . 26
2.3
Cell cycle regulation in pluripotent stem cells
. . . . . . . . . 28
.
2
16
Introduction
.
1
3 Image acquisition
30
H ardware . . . . . . . . . . . . . . . . . . .
30
3.2
Main acquisition graphical user interface . .
30
3.3
Pre-scan graphical user interface . . . . . . .
31
3.4
Manual and automated colony selection . . .
34
.
.
.
.
3.1
7
38
Colony segmentation and identification
. . . . . . 38
4.2
Segmentation of cells and feature extraction
. . . . . . 39
4.3
Laser Scanning Cytometry . . . . . . . . .
. . . . . . 39
4.4
Data visualization with FCS Express . . .
. . . . . . 40
4.5
Construction of a virtual slide . . . . . . .
. . . . . . 40
4.6
Single molecule FISH processing . . . . . .
. . . . . . 41
4.6.1
. . . . . . 45
.
.
.
.
4.1
.
Threshold selection . . . . . . . . .
Data pipeline and informatics
49
Spatial analysis . . . . . . . . . . . . . . .
49
5.2
Texture analysis . . . . . . . . . . . . . . .
51
5.3
DAPI normalization
. . . . . . . . . . . .
52
5.4
Ensuring smFISH spot detection consistency across adjacent
.
5.1
.
5
Image processing: colonies to cells
.
4
fields . . . . . . . . . . . . . . . . . . . . .
.
57
6
Results and discussion
59
6.1
Cell cycle heterogeneity as a function of location within colonies 59
6.2
Monitoring differentiation progress through texture analysis
68
6.3
Regional heterogeneity in cellular response to DNA damage.
72
6.4
High resolution quantitative analysis of early endoderm differ.
entiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
Conclusion
75
79
8
D iscussion . . . . . . . . . . . . . . . . . . . . . . . . . .
79
7.2
Suggestions for future work
. . . . . . . . . . . . . . . .
.
81
.
81
.
7.1
Improving and streamlining program operation
7.2.2
Extension to other tissue types
. . . . . . . . . .
81
7.2.3
Modeling hES colony formation . . . . . . . . . .
82
7.2.4
Modeling in vitro differentiation patterning . . . .
84
.
.
.
.
7.2.1
98
98
A.2 Immunofluorescent protein labeling . . . . . . . . . . . .
99
A.3 Single molecule FISH (smFISH) labeling of gene expression
99
.
A .1 C ell culture . . . . . . . . . . . . . . . . . . . . . . . . .
.
A Wet lab protocols
B Cellular features list
102
C Software manual
109
C. 1 Availability
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
C.2 Installation
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
C.3 Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
C.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
C.5 Image Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . 110
C.6 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
9
List of Figures
1.1
Overview of research approach.
1.2
A workflow to obtain image-derived features from single cells,
. . . . . . . . . . . . . . . . . 21
while placing them in the context of the colony they belong to. 22
3.1
The main window of the acquisition graphical user interface. . 32
3.2
The prescan sub-interface. . . . . . . . . . . . . . . . . . . . . 33
3.3
Selection of regions for high-magnification imaging based on a
low-magnification prescan. . . . . . . . . . . . . . . . . . . . . 35
3.4
High-magnification imaging of region selected in Figure 3.3.
Maximum projection images are automatically generated from
z-stacks and titled into a regional array. The array is composed
of 9 tiff stacks in a 3x3 grid. . . . . . . . . . . . . . . . . . . . 36
4.1
Heuristic method for obtaining seamless segmentation fields
using adjacent overlapping regions without stitching the raw
im age data directly. . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2
Quantification of Oct4 mRNA level with smFISH in a hES cell. 43
10
4.3
Segmentation of high resolution stacks of hES cells with nuclear boundaries.
4.4
. . . . . . . . . . . . . . . . . . . . . . . . . 44
Overlay of detected Oct4 transcripts on the segmented cellular
field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 5
4.5
Distribution of mRNA test statistic resulting from repeatedly
re-imaging the same field of view, six times total.
4.6
. . . . . . . 46
Repeated imaging of the same field of view and applying the
same smFISH spot detection threshold to each iteration. The
number of spots detected decreases rapidly. . . . . . . . . . . . 47
4.7
Repeated imaging of the same field and adjusting the smFISH
spot detection threshold such that each iteration has the same
number of detected spots.
detected spots is reduced.
The accuracy of the individual
. . . . . . . . . . . . . . . . . . . . 47
. . . . 50
5.1
Spatial Poisson method for analyzing colony structure.
5.2
Overview of the wavelet-based texture analysis method. . . . . 53
5.3
Illustration of intra- and inter-class texture differences in hES
cells.
The textures of three pluripotent patches and three
mildly differentiated patches were calculated and compared
in a pairwise fashion. Generally, the inter-class KLD was substantially larger than the intra-class KLD.
5.4
. . . . . . . . . . . 54
DAPI normalization method to correct for background variation. 56
11
5.5
Method for setting smFISH spot detection thresholds to ensure spot detection consistency across adjacent fields in a large
region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.1
(A) Density scatter plot of integrated DAPI intensity versus
mean EdU intensity. Three gates were chosen to categorize
cells based on their cell cycle state: G1, S, and G2/M. (B)
Normalized histograms of mean Nanog intensity for the G1,
S, and G2/M subpopulations.
6.2
. . . . . . . . . . . . . . . . . . 60
Illustration of the distance transform applied to cells in all
colonies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.3
Frequencies of cells as a function of distance from edge for the
GI, S, G2, and M subpopulations. . . . . . . . . . . . . . . . . 63
6.4
(A) A cellular map illustrating the periphery, Cell Layer 1, Cell
Layer 2, and Cell Layer 3 subpopulations. (B) Distributions
of cell cycle phases for each cell layer. . . . . . . . . . . . . . . 64
6.5
Histogram distribution of mean Nanog intensity of cells on
periphery and cell layers 1-3. . . . . . . . . . . . . . . . . . . . 65
12
6.6
Inter-colony heterogeneity of cell cycle distribution. (A) Histogram of cellular distance from edge in cells belonging to
differently-sized colonies. The maximum cellular distance from
the edge of the colony to the center was used to separate
colonies into small (< 150 pm), medium (150 - 300 pm) and
large (> 300 pm) sizes. (B) Distributions of cell cycle phases
for each cell layer in small, medium, and large-sized colonies.
The inter-colony heterogeneity is not significant except for a
slight enrichment in S-phase cells in Cell Layer 1 of small
colonies. Error bars represent 95% of bootstrap samples. Adapted
from Gorman et al. [1] . . . . . . . . . . . . . . . . . . . . . . 67
6.7
Quantifying colony morphology change upon addition of retinoic
acid to culture media. A square grid is superimposed onto
colonies, dividing them into boxes, which are then processed.
All boxes not completely contained within the colony borders
are excluded.
6.8
. . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Pairwise Kullback-Liebler Divergence (KLD) values between
all patches at every time point, 0-7 hours of differentiation. . . 70
6.9
Comparing distributions of KLD values.
. . . . . . . . . . . . 71
6.10 Phenotypic changes of hES cells during retinoic acid-induced
differentiation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.11 Regional heterogeneity in response to DNA damage. . . . . . . 74
13
6.12 Oct4 distribution of cells in low density, mixed density, and
high density subpopulations. . . . . . . . . . . . . . . . . . . . 75
6.13 Single-cellular levels of Oct4 and cPARP in low density, mixed
density, and high density subpopulations. . . . . . . . . . . . . 76
6.14 Reconstruction of Oct4 mRNA levels across 6 contiguous fields
of view using the spot matching procedure. . . . . . . . . . . . 77
6.15 Sox17 vs Oct4 transcript counts in immediate vs.
delayed
early endoderm differentiation. . . . . . . . . . . . . . . . . . . 78
14
List of Tables
6.1
Differential spatial distribution of Nanog positive and negative
cells in hES colonies. . . . . . . . . . . . . . . . . . . . . . . . 66
B. 1
Cellular and colony features derived by the data workflow discussed in Chapter 4.
. . . . . . . . . . . . . . . . . . . . . . . 103
15
Chapter 1
Introduction
1.1
Motivation
Ever since human embryonic stem cells (hES) cells were first isolated from
the inner cell mass of a human blastocyst [2], they have been viewed as
a 'holy grail' of medical promise.
Because they have the ability to self-
renew indefinitely and differentiate into any cell type of the body, they are
potentially an unlimited source of cells for patients in need of cellular therapy
[3]. Moreover, due to their provenance, hES cells are an ideal system to study
cellular fate decisions in early human development. More recently, Yamanaka
and colleagues devised a method to convert fully differentiated somatic cells
into an embryonic-like state, known as induced pluripotent stem (iPS) cells,
through the over-expression of certain transcription factors [4,5]. Collectively,
we refer to hES cells and iPS cells as human pluripotent stem (hPS) cells.
16
A major branch of therapeutic stem cell research is aimed at understanding how pluripotent cells acquire their ultimate fate as a defined tissue. Considerable effort has gone into developing directed differentiation protocols by
empirically adding or removing inductive signals to the differentiating cell
population in order to progressively enrich specific cell subsets that will yield
the cell of interest [6], however current directed differentiation protocols are
often low yield and highly variable. Compounding the complexity of in vitro
differcntiation is that hPS cells are inherently highly heterogeneous [7, 8].
Heterogeneity (cell-to-cell phenotypic variation) is a consistent and necessary feature of hES cells [9, 10]. Lineage-biased progenitor cells, identified
by expression of specific cell-surface markers, can be isolated from a clonal
population of undifferentiated hES cells [11]. This inherent heterogeneity is
thought to contribute to the ability of hES cells to differentiate into multiple
lineages [9]. Nevertheless, it poses problems for the clinical use of pluripotent
stem cells by biasing subsets of cells to different lineages [12].
An additional source of heterogeneity is induced as an artifact of the
cell culture micro-environment, and includes such features as proximity to
other cells, density, and gradients in growth factors and other cytokines [13].
hES cells share direct cell-to-cell contacts in the form of gap junctions [14];
are maintained through diffusible autocrine and paracrine signaling [15]; display high rates of apoptosis when plated as single cells [14]; and undergo
anoikis [16]. The colony is a feature of standard hES culture conditions [17].
Standard hES cultures exhibit a wide diversity of colony and cellular phe17
notypes. Cells in large, dense, colonies receive a different set of chemical
and mechanical signals than cells residing in smaller, sparser, colonies [18].
Moreover, within any given colony, there may emerge cellular subsets that
spontaneously differentiate from hES cells and help to support the growing
colony [19].
1.2
Related Work
Population context has been shown to correlate with heterogeneous cellular states in other cell types [20, 21], and we hypothesize that it is vital
to understanding hPS cell heterogeneity.
Studies have started to explore
the structure of the hES cell niche and how ES cells self-organize into subpopulations [19, 22, 23], but a complete understanding of cellular dynamics
within the colony structure remains elusive. Traditional molecular biology
approaches such as immunoblotting or gene expression analysis average cells
over the population, which can conceal the true cellular dynamics [24, 25].
And, while flow cytometry provides single cell data, it requires breaking up
colonies to create cellular suspensions.
Thus, novel imaging approaches applied to stem cells are necessary. Some
previously published methods have focused on live cellular analysis. Li et
al. [26] developed an in vitro cellular tracking tool using time-lapse phase
contrast images. Their method integrates spatiotemoporal contextual information and constructs cell lineages. Alternatively, Padfield et al. [27] used
18
live cell cycle fluorescent phase markers to monitor cell cycle progression in
live cells. However, it is difficult to scale single cellular tracking methods up
to sufficiently track individual cells within a structure as dense as a human
embyronic stem cell colony. One possibility is to create a diluted mixture of
labeled cells in non-labeled cells. However, there is also much information to
be gained from imaging fixed cells (including dynamic information, assuming
ergodicity [28]), or a combination of imaging live cells pre- and post-fixation
and linking the information together. Both of the latter approaches are described in this thesis.
Two recent studies have developed methods focusing specifically on analyzing stem cell colonies. Paduano et al. (2013) reported an image analysis
pipeline for segmenting stem cell colonies and performing location analysis
on specifically-labeled cells [29]. Alternatively, Scherf et al. (2012) developed an automated analysis and tracking system to monitor the growth and
morphology of live colonies over time, using phase contrast light [30].
Another pair of recent studies have used micropatterning technology
to force hPS colonies into defined shapes and sizes.
Nazareth et al. [31]
demonstrated that using microcontact printing to control colony densities
and shapes can reduce cellular heterogeneity and improve differentiation efficiency. Warmflash et al. [32] found that constraining colony size triggered
self-organization, particularly causing differentiation on the outer edges of the
colonies. Those two studies look at single cellular properties in the context of
the colony microenvironment. However, we have endeavoured to study hPS
19
cells in standard culture conditions and attempt to measure the emergent
heterogeneity in that system, which includes accounting for colonies of all
different shapes and sizes in any arrangement.
1.3
Approach and contributions
Our approach presented here differs from these methods for analyzing stem
cell colonies by focusing on generating in situ maps of all cells and their
locations relative to each other, as well as within the structure of the colony,
allowing comparison across multiple length scales (Figure 1.1). The method
we describe integrates automated fluorescent microscopy with an analysis
pipeline that enables high-throughput detection of colonies at low resolution, combined with single-cellular and sub-cellular analysis at high resolutions, generating seamless cell maps and single-cellular data organized by
colony and neighborhood.
This rigorous informatics approach will lay the
foundation for understanding the influence of spatial population context in
early differentiation.
The workflow of our tool is illustrated in Figure 1.2. We developed a
system for investigating in situ heterogeneity that evaluates individual cells
in the context of their micro-environments. This system extracts a vector of
image-based features for each cell, such as fluorescent intensity, texture, and
morphology, as well as the cell's position within the colony. It also constructs
a virtual slide through an automated process of image mosaicking, colony
20
System for rapidly analyzing
intercellular dynamics
Single cell
properties
Cell-cell
Colony-colony
Cell-colony
Figure 1.1: Our system enables researchers to analyze intercellular dynamics
in hES cells by structuring relationships between cells within a colony; between cells and the colony they belong to; and from one colony to another.
Adapted from Gorman et al. [1]
segregation, and cropping. Finally, it exports the results in software that
allows visual manipulation and gating on features or space.
1.4
Organization
The thesis is organized as follows:
Chapter 2 contains a brief review of the necessary biology that motivates
the experimental research described the following pages. An overview of basic
pluripotent stem cell biology is presented, as well as a discussion of stem cell
heterogeneity, and cell cycle regulation in pluripotent stem cell biology.
Chapter 3 walks through the acquisition software we developed for
rapidly imaging single cells with the context of their colony microenviron21
2a. Automatic Colony
3. High Mag.
Colony Scanning
Detection
4. High Content
Feature Extraction
Cellular Properties
- Expression
- Size
- Shape
- Global Location
-- + - Relative Location
/
1. Low Mag. Prescan
Colony Properties
- Size
- Shape
- Density
- Texture
2b. Manual Re
Selection
Figure 1.2: A workflow to obtain image-derived features from single cells,
while placing them in the context of the colony they belong to. Thus, the
pipeline involves a multi-scale segmentation of the colonies within the sample,
and the cells within the colonies. Each cell not only has a physical address
within the colony, but is linked with colony-wide properties, such as the size
and shape of the colony it is derived from. Adapted from Gorman et al. [1]
22
ments. The strength of our system is that it integrates image acquisition
with image analysis, using a prescan to segment colony borders and selecting
them for high-resolution imaging either manually or automatically.
Chapter 4 details the techniques employed and the software used for
segmenting cells and colonies, generating a 'virtual slide' representation of
the data, processing image stacks to detect single molecule FISH probes,
quantitating image spots per cell, and thresholding the smFISH signal using
Gaussian Mixture Model-Expectation Maximization.
Chapter 5 describes the data pipeline and methods for processing and
analyzing these data. The chapter begins with a discussion of spatial analysis.
Then, we delve into wavelet-based texture analysis. To solve for regional
DAPI staining variability, we introdouce a method for normalizing DAPI
levels on a per-colony basis. Finally, we describe a novel method for adjusting
mRNA FISH spot detection thresholds to ensure consistent measurements
across adjacent fields.
Chapter 6 contains the results of applying the tool to a variety of biological problems. We present data on cell cycle heterogeneity as a function
of colony position (distance from the edge). We also present data showing
that phase-contrast-based texture analysis may be used to monitor the hourto-hour progression of retinoic acid differentiation. Then, we demonstrate
that we can classify different regions based on their macro-scale properties,
and correlate those regions with the underlying cellular behavior. Finally,
we present preliminary results from the application of the pipeline to single
23
molecule FISH measurements over wide areas.
Chapter 7 concludes with a discussion of the contributions of this thesis
and suggestions for future work.
24
Chapter 2
Biological background
2.1
Pluripotent stem cells
Ever since human embryonic stem cells (hES) cells were first isolated from
the inner cell mass of a human blastocyst in 1998 by Thomson et al. [2], they
have been viewed as a 'holy grail' of medical promise. Because they have the
ability to self-renew indefinitely and differentiate into any cell type of the
body, they are potentially an unlimited source of cells for patients in need
of cellular therapy [3]. Moreover, due to their provenance, hES cells are an
ideal system to study cellular fate decisions in early human development.
More recently, Yamanaka and colleagues devised a method to convert
fully differentiated somatic cells into an embryonic-like state through the
over-expression of certain transcription factors [4, 5].
These cells, known
as induced pluripotent stem cells (iPS cells), have the advantage of being
25
immuno-matched to the individual patient. In most important respects, hES
cells and iPS cells are identical; however, hES cells remain the 'gold standard' due in part to an incomplete understanding of the stochastic process
of reprogramming. Collectively, we refer to hES cells and iPS cells as human
pluripotent stem (hPS) cells.
A major branch of therapeutic stem cell research is aimed at understanding how pluripotent cells acquire their ultimate fate as a defined tissue. Considerable effort has gone into developing directed differentiation protocols by
empirically adding or removing inductive signals to the differentiating cell
population in order to progressively enrich specific cell subsets that will yield
the cell of interest [6]. Most of these studies are carried out in tissue culture and thus fail to achieve the degree of spatiotemporal regulation found
in vivo; as a result, current directed differentiation protocols are inefficient
and highly variable [33].
2.2
Stem cell heterogeneity
Heterogeneity is a consistent and necessary feature of human embryonic stem
cells (hESCs). Lineage-biased progenitor cells, identified by expression of
specific cell-surface markers, can be isolated from a clonal population of undifferentiated hESCs [11, 34]. This inherent heterogeneity is thought to be
necessary for the ability of ESCs to differentiate into any cell type. Nevertheless, it poses vexing problems for the clinical use of pluripotent stem cells.
26
Meanwhile, stem cells cannot be made more homogeneous by population
fractionation; if a subset is identified, expanded in culture, and re-assayed,
the original population heterogeneity is generally recovered [35].
There are several proposed mechanisms explaining heterogeneity in hES
cells. One such mechanism is thought to a consequence of chromatin regulation via the NuRD complex [36]. NuRD inhibits the transcription of key
pluripotency regulators; its dynamic interaction with pro-transcription signals results in heterogeneity.
Another likely source of heterogeneity is induced as an artifact of the cell
culture micro-environment. Standard hESC cultures typically exhibit a wide
diversity of colony and cellular phenotypes [37,38]. At any given snapshot
in time, there exists a large spectrum of colony sizes, cellular density, and
differentiation status. For instance, cells residing in large, dense, colonies
receive a completely different set of chemical and mechanical signals than cells
residing in smaller, sparser, colonies [18]. Moreover, within any given colony,
there appears to be cellular subsets that spontaneously differentiate from
hESCs and help to support the growing colony [19]. Because most assays
involve either examination of cells in aggregate, or breaking up colonies to
create cellular suspensions, to address this source of heterogeneity, it is vital
to use imaging.
27
2.3
Cell cycle regulation in pluripotent stem
cells
Cell cycle regulation is fundamental to the identity of embryonic stem cells
and their differentiation [39-41]. One characteristic of human pluripotent
stem cells (hPSCs) is a short G1 phase that generally lasts about 3 hours out
of a total cell cycle length of 16 hours. Although it varies by cell type and
lineage, G1 phase is generally much longer (about 6-10 hours) in somatic (i.e.,
fully differentiated) cells. Hence, a greater percentage of an unsynchronized
population of differentiated cells will be in G1, relative to an unsynchronized population of undifferentiated cells. Although artificially lengthening
G1 in mESCs by independently targeting several key cell cycle regulators
(CDK inhibitors p21 and p27, Rb, and E2F) was recently shown not to affect differentiation propensity [42], changes in the activities of these cell cycle
regulators are nonetheless tightly linked to differentiation status [43].
Imaging and image analysis are powerful tools in cell cycle analysis. In
addition to cell cycle analysis based on DNA content (DAPI staining) and
DNA synthesis (BrdU/EdU staining), individual cellular image-based features such as size, shape, and texture, can be used to distinguish mitotic
cells. Moreover, imaging provides the location of cells in the context of their
in vitro microenvironment (the colony). As inter- and intra-colony heterogeneity is thought to be critically related to differentiation status and mitotic
activity of embryonic stem cells, quantitative image analysis provides a rig28
orous way to address questions about how colonies develop and cells are
distributed throughout them. For example, a recent study claimed that Sphase cells were distributed as clusters throughout the colony [44]. However,
in the absence of rigorous statistical analysis, it is not clear whether such
clusters are significantly different than would be expected through random
dispersal of S-phase cells. This dissertation investigates how the cell cycle
varies as a function of spatial heterogeneity.
29
Chapter 3
Image acquisition
3.1
Hardware
We utilized a Nikon Eclipse Ti inverted microscope equipped with a Prior
motorized stage and a high-resolution CoolSNAP HQ2 CCD camera. We
wrote custom software in MATLAB to control the microscope, camera, shutters, and stage hardware.
3.2
Main acquisition graphical user interface
We constructed a program that allows multi-scale imaging of ESCs in different regions across the slide. In order to facilitate usage by people with
varying computer backgrounds, we also constructed a graphical user interface (GUI) to control this program. The main window of the GUI is shown in
30
Figure 3.1. The main GUI window contains UI elements that allow the user
to adjust all necessary parameters of the program. This includes parameters
controlling the microscope nosepiece (objective lens to be used), the camera (binning, gain, exposure, and region of interest properties), the z-drive
(number and interval of z-stack slices), and the program I/O (save directory,
notification of completion).
3.3
Pre-scan graphical user interface
The second major element of the program is region selection. In selecting
regions, the user has the option of generating a lower-magnification pre-scan
imaged with transmitted phase light and/or fluorescent channels. The prescan sub-GUI is illustrated in Figure 3.2. Once the pre-scan is executed, the
user may manually select specific regions of interest by drawing boxes onto
macroscopic image of the entire slide (Figure 3.3). Alternatively, bounding
boxes can be automatically drawn around colonies through integrated processing and segmentation of the colonies from the pre-scan image. Selected
regions can then be flexibly reviewed and edited before the region positional
data is transferred into the main program for execution of the final fullcontent image acquisition.
The software offers many flexible options for the high-resolution scan.
Users may select any level of magnification, including both dry and oilimmersion objective lenses; any combination of fluorescent filters and expo-
31
mRNA FISH Acquisition
tion-DoThis
-z
FrsE
-
1. kwlakze
F formautin
Inhiahz
Micro-Afnager GUW
Diecor
(-no
p-c)
Person to toxt
I 111111a 1111:10,41PWcore
D:\Annd\2D1 3-4-23_DE7-mRNA-FISH-Covemipa
None
Iniazed
Intialized
&.hiti"-toge
pecn
No Prescan
Update stage upper left poaton
jate
ObjAce
Obijcve lans
PApdx
stage owert
on
4Use Prescan
Z-ksck____________
4
OpenPftU05i GUI
45
Number of elos
iterval between skees (um)
x -82456.5, y - 2001.3 ...TO ... xx -6377?2.8. - -40785.7 .. TO .. x-
.25
-8
I
m
Overlap (%)
Movestage to upper left poelon
Camara
E] Rol
x1 1
%(1
xSv
e
2
Use conslant focal center
Use variable focal center
Region 1:
/Size1C4C
1603.1
of colors
1
Color *1:
DAP,
Gain:
2
2
Region 4:
1599
4
3
".2
Focal center (um)
Region 2:
Region 3:
1816.5
1607
Region S:
Region 6:
1602
cowo O2: 1Cys
100
Geb:
1000
Gm:
Coor 3: Cy3
Gain:
poelon
ilt
r Focus
Binning
-Number
Move stage to lower
2
800
Cobr#4:
Figure 3.1: The main window of the acquisition graphical user interface
(GUI). Adapted from Gorman et al. [1]
32
Array Prescan
-
Generate a pre-scan of the side with 10x Phase.
1. Move Stage to Upper Let of Cover Slip
x - -87399.5, y - -46896.4
IUpdaft stop upper "
2. Move Stage to Lower Right of Cover Slip
IUpdate gap lower f ght posinI
x - -81067.8, y - -40074.6
3. Choose a region, enable PFS and adjust offset to focus (lOx phase).
Offsdt 119.3S
Update PFS offet
5. Save Directory
na
20
4. Set prescan exposure.
Dii:
1
D:Anna\2013-04-23_DE7-mRNA-FISH-Cove
Execte scan
Draw regions to scan at higher resolution.
Add current region to Not
dregaln from flt
Remove
x - -2015.2, y - 42520.1 ... TO ... x - -81535.2 y - 42132.9
x--84872.3, y - - 4 411 5 .1 ... TO ... x - -84392.2 y - -43727.9
x - 82456.5. Y - 42001.3... TO ... x - -81976.5 y - -41614.2
x - -82619.1, y - -40994.8 ... TO ... x - -82092.6 y - -40576.7
PANIC BUTrON
-
Press and hold to cancel scan
I
Figure 3.2: The prescan daughter window. Adapted from Gorman et al. [1]
33
sure times; and may opt for an overlap between adjacent fields. Additionally,
we implemented the ability to image Z-stacks, which may be useful for many
applications. Because the focus level may vary across a slide, users have
the option of automatic focusing using the microscope hardware (e.g. Nikon
Perfect Focus), or manually defining a constant or variable focal plane for
each selected region. As the program scans the selected regions, tiff stacks
are saved in a specified directory, along with data files recording the imaging
parameters and each stage position. Finally, a composite array of each region
is constructed, by tiling together maximum projections of each tiff stack.
This system provides powerful and flexible controls to image and analyze
the heterogeneous population context of hPS cell culture at multiple scales.
Because it seamlessly integrates imaging with analysis, including automatic
region selection based on colony segmentation, our setup is an improvement
over currently available solutions to study long-range relationships of single
hPS cells in culture.
3.4
Manual and automated colony selection
Once the scan is executed, the user is presented with a macroscopic field
of view of the entire slide, as shown in Figure 3.3, which the user can then
draw rectangles on to select regions. Once the user has selected a region by
drawing a rectangle around it, the user then adds the selection to a list by
clicking a button on the window. Users may see and edit previously selected
34
Figure 3.3: Selection of regions for high-magnification imaging based on a
low-magnification prescan.
35
Figure 3.4: High-magnification imaging of region selected in Figure 3.3. Maximum projection images are automatically generated from z-stacks and titled
into a regional array. The array is composed of 9 tiff stacks in a 3x3 grid.
36
region by highlighting selections in the textbox. Upon selecting all regions
of interest, the user clicks an update button on the main GUI window to
transfer the data into the main program for execution. Finally, because the
focus level may change across a slide, users have the option of either using a
constant focal center, or defining one for each region they select. Figure 3.4
shows an example of the high-resolution image derived from the pre-scan in
Figure 3.3
37
Chapter 4
Image processing: colonies to
cells
4.1
Colony segmentation and identification
We mosaicked constituent field images into complete regions, taking advantage of the submicron accuracy of our Prior ProScan III stage.
Using a
combination of Gaussian blurring and thresholding, we identified the boundaries of the colonies. Those colonies were then cropped from the mosaic and
sent to the segmentation module. Due to slight discontinuities at the borders
between regions, cells on the borders were discarded from the analysis.
38
4.2
Segmentation of cells and feature extraction
Field images were loaded into the segmentation module for processing and
feature extraction. CellProfiler [45, 46] was called directly from MATLAB,
with the resulting segmentation label matrices and feature sets automatically reimported. Within CellProfiler and MATLAB, we extracted numerous
features from cells, including integrated intensity of multiple stains, simple
measures of texture, and shape measurements, such as area, perimeter, and
eccentricity. Additional regional and global properties were calculated and
appended to the data structure. These include regional and global indices
for each cell, regional and global virtual addresses and physical addresses,
and levels of DAPI and other relevant staining signals.
4.3
Laser Scanning Cytometry
The analysis and informatics pipeline is easily portable. As an example,
images acquired with Laser Scanning Cytometry (CompuCyte; Cambridge,
MA), a non-confocal laser scanning microscope, can also be processed.
39
4.4
Data visualization with FCS Express
We wrote custom scripts to export data into the FCS Express 4 Image Cytometry format (De Novo Software), allowing flexible manipulation of image
and feature data.
4.5
Construction of a virtual slide
The virtual slide abstracts the important information about cellular space
without having to deal with massive, cumbersome image files. The segmentation label matrix for each field is loaded into the region label matrix. Field
labels are converted to global labels. For each field, the feature data is loaded
and collated in to a data array structure addressed by the region number,
field coordinates, and feature of interest. The centroid of each cell is derived
from the segmentation results. Those values were combined with the colony
identification step to yield an "address" of the cell within the context of its
neighbors and the colony in which it resides.
This allowed us to achieve our goal of analyzing images of single stem
cells at multiple scales. The lowest resolution of analysis is comparing cells
aggregated over one colony to another, e.g. the properties of cells in large
colonies versus small colonies. The next level of resolution is comparing regions of the colonies, e.g. the center of the colony to the periphery. The next
level beyond that is evaluating cells within the context of their immediate
cellular neighborhood. The final level is subcellular resolution, wherein we
40
analyze the texture of individual cells.
Our system incorporates a method for generating seamless maps of cellular fields (Figure 4.1). We apply a heuristic based on the accuracy of our
robotic stage, which is that the same object existing in adjacent fields should
overlap in pixel space. As long as the overlapped margin is at least twice as
large as the diameter of a large cell, all cells will completely exist in at least
one field. Corresponding cellular objects are identified across image fields,
and a consensus seamless segmentation is arrived at by generally taking the
larger of the two objects.
4.6
Single molecule FISH processing
To test if the platform is amenable to subnuclear resolution image data,
we adapted the acquisition and analysis pipelines to single-molecule mRNA
FISH (smFISH) data where each FISH spot corresponds to the diffractionlimited image of a RNA molecule with a size less than the resolution of the
light microscope. RNA spots were detected by imaging Z-stacks of 50 slices
at 0.3 plm apart with a 60x Plan Apo objective.
To segment RNA spots computationally, we adapted an algorithm that
uses a morphological opening filter to remove out-of-focus light, followed by
a Laplacian of Gaussian filter to detect candidate foci [47]. A test statistic
was then derived for each foci based on a three-dimensional Hessian matrix
to detect curvature of the spot (how much it resembles a point source of
41
Incomplete cell
Complete cell retained
discarded from
in combined label
label matrix
matrix
Field 1
Field 2
Combined Label Matrix of Fields 1 and 2
Figure 4.1: Heuristic method for obtaining seamless segmentation fields using
adjacent overlapping regions without stitching the raw image data directly.
Adapted from Gorman et al. [1]
42
Figure 4.2: Quantification of Oct4 mRNA level with smFISH in a hES cell.
Left: raw image; Right: segmented FISH spots. Adapted from Gorman et
al. [1]
light) and the intensity of the Laplacian of Gaussian. This statistic was then
thresholded to separate real foci from random noise. Figure 4.2 presents
examples of processed images of hES cells hybridized for Oct4 mRNA.
Additionally, a colony mask was generated using Otsu's method [48] to
threshold the maximum projection of the RNA background staining.
A
Veronoi tessellation was used as a first-order approximation for the cellular
boundaries, generated by seeding the nuclear outlines and expanding them
until touching. An example of the segmentation of the nuclei, approximate
segmentation of the cell boundaries, and detection of spots in a colony is
presented in Figures 4.3 and 4.4.
43
Figure 4.3: Segmentation of high resolution stacks of hES cells with nuclear
boundaries in red, cell boundaries in yellow, and colonies boundaries in green.
Adapted from Gorman et al. [1]
44
Figure 4.4: Overlay of detected Oct4 transcripts on the segmented cellular
field. Adapted from Gorman et al. [1]
4.6.1
Threshold selection
After repeated exposure of the same field, the signal peak begins to deteriorate. This loss of signal has a profound effect on individual cellular measurements. Since our objective is to image a wide area, some fields inevitably
overlapped and are more exposed relative to others. It also poses a practical
problem, as the same threshold for separating background and foreground
noise is not equally applicable to every field.
We tested this signal degradation by repeatedly imaging Z-stacks of the
same field of view on a sample. We then applied the method in Section 4.6
to detect foreground spots from the background noise. The distribution of
the test statistic is shown in Figure 4.5. The degradation of the signal peak
45
- t=1
--5
024
02
2
0n
3
0,
r
02"
02
t
4
se
Ieds
-
02
0
5
,to
5
5
D
6
'
02
02
01
015
01-
ON
-f
0
S
to
16
20
Figure 4.5: Distribution of mRNA test statistic resulting from repeatedly
re-imaging the same field of view, six times total.
is observable between the first and second exposure sets. Moreover, by the
fourth exposure set, the signal peak was no longer visible by eye. Applying
the same threshold to each field resulted in Figure 4.6. Applying separate
thresholds to equalize the number of spots in each field resulted in Figure
4.7.
Thus, we needed to incorporate a method for automatically thresholding
fields that was robust. Since the background and foreground signal distributions approximate Gaussians, we used Gaussian Mixture Model ExpectationMaximization (GMM-EM) to determine the threshold between between background and foreground. This involves modeling the background and foreground distributions as Gaussians with means po, p1i and standard deviations o, (-1. For each data point x, we wish to classify each as Z(x)
=
0 or
Z(x) = 1. We define qz(x) to be the probability that data point x belongs
46
T=1
3
2
I
4
5
6
Figure 4.6: Repeated imaging of the same field of view and applying the
same smFISH spot detection threshold to each iteration. The number of
spots detected decreases rapidly.
T=1
2
3
4
5
6
Figure 4.7: Repeated imaging of the same field and adjusting the smFISH
spot detection threshold such that each iteration has the same number of
detected spots. The accuracy of the individual detected spots is reduced.
47
to class Z, Nz to be the number of data points in class Z, and 7z to be
the proportion of the points in class Z as a fraction of the total population.
Lowry [49] derived the following parameter equations, which are re-estimated
at each iteration:
qz(x),
7TZ=
(4.1)
N
Nz
1qz(x),
x=1
(4.2)
INqz(x)Y(x),
Jz =
(43)
and
N
2
qz(x)(Y(x)
=
-
Pz)2.
(4.4)
x=1
Then, the probability vector is updated [49]:
_z
pizN(Y(x); pz, cz
c,= 0, 11 7cN (Y (x); p,, a-,
We found that a random initial condition would typically not converge
on the optimal threshold. Thus, for an initial condition, we set qO = 1 and
qi = 0 for all points less than 1 standard deviation above the background
peak and qO
=
1 and qi = 0 for all points greater than 1 standard deviation
above the background peak.
48
Chapter 5
Data pipeline and informatics
5.1
Spatial analysis
In addition to measuring the distance-to-edge dependence of cell cycle variation, we can use a spatial statistical approach to analyze whether a certain
cells residing in a given cell cycle phase are clustered together, the presence
of which suggests local synchrony. If a cell state distribution is uniform and
continuous, then it may be modeled as a spatial poisson process. Thus, if
a random point is chosen in a colony and a circle of radius r is extended
outward, the probability of k cells belonging to a given cell state contained
in the circle, X(S), is [50]:
[w2]k e-Arr 2
P{X(S) = k} =
k!
.
(5.1)
Thus, in analyzing whether the distribution of certain subsets of cells,
49
Figure 5.1: Spatial Poisson method for analyzing colony structure. Random
points in the colony are chosen and a circle is expanded outwards. The
distribution of arrivals of consecutive cells is compared to an exponential
distribution.
such as S-phase cells, is uniform in a population, we can statistically compare
it to a spatial Poisson process. We did this by choosing random points in the
colony and expanding a circle outwards (Figure 5.1). Each additional cell
added as the circle expands is an arrival, and the increase in area between
consecutive cells added is the inter-arrival 'time'. In Poisson processes, the
distribution of inter-arrival times is exponential. Therefore, the hypothesis
of uniformity can be tested using a one-sample Kolmogorov-Smirnov test,
where the observed distribution of inter-arrival times is compared against an
exponential distribution.
50
5.2
Texture analysis
A major challenge impeding the utility of human pluripotent stem (hPS)
cells is heterogeneity in gene expression and differentiation potential. This
heterogeneity is caused in part by culturing embryo-derived cells on a plastic
dish. Within a culture, there exists a wide spectrum of colony sizes and cellular density. Thus, we sought to examine colony heterogeneity by developing
a multi-resolution analysis, linking a macro-property phase contrast microscopy -
colony texture via
to the underlying cellular behavior.
Texture is broadly defined as spatial variation in pixel intensity, and due
to texture manifesting over multiple spatial scales, wavelets are convenient
for analyzing it. Following work by Mangoubi and colleagues [51-57], we
developed methods for characterizing colony texture and correlating it with
single-cellular properties. The method involves splitting colony images into
small square patches. The pixel size of the squares is a trade-off between
texture information and resolution. The discrete wavelet transform is then
used to decompose these patches into subbands. Coefficients of each subband
are modeled as a generalized Gaussian distribution [51,52]:
p(x; a, f) =
2aF'(1/#)
e-Cl"|/" ,
(5.2)
where the estimated parameters a and 3 collectively form a feature vector.
The statistical dissimilarity of two textures is calculated using the Kullback-
51
Liebler divergence (KLD) [52, 581,
DKL(1, 2)
P1(X) In
(x)
dx,
(5.3)
which can be conveniently simplified by assuming that the wavelet coefficient distributions are independent. With that simplification, the KLDs
may be summed across subbands [51]. Figure 5.3 illustrates the application
of this methodology to different colonies within a putatively undifferentiated
sample of hES cells in standard culture conditions.
5.3
DAPI normalization
DAPI (4',6-diamidino-2-phenylindole) is a fluorescent stain that binds to
DNA and allows visualization of cell nuclei. It is also useful for quantifying the amount of DNA in a cell, which is approximately proportional to the
integrated fluorescent intensity of the nuclear signal [59, 60]. Because many
cells are stationary at 2N or 4N DNA, distributions of cellular DNA show
peaks corresponding to those amounts, with 4N cells approximately double the integrated intensity of 2N cells. When we combined cellular DAPI
levels across an entire coverslip, we observed an uncharacteristically noisy
distribution (Figure 5.4C). When we looked at the DNA profiles for individual colonies, we discovered that they were far less noisy (Figure 5.4D).
We eventually discovered that the artifact was due to regional variation in
background and staining that particularly affected the DAPI signal due to
52
A
Subdivide image into
square patches.
B
Apply Discrete Wavelet
Transform to decompose
patches into subbands.
H
-
C
Model subband
coefficients as
Generalized
Gaussian
Distribution
(GGD).
v
~DU
H
Calculate KullbeckLiebler Divergence
Histogram of subband
coefficients.
(KLD) between
GGDs.
Figure 5.2: The wavelet-GGD texture analysis method developed by Do and
Vetterli [58] and originally applied to the analysis of stem cell data by Mangoubi et al [52]. A) A stem cell colony imaged using light microscopy is
subdivided into square patches. The pixel size of the squares is a trade-off
between texture information and resolution. B) The patches are decomposed
using the Discrete Wavelet Transform. The top inset shows visual representations of the decomposition, and the bottom inset contains histograms of
the sub-band coefficients. C) Each set of coefficients may be modeled as
belonging to a Generalized Gaussian Distribution (GGD), which has two parameters, a and 3. Collectively, the parameters from each subband form
a feature vector whose divergence is calculated using the Kullback-Liebler
Divergence (KLD).
53
Pluripotent
1
2
3
4
5
6
Mildly
Differentiated
3
1
2.5
Pluripotent
2
2
1.5Q
4
Mildly
Differentiated
5
0.5
6
2
3
Pluripotent
4
5
6
0
Mildly Differentiated
Figure 5.3: Illustration of intra- and inter-class texture differences in hES
cells. The textures of three pluripotent patches and three mildly differentiated patches were calculated and compared in a pairwise fashion. Generally,
the inter-class KLD was substantially larger than the intra-class KLD.
54
mounting media fluorescence in the blue wavelength (Figure 5.4A). To address this problem, we leveraged our data pipeline, which segregates cells
into local colony structures. Using a Gaussian kernel density estimation, the
DNA distribution for each colony was modeled, and 2N and 4N peaks were
identified. Each colony then became an internal control with its own cellular
DNA distribution containing peaks at 2N and 4N DNA. Plotting the 2N
and 4N peaks, we found that most colonies indeed had 4N peaks at approximately twice the DAPI intensity of the 2N peaks (Figure 5.4B). Colonies
that fell outside a narrow linear range were found to be small colonies (less
than 100 cells) which were poorly modeled, and were discarded from further
analysis. Because the peaks define DNA content at an absolute level, we normalized the DAPI values of every cell by globally aligning the colony peaks
to values of X and 2X:
D' = (D + D4N
-
2D2N) (D4N
D2N
(5.4)
Where D is the intensity of the DAPI stain, D' is the rescaled intensity,
and D 2 N and D4N are the intensities of the identified 2N and 4N peaks.
Applying this method, we normalized the cellular DAPI values to align the
individual colony peaks (Figure 5.4F) and found that resulted in a much less
noisy total DAPI distribution (Figure 5.4E).
55
A
Estimated G1 Peak vs Estimated G2 Peak
Each Do, Reresents One Coln
B
OAtoldeof ewue fegow
Aeeunled to Be Poorly Modeled:
Boclude From Fuelhednkeyels
201Evury#at
C04,
004
1002000
410
12
003
0
ot
00
10
50
300
20
0.015
0.0
0.02
0.015
0
2W
sA
.
E
100 DA PI C20te t
(A
00
wF
0.025012D
0.02
1000.0
a,
50s
100
0
1AP
0 CQ2AU)050 00
DAPI
Conln (A.V,)
0
Figure 5.4: DAPI normalization method. (A) A laser-scanned DAPI image
of an entire coverslip, displayed using a colormap where red is high and blue
is low. The range is truncated in order to see the variation in the background
levels. (B) To correct for DAPI variation, estimated G1 and G2 peaks are
found for each colony. Generally, the G2 peak should be approximately 2
times greater than the GI peak; colonies that deviate from that specification
are discarded. (C) Without applying any correction to the data, the DNA
distribution is poorly defined. (D) By examining cell cycle profiles for individual colonies, we find that they are resolved into well-defined peaks, but
have offsets at different levels due to background variation. (E) Applying
the correction method resolves the poorly defined DNA distri-bution. (E)
Alignment of colony cell cycle peaks Wto a single distribution.
Ensuring smFISH spot detection consis-
5.4
tency across adjacent fields
Due to heterogeneity from one field to another, as well as potential diminution of the signal from adjacent fields due to imaging wide area, we found
that it was difficult to ensure consistent spot detection from field to field.
Therefore, we developed a method for adaptively adjusting the spot thresholds s(x,y) at each grid x, y, using adjacent overlapped fields (Figure 5.5).
We subdivided each image space into top, bottom, left, and right margins.
We then minimized an error function based on the sum squared differences
in the number of detected spots at the overlapping margins between fields,
M-1 N-1
i
Z{(f
x=2
y=2
y(sxy)
-
f
+1(Sxy+1))2+ (f ,,(sx'Y)
(frY(sx'Y) - fI+1,Y(sx+1,y))2 + (fx,,(sx'Y)
f_
-
-
1
,Y(sx-1,y)) 2
+
min
fx_1(sx'Y-1))21
(5.5)
where M and N are the size of the regions in number of image fields
and fx,y is the number of spots in the (x, y) image field calculated in the
top (t), bottom (b), left (1), or right (r) margins using the threshold level
sx'Y. Minimizing the error function across all grids by locally or adaptively
adjusting thresholds provides continuity and consistency as we move from one
grid to another. The objective function uses a 2-norm, but a 1-norm may
57
Calibrating spot detection across large regions
field x,y right
field x+1,y left
4
0
Figure 5.5: Method for setting smFISH spot detection thresholds to ensure
spot detection consistency across adjacent fields in a large region. Adapted
from Gorman et al. [1]
also be used for greater robustness to outliers. The minimization provided
a matrix of local or grid specific threshold values S = [s,,].
An initial
estimate for S, So, was obtained by thresholding the combined data for all
fields and setting as constant. With this equalized spot detection, we are able
to quantitate the FISH spots in each cell with high confidence, and follow the
dynamic down regulation of Oct4 levels during differentiation of hES cells
(Figure 5.5). This approach, which allows tiling of individual mRNA signals
over multiple fields of view, allows large-scale analysis of heterogeneity in
Oct4 levels of individual stem cells.
58
6
70
20
X1
Chapter 6
Results and discussion
We analyzed quantitative relationships of individual cell properties to their
location within the colony and colony-wide features such as size, shape, and
texture. These properties include cell cycle state, expression of pluripotency
markers such as Oct4 and Nanog, and shape and size of the individual nuclei.
In addition, we used this tool to examine relationship of cell cycle variation
as a function of colony properties and cellular position.
6.1
Cell cycle heterogeneity as a function of
location within colonies
Cell cycle regulation and pluripotency are closely linked; changes in pluripotency are coupled to changes in cell cycle progression [61].
We used our
platform to investigate whether cell cycle varies as a function of cellular lo59
A
B
C
-
031
0
w00
0
aosG1
7S0
G2/M
1%0
2iSO
3(O
Integrated DAPI Intensity
0
O1s35
.
'(.s (114
Mean Nanog Intensity
Figure 6.1: (A) Density scatter plot of integrated DAPI intensity versus mean
EdU intensity. Three gates were chosen to categorize cells based on their cell
cycle state: G1, S, and G2/M. (B) Normalized histograms of mean Nanog
intensity for the G1 (red), S (magenta), and G2/M (blue) subpopulations.
Adapted from Gorman et al. [1]
60
cation within a colony. 80 CHB8 hES colonies across two coverslips were
imaged for DAPI, EdU (S phase), phospho-H3 (M phase), and the pluripotency marker Nanog. Taking the processed data, we generated a scattered
density plot of single-cell values of mean intensity of the EdU stain versus
integrated intensity of DAPI and categorized cells into GI, S, and G2/M subpopulations corresponding to their progression through the cell cycle (Figure
6. 1A). The mean Nanog intensity distribution of these subpopulations showed
that the GI subpopulation was enriched for cells with low Nanog levels. Approximately 10% of GI cells were Nanog negative, versus just 1% of S and
G2/M cells (Figure 6.1B). This finding is consistent with the observation that
pluripotent cells transition through GI phase quickly, and transition more
slowly as they differentiate [40].
One of the features derived from our informatics pipeline is the distance of
a cell from the edge of the colony it resides in. Using the distance transform
illustrated in Figure 6.2, cells outside or peripheral to the colony have a
distance of 0 from the edge of the colony, and the measurement increases
from 0 to a maximum value equal to the shortest radial distance from the
center of the colony to the edge. Our hypothesis for applying this particular
transform is that cells on the outer edges of colonies experience a similar set
of stimuli such that grouping them in this fashion is reasonable.
First, we partitioned the data set into intervals along the distance-fromedge axis with approximately equal number of cells in each bin. Logical gates
were specified for cells in the GI, S, G2, and M subpopulations, as well as
61
Figure 6.2: Illustration of the distance transform applied to cells in all
colonies. Outside of the boundaries of the colony, the distance from edge
is '0', and within the border of the colony, the distance ranges between 0
(blue) and the maximum radius of the colony (red). Adapted from Gorman
et al. [1]
for each distance interval. The frequency of the G1, S, G2, and M subpopulations belonging to each distance interval was calculated. For each distance
interval, we then resampled the data (with replacement) from the population
1000 times, and calculated the G1, S, and G2/M frequencies at that interval.
Lilifors and Jaque-Bera tests were used to verify that the resampled frequencies had a normal distribution, whose mean was approximately equal to the
population frequency. Confidence intervals were calculated from 95% of the
bootstrap samples.
We uncovered an interesting pattern (Figure 6.3): the frequency of G1,
S, G2, and M subpopulations was largely stable over the span of cellular
distances, except at the periphery of the colony (distance of 0), where GI
62
0.1
1
2G3
0 ClLaes50
100
150
3 1 2 Distance from edge (urn)
Figure 6.3: Frequencies of cells as a function of distance from edge for the
Gi (red), S (magenta), G2 (blue), and M (green) subpopulations. Error bars
indicate the 95% confidence interval based on 1000 bootstrap samples. Abscissa tick marks indicate the distance intervals from which the population
and bootstrap frequencies were calculated. Each distance interval contains
approximately 1500 cells. The data points are located halfway in between
each interval. The frequencies at each point add up to 1. Of note is the statistically significant peak in the G2/M subpopulation at 25 microns. Adapted
from Gorman et al. [1]
63
A
B
::
G
G2
-
0.5
0.4.3
0.3-
i0.20.10
Peiphery
Cell Layer 1
Cell Layer 2
Cell Layer 3
Figure 6.4: (A) A cellular map illustrating the periphery (blue), Cell Layer
1 (green), Cell Layer 2 (red), and Cell Layer 3 (blue) subpopulations. (B)
Distributions of cell cycle phases for each cell layer. There is a significant
enrichment of G2 cells in Cell Layer 1 relative to Cell Layers 2 and 3. Adapted
from Gorman et al. [1]
64
450--
Pephery
Can Layr I
-C
Layer 2
-- C*d Layer 3
400350
300
250
L 200
150
100
50
0
0.02
0.04
0.08
0.06
Moan Nanog hitensity
0.1
0.12
0.14
Figure 6.5: Histogram distribution of mean Nanog intensity of cells on periphery (green), cell layer 1 (yellow), cell layer 2 (cyan), and cell layer 3
(black) of the colonies. Adapted from Gorman et al. [1]
cells were enriched; and within the first cell layer from the edge, where G2/M
cells were enriched. We then gated on the peripheral population with the
GI peak (Gate Periphery), and the populations within a distance of 1-3 cell
diameters (Cell Layers 1-3). Figure 6.4A showed that these gates correctly
classified cells into the desired subpopulations. The peripheral subpopulation
displayed a substantially enriched G1 peak (Figure 6.4B) as well as substantially enriched Nanog negative levels of nearly 63% (Figure 6.5 and Table
6.1). In Cell Layer 1, we found a significant enrichment of the G2 population
compared to Cell Layers 2-3. However, M phase cells were not enriched in
Cell Layer 1, possibly suggesting a G2/M block. Nearly 100% of cells were
Nanog positive in Cell Layers 1-3.
To check whether this was primarily a phenomenon of cells belong to
65
Periphery Nanog Neg.
Nanog Pos.
Total
Layer 1
Nanog Neg.
Nanog Pos.
Layer 2
Layer 3
Total
Nanog
Nanog
Total
Nanog
Nanog
Total
Neg.
Pos.
Neg.
Pos.
# cells
1867
1421
3288
13
4764
4777
6
4374
4380
0
4456
4456
percent
56.78%
43.22%
0.27%
99.73%
0.14%
99.86%
0.00%
100.00%
Table 6.1: Differential spatial distribution of Nanog positive and negative
cells in hES colonies.
colonies of a certain size, we divided cells into groups on the basis of they
belonged to a small, medium, or large-sized colony. We then similarly assessed the cell cycle percentages of those different colony types in each of the
identified Cell Layers. We found that cells of each colony size were enriched
in S-phase in Cell Layer 1 relative to Cell Layers 2 and 3, though the enrichment was slightly stronger in small colonies (Figure 6.6), probably a result
of rapid proliferation in particular for small colonies. The enrichment of G2
phase cells in cell layer 1 is similarly present in all small, medium, and large
colonies. This analysis demonstrated the utility of the system in analyzing
inter-colony heterogeneity.
Further understanding of the biological meaning of the enrichment of G2
phase cells at colony edge would require use of other methodologies, and
measurement across different hES cell lines and culture conditions. However,
66
A
90
Histogram of cellular distance from edge
-
-6-
8000
-'-
Ceab in Smel Colonis (N = 12009)
Cel in Medium Colnies (N = 59612)
Cells in Lwpg Colonis (N =69662)
7000
S6000
04000
3000
2000
10
Distance from edge (um)
B
G1
G1S
MG2
OA6 -
-
CL 0.5
0.5
e04
C
0
02-
U0
small
med.
large
Cell Layer 1
small
med.
large
Cell Layer 2
small med.
large
Cell Layer 3
Figure 6.6: Inter-colony heterogeneity of cell cycle distribution. (A) Histogram of cellular distance from edge in cells belonging to differently-sized
colonies. The maximum cellular distance from the edge of the colony to
the center was used to separate colonies into small (< 150 pm), medium
(150 - 300 pm) and large (> 300 pm) sizes. (B) Distributions of cell cycle
phases for each cell layer in small, medium, and large-sized colonies. The
inter-colony heterogeneity is not significant except for a slight enrichment in
S-phase cells in Cell Layer 1 of small colonies. Error bars represent 95% of
bootstrap samples. Adapted from Gorman et al. [1]
67
it is a useful demonstration of the applicability of the tool we have described
herein. Cellular decision-making is inherently stochastic and influenced by
the cellular micro-environment, necessitating a high-throughput system capable of measuring and abstracting those relationships.
6.2
Monitoring differentiation progress through
texture analysis
The work of Mangoubi et al [52] demonstrated that texture analysis could be
used to distinguish between different differentiated tissue types and different
stages of differentiation, but we were interested in extending that work to see
if changes in texture could be distinguishable at at much finer gradations in
time scale (hours vs. days), such that the progress of a differentiation protocol
could be monitored noninvasively, correlating macro-scale behaviors of cells
in culture with their underlying physiological state.
We differentiated Hi hES cells with retinoic acid-based culture media.
Cells were imaged at lOx magnification with phase contrast microscopy over
seven 1-hour intervals. An example of one such colony changing in morphology over time is illustrated in Figure 6.7. A mask segmenting the colony
was determined at each time point. A grid was then arbitrarily imposed
over the colony space, and only boxes within that space were analyzed as
described in Section 5.2. Figure 6.8 contains the pairwise Kullback-Liebler
Divergence (KLD) values between all patches at every time point. The di68
Figure 6.7: Quantifying colony morphology change upon addition of retinoic
acid to culture media. A square grid is superimposed onto colonies, dividing them into boxes, which are then processed. All boxes not completely
contained within the colony borders are excluded.
agonal (upper-left to bottom-right) contains pairwise comparisons between
patches within the same time point, and the off-diagonal contains pairwise
comparisons at different time points. The progression of textures becoming
more dissimilar over time is demonstrated with generally higher KLD values
in the off-diagonals.
We then defined a series of distributions, fxy, which is the set of pairwise KLD values between all patches in time points X and Y. Figure 6.9A
plots the series of distributions relative to time point 0, foo through fo7, and
demonstrates that the distribution shifts to increasing median KLD and variance over time, values of which are tabulated in Figure 6.9B. Figure 6.9C
shows the results of Kolmogorov-Smirnov hypothesis tests, demonstrating
that the fox distributions are significantly different from one another at the
p = .001 significance threshold.
69
tO ti
t2
t3
t8
t5
t4
t7
to
t1
1
100
t3 wo
t4
,0
t5
6a
o
OA
G7
02
1100
11M
200
10
AMP
00
600
7W0
Soo
300
MW0
I100
Figure 6.8: Pairwise Kullback-Liebler Divergence (KLD) values between all
patches at every time point, 0-7 hours of differentiation.
70
A
".7
f : time 0 vs. time 0
-VI2
05
ID7
0.3
time 0
0.2f:
vs. time 7
Kulbac-L&Ibw Divgence
B
fOO
fOl
f02
f03
f04
f05
f06
f07
Median IVarian ce
1 .12
0.90
1 .48
1.20
N
1540
4760
1.25
2 .61
5824
1.24
1.
1.5C
1.7d
1.61
2 .30
3 .25
5 .09
5 .05
4 .07
7728
8568
10248
12040
14168
%ed:St*.of cUs~ilcuA. p' 001A
Figure 6.9: Comparing distributions of KLD values. (A) Distributions of
time 0 vs. time x KLD values. (B) Table of median and variance values
for the distributions of KLD values. (C) Pairwise hypothesis tests between
distributions demonstrate significant differences at the p = .001 threshold.
71
6.3
Regional heterogeneity in cellular response
to DNA damage
Liu et al. previously reported that cellular differentiation state has a profound impact on DNA damage regulation [62].
Undifferentiated cells are
highly sensitive to DNA damage, and will undergo apoptosis more readily.
Differentiated cells, by contrast, are less sensitive to DNA damage. We hypothesized that colony-level heterogeneity may play a role in determining cell
fate in response to DNA damage induction. To test this, we differentiated
cells with retinoic acid (RA) for 0, 2 or 4 days. As expected, cells treated
with RA exhibited the anticipated reduction in Oct4 (a pluripotency regulatory factor) and increased proportion of cells in GI phase (Figure 6.10).
After RA differentiation, wells were treated with neocarzinostatin (NCS),
a radiomimetic drug that induces DNA double-strand breaks, for 0, 1, or 2
hours, and stained with cPARP antibodies to detect cells' apoptotic response.
Looking at the colony-level behavior of the cells within a single condition (after 1 day of RA differentiation), we found that instead of cells in all
colonies responding in a uniform fashion to the external signals, there were
clear regional behaviors from one colony to another and sometimes different patches within the same colony. Figure 6.11A-6.11C is an example of
one such colony we observed. This colony was a mosaic of undifferentiated
and differentiated patches. Moreover, the undifferentiated patches were positive for Oct4 and more sensitive to DNA damage (had greater expression
72
A
Loss of Oct4 after RA diff
354
Change in cell cycle after RA
1085
472
0
B
Black: Od RA
Light Gray: 1d RA
Dark Gray: 2d RA
Black: Od RA
Light Gray: 1d RA
Dark Gray: 2d RA
814
236
543
118
271
DAPI (au)
Oct4 (au)
Figure 6.10: Phenotypic changes of hES cells during retinoic acid-induced
differentiation.
Oct4 level goes down (A) and more cells are in Gi-phase
(B). Adapted from Gorman et al. [1]
of cPARP), while the differentiated patches were negative for Oct4 and less
sensitive to DNA damage (lower cPARP). In addition, we observed that the
differentiated patches generally had a higher cellular density (possibly early
stages of embryoid body formation), suggesting that gating cells according
to cell density might reveal regional differences in expression of cPARP.
The cell density classifier from our image analysis tool did a reasonable
job at separating cells into subpopulations.
The high-density cell regions
had a lower Oct4 distribution than the low-density cell regions (Figure 6.12),
and the intermediate, or mixed subpopulation had Oct4 enrichment in between the two. Furthermore, the cPARP distribution was plotted and the
cPARP positive subpopulation was gated (Figure 6.13). As expected, the
lower density (undifferentiated subpopulations) had a greater proportion of
73
Figure 6.11: Regional heterogeneity in response to DNA damage. (A-C)
In order to investigate regional heterogeneity, colonies of differentiated and
NCS treated cells was computationally divided into windows which were then
classified according to the density of cells contained. As demonstrated, high
cell density regions (purple) tend to hqv* low Oct4 (B), and reduced induction
of cPARP (C) upon DNA damage, than low (blue) and mixed (red) density
regions. Adapted from Gorman et al. [1]
Oct4 distribution
2281
-
171
-
0
C
114
--
High density
Mixe d density
Low density
Total
S7
0i
0.02M
0.0575
0.0863
0.115
Oct4 (au)
Figure 6.12: Histogram of Oct4 distribution of the cells in the above classified
subregions. Black line shows the distribution of all cells. Adapted from
Gorman et al. [1]
cPARP+ cells (5%) than the mixed (2%) and higher density subpopulation
(1%). Taken together, these results again underscore the utility of our tool
to employ regional analysis of single-cell measurements that can reveal new
physiological relationships.
6.4
High resolution quantitative analysis of
early endoderm differentiation
With the equalized spot detection method described in Section 5.4, we were
able to quantitate the FISH spots in each cell with high confidence, and follow
the dynamic down-regulation of Oct4 levels during differentiation of hES cells
(Figure 6.14). This approach, which allows tiling of individual mRNA signals
75
100%
-
80%
-
Proportion of cells
Oct4+
B
6%
-
A
Proportion of cells
cPARP+
-
40%
-
-
4%
60%
0%
0%
-
Low
Mixed
High
I
Low Mixed
.
20%
-
2%
High
Figure 6.13: Single-cellular levels of Oct4 and cPARP in low density, mixed
density, and high density subpopulations. (A) Proportion of Oct4 positive
cells in each subpopulation. (B) Proportion of cPARP positive cells. Adapted
from Gorman et al. [1]
76
1400
n
C.
0 700
0
500
Soo
CL
I
250
0
Day 0
Day 1
Day 2
Figure 6.14: Reconstruction of Oct4 mRNA levels across 6 contiguous fields
of view using the spot matching procedure detailed in the main text. Each
circle represents an individual cell and circle color denotes the expression
level of Oct4. Overall, Oct4 levels diminish through differentiation, although
small clusters of cells with elevated Oct4 levels can be observed after 2 days
of differentiation. Adapted from Gorman et al. [1]
over multiple fields of view, allows large-scale analysis of heterogeneity in
Oct4 levels of individual stem cells.
In order to better understand the dynamics of differentiation heterogeneity, we conducted an experiment where we delayed the induction of early endoderm differentiation by 12 hours in one sample and measured the changes
in Oct4 and Sox17 transcripts over time, relative to the control sample, which
followed the standard protocol. We observed a 'drift' in Oct4 levels on Day
1 in the delayed induction, compared to the immediate induction (Figure
6.15). The delayed induction corresponded with poorer differentiation, i.e.,
77
Immediate
0.
10
*
0
Delayed
10
Day 0
Day I
10
-
C.
2
S
4
0
10
E
0.6
Day 2
U.
z
S
*
.
CL
, :.
'go
101
10
10
1012
10
10
10
10
Number of SOX17 transcripts
Figure 6.15: Sox17 vs Oct4 transcript counts in immediate vs. delayed early
endoderm differentiation. The delayed induction corresponded with poorer
differentiation (more heterogeneity in Sox17).
more heterogeneity in Sox17 at Day 2. These preliminary data may hint
that the change in Oct4 level measured in the delayed sample causes bias in
lineage selection, a hypothesis we are currently working on testing further.
78
Chapter 7
Conclusion
7.1
Discussion
We present a bioimage informatics platform designed to rapidly assay single pluripotent stem cell behavior in the context of their colony microenvironment. Other individuals in our lab with varying computer backgrounds
were able to use the software successfully. The platform enables integrated
imaging and analysis of pluripotent stem cell colonies and their constituent
cells through manual or automated means. An integrated and portable data
pipeline measures single-cell properties and places them within a hierarchical
data structure corresponding to different scales of cellular micro-environment,
in particular their spatial properties. Unlike most available commercial systems, where images are first stitched into a single large image before processed
- drastically increasing the computational and memory requirements - our
79
system creates consensus segmentation label matrices at the analysis stage,
after applying segmentation algorithms to the individual fields.
Subsequently, we demonstrated the utility of this platform to analyze
both inter- and intra-colony heterogeneity of human pluripotent stem cells.
We found a link between cellular location within the colony and cell cycle
regulation and pluripotency marker expression, which potentially signifies a
novel finding that cells on the outer edge of the colony have slightly greater
mitotic activity and contribute relatively more to colony growth. Additionally, we looked at regional heterogeneity of colonies varying by local cell
density. When we separated subpopulations of cells belonging to different
regional density classes, we found a stronger correlation between differentiation and DNA damage response than we had observed by simply aggregating all cells in the population. Discovery of these relationships would not
be possible without in situ imaging and analysis of many colonies simultaneously. With the ability to precisely quantify single-cell gene expression across
many colonies, this approach may be used to discover subtle relationships between cells that explain the observed heterogeneity. For example, non-cell
autonomous signaling between cells may generate certain expression patterns
and spatial configurations within and between colonies. A better understanding of these factors will help guide perturbation strategies to control directed
differentiation of pluripotent cells.
80
7.2
Suggestions for future work
Below, we detail specific suggestions for future work improving upon and
extending the methods developed in this study.
7.2.1
Improving and streamlining program operation
The program source code is freely available to the public online at https:
//github. com/bgorm/lerou-stemcell and will be incrementally improved
over time. However, there are a number of aspects where the software can
still be improved in usability and portability. First, it runs on MATLAB and
thus requires a MATLAB installation on the microscope computer. Given
the expense of MATLAB licenses and the general difficulty that MATLAB
poses to non-technical users, enabling the program to run independently
of MATLAB by compiling in into an executable binary would potentially
increase access to the software. Second, the imaging and analysis steps could
be further coupled together, and a better interface for using the analysis
portion could be devised. Finally, adding additional experimental metadata
to the images and image folders, and indexing that metadata to enable largerscale experimental comparisons would be useful.
7.2.2
Extension to other tissue types
We have created a tool for advanced tissue analysis that enables one to
accurately measure cellular relationships over multiple length scales and res81
olutions of tissue morphology. In situ approaches are needed because they
do not destroy the non-cell autonomous context of individual cells. This tool
could be further applied to analyze other types of heterogeneous tissue, e.g.
cancer cell aggregates. Further applications include analysis of heterogeneity
at the transcriptional level and protein level during cell fate changes, and
using data collected from live-cell imaging prior to fixation - thus establishing a framework for generation of high-quality data for modeling cellular
decision-making in heterogeneous tissues.
7.2.3
Modeling hES colony formation
Human embryonic stem cell colonies generally grow radially outward in a
flat monolayer.
However, in larger, more advanced colonies, cells in the
center begin to grow on top of each other.
Presently, it is unclear what
factors account for the unique growth pattern of hES cell colonies, and to
what extent this patterning is related to heterogeneity. Those factors may
include migration of cells from the center outward and variable mitosis rates
of heterogenous regions. Using data derived from our tool, combined with
other software developed for tracking live cells with fluorescent reporters, the
cell motility forces that interact in the growth of colonies can be measured
and quantified.
In order to aid individual cell tracking within a dense embryonic stem cell
colony, a dilution of reporter-positive to reporter-negative cells could possibly
be used. Using H2B-GFP or FUCCI cell lines [63] one can track individual
82
cell trajectories over time, within a growing colony and recording the timing
of cell divisions and the identity of the progeny. Mitoses can be detected
manually, or using machine-learning-based classification methods described
in the literature [64]. Additionally, texture-based level-set segmentation algorithms, such as those developed by Lowry and Mangoubi [49], can be usefully
applied to segment the colony area from the background, a generally difficult
task.
This data could then be used to investigate mechanics underlying colony
formation by generating an increasingly complex series of models, and by
comparing them to live-cell data obtained using the methods described above.
The simplest model may involve only cellular division and cellular diffusion,
based on the assumption that cells are uniformly dividing a with random
distribution of division times based on an empirically-determined average of
16 hours per cell division, and that cells are randomly walking throughout
the colony at an empirically determined diffusion rate. An error function
based on differences in size and spatial density between measured and modeled colony growth can be integrated over the area of the colony to define
a goodness-of-fit. Subsequently, cellular motility factors that assume intercellular forces are greatest when the colony is most dense, and that these
intercellular forces drive cellular migration in a non-random direction, may
be introduced. Then, mitosis rates that vary as a function of cellular density
may be considered. The goodness-of-fit obtained from the additional mechanism would then be compared to the simpler model, to determine whether
83
the data is explained substantially better. Should explanatory value significantly increase, it would imply that cells experiencing the highest density
are stimulated to express a molecular marker inhibiting cell cycle division,
potentially driving them to an alternate cell fate.
7.2.4
Modeling in vitro differentiation patterning
We have observed that early-stage definitive endoderm (DE) differentiation
of hES cells results in a pattern formation with pockets of Oct4-positive (presumably undifferentiated) cells surrounded by differentiated cells. Though we
hypothesize that this phenomenon may be due to a acquired resistance to
differentiation that is then inherited by cellular progeny, it is not currently
understood how this patterning evolves in culture. In order to elucidate the
mechanisms underlying this patterning, modeling may be necessary.
One
possible modeling methodology would discretize time and/or space, such that
single cells are associated with a specific location in space and have properties such as their position in the cell cycle, and each an associated probability
of differentiating, dividing, and moving from one location to another within
a certain time interval. The initial state of a population would be derived
from imaging a culture based on a defined starting condition. Such modeling
would potentially help confirm a mechanistic explanation for the observed
spatial configurations. Our method can then be extended to tracking livecell reporters pre-fixation and linking that cellular data to the post-fixation
smFISH.
84
Furthermore, it is possible that patterning cells with constrained geometry or volume may drive cells to a particular fate. Previous efforts have constrained geometry such that cells are not allowed to expand outwards [32].
It would be interesting to see how cells respond to being intially placed in
geometries but then being allowed to expand. Finally, microfluidic gradient
generators
[65]
could also be used to investigate the impact of limited vol-
ume on cellular differentiation, with and without flow through, and with and
without gradients of specific factors.
85
Bibliography
[1] Gorman, B. R. et al. Multi-scale imaging and informatics pipeline for
in situ pluripotent stem cell analysis. PloS One 9, e116037 (2014).
[2] Thomson, J. A. et al. Embryonic stem cell lines derived from human
blastocysts. Science (New York, N. Y.) 282, 1145-1147 (1998).
[3] Daley, G.
Q.
Stem cells: roadmap to the clinic. The Journal of Clinical
Investigation 120, 8-10 (2010).
[4] Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from
mouse embryonic and adult fibroblast cultures by defined factors. Cell
126, 663-676 (2006).
[5] Takahashi, K. et al.
Induction of pluripotent stem cells from adult
human fibroblasts by defined factors. Cell 131, 861-872 (2007).
[6] Lerou, P. H. & Daley, G.
Q.
Therapeutic potential of embryonic stem
cells. Blood Reviews 19, 321-331 (2005).
86
[7]
Masaki, H. et al. Heterogeneity of pluripotent marker gene expression
in colonies generated in human iPS cell induction culture. Stem Cell
Research 1, 105-115 (2007).
URL http://www.ncbi.nlm.nih.gov/
pubmed/19383391.
[8]
Osorno, R. & Chambers, I. Transcription factor heterogeneity and epiblast pluripotency. Philosophical Transactions of the Royal Society of
London. Series B, Biological Sciences 366, 2230-2237 (2011).
URL
http://www.ncbi.nlm.nih.gov/pubmed/21727128.
[9] Martinez Arias, A. & Brickman, J. M. Gene expression heterogeneities in
embryonic stem cell populations: origin and function. Current Opinion
in Cell Biology (2011).
[10] Graf, T. & Stadtfeld, M. Heterogeneity of embryonic and adult stem
cells. Cell Stem Cell 3, 480-483 (2008).
[11] Drukker, M. et al. Isolation of primitive endoderm, mesoderm, vascular
endothelial and trophoblast progenitors from human pluripotent stem
cells. Nature biotechnology (2012).
[12] Huang, S. Cell lineage determination in state space: a systems view
brings flexibility to dogmatic canonical rules. PLoS Biology 8, e1000380
(2010). URL http: //www.ncbi.nim.nih.gov/pubmed/20520792.
87
[13] Underhill, G. H. & Bhatia, S. N. High-throughput analysis of signals
regulating stem cell fate and function.
Current Opinion in Chemical
Biology 11, 357-366 (2007).
[14] Wong, R. C., Pera, M. F. & Pebay, A. Role of gap junctions in embryonic
and somatic stem cells. Stem Cell Rev 4, 283-92 (2008).
[15] Blagovic, K., Kim, L. Y. & Voldman, J. Microfluidic perfusion for
regulating diffusible signaling in stem cells. PloS One 6 (2011).
[16] Wang, X., Lin, G., Martins-Taylor, K., Zeng, H. & Xu, R. H. Inhibition
of caspase-mediated anoikis is critical for basic fibroblast growth factorsustained culture of human pluripotent stem cells. J Biol Chem 284,
34054-64 (2009).
[17] Stewart, M. H., Bendall, S. C. & Bhatia, M. Deconstructing human embryonic stem cell cultures: niche regulation of self-renewal and pluripotency. Journal of Molecular Medicine (Berlin, Germany) 86, 875-886
(2008). URL http: //www.ncbi.n1m.nih.gov/pubmed/18521556.
[18] Bauwens, C. L. et al. Control of human embryonic stem cell colony and
aggregate size heterogeneity influences differentiation trajectories. Stem
Cells (Dayton, Ohio) 26, 2300-2310 (2008). URL http://www.ncbi.
nlm.nih.gov/pubmed/18583540.
[19] Moogk, D., Stewart, M., Gamble, D., Bhatia, M. & Jervis, E. Human
esc colony formation is dependent on interplay between self-renewing
88
hescs and unique precursors responsible for niche generation. Cytometry
A 77, 321-7 (2010).
Moogk, Duane Stewart, Morag Gamble, Darik
Bhatia, Mickie Jervis, Eric eng 2010/03/11 06:00 Cytometry A. 2010
Apr;77(4):321-7. doi: 10.1002/cyto.a.20878.
[20] Snijder, B. et al. Population context determines cell-to-cell variability
in endocytosis and virus infection. Nature 461, 520-523 (2009).
[21] Snijder, B. et al. Single-cell analysis of population context advances rnai
screening at multiple levels. Mol Syst Biol 8, 579 (2012).
[22] Wagers, A. J. The stem cell niche in regenerative medicine. Cell stem cell
10, 362-369 (2012).
URL http://www.ncbi.nlm.nih.gov/pubmed/
22482502.
[23] Giobbe, G. G. et al.
Confined 3d microenvironment regulates early
differentiation in human pluripotent stem cells.
Biotechnology and
bioengineering (2012). URL http://www.ncbi.nlm.nih.gov/pubmed/
22674472.
[24] Lahav, G. et al. Dynamics of the p53-mdm2 feedback loop in individual
cells. Nature Genetics 36, 147-150 (2004). URL http://www.ncbi.
nlm.nih.gov/pubmed/14730303.
[25] Sakaue-Sawano, A. et al. Visualizing spatiotemporal dynamics of multicellular cell-cycle progression. Cell 132, 487-98 (2008).
89
[26] Li, K. et al. Cell population tracking and lineage construction with
spatiotemporal context. Medical Image Analysis 12, 546-566 (2008).
[27] Padfield, D., Rittscher, J., Thomas, N. & Roysam, B. Spatio-temporal
cell cycle phase analysis using level sets and fast marching methods.
Medical Image Analysis 13, 143-155 (2009).
[28] Kafri, R. et al. Dynamics extracted from fixed cells reveal feedback
linking cell growth to cell cycle. Nature 494, 480-3 (2013).
[29] Paduano, V., Tagliaferri, D., Falco, G. & Ceccarelli, M. Automated
identification and location analysis of marked stem cells colonies in optical microscopy images. PLoS One 8, e80776 (2013).
[30] Scherf, N. et al. Imaging, quantification and visualization of spatiotemporal patterning in mesc colonies under different culture conditions.
Bioinformatics 28, i556-i561 (2012).
[31] Nazareth, E. J. P. et al.
High-throughput fingerprinting of human
pluripotent stem cell fate responses and lineage bias. Nature methods
10, 1225-1231 (2013).
[32] Warmflash, A., Sorre, B., Etoc, F., Siggia, E. D. & Brivanlou, A. H.
A method to recapitulate early embryonic spatial patterning in human
embryonic stem cells. Nature Methods 11, 847-854 (2014). URL http:
//www. nature. com/nmeth/journal/vll/n8/full/nmeth. 3016 .html.
90
[33] Loh, K. et al. Efficient endoderm induction from human pluripotent
stem cells by logically directing signals controlling lineage bifurcations.
Cell Stem Cell 14, 237-252 (2014). URL http: //www. sciencedirect.
com/science/article/pii/S1934590913005560.
[34] Hong, S.-H. et al. Cell fate potential of human pluripotent stem cells is
encoded by histone modifications. Cell Stem Cell 9, 24-36 (2011).
[35] Huang, S. Non-genetic heterogeneity of cells in development: more than
just noise. Development (Cambridge, England) 136, 3853-3862 (2009).
[36] Reynolds, N. et al.
Nurd suppresses pluripotency gene expression
to promote transcriptional heterogeneity and lineage commitment.
Cell Stem Cell 10, 583-94 (2012).
Reynolds, Nicola Latos, Paulina
Hynes-Allen, Antony Loos, Remco Leaford, Donna O'Shaughnessy,
Aoife Mosaku, Olukunbi Signolet, Jason Brennecke, Philip Kalkan,
Tuzer Costello,
Ita Humphreys,
Peter Mansfield, William Naka-
gawa, Kentaro Strouboulis, John Behrens, Axel Bertone, Paul Hendrich, Brian eng 079249/Wellcome Trust/United Kingdom 098021/Wellcome Trust/United
Kingdom G0600275/Medical
Research
Coun-
cil/United Kingdom G0800784/Medical Research Council/United Kingdom G1000847/Medical Research Council/United Kingdom Biotechnology and Biological Sciences Research Council/United Kingdom Cancer Research UK/United Kingdom Medical Research Council/United
Kingdom Wellcome Trust/United Kingdom Research Support, Non-U.S.
91
Gov't 2012/05/09 06:00 Cell Stem Cell. 2012 May 4;10(5):583-94. doi:
10. 1016/j.stem.2012.02.020.
[37] Allegrucci, C. & Young, L. E. Differences between human embryonic
stem cell lines. Human Reproduction Update 13, 103-120 (2007). URL
http://www.ncbi.nlm.nih.gov/pubmed/16936306.
[38] Bock, C. et al. Reference maps of human ES and iPS cell variation
enable high-throughput characterization of pluripotent cell lines. Cell
144, 439-452 (2011). URL http://www.ncbi.nlm.nih.gov/pubmed/
21295703.
[39] Momeilovid, 0., Navara, C. & Schatten, G. Cell cycle adaptations and
maintenance of genomic integrity in embryonic stem cells and induced
pluripotent stem cells. Results and Problems in Cell Differentiation 53,
415-458 (2011).
[40] Filipczyk, A. A., Laslett, A. L., Mummery, C. & Pera, M. F. Differentiation is coupled to changes in the cell cycle regulatory apparatus of
human embryonic stem cells. Stem Cell Res 1, 45-60 (2007).
[41] Ballabeni, A. et al.
Cell cycle adaptations of embryonic stem cells.
Proceedings of the National Academy of Sciences of the United States of
America 108, 19252-19257 (2011).
92
[42] Li, V. C., Ballabeni, A. & Kirschner, M. W. Gap 1 phase length and
mouse embryonic stem cell self-renewal.
Proceedings of the National
Academy of Sciences of the United States of America (2012).
[43] Calder, A. et al. Lengthened gi phase indicates differentiation status in
human embryonic stem cells. Stem Cells and Development (2012).
[44] Jin,
Q.,
Duggan, R., Dasa, S. S. K., Li, F. & Chen, L. Random mitotic
activities across human embryonic stem cell colonies. Stem Cells and
Development 19, 1241-1248 (2010).
[45] Carpenter, A. E. et al. Cellprofiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biology 7 (2006).
[46] Kamentsky, L. et al. Improved structure, function, and compatibility for
cellprofiler: modular high-throughput image analysis software. Bioinformatics (Oxford, England) (2011).
&
[47] Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A.
Tyagi, S. Imaging individual mrna molecules using multiple singly labeled probes. Nat Methods 5, 877-9 (2008).
[48] Otsu, N. A threshold selection method from gray-level histograms. IEEE
Transactions on Systems, Man and Cybernetics 9, 62-66 (1979).
[49] Lowry, N. C.
Bayesian level sets and texture models for image seg-
rnentation and classification with application to non-invasive stem cell
93
monitoring.
Thesis, Massachusetts Institute of Technology (2013).
URL http://dspace.mit.edu/handle/1721.1/82500.
Thesis (Sc.
D.)-Massachusetts Institute of Technology, Dept. of Aeronautics and
Astronautics, 2013.
[50]
Larson, R. C. & Odoni, A. R. Urban OperationsResearch (Prentice-Hall,
Englewood Cliffs, N.J., 1981).
[51] Jeffreys, C. G. Support vector machine and parametric wavelet-based
texture classification of stem cell images. Thesis (s.m.), Massachusetts
Institute of Technology (2004).
[52] Mangoubi, R., Jeffreys, C., Copeland, A., Desai, M. & Sammak, P.
Non-invasive image based support vector machine classification of human embryonic stem cells.
In 2007 4th IEEE International Sympo-
sium on Biomedical Imaging: From Nano to Macro, 284-287 (Arlington, VA, USA, 2007).
URL http://ieeexplore.ieee.org/lpdocs/
epic03/wrapper . htm?arnumber=4193278.
[53] Sammak, P. J. et al. High content analysis of human embryonic stem
cell growth and differentiation.
In High content screening: science,
techniques and applications (John Wiley & Sons, Inc., 2007).
URL
http://dx.doi.org/10.1002/9780470229866.ch9.
[54] Mangoubi, R., Desai, M., Lowry, N. & Sammak, P. Performance evaluation of multiresolution texture analysis of stem cell chromatin. In 2008
94
5th IEEE InternationalSymposium on Biomedical Imaging: From Nano
to Macro, 380-383 (Paris, France, 2008).
URL http: //ieeexplore.
ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4541012.
[55] Mangoubi,
R.,
Desai,
M.
& Sammak,
ods in biomedical imaging.
In
P.
Non-gaussian meth-
2008 37th IEEE Applied Im-
agery Pattern Recognition Workshop, 1-6 (Washington, DC, USA,
2008). URL http: //ieeexplore. ieee. org/lpdocs/epic3/wrapper.
htm?arnumber=4906453.
[56] Erb, T. M. et al. Paracrine and epigenetic control of trophectoderm
differentiation from human embryonic stem cells: the role of bone morphogenic protein 4 and histone deacetylases.
Stem Cells and Devel-
opment 20, 1601-1614 (2011). URL http://www.ncbi.nlm.nih.gov/
pubmed/21204619.
[57] Lowry, N., Mangoubi, R., Desai, M., Marzouk, Y. & Sammak, P. A
unified approach to expectation-maximization and level set segmentation applied to stem cell and brain MRI images.
1446-1450 (IEEE,
2011). URL http: //ieeexplore. ieee. org/lpdocs/epic03/wrapper.
htm?arnumber=5872672.
[58] Do, M. N. & Vetterli, M. Wavelet-based texture retrieval using generalized gaussian density and kullback-leibler distance. IEEE Transactions
on Image Processing 11, 146-158 (2002).
95
[59] Coleman, A. W., Maguire, M. J. & Coleman, J. R. Mithramycin- and 4'6-diamidino-2-phenylindole (DAPI)-DNA staining for fluorescence microspectrophotometric measurement of DNA in nuclei, plastids, and
virus particles. Journal of Histochemistry & Cytochemistry 29, 959968 (1981). URL http: //jhc. sagepub. com/content/29/8/959.
[60] Tarnowski, B. I. et al. Automatic quantitation of cell growth and determination of mitotic index using DAPI nuclear staining.
Pediatric
Pathology / Affiliated with the International Paediatric Pathology Association 13, 249-265 (1993).
URL http://www.ncbi. nlm.nih. gov/
pubmed/8385326.
[61] Maciorowski, Z. et al. Comparison of fixation procedures for fluorescent
quantitation of dna content using image cytometry. Cytometry 28, 1239 (1997).
[62] Liu, J. C. et al. High mitochondrial priming sensitizes hESCs to DNAdamage-induced apoptosis.
Cell Stem Cell 13, 483-491 (2013). URL
http://www.cell.com/article/S1934590913003639/abstract.
[63] Sakaue-Sawano,
A.
et al.
Visualizing spatiotemporal dynamics
of multicellular cell-cycle progression.
URL
Cell 132, 487-98 (2008).
file: //localhost/Users/paullerou/Document s/Papers/
2008/Sakaue-Sawano/Cell/Cell%202008%2OSakaue-Sawano. pdf.
18267078; S0092-8674(08)00054-8.
96
[64] Conrad, C. et al. Automatic identification of subcellular phenotypes
on human cell arrays. Genome Research 14, 1130-1136 (2004). URL
http://www.ncbi.nlm.nih.gov/pubmed/15173118.
[65] Gorman, B. R. & Wikswo, J. P. Characterization of transport in microfluidic gradient generators. Microfluidics and Nanofluidics 4, 273285 (2008).
URL http://www.springerlink.com/index/10. 1007/
s10404-007-0169-0.
97
Appendix A
Wet lab protocols
The following experimental protocols used to generate the samples analyzed
in this work.
A.1
Cell culture
The hESCs were maintained on irradiated CF-1 MEFs (GlobalStein) in
DMEM/F12 supplemented with 20% KnockOut Serum Replacement (Gibco),
0.1mM 2-mercaptoethanol (Gibco), 13 GlutaMAX (Gibco), non-essential
amino acids (Gibco), Beta-mercaptoethanol, and 10 ng/ml FGF2 (Sigma), or
in feeder-free conditions on Matrigel in mTeSR medium with 5X supplement
(Stem Cell Technologies). Human iPSCs were passaged using 50 U/mL type
IV collagenase (Gibco) (for feeder culture) or dispase (for feeder-free culture)
approximately every 5-7 days. For imaging experiments, cells were typically
98
plated on Matrigel-coated glass bottom culture dishes (MatTek).
A.2
Immunofluorescent protein labeling
Cells were rinsed in PBS and fixed for 20 minutes at room temperature
in fresh 4% paraformaldehyde (Electron Microscopy Sciences). They were
blocked/permeablized for 1.5 hours in a combined fashion in PBS containing
3% BSA and 0.2% Triton X-100. Primary antibodies for Oct4 (Abcam) and
Nanog (R&D) were diluted at 1:400 (Oct4) and 1:100 (Nanog) in the same
BSA/triton buffer, and applied to the cells overnight at 4*C. We found that
placing the dish on a shaker resulted in the most even antibody distribution. The next day, cells were rinsed 3x in PBS and stained for 1.25 hours at
room temperature with fluorescent secondary antibodies (Life Technologies)
diluted 1:1000 in BSA buffer. They were stained with DAPI (Life Technologies), rinsed, and slide-mounted (if coverslip format was used) using ProLong
Gold anti-fade mounting medium (Life Technologies). If cells were grown and
stained in a dish, they were covered in lmL PBS during imaging.
A.3
Single molecule FISH (smFISH) labeling
of gene expression
Custom Stellaris mRNA probes were obtained from Biosearch Technologies.
The dried oligonucleotide blend was dissolved in 200 pL of TE buffer at
99
pH 8.0 to create a probe stock at a total oligo concentration of 25 11M.
Additional working dilutions of the Oct4-Cy5 and Sox17-Cy3 probes were
prepared by diluting 1:1 in TE buffer. This stock was diluted 1:100 in 37*C
pre-warmed hybridization buffer (10% formamide/10% dextran sulfate in
2xSSC made with nuclease-free water) and added to samples grown on glass
coverslips or glass-bottom dishes, that had been previously fixed with fresh
4% paraformaldehyde 10 minutes at room temperature, followed by overnight
(minimum) permeablization in cold 75% ethanol at 4"C. A brief (2-5 minute)
wash step in 10% formamide/2x SSC buffer was carried out directly prior
to adding the probe-containing hybridization solution. Dishes were wrapped
tightly in parafilm, blocked from light, and incubated at 37"C for 4 hours.
Hybridization buffer was rinsed out in a series of wash steps with the formamide wash buffer (first time quick, second time 30 minutes, third time 45
minutes), all at 37"C. Solution changes were carried out on a metal heat block
warmed to 37"C to improve stringency. Finally, the samples were counterstained with DAPI (5 ng/mL in formamide wash buffer) for 12 minutes in
the dark and rinsed twice with 2xSSC. They were stored in 2xSSC at 4"C
overnight to help remove fluorescent background; then rinsed once in distilled water to prevent salt crystals before slide-mounting (coverslips only)
on a small drop of pre-warmed Prolong Gold anti-fade mounting medium
(Life Technologies). Dishes, alternatively, were rinsed and imaged after the
addition of approximately 200 pL GLOX anti-fade buffer: 10% glucose in
water, 1 M Tris-HCl, pH 8.0, 20X SSC, nuclease-free water, glucose oxidase
100
pre-diluted to 3.7 mg/mL stock (Sigma), and Bovine Catalase (Sigma).
101
Appendix B
Cellular features list
The following table lists the cellular, neighborhood, and colony-based features generated by the tool.
102
Table B.1: Cellular and colony features derived by the
data workflow discussed in Chapter 4.
Category
Feature name
Description
Colony Location
ColonyLocationX
ColonyLocationY
Location of colony centroid in Cartesian space.
Location of colony centroid in Cartesian space.
Colony Size and Shape
colonyMaxRadius
The maximum distance from any point in the colony to
the edge.
colonyArea
The actual number of pixels in the region.
colonyPerimeter
The distance around the boundary of the region.
colonyEquivDiameter
The diameter of a circle with the same area as the region.
colonyEccentricity
The eccentricity of the ellipse that has the same secondmoments as the region. eccentricity is the ratio of the
distance between the foci of the ellipse and its major
axis length. The value is between 0 and 1, where 0 is
the eccentricity of a circle and 1 is the eccentricity of a
line segment.
colonyMajorAxisLengthi Scalar specifying the length (in pixels) of the major axis
of the ellipse that has the same normalized second central moments as the region.
colonyMinorAxisLength Scalar; the length (in pixels) of the minor axis of the
ellipse that has the same normalized second central moments as the region.
colonySolidarity
colonyOrientation
colonyNumCells
Colony Density
colonyCellDensity
colonyEulerNumber
colonyMaxIntensity
colonyMeanIntensity
colonyMinIntensity
LocationCenterX
Location-CenterY
Cell Location
Location
Cell
based)
(colony-
Distance-to-edge
Distance-to-center
Neighborhood
Cellular
(based on DNA stain)
NumberOfNeighbors
Scalar specifying the proportion of the pixels in the convex hull that are also in the region.
The angle (in degrees ranging from -90 to 90 degrees)
between the x-axis and the major axis of the ellipse that
has the same second-moments as the region
Number of cells in the colony.
Number of cells in the colony divided by the area of the
colony.
The number of objects in the region minus the number
of holes in those objects.
Maximum intensity of specified stain in colony.
Mean intensity of specified stain in colony.
Minimum intensity of specified stain in colony.
Location of cell centroid in Cartesian space.
Location of cell centroid in Cartesian space.
Distance from cell to the nearest edge of the colony it
belongs to.
Distance from cell center to colony center.
Number of neighboring nuclei defined by three possible
criteria: nuclei adjacent, nuclei adjacent if expanded,
nuclei within specified distance.
PercentTouching
FirstClosestObjectNun
FirstClosestDistance
SecondClosestObjectNi
SecondClosestDistance
AngleBetweenNeighbor
Local Cellular Density
Local Staining Density
c~1
Intensity measurement (lx
per stain)
Percent of the nucleus's boundary pixels that touch
neighbors, after nuclei have been expanded to the specified distance.
bThe index of the closest object.
The distance to the closest object.
[rilhfrindex of the second closest object.
The distance to the second closest object.
3 The angle formed with the object center as the vertex
and the first and second closest object centers along the
vectors.
Number of nuclei within a specificed distance, divided
by area
Amount of stain intensity within a specified distance,
divided by area. May be more robust than nuclei/area
and ratio of local cellular density/local staining density
could help identify breakdown in segmentation due to
clumped cells.
IntegratedIntensity
The sum of the pixel intensities within an object.
MeanIntensity
StdIntensity
The average pixel intensity within an object.
The standard deviation of the pixel intensities within an
object.
The maximal pixel intensity within an object.
The minimal pixel intensity within an object.
MaxIntensity
MinIntensity
Rescaled DNA Intensity
Nucleus Size and Shape
(based on DNA stain)
IntegratedlntensityEdgb The sum of the edge pixel intensities of an object.
The average edge pixel intensity of an object.
MeanlntensityEdge
The standard deviation of the edge pixel intensities of
StdlntensityEdge
an object.
The maximal edge pixel intensity of an object.
MaxIntensityEdge
The minimal edge pixel intensity of an object.
MinIntensityEdge
The distance between the centers of gravity in the grayMassDisplacement
level representation of the object and the binary representation of the object.
LowerQuartileIntensity The intensity value of the pixel for which 25% of the
pixels in the object have lower values.
The median intensity value within the object.
MedianIntensity
UpperQuartileIntensity The intensity value of the pixel for which 75% of the
pixels in the object have lower values.
Integratedlntensityres aldegrated DAPI levels rescaled based on internal colony
control.
Area
The number of pixels in the region.
Perimeter
The total number of pixels around the boundary of each
region in the image.
Calculated as 4*?*Area/Perimeter2. Equals 1 for a perfectly circular object.
FormFactor
Solidity
Extent
EulerNumber
Eccentricity
MajorAxisLength
-'1
MinorAxisLength
Orientation
Compactness
Nucleus Texture (based on
DNA stain, but also could
be done Ix per stain)
FracAtD
The proportion of the pixels in the convex hull that are
also in the region.
The proportion of the pixels in the bounding box that
are also in the region. Computed as the Area divided
by the area of the bounding box.
The number of objects in the region minus the number
of holes in those objects, assuming 8-connectivity.
The eccentricity of the ellipse that has the same secondmoments as the region. The eccentricity is the ratio of
the distance between the foci of the ellipse and its major
axis length. The value is between 0 and 1.
The length (in pixels) of the major axis of the ellipse
that has the same normalized second central moments
as the region.
The length (in pixels) of the minor axis of the ellipse
that has the same normalized second central moments
as the region.
The angle (in degrees ranging from -90 to 90 degrees)
between the x-axis and the major axis of the ellipse that
has the same second-moments as the region.
The variance of the radial distance of the object's pixels
from the centroid divided by the area.
Fraction of total stain in an object at a given radius.
MeanFrac
RadialCV
Haralick Texture Features
00A
H1
H2
H3
H4
H5
H6
H7
H8
H9
H10
H11
H12
H13
Gabor Features
Mean fractional intensity at a given radius; calculated
as fraction of total intensity normalized by fraction of
pixels at a given radius.
Coefficient of variation of intensity within a ring, calculated over 8 slices.
Haralick texture features are derived from the cooccurrence matrix, which contains information about
how image intensities in pixels with a certain position
in relation to each other occur together.
Angular Second Moment
Contrast
Correlation
Sum of Squares: Variation
Inverse Difference Moment
Sum Average
Sum Variance
Sum Entropy
Entropy
Difference Variance
Difference Entropy
Information Measure of Correlation 1
Information Measure of Correlation 2
Applies Gabor filters to object image. The Gabor filters
measure the frequency content in different orientations,
similar to wavelets.
Appendix C
Software manual
C.1
Availability
Please visit https: //github. com/bgorm/lerou-stemcell for updated in-
formation on usage and access to source code.
C.2
Installation
Download the MATLAB files into a directory as specified on the GitHub
page. Download and install the CellProfiler 2.0 binaries at: http: //www.
cellprof iler. org/previousReleases .shtml.
Download and install the
Micro-Manager 1.4 binaries at: https : //micro-manager. org/wiki/Micro-Manager
VersionArchive. Run the configuration wizard in Micro-Manager 1.4 in order to initialize all camera hardware. Then, edit the MATLAB adapter files
109
as specified online to correctly call each microscope function from within
MATLAB.
C.3
Compatibility
The software has been tested and found compatible with all versions of MATLAB 2011a and later, CellProfiler 2.0, and Micro-Manager 1.4. The software
was tested on a Windows 7 64-bit computer.
C.4
Motivation
Human embryonic stem cells (hESCs) exhibit heterogeneity on multiple scales.
Many high-magnification imaging applications such as single molecule mRNA
FISH (smFISH) of hESCs are sufficiently time-consuming and labor-intensive
that it may be difficult to capture enough data to get a sufficiently clear picture of the macro-scale heterogeneity that exists in the cell culture. To address this problem, we built a software program that allows high-magnification
imaging within the context of the heterogeneity existing on the dish with smFISH as a working example.
C.5
Image Acquisition
We utilized a Nikon Ti 2000 microscope, with a Prior ProScan III motorized
stage, and a CoolSNAP-HQ2 high resolution CCD camera. Custom scripts
110
were written in MATLAB to control the control the microscope, camera, and
stage hardware. The Micro-Manager API was used to control the microscope
and camera drivers. The stage was independently controlled with commands
issued through the serial port.
We constructed a program that allows multi-resolution imaging of embryonic stem cells in different regions across the slide. In order to facilitate usage
by people with varying computer backgrounds, we also constructed a GUI to
control this program. The main window of the GUI is shown in Figure 3.1.
The main GUI window contains UI elements that allow the user to adjust all
necessary parameters of the program. This includes parameters controlling
the microscope nosepiece (objective lens to be used), the camera (binning,
gain, exposure, and region of interest properties), the z-drive (number and
interval of z-stack slices), and the program I/O (save directory, notification
of completion).
The second major element of the program is region selection. In selecting
regions, the user has the option of utilizing a lower-magnification prescan
imaged with phase light. The prescan sub- GUI is illustrated in Figure 3.2.
To generate a prescan, the program firsts asks the user to define the upperleft and lower-right positions of the slide, and then asks the user to enable
perfect focus and set it to the correct offset to maintain proper focus, and
finally the exposure level. Once the scan is executed, the user is presented
with a macroscopic field of view of the entire slide, as shown in Figure 3.3,
which the user can then draw rectangles on to select regions. Alternatively,
111
if fluorescent imaging is used, the rectangles outlining colony areas can be
drawn automatically using an integrated segmentation algorithm. In order
to optimize this to a particular sample, the user adjusts the following four
parameters: hsize and sigma (properties of the Gaussian filter), threshold
correction factor (a number that is multiplied by the Otsu threshold, such
that > 1 is a higher threshold and < 1 is a lower threshold), and the minimum colony size (chosen such that randomly dispersed cells or debris are
discarded).
Once the user has selected a region by drawing a rectangle around it,
the user then adds the selection to a list. Users may see and edit previously
selected region by highlighting selections in the textbox. Upon selecting all
regions of interest, the user clicks an update button on the main GUI window
to transfer the data into the main program for execution. Finally, because
the focus level may change across a slide, users have the option of either
using a constant focal center, or defining one for each region they select.
As the program is scanning the selected regions, tiff stacks are saved
in specified directory. Additionally a comma-separated value (CSV) file is
updated at each new stage location with the spatial coordinates of the stage.
Finally, upon the successful conclusion of the program, a CSV file containing
all parameters is written to the save folder. Additionally, a composite array
of each region is constructed, by tiling together maximum projections of each
tiff stack.
112
C.6
Analysis
Further analysis scripts take the output of the image scanning stage and
segment cells (using CellProfiler binaries modularly; cellprofiler.org), extract
features of interest, detect smFISH spots, and output a nested MATLAB
data structure or CSV file containing cellular properties linked with colony
properties. An additional script outputs the files to a format that may be
used with FCS Express (denovosoftware.com).
Alternatively, the analysis files may be used to process arrays of images
captured independently of the imaging stage using off-the-shelf software, or
other imaging platforms entirely, such as Laser Scanning Cytometry if the
files are named appropriately.
113