Towards Whole-Transcriptome Deconvolution with Single-cell Data
JAMES LINDSAY (1)
ION MANDOIU (1)
CRAIG NELSON (2)
UNIVERSITY OF CONNECTICUT
(1) DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(2) DEPARTMENT OF MOLECULAR AND CELL BIOLOGY
Mouse Embryo
[Figure labels: ANTERIOR / HEAD, Somites, Node, Primitive streak, POSTERIOR / TAIL]
Unknown Mesoderm Progenitor
• What is the expression profile of the progenitor cell type?
NSB = node-streak border; PSM = presomitic mesoderm; S = somite; NT = neural tube/neurectoderm; EN = endoderm
Characterizing Cell-types
• Goal: whole-transcriptome expression profiles of individual cell types
• Measuring whole-transcriptome expression from single cells is technically challenging
• Approach: computational deconvolution of cell mixtures
• Assisted by single-cell qPCR expression data for a small number of genes
Modeling Cell Mixtures
The mixture matrix (X) is the product of the signature matrix (S) and the concentration matrix (C), so each mixture is a linear combination of the cell-type signatures.
π‘‹π‘š π‘₯ 𝑛 = π‘†π‘š π‘₯ π‘˜ βˆ™ πΆπ‘˜ π‘₯ 𝑛
cell types
mixtures
cell types
genes
genes
mixtures
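The linear model above can be illustrated with a small numerical sketch. The sizes, the random values, and the variable names are made up for illustration and are not taken from the slides; only the relation X = S · C and the matrix shapes come from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 5, 3, 4  # genes, cell types, mixtures (toy sizes, assumed)

# Signature matrix: expression of each gene in each cell type.
S = rng.uniform(0.0, 10.0, size=(m, k))

# Concentration matrix: proportion of each cell type in each mixture.
C = rng.uniform(0.0, 1.0, size=(k, n))
C /= C.sum(axis=0, keepdims=True)  # each mixture's proportions sum to 1

# Each column of X is a linear combination of the columns of S.
X = S @ C
```

Column i of X equals `S @ C[:, i]`, which is the per-mixture view that Step 2 works with.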
Previous Work
1. Coupled Deconvolution
Given: X. Infer: S, C
• NMF
• Minimum polytope
• Repsilber, BMC Bioinformatics, 2010
• Schwartz, BMC Bioinformatics, 2010
2. Estimation of Mixing Proportions
Given: X, S. Infer: C
• Quadratic programming
• LDA
• Gong, PLoS One, 2012
• Qiao, PLoS Comp Bio, 2012
3. Estimation of Expression Signatures
Given: X, C. Infer: S
• csSAM
• Shen-Orr, Nature Brief Com, 2010
Single-cell Assisted Deconvolution
Given: X and single-cell qPCR data
Infer: S, C
Approach:
1. Identify cell types and estimate the reduced signature matrix S from the single-cell qPCR data
   • Outlier removal
   • K-means clustering followed by averaging
2. Estimate the mixing proportions C using the reduced S
   • Quadratic programming, one mixture at a time
3. Estimate the full expression signature matrix S using C
   • Quadratic programming, one gene at a time
Step 1: Outlier Removal + Clustering
Remove cells whose maximum Pearson correlation to any other cell is below 0.95.
[Figure panels: unfiltered, filtered]
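A minimal sketch of this step follows, assuming the single-cell data are stored as a cells x genes array. The function names, the plain Lloyd's-iteration k-means, and the random initialization are illustrative choices; the slides specify only the 0.95 correlation cutoff and k-means clustering followed by averaging.

```python
import numpy as np

def remove_outliers(profiles, threshold=0.95):
    """Keep cells whose best Pearson correlation to another cell is >= threshold.

    profiles: array of shape (cells, genes). Returns (kept profiles, boolean mask).
    """
    corr = np.corrcoef(profiles)      # cell-by-cell Pearson correlation
    np.fill_diagonal(corr, -np.inf)   # ignore each cell's correlation with itself
    keep = corr.max(axis=1) >= threshold
    return profiles[keep], keep

def kmeans_centroids(profiles, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; the centroids are the per-cluster average profiles."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(profiles), size=k, replace=False)
    centroids = profiles[start].astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(profiles[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            profiles[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids.
    dist = np.linalg.norm(profiles[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)
```

The centroids, transposed to genes x cell types, would play the role of the reduced signature matrix in Step 2.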
Step 1: PCA of Clustering
Step 2: Estimate Mixture Proportions
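For a given mixture i:
$\min_{c} \lVert S c - x \rVert^2 \quad \text{s.t.} \quad \sum_{l=1}^{k} c_l = 1, \quad c_l \ge 0 \;\; \forall l = 1 \dots k$
where
$x = X_{j,i} \;\; \forall j = 1 \dots m$ (observed expression of mixture i)
$c = C_{l,i} \;\; \forall l = 1 \dots k$ (mixing proportions of mixture i)
S: reduced signature matrix, the centroids of the k-means clusters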
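One way to solve this constrained least-squares problem for a single mixture is sketched below. It uses SciPy's general-purpose SLSQP solver as a stand-in for a dedicated quadratic programming solver; the solver choice and the function name `estimate_proportions` are assumptions, not something the slides specify.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_proportions(S_reduced, x):
    """Solve min ||S c - x||^2 subject to sum(c) = 1 and c >= 0 for one mixture.

    S_reduced: (genes, cell types) reduced signature matrix.
    x: (genes,) observed expression of the mixture.
    """
    k = S_reduced.shape[1]

    def objective(c):
        residual = S_reduced @ c - x
        return residual @ residual

    def gradient(c):
        return 2.0 * S_reduced.T @ (S_reduced @ c - x)

    result = minimize(
        objective,
        x0=np.full(k, 1.0 / k),            # start from equal proportions
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,           # non-negativity (upper bound implied by the sum)
        constraints=[{"type": "eq", "fun": lambda c: c.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 200},
    )
    return result.x
```

Calling this once per column of X and stacking the results column by column would give an estimate of C.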
Step 3: Estimating Full Expression Signatures
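X (genes x mixtures) = S (genes x cell types) · C (cell types x mixtures)
C: known from step 2
x: observed signals of the new gene across the mixtures (one row of X)
s: signature of the new gene across the cell types (one row of S), to be estimated
Now solve:
$\min_{s} \lVert s C - x \rVert^2$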
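The per-gene problem can be sketched with non-negative least squares. The non-negativity constraint on s is an assumption made here (expression levels cannot be negative); the slides state only the objective and that quadratic programming is applied one gene at a time. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_signature(C, x):
    """Solve min ||s C - x||^2 with s >= 0 for one gene.

    C: (cell types, mixtures) mixing proportions from step 2.
    x: (mixtures,) observed signal of the gene in each mixture.
    """
    # s C = x is equivalent to C^T s^T = x^T, the form nnls expects.
    s, _residual_norm = nnls(C.T, x)
    return s

def estimate_full_signatures(C, X):
    """Apply the per-gene solve to every row of X (genes x mixtures)."""
    return np.vstack([estimate_signature(C, row) for row in X])
```

With more mixtures than cell types and a full-rank C, each gene's signature is determined by its row of X alone, which is why the problem separates gene by gene.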
Experimental Design
Single Cell Profiles
• 92 profiles
• 31 genes
Simulated Concentrations
• Sample uniformly at random from [0, 1]
• Scale each column to sum to 1
Simulated Mixtures
• Choose single cells randomly with replacement from each cluster
• Sum to generate the mixture
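The simulation described above can be sketched as follows. How a concentration is turned into a number of sampled cells is not stated in the slides, so the sketch takes the per-cluster cell counts as an explicit argument; `cells_per_type` and both function names are hypothetical.

```python
import numpy as np

def simulate_concentrations(k, n, rng):
    """Sample a (cell types x mixtures) matrix uniformly from [0, 1], columns scaled to sum to 1."""
    C = rng.uniform(0.0, 1.0, size=(k, n))
    return C / C.sum(axis=0, keepdims=True)

def simulate_mixture(profiles, labels, cells_per_type, rng):
    """Sum single-cell profiles drawn with replacement from each cluster.

    profiles: (cells, genes) single-cell expression.
    labels: (cells,) cluster index of each cell.
    cells_per_type: number of cells to draw from each cluster (assumed input).
    """
    mixture = np.zeros(profiles.shape[1])
    for cluster, count in enumerate(cells_per_type):
        members = np.flatnonzero(labels == cluster)
        chosen = rng.choice(members, size=count, replace=True)
        mixture += profiles[chosen].sum(axis=0)
    return mixture
```

One plausible choice, again an assumption, is to set `cells_per_type` to each column of the simulated concentration matrix multiplied by a fixed total cell count and rounded.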
Data: RT-qPCR
• A CT value is the cycle at which the gene was detected
• Relative normalization to housekeeping genes
• Housekeeping genes
  • gapdh, bactin1
  • geometric mean
  • Vandesompele, 2002
• dCT(x) = geometric mean - CT(x)
• expression(x) = 2^dCT(x)
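The normalization can be written out as a short sketch. It assumes the geometric mean is taken over the housekeeping genes' CT values, which is how the formula above reads; the function name and the example numbers are invented for illustration.

```python
import numpy as np

def relative_expression(ct, housekeeping_ct):
    """Convert CT values to relative expression, 2^dCT.

    ct: CT values of the genes of interest in one sample.
    housekeeping_ct: CT values of the housekeeping genes (e.g. gapdh, bactin1)
        in the same sample.
    """
    ct = np.asarray(ct, dtype=float)
    housekeeping_ct = np.asarray(housekeeping_ct, dtype=float)
    reference = np.exp(np.mean(np.log(housekeeping_ct)))  # geometric mean
    d_ct = reference - ct
    return 2.0 ** d_ct

# Made-up CT values: the housekeeping geometric mean is 20, so a gene
# detected 2 cycles earlier (CT 18) has 4x the reference expression.
example = relative_expression([20.0, 22.0, 18.0], [16.0, 25.0])
```

Because 2^dCT is on a linear scale, the resulting values are suitable inputs for the linear mixture model.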
Accuracy of Inferred Mixing Proportions
Concentration Matrix: Concordance
Concentration by # Genes: Random
Concentration by # Genes: Ranked
Leave-one-out: Concentration: 50 mix
[Figure: axes labeled RMSE, 2^dCT, Missing Gene]
Leave-one-out: Signature: 10 mix
[Figure: axes labeled RMSE, 2^dCT, Missing Gene]
Leave-one-out: Signature: 50 mix
[Figure: axes labeled RMSE, 2^dCT, Missing Gene]
Future Work
• Bootstrapping to report a confidence interval for each estimated concentration and signature
  • Show correlation between large CI and poor accuracy
• Mixing of heterogeneous technologies
  • qPCR for single cells, RNA-seq for mixtures
  • Normalization (needs to be linear)
• Whole-genome scale
  • # genes to estimate 10,000+ signatures
• Data!
Conclusion
Special Thanks to:
• Ion Mandoiu
• Craig Nelson
• Caroline Jakuba
• Mathew Gajdosik
James.Lindsay@engr.uconn.edu