Supplementary Materials

Identification of Novel Markers that Outperform EpCAM in Quantifying Circulating Tumor Cells

Min-Ji Kim1, Na Young Choi2, Eun Kyung Lee1,3, Myung-Soo Kang1,3,*

1 Samsung Biomedical Research Institute and Samsung Medical Center, Seoul, Korea 135-718
2 Department of Nursing, Kyungdong University, 815 Kyonwhon-ro, Munmap-eup, Gangwon-do 220-804, Korea
3 Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences and Technology, Samsung Biomedical Research Institute and Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea 135-718

Running Title: Novel Circulating Tumor Cell Markers

* Corresponding author: Samsung Advanced Institute for Health Sciences and Technology, Samsung Medical Center, Sungkyunkwan University School of Medicine, 50 Irwon-dong, Gangnam-gu, Seoul, Korea 135-710. Tel: 82-2-3410-1038; Fax: 82-2-3410-0534; email: mkang@skku.edu

Fig. S1. Biostatistical analyses of the CTC markers EpCAM, KRT19 and KRT7.

Fig. S2. Analyses of the additional pan-CTC markers SNX7, PTGR1, LAMB1, PRSS23 and GNG12.

Fig. S3. Biostatistical analyses of EpCAM-/low markers assessed by boxplot (A), histogram (B) and ROC (C) analyses.

Table S1. Sample information for the 967 cancer cell lines used in the Cancer Cell Line Encyclopedia microarray database.

Table S2. Details of the 48 pan-CTC and 6 lymphocyte-depletion markers identified as uniformly over-expressed genes (OG) and under-expressed genes (UG), respectively, in the AOC group relative to the LL group in Fig. 1, with greater statistical significance than EpCAM and PTPRC, respectively.

Table S3. Genes that are over-expressed (pink background) or under-expressed (green background) by more than 32-fold (log2FC > 5) in the specified group relative to LL. The boxed genes denote the 12 best pan-CTC markers and the 6 leukocyte depletion markers. Blue letters denote markers unique to the specified type, a subset of which is cancer type-specific.
Genes are ranked from top to bottom, from the highest over-expression to the lowest under-expression.

Table S4. Details of the 136 genes shown in the heat map in Fig. 2. The table lists each gene, its mean expression in the AOC and LL groups, the mean difference (log2(Fold Change)) between the specified cancer group (AOC or one of the 21 other tumor types) and LL (e.g. log2(FC_AOC/FC_LL)), and the p values of the WRS test for the mean distribution and of the Kolmogorov-Smirnov test for the distribution of all the data points (not shown) within the specified group compared with those of LL.
*Values transformed to log2 scale.
#Markers classified by a single test between AOC and LL or by 21 tests between each of the 21 groups and LL.
PAN, pan-CTC marker; E-/low, markers for CTCs derived from Non/Low EpCAM cancer groups; Specific, markers for a specific cancer group.

Table S5. Validation of CTC markers from this study for a subset of universally over-expressed genes in diverse cancer tissues in an independent Bitter multi-cancer study.

Table S6. GO terms from gene ontology analyses using the 136 gene markers shown in Fig. 2 and Table S4.
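
Note on the fold-change cutoff used in Tables S3 and S4. The following is a minimal Python sketch, not the authors' analysis code, of how a log2 fold change can be obtained when expression values are already on a log2 scale, as the Table S4 footnote states, and how the 32-fold cutoff of Table S3 corresponds to a log2FC of 5. The function names and the example values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def log2_fold_change(group_log2, ll_log2):
    """Mean log2 expression in a cancer group minus mean log2 expression in LL.

    Assumes both inputs are already log2-transformed, so the difference of
    means is the log2 fold change.
    """
    return mean(group_log2) - mean(ll_log2)


def classify(log2fc, threshold=5.0):
    """Label a gene using the 32-fold cutoff (2**5 == 32)."""
    if log2fc > threshold:
        return "over-expressed"
    if log2fc < -threshold:
        return "under-expressed"
    return "unchanged"


# Hypothetical log2 expression values for one gene (not data from the study).
aoc_values = [11.0, 12.0, 13.0]
ll_values = [5.0, 6.0, 7.0]

fc = log2_fold_change(aoc_values, ll_values)
print(fc, 2 ** fc, classify(fc))
```

In this sketch a gene is labelled only when its log2FC is strictly beyond the cutoff in either direction, which mirrors the "more than 32-fold" wording of Table S3.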