Supplementary Information for Breast tumor subgroups

Supplementary Information for
Breast tumor subgroups reveal diverse clinical prognostic power
Zhaoqi Liu, Xiang-Sun Zhang and Shihua Zhang*
National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and
Systems Science, Chinese Academy of Sciences, Beijing 100190, China
*Corresponding author. Email: zsh@amss.ac.cn
Table S1. Prognostic results of the multivariate Cox PH model on each subtype. P-values in bold
denote that the averaged CI is larger than the mean value of the 1000 permutations.
Subtypes             Basal-like  HER2+  Luminal A  Luminal B  Normal-like  All
Sample size          328         238    719        490        200          1981
Clinical+gene (CI)   0.605       0.597  0.712      0.624      0.709        0.685
  P-value            0.014       0.087  0.010      0.047      0.056        -
Gene only (CI)       0.539       0.557  0.624      0.573      0.686        0.626
  P-value            0.030       0.164  0.298      0.082      0.010        -
Clinical only (CI)   0.599       0.625  0.715      0.619      0.667        0.677
  P-value            0.033       0.283  0.001      0.039      0.288        -
Besides the multivariate Cox PH model, we also applied a random survival forest model in the
same manner and obtained consistent results (Table S2). For gene expression and clinical
features, we used the same variables as in the Cox model. Three-fold cross-validation repeated
100 times and a permutation test were conducted in the same manner. The concordance index (CI)
was used to estimate the prediction performance of the model.
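The evaluation described above can be illustrated with a minimal sketch of the concordance index and a permutation test. This is not the authors' code: the function names, the handling of tied times (tied pairs are skipped) and the permutation scheme (shuffling the predicted risk scores) are assumptions made for illustration.

```python
import random
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped; a pair is comparable
        # only if the shorter time is an observed event (not censored).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def permutation_p_value(times, events, risks, n_perm=1000, seed=0):
    """Observed CI and the share of shuffled risk vectors that reach it."""
    rng = random.Random(seed)
    observed = concordance_index(times, events, risks)
    shuffled = list(risks)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if concordance_index(times, events, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

In a cross-validated setting, `risks` would be the predictions on held-out folds and the CI would be averaged over the repeats before being compared with the permutation distribution.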
Table S2. Prognostic results of the random survival forest model on each subtype. P-values in bold
denote that the averaged CI is larger than the mean value of the 1000 permutations.
Subtypes             Basal-like  HER2+  Luminal A  Luminal B  Normal-like  All
Sample size          328         238    719        490        200          1981
Clinical+gene (CI)   0.622       0.637  0.726      0.657      0.714        0.702
  P-value            0.029       0.133  0.010      0.093      0.085        -
Gene only (CI)       0.544       0.563  0.619      0.568      0.678        0.620
  P-value            0.068       0.216  0.329      0.105      0.020        -
Clinical only (CI)   0.621       0.640  0.732      0.657      0.691        0.690
  P-value            0.017       0.146  0.001      0.114      0.318        -
In total, 7 gene network modules had a CI larger than 0.60: six from the normal-like subtype and
one from the basal-like subtype. They were the BIRC5 module (CI = 0.650), MCM10 module (0.640),
AURKA module (0.635), RECQL5 module (0.632), POLQ module (0.625) and CEP55 module (0.624)
from the normal-like subtype, and the IL18RAP module (0.605) from the basal-like subtype.
Tables S3 and S4 give functional annotations of these 7 modules based on pathway and GO
enrichment analysis, respectively.
Table S3. Functional annotations of 7 prognostic modules based on pathway enrichment analysis.
Module   Gene set                              Size  Overlap  FDR
BIRC5    Mitotic M-M/G1 phases                 222   16       <5.0e-04
         Aurora B signaling                    40    13       <5.0e-04
         Chromosome Maintenance                105   8        <3.3e-04
         Aurora A signaling                    31    4        <2.5e-04
MCM10    Synthesis of DNA                      96    13       <1.6e-04
         Cell Cycle Checkpoints                118   11       <1.6e-04
         Regulation of DNA replication         75    10       <1.6e-04
         Mitotic M-M/G1 phases                 222   13       <1.6e-04
         S Phase                               112   13       <1.6e-04
         Mitotic G1-G1/S phases                134   13       <1.6e-04
         Cell cycle                            124   9        <1.4e-04
         DNA replication                       36    7        <1.2e-04
         CDK regulation of DNA replication     18    5        <1.1e-04
         ATR signaling pathway                 37    3        7.0e-04
AURKA    Aurora A signaling                    31    4        <1.0e-03
         Oocyte meiosis                        112   3        3.0e-03
RECQL5   Meiosis                               85    7        <1.0e-03
         Homologous recombination              28    4        <5.0e-04
         Fanconi anemia pathway                52    3        2.3e-03
POLQ     Homologous recombination              28    4        <5.0e-04
CEP55    Mitotic G2-G2/M phases                87    4        <1.0e-03
IL18RAP  IL23-mediated signaling events(N)     36    4        <1.0e-03
         IL12-mediated signaling events(N)     60    4        <5.0e-04
         IL12 signaling mediated by STAT4(N)   30    3        3.3e-04
Table S4. Functional annotations of 7 prognostic modules based on GO biological process.
Module   Gene set                                                 Size  Overlap  FDR
BIRC5    mitotic prometaphase                                     83    14       <3.3e-04
         cell division                                            251   16       <3.3e-04
         M phase of mitotic cell cycle                            90    14       <3.3e-04
         CenH3-containing nucleosome assembly at centromere       22    8        <2.5e-04
         mitotic cell cycle                                       298   14       <2.0e-04
         nucleosome assembly                                      87    8        <1.6e-04
         chromosome segregation                                   52    7        <1.4e-04
         mitotic chromosome condensation                          12    5        <1.2e-04
         attachment of spindle microtubules to kinetochore        6     3        <1.1e-04
MCM10    DNA replication                                          120   14       <1.6e-04
         DNA strand elongation involved in DNA replication        30    9        <1.6e-04
         M/G1 transition of mitotic cell cycle                    67    12       <1.6e-04
         mitotic cell cycle                                       298   15       <1.6e-04
         S phase of mitotic cell cycle                            96    12       <1.6e-04
         G1/S transition of mitotic cell cycle                    126   12       <1.6e-04
         cell cycle checkpoint                                    115   10       <1.4e-04
         DNA-dependent DNA replication initiation                 18    7        <1.2e-04
         DNA unwinding involved in replication                    6     3        <1.1e-04
AURKA    mitosis                                                  157   5        <1.0e-03
RECQL5   DNA recombination                                        54    6        <1.0e-03
         meiosis                                                  37    5        <5.0e-04
         reciprocal meiotic recombination                         25    3        3.3e-04
POLQ     DNA recombination                                        54    3        <1.0e-03
         DNA repair                                               244   4        5.0e-04
         DNA duplex unwinding                                     17    2        4.6e-03
         double-strand break repair via homologous recombination  35    2        9.7e-03
CEP55    G2/M transition of mitotic cell cycle                    109   4        2.0e-03
         mitotic cell cycle                                       298   5        2.0e-03
         mitosis                                                  157   4        2.6e-03
IL18RAP  intracellular protein kinase cascade                     86    3        9.2e-02
         protein phosphorylation                                  342   4        1.0e-01
         induction of apoptosis                                   166   3        1.0e-01
We fitted a multivariate Cox PH regression model on the top 10 genes of each gene signature
(Table S5) and reported the overall Wald test p-value. For the BIRC5 and MCM10 modules, genes
were ranked by their CIs. For the PCNA and Wu signatures, the top 10 genes were selected from
the ranked lists in the original papers. All five gene signatures show significant prognostic ability
(p-value < 0.05) on the normal-like tumors, and only the CIN attractor is more significant
on Luminal B than on the normal-like subtype. The results confirm our original observation.
Table S5. The performance of multivariate Cox PH regression model of the five gene signatures on
each of the five subtypes. For each gene signature, we used its top-ranked 10 genes to fit the
model and the overall Wald test p-value was reported.
Subtypes  Basal  Her2+  Luminal A  Luminal B  Normal-like
BIRC5     0.851  0.463  0.099      0.191      2.50e-04
MCM10     0.071  0.757  0.389      0.216      1.82e-06
PCNA      0.817  0.840  0.096      0.022      0.018
CIN       0.936  0.277  0.078      0.003      0.023
Wu        0.556  0.263  0.006      0.005      5.60e-04
Table S6. Univariate Cox PH regression of the 32-gene stroma module on each of the five subtypes,
the METABRIC dataset and the OsloVal dataset. The Cox model was fitted on the averaged gene
expression value of the 32 genes. The hazard ratio (HR) with its 95% confidence interval, the Wald
test p-value and the concordance index (CI) are reported.
Subtypes  Basal      Her2+      Luminal A  Luminal B  Normal-like  METABRIC   OsloVal
HR        1.017      1.047      0.812      1.159      0.849        0.904      0.952
95% CI    0.79-1.31  0.77-1.41  0.67-0.98  0.97-1.39  0.57-1.26    0.82-0.99  0.72-1.25
p-value   0.895      0.765      0.029      0.111      0.421        0.046      0.727
CI        0.513      0.518      0.539      0.522      0.542        0.533      0.528
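For readers who want to see how the quantities reported in Table S6 relate to one another, the sketch below derives a hazard ratio, its 95% confidence interval and a two-sided Wald p-value from a single Cox coefficient and its standard error. The function name and arguments are hypothetical, the normal approximation is the standard Wald construction, and the fitted coefficients themselves are not given in this supplement.

```python
import math


def hazard_ratio_summary(beta, se, z_crit=1.959964):
    """Return (HR, lower, upper, Wald p-value) for one Cox coefficient."""
    hr = math.exp(beta)
    # The interval is built on the log-hazard scale, then exponentiated.
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    z = beta / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return hr, lower, upper, p_value
```

An interval that excludes 1 corresponds to a Wald p-value below 0.05 under this construction.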
Table S7. CIs of the CIN attractor metagene signature tested on subgroups of the METABRIC dataset
defined by clinical features. The METABRIC dataset was split into two or three subgroups
according to each clinical feature, and we calculated the CI of the CIN gene signature on each
subgroup. We chose the CIN attractor metagene of Cheng et al., which was developed from multiple
cancer datasets without regard to the PAM50 subtype and had shown significantly high prognostic
power on the normal-like breast cancer subtype. To be consistent with the original study, we used
the top 10 genes of the CIN attractor metagene signature for the calculation.
Feature            Subgroup  CI
Lymph node status  Positive  0.60
                   Negative  0.58
ER status          Positive  0.60
                   Negative  0.53
Tumor grade        1         0.52
                   2         0.59
                   3         0.57
Low risk           Yes       0.55
                   No        0.58
Table S8. CIs of the CIN attractor metagene signature tested on subgroups of the METABRIC dataset
defined by cellularity. CI1 denotes the CI tested on each cellularity subgroup. CI2 denotes the CI
tested on the samples that remain after removing that cellularity type.
Cellularity  high  moderate  low
CI1          0.59  0.62      0.62
CI2          0.62  0.60      0.60
Table S9. Ten types of clinical features used in the Multivariate Cox PH regression model.
Variable name                      Type     Metric or levels
Age at Diagnosis                   numeric
Tumor size                         numeric  Centimeters
Lymph Nodes Positive               numeric  Count
Grade                              factor   1 = Nottingham (Elston-Ellis) Score 3 to 5
                                            2 = Nottingham Score 6 to 7
                                            3 = Nottingham Score 8 to 9
ER (Estrogen Receptor) IHC Status  factor   Negative, Positive
HER2 SNP6 State                    factor   NEUT, GAIN, LOSS
Treatment received                 factor   CT, CT/HT, CT/HT/RT, CT/RT, HT, HT/RT, NONE, RT
ER Expression                      factor   +
PR Expression                      factor   +
HER2 Expression                    factor   +
Age at Diagnosis: age of patient at diagnosis of disease
Tumor size: size of tumor in cm
Lymph Nodes Positive: This covariate is one component of the standard 'TNM' classification of
breast cancer. In this case this refers to the 'N' component. The number of lymph nodes involved is
prognostic. Basically, after a patient undergoes surgical 'interrogation' of her axillary nodal basin,
she can be staged as:
N0 = no axillary node metastases identified
N1 = 1 - 3 nodes exhibiting metastases
N2 = 4 - 9 nodes being 'positive'
N3 = 10 or more nodes being 'positive'
Grade: This is a semi-quantitative measure that is a composite of three histopathologic
characteristics seen under a microscope by a pathologist. It therefore should not be interpreted as a
continuous variable, but really should be treated as a categorical variable. The components include
measures of tubule formation, mitotic count, and nuclear pleomorphism.
Estrogen receptor (ER) immunohistochemistry status: Estrogen receptor status is obviously the
'original' molecular marker in breast cancer. It is prognostic and predictive. It can be measured by
IHC (immunohistochemistry) or hormone-binding assay. Both methods are commonly in use
currently.
HER2 SNP6 State: A call as provided by METABRIC, using the SNP6.0 data. Again, HER2 status is
an important component of clinical decision making.
Treatment received: CT: Chemotherapy. HT: Hormonal Therapy. RT: Radiation Therapy
ER Expression: METABRIC provided in their supplemental tables a call based on their expression
data. ER status is such an important component of clinical decision making.
PR Expression: As provided by METABRIC as a dichotomous call based on their expression data.
HER2 Expression: A call as provided by METABRIC, using the expression data.
This table is available at the Breast Cancer Challenge support page
(https://sagebionetworks.jira.com/wiki/display/BCC).
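The categorical encodings described in Table S9 and the notes above can be sketched as small helper functions. The function names are hypothetical and are not part of the METABRIC data release. Treating a count of exactly 10 positive nodes as N3 is an assumption of this sketch.

```python
def n_stage(positive_nodes):
    """Map a count of positive lymph nodes to the 'N' stage listed above."""
    if positive_nodes < 0:
        raise ValueError("node count cannot be negative")
    if positive_nodes == 0:
        return "N0"
    if positive_nodes <= 3:
        return "N1"
    if positive_nodes <= 9:
        return "N2"
    return "N3"  # assumption: 10 or more positive nodes


def nottingham_grade(score):
    """Map a Nottingham (Elston-Ellis) score of 3 to 9 to grade 1, 2 or 3."""
    if 3 <= score <= 5:
        return 1
    if 6 <= score <= 7:
        return 2
    if 8 <= score <= 9:
        return 3
    raise ValueError("Nottingham score must be between 3 and 9")


def treatment_flags(code):
    """Split a treatment level such as 'CT/HT' into per-therapy flags."""
    parts = set() if code == "NONE" else set(code.split("/"))
    unknown = parts - {"CT", "HT", "RT"}
    if unknown:
        raise ValueError(f"unknown treatment code(s): {sorted(unknown)}")
    return {name: name in parts for name in ("CT", "HT", "RT")}
```

Keeping grade as a three-level factor, rather than passing the raw score to the model, matches the note that grade should be treated as a categorical variable.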
Figure S1. Kaplan-Meier cumulative survival curves of two breast cancer groups defined based on
the expression of the BIRC5 module over a 15-year period on the five breast cancer subtypes, the
METABRIC whole dataset and the OsloVal dataset respectively. The two patient groups in each plot
were defined by partitioning the patients into two equal-sized sets using the median value of the
averaged gene expression profile of the BIRC5 module.
Figure S2. KM curves of the MCM10 module.
Figure S3. KM curves of the AURKA module.
Figure S4. KM curves of the RECQL5 module.
Figure S5. KM curves of the POLQ module.
Figure S6. KM curves of the CEP55 module.
Figure S7. KM curves of the IL18RAP module.
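The grouping and curves behind Figures S1 to S7 can be illustrated with a minimal sketch: patients are split at the median of the averaged module expression, and a Kaplan-Meier estimate is computed for each group. The function names are hypothetical, and sending samples that tie with the median to the "low" group is an assumption of this sketch.

```python
from statistics import median


def median_split(module_scores):
    """Label each patient 'high' or 'low' relative to the median score."""
    cut = median(module_scores)
    # Assumption: scores equal to the median are placed in the 'low' group.
    return ["high" if s > cut else "low" for s in module_scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Product-limit update: multiply by the share surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve
```

Calling `kaplan_meier` separately on the "high" and "low" groups gives the two curves compared in each panel.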
Figure S8. Distributions of tumor grade, ER status, lymph node status and low-risk tumors among
the five breast cancer subtypes and the whole METABRIC dataset. More than 50% of luminal A and
normal-like samples were lymph node negative, which was not the case for the other three
subtypes. 25% of luminal A and 36% of normal-like samples were grade 3, compared with 89% of
basal-like, 72% of HER2+ and 54% of luminal B samples. Low-risk tumors are defined as
ER-positive, lymph-node-negative, low-grade tumors. The normal-like and luminal A subtypes
contained more “low risk” patients than the other three subtypes.
Figure S9. Distributions of tumor cellularity among the five breast cancer subtypes and the
METABRIC dataset.
Figure S10. Distributions of histological type among the five breast cancer subtypes. DCIS: ductal
carcinoma in situ. IDC: invasive ductal carcinoma. MED: medullary. MUC: mucinous. TUB: tubular.
ILC: invasive lobular carcinoma. INVASIVE TUMOUR MIXED NST AND A SPECIAL TYPE: NST
stands for 'no special type', so this term is probably a grab bag for tumors of mixed
pathologic features. It is unlikely that the few tumors in this category are pathologically identical
(at least by histology). OTHER: a grab bag term, as above. OTHER INVASIVE: a grab bag term, but
the pathologists want to reinforce that there is an invasive component. PHYL: 'phyllodes' tumors,
which are a separate category from carcinomas. They span the range of benign, borderline and
malignant disease. They are commonly fast growing but remain confined to the breast.