Although only a few Y-SNPs are retained for some of the seven Y-SNP types, an example can be found for each type that illustrates the differences between the types and their value for improving the Y-chromosomal phylogenetic tree. The Y-SNPs of type A are of particular interest since they can refine the existing sub-haplogroups. These Y-SNPs are carried in only six different haplogroups, as shown in Figure S1. This figure shows the relative number of non-equivalent type A Y-SNPs per haplogroup, i.e. the number of non-equivalent Y-SNPs divided by the number of samples in our dataset that belong to that haplogroup. Haplogroups G and O have the highest relative number of non-equivalent type A Y-SNPs, followed by haplogroups I and J. An example showing that type A Y-SNPs can subdivide existing sub-haplogroups is given in Table S8: nine Y-SNPs, which represent two non-equivalent Y-SNPs, divide the four samples belonging to sub-haplogroup R1a1a10 (R-Z93) into two groups.
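The notion of non-equivalent Y-SNPs and the relative number plotted per haplogroup can be illustrated with a minimal sketch. It assumes that two Y-SNPs are equivalent when they are carried by exactly the same set of samples. The function names, sample identifiers and the five/four split of the nine Y-SNPs are hypothetical and are not taken from PENNY or from Table S8.

```python
from collections import defaultdict


def group_equivalent_snps(carriers):
    """Group Y-SNPs that are carried by exactly the same set of samples.

    carriers maps a Y-SNP name to the set of sample IDs carrying it.
    Returns a dict from each distinct carrier set (as a frozenset) to the
    sorted list of Y-SNP names that share this pattern.
    """
    groups = defaultdict(list)
    for snp, samples in carriers.items():
        groups[frozenset(samples)].append(snp)
    return {pattern: sorted(names) for pattern, names in groups.items()}


def relative_non_equivalent(carriers, n_samples):
    """Number of non-equivalent Y-SNPs divided by the number of samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return len(group_equivalent_snps(carriers)) / n_samples


# Hypothetical data: nine Y-SNPs in four samples, two carrier patterns.
example = {f"snp{i}": {"s1", "s2"} for i in range(1, 6)}
example.update({f"snp{i}": {"s3", "s4"} for i in range(6, 10)})

groups = group_equivalent_snps(example)
relative = relative_non_equivalent(example, n_samples=4)
```

In this toy input the nine Y-SNPs collapse into two carrier patterns, so they count as two non-equivalent Y-SNPs and split the four samples into two groups.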
Besides refining the existing sub-haplogroups, novel Y-SNPs can also resolve polytomies in the phylogenetic tree; these are classed as type B. An example is observed for sub-haplogroup I2 or I2a (I-P215 or I-P37.2), where 15 type B Y-SNPs resolve part of the polytomy within this sub-haplogroup (Table S9). Figure S2 shows the phylogenetic tree of I2 (I-P215) after interpretation of the PENNY results. Although the polytomy is not fully resolved, the number of direct sub-haplogroups of I2 (I-P215) is reduced to three.
The next type of Y-SNPs, type C, confirms the existing sub-haplogroups in the phylogeny: these Y-SNPs are equivalent to the existing defining Y-SNP(s) of a sub-haplogroup. An example of such type C Y-SNPs is given in Table S10. Sub-haplogroup O or O1 (O-M175 or O-MSY2.2) is defined in update 1.2 of the phylogenetic tree by the Y-SNPs P186, P191 and P196. PENNY identified 46 additional equivalent Y-SNPs that define this sub-haplogroup.
The fourth type of potentially interesting Y-SNPs, type D, is quite similar to type C since these Y-SNPs also confirm a sub-haplogroup, but here the sub-haplogroups are terminal leaves of the phylogenetic tree. Table S11 shows an example of 31 type D Y-SNPs in sub-haplogroup E1b1b1a1c* (E-V22*) which are equivalent to the defining Y-SNP V22.
Type E, the fifth type of potentially interesting Y-SNPs, comprises the Y-SNPs that are carried by samples of two different sub-haplogroups whose direct parent is the MRCA. This MRCA has only two sub-haplogroups, and in only one of them do more than 75% of the samples carry the Y-SNP. The example in Table S12 consists of seven type E Y-SNPs for which all carrying samples belong to MRCA R1b1b2a1a2e (R-M529). Both samples belonging to R1b1b2a1a2e1 (R-M222) carry the Y-SNPs, as does one of the eleven samples belonging to R1b1b2a1a2e* (R-M529*).
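The type E criterion can be sketched as a check on carrier fractions. This is a simplified reading of the definition above and not the PENNY implementation: the function names and sample identifiers are hypothetical, and the sketch assumes that the two direct sub-haplogroups of the MRCA are passed in explicitly.

```python
def carrier_fraction(samples, carriers):
    """Fraction of the given samples that carry the Y-SNP."""
    samples = set(samples)
    if not samples:
        raise ValueError("a sub-haplogroup needs at least one sample")
    return len(samples & set(carriers)) / len(samples)


def looks_like_type_e(children, carriers, threshold=0.75):
    """Check the type E pattern for one Y-SNP.

    children maps the names of the direct sub-haplogroups of the MRCA to
    their sets of sample IDs; carriers is the set of samples with the Y-SNP.
    """
    carriers = set(carriers)
    # The MRCA must have exactly two sub-haplogroups.
    if len(children) != 2:
        return False
    fractions = [carrier_fraction(s, carriers) for s in children.values()]
    # The Y-SNP must be carried in both sub-haplogroups.
    if any(f == 0 for f in fractions):
        return False
    # Exactly one sub-haplogroup exceeds the 75% threshold.
    return sum(f > threshold for f in fractions) == 1


# Counts as in the Table S12 example; the sample IDs are made up.
m222 = {"a1", "a2"}
m529_star = {f"b{i}" for i in range(1, 12)}
children = {"R-M222": m222, "R-M529*": m529_star}
carriers = m222 | {"b1"}
is_type_e = looks_like_type_e(children, carriers)
```

With both R-M222 samples and one of the eleven R-M529* samples carrying the Y-SNP, the fractions are 1.0 and 1/11, so only one sub-haplogroup exceeds the threshold and the check succeeds.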
The sixth type of Y-SNPs resulting from PENNY is type F, which resembles type E: at least one sub-haplogroup of the MRCA has ≥75% and at least one sub-haplogroup has <75% of its samples carrying the Y-SNP, and the MRCA is not polytomous. However, it differs from type E in that the MRCA is not the direct parent of all Y-SNP-carrying samples. PENNY predicted only three type F Y-SNPs, and all of them are reported in brackets, which means that at least one sub-haplogroup of the MRCA has only one high-quality sample in the dataset. The example in Table S13 shows such a type (F) Y-SNP of MRCA O or O1 (O-M175 or O-MSY2.2). The Y-SNP is present in all samples belonging to sub-haplogroups O1a or O1a1 (O-M119 or O-P203) and O2a1 (O-M95) but not in the samples belonging to sub-haplogroups O2b (O-SRY465) and O3 (O-M122). As only one sample in the dataset of high-quality genomes belongs to O2b (O-SRY465) and only one Y-SNP was found with this pattern, it is uncertain whether the absence of the Y-SNP in this sub-haplogroup is due to a false negative call of this single SNP or whether this sub-haplogroup needs to be defined in another way in the phylogeny.
The last type of potentially interesting Y-SNPs is type G, which gathers the Y-SNPs that occur in several leaves of the tree but never in more than 75% of the samples of those sub-haplogroups. Examples of such Y-SNPs with MRCA R1b1b2a1a (R-P310) are given in Table S14. However, as no two synonymous SNPs were found, no consistent pattern was observed that would question the current phylogenetic structure within R-P310 based on these type G Y-SNPs.
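The type G pattern can be sketched in the same spirit. The sketch below is an illustration and not the PENNY implementation: it interprets "several leaves" as at least two leaves with a carrier, which is an assumption, and the function name, leaf names and sample identifiers are hypothetical.

```python
def looks_like_type_g(leaves, carriers, threshold=0.75, min_leaves=2):
    """Check the type G pattern for one Y-SNP.

    leaves maps the names of terminal sub-haplogroups to their sets of
    sample IDs; carriers is the set of samples with the Y-SNP.
    min_leaves=2 is an assumed reading of "several leaves".
    """
    carriers = set(carriers)
    # Carrier fraction in every leaf that contains at least one carrier.
    fractions = [
        len(set(samples) & carriers) / len(set(samples))
        for samples in leaves.values()
        if set(samples) & carriers
    ]
    # Present in several leaves, but never above the threshold in any of them.
    return len(fractions) >= min_leaves and all(f <= threshold for f in fractions)


# Hypothetical leaves below one MRCA.
leaves = {
    "leafA": {"a1", "a2", "a3", "a4"},
    "leafB": {"b1", "b2", "b3"},
    "leafC": {"c1", "c2"},
}
scattered = looks_like_type_g(leaves, {"a1", "b1"})
dominant = looks_like_type_g(leaves, {"a1", "a2", "a3", "a4", "b1"})
single_leaf = looks_like_type_g(leaves, {"a1"})
```

Only the first call matches the pattern: the Y-SNP is scattered over two leaves at fractions of 1/4 and 1/3. In the second call one leaf is fully covered, and in the third the Y-SNP occurs in a single leaf.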
Table S1 Overview of the related samples in the dataset. The type of relatedness (familial or identical) is indicated. Each group of related samples also has an identifying group number.
Table S2 AMY-tree v1.2 results of the test panel with samples from the Complete Genomics (CG), 1000 Genomes pilot (1000G pilot) and phase 1 (1000G phase 1) projects, the Personal Genome Project (PGP), the Singapore Sequencing Malay Project (SSMP) and several individual genome projects (IND). For each sample the determined sub-haplogroup, the number of new Y-SNPs, i.e. Y-SNPs which are not yet present in the current Y-chromosomal phylogenetic tree, and the Matthews correlation coefficient (MCC) are given. The results in bold indicate the samples with an MCC value higher than 0.95, which are considered to have a high SNP calling quality.
Table S3 Dataset of the non-unique regions of the Y-chromosome: pseudoautosomal, heterochromatic, X-transposed and ampliconic segments [1] of the male-specific part of the genome as reported by Wei et al. [2].
Table S4 PENNY results for the detection of potentially interesting Y-SNPs based on the 173 samples with a high quality (MCC ≥ 0.95). For each Y-SNP that passed the filters an identifier is given, next to its position and its ancestral and mutant alleles. The type of Y-SNP and the determined MRCA are also given. For each sample the presence or absence of the Y-SNP is indicated with + and -. The number of samples that carry the Y-SNP is also given.
Table S5 Overview of the names assigned to the new Y-SNPs which did not have a phylogenetic name. They are characterized by their Hg19 position and their mutation. For some of these Y-SNPs a dbSNP identifier is available.
Table S6 Overview of the Y-SNPs that are classed as ‘waiting room’ by PENNY. For each Y-SNP its identifier, position, ancestral and mutant allele, type and MRCA are given. For each sample the presence or absence of the Y-SNP is indicated with + and -. The number of samples that carry the Y-SNP is also given.
Table S7 Overview of the 113 validated private Y-SNPs based on samples of the same genome. For each Y-SNP its position, ancestral and mutant allele, type and MRCA are given. For each sample the presence or absence of the Y-SNP is indicated with + and -. The number of samples that carry the Y-SNP is also given.
Table S8 Example of type A Y-SNPs which can divide sub-haplogroup R1a1a10 [R-Z93] into two different groups. These nine Y-SNPs divide the four genomes belonging to this sub-haplogroup into two groups of two genomes each.
Table S9 Example of type B Y-SNPs which can solve the polytomy of sub-haplogroup I2 or I2a [I-P215 or I-P37.2]. These 15 Y-SNPs are only carried in two of the three sub-haplogroups.
Table S10 Example of type C Y-SNPs which can confirm sub-haplogroup O or O1 [O-M175 or O-MSY2.2]. These 46 Y-SNPs are carried in more than 75% of all 36 genomes which belong to this sub-haplogroup.
Table S11 Example of type D Y-SNPs which are carried in all samples belonging to sub-haplogroup E1b1b1a1c* [E-V22*].
Table S12 Example of type E Y-SNPs which are carried in only one of the eleven samples belonging to R1b1b2a1a2e* [R-M529*] and in all samples belonging to sub-haplogroup R1b1b2a1a2e1 [R-M222].
Table S13 Example of type (F) Y-SNPs which are carried in all samples belonging to sub-haplogroups O1a or O1a1 [O-M119 or O-P203] and O2a [O-PK4] but not in the samples belonging to sub-haplogroups O2b [O-SRY465] and O3 [O-M122].
Table S14 Example of type G Y-SNPs which are carried in a few samples belonging to the different sub-haplogroups of R1b1b2a1a [R-P310].
Table S15 Overview of the 27 Y-SNPs which are already known by the International Society of Genetic Genealogy (ISOGG). Besides the ISOGG name, the position in Hg19, the ancestral and mutant alleles and the type of Y-SNP, the dbSNP identifier is also given, as is the number of equivalent Y-SNPs that are indicated as potentially interesting by PENNY. The last two columns contain the information about the lineage to which the Y-SNP belongs according to PENNY and to ISOGG.
Table S16 Example of Y-SNPs which can split the combined sub-haplogroup O1a* or O1a1* [O-M119* or O-P203*]. Previously these sub-haplogroups were distinguished by the recurrent Y-SNP P203.1/M307.1. The presence of this recurrent Y-SNP in the seven samples belonging to this sub-haplogroup is given next to the presence of the 55 type A Y-SNPs predicted by PENNY. These 55 Y-SNPs divide the seven genomes into two groups of two and five genomes.
Table S17 Example of Y-SNPs which can split the combined sub-haplogroup J2b2* or J2b2a [J-M241 or J-M99]. Previously these sub-haplogroups were distinguished by Y-indel M99. The presence of this Y-indel in the three samples belonging to this sub-haplogroup is given, next to the presence of the 40 type A Y-SNPs predicted by PENNY. These 40 Y-SNPs divide the genomes into two groups.
Table S18 Example of Y-SNPs which can split the combined sub-haplogroup O or O1 [O-M175 or O-MSY2.2]. Previously these sub-haplogroups were distinguished by Y-indel MSY2.2. The presence of this Y-indel in the 36 samples belonging to this sub-haplogroup is given, next to the presence of the five type A Y-SNPs predicted by PENNY. These five Y-SNPs divide the genomes into two groups.
Table S19 Table format of the updated Y-chromosomal phylogenetic tree version 2.0 optimized based on the results of the first run with PENNY (30 May 2013). For each reported Y-chromosomal (sub-)haplogroup, the phylogenetic lineage to which the (sub-)haplogroup belongs, called the ‘parental lineage’, and the defining Y-SNPs of the lineage are given.
Table S20 Version 2.0 of the Y-SNP conversion file (30 May 2013). This list contains the name, the synonyms, the RefSNP ID, the position on the Y-chromosome according to references NCBI36 (Hg18) and GRCh37 (Hg19) and the mutant conversion state (Ancestral allele -> Mutant allele) of all reported Y-SNPs.
Table S21 Table format of the updated Y-chromosomal phylogenetic tree version 2.1 optimized based on the results of the second run with PENNY and on the latest published evolutionary lineages (20 November 2013). For each reported Y-chromosomal (sub-)haplogroup, the phylogenetic lineage to which the (sub-)haplogroup belongs, called the ‘parental lineage’, and the defining Y-SNPs of the lineage are given.
Table S22 Version 2.1 of the Y-SNP conversion file (20 November 2013). This list contains the name, the synonyms, the RefSNP ID, the position on the Y-chromosome according to references NCBI36 (Hg18) and GRCh37 (Hg19) and the mutant conversion state (Ancestral allele -> Mutant allele) of all reported Y-SNPs.
Figure S2 Phylogeny of I2 (I-P215) in update 1.2 (upper panel) and after interpretation of the results by the PENNY software (lower panel).
Figure S3
update 1.2 (upper panel) and after interpretation of the results by the PENNY software (lower panel).
Figure S4 Number of samples of high quality (MCC ≥ 0.95) per haplogroup.
[1] Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P. J., Cordum, H. S., et al., Nature 2003, 423, 825-837.
[2] Wei, W., Ayub, Q., Chen, Y., McCarthy, S., et al., Genome Research 2013, 23, 388-395.