Counting Pedigrees up to Isomorphism

advertisement
Counting Pedigrees up to Isomorphism
1. Introduction
2. Definition
3. Sex-labelled Discrete Generation Model
4. Sex Label a given unlabelled pedigree
5. Unsex-labelled Discrete Generation Model
6. Summary
Appendix
1. Introduction
When investigating problems like reconstruction of pedigrees or the probability of
data given a particular pedigree, interesting questions related to pedigrees arise.
Simulations based on a model of human population history and geography find that an
individual that is the genealogical ancestor of all living humans existed just a few
thousand years ago, as suggested in a recently published article by Prof. Hein.1 One
related fascinating question, which we are going to investigate in this project, is that
how many different pedigrees there actually are up to isomorphism if we trace back a
certain number of generations.
To do this, we will introduce some definitions.
2. Definition
Pedigree:
A pedigree is a directed graph showing the relationship between individuals according
to their parentage. Both of (2.1) and (2.2) below are examples of pedigrees.
Generation 0:
We call the most recent generation ‘generation 0’, in this project, there is only 1
individual in generation 0.
Discrete:
As we said above, the most recent generation is ‘generation 0’, then let the parents of
the individual in generation g be in generation g+1, and allow the case that more than
one generation number are assigned to one single individual. If no individual in the
pedigree belongs to more than one generation, we say the pedigree is discrete.
Pedigree (2.1) as below is an example of a discrete pedigree, while (2.2) being nondiscrete.
(2.1)
Generation 4
Generation 3
Generation 2
Generation 1
Generation 0
1
Hein, Jotun. ‘Pedigrees for all humanity’ Nature Vol 431 September 2004, Page 518-519
(2.2)
3. Sex-labelled Discrete Generation Model
To start off, we are going to concentrate on a simpler version of the original question,
in which we are only interested in the pedigrees which are sex-labelled, and are of
discrete time and generation.
In this particular case, since every individual in a pedigree is labelled by the sex, there
is a unique link between generation 0 (our starting individual) and any other
individuals in the pedigree. This is a very useful characteristic of the sex-labelled
pedigrees, in the sense that it guarantees every individual is uniquely determined,
which saves us a lot of trouble when counting. Also, the discrete time and generation
constraints make it possible for us to adopt a recursion moving back generation by
generation.
The idea of the counting is as follows:
Sampling one individual, the number of grandparents is a natural number between 2
and 4 inclusive. Similarly, the number of different ancestors g generations back in
time (the ancestors in generation g only) is a natural number between 2 to 2g inclusive.
The number of different pedigrees back to g generations can be easily calculated if we
know the number of different pedigrees of 1 generation with c number of children and
p number of parents(c is any natural number from 1 to 2g-1 inclusive, and p is any
natural number from 2 to 2g inclusive). This number, however, can be calculated
easily.
Now, let’s write the idea above out in symbolic terms. Let Ng(p) denote the number
of different pedigrees going back g generations with 1 individual in generation 0 and
p individuals in generation g, and A(c, p) denote the number of different pedigrees of
1 generation with c children and p parents. Adopting these notations, our recursions
can be shown as the following expressions:
A(c, p) 

1i , j  c
i j p
Sc, i  Sc, j
(3.1)
Ng  1( p)   Ng (c)  A(c, p)
(3.2)
c
In Equation (3.1), Sc,j, the Stirling number of second kind, is the number of different
ways to partition c elements into j sets. Here, we assume among those p parents, i of
them are female and j of them male. Since the partitions of females and males are
independent, A(c, p), the number of different pedigrees of 1 generation with c
children and p parents is just the product of Sc,i and Sc, j summing over all possible
i,j-s.
Notice that among these p parents, at least one of them is female and at most p-1 of
them are female, we can rewrite Equation (1.1):
A(c, p ) 
Min[ c , p 1]

Sc, i  Sc, p  i
(3.3)
i 1
Following this recursion algorithm, we can write a programme called ‘countp’ in
Mathematica.
Step 1. Delete any existing function with the name ‘countp’;
Step 2. Declare fuction ‘countp’ to have input ‘generation’, a integer number variable;
Step 3. Declare matrix variables ‘matrixn’, ‘matrixa’, number variable ‘g’, ‘c’, ‘p’;
‘matrixn’—the g,pth entry in matrixn is Ng(p) above.
‘matrixa’—the c,pth entry in matrixa is A(c,p) above.
Step 4. Define matrixn to have ‘generation’ number of rows and 2^‘geneation’
number of columns with 1,2th entry being 1 and all the other entries being 0;
Step 5. Define matrixa to have 2^(‘generation’ – 1) number of rows and 2^‘geneation’
number of columns with all entries being 0;
Step 6. Assign ‘g’ value 2;
Step 7. If g<= ‘generation’, let c = 1;
Otherwise, go to Step 12;
Step 8. If c<= 2^(g-1), let p = 2;
Otherwise, go to Step 11;
Step 9. If p<=2c, let matrixa[[c,p]] = Summation of Sc,j * Sc,p-j over integer values
for j from 1 to the minimum of c and p-1,
let matrixn[[g,p]] =matrixn[[g,p]] + matrixn[[g-1,c]]*matrixa[[c,p]],
p=p+1;
repeat Step 9;
Otherwise, go to Step 10;
Step 10. Let c=c+1, then go to Step 8;
Step 11. Let g=g+1, then go to Step 7;
Step 12. Sum the last row of matrixn and return the value.
(Matrix[[m,n]] denote the m,n th entry in matrix called ‘Matrix’)
Using the programme we wrote in Mathematica, we have got the following result:
Generation 2
1
1
1
2
3
0
2
Number of Parents
4
5
6
0
0
0
1
0
0
7
0
0
8
0
0
Total
1
4
4
3
28
84
98
52
12
1
279
Number of different sex labelled pedigrees (#)
2
3
4
5
6
Generation 1
1
4
279
2.8766*10^7 2.81597*10^20 7.40974*10^52
#
8
9
10
Generation 7
#
2.76739*10^131 2.88199*10^317 3.52352*10^749 3.89498*10^1737
From the numbers about, we can see that the number of different sex labelled
pedigrees grows very fast with generations. If we trace for 3 generations, then there
are 279 sex-labelled pedigrees up to isomorphism, and if we trace back one generation
more, the number of pedigrees up to isomorphism rises up to 2.8766*10^7! The
recursion programme in Mathematica we used above can calculate up to 10
generations. Beyond that, calculations get very slow.
Tracing back each number of generations, I plotted the log of number of different
pedigrees which have p distinct parents in the earliest generation against p, getting a
log distribution of the number of different pedigrees having a certain number of
parents in the last generation for each generation.
2
0.3
7
0.25
6
1.5
0.2
5
0.15
4
1
3
0.1
2
0.5
0.05
1
2
3
4
2
2 generation
20
3
4
5
6
7
8
2
3 generation
3
4
5
6
7
8
9
10 11 12 13 14 15 16
4 generation
50
120
40
100
15
80
30
60
10
20
5
40
10
2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132
20
2345678910
112
113
14
116
517
18
120
921
22
224
325
26
27
229
830
31
333
234
35
337
638
39
441
042
43
445
446
47
449
850
51
553
254
55
557
658
59
661
062
63
64
5 generation
234567810
911
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
100
99
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
6 generation
7 generation
300
700
1750
250
600
1500
500
1250
400
1000
300
750
200
500
100
250
200
150
100
50
2345610
711
812
913
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
100
98
101
99
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
8 generation
210
3411
5613
12
7815
14
917
16
18
20
19
22
21
24
23
26
25
28
27
30
29
32
31
34
33
36
35
38
37
40
39
42
41
44
43
46
45
48
47
50
49
52
51
54
53
56
55
58
57
60
59
62
61
64
63
66
65
68
67
70
69
72
71
74
73
76
75
78
77
80
79
82
81
84
83
86
85
88
87
90
89
92
91
94
93
100
96
95
102
101
98
97
104
103
99
106
105
108
107
110
109
112
111
114
113
116
115
118
117
120
119
122
121
124
123
126
125
128
127
130
129
132
131
134
133
136
135
138
137
140
139
142
141
144
143
146
145
148
147
150
149
152
151
154
153
156
155
158
157
160
159
162
161
164
163
166
165
167
169
168
171
170
173
172
175
174
177
176
179
178
181
180
183
182
185
184
187
186
189
188
191
190
193
192
195
194
197
196
199
198
201
200
203
202
205
204
207
206
209
208
211
210
213
212
215
214
217
216
219
218
221
220
223
222
225
224
227
226
229
228
231
230
233
232
235
234
237
236
239
238
241
240
243
242
245
244
247
246
249
248
251
250
253
252
255
254
257
256
259
258
261
260
263
262
265
264
267
266
269
268
271
270
273
272
275
274
277
276
279
278
281
280
283
282
285
284
287
286
289
288
291
290
293
292
295
294
297
296
299
298
301
300
303
302
305
304
307
306
309
308
311
310
313
312
315
314
317
316
319
318
321
320
323
322
325
324
327
326
329
328
331
330
333
332
335
334
337
336
339
338
340
342
341
344
343
346
345
348
347
350
349
352
351
354
353
356
355
358
357
360
359
362
361
364
363
366
365
368
367
370
369
372
371
374
373
376
375
378
377
380
379
382
381
384
383
386
385
388
387
390
389
392
391
394
393
396
395
398
397
400
399
402
401
404
403
406
405
408
407
410
409
412
411
414
413
416
415
418
417
420
419
422
421
424
423
426
425
428
427
430
429
432
431
434
433
436
435
438
437
440
439
442
441
444
443
446
445
448
447
450
449
452
451
454
453
456
455
458
457
460
459
462
461
464
463
466
465
468
467
470
469
472
471
474
473
476
475
478
477
480
479
482
481
484
483
486
485
488
487
490
489
492
491
494
493
496
495
498
497
500
499
502
501
504
503
506
505
508
507
510
509
511
512
9 generation
13
12
11
10
518
4
3
2
17
16
15
14
922
8
7
6
21
20
19
25
24
23
29
28
27
26
33
32
31
30
38
37
36
35
34
42
41
40
39
46
45
44
43
50
49
48
47
54
53
52
51
58
57
56
55
62
61
60
59
66
65
64
63
71
70
69
68
67
75
74
73
72
79
78
77
76
83
82
81
80
87
86
85
84
91
90
89
88
104
103
102
101
100
95
94
93
92
108
107
106
105
99
98
97
96
112
111
110
109
116
115
114
113
120
119
118
117
124
123
122
121
128
127
126
125
132
131
130
129
137
136
135
134
133
141
140
139
138
145
144
143
142
149
148
147
146
153
152
151
150
157
156
155
154
161
160
159
158
165
164
163
162
170
169
168
167
166
174
173
172
171
178
177
176
175
182
181
180
179
186
185
184
183
190
189
188
187
194
193
192
191
198
197
196
195
203
202
201
200
199
207
206
205
204
211
210
209
208
215
214
213
212
219
218
217
216
223
222
221
220
227
226
225
224
231
230
229
228
236
235
234
233
232
240
239
238
237
244
243
242
241
248
247
246
245
252
251
250
249
256
255
254
253
260
259
258
257
264
263
262
261
269
268
267
266
265
273
272
271
270
277
276
275
274
281
280
279
278
285
284
283
282
289
288
287
286
293
292
291
290
297
296
295
294
302
301
300
299
298
306
305
304
303
310
309
308
307
314
313
312
311
318
317
316
315
322
321
320
319
326
325
324
323
330
329
328
327
335
334
333
332
331
339
338
337
336
343
342
341
340
347
346
345
344
351
350
349
348
355
354
353
352
359
358
357
356
363
362
361
360
368
367
366
365
364
372
371
370
369
376
375
374
373
380
379
378
377
384
383
382
381
388
387
386
385
392
391
390
389
396
395
394
393
401
400
399
398
397
405
404
403
402
409
408
407
406
413
412
411
410
417
416
415
414
421
420
419
418
425
424
423
422
429
428
427
426
434
433
432
431
430
438
437
436
435
442
441
440
439
446
445
444
443
450
449
448
447
454
453
452
451
458
457
456
455
462
461
460
459
467
466
465
464
463
471
470
469
468
475
474
473
472
479
478
477
476
483
482
481
480
487
486
485
484
491
490
489
488
495
494
493
492
500
499
498
497
496
504
503
502
501
508
507
506
505
512
511
510
509
516
515
514
513
520
519
518
517
524
523
522
521
528
527
526
525
533
532
531
530
529
537
536
535
534
541
540
539
538
545
544
543
542
549
548
547
546
553
552
551
550
557
556
555
554
561
560
559
558
566
565
564
563
562
570
569
568
567
574
573
572
571
578
577
576
575
582
581
580
579
586
585
584
583
590
589
588
587
594
593
592
591
599
598
597
596
595
603
602
601
600
607
606
605
604
611
610
609
608
615
614
613
612
619
618
617
616
623
622
621
620
627
626
625
624
632
631
630
629
628
636
635
634
633
640
639
638
637
644
643
642
641
648
647
646
645
652
651
650
649
656
655
654
653
660
659
658
657
665
664
663
662
661
669
668
667
666
673
672
671
670
677
676
675
674
681
680
679
678
685
684
683
682
689
688
687
686
693
692
691
690
698
697
696
695
694
702
701
700
699
706
705
704
703
710
709
708
707
714
713
712
711
718
717
716
715
722
721
720
719
726
725
724
723
731
730
729
728
727
735
734
733
732
739
738
737
736
743
742
741
740
747
746
745
744
751
750
749
748
755
754
753
752
759
758
757
756
764
763
762
761
760
768
767
766
765
772
771
770
769
776
775
774
773
780
779
778
777
784
783
782
781
788
787
786
785
792
791
790
789
797
796
795
794
793
801
800
799
798
805
804
803
802
809
808
807
806
813
812
811
810
817
816
815
814
821
820
819
818
825
824
823
822
830
829
828
827
826
834
833
832
831
838
837
836
835
842
841
840
839
846
845
844
843
850
849
848
847
854
853
852
851
858
857
856
855
863
862
861
860
859
867
866
865
864
871
870
869
868
875
874
873
872
879
878
877
876
883
882
881
880
887
886
885
884
891
890
889
888
896
895
894
893
892
900
899
898
897
904
903
902
901
908
907
906
905
912
911
910
909
916
915
914
913
920
919
918
917
924
923
922
921
929
928
927
926
925
933
932
931
930
937
936
935
934
941
940
939
938
945
944
943
942
949
948
947
946
953
952
951
950
957
956
955
954
962
961
960
959
958
966
965
964
963
970
969
968
967
974
973
972
971
978
977
976
975
982
981
980
979
986
985
984
983
990
989
988
987
1003
1002
1001
1000
995
994
993
992
991
1007
1006
1005
1004
999
998
997
996
1011
1010
1009
1008
1015
1014
1013
1012
1019
1018
1017
1016
1023
1022
1021
1020
1024
10 generation
According the bar chart above, we can tell that all of them are of an ‘n’ shape, and the
position of maximum value shifts to the left as the number of generation increases.
The following table gives the position (the number of different parents in the earliest
generation) where there are most different pedigrees for tracing back different
generations.
Generation
Position for most
pedigrees
Most number of
Parents
1
2
2
3
3
5
4
8
5
13
2
4
8
16
32
Generation
Position for most
pedigrees
Most number of
Parents
6
22
7
39
8
68
9
120
10
214
64
128
256
512
1024
4. Sex Label a given unlabelled pedigree
Now since we have an idea how many sex labelled pedigrees up to isomorphism
tracing back to any generation from no more than 10. A natural question to ask would
be how many ways there are to sex label a given pedigree if it is not sex labelled.
Consider the number of ways to sex-label a pedigree with c children and p parents
with 1 generation back only, we notice that there are only two ways to sex-label such
a pedigree if it is not possible to partition all the parents and children into two or more
none empty set such that any two children in different sets share no parent and any
two parents in two different sets have no common child. We call the children and
parents group with the above characteristics ‘closed’.
For any given pedigree with only 2 generations (tracing back only 1 generation) of
finite number of parents and children, it is possible to check the number of closed
groups. Since sex-labelling all these different closed groups are independent, and
there are precisely 2 ways to sex-label each of the closed group, the number of ways
to sex-label a pedigree with 2 generation is 2 to the power of number of different
proper closed groups. To say a closed group is proper, we mean it is able to sex-label
this closed group. Naturally, we call a pedigree that is not able to be sex-labelled
improper. An example of an improper closed group is 3 parents in which any two of
them share a child.
In order to distinguish the improper cases, we can adopt the following algorithm. For
any pedigree with two generations, c children and p parents, we label the parents from
1 to p. Start from parent 1 and assign it value 0, and then assign all the parents who
share at least one children with parent 1 the value 1. Suppose parent m is the one with
smallest ordinal number sharing kid(s) with parent 1(1<m<=p), we move on the
consider parent m. Take all parents who share at least one child with parent m, and in
which, we assign parents who have no sex value assigned yet the opposite sex value
of parent m (0 in this case). For those who share child(ren) with parent m but have got
a sex value already, we check whether their assigned sex is indeed the opposite sex
from parent m, if it fails for any of them, this group is improper.
Let’s have a look at an example. For pedigree (4.1), consider the generation with 1,2,3
as parents and 4,5,6 as children. Assign parent 1 with value 0, and then note that both
4,5 are children of parent 1.For child 4, 2 is the other parent and for child 5, 3 is the
other parent, so we assign value 1 to both parent 2 and parent 3. Then we move on the
parent 2, note that both 4 and 6 are children of parent 2. Since we have done with
child 4, we only need to consider child 6.Parent 3 is the other parent of child 6. The
value assigned to parent 3 was 1, but the value of parent 2 is 1 as well, contradiction!
We cannot sex-label pedigree (4.1).
(4.1)
1
2
4
5
3
6
For similar pedigrees as in Section 3, where there is only one individual in generation
0, we can use the method above to figure out the number of ways to sex-label them.
But we cannot simply adopt it if there are two or more individuals in generation 0.
This is because the individuals in generation 0 are identical and will cause further
counting repetitions if we don’t adjust the method as above. Also, note that this only
works for pedigrees of discrete generation, since for non-discrete pedigrees, there may
only be one way to sex-label the parents of some individuals.
5. Unsex-labelled Discrete Generation Model
For the case that all individuals are not sex-labelled, but generation discrete, is it
possible to make a similar recursion to calculate the number of different pedigrees up
to isomorphism? The idea of the original attempt is that each time we trace back a
generation, we check on every individual and keep a track on the isomorphic groups
all these individuals form, using a list. For instance, if we go back 2 generations and
the pedigree looks like the following:
(5.1)
Generation 2
Generation 1
Generation 0
We can write it as:
generation 0: 1,0,0,0,…
generation 1: 0,1,0,0,…
generation 2: 1,1,0,0,…
In generation 0, there is only one individual, so there is one isomorphic group with
only 1 element, and no other groups at all, denote it 1,0,0,0,…
In generation 1, there are 2 individuals in total, and they are isomorphic, so that we
write it as 0,1,0,0,… to denote that there are 1 group of 2 isomorphic elements and no
other groups in this generation.
In generation 2, there are 3 individuals in total, two and only two of which are
isomorphic. So, we denote it as 1,1,0,0,… meaning that there are 1 group of 1 element,
and 1 group of 2 elements in this generation.
Now we can use a similar formula as before to do the recursion:
Ng  1( p)   Ng (c)  A(c, p)
(5.2)
c
Now, c and p are not the number of children and parents, but the lists of isomorphic
groups for the child generation and the parent generation, as our example above. Ng(c)
is the number of different unsex-labelled pedigrees there are up to isomorphism up to
generation g. A(c,p) is the number of different cases that individuals as listed in c can
have parents as listed in p.
Everything seems to work out so far, but then, we discovered a vital problem in this
model. The method to form the list, our way to keep tracks of isomorphic groups is
not informative enough. It may not be sufficient to identify the isomorphic grouping
relation in generation n if we only look at how the pedigree rises from generation n-1
to generation n.
For example, if using the method as above for the six individuals in Generation 3 in
pedigree (5.3), we will get a list of 0,1,0,1,…, 3 and 4 in a group and the rest in the
other.
(5.3)
1
2
3
4
5
6
Generation 3
Generation 2
Generation 1
Generation 0
It seems that individual 1,2,5,6 are all in one isomorphic group. However, it is too
early a stage to say so. If the pedigree up to generation 4 is as pedigree (5.4), then
individual
(5.4)
Generation 4
Generation 3
1
Generation 2
Generation 1
Generation 0
(5.5)
1
2
3
4
5
6. Summary
In this project, we have investigated several questions related to pedigree counting.
For discrete pedigrees with only 1 individual in generation 0, 2e succeeded in finding
a recursion to calculate the number of unsex-labelled pedigrees up to isomorphism
and in finding an algorithm for calculating the ways of sex label a given pedigree. The
number of different sex-labelled pedigrees rises significantly along the number of
generations. It exceeds 1700 digits if we want to go back to 10 generations.
For the unsex-labelled pedigrees, counting is rather hard. Although the number of
unsex-labelled pedigrees should be smaller than the one of sex-labelled pedigrees, it is
a lot harder to identify if two pedigrees are isomorphic and to find a computable
algorithm. We could not adopt the similar recursion as in sex-labelled case and write a
computer programme up.
Appendix
N[countp[4],6]
2.87966 107
N[countp[5],6]
2.81597 1020
N[countp[6],6]
7.40974 1052
N[countp[7],6]
2.76739 10131
N[countp[8],6]
2.88199 10317
N[countp[9],6]
3.52352 10749
N[countp[10],6]
3.89498 101737
Download