Counting Pedigrees up to Isomorphism 1. Introduction 2. Definition 3. Sex-labelled Discrete Generation Model 4. Sex Label a given unlabelled pedigree 5. Unsex-labelled Discrete Generation Model 6. Summary Appendix 1. Introduction When investigating problems like reconstruction of pedigrees or the probability of data given a particular pedigree, interesting questions related to pedigrees arise. Simulations based on a model of human population history and geography find that an individual that is the genealogical ancestor of all living humans existed just a few thousand years ago, as suggested in a recently published article by Prof. Hein.1 One related fascinating question, which we are going to investigate in this project, is that how many different pedigrees there actually are up to isomorphism if we trace back a certain number of generations. To do this, we will introduce some definitions. 2. Definition Pedigree: A pedigree is a directed graph showing the relationship between individuals according to their parentage. Both of (2.1) and (2.2) below are examples of pedigrees. Generation 0: We call the most recent generation ‘generation 0’, in this project, there is only 1 individual in generation 0. Discrete: As we said above, the most recent generation is ‘generation 0’, then let the parents of the individual in generation g be in generation g+1, and allow the case that more than one generation number are assigned to one single individual. If no individual in the pedigree belongs to more than one generation, we say the pedigree is discrete. Pedigree (2.1) as below is an example of a discrete pedigree, while (2.2) being nondiscrete. (2.1) Generation 4 Generation 3 Generation 2 Generation 1 Generation 0 1 Hein, Jotun. ‘Pedigrees for all humanity’ Nature Vol 431 September 2004, Page 518-519 (2.2) 3. Sex-labelled Discrete Generation Model To start off, we are going to concentrate on a simpler version of the original question, in which we are only interested in the pedigrees which are sex-labelled, and are of discrete time and generation. In this particular case, since every individual in a pedigree is labelled by the sex, there is a unique link between generation 0 (our starting individual) and any other individuals in the pedigree. This is a very useful characteristic of the sex-labelled pedigrees, in the sense that it guarantees every individual is uniquely determined, which saves us a lot of trouble when counting. Also, the discrete time and generation constraints make it possible for us to adopt a recursion moving back generation by generation. The idea of the counting is as follows: Sampling one individual, the number of grandparents is a natural number between 2 and 4 inclusive. Similarly, the number of different ancestors g generations back in time (the ancestors in generation g only) is a natural number between 2 to 2g inclusive. The number of different pedigrees back to g generations can be easily calculated if we know the number of different pedigrees of 1 generation with c number of children and p number of parents(c is any natural number from 1 to 2g-1 inclusive, and p is any natural number from 2 to 2g inclusive). This number, however, can be calculated easily. Now, let’s write the idea above out in symbolic terms. Let Ng(p) denote the number of different pedigrees going back g generations with 1 individual in generation 0 and p individuals in generation g, and A(c, p) denote the number of different pedigrees of 1 generation with c children and p parents. Adopting these notations, our recursions can be shown as the following expressions: A(c, p) 1i , j c i j p Sc, i Sc, j (3.1) Ng 1( p) Ng (c) A(c, p) (3.2) c In Equation (3.1), Sc,j, the Stirling number of second kind, is the number of different ways to partition c elements into j sets. Here, we assume among those p parents, i of them are female and j of them male. Since the partitions of females and males are independent, A(c, p), the number of different pedigrees of 1 generation with c children and p parents is just the product of Sc,i and Sc, j summing over all possible i,j-s. Notice that among these p parents, at least one of them is female and at most p-1 of them are female, we can rewrite Equation (1.1): A(c, p ) Min[ c , p 1] Sc, i Sc, p i (3.3) i 1 Following this recursion algorithm, we can write a programme called ‘countp’ in Mathematica. Step 1. Delete any existing function with the name ‘countp’; Step 2. Declare fuction ‘countp’ to have input ‘generation’, a integer number variable; Step 3. Declare matrix variables ‘matrixn’, ‘matrixa’, number variable ‘g’, ‘c’, ‘p’; ‘matrixn’—the g,pth entry in matrixn is Ng(p) above. ‘matrixa’—the c,pth entry in matrixa is A(c,p) above. Step 4. Define matrixn to have ‘generation’ number of rows and 2^‘geneation’ number of columns with 1,2th entry being 1 and all the other entries being 0; Step 5. Define matrixa to have 2^(‘generation’ – 1) number of rows and 2^‘geneation’ number of columns with all entries being 0; Step 6. Assign ‘g’ value 2; Step 7. If g<= ‘generation’, let c = 1; Otherwise, go to Step 12; Step 8. If c<= 2^(g-1), let p = 2; Otherwise, go to Step 11; Step 9. If p<=2c, let matrixa[[c,p]] = Summation of Sc,j * Sc,p-j over integer values for j from 1 to the minimum of c and p-1, let matrixn[[g,p]] =matrixn[[g,p]] + matrixn[[g-1,c]]*matrixa[[c,p]], p=p+1; repeat Step 9; Otherwise, go to Step 10; Step 10. Let c=c+1, then go to Step 8; Step 11. Let g=g+1, then go to Step 7; Step 12. Sum the last row of matrixn and return the value. (Matrix[[m,n]] denote the m,n th entry in matrix called ‘Matrix’) Using the programme we wrote in Mathematica, we have got the following result: Generation 2 1 1 1 2 3 0 2 Number of Parents 4 5 6 0 0 0 1 0 0 7 0 0 8 0 0 Total 1 4 4 3 28 84 98 52 12 1 279 Number of different sex labelled pedigrees (#) 2 3 4 5 6 Generation 1 1 4 279 2.8766*10^7 2.81597*10^20 7.40974*10^52 # 8 9 10 Generation 7 # 2.76739*10^131 2.88199*10^317 3.52352*10^749 3.89498*10^1737 From the numbers about, we can see that the number of different sex labelled pedigrees grows very fast with generations. If we trace for 3 generations, then there are 279 sex-labelled pedigrees up to isomorphism, and if we trace back one generation more, the number of pedigrees up to isomorphism rises up to 2.8766*10^7! The recursion programme in Mathematica we used above can calculate up to 10 generations. Beyond that, calculations get very slow. Tracing back each number of generations, I plotted the log of number of different pedigrees which have p distinct parents in the earliest generation against p, getting a log distribution of the number of different pedigrees having a certain number of parents in the last generation for each generation. 2 0.3 7 0.25 6 1.5 0.2 5 0.15 4 1 3 0.1 2 0.5 0.05 1 2 3 4 2 2 generation 20 3 4 5 6 7 8 2 3 generation 3 4 5 6 7 8 9 10 11 12 13 14 15 16 4 generation 50 120 40 100 15 80 30 60 10 20 5 40 10 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132 20 2345678910 112 113 14 116 517 18 120 921 22 224 325 26 27 229 830 31 333 234 35 337 638 39 441 042 43 445 446 47 449 850 51 553 254 55 557 658 59 661 062 63 64 5 generation 234567810 911 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 100 99 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 6 generation 7 generation 300 700 1750 250 600 1500 500 1250 400 1000 300 750 200 500 100 250 200 150 100 50 2345610 711 812 913 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 100 98 101 99 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 8 generation 210 3411 5613 12 7815 14 917 16 18 20 19 22 21 24 23 26 25 28 27 30 29 32 31 34 33 36 35 38 37 40 39 42 41 44 43 46 45 48 47 50 49 52 51 54 53 56 55 58 57 60 59 62 61 64 63 66 65 68 67 70 69 72 71 74 73 76 75 78 77 80 79 82 81 84 83 86 85 88 87 90 89 92 91 94 93 100 96 95 102 101 98 97 104 103 99 106 105 108 107 110 109 112 111 114 113 116 115 118 117 120 119 122 121 124 123 126 125 128 127 130 129 132 131 134 133 136 135 138 137 140 139 142 141 144 143 146 145 148 147 150 149 152 151 154 153 156 155 158 157 160 159 162 161 164 163 166 165 167 169 168 171 170 173 172 175 174 177 176 179 178 181 180 183 182 185 184 187 186 189 188 191 190 193 192 195 194 197 196 199 198 201 200 203 202 205 204 207 206 209 208 211 210 213 212 215 214 217 216 219 218 221 220 223 222 225 224 227 226 229 228 231 230 233 232 235 234 237 236 239 238 241 240 243 242 245 244 247 246 249 248 251 250 253 252 255 254 257 256 259 258 261 260 263 262 265 264 267 266 269 268 271 270 273 272 275 274 277 276 279 278 281 280 283 282 285 284 287 286 289 288 291 290 293 292 295 294 297 296 299 298 301 300 303 302 305 304 307 306 309 308 311 310 313 312 315 314 317 316 319 318 321 320 323 322 325 324 327 326 329 328 331 330 333 332 335 334 337 336 339 338 340 342 341 344 343 346 345 348 347 350 349 352 351 354 353 356 355 358 357 360 359 362 361 364 363 366 365 368 367 370 369 372 371 374 373 376 375 378 377 380 379 382 381 384 383 386 385 388 387 390 389 392 391 394 393 396 395 398 397 400 399 402 401 404 403 406 405 408 407 410 409 412 411 414 413 416 415 418 417 420 419 422 421 424 423 426 425 428 427 430 429 432 431 434 433 436 435 438 437 440 439 442 441 444 443 446 445 448 447 450 449 452 451 454 453 456 455 458 457 460 459 462 461 464 463 466 465 468 467 470 469 472 471 474 473 476 475 478 477 480 479 482 481 484 483 486 485 488 487 490 489 492 491 494 493 496 495 498 497 500 499 502 501 504 503 506 505 508 507 510 509 511 512 9 generation 13 12 11 10 518 4 3 2 17 16 15 14 922 8 7 6 21 20 19 25 24 23 29 28 27 26 33 32 31 30 38 37 36 35 34 42 41 40 39 46 45 44 43 50 49 48 47 54 53 52 51 58 57 56 55 62 61 60 59 66 65 64 63 71 70 69 68 67 75 74 73 72 79 78 77 76 83 82 81 80 87 86 85 84 91 90 89 88 104 103 102 101 100 95 94 93 92 108 107 106 105 99 98 97 96 112 111 110 109 116 115 114 113 120 119 118 117 124 123 122 121 128 127 126 125 132 131 130 129 137 136 135 134 133 141 140 139 138 145 144 143 142 149 148 147 146 153 152 151 150 157 156 155 154 161 160 159 158 165 164 163 162 170 169 168 167 166 174 173 172 171 178 177 176 175 182 181 180 179 186 185 184 183 190 189 188 187 194 193 192 191 198 197 196 195 203 202 201 200 199 207 206 205 204 211 210 209 208 215 214 213 212 219 218 217 216 223 222 221 220 227 226 225 224 231 230 229 228 236 235 234 233 232 240 239 238 237 244 243 242 241 248 247 246 245 252 251 250 249 256 255 254 253 260 259 258 257 264 263 262 261 269 268 267 266 265 273 272 271 270 277 276 275 274 281 280 279 278 285 284 283 282 289 288 287 286 293 292 291 290 297 296 295 294 302 301 300 299 298 306 305 304 303 310 309 308 307 314 313 312 311 318 317 316 315 322 321 320 319 326 325 324 323 330 329 328 327 335 334 333 332 331 339 338 337 336 343 342 341 340 347 346 345 344 351 350 349 348 355 354 353 352 359 358 357 356 363 362 361 360 368 367 366 365 364 372 371 370 369 376 375 374 373 380 379 378 377 384 383 382 381 388 387 386 385 392 391 390 389 396 395 394 393 401 400 399 398 397 405 404 403 402 409 408 407 406 413 412 411 410 417 416 415 414 421 420 419 418 425 424 423 422 429 428 427 426 434 433 432 431 430 438 437 436 435 442 441 440 439 446 445 444 443 450 449 448 447 454 453 452 451 458 457 456 455 462 461 460 459 467 466 465 464 463 471 470 469 468 475 474 473 472 479 478 477 476 483 482 481 480 487 486 485 484 491 490 489 488 495 494 493 492 500 499 498 497 496 504 503 502 501 508 507 506 505 512 511 510 509 516 515 514 513 520 519 518 517 524 523 522 521 528 527 526 525 533 532 531 530 529 537 536 535 534 541 540 539 538 545 544 543 542 549 548 547 546 553 552 551 550 557 556 555 554 561 560 559 558 566 565 564 563 562 570 569 568 567 574 573 572 571 578 577 576 575 582 581 580 579 586 585 584 583 590 589 588 587 594 593 592 591 599 598 597 596 595 603 602 601 600 607 606 605 604 611 610 609 608 615 614 613 612 619 618 617 616 623 622 621 620 627 626 625 624 632 631 630 629 628 636 635 634 633 640 639 638 637 644 643 642 641 648 647 646 645 652 651 650 649 656 655 654 653 660 659 658 657 665 664 663 662 661 669 668 667 666 673 672 671 670 677 676 675 674 681 680 679 678 685 684 683 682 689 688 687 686 693 692 691 690 698 697 696 695 694 702 701 700 699 706 705 704 703 710 709 708 707 714 713 712 711 718 717 716 715 722 721 720 719 726 725 724 723 731 730 729 728 727 735 734 733 732 739 738 737 736 743 742 741 740 747 746 745 744 751 750 749 748 755 754 753 752 759 758 757 756 764 763 762 761 760 768 767 766 765 772 771 770 769 776 775 774 773 780 779 778 777 784 783 782 781 788 787 786 785 792 791 790 789 797 796 795 794 793 801 800 799 798 805 804 803 802 809 808 807 806 813 812 811 810 817 816 815 814 821 820 819 818 825 824 823 822 830 829 828 827 826 834 833 832 831 838 837 836 835 842 841 840 839 846 845 844 843 850 849 848 847 854 853 852 851 858 857 856 855 863 862 861 860 859 867 866 865 864 871 870 869 868 875 874 873 872 879 878 877 876 883 882 881 880 887 886 885 884 891 890 889 888 896 895 894 893 892 900 899 898 897 904 903 902 901 908 907 906 905 912 911 910 909 916 915 914 913 920 919 918 917 924 923 922 921 929 928 927 926 925 933 932 931 930 937 936 935 934 941 940 939 938 945 944 943 942 949 948 947 946 953 952 951 950 957 956 955 954 962 961 960 959 958 966 965 964 963 970 969 968 967 974 973 972 971 978 977 976 975 982 981 980 979 986 985 984 983 990 989 988 987 1003 1002 1001 1000 995 994 993 992 991 1007 1006 1005 1004 999 998 997 996 1011 1010 1009 1008 1015 1014 1013 1012 1019 1018 1017 1016 1023 1022 1021 1020 1024 10 generation According the bar chart above, we can tell that all of them are of an ‘n’ shape, and the position of maximum value shifts to the left as the number of generation increases. The following table gives the position (the number of different parents in the earliest generation) where there are most different pedigrees for tracing back different generations. Generation Position for most pedigrees Most number of Parents 1 2 2 3 3 5 4 8 5 13 2 4 8 16 32 Generation Position for most pedigrees Most number of Parents 6 22 7 39 8 68 9 120 10 214 64 128 256 512 1024 4. Sex Label a given unlabelled pedigree Now since we have an idea how many sex labelled pedigrees up to isomorphism tracing back to any generation from no more than 10. A natural question to ask would be how many ways there are to sex label a given pedigree if it is not sex labelled. Consider the number of ways to sex-label a pedigree with c children and p parents with 1 generation back only, we notice that there are only two ways to sex-label such a pedigree if it is not possible to partition all the parents and children into two or more none empty set such that any two children in different sets share no parent and any two parents in two different sets have no common child. We call the children and parents group with the above characteristics ‘closed’. For any given pedigree with only 2 generations (tracing back only 1 generation) of finite number of parents and children, it is possible to check the number of closed groups. Since sex-labelling all these different closed groups are independent, and there are precisely 2 ways to sex-label each of the closed group, the number of ways to sex-label a pedigree with 2 generation is 2 to the power of number of different proper closed groups. To say a closed group is proper, we mean it is able to sex-label this closed group. Naturally, we call a pedigree that is not able to be sex-labelled improper. An example of an improper closed group is 3 parents in which any two of them share a child. In order to distinguish the improper cases, we can adopt the following algorithm. For any pedigree with two generations, c children and p parents, we label the parents from 1 to p. Start from parent 1 and assign it value 0, and then assign all the parents who share at least one children with parent 1 the value 1. Suppose parent m is the one with smallest ordinal number sharing kid(s) with parent 1(1<m<=p), we move on the consider parent m. Take all parents who share at least one child with parent m, and in which, we assign parents who have no sex value assigned yet the opposite sex value of parent m (0 in this case). For those who share child(ren) with parent m but have got a sex value already, we check whether their assigned sex is indeed the opposite sex from parent m, if it fails for any of them, this group is improper. Let’s have a look at an example. For pedigree (4.1), consider the generation with 1,2,3 as parents and 4,5,6 as children. Assign parent 1 with value 0, and then note that both 4,5 are children of parent 1.For child 4, 2 is the other parent and for child 5, 3 is the other parent, so we assign value 1 to both parent 2 and parent 3. Then we move on the parent 2, note that both 4 and 6 are children of parent 2. Since we have done with child 4, we only need to consider child 6.Parent 3 is the other parent of child 6. The value assigned to parent 3 was 1, but the value of parent 2 is 1 as well, contradiction! We cannot sex-label pedigree (4.1). (4.1) 1 2 4 5 3 6 For similar pedigrees as in Section 3, where there is only one individual in generation 0, we can use the method above to figure out the number of ways to sex-label them. But we cannot simply adopt it if there are two or more individuals in generation 0. This is because the individuals in generation 0 are identical and will cause further counting repetitions if we don’t adjust the method as above. Also, note that this only works for pedigrees of discrete generation, since for non-discrete pedigrees, there may only be one way to sex-label the parents of some individuals. 5. Unsex-labelled Discrete Generation Model For the case that all individuals are not sex-labelled, but generation discrete, is it possible to make a similar recursion to calculate the number of different pedigrees up to isomorphism? The idea of the original attempt is that each time we trace back a generation, we check on every individual and keep a track on the isomorphic groups all these individuals form, using a list. For instance, if we go back 2 generations and the pedigree looks like the following: (5.1) Generation 2 Generation 1 Generation 0 We can write it as: generation 0: 1,0,0,0,… generation 1: 0,1,0,0,… generation 2: 1,1,0,0,… In generation 0, there is only one individual, so there is one isomorphic group with only 1 element, and no other groups at all, denote it 1,0,0,0,… In generation 1, there are 2 individuals in total, and they are isomorphic, so that we write it as 0,1,0,0,… to denote that there are 1 group of 2 isomorphic elements and no other groups in this generation. In generation 2, there are 3 individuals in total, two and only two of which are isomorphic. So, we denote it as 1,1,0,0,… meaning that there are 1 group of 1 element, and 1 group of 2 elements in this generation. Now we can use a similar formula as before to do the recursion: Ng 1( p) Ng (c) A(c, p) (5.2) c Now, c and p are not the number of children and parents, but the lists of isomorphic groups for the child generation and the parent generation, as our example above. Ng(c) is the number of different unsex-labelled pedigrees there are up to isomorphism up to generation g. A(c,p) is the number of different cases that individuals as listed in c can have parents as listed in p. Everything seems to work out so far, but then, we discovered a vital problem in this model. The method to form the list, our way to keep tracks of isomorphic groups is not informative enough. It may not be sufficient to identify the isomorphic grouping relation in generation n if we only look at how the pedigree rises from generation n-1 to generation n. For example, if using the method as above for the six individuals in Generation 3 in pedigree (5.3), we will get a list of 0,1,0,1,…, 3 and 4 in a group and the rest in the other. (5.3) 1 2 3 4 5 6 Generation 3 Generation 2 Generation 1 Generation 0 It seems that individual 1,2,5,6 are all in one isomorphic group. However, it is too early a stage to say so. If the pedigree up to generation 4 is as pedigree (5.4), then individual (5.4) Generation 4 Generation 3 1 Generation 2 Generation 1 Generation 0 (5.5) 1 2 3 4 5 6. Summary In this project, we have investigated several questions related to pedigree counting. For discrete pedigrees with only 1 individual in generation 0, 2e succeeded in finding a recursion to calculate the number of unsex-labelled pedigrees up to isomorphism and in finding an algorithm for calculating the ways of sex label a given pedigree. The number of different sex-labelled pedigrees rises significantly along the number of generations. It exceeds 1700 digits if we want to go back to 10 generations. For the unsex-labelled pedigrees, counting is rather hard. Although the number of unsex-labelled pedigrees should be smaller than the one of sex-labelled pedigrees, it is a lot harder to identify if two pedigrees are isomorphic and to find a computable algorithm. We could not adopt the similar recursion as in sex-labelled case and write a computer programme up. Appendix N[countp[4],6] 2.87966 107 N[countp[5],6] 2.81597 1020 N[countp[6],6] 7.40974 1052 N[countp[7],6] 2.76739 10131 N[countp[8],6] 2.88199 10317 N[countp[9],6] 3.52352 10749 N[countp[10],6] 3.89498 101737