A LISP Program To Estimate Phenotypes Produced By Equine Breeding Pairs
By Susan H. Melville
CS 462 AI, Spring 98

Veterinary research colleges throughout the world have been collaborating on genetic mapping of the horse. From a medical standpoint the genetics involving health concerns should be the top priority, and great strides have been made by several institutions. Genetic testing is now available for the mutation causing HYPP in Quarter Horses. Other work involves identifying breeding pairs at risk for producing foals with CIDS in Arabians (similar to AIDS), and "Lung Bleeders" in Thoroughbreds.

For most breeders, however, the primary interest is in the ongoing research into coat color and patterns. There are several reasons for this:

1) Economics: Black horses and Black and White spotted horses generally sell for more money than similarly bred horses of other colors.
2) Breed restrictions: Some breeds are restricted to certain colors (such as Palomino), patterns (such as Paints), or lack of patterns (such as Quarter Horse).
3) Health Considerations: Three of the pattern genes (roan, overo, and white) are lethal to the foal if the pairing is homozygous dominant.

This program is designed to help breeders estimate the colors of the foals that could result from a pairing. The goal is to eventually include a natural language interface (for input of the parents' colors and to provide phenotype descriptions) and to place the program on the Web.

First a little terminology: The horse has 32 chromosome pairs, each shaped like a spiraling ladder. Each "rung" in the ladder is a locus for two genes, one from each parent. (A gene is also known as an allele.) Each pair of genes represents a trait. The dominant gene is represented in upper case, and the recessive gene in lower case. If a pair consists of two dominant genes or two recessive it is referred to as homozygous or homogeneous. A mixed pair, one dominant and one recessive, is referred to as heterozygous or heterogeneous.

The coat color genes of Quarter Horses and Paints are most often represented as follows (and usually in this order):

    Aa Bb Cc CRcr Dd Ee Ff Gg Oo Pp Rr RBrb STYsty Tt Ww

Since we have 15 pairs, we are looking at 2^30 possible combinations, requiring 1,073,741,824 rules! (See the Note at the end.)

To narrow down the field, let's begin by removing those genes that would be invalid for input as Quarter Horse or Paint breeding pairs.

C stands for color, thus cc would be an albino. W stands for white, thus WW would be an albino. These designations are from two different research institutes and may be indicating the same gene. In any case, since the Paint and Quarter Horse registries will not allow albinos to be registered or bred, they will be removed from consideration. Presence of pigment will be assumed.

RB stands for rabicano, white hairs in the flanks and base of tail. P stands for pangare, light hairs in the muzzle and flanks. F darkens the lower legs and mane. Since none of these alter the registration color of the horse, they will be removed from consideration.

This leaves us with

    Aa Bb CRcr Dd Ee Gg Oo Rr STYsty Tt

Ten pairs left, only 2^20 or 1,048,576 combinations now... this is still unacceptable!

The answer is in processing by phenotype instead of genotype. The actual genetic combination is called a genotype, while the outward appearance is referred to as a phenotype. Take for example just the gene referred to as 'A': a pairing of an 'Aa' stallion with an 'Aa' mare could result in the following...

    Foal 1   Foal 2   Foal 3   Foal 4
      Aa       aA       AA       aa

All have different genotypes, but horses 1, 2, and 3 have the same phenotype because the dominant 'A' has expressed itself.

We start by considering the functions of the genes and find that they can be split into 3 groups.
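As an aside, the 'Aa' x 'Aa' cross and the combination counts quoted above can be reproduced with a few lines of Common Lisp. This is only a sketch: the function names (cross, phenotype, phenotypes, genotype-count) are hypothetical and do not come from project.lsp, and it assumes single-letter genes written as two-character strings, so it would not handle multi-letter codes such as CRcr as written.

```lisp
;; Hypothetical helpers for illustration; none of these names are from project.lsp.
;; Assumption: a gene pair is a two-character string such as "Aa".

(defun cross (sire dam)
  "Return the four allele pairings for one gene, sire allele first."
  (loop for s across sire
        append (loop for d across dam
                     collect (coerce (list s d) 'string))))

(defun phenotype (genotype)
  "DOMINANT if either allele is upper case, otherwise RECESSIVE."
  (if (some #'upper-case-p genotype) 'dominant 'recessive))

(defun phenotypes (sire dam)
  "Distinct phenotypes among the pairings of SIRE and DAM."
  (remove-duplicates (mapcar #'phenotype (cross sire dam))))

(defun genotype-count (pairs)
  "The count used in the text: two alleles per pair, two choices per allele."
  (expt 2 (* 2 pairs)))

(cross "Aa" "Aa")       ; => ("AA" "Aa" "aA" "aa")
(phenotypes "Aa" "Aa")  ; => (DOMINANT RECESSIVE)
(genotype-count 15)     ; => 1073741824
(genotype-count 10)     ; => 1048576
```

Four pairings collapse to two phenotypes for a single gene, which is the saving the phenotype-based approach relies on.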
    GROUP         GENE PAIRS
    base colors   Aa Bb Ee STYsty
    secondary     CRcr Dd Gg Rr
    patterns      Tt Oo

The 4 base color pairs can produce 16 genotypes, but only 6 phenotypes: Black, Brown, Sorrel, Chestnut, Bay, and Mahogany Bay. Thus, processing the base colors separately will cut down our overall calculations by nearly two thirds.

We can eliminate more rules by checking only for the dominant trait where allowable. For all the genes except R, O, and CR, we need only check for a dominant gene from either parent, combining four checks...

    sire dominant,  dam dominant
    sire dominant,  dam recessive
    sire recessive, dam dominant
    sire recessive, dam recessive

into two

    either parent dominant
    sire recessive, dam recessive

We still need to check all possibilities for R, O, and CR because the homozygous state RR or OO produces a stillborn foal and CRCR produces an unregisterable cream colored foal with blue eyes.

And to further reduce processing, breedings between Quarter Horses need not be checked for the pattern genes at all.

Bottom line: 1,048,576 possible genotypes reflected in only 97 RULES!

Can this be made more efficient? Processing-wise, yes! Because there is no need to check all 97 rules for EACH subgroup, there is no reason to load all 97 rules at once! Thus the algorithm for processing will be:

1) Find the possible base colors for the foals.
2) Apply secondary genetics to the base colors.
3) If one or both of the parents is a Paint, apply patterns to the secondary results.

For example:

    base color     secondary     pattern
    CHESTNUT    +  CREME      +  TOBIANO   =  Palomino Tobiano

(Like the horse "Apache" on "Walker Texas Ranger")

Translating the base allele codes into LISP atoms.

    STD   LISP DESIGNATION   EFFECT
    A     (POINTS)           points = mane, tail and lower legs are darker;
                             black if B present
    a     (~POINTS)          no points
    B     (~LIVER)           black melanin pigment. Legs, mane and tail black
                             when affected by A, but since E can also produce
                             a black body we will call this 'not liver'
    b     (LIVER)            no eumelanin, legs just a shade darker when
                             affected by A
    E     (BLACK)            black eumelanin pigment not affected by A
    e     (RED)              not black, phaeomelanin pigment a.k.a. red
    STY   (SOOTY)            sooty or smutty, darkens the coat
    sty   (~SOOTY)           no coat change

Translating the secondary color allele codes into LISP atoms.

    STD   LISP DESIGNATION   EFFECT
    CR    (CREME)            dilution of RED; if A and B present horse is a
                             Buckskin, if not, horse is a Palomino. Blacks,
                             EE or Ee, only slightly affected, thus a black
                             with a CR ancestor may produce an unexpected CR
                             foal.
    cr    (~CREME)           no change
    CRCR  (2CREME)           double dilution of color; BLACK becomes
                             "SILVER SMOKEY"; as for dilution of RED, if A and
                             B present horse is a Perlino, if not, horse is a
                             Cremello. Perlino and Cremello are not allowed as
                             inputs, but are possible outputs of breedings
                             between Palominos and/or Buckskins
    D     (DUN)              Dun factor. Color lightened. Primitive marks
                             (stripes) on face, spine, and lower legs.
    d     (~DUN)             no change
    G     (GREY)             Grey. Horse will be born normal color and
                             eventually turn grey (to white), including face.
    g     (~GREY)            no change
    R     (ROAN)             Roan. Horse will be born with white hairs mixed
                             in with colored hairs. Face stays normal color.
    r     (~ROAN)            no change
    RR    (LETHAL ROAN)      Horse has inherited homozygous roan genes from
                             parents and will be aborted or stillborn

Translating the pattern allele codes into LISP atoms.
    STD   LISP DESIGNATION   EFFECT
    T     (TOBIANO)          large white patches with sharp edges, face
                             usually unchanged, at least one leg white
    t     (~TOBIANO)         no change
    O     (OVERO)            irregularly edged white patches; horse can range
                             from nearly all white to nearly all colored with
                             a small white patch
    o     (~OVERO)           no change
    OO    (LETHAL OVERO)     Horse has inherited homozygous overo genes from
                             parents and will be aborted or stillborn

For our test pair we will use a

Black and White Tobiano Paint Stallion

      aa Bb crcr dd Ee gg rr stysty Tt oo
    = aa Bb Ee stysty | crcr dd gg rr | Tt oo
           base           secondary     pattern

Buckskin Quarter Horse Mare

      Aa Bb CRcr dd ee gg rr stysty tt oo
    = Aa Bb ee stysty | CRcr dd gg rr | tt oo
           base           secondary     pattern

This program uses the forward chaining procedure we were given in class, but uses two lists, *assertions* and *temp*. *rules* is changed as needed.

Enter just the base genetics for each parent:

*assertions*
    ((S ~POINTS) ((S ~LIVER) ((S LIVER) ((S BLACK) ((S RED) ((S ~SOOTY)
    ((M POINTS) ((M ~POINTS) ((M ~LIVER) ((M LIVER) ((M RED) ((M ~SOOTY)
    EMPTY-STREAM))))))))))))

    ;; load rules
    (input-base-rules)
    ;; process base colors
    (forward-chain)

    Rule A00 indicates (~POINTS).
    Rule A01 indicates (POINTS).
    Rule B00 indicates (LIVER).
    Rule B01 indicates (~LIVER).
    Rule E00 indicates (RED).
    Rule E01 indicates (BLACK).
    Rule S00 indicates (~SOOTY).

*temp*
    ((~POINTS) ((POINTS) ((LIVER) ((~LIVER) ((RED) ((BLACK) ((~SOOTY)
    EMPTY-STREAM)))))))

The program now moves *temp* into *assertions* and clears *temp* and *rules*.

*assertions*
    ((~POINTS) ((POINTS) ((LIVER) ((~LIVER) ((RED) ((BLACK) ((~SOOTY)
    EMPTY-STREAM)))))))
*temp*
    EMPTY-STREAM

Now the base color rules are loaded and processed:

    (base-color-rules)
    (forward-chain)

    Rule BASE0000 indicates (SORREL).
    Rule BASE0010 indicates (CHOCOLATE).
    Rule BASE0110 indicates (BLACK).
    Rule BASE1110 indicates (BAY).
    I am trying the rules again.
    Nothing new noted.
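The input rules fired in the first forward-chain run above (A00 through S00) follow the two combined checks described earlier: "either parent dominant" and "sire recessive, dam recessive". The rules themselves are in project.lsp and are not reproduced here, so the following is only a sketch of that logic under stated assumptions: plain lists stand in for the streams used by the class forward chainer, and carries-p, foal-atoms and *base-input* are hypothetical names.

```lisp
;; A sketch only, not the rules from project.lsp.
;; Assumptions: assertions are a plain list of (PARENT ATOM) pairs,
;; with S = sire and M = mare; all names below are hypothetical.

(defun carries-p (parent allele assertions)
  "True if ASSERTIONS contains (PARENT ALLELE), e.g. (S POINTS)."
  (member (list parent allele) assertions :test #'equal))

(defun foal-atoms (dominant recessive assertions)
  "Atoms a foal could show for one gene, using the two combined checks."
  (append
   ;; check 1: either parent carries the dominant allele
   (when (or (carries-p 's dominant assertions)
             (carries-p 'm dominant assertions))
     (list dominant))
   ;; check 2: sire recessive and dam recessive
   (when (and (carries-p 's recessive assertions)
              (carries-p 'm recessive assertions))
     (list recessive))))

(defparameter *base-input*
  '((s ~points) (s ~liver) (s liver) (s black) (s red) (s ~sooty)
    (m points) (m ~points) (m ~liver) (m liver) (m red) (m ~sooty))
  "Base assertions for the test pair.")

(foal-atoms 'points '~points *base-input*)  ; => (POINTS ~POINTS)
(foal-atoms '~liver 'liver *base-input*)    ; => (~LIVER LIVER)
(foal-atoms 'black 'red *base-input*)       ; => (BLACK RED)
(foal-atoms 'sooty '~sooty *base-input*)    ; => (~SOOTY)
```

For the test pair this yields the same seven atoms that the walkthrough places in *temp*, although the order differs.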
*assertions*
    ((~POINTS) ((POINTS) ((LIVER) ((~LIVER) ((RED) ((BLACK) ((~SOOTY)
    EMPTY-STREAM)))))))
*temp*
    ((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) EMPTY-STREAM))))

*assertions* and *rules* are cleared and the rules for processing the secondary gene pairings of the parents are loaded. *temp* is not cleared.

*assertions*
    EMPTY-STREAM
*temp*
    ((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) EMPTY-STREAM))))

The secondary gene pairings of the parents are now loaded and processed:

    ;; base genes are now in *temp*
    ;; clear assertions
    (setf *assertions* 'empty-stream)
    ;; load new rules
    (input-secondary-rules)
    ;; input parents secondary colors
    (remember-assertion '(S ~CREME))
    (remember-assertion '(S ~CREME))
    (remember-assertion '(S ~DUN))
    (remember-assertion '(S ~DUN))
    (remember-assertion '(S ~GREY))
    (remember-assertion '(S ~GREY))
    (remember-assertion '(S ~ROAN))
    (remember-assertion '(S ~ROAN))
    (remember-assertion '(M CREME))
    (remember-assertion '(M ~CREME))
    (remember-assertion '(M ~DUN))
    (remember-assertion '(M ~DUN))
    (remember-assertion '(M ~GREY))
    (remember-assertion '(M ~GREY))
    (remember-assertion '(M ~ROAN))
    (remember-assertion '(M ~ROAN))

The new assertions:

*assertions*
    ((S ~CREME) ((S ~DUN) ((S ~GREY) ((S ~ROAN) ((M CREME) ((M ~CREME)
    ((M ~DUN) ((M ~GREY) ((M ~ROAN) EMPTY-STREAM)))))))))

    ;; process
    (forward-chain)

    Rule C00 indicates (~CREME).
    Rule C01 indicates (CREME).
    Rule D00 indicates (~DUN).
    Rule G00 indicates (~GREY).
    Rule R00 indicates (~ROAN).

The processing adds the results to the base colors already in *temp*.

*temp*
    ((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) ((~CREME) ((CREME) ((~DUN)
    ((~GREY) ((~ROAN) EMPTY-STREAM)))))))))

We now replace *assertions* with those in *temp*, then clear *temp*.

    (setf *assertions* *temp*)
    (setf *temp* 'empty-stream)

*assertions*
    ((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) ((~CREME) ((CREME) ((~DUN)
    ((~GREY) ((~ROAN) EMPTY-STREAM)))))))))
*temp*
    EMPTY-STREAM

Clear the old *rules*, load new ones, and process.
    ;; load new rules
    (output-secondary-rules)
    ;; process
    (forward-chain)

    Rule SORREL0000 indicates (RED SORREL).
    Rule SORREL1000 indicates (LIGHT PALOMINO).
    Rule BAY0000 indicates (CLASSIC BAY).
    Rule BAY1000 indicates (LIGHT BUCKSKIN).
    Rule CHOCOLATE0000 indicates (CHOCOLATE CHESTNUT).
    Rule CHOCOLATE1000 indicates (CHOCOLATE PALOMINO).
    Rule BLACK0000 indicates (JET BLACK).
    Rule BLACK1000 indicates (SMOKY BLACK).

*assertions*
    ((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) ((~CREME) ((CREME) ((~DUN)
    ((~GREY) ((~ROAN) EMPTY-STREAM)))))))))
*temp*
    ((RED SORREL) ((LIGHT PALOMINO) ((CLASSIC BAY) ((LIGHT BUCKSKIN)
    ((CHOCOLATE CHESTNUT) ((CHOCOLATE PALOMINO) ((JET BLACK) ((SMOKY BLACK)
    EMPTY-STREAM))))))))

If we were processing two Quarter Horses, *temp* would be our final answer, but one of the parents, the stallion, is a Paint... so we need to check for coat patterns. We leave *temp* alone and load the parents' pattern genes into *assertions*.

    ;; clear *assertions*
    (setf *assertions* 'empty-stream)
    ;; input parents' patterns
    (remember-assertion '(S TOBIANO))
    (remember-assertion '(S ~TOBIANO))
    (remember-assertion '(S ~OVERO))
    (remember-assertion '(S ~OVERO))
    (remember-assertion '(M ~TOBIANO))
    (remember-assertion '(M ~TOBIANO))
    (remember-assertion '(M ~OVERO))
    (remember-assertion '(M ~OVERO))

*assertions*
    ((S TOBIANO) ((S ~TOBIANO) ((S ~OVERO) ((M ~TOBIANO) ((M ~OVERO)
    EMPTY-STREAM)))))
*temp*
    ((RED SORREL) ((LIGHT PALOMINO) ((CLASSIC BAY) ((LIGHT BUCKSKIN)
    ((CHOCOLATE CHESTNUT) ((CHOCOLATE PALOMINO) ((JET BLACK) ((SMOKY BLACK)
    EMPTY-STREAM))))))))

Clear the old rules, load the new ones, and process:

    ;; load new rules
    (input-pattern-rules)
    (forward-chain)

    Rule T00 indicates (~TOBIANO PATTERN GENE).
    Rule T01 indicates (TOBIANO PATTERN GENE).
    Rule O00 indicates (~OVERO PATTERN GENE).
    I am trying the rules again.
    Nothing new noted.
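Only rule O00 fired for the overo gene here because neither parent carries O. As noted earlier, R, O, and CR are the genes for which all possibilities must be checked, since the homozygous dominant foal matters (RR and OO are lethal, CRCR is unregisterable). The sketch below shows one way such a three-outcome check could look. It is an illustration under assumptions, not the rules from project.lsp: assertions are plain lists rather than streams, and carries-p and full-check are hypothetical names.

```lisp
;; A sketch only, not the rules from project.lsp.
;; Assumptions: assertions are a plain list of (PARENT ATOM) pairs,
;; with S = sire and M = mare; all names below are hypothetical.

(defun carries-p (parent allele assertions)
  "True if ASSERTIONS contains (PARENT ALLELE), e.g. (S OVERO)."
  (member (list parent allele) assertions :test #'equal))

(defun full-check (dominant recessive double assertions)
  "Possible foal outcomes for a gene whose homozygous dominant state matters.
DOUBLE is the atom to report for the homozygous dominant foal."
  (let ((s-dom (carries-p 's dominant assertions))
        (s-rec (carries-p 's recessive assertions))
        (m-dom (carries-p 'm dominant assertions))
        (m-rec (carries-p 'm recessive assertions)))
    (append
     ;; recessive from both parents
     (when (and s-rec m-rec) (list recessive))
     ;; dominant from one parent, recessive from the other
     (when (or (and s-dom m-rec) (and s-rec m-dom)) (list dominant))
     ;; dominant from both parents
     (when (and s-dom m-dom) (list double)))))

;; Test pair, overo gene: sire oo, mare oo
(full-check 'overo '~overo '(lethal overo)
            '((s ~overo) (m ~overo)))
;; => (~OVERO)

;; Test pair, creme gene: sire crcr, mare CRcr
(full-check 'creme '~creme '2creme
            '((s ~creme) (m creme) (m ~creme)))
;; => (~CREME CREME)

;; Two heterozygous overos: the lethal outcome becomes possible
(full-check 'overo '~overo '(lethal overo)
            '((s overo) (s ~overo) (m overo) (m ~overo)))
;; => (~OVERO OVERO (LETHAL OVERO))
```

The first two calls agree with the rules that fired in the walkthrough (O00, and C00 with C01); the third shows the case the simpler two-check form cannot distinguish.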
The processing adds the pattern genes to the base colors already in *temp*.

*temp*
    ((RED SORREL) ((LIGHT PALOMINO) ((CLASSIC BAY) ((LIGHT BUCKSKIN)
    ((CHOCOLATE CHESTNUT) ((CHOCOLATE PALOMINO) ((JET BLACK) ((SMOKY BLACK)
    ((~TOBIANO PATTERN GENE) ((TOBIANO PATTERN GENE) ((~OVERO PATTERN GENE)
    EMPTY-STREAM)))))))))))

We now replace *assertions* with those in *temp*, then clear *temp*.

    (setf *assertions* *temp*)
    (setf *temp* 'empty-stream)

Clear the old rules, load the new ones, and process:

    ;; load new rules
    (output-pattern-rules)
    (forward-chain)

    Rule PATTERN2 indicates (RED SORREL & WHITE TOBIANO).
    Rule PATTERN2 indicates (LIGHT PALOMINO & WHITE TOBIANO).
    Rule PATTERN2 indicates (CLASSIC BAY & WHITE TOBIANO).
    Rule PATTERN2 indicates (LIGHT BUCKSKIN & WHITE TOBIANO).
    Rule PATTERN2 indicates (CHOCOLATE CHESTNUT & WHITE TOBIANO).
    Rule PATTERN2 indicates (CHOCOLATE PALOMINO & WHITE TOBIANO).
    Rule PATTERN2 indicates (JET BLACK & WHITE TOBIANO).
    Rule PATTERN2 indicates (SMOKY BLACK & WHITE TOBIANO).
    Rule PATTERN5 indicates (SOLID RED SORREL).
    Rule PATTERN5 indicates (SOLID LIGHT PALOMINO).
    Rule PATTERN5 indicates (SOLID CLASSIC BAY).
    Rule PATTERN5 indicates (SOLID LIGHT BUCKSKIN).
    Rule PATTERN5 indicates (SOLID CHOCOLATE CHESTNUT).
    Rule PATTERN5 indicates (SOLID CHOCOLATE PALOMINO).
    Rule PATTERN5 indicates (SOLID JET BLACK).
    Rule PATTERN5 indicates (SOLID SMOKY BLACK).

*temp* now contains the final results.

Future plans (hopes?) for this program:

1. Further streamlining by combining secondary phenotypes with "or" logic, for example "if sorrel or chestnut".
2. Addition of Appaloosa genotypes.
3. Refinement as more equine genetic mapping information evolves.
4. Natural language interface for entering colors of parents and for descriptions of colors.
5. Porting the final program to my website: http://www.neca.com/~melville/stallion/

References:

Geurts, Reiner: Hair Colour in the Horse. (1973) Holland; (1977) English Translation, J.A. Allen & Co., London
Hillenbrand, Laura: Breeding For Color. (1992) Equus #182, Dec.
North, Ed: Breeding For Color. (1992) Terry, Miss.: Northfork Press
Spononberg, D. Phillip: Horse of an Unexpected Color. (1989) Equus #137, Mar.
University of California at Davis: Equine Genetics. (1998) Web site: http://www.vgl.ucdavis.edu/~lvmillon/

Attachments: project.lsp

Note: Appaloosas have additional coat patterns, but to limit the database this project involves only Quarter Horses and Paints.