A LISP Program To Estimate Phenotypes Produced By Equine Breeding Pairs
By Susan H. Melville
CS 462 AI, Spring 98
Veterinary research colleges throughout the world have been collaborating on
genetic mapping of the horse. From a medical standpoint the genetics involving
health concerns should be the top priority, and great strides have been made by
several institutions. Genetic testing is now available for the mutation causing
HYPP in Quarter Horses. Other work involves identifying breeding pairs at risk
for producing foals with CIDS in Arabians (similar to AIDS), and "Lung Bleeders"
in Thoroughbreds. For most breeders, however, the primary interest is in the ongoing research into coat color and patterns. There are several reasons for this:
1) Economics: Black horses and Black and White spotted horses generally sell for
more money than similarly bred horses of other colors.
2) Breed restrictions: Some breeds are restricted to certain colors (such as
Palomino), patterns (such as Paints), or lack of patterns (such as Quarter
Horse).
3) Health Considerations: Three of the pattern genes (roan, overo, and white)
are lethal to the foal if the pairing is homozygous dominant.
This program is designed to help breeders estimate the colors of the foals that
could result from a pairing. The goal is to eventually include a natural language
interface (for input of the parents' colors and to provide phenotype
descriptions) and to place the program on the Web.
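First, a little terminology:
The horse has 32 chromosome pairs, each chromosome built around DNA shaped like a
spiraling ladder. A gene occupies a particular position (locus) along a
chromosome, and because chromosomes come in pairs, a horse carries two copies of
each gene, one from each parent. (The different forms a gene can take are called
alleles.)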
Each pair of genes represents a trait. The dominant gene is represented in
upper case, and the recessive gene in lower case. If a pair consists of two
dominant genes or two recessive genes, it is referred to as homozygous.
A mixed pair, one dominant and one recessive, is referred to as heterozygous.
The coat color genes of Quarter Horses and Paints are most often represented as
follows (and usually in this order):
Aa Bb Cc CRcr Dd Ee Ff Gg Oo Pp Rr RBrb STYsty Tt Ww
Since we have 15 pairs, or 30 genes that can each be dominant or recessive, we
are looking at 2^30 possible combinations, requiring 1,073,741,824 rules!(i)
To narrow down the field, let's begin by removing those genes that would be
invalid for input as Quarter Horse or Paint breeding pairs.
C stands for color, thus cc would be an albino.
W stands for white, thus WW would be an albino.
These designations are from two different research institutes and may
indicate the same gene. In any case, since the Paint and Quarter Horse
registries will not allow albinos to be registered or bred, they will be removed
from consideration. Presence of pigment will be assumed.
RB stands for rabicano, white hairs in the flanks & base of tail.
P stands for pangare, light hairs in the muzzle and flanks.
F darkens the lower legs and mane.
Since none of these alters the registration color of the horse, they will be
removed from consideration.
This leaves us with
Aa Bb CRcr Dd Ee Gg Oo Rr STYsty Tt
Ten pairs left, only 2^20 or 1,048,576 combinations now... this is still
unacceptable!
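The arithmetic behind both counts is simply two raised to the number of genes: each pair holds two genes, and each gene is counted as either dominant or recessive. A short Common Lisp sketch (the name genotype-combinations is my own, not a function from project.lsp):

```lisp
;; Each gene pair holds two genes (one from each parent), and each gene is
;; counted as dominant or recessive, so PAIRS pairs give 2^(2*PAIRS)
;; combinations.  Hypothetical helper, not part of project.lsp.
(defun genotype-combinations (pairs)
  "Number of dominant/recessive combinations for PAIRS gene pairs."
  (expt 2 (* 2 pairs)))

;; ~:D prints an integer with comma separators.
(format t "15 pairs: ~:D combinations~%" (genotype-combinations 15))
(format t "10 pairs: ~:D combinations~%" (genotype-combinations 10))
```

This prints 1,073,741,824 for the full set of 15 pairs and 1,048,576 for the 10 pairs that remain.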
The answer is in processing by phenotype instead of genotype.
The actual genetic combination is called a genotype, whereas the outward
appearance is referred to as a phenotype. Take, for example, just the gene
referred to as 'A'. A pairing of an 'Aa' stallion with an 'Aa' mare could result
in the following:
Foal 1   Foal 2   Foal 3   Foal 4
Aa       aA       AA       aa
Foal 3 differs genetically from foals 1 and 2 (which differ only in which parent
contributed the dominant 'A'), yet foals 1, 2, and 3 have the same phenotype
because the dominant 'A' has expressed itself. Only foal 4 shows the recessive
trait.
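The same single-gene cross can be enumerated mechanically. This is a minimal Common Lisp sketch, assuming one-letter allele codes written as strings (upper case dominant, lower case recessive); cross and phenotype are hypothetical helper names, not functions from project.lsp.

```lisp
;; Hypothetical helpers (not from project.lsp).  Genotypes are strings such
;; as "Aa"; an upper-case allele is dominant, a lower-case allele recessive.
(defun cross (sire dam)
  "All foal genotypes for one gene: sire allele first, then dam allele."
  (loop for s across sire
        append (loop for d across dam
                     collect (coerce (list s d) 'string))))

(defun phenotype (genotype)
  "Return :DOMINANT if the foal carries at least one dominant allele."
  (if (some #'upper-case-p genotype) :dominant :recessive))

(dolist (foal (cross "Aa" "Aa"))
  (format t "~A -> ~A~%" foal (phenotype foal)))
```

Three of the four foals come out DOMINANT, matching foals 1, 2, and 3 in the table; only "aa" is RECESSIVE.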
We start by considering the functions of the genes and find that they can be
split into 3 groups:
base colors   Aa Bb Ee STYsty
secondary     CRcr Dd Gg Rr
patterns      Tt Oo
The 4 base color pairs can produce 16 dominant/recessive combinations, but only
6 phenotypes: Black, Brown, Sorrel, Chestnut, Bay, and Mahogany Bay. Thus,
processing the base colors separately will cut down our overall calculations by
nearly two thirds (from 16 cases to 6).
We can eliminate more rules by checking only for the dominant trait where
allowable. For all the genes, except R, O, and CR, we need only check for a
dominant gene from either parent, combining four checks:
   sire dominant, dam dominant
   sire dominant, dam recessive
   sire recessive, dam dominant
   sire recessive, dam recessive
into two:
   either parent dominant
   sire recessive, dam recessive
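For these genes the collapse from four checks to two amounts to a single OR. A minimal sketch; dominant-possible-p is a hypothetical name, not taken from project.lsp:

```lisp
;; Sketch of the collapsed check for genes other than R, O and CR: only
;; "does either parent carry the dominant gene?" matters.
(defun dominant-possible-p (sire-dominant-p dam-dominant-p)
  "True when at least one parent carries the dominant gene."
  (or sire-dominant-p dam-dominant-p))

;; Print the four original cases and the branch each one falls into.
(dolist (sire '(t nil))
  (dolist (dam '(t nil))
    (format t "sire ~:[recessive~;dominant~], dam ~:[recessive~;dominant~]: ~
               ~:[sire recessive, dam recessive~;either parent dominant~]~%"
            sire dam (dominant-possible-p sire dam))))
```

Three of the four cases land in "either parent dominant", so only two rules are needed per gene.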
We still need to check all possibilities for R, O, and CR because the homozygous
state RR or OO produces a stillborn foal and CRCR produces an unregisterable
cream colored foal with blue eyes.
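Why R, O, and CR need the full check: what matters is how many copies the foal can inherit, not just whether a dominant gene is present. The following hypothetical Common Lisp sketch (names are my own) counts the possible copies for every sire/dam combination. Allele codes are strings because the Lisp reader treats the symbols R and r as the same symbol; the outcome names reuse this paper's LISP atoms.

```lisp
;; Hypothetical sketch: for R, O and CR the number of inherited copies
;; matters, so every sire/dam allele combination is checked.
;; Genotypes are lists of two allele strings, e.g. ("R" "r").
(defparameter *copy-outcomes*
  ;; allele -> outcome for 0, 1 and 2 copies
  '(("R"  "~ROAN"  "ROAN"  "LETHAL ROAN")
    ("O"  "~OVERO" "OVERO" "LETHAL OVERO")
    ("CR" "~CREME" "CREME" "2CREME")))

(defun copy-counts (sire dam allele)
  "Sorted list of the distinct numbers of ALLELE copies a foal can inherit."
  (sort (remove-duplicates
         (loop for s in sire
               append (loop for d in dam
                            collect (+ (if (string= s allele) 1 0)
                                       (if (string= d allele) 1 0)))))
        #'<))

(defun possible-outcomes (sire dam allele)
  "Outcome names for every copy count the cross SIRE x DAM allows."
  (let ((names (rest (assoc allele *copy-outcomes* :test #'string=))))
    (mapcar (lambda (n) (nth n names)) (copy-counts sire dam allele))))

(print (possible-outcomes '("R" "r") '("R" "r") "R"))
;; => ("~ROAN" "ROAN" "LETHAL ROAN")
(print (possible-outcomes '("CR" "cr") '("cr" "cr") "CR"))
;; => ("~CREME" "CREME")
```

A cross of two roans (Rr x Rr) is flagged as able to produce a lethal roan, while a CRcr x crcr cross can produce at most one creme gene.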
To reduce processing further, breedings between Quarter Horses need not be
checked for the pattern genes at all.
Bottom line: 1,048,576 possible genotypes covered by only 97 RULES!
Can this be made more efficient? Processing-wise, yes!
Since there is no need to check all 97 rules for EACH subgroup, there is no
reason to load all 97 rules at once!
Thus the algorithm for processing will be:
1) Find the possible base colors for the foals.
2) Apply secondary genetics to the base colors.
3) If one or both of the parents are Paints, apply patterns to the secondary
results.
For example:
base color     secondary     pattern
CHESTNUT   +   CREME     +   TOBIANO   = Palomino Tobiano
(Like the horse "Apache" on "Walker, Texas Ranger")
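The three steps can be sketched as a pipeline in which each stage turns one intermediate result into a list of possibilities. This is a simplified illustration, not the rule-based code in project.lsp: foal-colors, creme-stage, and tobiano-stage are hypothetical, step 1 is assumed already done, and the stages handle only the chestnut/creme/tobiano case from the example.

```lisp
;; Hypothetical sketch of the three steps; each stage maps one intermediate
;; result to a list of possible results.
(defun expand (results stage)
  "Apply STAGE to every element of RESULTS and concatenate the lists."
  (loop for r in results append (funcall stage r)))

(defun foal-colors (base-colors secondary pattern paint-parent-p)
  "BASE-COLORS (output of step 1) -> step 2 -> step 3 (only for a Paint parent)."
  (let ((colors (expand base-colors secondary)))
    (if paint-parent-p
        (expand colors pattern)
        colors)))

;; Illustrative stages, much simpler than the real rule sets: a creme
;; carrier can dilute a chestnut to palomino, and a tobiano parent may or
;; may not pass on the pattern.
(defun creme-stage (base)
  (if (string= base "CHESTNUT")
      (list "CHESTNUT" "PALOMINO")
      (list base)))

(defun tobiano-stage (color)
  (list (format nil "SOLID ~A" color)
        (format nil "~A TOBIANO" color)))

(print (foal-colors '("CHESTNUT") #'creme-stage #'tobiano-stage t))
;; => ("SOLID CHESTNUT" "CHESTNUT TOBIANO" "SOLID PALOMINO" "PALOMINO TOBIANO")
```

Passing NIL for paint-parent-p skips step 3, as for a pair of Quarter Horses.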
Translating the base allele codes into LISP atoms:
STD    LISP DESIGNATION   EFFECT
A      (POINTS)           points: mane, tail, and lower legs are darker;
                          black if B present
a      (~POINTS)          no points
B      (~LIVER)           black melanin pigment; legs, mane, and tail are black
                          when affected by A, but since E can also produce a
                          black body, we will call this 'not liver'
b      (LIVER)            no eumelanin; legs just a shade darker when affected
                          by A
E      (BLACK)            black eumelanin pigment, not affected by A
e      (RED)              not black; phaeomelanin pigment, a.k.a. red
STY    (SOOTY)            sooty or smutty; darkens the coat
sty    (~SOOTY)           no coat change
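One way to hold this table in a program is as an association list. The sketch below (hypothetical names, not taken from project.lsp) uses it to turn a parent's base gene pairs into assertions; for the test pair introduced later in this paper it yields the same assertions entered there for the sire and the mare, as plain lists rather than streams.

```lisp
;; Hypothetical sketch: the base table as an association list, used to
;; turn a parent's base gene pairs into assertions like (S ~POINTS).
;; Allele codes are strings because the Lisp reader folds A and a together.
(defparameter *base-alleles*
  '(("A" points)  ("a" ~points)
    ("B" ~liver)  ("b" liver)
    ("E" black)   ("e" red)
    ("STY" sooty) ("sty" ~sooty)))

(defun split-pair (pair)
  "Split \"Bb\" into (\"B\" \"b\") and \"STYsty\" into (\"STY\" \"sty\")."
  (let ((half (floor (length pair) 2)))
    (list (subseq pair 0 half) (subseq pair half))))

(defun parent-assertions (tag pairs)
  "Assertions (TAG designation) for every allele in PAIRS, in order, no duplicates."
  (let ((result '()))
    (dolist (pair pairs (nreverse result))
      (dolist (allele (split-pair pair))
        (let ((designation (second (assoc allele *base-alleles* :test #'string=))))
          (pushnew (list tag designation) result :test #'equal))))))

(print (parent-assertions 's '("aa" "Bb" "Ee" "stysty")))
;; => ((S ~POINTS) (S ~LIVER) (S LIVER) (S BLACK) (S RED) (S ~SOOTY))
(print (parent-assertions 'm '("Aa" "Bb" "ee" "stysty")))
;; => ((M POINTS) (M ~POINTS) (M ~LIVER) (M LIVER) (M RED) (M ~SOOTY))
```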
Translating the secondary color allele codes into LISP atoms:
STD    LISP DESIGNATION   EFFECT
CR     (CREME)            dilution of RED: if A and B are present the horse is a
                          Buckskin; if not, the horse is a Palomino. Blacks (Ee
                          or ee) are only slightly affected, so a black with a
                          CR ancestor may produce an unexpected CR foal.
cr     (~CREME)           no change
CRCR   (2CREME)           double dilution of color: BLACK becomes “SILVER
                          SMOKEY”; as for dilution of RED, if A and B are
                          present the horse is a Perlino, if not, the horse is
                          a Cremello. Perlino and Cremello are not allowed as
                          inputs, but are possible outputs of breedings between
                          Palominos and/or Buckskins.
D      (DUN)              Dun factor. Color lightened. Primitive marks
                          (stripes) on face, spine, and lower legs.
d      (~DUN)             no change
G      (GREY)             Grey. Horse will be born a normal color and
                          eventually turn grey (to white), including the face.
g      (~GREY)            no change
R      (ROAN)             Roan. Horse will be born with white hairs mixed in
                          with colored hairs. Face stays normal color.
r      (~ROAN)            no change
RR     (LETHAL ROAN)      Horse has inherited two roan genes (homozygous) from
                          its parents and will be aborted or stillborn.
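The CR and CRCR rows amount to a small decision procedure. A minimal sketch of that logic, assuming the caller already knows how many creme genes the foal carries, whether A and B are present, and whether the base coat is black; the names are hypothetical, and the one-creme black result follows the SMOKY BLACK output the program produces later in this paper.

```lisp
;; Hypothetical sketch of the CR and CRCR rows.  A-AND-B-P means the horse
;; carries A and B; BLACK-P means a black base coat.  Returns NIL when there
;; is no creme gene, i.e. the base color name is kept.
(defun creme-color (copies a-and-b-p black-p)
  (cond ((zerop copies) nil)
        (black-p (if (= copies 1) "SMOKY BLACK" "SILVER SMOKEY"))
        ((= copies 1) (if a-and-b-p "BUCKSKIN" "PALOMINO"))
        (t (if a-and-b-p "PERLINO" "CREMELLO"))))

(defun valid-input-p (copies)
  "Perlino and Cremello (two creme genes) are not accepted as parents."
  (< copies 2))

(print (creme-color 1 t nil))   ; prints "BUCKSKIN"
(print (creme-color 2 nil nil)) ; prints "CREMELLO"
```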
Translating the pattern allele codes into LISP atoms:
STD    LISP DESIGNATION   EFFECT
T      (TOBIANO)          large white patches with sharp edges; face usually
                          unchanged, at least one leg white
t      (~TOBIANO)         no change
O      (OVERO)            irregularly edged white patches; horse can range
                          from nearly all white to nearly all colored with a
                          small white patch
o      (~OVERO)           no change
OO     (LETHAL OVERO)     Horse has inherited two overo genes (homozygous) from
                          its parents and will be aborted or stillborn.
For our test pair we will use a Black and White Tobiano Paint Stallion:
aa Bb crcr dd Ee gg rr stysty Tt oo
= base (aa Bb Ee stysty) | secondary (crcr dd gg rr) | pattern (Tt oo)
and a Buckskin Quarter Horse Mare:
Aa Bb CRcr dd ee gg rr stysty tt oo
= base (Aa Bb ee stysty) | secondary (CRcr dd gg rr) | pattern (tt oo)
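Splitting a genotype written in the standard order into the three groups is mechanical. A hypothetical Common Lisp sketch (split-genotype and its helpers are my own names), assuming pairs are separated by spaces:

```lisp
;; Hypothetical sketch: split a genotype string into its base, secondary
;; and pattern groups, classifying each pair by its upper-cased code.
(defparameter *gene-groups*
  '((:base "A" "B" "E" "STY")
    (:secondary "CR" "D" "G" "R")
    (:pattern "T" "O")))

(defun split-words (string)
  "Split STRING at spaces, dropping empty words."
  (remove "" (loop for start = 0 then (1+ pos)
                   for pos = (position #\Space string :start start)
                   collect (subseq string start pos)
                   while pos)
          :test #'string=))

(defun pair-code (pair)
  "Upper-cased first half of a pair: \"stysty\" -> \"STY\", \"crcr\" -> \"CR\"."
  (string-upcase (subseq pair 0 (floor (length pair) 2))))

(defun split-genotype (genotype)
  "List of (BASE SECONDARY PATTERN) gene-pair lists for GENOTYPE."
  (let ((pairs (split-words genotype)))
    (mapcar (lambda (group)
              (remove-if-not (lambda (pair)
                               (member (pair-code pair) (rest group)
                                       :test #'string=))
                             pairs))
            *gene-groups*)))

(print (split-genotype "aa Bb crcr dd Ee gg rr stysty Tt oo"))
;; => (("aa" "Bb" "Ee" "stysty") ("crcr" "dd" "gg" "rr") ("Tt" "oo"))
```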
This program uses the forward-chaining procedure we were given in class, but
keeps two lists: *assertions*, which the rules read, and *temp*, which collects
their conclusions. *rules* is replaced as needed.
Enter just the base genetics for each parent:
*assertions*
((S ~POINTS) ((S ~LIVER) ((S LIVER) ((S BLACK) ((S RED) ((S ~SOOTY)
((M POINTS) ((M ~POINTS) ((M ~LIVER) ((M LIVER) ((M RED) ((M ~SOOTY)
EMPTY-STREAM))))))))))))
;; load rules
(input-base-rules)
;; process base colors
(forward-chain)
Rule A00 indicates (~POINTS).
Rule A01 indicates (POINTS).
Rule B00 indicates (LIVER).
Rule B01 indicates (~LIVER).
Rule E00 indicates (RED).
Rule E01 indicates (BLACK).
Rule S00 indicates (~SOOTY).
*temp*
((~POINTS) ((POINTS) ((LIVER) ((~LIVER) ((RED) ((BLACK) ((~SOOTY) EMPTY-STREAM)))))))
The program now moves *temp* into *assertions* and clears *temp*
and *rules*.
*assertions*
((~POINTS) ((POINTS) ((LIVER) ((~LIVER) ((RED) ((BLACK) ((~SOOTY) EMPTY-STREAM)))))))
*temp*
EMPTY-STREAM
Now the base color rules are loaded and processed:
(base-color-rules)
(forward-chain)
Rule BASE0000 indicates (SORREL).
Rule BASE0010 indicates (CHOCOLATE).
Rule BASE0110 indicates (BLACK).
Rule BASE1110 indicates (BAY).
I am trying the rules again.
Nothing new noted.
*assertions*
((~POINTS) ((POINTS) ((LIVER) ((~LIVER) ((RED) ((BLACK) ((~SOOTY) EMPTY-STREAM)))))))
*temp*
((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) EMPTY-STREAM))))
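The class procedure itself is not reproduced here. As a rough illustration of the two-list scheme and the "I am trying the rules again" retry loop, here is a minimal Common Lisp sketch using plain lists instead of streams and no pattern variables. The rule format and the conditions given for A00 and A01 are hypothetical, since the paper lists only what each rule concludes.

```lisp
;; Minimal sketch of the two-list scheme, NOT the class procedure.
;; A rule is (name (if-clause ...) then-clause); it fires when every
;; if-clause is in *ASSERTIONS*, and its conclusion is collected in *TEMP*.
(defvar *assertions* '())
(defvar *temp* '())
(defvar *rules* '())

(defun rule-fires-p (rule)
  (every (lambda (clause) (member clause *assertions* :test #'equal))
         (second rule)))

(defun forward-chain ()
  "Retry all rules until a pass adds nothing new to *TEMP*; return *TEMP*."
  (loop
    (let ((new nil))
      (dolist (rule *rules*)
        (let ((then (third rule)))
          (when (and (rule-fires-p rule)
                     (not (member then *temp* :test #'equal)))
            (format t "Rule ~A indicates ~A.~%" (first rule) then)
            (setf *temp* (append *temp* (list then))
                  new t))))
      (unless new
        (format t "Nothing new noted.~%")
        (return *temp*))
      (format t "I am trying the rules again.~%"))))

;; Hypothetical rules whose conditions are assumed for illustration.
(setf *assertions* '((s ~points) (s ~liver) (m points) (m ~points))
      *temp* '()
      *rules* '((a00 ((s ~points) (m ~points)) (~points))
                (a01 ((m points)) (points))))
(forward-chain)
```

In this sketch the second pass can never add anything, because conclusions go to *temp* rather than *assertions*; chaining from one stage to the next happens only when *temp* is copied into *assertions*.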
*assertions* and *rules* are cleared, and the rules for processing the secondary
gene pairings of the parents are loaded.
*temp* is not cleared.
*assertions*
EMPTY-STREAM
*temp*
((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) EMPTY-STREAM))))
The secondary gene pairings of the parents are now loaded and processed:
;; base genes are now in *temp*
;; clear assertions
(setf *assertions* 'empty-stream)
;; load new rules
(input-secondary-rules)
;; input parents secondary colors
(remember-assertion '(S ~CREME))
(remember-assertion '(S ~CREME))
(remember-assertion '(S ~DUN))
(remember-assertion '(S ~DUN))
(remember-assertion '(S ~GREY))
(remember-assertion '(S ~GREY))
(remember-assertion '(S ~ROAN))
(remember-assertion '(S ~ROAN))
(remember-assertion '(M CREME))
(remember-assertion '(M ~CREME))
(remember-assertion '(M ~DUN))
(remember-assertion '(M ~DUN))
(remember-assertion '(M ~GREY))
(remember-assertion '(M ~GREY))
(remember-assertion '(M ~ROAN))
(remember-assertion '(M ~ROAN))
The new assertions
*assertions*
((S ~CREME) ((S ~DUN) ((S ~GREY) ((S ~ROAN) ((M CREME) ((M ~CREME) ((M ~DUN) ((M
~GREY) ((M ~ROAN) EMPTY-STREAM)))))))))
;; process
(forward-chain)
Rule C00 indicates (~CREME).
Rule C01 indicates (CREME).
Rule D00 indicates (~DUN).
Rule G00 indicates (~GREY).
Rule R00 indicates (~ROAN).
The processing adds the results to the base colors already in *temp*
*temp*
((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) ((~CREME) ((CREME) ((~DUN)
((~GREY) ((~ROAN) EMPTY-STREAM)))))))))
We now replace *assertions* with those in *temp* then clear *temp*.
(setf *assertions* *temp*)
(setf *temp* 'empty-stream)
*assertions*
((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) ((~CREME) ((CREME) ((~DUN)
((~GREY) ((~ROAN) EMPTY-STREAM)))))))))
*temp*
EMPTY-STREAM
Clear the old *rules*, load the new ones, and process:
;; load new rules
(output-secondary-rules)
;; process
(forward-chain)
Rule SORREL0000 indicates (RED SORREL).
Rule SORREL1000 indicates (LIGHT PALOMINO).
Rule BAY0000 indicates (CLASSIC BAY).
Rule BAY1000 indicates (LIGHT BUCKSKIN).
Rule CHOCOLATE0000 indicates (CHOCOLATE CHESTNUT).
Rule CHOCOLATE1000 indicates (CHOCOLATE PALOMINO).
Rule BLACK0000 indicates (JET BLACK).
Rule BLACK1000 indicates (SMOKY BLACK).
*assertions*
((SORREL) ((CHOCOLATE) ((BLACK) ((BAY) ((~CREME) ((CREME) ((~DUN)
((~GREY) ((~ROAN) EMPTY-STREAM)))))))))
*temp*
((RED SORREL) ((LIGHT PALOMINO) ((CLASSIC BAY) ((LIGHT BUCKSKIN)
((CHOCOLATE CHESTNUT) ((CHOCOLATE PALOMINO) ((JET BLACK) ((SMOKY BLACK)
EMPTY-STREAM))))))))
If we were processing two Quarter Horses, *temp* would be our final answer, but
one of the parents, the stallion, is a Paint, so we need to check for coat
patterns.
We leave *temp* alone and load the parents' pattern genes into *assertions*.
;; clear *assertions*
(setf *assertions* 'empty-stream)
;; input parents' patterns
(remember-assertion '(S TOBIANO))
(remember-assertion '(S ~TOBIANO))
(remember-assertion '(S ~OVERO))
(remember-assertion '(S ~OVERO))
(remember-assertion '(M ~TOBIANO))
(remember-assertion '(M ~TOBIANO))
(remember-assertion '(M ~OVERO))
(remember-assertion '(M ~OVERO))
*assertions*
((S TOBIANO) ((S ~TOBIANO) ((S ~OVERO) ((M ~TOBIANO) ((M ~OVERO) EMPTY-STREAM)))))
*temp*
((RED SORREL) ((LIGHT PALOMINO) ((CLASSIC BAY) ((LIGHT BUCKSKIN)
((CHOCOLATE CHESTNUT) ((CHOCOLATE PALOMINO) ((JET BLACK) ((SMOKY BLACK)
EMPTY-STREAM))))))))
Clear the old rules, load the new ones, and process
;; load new rules
(input-pattern-rules)
(forward-chain)
Rule T00 indicates (~TOBIANO PATTERN GENE).
Rule T01 indicates (TOBIANO PATTERN GENE).
Rule O00 indicates (~OVERO PATTERN GENE).
I am trying the rules again.
Nothing new noted.
The processing adds the pattern genes to the colors already in *temp*:
*temp*
((RED SORREL) ((LIGHT PALOMINO) ((CLASSIC BAY) ((LIGHT BUCKSKIN)
((CHOCOLATE CHESTNUT) ((CHOCOLATE PALOMINO) ((JET BLACK) ((SMOKY BLACK)
((~TOBIANO PATTERN GENE) ((TOBIANO PATTERN GENE) ((~OVERO PATTERN GENE)
EMPTY-STREAM)))))))))))
We now replace *assertions* with those in *temp* then clear *temp*.
(setf *assertions* *temp*)
(setf *temp* 'empty-stream)
Clear the old rules, load the new ones, and process
;; load new rules
(output-pattern-rules)
(forward-chain)
Rule PATTERN2 indicates (RED SORREL & WHITE TOBIANO).
Rule PATTERN2 indicates (LIGHT PALOMINO & WHITE TOBIANO).
Rule PATTERN2 indicates (CLASSIC BAY & WHITE TOBIANO).
Rule PATTERN2 indicates (LIGHT BUCKSKIN & WHITE TOBIANO).
Rule PATTERN2 indicates (CHOCOLATE CHESTNUT & WHITE TOBIANO).
Rule PATTERN2 indicates (CHOCOLATE PALOMINO & WHITE TOBIANO).
Rule PATTERN2 indicates (JET BLACK & WHITE TOBIANO).
Rule PATTERN2 indicates (SMOKY BLACK & WHITE TOBIANO).
Rule PATTERN5 indicates (SOLID RED SORREL).
Rule PATTERN5 indicates (SOLID LIGHT PALOMINO).
Rule PATTERN5 indicates (SOLID CLASSIC BAY).
Rule PATTERN5 indicates (SOLID LIGHT BUCKSKIN).
Rule PATTERN5 indicates (SOLID CHOCOLATE CHESTNUT).
Rule PATTERN5 indicates (SOLID CHOCOLATE PALOMINO).
Rule PATTERN5 indicates (SOLID JET BLACK).
Rule PATTERN5 indicates (SOLID SMOKY BLACK).
*temp* now contains the final results.
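The two pattern rules amount to crossing every color in *temp* with the pattern outcomes the parents allow. A hypothetical sketch of that step (pattern-results is not a function from project.lsp), using strings rather than the program's streams; overo outcomes are left out because neither test parent carries O.

```lisp
;; Hypothetical sketch: one "& WHITE TOBIANO" result per color when the
;; tobiano gene is possible, and one "SOLID" result per color when both
;; ~TOBIANO and ~OVERO are possible.
(defparameter *colors*
  '("RED SORREL" "LIGHT PALOMINO" "CLASSIC BAY" "LIGHT BUCKSKIN"
    "CHOCOLATE CHESTNUT" "CHOCOLATE PALOMINO" "JET BLACK" "SMOKY BLACK"))

(defun has-gene-p (gene genes)
  (member gene genes :test #'string=))

(defun pattern-results (colors genes)
  (append
   (when (has-gene-p "TOBIANO" genes)
     (mapcar (lambda (c) (format nil "~A & WHITE TOBIANO" c)) colors))
   (when (and (has-gene-p "~TOBIANO" genes) (has-gene-p "~OVERO" genes))
     (mapcar (lambda (c) (format nil "SOLID ~A" c)) colors))))

(let ((results (pattern-results *colors* '("~TOBIANO" "TOBIANO" "~OVERO"))))
  (format t "~D results~%~{~A~%~}" (length results) results))
```

With the test pair's pattern genes this yields the same 16 results, in the same order, as rules PATTERN2 and PATTERN5.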
Future plans (hopes?) for this program:
1. Further streamlining by combining secondary phenotypes with “or” logic,
for example “if sorrel or chestnut”.
2. Addition of Appaloosa genotypes.
3. Refinement as more equine genetic mapping information evolves.
4. Natural language interface for entering colors of parents and for
descriptions of colors.
5. Porting the final program to my website:
http://www.neca.com/~melville/stallion/
Attachments:
project.lsp
(i) Note: Appaloosas have additional coat patterns, but to limit the database
this project involves only Quarter Horses and Paints.