Introduction: Complex traits

Summary
Genotype-phenotype relationships for aetiologically complex traits can be complicated
by incomplete allelic penetrances, genetic heterogeneity, and oligogenic inheritance.
Complex traits do not therefore segregate within extended pedigrees in a
straightforward Mendelian fashion. As a consequence, complex trait aetiologies are
best studied using large samples (for statistical power) of small families, using
analytical methods which make minimal assumptions about underlying genetic models.
This avoids drawing conclusions which are biased by misspecified parameters. Before
any attempt to map genetic susceptibility loci for a complex trait, it is worth applying
non-parametric epidemiological methods to large samples of small families (twin studies,
sibling relative-risk studies) to gauge the extent to which genetic factors determine the
phenotypic variance of interest.
The manner in which a trait is defined is crucial to the outcome of any study of its
underlying causes. Most complex traits are phenotypically heterogeneous, and results
can therefore be difficult to replicate between complex-trait studies. Quantitative trait
definitions can offer advantages over qualitative definitions for aetiologically complex
traits which are usually continuous and multivariate in nature.
Monogenic and complex traits
No trait of an individual organism is determined by either genetic or environmental
causes alone. However, for a population of organisms, it is possible to conceive of
distinct components of trait variance determined by genetic versus environmental
differences between individuals (Hill 1984). Furthermore, traits can be imagined to fall
on a continuous spectrum defined by the simplicity of causal mapping from genetic
differences between individuals to population phenotypic differences (Fisher 1918,
Lander and Schork 1994, Beaudet et al. 1995). So-called ‘monogenic’ traits are those at
the extreme end of the spectrum for which a single genetic change causes a marked
phenotype which is distinct from the phenotype of individuals not having that change,
and therefore the different genetic and phenotypic states co-segregate closely in
extended pedigrees (e.g. the human diseases cystic fibrosis, Duchenne muscular
dystrophy, Tay Sachs disease; reviewed in Beaudet et al. 1995).
For monogenic disorders, the genotype-phenotype relationship is robust to other
genetic and environmental differences between individuals provided those other
differences fall within a circumscribed range. Indeed, aetiological complexity affects
most so-called monogenic traits, since environmental and genetic differences additional
to the major single-gene effect usually influence the final developmental outcome. For
example, multiple genetic factors modify the predisposing effect of certain β-globin
alleles to sickle cell anaemia (Dover et al. 1992; Thein et al. 1994). Mutations in the
gene CFTR produce different cystic fibrosis phenotypes depending on their specific
chromosomal backgrounds (Kiesewetter et al. 1993).
For complex multifactorial human traits like diabetes, autism and dyslexia, many
genetic and environmental sources of phenotypic variance must combine in the right
way to cause the trait to develop to any particular state (Lander and Schork 1994;
Davies et al. 1994, Bailey et al. 1995, Smith et al. 1996). It is uncommon in the study
of these conditions to find extended pedigrees showing patterns of trait inheritance
which are compatible with classical Mendelian dominant, recessive, or X-linked single-gene transmission. Instead, the mapping of genotype to phenotype in aetiologically
complex disorders must be complicated by some or all of the following possibilities, of
which the first three are probably important in most cases (Lander and Schork 1994):
Incomplete penetrance: when a predisposing allele does not cause the development of a
certain phenotypic state in some individuals but does in others, because the phenotype
is modified by other background genetic and environmental influences (e.g. in sickle
cell anaemia, Dover et al. 1992).
Phenocopy: when a trait develops to a certain state in individuals who do not have a
specified predisposing allele, because different heterogeneous genetic and
environmental factors can also cause the same or similar trait development, for example
when mutations in different genes affect one particular biochemical pathway or
developmental process. Mutations in a wide range of autosomal and X-linked genes
cause Retinitis Pigmentosa, a disorder involving retinal degeneration (Vervoort et al.
2000).
Oligogenic inheritance: for many human traits, different phenotypic states are probably
associated with different inherited combinations of polymorphisms at several loci
(Fisher 1918, Falconer and Mackay 1996). Segregation studies of common
aetiologically complex traits in humans (see below) consistently provide evidence for
this model (e.g. Lubs et al. 1993, Pennington et al. 1991). Oligogenic loci may also
interact with one another during trait development (epistasis), so that individual-locus
effects on trait development may not be purely additive (Hodge 1981). The successful
dissection of oligogenic effects in humans has so far been largely confined to modifier
loci in studies of monogenic traits (Lander and Schork 1994). Sickle cell anaemia
provides a classic example, where the β-globin determining effect is altered by at least
two other loci, one autosomal and one X-linked (Dover et al. 1992; Thein et al. 1994).
In terms of complex human disorders, the supposed model is that oligogenic
polymorphisms must be transmitted to individuals in the right combinations to bias
their trait development towards some diagnostic threshold (Falconer 1981), at least if a
qualitative definition of the disorder is used (see below).
Imprinting: Parent-of-origin effects on gene activity mediated by locus imprinting can
complicate genotype-phenotype relationships (Pardo-Manuel de Villena et al. 2000).
Cases of uniparental disomy have exposed the imprinted loci which cause Prader-Willi
or Angelman syndrome, depending on whether the paternal or maternal copy of a
region on chromosome 15 is deleted in patients (Fulmer-Smentek et al. 2001). Also,
locus imprinting may underlie some sex differences in cognitive functions with genetic
components located on the X chromosome (Skuse et al. 1997).
Other exceptions to Mendelian genotype-phenotype co-segregation: A few human
diseases are caused by unstably transmitted trinucleotide repeat expansions, for which
the degree of expansion can be related to the disease severity (fragile X syndrome,
myotonic dystrophy, Huntington disease, spinobulbar muscular atrophy and
spinocerebellar ataxia type 1; Sutherland et al. 1992; reviewed in Beaudet et al. 1994).
Transmission distortion has been demonstrated for at least one locus in the mouse
caused by a process known as meiotic drive (Silver et al. 1993), and similar processes
may occur in humans. Also, selective forces operating against alleles early in
development might result in a seemingly distorted pattern of inheritance post-selection.
Large-scale chromosome abnormalities like translocations, deletions or aneuploidy
typically give rise to complex multiple phenotypes which depend on the precise
locations of chromosomal breakpoints and/or the extent of missing genetic material
(Borgaonkar 1994). Chromosome abnormalities are often transmitted in a Mendelian
fashion, but phenotypes can differ in carriers of the same abnormality (e.g. Nopola-Hemmi et al. 2000, see Introduction: Dyslexia phenotype and previous genetic studies),
again illustrating the importance of background oligogenic modifying effects.
Chromosome abnormalities often illustrate the importance of ‘position effects’, where
genes not directly disrupted by the abnormality nevertheless have their activities altered
by a change in their chromosomal proximity to other controlling genetic loci (Kleinjan
and van Heyningen 1998).
Finally, genetic disease transmission also takes place in eukaryotic organisms via the
inheritance of mitochondrial DNA (Wallace 1994).
By definition, monogenic diseases are those for which phenotypic differences
determined by a single genetic locus stand out disproportionately above the normal
ranges of variance in traits related to the disease (Beaudet et al. 1995). The larger the
phenotypic effect that a new mutation has, the more likely that that effect will be
detrimental, since traits will usually develop away from pre-evolved optimal states
(Dawkins 1986). Also, the magnitudes of non-adaptive single-locus effects on trait
development will determine the strengths of selection which operate against their
corresponding mutations (Haldane 1924, 1958; Goodenough 1984). A broad correlation
therefore exists for human genetic diseases between severity, ‘monogenicity’, and low
disease frequency in the population (Beaudet et al. 1995). The total frequency of
monogenic disorders in human populations is ≈1%, with frequencies as low as 1 in
50,000 (Beaudet et al. 1995). In contrast, up to 60% of people are estimated to suffer
from disorders with multifactorial genetic backgrounds, especially when late-onset
diseases are included (Baird et al. 1988). This reflects in part that selective pressures
are lower on genetic polymorphisms which have relatively modest individual
phenotypic effects (although several polymorphisms may combine to produce severe
oligogenic phenotypes), and selective pressures are also lower on polymorphisms
which exert their detrimental effects later in life (Charlesworth 1980). In addition, it is
possible that polymorphisms which contribute to the development of complex disorders
might become fixed in a population by having beneficial effects on some traits and
modest detrimental effects on others. This latter model might be especially applicable
in life-history contexts, when youthful advantages might be conferred by a
polymorphism at the expense of health problems later.
Finally, positive or negative selection pressures on genetic loci are influenced by locus-specific allelic interactions and allelic frequencies (Hartl and Clark 1997). For example,
cystic fibrosis is a severe autosomal recessive disease with a high penetrance, but
susceptibility alleles are surprisingly frequent in the population (1 in 20-25 Caucasians
is heterozygous for a disease allele), so there might have been an unknown selective
advantage associated with disease alleles in the heterozygous state in Caucasian
populations (reviewed in Beaudet et al. 1995). In sickle cell anaemia, heterozygotes
have increased resistance to falciparum malaria (Ridley 1993).
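
The carrier frequency quoted above for cystic fibrosis can be turned into a rough worked example. The sketch below assumes Hardy-Weinberg proportions at a two-allele locus (an assumption made here for illustration, not one stated in the text) and converts a heterozygote frequency of 1 in 25 into the implied disease-allele frequency and the expected frequency of affected homozygotes. The function names are hypothetical.

```python
import math

def allele_freq_from_carrier_freq(carrier_freq):
    """Solve 2*q*(1 - q) = carrier_freq for the rarer allele frequency q.

    Assumes Hardy-Weinberg proportions at a locus with two alleles.
    """
    # 2q - 2q^2 = h  ->  q = (1 - sqrt(1 - 2h)) / 2  (the smaller root)
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0

def homozygote_freq(q):
    """Expected frequency of individuals carrying two copies of the allele."""
    return q * q

carrier_freq = 1 / 25                      # upper figure quoted in the text
q = allele_freq_from_carrier_freq(carrier_freq)
affected = homozygote_freq(q)
print(f"disease-allele frequency: {q:.4f}")
print(f"expected affected homozygotes: about 1 in {1 / affected:.0f}")
```

Under these assumptions a carrier frequency of 1 in 25 implies an allele frequency of about 0.02 and roughly one affected homozygote per 2,400 individuals, which shows how a severe recessive allele can be common in heterozygotes while the disease itself stays rare.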
In summary, all genetic analysis of phenotypic traits is fundamentally aimed at
describing a statistical relationship between the genetic differences and phenotypic
differences that exist in a population of organisms, and since it is not usually possible
to account for all sources of phenotypic variance in genetic studies, genotype-phenotype
relationships are normally described in probabilistic terms (Terwilliger and
Ott 1994). Also, the designation of a trait as monogenic or genetically complex can
only be made in the context of a particular population with regard to its unique
environmental and broader genetic background.
Genetic epidemiology
The way in which a trait is distributed within families or populations yields strong clues
to the complexity of its aetiology. For monogenic diseases, patterns of trait segregation
within pedigrees are compatible with autosomal dominant, recessive or X-linked
transmission of a single susceptibility locus with a high penetrance (Ott 1991;
Terwilliger and Ott 1994; Beaudet et al. 1995). In general, the more generations and
matings within an affected pedigree, the more clearly the pattern of trait segregation
supports one genetic model while excluding others. For example, the transmission of a
trait with equal likelihood from both parental sexes, or the observation of male-to-male
trait transmission, both suggest that the determining locus is autosomal and not X-linked. The appearance of a trait in children of unaffected parents, with about 1 in 4 of
the children of those parents affected, suggests an autosomal recessive
model. This is particularly true against a background of consanguineous mating, which
increases the chance that a recessive disease-causing allele will be inherited from both
parents. In contrast, the transmission of a trait from a single affected parent to half of
their children suggests an autosomal dominant model, as does the absence of the trait in
children of unaffected individuals within an affected pedigree. These patterns stem
from basic Mendelian principles of gene-transmission and allelic interaction
(Goodenough 1984), but even for monogenic diseases they can be obscured by reduced
allelic penetrances, the presence of phenocopies and genetic heterogeneity, and by
oligogenic inheritance (see above). Also, when high frequencies of trait-susceptibility
alleles exist in the general population, the underlying pattern of trait segregation might
be further concealed by the presence within a single pedigree of multiple independent
copies of these alleles (Lander and Schork 1994). The possibility of new mutations
arising within pedigrees must also be considered. Roughly 5 × 10⁻⁶ mutations arise per
gene per generation in humans, so the chance of observing a new mutation in any
individual at a given locus is 1 in 100,000 (Beaudet et al. 1995). However, some
genetic conditions can lead to higher somatic genome-wide mutation rates (Millar et al.
1999).
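
The segregation ratios described above (1 in 4 for a recessive trait in the children of two unaffected carriers, 1 in 2 for a dominant trait transmitted from one affected parent) follow from enumerating the equally likely allele transmissions. The following is a minimal sketch, assuming a single autosomal locus with full penetrance and no phenocopies, a normal allele written `A` and a disease allele written `a`; the function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Expected offspring genotype proportions at one autosomal locus.

    Each parent is a two-character genotype string such as "Aa"; each of a
    parent's two alleles is transmitted with probability 1/2.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

def affected_fraction(parent1, parent2, affected_genotypes):
    """Expected fraction of affected children, assuming full penetrance."""
    dist = offspring_genotypes(parent1, parent2)
    return sum(p for g, p in dist.items() if g in affected_genotypes)

# Autosomal recessive: two unaffected carriers, only aa is affected.
recessive = affected_fraction("Aa", "Aa", {"aa"})
# Autosomal dominant: one affected heterozygote, one unaffected parent.
dominant = affected_fraction("Aa", "AA", {"Aa", "aa"})
print(recessive, dominant)  # 1/4 1/2
```

Reduced penetrance, phenocopies or additional loci would all move the observed proportions away from these ideal values, which is why such ratios are only a starting point for real pedigrees.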
For monogenic disorders, segregation analysis uses observed patterns of trait
inheritance within extended pedigrees and within their source populations to fit a
parametric inheritance model which is based on a major single-locus-effect with minor
modifier effects (Lalouel et al. 1983; Newman et al. 1988; Terwilliger and Ott 1994).
Direct estimates are made of parameters such as the mode of transmission, disease
allele frequency, allelic penetrances (heterozygous, homozygous, and often sex-specific
as well), and the phenocopy rate which can indicate the degree of genetic heterogeneity
within the sample pedigrees. Often segregation models are fitted using likelihood
techniques, which calculate the probability of observing a given set of affected pedigree
data under various causal hypotheses in a manner similar to monogenic linkage
analysis (Lalouel et al. 1983; Ott 1991; Terwilliger and Ott 1994; see Introduction:
Linkage and linkage disequilibrium). For example, for a straightforward autosomal
dominant disease, determined only by a normal allele A and disease allele a, the
maximum likelihood penetrance estimates of the genotypes AA, Aa and aa will be 0, 1
and 1, the estimated phenocopy rate will be 0, and the frequency estimate of a will be
low if the affected pedigrees show only single-copy transmission. When a trait is purely
monogenic, the likelihood of the simplest monogenic model usually stands out far
above the likelihoods of other, competing hypotheses (Ott 1990). In contrast, when a
monogenic effect does not clearly dominate an observed pattern of trait transmission,
additional modifying effects can often be accommodated through the adjustment of one
or several different parameters to yield similar likelihood estimates, but with no
rationale for choosing between models (Ott 1990).
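
The difficulty of choosing between models can be illustrated with a deliberately simplified likelihood. The sketch below is not the procedure used in the cited segregation analyses: it assumes that every child comes from a mating between an Aa parent and an AA parent, pools all children, and ignores ascertainment. The penetrance of Aa and the phenocopy rate (the penetrance of AA) are the only parameters, and all names are hypothetical.

```python
import math

def p_child_affected(pen_het, phenocopy_rate):
    """Probability that a child of an Aa x AA mating is affected.

    Half of the children are expected to be Aa and half AA.
    """
    return 0.5 * pen_het + 0.5 * phenocopy_rate

def log_likelihood(n_affected, n_children, pen_het, phenocopy_rate):
    """Binomial log-likelihood of the pooled offspring data under the model."""
    p = p_child_affected(pen_het, phenocopy_rate)
    n_unaffected = n_children - n_affected
    if (p == 0.0 and n_affected > 0) or (p == 1.0 and n_unaffected > 0):
        return float("-inf")  # the data are impossible under this model
    ll = math.log(math.comb(n_children, n_affected))
    if n_affected:
        ll += n_affected * math.log(p)
    if n_unaffected:
        ll += n_unaffected * math.log(1.0 - p)
    return ll

# Clean dominant segregation: 50 of 100 children affected.
simple = log_likelihood(50, 100, pen_het=1.0, phenocopy_rate=0.0)
reduced = log_likelihood(50, 100, pen_het=0.6, phenocopy_rate=0.0)

# Messier data: 30 of 100 children affected.
model_a = log_likelihood(30, 100, pen_het=0.6, phenocopy_rate=0.0)
model_b = log_likelihood(30, 100, pen_het=0.5, phenocopy_rate=0.1)
print(simple > reduced, math.isclose(model_a, model_b))
```

With clean 1:1 segregation the fully penetrant model has the higher likelihood, but with 30% of children affected a model with reduced penetrance and no phenocopies fits exactly as well as one with lower penetrance and some phenocopies. In this toy setting the data alone give no rationale for preferring either.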
Obviously then, for complex disorders, for which there are by definition no major
single gene effects in most affected pedigrees, it is usually not possible to distinguish
between competing genetic hypotheses using model fitting techniques (Ott 1990;
Lander and Schork 1995). This is a reflection of a broad general principle in statistics,
where the choice of parametric versus non-parametric analytical methods depends on
how many specific assumptions can be made about properties of the data being
analysed. When assumptions are made correctly, their recognition in an analysis
usually provides more statistical power than if they are not modelled, since in general
fewer properties of the data will remain unexplained by the particular relation being
tested for. However, when assumptions are violated, their inclusion in an analysis can
have unpredictable effects which will often increase the rate of type I (false positive) or
type II (false negative) error (Samuels 1991). For this reason, the study of complex trait
genetics is most successfully performed using non-parametric methods, which sacrifice
a degree of statistical power in return for making fewer assumptions (but never no
assumptions) about underlying causal models (Risch 1990, Lander and Schork 1994,
Weeks and Lathrop 1995). One general consequence of this approach is that power
must instead be derived from large sample sizes in complex trait genetics (Weeks and
Lathrop 1995). The methodological approaches described in this thesis for mapping
loci involved in dyslexia susceptibility will wholly reflect these basic statistical
principles, which in essence recognise our very limited knowledge of the phenotype we
are studying.
Sometimes a major-gene subtype of a complex disorder might segregate within
individual pedigrees in a monogenic way (e.g. Fagerheim et al. 1999; see Introduction:
Dyslexia phenotype and previous genetic studies), but generally the segregation of
complex disorders within families is not straightforward by definition, so large
extended pedigrees do not generally provide ideal samples for complex trait studies
(Lander and Schork 1994, Weeks and Lathrop 1995). Therefore, instead of using small
numbers of large pedigrees, genetic studies of complex traits usually rely on large
numbers of small pedigrees (Risch 1990; Davies et al. 1994, Weeks and Lathrop 1995;
Daniels et al. 1996, International Molecular Genetics Study of Autism Consortium
1998), for three main reasons. First, large numbers of pedigrees are needed for
statistical power, and smaller pedigrees are relatively easy to find and collect. Second,
small pedigrees provide independent contributions to the dataset while reflecting the
disorder as manifested in a more general population setting, which for common
complex traits is usually more interesting than the identification of rare monogenic
forms. Third, in the case of nuclear sib-pair families, the individuals manifesting the
trait can be selected to have similar ages and within-family environments, thus
curtailing two major sources of phenotypic variance which might otherwise obscure
any genetically determined variance.
Twin and relative-risk studies - is a trait ‘genetic’?
Two key epidemiological methods based on large numbers of sib-pair families are
commonly used for studying complex traits: twin studies, and sibling relative-risk
studies (Risch 1990; Neale and Cardon 1992; Bishop and Williamson 1990). Both are
used to answer the questions of whether and how much variance in a complex trait is
caused by genetic differences between individuals in a given population. The
underlying approach of both is to test for a relationship between the phenotypic
similarity of individuals and their degree of genetic relatedness. Exactly the same logic
applies in genetic linkage analysis, when a test is made for a relationship between the
phenotypic similarity of individuals and their genetic relatedness at a specific
chromosomal locus (Terwilliger and Ott 1994). Consequently some analytical
approaches from epidemiology can be modified to perform linkage analysis, e.g.
DeFries-Fulker regression analysis, see below, and Introduction: Linkage and Linkage
Disequilibrium.
For a monogenic trait, a clear pattern of within-pedigree trait segregation is assumed to
demonstrate a genetic cause accounting for the differences between affected and
unaffected members (Beaudet et al. 1995; Terwilliger and Ott 1994), while unaffected
members act as within-family controls for environmental sources of phenotypic
variance. Nonetheless, it can be difficult to discount an environmental cause for a
monogenic trait if any degree of complexity is involved in its aetiology (Beaudet et al.
1995, Lander and Schork 1995; Waldman et al. 1998). Since separate families or
branches of families may often experience different environments, familial clustering
of a trait does not in itself prove that genetic factors cause trait phenotypic variance.
Neither does a decreasing risk of developing the trait in increasingly genetically
dissimilar relatives of affected individuals (relative-risk, see below). When large
numbers of small, independent families are used in a study, the possibility that
environmental differences rather than genetic differences account for between-family
trait variance becomes particularly plausible (Falconer and Mackay 1996).
Twin studies control for this problem by quantifying trait differences between MZ
versus DZ twins (Neale and Cardon 1992). The genetic invariance of MZ twins allows
a direct estimate of within-twin-pair environmentally determined trait variance to be
made for a given sample, simply by quantifying the variance in phenotypes within MZ
twinships. Variance in within-pair DZ twin phenotypes is then assumed to be
determined by the same within-pair environmental variance component as for MZ
pairs, as well as by the genetic differences between DZ twins whose coefficient of
relationship is 1/2. Thus a direct estimate of broad sense trait heritability can be
calculated for a given sample (i.e. the proportion of genetically determined trait
variance; Falconer and Mackay 1996). Once the within-twin-pair environmental and
genetic variances are accounted for, any residual sample variance must be due to a third
variance component, which describes environmentally determined variance between
twinships. In twin studies, same-sex DZ twins are usually chosen to eliminate within-pair sex-determined variance in order to justify the variance comparison with MZ twins
(Falconer and Mackay 1996). However, the between-twinship variance component
might partly reflect sex determined trait variance if male-male and female-female
pairings are both included in a sample. Likewise, between-pair variance might typically
include the effects of age on the trait, or any other relevant causal covariate.
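
The variance partition described above can be sketched in a few lines. This is a minimal illustration rather than a method taken from the cited sources: it assumes purely additive genetic effects, equal within-pair environmental variance in MZ and DZ pairs, and a total variance estimated from the pooled twin scores. The function names and the toy scores are hypothetical.

```python
def within_pair_mean_square(pairs):
    """Within-pair variance: sum of squared pair differences / (2 * n_pairs)."""
    return sum((x - y) ** 2 for x, y in pairs) / (2 * len(pairs))

def total_variance(pairs):
    """Sample variance of all individual scores, ignoring the pairing."""
    scores = [s for pair in pairs for s in pair]
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)

def twin_variance_components(mz_pairs, dz_pairs):
    """Partition trait variance using MZ and DZ within-pair variances."""
    env_within = within_pair_mean_square(mz_pairs)   # MZ pairs: environment only
    dz_within = within_pair_mean_square(dz_pairs)    # environment + half the genetic variance
    genetic = 2.0 * (dz_within - env_within)
    total = total_variance(mz_pairs + dz_pairs)
    env_between = total - genetic - env_within       # residual, between twinships
    return {
        "genetic": genetic,
        "env_within": env_within,
        "env_between": env_between,
        "total": total,
        "heritability": genetic / total,
    }

# Toy scores, invented purely to exercise the arithmetic.
mz = [(10, 11), (20, 19), (30, 30), (40, 41)]
dz = [(10, 12), (20, 18), (30, 32), (40, 38)]
components = twin_variance_components(mz, dz)
print(components)
```

In small or noisy samples the estimated genetic component can fall below zero or exceed the total variance, so estimates of this kind are only meaningful with large numbers of pairs.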
There are different qualitative and quantitative methods for analysing twin data (Neale
and Cardon 1992). In qualitative terms, the absolute risk that an MZ twin of an
‘affected’ proband has of developing a disorder yields a direct estimate of that trait’s
penetrance for a given environment, and comparing MZ:DZ trait concordances yields
an estimate of qualitative trait heritability. To analyse quantitative traits, maximum-likelihood variance components/factor methods, or else twin-pair regression-based
techniques, can be used. One regression method in particular, called DeFries-Fulker
(DF) regression (DeFries and Fulker 1985) has been used in twin studies of dyslexia,
which is the trait of study in this thesis. Variants of this method have also been used in
dyslexia genetic linkage analysis and in bivariate phenotypic analysis of reading-related
measures (DeFries and Fulker, 1985; DeFries et al. 1987, Olson et al. 1994a). The
essence of this analysis is to designate as probands those individuals in a twin sample
who fall beyond an arbitrary threshold at one tail of a continuous trait distribution, and
then assess the regression of their cotwins’ scores towards the population mean by
fitting this linear regression model:
C = B₁P + B₂R + A
in which C is the cotwin score, P the proband score, A is the regression constant, B₁ adjusts for the average twin
resemblance, R is the overall genetic relationship between the twins (1 or 1/2), and B₂
measures the extent of differential cotwin regression, from which the heritability
estimate is derived, assuming that the probands in MZ and DZ pairings have the same
mean.
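
The regression model above can be fitted by ordinary least squares. The sketch below is a plain-Python illustration with hypothetical function names and invented scores; it fits the three coefficients only and does not reproduce any data transformation or standard-error correction that the cited DF analyses may apply.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_df_regression(cotwin, proband, relatedness):
    """Least-squares fit of C = B1*P + B2*R + A via the normal equations."""
    rows = [(p, r, 1.0) for p, r in zip(proband, relatedness)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * c for row, c in zip(rows, cotwin)) for i in range(3)]
    b1, b2, a = solve_linear_system(xtx, xty)
    return {"B1": b1, "B2": b2, "A": a}

# Invented scores: probands from the low tail, R = 1 for MZ and 0.5 for DZ pairs.
proband = [-2.0, -2.5, -3.0, -2.2, -2.8, -3.5]
relatedness = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]
cotwin = [0.4 * p - 1.2 * r + 0.3 for p, r in zip(proband, relatedness)]
fit = fit_df_regression(cotwin, proband, relatedness)
print(fit)
```

Because the cotwin scores here were generated exactly from known coefficients, the fit recovers them; with real data B₂ would be estimated with sampling error and interpreted under the assumption stated above about MZ and DZ proband means.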
An important consequence of this approach is that the heritability of the proband
deviation from the mean is estimated. The heritability in the population as a whole will
only be the same if the same genetic and environmental factors which influence
proband deviations also determine individual differences in the normal range, which
need not necessarily be the case for complex oligogenic traits in general (although
confirmatory factor analysis in the normal ability range has been carried out in dyslexia
twin studies; Olson et al. 1991). See Introduction: Linkage and Linkage Disequilibrium
for a modification of the DF regression approach for performing quantitative linkage
analysis which derives power from extreme trait-sampling selection.
Linkage mapping studies of complex traits often rely on the logic of twin studies to
show definitively that traits have genetically determined variance components (Weeks
and Lathrop 1995; International Molecular Genetics Study of Autism Consortium
1998; this study, see Introduction: Dyslexia phenotype and previous genetic studies). If
any assumption of the twin model is violated in a particular case, then a false positive
or negative finding of trait heritability might occur (Neale and Cardon 1992; Falconer
and Mackay 1996). In particular, one crucial assumption is that the environmental
variance component is the same in MZ and DZ pairs. This assumption may be violated
when applying simple twin variance models, which typically ignore the possibility of
genotype/environment interactions in determining trait development, or ignore the
possibility that MZ and DZ twin pairs might differ with regard to their in utero
experiences and subsequent parental treatment (Falconer and Mackay 1996). For
example, MZ twins can be of three types according to the arrangement of foetal
membranes, whereas DZ twins are of only one type. Also, genotype/environment
covariance might occur, when different genotypes are selectively raised in different
environmental conditions, and this would affect DZ more than MZ variance (Falconer
and Mackay 1996).
However, these complications notwithstanding, twin studies provide a convenient way
to dissect genetic from environmentally determined phenotypic variance for complex
traits, since the twins are age-, sex-, and family-environment-matched within pairings
(thus reducing several important confounding sources of variance), and the MZ within-pair trait variance is the only human variance which can be modelled with no genetic
component.
For a qualitative trait, an epidemiological parameter called relative risk (λR) can be
defined as the ratio of the observed risk of developing the trait in a given class of
relatives of an affected individual to the risk in the general population (Risch 1990,
Bishop and Williamson 1990). The sibling relative risk (λS) is one such measure. A λS
above 1 suggests a genetic trait-variance component, although environmental sources
of variance are more difficult to discount than in twin studies, especially for
increasingly distant relatives who typically develop in distinct family environments.
The λS measure is usually preferred to other relative comparisons which may be more
influenced by environmental factors or by the effect of age on a trait (Weeks and Lange
1995). For example, either of these influences might impact on estimates of offspring relative risk (λO). Nonetheless, the magnitude of λR will usually be positively related to
the magnitude of any genetic trait variance component, and therefore λR provides a
broad indication of the available power to map susceptibility loci for a complex trait
(Risch 1990). For the purposes of locus mapping, each individual locus involved in an
oligogenic trait can be thought to determine a locus-specific λR which contributes to the
overall trait λR (Risch 1990; Weeks and Lathrop 1995). The way in which the overall
trait λR decreases for increasingly genetically distant classes of relatives can suggest
different models of how oligogenic loci combine to produce the trait; a rapid decrease
in relative-risk with genetic unrelatedness suggests a multiplicative model, whereas a
slower decrease suggests an additive model (Risch 1990), though environmental factors
might again have an impact.
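
The definition of relative risk, and the way locus-specific values might combine, can be written out as a short sketch. The first function follows the definition given above. The second assumes a multiplicative model in which the overall λR is the product of the locus-specific values; this is stated here as an illustrative assumption, and the risks and locus values in the example are invented.

```python
import math

def relative_risk(risk_in_relatives, population_risk):
    """Ratio of the trait risk in a class of relatives to the population risk."""
    if population_risk <= 0:
        raise ValueError("population risk must be positive")
    return risk_in_relatives / population_risk

def combined_lambda_multiplicative(locus_lambdas):
    """Overall relative risk if locus-specific values multiply (assumed model)."""
    return math.prod(locus_lambdas)

# Invented figures: 4% risk in siblings of affected individuals, 1% in the population.
lambda_s = relative_risk(0.04, 0.01)
# Three hypothetical loci with modest individual effects.
overall = combined_lambda_multiplicative([2.0, 1.5, 1.2])
print(lambda_s, overall)
```

Under this assumed model several loci with small individual λ values can account for a substantial overall relative risk, which is one reason why the locus-specific contribution, and not the overall value, governs the power to map any single locus.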
Defining traits
The causal background to any trait depends crucially on the way the trait is defined or
measured (Lander and Schork 1995, Smith 1996; Waldman et al. 1998; Stoltenburg and
Burmeister 2000). For complex traits, ignorance of the underlying genetic model
usually leads to phenotypic heterogeneity, when a particular trait definition is applied
across individuals whose similar phenotypes arise from a variety of heterogeneous
genetic and environmental causes (Lander and Schork 1995; Smith 1996). For this
reason, one potential benefit from dissecting heterogeneous causal models for complex
human disorders is that frameworks for diagnosing or measuring different pathogenic
subtypes might be created, and this in turn could lead to more effectively targeted
methods of intervention.
In general, the more heterogeneous the causes of variance in a trait are, the more
difficult it is to achieve statistical power to identify any of the individual causes using a
sample of a given size (Eaves 1994, Lander and Kruglyak 1995, Risch and Merikangas
1996), and this applies to segregation, twin, adoption, linkage and association studies,
as well as to studies of environmental factors. For this reason, many genetic studies of
human diseases try to use narrowed trait definitions in order to work with more
aetiologically homogeneous samples (see Lander and Schork 1994). This can be
achieved by restricting sample ascertainment parameters such as the age of disease
onset in patients, their clinical severity or clinical sub-type, whether they have a
positive family history for the disorder, by excluding patients with other comorbid
conditions, and also by targeting particular ethnic groups (see Materials and Methods
for the application of these principles in this dyslexia genetic study).
Different approaches to sample ascertainment and phenotypic testing/diagnosis are
invariably used by different researchers in studies of complex traits (Lander and Schork
1995; Smith et al. 1996; Waldman et al. 1998; Stoltenburg and Burmeister 2000), and
this inevitably increases trait phenotypic heterogeneity when considered across studies.
Usually study samples will also differ with regard to their environmental backgrounds.
A typical level of phenotypic heterogeneity has confounded dyslexia genetic research
(Smith 1996, Fisher and Smith 2001). Family samples for dyslexia studies have been
variously ascertained (a) clinically or epidemiologically, (b) from different
countries/continents sometimes using different first languages, (c) from different
socioeconomic groups, (d) with different emphases on trait familiality and the size and
segregation patterns of dyslexia within pedigrees, (e) using various criteria to identify
affected individuals which differed in severity, population frequency, targeted dyslexic
subtype and targeted cognitive abilities, and (f) using different methodological,
phenotypic and analytical approaches in their genetic linkage analyses (see
Introduction: Dyslexia phenotype and previous genetic studies). Such variability means
that it can be difficult for independent studies to replicate genetic findings for complex
human disorders, especially to replicate linkage results (Lander and Kruglyak 1995; see
Introduction: Linkage and Linkage Disequilibrium), simply because the causal
background for the trait in any two samples might be very different (Stoltenburg and
Burmeister 2000).
Trait definitions: qualitative or quantitative?
Traits can be defined qualitatively (categorically) or quantitatively. The principal
advantage of a qualitative definition is that it often simplifies analysis (Weeks and
Lathrop 1995), since the trait distribution becomes dichotomous (‘affected’ or ‘not
affected’ in the case of human disorders). Categorical diagnosis is ideally suited to the
study of monogenic diseases for which the difference in phenotypic states between
affected and unaffected individuals stands out above the normal variance in disease-related traits. However, categorical diagnosis is also useful when quantitative
phenotypic measures have not been or cannot be developed which reliably distinguish
between affected and unaffected individuals, which might otherwise be done, for
example, by using some threshold score. This is often a difficulty in the study of
complex traits, and sometimes too for monogenic traits (International Molecular
Genetics Study of Autism Consortium 1998, Fagerheim et al. 2000).
For example, attention deficit/hyperactivity disorder (ADHD) in children, an
aetiologically complex behavioural trait (Gillis et al. 1992; Smalley 1997), is typically
measured using parental or teacher questionnaires together with diagnostic interviews
(APA 1994). These tests yield semi-quantitative measures of the severity of the
disorder by counting the number of ADHD-related qualitative symptoms which a child
shows. Clinicians typically use these data as a guide to making a categorical diagnosis of
ADHD, but they do not necessarily use a strict cut-off score on any one test to make the
diagnosis (Smalley 1997). In such cases the judgement of a clinician, if used well, is
effectively allowing for the fact that complex traits are often multivariate in nature
(Lander and Schork 1994; Weeks and Lathrop 1995; Smith et al. 1996), i.e. they can be
characterised by a range of different phenotypic changes. Some or all of these changes
might have different heterogeneous causes and may or may not be found in all affected
individuals, and for some or all there might be no quantitative measure available.
When a quantitative phenotypic measure has been devised which yields a continuous
frequency distribution with useful properties in study samples (e.g. approximate
normality and a single mode), when the measure correlates highly with qualitative
diagnostic calls made by clinicians, and also if the measure is hypothesised to identify
an important causal phenotype related to a disorder, then there are several benefits of
using the quantitative measure directly in phenotypic or genetic analysis (Lander and
Schork 1994; Cardon et al. 1994, 1995; Weeks and Lathrop 1995; Daniels et al. 1996;
Fisher et al. 1999). First, by collapsing quantitative measures into dichotomous
diagnostic schemes using a threshold score, much inherent information in quantitative
scales gets thrown away, and therefore statistical power can be lost in many, though not
all, cases (see Introduction: Linkage and Linkage Disequilibrium). Second, the choice
of a threshold score to define affected versus non-affected individuals will inevitably be
arbitrary for a continuous unimodal trait, not reflecting any real property of the data.
Quantitative measures naturally provide a fuller description of continuous phenotypes,
and are especially applicable to traits with oligogenic backgrounds which inevitably
have continuous, or at least not dichotomous, trait variance (Fisher 1918; Falconer and
Mackay 1996).
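The information lost by imposing a threshold can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not a method used in this study: it assumes a normally distributed trait influenced by a single hypothetical additive genetic score, and the names `simulate`, `effect` and `threshold` are invented for illustration. It compares how strongly the genetic score correlates with the quantitative trait and with an 'affected'/'not affected' status derived from the same trait.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, effect=0.3, threshold=1.0, seed=1):
    """Return (r_quantitative, r_dichotomous) for one simulated sample.

    Assumptions (hypothetical, for illustration only):
    - the genetic score and the trait are both standard normal;
    - `effect` is the correlation between genetic score and trait;
    - individuals scoring above `threshold` are called 'affected'.
    """
    rng = random.Random(seed)
    residual_sd = (1 - effect ** 2) ** 0.5
    genotype = [rng.gauss(0, 1) for _ in range(n)]
    trait = [effect * g + rng.gauss(0, residual_sd) for g in genotype]
    # Collapse the continuous trait into a dichotomous diagnosis.
    affected = [1.0 if t > threshold else 0.0 for t in trait]
    return pearson(genotype, trait), pearson(genotype, affected)


r_quant, r_dich = simulate()
print("correlation with quantitative trait:", round(r_quant, 3))
print("correlation with dichotomous status:", round(r_dich, 3))
```

Under these assumptions the dichotomised status is expected to correlate less strongly with the genetic score than the quantitative trait does, which is one way of seeing why power can be lost. The size of the loss depends on the chosen threshold and on the true trait distribution, so the sketch should not be read as a general result.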
Third, the use of a suite of quantitative measures provides flexibility to explore the
multivariate nature of complex disorders, for example through correlation and factor
analyses (see Chapter 2). Avoiding arbitrarily imposed cut-offs and categorical
diagnoses can actually simplify analyses at a multivariate level, both practically and
conceptually. Mathematical tools for performing multivariate linkage or association
analysis which can simultaneously model oligogenic and epistatic effects on several
disorder-related quantitative measures are currently being developed (Lon Cardon and
Angela Marlow, Personal Communication). This approach promises to allow very
sophisticated descriptions of variance relationships among quantitative measures
related to human diseases, for example the extent to which variances in measures share
the same or different genetic backgrounds, as well as where the determining loci are
located in the genome. The ultimate description of any trait is quantitative and
multivariate, and not only for diseases, but for all human variation across its entire
range.
A fourth potential benefit of quantitative measures is that they can often be aimed at
narrowly defined aspects of complex multivariate traits, and can therefore have
relatively homogeneous causal backgrounds compared to the globally defined trait, so
facilitating the search for underlying causal factors (Weeks and Lathrop 1995; see
Introduction: Dyslexia phenotype and previous genetic studies). However, in principle,
quantitative measures need not be any less phenotypically heterogeneous than
qualitative trait definitions.
Finally, for genetic studies, a major requirement for a quantitative phenotype is
obviously that it should show a proportion of heritable (genetically determined)
variance, from which statistical power to identify genetic factors is both limited and
derived (Lander and Schork 1994; Lander and Kruglyak 1995; Weeks and Lathrop
1995).
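As an illustration of what a 'proportion of heritable variance' means in practice, the sketch below simulates twin pairs and recovers the heritable proportion from the difference between identical (MZ) and fraternal (DZ) twin correlations, using Falconer's classical estimate, twice the difference between the two correlations. This is a hedged example rather than an analysis performed in this study: it assumes purely additive genetic effects, no shared environment and no gene-environment interaction, and the function and parameter names (`simulate_pairs`, `h2`, `shared_fraction`) are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_pairs(n_pairs, h2, shared_fraction, rng):
    """Simulate trait scores for twin pairs.

    h2              -- assumed true proportion of heritable variance
    shared_fraction -- genetic correlation within a pair
                       (1.0 for MZ twins, 0.5 for DZ twins)
    Each trait has unit variance: h2 genetic plus (1 - h2) environmental.
    """
    first, second = [], []
    g_shared = shared_fraction ** 0.5
    g_unique = (1 - shared_fraction) ** 0.5
    for _ in range(n_pairs):
        shared = rng.gauss(0, 1)
        # Genetic values with correlation equal to shared_fraction.
        a1 = g_shared * shared + g_unique * rng.gauss(0, 1)
        a2 = g_shared * shared + g_unique * rng.gauss(0, 1)
        # Independent (non-shared) environmental deviations.
        first.append(h2 ** 0.5 * a1 + (1 - h2) ** 0.5 * rng.gauss(0, 1))
        second.append(h2 ** 0.5 * a2 + (1 - h2) ** 0.5 * rng.gauss(0, 1))
    return first, second


rng = random.Random(7)
true_h2 = 0.5
r_mz = pearson(*simulate_pairs(20000, true_h2, 1.0, rng))
r_dz = pearson(*simulate_pairs(20000, true_h2, 0.5, rng))

# Falconer's estimate: twice the MZ-DZ difference in twin correlations.
h2_estimate = 2 * (r_mz - r_dz)
print("MZ correlation:", round(r_mz, 3))
print("DZ correlation:", round(r_dz, 3))
print("estimated heritable proportion:", round(h2_estimate, 3))
```

When the simulated heritable proportion is set lower, the MZ and DZ correlations move closer together and there is correspondingly less genetic signal for a linkage or association study to detect.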