Introduction: Complex Traits

Summary

Genotype-phenotype relationships for aetiologically complex traits can be complicated by incomplete allelic penetrances, genetic heterogeneity, and oligogenic inheritance. Complex traits therefore do not segregate within extended pedigrees in a straightforward Mendelian fashion. As a consequence, complex trait aetiologies are best studied using large samples (for statistical power) of small families, using analytical methods which make minimal assumptions about underlying genetic models. This avoids drawing conclusions which are biased by misspecified parameters. Before any attempt to map genetic susceptibility loci for a complex trait, it is worth applying non-parametric epidemiological methods to large samples of small families (twin studies, sibling relative-risk studies) to gauge the extent to which genetic factors determine the phenotypic variance of interest. The manner in which a trait is defined is crucial to the outcome of any study of its underlying causes. Most complex traits are phenotypically heterogeneous, and results can therefore be difficult to replicate between complex-trait studies. Quantitative trait definitions can offer advantages over qualitative definitions for aetiologically complex traits, which are usually continuous and multivariate in nature.

Monogenic and complex traits

No trait of an individual organism is determined by either genetic or environmental causes alone. However, for a population of organisms, it is possible to conceive of distinct components of trait variance determined by genetic versus environmental differences between individuals (Hill 1984). Furthermore, traits can be imagined to fall on a continuous spectrum defined by the simplicity of causal mapping from genetic differences between individuals to population phenotypic differences (Fisher 1918, Lander and Schork 1994, Beaudet et al. 1995).
So-called ‘monogenic’ traits are those at the extreme end of the spectrum, for which a single genetic change causes a marked phenotype which is distinct from the phenotype of individuals not having that change, so that the different genetic and phenotypic states co-segregate closely in extended pedigrees (e.g. the human diseases cystic fibrosis, Duchenne muscular dystrophy and Tay-Sachs disease; reviewed in Beaudet et al. 1995). For monogenic disorders, the genotype-phenotype relationship is robust to other genetic and environmental differences between individuals provided those other differences fall within a circumscribed range. Nevertheless, aetiological complexity affects most so-called monogenic traits, since environmental and genetic differences additional to the major single-gene effect usually influence the final developmental outcome. For example, multiple genetic factors modify the predisposing effect of certain β-globin alleles to sickle cell anaemia (Dover et al. 1992; Their et al. 1994). Mutations in the gene CFTR produce different cystic fibrosis phenotypes depending on their specific chromosomal backgrounds (Kiesewetter et al. 1993).

For complex multifactorial human traits like diabetes, autism and dyslexia, many genetic and environmental sources of phenotypic variance must combine in the right way to cause the trait to develop to any particular state (Lander and Schork 1994; Davies et al. 1994, Bailey et al. 1995, Smith et al. 1996). It is uncommon in the study of these conditions to find extended pedigrees showing patterns of trait inheritance which are compatible with classical Mendelian dominant, recessive, or X-linked single-gene transmission.
Instead, the mapping of genotype to phenotype in aetiologically complex disorders must be complicated by some or all of the following possibilities, of which the first three are probably important in most cases (Lander and Schork 1994):

Incomplete penetrance: when a predisposing allele does not cause the development of a certain phenotypic state in some individuals but does in others, because the phenotype is modified by other background genetic and environmental influences (e.g. in sickle cell anaemia, Dover et al. 1992).

Phenocopy: when a trait develops to a certain state in individuals who do not have a specified predisposing allele, because different heterogeneous genetic and environmental factors can also cause the same or similar trait development, for example when mutations in different genes affect one particular biochemical pathway or developmental process. Mutations in a wide range of autosomal and X-linked genes cause retinitis pigmentosa, a disorder involving retinal degeneration (Vervoort et al. 2000).

Oligogenic inheritance: for many human traits, different phenotypic states are probably associated with different inherited combinations of polymorphisms at several loci (Fisher 1918, Falconer and Mackay 1996). Segregation studies of common aetiologically complex traits in humans (see below) consistently provide evidence for this model (e.g. Lubs et al. 1993, Pennington et al. 1991). Oligogenic loci may also interact with one another during trait development (epistasis), so that individual-locus effects on trait development may not be purely additive (Hodge 1981). The successful dissection of oligogenic effects in humans has so far been largely confined to modifier loci in studies of monogenic traits (Lander and Schork 1994). Sickle cell anaemia provides a classic example, where the β-globin determining effect is altered by at least two other loci, one autosomal and one X-linked (Dover et al. 1992; Their et al. 1994).
In terms of complex human disorders, the supposed model is that oligogenic polymorphisms must be transmitted to individuals in the right combinations to bias their trait development towards some diagnostic threshold (Falconer 1981), at least if a qualitative definition of the disorder is used (see below).

Imprinting: parent-of-origin effects on gene activity mediated by locus imprinting can complicate genotype-phenotype relationships (Pardo-Manual de Villena et al. 2000). Cases of uniparental disomy have exposed the imprinted loci which cause Prader-Willi or Angelman syndrome, depending on whether the paternal or maternal copy of a region on chromosome 15 is deleted in patients (Fulmer-Smentek et al. 2001). Also, locus imprinting may underlie some sex differences in cognitive functions with genetic components located on the X chromosome (Skuse et al. 1997).

Other exceptions to Mendelian genotype-phenotype co-segregation: a few human diseases are caused by unstably transmitted trinucleotide repeat expansions, for which the degree of expansion can be related to the disease severity (fragile X syndrome, myotonic dystrophy, Huntington disease, spinobulbar muscular atrophy and spinocerebellar ataxia type 1; Sutherland et al. 1992; reviewed in Beaudet et al. 1994). Transmission distortion has been demonstrated for at least one locus in the mouse, caused by a process known as meiotic drive (Silver et al. 1993), and similar processes may occur in humans. Also, selective forces operating against alleles early in development might result in a seemingly distorted pattern of inheritance post-selection.

Large-scale chromosome abnormalities like translocations, deletions or aneuploidy typically give rise to complex multiple phenotypes which depend on the precise locations of chromosomal breakpoints and/or the extent of missing genetic material (Borgaonkar 1994).
Chromosome abnormalities are often transmitted in a Mendelian fashion, but phenotypes can differ in carriers of the same abnormality (e.g. Nopola-Hemmi et al. 2000, see Introduction: Dyslexia phenotype and previous genetic studies), again illustrating the importance of background oligogenic modifying effects. Chromosome abnormalities often illustrate the importance of ‘position effects’, where genes not directly disrupted by the abnormality nevertheless have their activities altered by a change in their chromosomal proximity to other controlling genetic loci (Kleinjan and van Heyningen 1998). Finally, genetic disease transmission also takes place in eukaryotic organisms via the inheritance of mitochondrial DNA (Wallace 1994).

By definition, monogenic diseases are those for which phenotypic differences determined by a single genetic locus stand out disproportionately above the normal ranges of variance in traits related to the disease (Beaudet et al. 1995). The larger the phenotypic effect that a new mutation has, the more likely it is that the effect will be detrimental, since traits will usually develop away from pre-evolved optimal states (Dawkins 1986). Also, the magnitudes of non-adaptive single-locus effects on trait development will determine the strengths of selection which operate against their corresponding mutations (Haldane 1924, 1958; Goodenough 1984). A broad correlation therefore exists for human genetic diseases between severity, ‘monogenicity’, and low disease frequency in the population (Beaudet et al. 1995). The total frequency of monogenic disorders in human populations is ≈1%, with the frequencies of individual disorders as low as 1 in 50,000 (Beaudet et al. 1995). In contrast, up to 60% of people are estimated to suffer from disorders with multifactorial genetic backgrounds, especially when late-onset diseases are included (Baird et al. 1988).
This partly reflects the fact that selective pressures are lower on genetic polymorphisms which have relatively modest individual phenotypic effects (although several polymorphisms may combine to produce severe oligogenic phenotypes), and selective pressures are also lower on polymorphisms which exert their detrimental effects later in life (Charlesworth 1980). In addition, it is possible that polymorphisms which contribute to the development of complex disorders might become fixed in a population by having beneficial effects on some traits and modest detrimental effects on others. This latter model might be especially applicable in life-history contexts, when youthful advantages might be conferred by a polymorphism at the expense of health problems later. Finally, positive or negative selection pressures on genetic loci are influenced by locus-specific allelic interactions and allelic frequencies (Hartl and Clark 1997). For example, cystic fibrosis is a severe autosomal recessive disease with a high penetrance, but susceptibility alleles are surprisingly frequent in the population (1 in 20-25 Caucasians is heterozygous for a disease allele), so there might have been an unknown selective advantage associated with disease alleles in the heterozygous state in Caucasian populations (reviewed in Beaudet et al. 1995). In sickle cell anaemia, heterozygotes have increased resistance to falciparum malaria (Ridley 1993).

In summary, all genetic analysis of phenotypic traits is fundamentally aimed at describing a statistical relationship between the genetic differences and phenotypic differences that exist in a population of organisms, and since it is not usually possible to account for all sources of phenotypic variance in genetic studies, genotype-phenotype relationships are normally described in probabilistic terms (Terwiliger and Ott 1994).
Also, the designation of a trait as monogenic or genetically complex can only be made in the context of a particular population with regard to its unique environmental and broader genetic background.

Genetic epidemiology

The way in which a trait is distributed within families or populations yields strong clues to the complexity of its aetiology. For monogenic diseases, patterns of trait segregation within pedigrees are compatible with autosomal dominant, recessive or X-linked transmission of a single susceptibility locus with a high penetrance (Ott 1991; Terwiliger and Ott 1994; Beaudet et al. 1995). In general, the more generations and matings within an affected pedigree, the more clearly the pattern of trait segregation supports one genetic model while excluding others. For example, the transmission of a trait with equal likelihood from both parental sexes, or the observation of male-to-male trait transmission, both suggest that the determining locus is autosomal and not X-linked. The appearance of a trait in children of unaffected parents, with about 1 in 4 of those children affected, suggests an autosomal recessive model. This is particularly true against a background of consanguineous mating, which increases the chance that a recessive disease-causing allele will be inherited from both parents. In contrast, the transmission of a trait from a single affected parent to half of their children suggests an autosomal dominant model, as does the absence of the trait in children of unaffected individuals within an affected pedigree. These patterns stem from basic Mendelian principles of gene-transmission and allelic interaction (Goodenough 1984), but even for monogenic diseases they can be obscured by reduced allelic penetrances, the presence of phenocopies and genetic heterogeneity, and by oligogenic inheritance (see above).
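The expected segregation ratios quoted above (1 in 4 for an autosomal recessive trait, 1 in 2 for an autosomal dominant trait) follow from enumerating the equally likely allele combinations that two parents can transmit. The sketch below is illustrative only. It assumes a single autosomal locus with complete penetrance and no phenocopies, and the function names and the genotype strings (A for the normal allele, a for the disease allele) are hypothetical conventions chosen for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for one autosomal locus.

    Each parent is a two-character string such as "Aa"; each of the
    four allele pairings is taken to be equally likely.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def affected_fraction(parent1, parent2, is_affected):
    """Expected fraction of affected children under a penetrance-1 model."""
    probabilities = offspring_genotypes(parent1, parent2)
    return sum(
        (p for genotype, p in probabilities.items() if is_affected(genotype)),
        Fraction(0),
    )


def recessive(genotype):
    # Assumed model: affected only when homozygous for the disease allele.
    return genotype == "aa"


def dominant(genotype):
    # Assumed model: one copy of the disease allele is sufficient.
    return "a" in genotype


# Two unaffected carrier parents, recessive model.
recessive_ratio = affected_fraction("Aa", "Aa", recessive)
# One affected heterozygous parent, dominant model.
dominant_ratio = affected_fraction("Aa", "AA", dominant)
```

Exact fractions are used rather than floating-point numbers so that the ratios can be compared directly with the 1 in 4 and 1 in 2 expectations. Reduced penetrance, phenocopies and X-linkage are deliberately left out.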
Also, when high frequencies of trait-susceptibility alleles exist in the general population, the underlying pattern of trait segregation might be further concealed by the presence within a single pedigree of multiple independent copies of these alleles (Lander and Schork 1994).

The possibility of new mutations arising within pedigrees must also be considered. Roughly 5 × 10⁻⁶ mutations arise per gene per generation in humans, so, with two copies of each autosomal gene per individual, the chance of observing a new mutation in any individual at a given locus is about 1 in 100,000 (Beaudet et al. 1995). However, some genetic conditions can lead to higher somatic genome-wide mutation rates (Millar et al. 1999).

For monogenic disorders, segregation analysis uses observed patterns of trait inheritance within extended pedigrees and within their source populations to fit a parametric inheritance model which is based on a major single-locus effect with minor modifier effects (Lalouel et al. 1983; Newman et al. 1988; Terwiliger and Ott 1994). Direct estimates are made of parameters such as the mode of transmission, disease allele frequency, allelic penetrances (heterozygous, homozygous, and often sex-specific as well), and the phenocopy rate, which can indicate the degree of genetic heterogeneity within the sample pedigrees. Often segregation models are fitted using likelihood techniques, which calculate the probability of observing a given set of affected pedigree data under various causal hypotheses in a manner similar to monogenic linkage analysis (Lalouel et al. 1983; Ott 1991; Terwiliger and Ott 1994; see Introduction: Linkage and Linkage Disequilibrium). For example, for a straightforward autosomal dominant disease, determined only by a normal allele A and disease allele a, the maximum likelihood penetrance estimates of the genotypes AA, Aa and aa will be 0, 1 and 1, the estimated phenocopy rate will be 0, and the frequency estimate of a will be low if the affected pedigrees show only single-copy transmission.
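The idea of a maximum likelihood penetrance estimate can be illustrated with a deliberately simplified case. The sketch below is not a pedigree-based segregation analysis: it assumes that the genotypes of a set of unrelated individuals are already known, treats the number of affected individuals with a given genotype as a binomial outcome, and finds the penetrance value that maximises that binomial likelihood over a grid. The function names and the counts are hypothetical.

```python
from math import comb


def binomial_likelihood(penetrance, affected, carriers):
    """Probability of seeing `affected` cases among `carriers` individuals
    of one genotype, if each is affected independently with probability
    `penetrance`."""
    return (
        comb(carriers, affected)
        * penetrance ** affected
        * (1 - penetrance) ** (carriers - affected)
    )


def ml_penetrance(affected, carriers, grid_size=1001):
    """Grid search for the penetrance value with the highest likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda f: binomial_likelihood(f, affected, carriers))


# Hypothetical counts for a fully penetrant dominant disease:
# no AA individuals affected, all Aa individuals affected.
estimate_AA = ml_penetrance(affected=0, carriers=10)
estimate_Aa = ml_penetrance(affected=10, carriers=10)
# Hypothetical counts for a reduced-penetrance allele.
estimate_reduced = ml_penetrance(affected=7, carriers=10)
```

With complete co-segregation the estimates sit at the boundaries 0 and 1, as in the autosomal dominant example in the text, whereas 7 affected out of 10 gives an interior estimate equal to the observed proportion.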
When a trait is purely monogenic, the likelihood of the simplest monogenic model usually stands out far above the likelihoods of other, competing hypotheses (Ott 1990). In contrast, when a monogenic effect does not clearly dominate an observed pattern of trait transmission, additional modifying effects can often be accommodated through the adjustment of one or several different parameters to yield similar likelihood estimates, but with no rationale for choosing between models (Ott 1990). For complex disorders, then, for which there are by definition no major single-gene effects in most affected pedigrees, it is usually not possible to distinguish between competing genetic hypotheses using model-fitting techniques (Ott 1990; Lander and Schork 1995).

This is a reflection of a broad general principle in statistics, where the choice of parametric versus non-parametric analytical methods depends on how many specific assumptions can be made about properties of the data being analysed. When assumptions are made correctly, their recognition in an analysis usually provides more statistical power than if they are not modelled, since in general fewer properties of the data will remain unexplained by the particular relation being tested for. However, when assumptions are violated, their inclusion in an analysis can have unpredictable effects which will often increase the rate of type I (false positive) or type II (false negative) error (Samuels 1991). For this reason, the study of complex trait genetics is most successfully performed using non-parametric methods, which sacrifice a degree of statistical power in return for making fewer assumptions (but never no assumptions) about underlying causal models (Risch 1990, Lander and Schork 1994, Weeks and Lathrop 1995). One general consequence of this approach is that power must instead be derived from large sample sizes in complex trait genetics (Weeks and Lathrop 1995).
The methodological approaches described in this thesis for mapping loci involved in dyslexia susceptibility will wholly reflect these basic statistical principles, which in essence recognise our very limited knowledge of the phenotype we are studying.

Sometimes a major-gene subtype of a complex disorder might segregate within individual pedigrees in a monogenic way (e.g. Fagerheim et al. 1999; see Introduction: Dyslexia phenotype and previous genetic studies), but generally the segregation of complex disorders within families is not straightforward by definition, so large extended pedigrees do not generally provide ideal samples for complex trait studies (Lander and Schork 1994, Weeks and Lathrop 1995). Therefore, instead of using small numbers of large pedigrees, genetic studies of complex traits usually rely on large numbers of small pedigrees (Risch 1990; Davies et al. 1994, Weeks and Lathrop 1995; Daniels et al. 1996, International Molecular Genetics Study of Autism Consortium 1998), for three main reasons. First, large numbers of pedigrees are needed for statistical power, and smaller pedigrees are relatively easy to find and collect. Second, small pedigrees provide independent contributions to the dataset while reflecting the disorder as manifested in a more general population setting, which for common complex traits is usually more interesting than the identification of rare monogenic forms. Third, in the case of nuclear sib-pair families, the individuals manifesting the trait can be selected to have similar ages and within-family environments, thus curtailing two major sources of phenotypic variance which might otherwise obscure any genetically determined variance.

Twin and relative-risk studies - is a trait ‘genetic’?
Two key epidemiological methods based on large numbers of sib-pair families are commonly used for studying complex traits: twin studies, and sibling relative-risk studies (Risch 1990; Neale and Cardon 1992; Bishop and Williamson 1990). Both are used to answer the questions of whether, and how much, variance in a complex trait is caused by genetic differences between individuals in a given population. The underlying approach of both is to test for a relationship between the phenotypic similarity of individuals and their degree of genetic relatedness. Exactly the same logic applies in genetic linkage analysis, when a test is made for a relationship between the phenotypic similarity of individuals and their genetic relatedness at a specific chromosomal locus (Terwiliger and Ott 1994). Consequently some analytical approaches from epidemiology can be modified to perform linkage analysis, e.g. DeFries-Fulker regression analysis (see below, and Introduction: Linkage and Linkage Disequilibrium).

For a monogenic trait, a clear pattern of within-pedigree trait segregation is assumed to demonstrate a genetic cause accounting for the differences between affected and unaffected members (Beaudet et al. 1995; Terwiliger and Ott 1994), while unaffected members act as within-family controls for environmental sources of phenotypic variance. Nonetheless, it can be difficult to discount an environmental cause for a monogenic trait if any degree of complexity is involved in its aetiology (Beaudet et al. 1995, Lander and Schork 1995; Waldman et al. 1998). Since separate families or branches of families may often experience different environments, familial clustering of a trait does not in itself prove that genetic factors cause trait phenotypic variance. Neither does a decreasing risk of developing the trait in increasingly genetically dissimilar relatives of affected individuals (relative-risk, see below).
When large numbers of small, independent families are used in a study, the possibility that environmental differences rather than genetic differences account for between-family trait variance becomes particularly plausible (Falconer and Mackay 1996).

Twin studies control for this problem by quantifying trait differences between MZ versus DZ twins (Neale and Cardon 1992). The genetic invariance of MZ twins allows a direct estimate of within-twin-pair environmentally determined trait variance to be made for a given sample, simply by quantifying the variance in phenotypes within MZ twinships. Variance in within-pair DZ twin phenotypes is then assumed to be determined by the same within-pair environmental variance component as for MZ pairs, as well as by the genetic differences between DZ twins, whose coefficient of relationship is 1/2. Thus a direct estimate of broad-sense trait heritability can be calculated for a given sample (i.e. the proportion of genetically determined trait variance; Falconer and Mackay 1996). Once the within-twin-pair environmental and genetic variances are accounted for, any residual sample variance must be due to a third variance component, which describes environmentally determined variance between twinships. In twin studies, same-sex DZ twins are usually chosen to eliminate within-pair sex-determined variance in order to justify the variance comparison with MZ twins (Falconer and Mackay 1996). However, the between-twinship variance component might partly reflect sex-determined trait variance if male-male and female-female pairings are both included in a sample. Likewise, between-pair variance might typically include the effects of age on the trait, or any other relevant causal covariate. There are different qualitative and quantitative methods for analysing twin data (Neale and Cardon 1992).
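The variance comparison described above can be sketched numerically. The example below is a simplified illustration rather than a standard twin-analysis package. It assumes purely additive genetic effects, so that DZ within-pair variance exceeds MZ within-pair variance by half of the genetic variance, and it assumes equal within-pair environmental variance in MZ and DZ pairs. The function names and the trait scores are hypothetical.

```python
import numpy as np


def within_pair_variance(pairs):
    """Mean within-pair variance. For a pair (x1, x2) the unbiased
    within-pair variance estimate is (x1 - x2)**2 / 2."""
    pairs = np.asarray(pairs, dtype=float)
    return float(np.mean((pairs[:, 0] - pairs[:, 1]) ** 2 / 2.0))


def twin_heritability(mz_pairs, dz_pairs):
    """Return (W_MZ, W_DZ, V_G, h2) under the stated assumptions."""
    w_mz = within_pair_variance(mz_pairs)  # within-pair environment only
    w_dz = within_pair_variance(dz_pairs)  # environment + half of V_G
    v_genetic = 2.0 * (w_dz - w_mz)
    # Total phenotypic variance is estimated here from all individuals pooled.
    scores = np.concatenate([np.ravel(mz_pairs), np.ravel(dz_pairs)])
    v_total = float(np.var(scores.astype(float), ddof=1))
    return w_mz, w_dz, v_genetic, v_genetic / v_total


# Hypothetical trait scores for four MZ and four DZ twin pairs.
mz = [(1, 2), (11, 10), (20, 21), (31, 30)]
dz = [(0, 5), (12, 7), (18, 23), (32, 27)]
w_mz, w_dz, v_g, h2 = twin_heritability(mz, dz)
```

In this toy sample most of the variance lies between pairs, so the heritability estimate is modest even though DZ cotwins differ far more than MZ cotwins. With real data the estimate is sensitive to the assumptions listed in the lead-in.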
In qualitative terms, the absolute risk that an MZ twin of an ‘affected’ proband has of developing a disorder yields a direct estimate of that trait’s penetrance for a given environment, and comparing MZ:DZ trait concordances yields an estimate of qualitative trait heritability. To analyse quantitative traits, maximum-likelihood variance components/factor methods, or else twin-pair regression-based techniques, can be used. One regression method in particular, called DeFries-Fulker (DF) regression (DeFries and Fulker 1985), has been used in twin studies of dyslexia, which is the trait of study in this thesis. Variants of this method have also been used in dyslexia genetic linkage analysis and in bivariate phenotypic analysis of reading-related measures (DeFries and Fulker 1985; DeFries et al. 1987, Olson et al. 1994a). The essence of this analysis is to designate as probands those individuals in a twin sample who fall beyond an arbitrary threshold at one tail of a continuous trait distribution, and then assess the regression of their cotwins’ scores towards the population mean by fitting this linear regression model:

$$C = B_1 P + B_2 R + A$$

in which $C$ is the cotwin score, $P$ the proband score, $B_1$ adjusts for the average twin resemblance, $R$ is the overall genetic relationship between the twins (1 or 1/2), $B_2$ measures the extent of differential cotwin regression, from which the heritability estimate is derived, assuming that the probands in MZ and DZ pairings have the same mean, and $A$ is the regression constant. An important consequence of this approach is that the heritability of the proband deviation from the mean is estimated.
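The regression model above can be fitted by ordinary least squares. The sketch below shows only that fitting step, using NumPy. The scores are hypothetical and were generated without noise from arbitrarily chosen coefficients, so that the fit can be checked against known values. The transformation of scores from which the heritability estimate is derived is not shown, and the function name `df_regression` is an assumption of this example.

```python
import numpy as np


def df_regression(cotwin, proband, relatedness):
    """Least-squares fit of C = B1*P + B2*R + A; returns (B1, B2, A)."""
    cotwin = np.asarray(cotwin, dtype=float)
    proband = np.asarray(proband, dtype=float)
    relatedness = np.asarray(relatedness, dtype=float)
    # Design matrix columns: proband score, relatedness, constant term.
    design = np.column_stack([proband, relatedness, np.ones_like(proband)])
    coefficients, *_ = np.linalg.lstsq(design, cotwin, rcond=None)
    b1, b2, a = coefficients
    return float(b1), float(b2), float(a)


# Hypothetical standardised scores: three MZ pairs (R = 1) followed by
# three DZ pairs (R = 1/2), with probands selected from the low tail.
proband = np.array([-2.0, -2.5, -3.0, -2.2, -2.8, -3.1])
relatedness = np.array([1.0, 1.0, 1.0, 0.5, 0.5, 0.5])
# Noise-free cotwin scores built from arbitrary illustrative coefficients.
cotwin = 0.4 * proband - 1.0 * relatedness + 0.2

b1, b2, a = df_regression(cotwin, proband, relatedness)
```

Because the example data are noise-free, the fit returns the generating coefficients. A negative $B_2$ here simply reflects MZ cotwins lying further into the low tail than DZ cotwins on the raw score scale.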
The heritability in the population as a whole will only be the same if the same genetic and environmental factors which influence proband deviations also determine individual differences in the normal range, which need not necessarily be the case for complex oligogenic traits in general (although confirmatory factor analysis in the normal ability range has been carried out in dyslexia twin studies; Olson et al. 1991). See Introduction: Linkage and Linkage Disequilibrium for a modification of the DF regression approach for performing quantitative linkage analysis which derives power from extreme trait-sampling selection.

Linkage mapping studies of complex traits often rely on the logic of twin studies to show definitively that traits have genetically determined variance components (Weeks and Lathrop 1995; International Molecular Genetics Study of Autism Consortium 1998; this study, see Introduction: Dyslexia phenotype and previous genetic studies). If any assumption of the twin model is violated in a particular case, then a false positive or negative finding of trait heritability might occur (Neale and Cardon 1992; Falconer and Mackay 1996). In particular, one crucial assumption is that the environmental variance component is the same in MZ and DZ pairs. This assumption may be violated when applying simple twin variance models, which typically ignore the possibility of genotype/environment interactions in determining trait development, or ignore the possibility that MZ and DZ twin pairs might differ with regard to their in utero experiences and subsequent parental treatment (Falconer and Mackay 1996). For example, MZ twins can be of three types according to the arrangement of foetal membranes, whereas DZ twins are of only one type.
Also, genotype/environment covariance might occur, when different genotypes are selectively raised in different environmental conditions, and this would affect DZ more than MZ variance (Falconer and Mackay 1996). However, these complications notwithstanding, twin studies provide a convenient way to dissect genetic from environmentally determined phenotypic variance for complex traits, since the twins are age-, sex-, and family-environment-matched within pairings (thus reducing several important confounding sources of variance), and the MZ within-pair trait variance is the only human variance which can be modelled with no genetic component.

For a qualitative trait, an epidemiological parameter called relative risk (λR) can be defined as the ratio of the observed risk of developing the trait in a given class of relatives of an affected individual to the risk in the general population (Risch 1990, Bishop and Williamson 1990). The sibling relative risk (λS) is one such measure. A λS above 1 suggests a genetic trait-variance component, although environmental sources of variance are more difficult to discount than in twin studies, especially for increasingly distant relatives who typically develop in distinct family environments. The λS measure is usually preferred to other relative comparisons which may be more influenced by environmental factors or by the effect of age on a trait (Weeks and Lange 1995). For example, either of these influences might impact on estimates of offspring relative risk (λO). Nonetheless, the magnitude of λR will usually be positively related to the magnitude of any genetic trait variance component, and therefore λR provides a broad indication of the available power to map susceptibility loci for a complex trait (Risch 1990).
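The definition of λR given above amounts to a simple ratio, sketched below. The counts and the population prevalence are invented for illustration, and the function name `relative_risk` is hypothetical. Ascertainment corrections and confidence intervals, which a real study would need, are not included.

```python
def relative_risk(affected_relatives, total_relatives, population_risk):
    """Risk in a class of relatives of affected individuals divided by
    the risk in the general population."""
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0 < population_risk <= 1:
        raise ValueError("population_risk must lie in (0, 1]")
    risk_in_relatives = affected_relatives / total_relatives
    return risk_in_relatives / population_risk


# Hypothetical example: 30 of 100 siblings of probands are affected,
# against an assumed population prevalence of 5%.
lambda_s = relative_risk(30, 100, 0.05)
```

In this invented example the sibling risk is six times the population risk. As the text notes, a value above 1 is suggestive of, but does not prove, a genetic contribution.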
For the purposes of locus mapping, each individual locus involved in an oligogenic trait can be thought to determine a locus-specific λR which contributes to the overall trait λR (Risch 1990; Weeks and Lathrop 1995). The way in which the overall trait λR decreases for increasingly genetically distant classes of relatives can suggest different models of how oligogenic loci combine to produce the trait: a rapid decrease in relative risk with genetic unrelatedness suggests a multiplicative model, whereas a slower decrease suggests an additive model (Risch 1990), though environmental factors might again have an impact.

Defining traits

The causal background to any trait depends crucially on the way the trait is defined or measured (Lander and Schork 1995, Smith 1996; Waldman et al. 1998; Stoltenburg and Burmeister 2000). For complex traits, ignorance of the underlying genetic model usually leads to phenotypic heterogeneity, when a particular trait definition is applied across individuals whose similar phenotypes arise from a variety of heterogeneous genetic and environmental causes (Lander and Schork 1995; Smith 1996). For this reason, one potential benefit from dissecting heterogeneous causal models for complex human disorders is that frameworks for diagnosing or measuring different pathogenic subtypes might be created, and this in turn could lead to more effectively targeted methods of intervention. In general, the more heterogeneous the causes of variance in a trait are, the more difficult it is to achieve statistical power to identify any of the individual causes using a sample of a given size (Eaves 1994, Lander and Kruglyak 1995, Risch and Merikangas 1996), and this applies to segregation, twin, adoption, linkage and association studies, as well as to studies of environmental factors.
For this reason, many genetic studies of human diseases try to use narrowed trait definitions in order to work with more aetiologically homogeneous samples (see Lander and Schork 1994). This can be achieved by restricting sample ascertainment parameters such as the age of disease onset in patients, their clinical severity or clinical sub-type, whether they have a positive family history for the disorder, by excluding patients with other comorbid conditions, and also by targeting particular ethnic groups (see Materials and Methods for the application of these principles in this dyslexia genetic study). Different approaches to sample ascertainment and phenotypic testing/diagnosis are invariably used by different researchers in studies of complex traits (Lander and Schork 1995; Smith et al. 1996; Waldman et al. 1998; Stoltenburg and Burmeister 2000), and this inevitably increases trait phenotypic heterogeneity when considered across studies. Usually study samples will also differ with regard to their environmental backgrounds.

A typical level of phenotypic heterogeneity has confounded dyslexia genetic research (Smith 1996, Fisher and Smith 2001). Family samples for dyslexia studies have been variously ascertained (a) clinically or epidemiologically, (b) from different countries/continents sometimes using different first languages, (c) from different socioeconomic groups, (d) with different emphases on trait familiality and the size and segregation patterns of dyslexia within pedigrees, (e) using various criteria to identify affected individuals which differed in severity, population frequency, targeted dyslexic subtype and targeted cognitive abilities, and (f) using different methodological, phenotypic and analytical approaches in their genetic linkage analyses (see Introduction: Dyslexia phenotype and previous genetic studies).
Such variability means that it can be difficult for independent studies to replicate genetic findings for complex human disorders, especially to replicate linkage results (Lander and Kruglyak 1995; see Introduction: Linkage and Linkage Disequilibrium), simply because the causal background for the trait in any two samples might be very different (Stoltenburg and Burmeister 2000).

Trait definitions: qualitative or quantitative?

Traits can be defined qualitatively (categorically) or quantitatively. The principal advantage of a qualitative definition is that it often simplifies analysis (Weeks and Lathrop 1995), since the trait distribution becomes dichotomous (‘affected’ or ‘not affected’ in the case of human disorders). Categorical diagnosis is ideally suited to the study of monogenic diseases for which the difference in phenotypic states between affected and unaffected individuals stands out above the normal variance in disease-related traits. However, categorical diagnosis is also useful when quantitative phenotypic measures have not been or cannot be developed which reliably distinguish between affected and unaffected individuals, which might otherwise be done, for example, by using some threshold score. This is often a difficulty in the study of complex traits, and sometimes too for monogenic traits (International Molecular Genetics Study of Autism Consortium 1998, Fagerheim et al. 2000).

For example, attention deficit/hyperactivity disorder (ADHD) in children, an aetiologically complex behavioural trait (Gillis et al. 1992; Smalley 1997), is typically measured using parental or teacher questionnaires together with diagnostic interviews (APA 1994). These tests yield semi-quantitative measures of the severity of the disorder by counting the number of ADHD-related qualitative symptoms which a child shows.
Clinicians typically use these data as a guide to making a categorical diagnosis of ADHD, but they do not necessarily use a strict cut-off score on any one test to make the diagnosis (Smalley 1997). In such cases the judgement of a clinician, if used well, effectively allows for the fact that complex traits are often multivariate in nature (Lander and Schork 1994; Weeks and Lathrop 1995; Smith et al. 1996), i.e. they can be characterised by a range of different phenotypic changes. Some or all of these changes might have different heterogeneous causes and may or may not be found in all affected individuals, and for some or all there might be no quantitative measure available. When a quantitative phenotypic measure has been devised which yields a continuous frequency distribution with useful properties in study samples (e.g. approximate normality and a single mode), when the measure correlates highly with qualitative diagnostic calls made by clinicians, and also if the measure is hypothesised to identify an important causal phenotype related to a disorder, then there are several benefits of using the quantitative measure directly in phenotypic or genetic analysis (Lander and Schork 1994, Cardon et al. 1994, 1995; Weeks and Lathrop 1995, Daniels et al. 1996; Fisher et al. 1999). First, by collapsing quantitative measures into dichotomous diagnostic schemes using a threshold score, much of the information inherent in quantitative scales gets thrown away, and therefore statistical power can be lost in many, though not all, cases (see Introduction: Linkage and Linkage Disequilibrium). Second, the choice of a threshold score to define affected versus non-affected individuals will inevitably be arbitrary for a continuous unimodal trait, not reflecting any real property of the data.
Quantitative measures naturally provide a fuller description of continuous phenotypes, and are especially applicable to traits with oligogenic backgrounds which inevitably have continuous, or at least not dichotomous, trait variance (Fisher 1918; Falconer and Mackay 1996). Third, the use of a suite of quantitative measures provides flexibility to explore the multivariate nature of complex disorders, for example through correlation and factor analyses (see Chapter 2). Avoiding arbitrarily imposed cut-offs and categorical diagnoses can actually simplify analyses at a multivariate level, both practically and conceptually. Mathematical tools for performing multivariate linkage or association analysis which can simultaneously model oligogenic and epistatic effects on several disorder-related quantitative measures are currently being developed (Lon Cardon and Angela Marlow, personal communication). This approach promises to allow very sophisticated descriptions of variance relationships among quantitative measures related to human diseases, for example the extent to which variances in measures share the same or different genetic backgrounds, as well as where the determining loci are located in the genome. The ultimate description of any trait is quantitative and multivariate, and not only for diseases, but for all human variation across its entire range. A fourth potential benefit of quantitative measures is that they can often be aimed at narrowly defined aspects of complex multivariate traits, and can therefore have relatively homogeneous causal backgrounds compared to the globally defined trait, so facilitating the search for underlying causal factors (Weeks and Lathrop 1995; see Introduction: Dyslexia phenotype and previous genetic studies). However, in principle, quantitative measures need not be any less phenotypically heterogeneous than qualitative trait definitions.
Finally, for genetic studies, a major requirement for a quantitative phenotype is that it should show a proportion of heritable (genetically determined) variance, since this heritable variance is both the source of, and the limit on, statistical power to identify genetic factors (Lander and Schork 1994; Lander and Kruglyak 1995; Weeks and Lathrop 1995).
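The loss of information caused by collapsing a quantitative measure into an affected/unaffected call, and the dependence of detectable signal on heritable variance, can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not an analysis used in this study: it assumes a single hypothetical additive locus, a normally distributed residual, and an arbitrary cut-off that labels the top 10% of scores as 'affected'. The function names and all parameter values (`allele_freq`, `h2_locus`, `affected_fraction`) are illustrative choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, allele_freq=0.3, h2_locus=0.1,
             affected_fraction=0.1, seed=1):
    """Return (r_quantitative, r_dichotomous) for one simulated sample.

    All parameters are hypothetical: one additive locus explaining a
    fraction `h2_locus` of the variance of a trait with total variance 1.
    """
    rng = random.Random(seed)
    # Genotype = number of copies (0, 1 or 2) of an assumed risk allele.
    genotypes = [
        (rng.random() < allele_freq) + (rng.random() < allele_freq)
        for _ in range(n)
    ]
    # Scale the additive effect so the locus explains h2_locus of the variance.
    geno_sd = math.sqrt(2 * allele_freq * (1 - allele_freq))
    effect = math.sqrt(h2_locus) / geno_sd
    noise_sd = math.sqrt(1 - h2_locus)
    scores = [effect * g + rng.gauss(0.0, noise_sd) for g in genotypes]
    # Dichotomise: the top `affected_fraction` of scores are called affected.
    cutoff = sorted(scores)[int(n * (1 - affected_fraction))]
    affected = [1 if s >= cutoff else 0 for s in scores]
    return pearson_r(genotypes, scores), pearson_r(genotypes, affected)


if __name__ == "__main__":
    r_quant, r_dich = simulate()
    print(f"variance explained, quantitative score:   {r_quant ** 2:.3f}")
    print(f"variance explained, dichotomous diagnosis: {r_dich ** 2:.3f}")
```

Under these assumptions the genotype should explain a noticeably smaller share of variance in the dichotomous diagnosis than in the underlying score, and lowering `h2_locus` shrinks both figures. The sketch says nothing about situations in which a threshold reflects a genuine discontinuity in the trait, where a categorical definition may lose little or nothing.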