Computational Biology 9

the consequences of genetic variation. Often, polymorphism scanning
has been confined to genomic sequences within or near exons (20).
However, genetic variation is not introduced into the genome by
processes that respect boundaries between genes, nor is it confined to
coding sequences.
Furthermore, the effect of a sequence change may not be confined to a
single gene. Therefore, it is likely to be more efficient and productive to
evaluate sequence variation over a chromosomal region when assessing
the effect of genotype on phenotype. Although SNPs at different sites
within a chromosomal region can be identified and characterized
individually, alleles at different positions along a chromosome can be
associated. The presence or absence of an allele at one site can provide
information about alleles at other sites; this association between alleles
is called linkage disequilibrium. A haplotype can be defined as the
relationship between deoxyribonucleic acid (DNA) sequence variants in
the same gene, region, or chromosome. Analysis of polymorphisms over
large genomic segments of the human genome has indicated that
polymorphic variation at multiple sites within a chromosomal region
can be grouped into patterns (“haplotypes”) with high linkage
disequilibrium (20–23). Regions with low linkage disequilibrium
separate the haplotypic blocks. The size of the genomic sequence
contained within a haplotypic block can range from a few to over 100
kb. Analysis of human chromosome 21 indicated that about half of the
haplotypic blocks identified (average size 7.8 kb) could be defined by
fewer than three SNPs, and fewer than three different haplotypes within
a block encompassed most (80%) of the population (23). Unfortunately,
the structure of the haplotypic blocks cannot be predicted and must be
determined empirically: a large amount of detailed sequence information
must be available to identify the haplotype-defining SNPs and the
conserved blocks. It is also
likely that the size and structure of the haplotypic blocks will change as
sequence information from more individuals is obtained and analyzed.
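Linkage disequilibrium between two SNPs is commonly quantified with D, Lewontin's D', and r² (standard measures; the text does not name a specific statistic). The minimal sketch below, assuming phased haplotypes coded as 0/1 alleles, computes these measures and then cuts a region into blocks wherever r² between neighbouring SNPs falls below a threshold. The neighbour-r² rule, the 0.5 threshold, the function names and the toy data are all illustrative assumptions; this is not the block-definition procedure used in (23).

```python
def ld_stats(snp_a, snp_b):
    """Pairwise linkage disequilibrium between two biallelic SNPs.

    snp_a, snp_b: equal-length sequences of 0/1 alleles, one entry per
    phased chromosome. Returns (D, |D'|, r^2).
    """
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # 0.0 if either SNP is monomorphic
    # Lewontin's normalisation: D divided by its maximum for these frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


def partition_blocks(haplotypes, r2_threshold=0.5):
    """Split SNPs into blocks wherever r^2 between neighbouring SNPs drops
    below r2_threshold (a simplified, hypothetical rule).

    haplotypes: list of equal-length 0/1 lists, one per chromosome.
    Returns a list of (first_snp, last_snp) index pairs, inclusive.
    """
    n_snps = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_snps)]
    blocks, start = [], 0
    for i in range(1, n_snps):
        _, _, r2 = ld_stats(columns[i - 1], columns[i])
        if r2 < r2_threshold:  # weak LD between neighbours: close the block
            blocks.append((start, i - 1))
            start = i
    blocks.append((start, n_snps - 1))
    return blocks


haplotypes = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
] * 2  # eight phased chromosomes (toy data)
print(partition_blocks(haplotypes))  # → [(0, 2), (3, 4)]
```

In the toy data SNPs 0 to 2 always co-occur, as do SNPs 3 and 4, while the two groups assort independently (r² = 0 between SNPs 2 and 3), so two blocks result. Real data would need a more robust criterion than a single neighbouring pair.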
However, analysis of the haplotypic patterns enables individuals within
a population to be segmented into a finite number of small groups
sharing the same haplotype for a particular chromosomal region.
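Within a delimited block, segmenting a population by haplotype amounts to counting identical allele strings. The sketch below assumes phased 0/1 haplotypes and known block boundaries; the function names, the 80% coverage target and the toy data are illustrative assumptions. It counts the haplotype groups and returns the smallest set of most frequent haplotypes that together cover a chosen fraction of chromosomes, the kind of summary behind the chromosome 21 observation above.

```python
from collections import Counter


def haplotype_groups(haplotypes, first_snp, last_snp):
    """Count chromosomes sharing each haplotype over SNPs first_snp..last_snp."""
    return Counter(tuple(h[first_snp:last_snp + 1]) for h in haplotypes)


def common_haplotypes(counts, coverage=0.8):
    """Most frequent haplotypes whose combined frequency first reaches `coverage`."""
    total = sum(counts.values())
    chosen, covered = [], 0
    for haplotype, n in counts.most_common():  # most frequent first
        chosen.append(haplotype)
        covered += n
        if covered / total >= coverage:
            break
    return chosen


# Ten phased chromosomes typed at three SNPs of one block (toy data)
block = [[0, 0, 1]] * 5 + [[1, 1, 0]] * 3 + [[1, 0, 0], [0, 1, 1]]
groups = haplotype_groups(block, 0, 2)
print(len(groups))                # 4 distinct haplotypes
print(common_haplotypes(groups))  # [(0, 0, 1), (1, 1, 0)] cover 80%
```

Taking haplotypes in order of decreasing frequency guarantees the fewest groups needed to reach the coverage target.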
Using haplotypes to segment the human population has great potential.
It will decrease the amount of genotyping required to characterize a
genomic region, because a few haplotype-defining SNPs can stand in for
the other variants with which they are in linkage disequilibrium, and it
will enable a human population to be divided into a finite number of
distinct groups. The frequency of
disease or disease-associated traits can be compared among the groups
with different genetic haplotypes within a chromosomal region. It is
hoped that this will provide a more efficient method for identification of
disease-susceptibility regions in human populations. One of the first
examples of linkage disequilibrium mapping using haplotypes was the
identification of a 250-kb region of human chromosome 5q31 associated
with Crohn’s disease susceptibility (24). There were 11 SNPs with
strong linkage disequilibrium
in the 5q31 region associated with Crohn’s disease susceptibility, and it
was not possible to identify an individual disease-associated SNP. These
results are consistent with the possibility that a set of polymorphisms
within a chromosomal region, which may affect more than a single gene,
contributes to disease susceptibility. Similarly, inbred mouse strains
are particularly useful for genetic analysis because the entire genome of
an inbred strain is effectively in linkage disequilibrium. The parental
origin of DNA segments in intercross progeny over entire chromosomes
can be inferred by analysis of only a few polymorphic markers (25).
Furthermore, there is extensive linkage disequilibrium among
polymorphisms in the genome of inbred strains.
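Because inbred strains are homozygous, each marker that differs between two parental strains reports the origin of that chromosomal position in an intercross animal as AA, AB or BB, and runs of identical calls mark segments of a single origin. The sketch below assumes biallelic markers, unphased genotype calls and error-free genotyping; the function name, marker positions and alleles are hypothetical, and it is not the procedure used in (25).

```python
def infer_origin(parent_a, parent_b, progeny, positions):
    """Infer parental origin along a chromosome of an intercross animal.

    parent_a, parent_b: allele at each marker in two inbred (homozygous) strains.
    progeny: (allele, allele) genotype at each marker for one intercross animal.
    positions: sorted marker positions (e.g., in cM).
    Returns (start, end, call) segments, call in {"AA", "AB", "BB"}; a
    crossover is only localised to the interval between flanking markers.
    """
    calls = []
    for a, b, (x, y) in zip(parent_a, parent_b, progeny):
        if a == b:
            calls.append(None)  # marker does not distinguish the strains
            continue
        if {x, y} - {a, b}:
            raise ValueError(f"allele not found in either parent: {(x, y)}")
        n_a = (x == a) + (y == a)  # copies inherited from strain A
        calls.append({2: "AA", 1: "AB", 0: "BB"}[n_a])

    segments = []
    for pos, call in zip(positions, calls):
        if call is None:
            continue
        if segments and segments[-1][2] == call:
            segments[-1] = (segments[-1][0], pos, call)  # extend current segment
        else:
            segments.append((pos, pos, call))  # origin changes: new segment
    return segments


# Five hypothetical markers on one chromosome (positions in cM)
positions = [5, 20, 35, 50, 65]
strain_a = ["G", "C", "T", "A", "G"]
strain_b = ["A", "T", "C", "G", "A"]
animal = [("G", "G"), ("C", "C"), ("C", "T"), ("G", "A"), ("A", "A")]
print(infer_origin(strain_a, strain_b, animal, positions))
# → [(5, 20, 'AA'), (35, 50, 'AB'), (65, 65, 'BB')]
```

Five markers are enough here to show two crossovers, one between 20 and 35 cM and one between 50 and 65 cM; denser markers would only narrow those intervals.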
Analogous to the human population, SNPs among the inbred strains can
be organized into haplotypic blocks (26). Analysis of regions linked to
susceptibility to complex disease-related traits in mouse models has also
indicated that genetic changes across chromosomal regions affecting
multiple genes, rather than within a particular gene, may contribute to
susceptibility. For example, a region on chromosome 1 that controls
autoantibody production in a mouse model of systemic lupus erythematosus
(SLE) was analyzed.
Polymorphisms within a set of co-linear interferon-inducible genes in
this region were responsible for differential autoantibody production in
this model (27). This result appears to be applicable to other mouse
models of human disease-related traits. Fine-mapping analysis often
identifies several distinct subloci within a linked chromosomal region
that independently contribute to the phenotypic trait. Additional
analysis of a linked chromosomal region regulating autoantibody
production and nephritis in the murine model of systemic lupus
demonstrated that the interval consisted of at least four distinct genetic
loci (28). Similarly, the ability of our “digital disease” computer
program to identify chromosomal regions regulating complex traits in
mice is likely to result from recognition of patterns of genetic variation
over large (10 cM) regions within the mouse genome (29). Analysis of
the patterns of variation over larger regions is likely to be informative
in situations where analysis of a single SNP does not reveal a
genotype–phenotype correlation.
4. Integrative approaches must be used to efficiently analyze complex
biological processes.
Only a limited amount of resolution can be achieved with any single
approach for analyzing a complex biological process.