Haplotype Structure of the Mouse Genome

Jianmei Wang, Guochun Liao, Janet Cheng, Anh Nguyen, Jingshu Guo, Christopher Chou, Steven Hu, Sharon Jiang, John Allard, Steve Shafer, Anne Puech, John D. McPherson, Dorothee Foernzler, Gary Peltz, and Jonathan Usuka

Computational Genetics and Genomics. Edited by: G. Peltz © Humana Press Inc., Totowa, NJ

1. INTRODUCTION

Commonly available inbred mouse strains can be used to genetically model traits that vary in the human population, including those associated with disease susceptibility. To understand how genetic differences regulate trait variation in humans, we must first develop a detailed understanding of how genetic variation in the mouse produces the phenotypic differences among inbred mouse strains. Information obtained from the analysis of experimental murine genetic models can direct biological experimentation, clinical research, and human genetic analysis. This “mouse to man” approach will increase our knowledge of the genes and pathways that regulate important biological processes and disease susceptibility.

The availability of the complete sequence of the mouse genome (1) makes it possible to characterize the genetic differences among commonly studied inbred strains, which in turn will facilitate identification of the genetic basis for phenotypic trait differences among them. Toward this goal, we analyzed the pattern of genetic variation among 18 inbred mouse strains and produced a high-resolution haplotypic map of the inbred mouse genome. This haplotypic map covers 75 Mb of the mouse genome. An additional 99 Mb, which was not polymorphic among the 16 Mus musculus strains other than the wild-derived CAST/Ei and SPRET/Ei, was also analyzed. Analyses of the genetic distance between inbred strains, and of the haplotypic blocks generated using different sets of strains, demonstrated that including only these 16 M. musculus strains produced balanced haplotypic block structures that reflected extensive allele sharing among closely related inbred strains. Although haplotypic blocks in the inbred mouse genome share similarities with those described in humans, there are important differences that increase the likelihood that the genetic variants underlying phenotypic trait differences can be successfully identified in the mouse.

2. CHARACTERIZATION OF GENETIC VARIATION AMONG INBRED STRAINS

Single nucleotide polymorphisms (SNPs) were identified by resequencing targeted genomic regions of 1672 genes across 18 inbred mouse strains (129/Sv, A/HeJ, A/J, AKR/J, B10.D2-H2/oSnJ, BALB/cByJ, BALB/cJ, C3H/HeJ, C57BL/6J, CAST/Ei, DBA/2J, LG/J, LP/J, MRL/MpJ, NZB/BinJ, NZW/LaC, SM/J, and SPRET/Ei), using methods that have been described previously (2). For genes smaller than 5 kb, the entire gene was analyzed for polymorphisms. For genes larger than 5 kb, the analyzed regions were a 1-kb region surrounding each exon, a 2-kb region 5′ of the transcriptional start site, and a 500-bp segment downstream of the 3′ end of the transcript. Both strands of each selected genomic region were sequenced, and the sequence traces were analyzed using Phred and Phrap (3,4). Potential polymorphisms were identified, and sequence quality was assessed, in an automated fashion. Only SNPs supported by very high-quality sequence were accepted: either sequence from a single strand with a Phred score of 30 or higher or, more commonly, sequence from both strands with a Phred score of 20 or higher on each strand.

The mouse SNP database used in this study contained 105,064 unique SNPs, and a total of 1,440,349 alleles were characterized across these 18 strains. The number of SNPs per chromosome ranged from 1083 on chromosome 18 to 16,615 on chromosome 7 (Table 1).
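The target-selection rules above can be sketched as a small interval calculation. This is a hypothetical illustration, not the pipeline used in the study: the function and constant names are invented, coordinates are assumed to be half-open plus-strand positions with the gene spanning its transcript from start site to 3′ end, the “1-kb region surrounding each exon” is assumed to mean a 1-kb window centered on the exon (or the exon itself if it is longer), and a gene of exactly 5 kb is assumed to fall under the large-gene rule because the text does not say.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the plus strand

# Hypothetical parameter names; the values follow the rules stated in the text.
WHOLE_GENE_MAX = 5_000  # genes shorter than this are resequenced in full
EXON_WINDOW = 1_000     # the "1-kb region surrounding each exon"
PROMOTER = 2_000        # region 5' of the transcriptional start site
DOWNSTREAM = 500        # segment downstream of the transcript's 3' end


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def exon_window(start: int, end: int) -> Interval:
    """Assumption: a 1-kb window centered on the exon (the exon itself if longer)."""
    length = end - start
    if length >= EXON_WINDOW:
        return (start, end)
    lo = max(0, start - (EXON_WINDOW - length) // 2)
    return (lo, lo + EXON_WINDOW)


def target_regions(gene_start: int, gene_end: int, strand: str,
                   exons: List[Interval]) -> List[Interval]:
    """Intervals to resequence for one gene spanning [gene_start, gene_end)."""
    if gene_end - gene_start < WHOLE_GENE_MAX:
        return [(gene_start, gene_end)]  # small gene: sequence all of it
    regions = [exon_window(s, e) for s, e in exons]
    if strand == "+":  # TSS at gene_start, 3' end at gene_end
        regions.append((max(0, gene_start - PROMOTER), gene_start))
        regions.append((gene_end, gene_end + DOWNSTREAM))
    else:  # minus strand: TSS at gene_end, 3' end at gene_start
        regions.append((gene_end, gene_end + PROMOTER))
        regions.append((max(0, gene_start - DOWNSTREAM), gene_start))
    return merge(regions)


# Usage with made-up coordinates for a 10-kb plus-strand gene with three exons:
print(target_regions(10_000, 20_000, "+",
                     [(10_000, 10_200), (15_000, 15_100), (19_800, 20_000)]))
# prints [(8000, 10600), (14550, 15550), (19400, 20500)]
```

Merging the windows matters because the promoter window and the first exon window, or the last exon window and the downstream segment, typically overlap; without merging, the same bases would be counted twice when totaling resequenced sequence.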
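A Phred score Q encodes a base-call error probability P = 10^(−Q/10), so Q30 corresponds to 1 error in 1000 calls and Q20 to 1 in 100. The acceptance rule described above can be sketched as follows. The function names and the convention of passing None for a strand without usable sequence are assumptions made for illustration; the text also does not say how a SNP with one strand above Q30 and the other below Q20 was treated, and this sketch simply rejects it.

```python
from typing import Optional


def phred_to_error(q: float) -> float:
    """Phred quality Q encodes a base-call error probability P = 10**(-Q/10)."""
    return 10 ** (-q / 10)


def accept_snp(forward_q: Optional[int], reverse_q: Optional[int]) -> bool:
    """Sketch of the SNP acceptance rule described in the text.

    forward_q and reverse_q are the Phred scores of the SNP base call on each
    strand; None means that strand gave no usable sequence (an assumption
    about how the data are represented).
    """
    if forward_q is not None and reverse_q is not None:
        # Double-stranded sequence: both strands must reach Q20.
        return forward_q >= 20 and reverse_q >= 20
    # Single-stranded sequence: the one available strand must reach Q30.
    single = forward_q if forward_q is not None else reverse_q
    return single is not None and single >= 30


print(accept_snp(25, 22))    # True: both strands at or above Q20
print(accept_snp(25, 18))    # False: reverse strand below Q20
print(accept_snp(31, None))  # True: single strand at or above Q30
```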
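As a rough consistency check on the totals above, assume each characterized allele is one strain-by-SNP genotype call (inbred strains are essentially homozygous). Under that reading, the stated figures imply that about three quarters of all possible strain-by-SNP calls have an accepted allele; this ratio is derived here from the reported numbers and is not a value given in the text.

```latex
105{,}064 \ \text{SNPs} \times 18 \ \text{strains} = 1{,}891{,}152 \ \text{possible calls},
\qquad
\frac{1{,}440{,}349}{1{,}891{,}152} \approx 0.762
```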
The genetic distance between the inbred mouse strains was assessed using this allelic information. For each pair of strains, the percent allelic difference was calculated as the number of SNPs that would have been identified had only that pair of strains been sequenced (that is, the SNPs at which the two strains carry different alleles), divided by the total number of SNPs identified among all 18 inbred strains and expressed as a percentage. The CAST/Ei and SPRET/Ei strains were derived from wild mice of Asian and European origin, respectively, whereas the 16 other M. musculus strains were bred from a small group of mice at the beginning of the 20th century (reviewed in ref. 5). Consistent with their independent origin, the CAST/Ei and SPRET/Ei strains show allelic differences of more than 39% and 70%, respectively, when compared with any one of the 16 other M. musculus strains (Table 2). In contrast, the 16 other M. musculus strains were far more genetically similar: allelic differences among these strain pairs ranged from 0.8% (A/HeJ:A/J) to 16.4% (NZW/LaC:BALB/cJ) (Table 2). The genetic distances revealed by SNP allelic information are consistent with published genealogies of inbred mouse strains (5).
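The percent allelic difference defined above can be sketched as a pairwise count over a strain-by-SNP genotype table. The data structure, function names, and toy genotypes below are hypothetical. The text does not say how missing genotypes were handled, so this sketch assumes a SNP counts as a difference only when both strains have an accepted allele, while the denominator remains all SNPs in the panel, as the definition states.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# genotypes[strain][i] is the allele of SNP i in that strain, or None if missing.
Genotypes = Dict[str, List[Optional[str]]]


def percent_allelic_difference(genotypes: Genotypes, a: str, b: str) -> float:
    """Percentage of all panel SNPs at which strains a and b carry different
    called alleles (missing calls are not counted as differences)."""
    ga, gb = genotypes[a], genotypes[b]
    differing = sum(
        1 for x, y in zip(ga, gb)
        if x is not None and y is not None and x != y
    )
    return 100.0 * differing / len(ga)


def distance_table(genotypes: Genotypes) -> Dict[Tuple[str, str], float]:
    """Percent allelic difference for every unordered pair of strains."""
    return {(a, b): percent_allelic_difference(genotypes, a, b)
            for a, b in combinations(sorted(genotypes), 2)}


# Toy panel of five SNPs, each polymorphic somewhere in the panel (made-up data).
toy: Genotypes = {
    "A/J":      ["A", "C", "G", "T", "A"],
    "A/HeJ":    ["A", "C", "G", "T", "G"],
    "SPRET/Ei": ["G", "T", "A", "C", None],
}
print(distance_table(toy))
# prints {('A/HeJ', 'A/J'): 20.0, ('A/HeJ', 'SPRET/Ei'): 80.0, ('A/J', 'SPRET/Ei'): 80.0}
```

Because the denominator is the full SNP panel rather than only the SNPs called in both strains, pairs with more missing data will tend to look more similar; a per-pair denominator is a reasonable alternative when call rates differ widely between strains.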