94 Wang et al.

FUNCTIONAL STUDIES: TYPING mRNA IN F1 CROSSES TO STUDY GENE REGULATION

Polymorphisms that are transcribed into the coding region or 3′ untranslated region (3′ UTR) of a gene can profoundly affect the production of a functional protein or the regulation of the RNA or protein encoded by that gene (12). For example, one such transcribed polymorphism affected susceptibility to asthma in a mouse genetic model: a deletion in the coding region of the gene encoding complement component 5 (C5) resulted in C5 deficiency, which correlated with susceptibility (68). In humans, a polymorphism in the TCF7 gene, C883A, was associated with type 1 diabetes (69).

In addition to their direct functional effects, transcribed polymorphisms can be used to analyze the transcriptional regulation of a gene. If two mouse strains carry different alleles of a transcribed polymorphism, the F1 generation derived from a cross of these strains can be used to distinguish cis- from trans-regulation (70) (Fig. 2). In F1 mice, the transcription factors, polymerases, and other trans-acting factors that affect transcription of the gene are derived equally from each parental strain, and because they share the same nucleus, they act on both alleles alike. If the mRNA level is regulated by trans-acting factors, the two alleles should therefore be equally represented in the mRNA transcribed from the gene, that is, allele 1/allele 2 = 1. If, however, mRNA expression is controlled in cis, that is, by regulatory sequences linked to each allele on the same chromosome (such as its promoter or enhancers), the ratio of allele 1 to allele 2 will differ from 1 (70).

The following example comes from an animal model of T-helper cell differentiation. TCF7 mRNA expression is about threefold higher in the B10.D2 mouse strain than in BALB/c. An SNP was identified in the second exon of the TCF7 gene. After the mRNA was reverse-transcribed into cDNA, the ratio of the two alleles in the cDNA was determined by quantitative, allele-specific PCR (Fig. 3). In this case, the ratio of allele 1 to allele 2 was not equal to 1, indicating that TCF7 expression is controlled by cis-acting elements.
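The cis/trans test above comes down to comparing an allele ratio with 1. The Python sketch below shows one common way to turn allele-specific quantitative PCR readings into such a ratio; it is an illustration under stated assumptions, not the procedure used in the study. It assumes that (i) both allele-specific assays amplify with the same efficiency, so the relative template amount is efficiency raised to the Ct difference; (ii) F1 genomic DNA, in which the two alleles are present 1:1, is measured alongside the cDNA to correct for assay bias; and (iii) a hypothetical fold cutoff (`fold_threshold`, default 1.5) stands in for the replicate-based statistical test a real analysis would use. The function names and the Ct values in the example are invented for illustration.

```python
import math
from statistics import mean
from typing import Iterable, Tuple


def allele_ratio_from_ct(ct_allele1: float, ct_allele2: float,
                         efficiency: float = 2.0) -> float:
    """Relative starting amount of allele 1 vs. allele 2 from allele-specific qPCR.

    Assumes both allele-specific assays share the same per-cycle amplification
    efficiency (2.0 = perfect doubling); a lower Ct means more starting template.
    """
    return efficiency ** (ct_allele2 - ct_allele1)


def calibrated_ratio(cdna_cts: Tuple[float, float], gdna_cts: Tuple[float, float],
                     efficiency: float = 2.0) -> float:
    """Allele 1/allele 2 ratio in F1 cDNA, divided by the ratio in F1 genomic DNA.

    F1 genomic DNA carries the two alleles 1:1, so any departure from 1 in the
    gDNA measurement reflects assay bias and is divided out.
    """
    cdna = allele_ratio_from_ct(*cdna_cts, efficiency=efficiency)
    gdna = allele_ratio_from_ct(*gdna_cts, efficiency=efficiency)
    return cdna / gdna


def classify_regulation(ratios: Iterable[float],
                        fold_threshold: float = 1.5) -> Tuple[float, str]:
    """Geometric mean of replicate ratios plus a label from a hypothetical cutoff.

    A ratio near 1 is consistent with trans-regulation; a ratio outside
    [1/fold_threshold, fold_threshold] suggests a cis-acting difference.
    """
    geo_mean = 2 ** mean(math.log2(r) for r in ratios)
    if geo_mean >= fold_threshold or geo_mean <= 1 / fold_threshold:
        return geo_mean, "cis"
    return geo_mean, "trans"


if __name__ == "__main__":
    # Made-up Ct values for three F1 replicates: ((cDNA allele 1, cDNA allele 2),
    # (gDNA allele 1, gDNA allele 2)). Not data from the study.
    replicates = [
        ((22.0, 23.6), (25.1, 25.0)),
        ((21.8, 23.3), (24.9, 25.0)),
        ((22.2, 23.8), (25.0, 25.0)),
    ]
    ratios = [calibrated_ratio(c, g) for c, g in replicates]
    geo_mean, label = classify_regulation(ratios)
    print(f"allele 1/allele 2 = {geo_mean:.2f} -> {label}")
```

With these invented values the calibrated ratios cluster around 3, so the sketch labels the difference cis; ratios near 1 would instead be consistent with trans-regulation. Averaging on the log scale (a geometric mean) keeps an excess of either allele symmetric around 1, so a twofold excess of allele 2 is treated the same as a twofold excess of allele 1.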
SNP Discovery and Genotyping 95

Cowles et al. (70) analyzed allele-specific transcription in several tissues of F1 mice, which allowed them to study cis-regulated expression in conjunction with tissue-specific expression. Because gene regulation plays a central role in mammalian biology, interest in it will continue to grow, and SNP-based assays offer a straightforward way to analyze transcriptional regulation.

Fig. 2. The messenger RNA (mRNA) level for TCF7 is controlled in a cis-acting manner. (A) Two inbred parental strains are crossed to generate a first filial (F1) generation; each F1 mouse receives an equal genetic contribution from each parent. (B) Levels of the two alleles (C and G) of the TCF7 single-nucleotide polymorphism in the F1 mouse. The F1 mouse carries one C and one G allele, one from each parent, and both are transcribed into mRNA. If the mRNA level is controlled in cis, the two alleles are not represented equally and the ratio departs from 50:50, as seen in this figure.

6. CONCLUSIONS

The number of human SNPs available in public databases continues to grow very rapidly. More importantly, the quality of the information available for each SNP is improving even faster. Between June and November 2003, the number of uniquely mapped human SNPs grew from approximately 4.1 to 5.8 million (about 1.4-fold), whereas the number of experimentally validated SNPs grew nearly fivefold, from approximately half a million to an impressive 2.4 million. The number of SNPs with a known allele frequency increased much more slowly, but this should change dramatically as data from the International HapMap Project become available in 2004. For the individual researcher, additional human SNP discovery will likely be needed only in limited regions that require a higher SNP density, and existing SNP discovery technologies should be sufficient for that purpose. For most model organisms, however, SNP coverage is still too low.
Except for the commonly used inbred mouse strains, only a handful to a few thousand SNPs have been deposited in NCBI dbSNP for other organisms. For example, dbSNP build 118 contains only two reference SNPs for the chimpanzee Pan troglodytes. As additional genomes are sequenced and in silico biology matures, generating extensive polymorphism data for these organisms will become a priority.

Impressive progress has been made in SNP genotyping in recent years, yet in terms of throughput and price, whole-genome scans of sufficiently large numbers of cases and controls for association studies remain beyond the reach of all but a few institutions. Although the price per genotype has fallen to close to $0.01 for some technologies, the comprehensive cost of genotyping in real-life situations, including labor, consumables, and up-front investment in instrumentation, is more realistically in the range of $0.10–$1 per genotype. Moreover, even these prices are usually achievable only in sustained ultra-high-throughput operations processing 10,000–100,000 genotypes per day. At a comprehensive price of $0.05 per genotype, generating 100,000 genotypes per day (about $5,000 per day) would entail a yearly expense of roughly $1 million. Similarly, a 100,000-SNP whole-genome scan of 500 cases and 500 controls (100 million genotypes) would cost $5 million.

As with sequencing, improvements in SNP genotyping price and throughput have been incremental rather than exponential in recent years. Promising developments in genome sequencing bear close monitoring: if it does become possible, as some have projected, to sequence a whole mammalian genome for a few thousand dollars, targeted genotyping of selected polymorphisms may become unnecessary. Finally, progress is being made in understanding the functional relevance of SNPs and other polymorphisms.
New and improved experimental and in silico methods for determining and predicting the biological function of polymorphisms transcribed into RNA (as discussed here), as well as of intronic regulatory SNPs, are urgently needed.
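The genotyping cost figures quoted in this section follow from simple multiplication, which the Python sketch below reproduces. The per-genotype price ($0.05), throughput (100,000 genotypes per day), and study design (100,000 SNPs in 500 cases and 500 controls) are taken from the text; the number of operating days per year is an assumption added here, because the text does not state it. At $5,000 per day, the "roughly $1 million" yearly figure corresponds to about 200 operating days, whereas year-round operation would come to about $1.8 million. The function names are hypothetical, and `Decimal` is used only to keep currency arithmetic exact.

```python
from decimal import Decimal


def daily_cost(price_per_genotype: Decimal, genotypes_per_day: int) -> Decimal:
    """Cost of one day of genotyping at an all-inclusive per-genotype price."""
    return price_per_genotype * genotypes_per_day


def yearly_cost(price_per_genotype: Decimal, genotypes_per_day: int,
                operating_days: int) -> Decimal:
    """Yearly cost; operating_days is an assumption (working days vs. year-round)."""
    return daily_cost(price_per_genotype, genotypes_per_day) * operating_days


def scan_cost(price_per_genotype: Decimal, n_snps: int,
              n_cases: int, n_controls: int) -> Decimal:
    """Cost of genotyping every SNP in every case and control sample."""
    return price_per_genotype * n_snps * (n_cases + n_controls)


if __name__ == "__main__":
    price = Decimal("0.05")  # comprehensive price per genotype used in the text
    print(daily_cost(price, 100_000))           # 5000.00 dollars per day
    for days in (200, 250, 365):                # assumed operating days per year
        print(days, yearly_cost(price, 100_000, days))
    print(scan_cost(price, 100_000, 500, 500))  # 5000000.00 dollars
```

Because every term is linear, substituting the more realistic $0.10–$1 per genotype scales the same 1,000-sample scan proportionally, to $10–$100 million.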