Mining data from 1000 genomes to identify the causal variant in regions under positive selection The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Grossman, Shari et al. "Mining data from 1000 genomes to identify the causal variant in regions under positive selection." Genome Biology 2010, 11(Suppl 1):I22 (11 October 2010). As Published http://genomebiology.com/2010/11/S1/I22 Publisher BioMed Central Ltd. Version Final published version Accessed Wed May 25 15:28:15 EDT 2016 Citable Link http://hdl.handle.net/1721.1/69960 Terms of Use Creative Commons Attribution Detailed Terms http://creativecommons.org/licenses/by/2.0 Grossman et al. Genome Biology 2010, 11(Suppl 1):I22 http://genomebiology.com/2010/11/S1/I22 INVITED SPEAKER PRESENTATION Open Access Mining data from 1000 genomes to identify the causal variant in regions under positive selection Shari Grossman1,2,3*†, Ilya Shlyakhter1,2†, Elinor K Karlsson1,2, Shervin Tabrizi1,2, Kristian Andersen1,2, John Rinn2, Eric Lander2, Steve Schaffner2, Pardis C. Sabeti1,2, The 1000 Genomes Project From Beyond the Genome: The true gene count, human evolution and disease genomics Boston, MA, USA. 11-13 October 2010 The human genome contains hundreds of regions in which the patterns of genetic variation indicate recent positive natural selection, yet for most of these the underlying gene and the advantageous mutation remain unknown. We recently reported the development of a method, Composite of Multiple Signals (CMS), that combines tests for multiple signals of natural selection and increases resolution by up to 100-fold. Applying CMS to candidate selected regions from the International Haplotype Map, we localized several hundred signals to ~50-100 kb, identifying individual gene and polymorphism targets of selection. These regions included genes involved in processes known to be targets of selection, such as infectious disease, skin pigment, metabolism, and hair and sweat. We further identified many candidates that are similar to regulatory elements. In several regions, we identified variants that are significantly associated with the expression of nearby genes in the selected population. Moreover nearly half of the ~200 regions we examined localized to regions with no genes. Thirty of the regions contain long noncoding RNAs that have been shown to often regulate nearby genes, suggesting that variation within the RNAs might have functional consequences. With preliminary data now available from the 1000 Genomes Project, we are beginning to explore full sequence data, which should contains most if not all of the causal selected polymorphisms. We extended the CMS method to the preliminary data set, validating our previously identified candidates and identifying many new intriguing coding and regulatory variants. Author details 1 Center for Systems Biology and Department of Organismic and Evolutionary Biology, Cambridge, MA 02138, USA. 2Broad Institute of Harvard and MIT, Cambridge, MA 02139 USA. 3Harvard Medical School, Boston, MA 0211, USA. Published: 11 October 2010 doi:10.1186/gb-2010-11-S1-I22 Cite this article as: Grossman et al.: Mining data from 1000 genomes to identify the causal variant in regions under positive selection. Genome Biology 2010 11(Suppl 1):I22. Submit your next manuscript to BioMed Central and take full advantage of: • Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance † Contributed equally 1 Center for Systems Biology and Department of Organismic and Evolutionary Biology, Cambridge, MA 02138, USA Full list of author information is available at the end of the article • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit © 2010 Grossman et al; licensee BioMed Central Ltd.