Adam Siepel Assistant Professor Biological Statistics & Computational Biology Cornell University Area of tremendous growth Computing plays a central role Of increasing interest to nonspecialists Important in the commercial, academic, and government sectors Complex concepts, large data sets, lots of noise The typical molecular biologist is a good proxy for a “citizen scientist” vis-à-vis computer literacy Widely used public, web-based resources for researchers (NCBI Entrez, Genome Browsers, OMIM, PubMed, etc.) Many academic analysis programs (of highly variable quality) Some commercial analysis and visualization packages Strong online/open-access publication movement Decent educational resources for lay people: blogs, wikipedia, etc. (but unconnected) Rapid growth in resources for personal genomics Open-access publishing Public Library of Science (PLoS): seven journals, rigorous review, high impact factors BioMed Central (BMC): 190 journals (!), peer review, rapid turnaround, respectable impact Both pay-to-publish, free distribution Personal genomics Most famous: 23andMe, Navigenics, deCODEMe Consumer genotyping (500k-1M SNPs) from saliva samples for $1000 – $2500 Electronic report of disease risk; in some cases geneology, nondisease phenotypes Ethical, legal, and political controversy More information for citizen scientists Opening up of “black box” of scientific research; democratization of science Opportunity to “write” as well as “read” Increased dependency on electronic resources, eScience infrastructure Improved connectivity between resources at different levels: blogs, textbooks, wikipedia, browsers, research papers Better access to underlying science from personal genomics gateways Genotyping in a clinical setting (for diagnosis, guiding treatment); requires computational infrastructure! Progress toward a more dynamic, interactive publication model Steady progress in usability and integration of tools and databases Use of new genotyping technologies in forensics, paternity testing, etc. A much stronger form of personalized medicine (every genome sequenced) A completely different publishing model (?) Analytical power tools in the hands of citizens (association mapping, ancestry inference, …) (?) Gene therapy (?) Attaching funding to community contributions can be effective Tool developers need recognition (Bioinformatics application notes) Comprehensive databases with simple, general visualization are catalysts for progress (Genome browsers) Components and pipelines work Progress has to be bottom-up as well as top-down Incentives still not adequate in some cases: improving annotations, building well-engineered software, ensuring scientific reproducibility, obeying standards, etc. Privacy and security are HUGE unresolved issues Academic world still very focused on papers, but papers are static, machine nonreadable, incomprehensible to lay people Hording of data for attribution or profit Difficulty of interpreting results (personal genomics, phylogenetics, etc.) NCBI gateway: www.ncbi.nlm.nih.gov UCSC Genome Browser: genome.ucsc.edu Ensembl Genome Browser: www.ensembl.org PLoS Gateway: www.plos.org BMC Gateway: www.biomedcentral.com Personal genomics: www.23andme.com, www.navigenics.com, www.decodeme.com Microsoft Research Faculty Summit 2008