GCAT-SEEKquence
The Genome Consortium for Active Teaching NextGen Sequencing Group
NextGen Sequencing Request Form

Complete the fields below, save the file with your last name at the beginning of the filename (e.g. newman-GCAT-SEEK Sequence request form.pdf), and email it to Vincent Buonaccorsi <BUONACCORSI@juniata.edu>.

A. Contact Information
1. Name: Regina Lamendella
2. Department: Biology
3. Institution: Juniata College
4. Phone Number: 510-486-7384
5. Email Address: RLamendella@lbl.gov

B. Project Information
1. Title: Longitudinal Analysis of Fecal Microbial Communities from an Inflammatory Bowel Disease Cohort Using Ultra-High-Throughput Sequencing
2. Category: Metagenomics
3. Total amount of sequence requested: 90 gigabases
4. Preferred technology: Illumina HiSeq
5. Do you have funds for a partial run next Spring? Yes

C. Describe the background, hypotheses and specific aims (500 words max)
Inflammatory bowel diseases (IBD) are increasing in prevalence in Europe and North America, likely due in part to the Western lifestyle. IBD can be divided into two disease categories: ulcerative colitis (UC) and Crohn’s disease (CD). Both are chronic, relapsing, immunologically mediated disorders that can have severe physical consequences. One hypothesis is that a breakdown in host-microbial mutualism is a consequence of a general imbalance between protective and harmful bacteria in the gut (dysbiosis). Accordingly, there is growing interest in identifying the specific members of the intestinal microbial community that provide the antigens which trigger the inflammation underlying IBD. Several IBD-related changes in the microbial community have been reported, including reduced diversity of the Firmicutes, the presence of bacteria not normally considered commensals, and increased concentrations of E. coli [1].
While current research has demonstrated a significant difference in the composition of the gut microbiota of IBD patients compared with healthy individuals, these studies have been based on a single time point. A number of other factors can affect IBD, including disease progression over time (flare-ups vs. remission), diet, surgery, and drug use. The goal of this study is to sample the same individuals during both active disease and remission. This design overcomes the problem of individual variation in gut microbial composition and should lead to a clearer understanding of the interplay between the host inflammatory response and the gut microbiota.

Specific Aim 1: Carry out a longitudinal analysis of the gut microbial communities in fecal samples from patients with different IBD phenotypes, compared with healthy individuals, to assess the stability of the gut microbiota.

Specific Aim 2: Carry out a longitudinal analysis of the gut microbial community in fecal samples from IBD patients to assess changes in the gut microbiota associated with disease severity.

Advances in sequencing technologies, which yield far more sequencing reads at much lower cost, are revolutionizing our understanding of microbial communities. Very recently, paired-end Solexa/Illumina sequencing has been used to generate millions of 16S rRNA gene sequences per sample, enabling exhaustive coverage of diverse environments such as the gut. This technology is also amenable to a high level of multiplexing, which makes it practical to examine hundreds or even thousands of complex samples [2]. This study will use the Illumina sequencing platform for 16S rRNA amplicon analysis (iTags) to deeply survey the microbial community structure of this IBD cohort.

D. Describe the methods [sample prep, calculation of amount of sequence required, analysis plan]
IRB approval has been obtained to prospectively collect stool samples for microbial profiling from 30 patients in American and Swedish IBD cohorts. Metadata collected for these IBD phenotypes include age, sex, smoking history, serological markers, history of antibiotic use, surgical resection, flare-up, and family history of IBD. Samples are being collected from each patient every three months over a 15-month period and are stored frozen at -20 °C until further processing. Genomic DNA will be extracted using the MoBio Fecal DNA extraction kit according to the manufacturer’s instructions, with an additional heat step before bead beating (60 °C for 10 min) to improve cell lysis. Barcoded, paired-end libraries of the hypervariable V6 region of the 16S rRNA gene, amplified from each sample, will be constructed and sequenced on the Illumina HiSeq platform. A total of 90 gigabases of sequence data will be required: 1,000,000 sequences per sample × 300 bp per sequence (2 × 150 bp paired-end) × 300 samples (150 samples in duplicate).

Sequence data will be analyzed using the QIIME pipeline [3]. Briefly, 16S rRNA gene sequences will be clustered with UCLUST into operational taxonomic units (OTUs) at 97% sequence similarity. Representative sequences from each OTU will be aligned with PyNAST [4] against the Greengenes core set and assigned taxonomy. Because the number of sequence reads per sample may vary, the dataset will be rarefied prior to alpha diversity calculations. For beta diversity analysis, the weighted UniFrac distance matrix will be used for principal coordinate analysis (PCoA). Multivariate community analysis and correlation with metadata will be performed in PC-ORD 5 using normalized genus-level OTU tables generated in QIIME.

E. Describe the role and number of undergraduates involved in the project, and how they would benefit.
If funded, I plan to invite 10 undergraduates into my research program in Spring 2013 with the goal of learning bioinformatic analysis of this ultra-high-throughput sequencing data. Using web-based tutorials, students will analyze the microbial community structure of these samples and apply statistical models to correlate health-related metadata with the sequence data. This research course will give students a foundation for understanding why interdisciplinary approaches are necessary to solve complex health-related problems. Students will also synthesize primary literature on informatics issues relevant to high-throughput sequencing, gut microbiomics, and applications of biotechnology in disease therapy. The results of this project will serve as the basis for a publication including the research students. Additionally, this project will serve as the basis for one- and two-week modules in the General Biology and Microbiology courses, potentially exposing hundreds of students to how high-throughput sequencing technologies are revolutionizing modern science.

F. I agree to administer the GCAT-SEEK pre- and post-activity assessment test for students and to complete the faculty post-utilization survey. _X_ yes ____ no

G. Describe any other broader impact or intellectual merit considerations.
Analysis of microbial community structure in this longitudinal inflammatory bowel disease cohort may reveal bacteria or groups of bacteria associated with flare-up and remission states. These findings could lead to the development of novel biomarkers and, potentially, future therapeutics for the disease.

H. References
1. Willing BP, Dicksved J, Halfvarson J, Andersson AF, Lucio M, Zheng Z, Jarnerot G, Tysk C, Jansson JK, Engstrand L: A pyrosequencing study in twins shows that gastrointestinal microbial profiles vary with inflammatory bowel disease phenotypes. Gastroenterology (2010) 139(6):1844-1854.
2. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, Fierer N, Knight R: Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. P Natl Acad Sci USA (2011) 108(Suppl 1):4516-4522.
3. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, Huttley GA et al: QIIME allows analysis of high-throughput community sequencing data. Nat Methods (2010) 7(5):335-336.
4. Caporaso JG, Bittinger K, Bushman FD, DeSantis TZ, Andersen GL, Knight R: PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics (2010) 26(2):266-267.
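Two quantitative steps in the Section D methods can be illustrated with a minimal Python sketch (standard library only): the sequence-budget arithmetic behind the 90-gigabase request, and rarefaction, i.e. randomly subsampling each sample's reads without replacement to the depth of the shallowest sample before alpha diversity is calculated. This is an illustration under stated assumptions, not the QIIME implementation the analysis plan relies on; the function names, the use of Shannon's index as the example alpha diversity metric, and the toy OTU table are all hypothetical.

```python
import math
import random
from collections import Counter

# Sequence budget from Section D (values copied from the request form).
READS_PER_SAMPLE = 1_000_000
BASES_PER_READ = 300   # 2 x 150 bp paired-end reads
N_SAMPLES = 300        # 150 samples sequenced in duplicate


def total_gigabases(reads_per_sample, bases_per_read, n_samples):
    """Return the total sequence required, in gigabases (1 Gb = 1e9 bases)."""
    return reads_per_sample * bases_per_read * n_samples / 1e9


def rarefy(otu_counts, depth, seed=0):
    """Subsample one sample's reads, without replacement, down to `depth` reads.

    otu_counts maps a (hypothetical) OTU id to its read count in one sample.
    Returns a Counter of OTU counts whose values sum to `depth`.
    """
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample size {total}")
    # One list entry per read, so every read is equally likely to be drawn.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def rarefy_table(otu_table, seed=0):
    """Rarefy every sample in the table to the depth of the shallowest sample."""
    depth = min(sum(counts.values()) for counts in otu_table.values())
    return {name: rarefy(counts, depth, seed) for name, counts in otu_table.items()}


def shannon(counts):
    """Shannon diversity index H' (natural log) for one sample's OTU counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


if __name__ == "__main__":
    print(total_gigabases(READS_PER_SAMPLE, BASES_PER_READ, N_SAMPLES))  # 90.0
    # Hypothetical toy OTU table with uneven sequencing depth (1000 vs 500 reads).
    toy_table = {
        "patient01_t0": {"OTU_1": 600, "OTU_2": 300, "OTU_3": 100},
        "patient01_t1": {"OTU_1": 150, "OTU_2": 150, "OTU_4": 200},
    }
    for sample, counts in rarefy_table(toy_table).items():
        print(sample, sum(counts.values()), round(shannon(counts), 3))
```

Expanding counts into a per-read list keeps the sampling logic easy to follow, but at 1,000,000 reads per sample it is memory-hungry; a full-scale implementation would more likely draw directly from the count vector.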