GCAT-SEEKquence
The Genome Consortium for Active Teaching NextGen Sequencing Undergraduate Education Workshop
Juniata College, Huntingdon, PA, June 8 to 12, 2015
Morgan State University, Baltimore, MD, June 15 to 19, 2015

Overview. Workshops will be organized into whole-group sessions and small breakout sessions divided by application type: transcriptomics, bacterial genomics, metagenomics, and eukaryotic genomics. On the first day, the whole group will hear a keynote address on integrating genomics projects into the undergraduate classroom. Days two through four will be divided into breakout sessions by application, covering both wet lab sample preparation and bioinformatic approaches. Early evening sessions will develop educational modules and assessment strategies that bring the workshop experiences into the participants’ courses. Late evening poster sessions will give current and former participants and instructors a chance to present their research, with the goal of devising strategies that use NextGen sequencing to advance their projects. On day five, participants will briefly present their customized teaching modules to the rest of the group, giving everyone an overview of the alternative applications and teaching approaches.

Target Audience: Undergraduate educators and students who are novices with respect to NextGen sequencing technology and bioinformatics. We will assist with every key stage, from experimental design through assessment. Our goal is to make it easier for faculty to integrate NextGen sequencing into their classes. The workshop focuses on using raw data as a catalyst for learning by workshop participants, and on publishing our teaching modules for the broader benefit of the GCAT-SEEK network. The teaching modules developed will emphasize strategies for deep active learning rather than superficial “black box” learning.
GCAT-SEEK 2015 Workshop Description Page 1

Costs: Support from the National Science Foundation and the Howard Hughes Medical Institute (to Juniata College) funds the workshop. This covers housing and meals for all participants and instructors, as well as partial support for sequencing runs on samples prepared at the workshop. Limited travel funding is also available and should cover most costs.

Objectives: Upon completion of the workshop, participants will be able to:
- Design experiments using next-generation sequencing technologies
- Prepare nucleic acid samples and assess their quality
- Sequence and analyze their samples
- Teach modules that integrate next-generation sequencing research into the classroom
- Assess student learning goals and track outcomes

Who may apply? Any GCAT-SEEK network member (see www.GCAT-SEEK.org to join) who works with undergraduates. Sorry, this workshop is not intended for graduate students or high school teachers. We welcome applications from:
- Pairs of faculty from the same institution across disciplines (Bio/Chem/IT/Math/Physics)
- Pairs of faculty from the same discipline across institutions
- Excellent students with leadership credentials or potential, who are invited to apply with a sponsor (a maximum of 5 student/faculty pairs)

Applications will also be accepted from single researchers who are open to pairing with someone from another institution with a similar research interest. In that case, only one project will be sequenced, depending on the funds available and the size of the project. Individuals without projects or teammates will be considered if we need more people to fill the workshop.

What should the application propose?
Nature of project (feasibility). You aren’t expected to know all the details, but the more you include, the easier it will be to judge the potential of the project. It will also be easier to judge whether the project fits the teaching framework described below. Please see the breakout session descriptions below to determine which kinds of projects best fit the planned educational content.
Description of the course into which the module will be integrated (feasibility for curricular integration), including the number of students in the course per year. Describe the class, how you see the module fitting in, and when you expect the change to take place. Is next-generation sequencing already a topic covered in the class?

Criteria for application evaluation:
- Potential to impact undergraduate education, directly and through network adoption
- Is the project scientifically sound?
- Is the project feasible given our limited sequencing funds?
- How many students are involved in the class where it will be used, and will changes be implemented soon?
- Is the project likely to provide authentic research opportunities for many students through subsequent bioinformatic projects?
- If a student applicant is proposed, how strong are the student’s leadership qualifications?
- Does the investigator have a track record of accomplishment in research and education?

Why pairs? Previous experience with GCAT-CHIP (M. Campbell, pers. comm.) and at HHMI more generally has found that attendance by faculty teams increases the likelihood of successful curricular integration once professors return to their home institutions. Team members feel much less isolated, and their enthusiasm for the challenge of change is maintained. Faculty combinations that bring together biology, information technology, biochemistry, mathematical modeling, and statistics would be especially valuable for later collaboration and integration among disciplines at the home institution. The ability to communicate and collaborate across disciplines is a core competency identified by the Vision and Change dialogues. Collaboration within an institution would model that competency for students. Faculty pairs of biologists from different institutions will foster a sense of community and collaboration that is not usually possible at small colleges.
We plan to aggressively edit and customize teaching module templates at the workshop so that they can be published on our website for the network at large. Working in teams will make this more feasible. Up to five faculty may choose to invite an excellent undergraduate research student with established leadership credentials or potential, instead of bringing a faculty colleague. Student participation helps create student leaders by further developing their knowledge, communication skills, and credentials. It also fosters a national community of student researchers. As the Vision and Change dialogues documented, the student perspective is important and often overlooked, and it ultimately deepens the conversation.

Eukaryotic genomics breakout session.
In the eukaryotic genome analysis section, participants will perform de novo and/or reference-based assembly, automated annotation, and comparative genomics. In advance of the workshop, participants will be asked to submit DNA samples from their organism and to complete a few basic computer tutorials. During the breakout session we will review the different types of next-generation sequencer output, along with file types, quality scores, linkers, barcodes, assembly principles, algorithms (de novo vs. resequencing), and programs. Participants will perform error correction and assembly of a practice GCAT-SEEK dataset using SOAPdenovo and/or other assemblers, running on the Juniata College HHMI cluster and/or iPlant Atmosphere. Participants will then perform gene annotation using MAKER. Additional analyses will include:
- identification of SNPs
- comparative analysis of orthologous and paralogous gene clusters
- pairwise alignment of syntenic regions from closely related species with available genome sequences
- RADseq analysis using Stacks and R

Prokaryotic genomics breakout session.
In the prokaryotic genome analysis section, participants will prepare library-construction-quality DNA and appropriate documentation for their organism of interest. They will also perform de novo and/or reference-based assembly and automated annotation of a real dataset. In advance of the workshop, participants may be asked to submit samples of their organism, provide any special growth information, and register to use specific bioinformatics sites.

The first day of the breakout session will begin with an overview and comparison of different approaches to gDNA isolation. Participants will then isolate gDNA from their organism of interest and set up a quality-control PCR using 16S rRNA universal primers. That afternoon, participants will work in the computer lab to review the different types of next-generation sequencer output, along with file types, quality scores, linkers, barcodes, assembly principles, algorithms (de novo vs. resequencing), and programs. Participants will start assembling a practice GCAT-SEEK dataset using the NextGENe, Geneious, and/or CLC Workbench suites on the Juniata College GCAT-SEEK cluster. While the assemblies run, participants will review annotation methods, focusing on RAST (Rapid Annotations using Subsystems Technology), NCBI, and DOE-JGI IMG (Integrated Microbial Genomes) tools. Once the assemblies are complete, they will be reviewed and finishing strategies will be discussed. Assembled sequences and reference sequences (if not already present) will then be loaded into RAST and/or IMG, and annotation will run overnight.

On the second and third days, participants will assess DNA quality by Qubit quantification and by electrophoresis of gDNA and PCR products. Participants will prepare the documents that accompany samples to the sequencing facility. DNA samples that pass quality control will be packaged and sent for sequencing.
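The quality scores reviewed in the computer lab are normally stored in FASTQ files as one ASCII character per base. The sketch below is a minimal illustration that assumes the Phred+33 encoding used by Sanger and current Illumina (1.8+) FASTQ files. Under that encoding, each character c maps to a Phred score Q = ord(c) - 33, and Q corresponds to an estimated base-call error probability of 10^(-Q/10). For example, Q20 means roughly 1 error in 100 calls and Q30 roughly 1 in 1,000. The read, the helper names, and the Q20 trimming threshold are hypothetical teaching choices, not workshop protocol. Production trimming tools use more elaborate rules, such as sliding windows.

```python
"""Decode Phred+33 FASTQ quality strings and trim a low-quality 3' tail (sketch)."""


def phred_scores(quality, offset=33):
    """Convert each ASCII quality character to a Phred score, Q = ord(c) - offset."""
    return [ord(c) - offset for c in quality]


def error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def trim_low_quality_tail(seq, quality, min_q=20, offset=33):
    """Remove bases from the 3' end until a base with score >= min_q is reached."""
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# A made-up FASTQ record: header, sequence, separator, quality string.
record = ["@read1", "ACGTACGTAC", "+", "IIIIIHHH#!"]
seq, qual = record[1], record[3]

print(phred_scores(qual))  # [40, 40, 40, 40, 40, 39, 39, 39, 2, 0]
print(trim_low_quality_tail(seq, qual))  # ('ACGTACGT', 'IIIIIHHH')
```

The example trims only the 3' tail to keep it short. Illumina base-call quality typically declines toward the end of a read, so the tail is where low scores usually cluster.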
During the afternoon, participants will review the annotation results to determine which subsystems are present and relate them to the organism’s phenotypes. Genomes of related organisms will be compared using several phylogenomic metrics, such as average amino acid identity (AAI), average nucleotide identity (ANI), and estimated DNA-DNA hybridization values. Gene content and order will be compared to identify core and unique genes and to assess synteny.

Metagenomics breakout session.
In the metagenomic analysis section, participants will prepare their high-quality DNA samples for 16S/18S rRNA gene, ITS region, or functional gene sequencing, and will learn the relevant bioinformatic analyses for these datasets. In advance of the workshop, participants will be asked to submit high-quality DNA extracts, provide the relevant metadata, and register to use specific open-source bioinformatics tools.

The first day of the breakout session will begin with an overview of sample preparation for sequencing. Participants interested in targeted gene sequencing will then perform PCR amplification with the appropriate Illumina barcoded primers and sequencing adaptors. That afternoon we will introduce working in Linux and compute cluster environments. This will be followed by an introduction to analyzing 16S/18S rRNA gene, ITS, and functional gene sequences with the QIIME software. The tutorial will cover demultiplexing, quality filtering, clustering, and annotation of sequences. It will also introduce multivariate statistical approaches for comparing samples.

On the morning of the second day, the PCR-amplified libraries (or other single-gene libraries) will be quantified and quality-checked using a Qubit Fluorometer and an Agilent Bioanalyzer.
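In the demultiplexing step of the QIIME tutorial, each read is assigned to the sample whose barcode it carries. The toy sketch below shows only that idea and is not QIIME’s implementation. It hypothetically assumes a fixed-length barcode at the 5' start of each read and requires an exact match. Real pipelines typically read barcodes from a separate index read, may correct barcode errors, and apply quality filtering. The barcodes, sample names, and reads are all invented.

```python
"""Toy demultiplexer: assign reads to samples by an inline 5' barcode (sketch)."""
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by exact barcode match and strip the barcode.

    reads: iterable of (read_id, sequence) tuples.
    barcode_to_sample: dict mapping barcode -> sample name.
    Returns (assigned, unassigned), where assigned maps each sample name
    to a list of (read_id, sequence_without_barcode) tuples.
    """
    assigned = defaultdict(list)
    unassigned = []
    for read_id, seq in reads:
        sample = barcode_to_sample.get(seq[:barcode_len])
        if sample is None:
            unassigned.append((read_id, seq))  # no exact barcode match
        else:
            assigned[sample].append((read_id, seq[barcode_len:]))
    return dict(assigned), unassigned


# Hypothetical mapping of barcodes to samples, plus four made-up reads.
mapping = {"ACGT": "soil_A", "TGCA": "soil_B"}
reads = [
    ("r1", "ACGTGGCTAAGT"),
    ("r2", "TGCAGGCTTAGC"),
    ("r3", "ACGTGGCAAAGT"),
    ("r4", "NNNNGGCTAAGT"),  # ambiguous barcode, so it cannot be assigned
]

samples, unknown = demultiplex(reads, mapping, barcode_len=4)
for name in sorted(samples):
    print(name, len(samples[name]))  # soil_A 2, then soil_B 1
print("unassigned", len(unknown))    # unassigned 1
```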
In the afternoon we will continue data analysis and interpretation in QIIME, and give participants a brief overview of other statistical analyses that can be performed outside of QIIME. On the third day of the workshop, a brief lecture on preparing shotgun metagenomic libraries will be followed by a hands-on tutorial on shotgun metagenomic bioinformatics. Participants will first be introduced to open-source metagenomic data analysis tools, including MG-RAST, IMG/M, and CAMERA, as well as in-house pipelines available on the HHMI compute cluster. Participants will also be introduced to running metagenomic analyses on the Amazon Elastic Compute Cloud (EC2).

RNA-seq breakout session.
The RNA-seq breakout sessions will walk participants through the four major phases of RNA-seq analysis:
- RNA isolation and library preparation
- transcriptome assembly
- gene annotation
- analysis of expression differences

Before the workshop, participants will be asked to submit their samples of interest, provide details about the samples (including treatment differences), and register for access to relevant bioinformatic tools. After an overview of RNA isolation methods and a discussion of alternative approaches, participants will extract RNA from their samples. Alternatively, participants may bring already extracted RNA to the workshop. These RNA samples may be used for library preparation at the workshop.

Participants will then be introduced to bioinformatic tools and approaches for RNA-seq analysis, using both the online Galaxy interface and basic Linux command-line methods. Assembly basics will be introduced, including next-generation sequencer file types, quality scores, barcodes, assembly principles, and assembly programs. Participants will assemble a practice GCAT-SEEK dataset using the Trinity, Velvet, Oases, and/or SeqMan NGen programs on the Juniata College GCAT-SEEK cluster.
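Velvet, Oases, and Trinity are de Bruijn graph assemblers. They break reads into overlapping k-mers and rebuild longer sequences by walking a graph in which each k-mer links its (k-1)-base prefix to its (k-1)-base suffix. The minimal sketch below illustrates that principle under strong simplifying assumptions: error-free reads, a single linear sequence, and no repeated (k-1)-mers. Because of these assumptions, it stops at any branch or cycle. Real assemblers must also handle sequencing errors, repeats, uneven coverage, and, for transcriptomes, alternative isoforms. The reads, the choice of k = 5, and the function names are made up for illustration.

```python
"""Toy de Bruijn graph assembly of error-free reads (illustrative sketch)."""
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return (graph, indegree) where nodes are (k-1)-mers.

    Each distinct k-mer in the reads adds one edge from its (k-1)-base
    prefix to its (k-1)-base suffix.
    """
    graph = defaultdict(set)
    indegree = {}
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue  # overlapping reads repeat k-mers; count each once
            seen.add(kmer)
            prefix, suffix = kmer[:-1], kmer[1:]
            graph[prefix].add(suffix)
            indegree.setdefault(prefix, 0)
            indegree[suffix] = indegree.get(suffix, 0) + 1
    return graph, indegree


def walk_contig(graph, indegree):
    """Spell a contig from the unique start node, stopping at branches or cycles."""
    starts = [node for node, d in indegree.items() if d == 0]
    if len(starts) != 1:
        raise ValueError("this toy walker expects exactly one start node")
    node = starts[0]
    contig = node
    visited = {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:
            break  # a cycle signals a repeat the toy model cannot resolve
        contig += nxt[-1]  # each step adds one new base
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads sampled from the sequence ATGGCGTGCAAGTC.
reads = ["ATGGCGTG", "GCGTGCAAG", "TGCAAGTC"]
graph, indegree = build_de_bruijn(reads, k=5)
contig = walk_contig(graph, indegree)
print(contig)  # ATGGCGTGCAAGTC
```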
Participants will then review annotation methods, focusing on alignment to NCBI protein databases and Blast2GO, and prepare to annotate the assembled practice dataset. Finally, participants will map the sequenced reads to the newly assembled transcriptome and/or a reference transcriptome. We will discuss the statistical basis for using coverage from this mapping to measure gene expression, and introduce various analysis tools. The mapping results will then be used to quantify gene expression differences in the R statistical environment, using:
- DESeq to identify group differences
- the stochastic expectation maximization algorithm (SEM) to identify co-variation with phenotypes
- weighted gene co-expression network analysis (WGCNA) to identify gene networks

Participants may use any or all of these tools. Strategies for interpreting and visualizing the functional significance of the results (gene ontology analysis, metabolic pathway analysis, polymorphisms, etc.) will be explored using the practice dataset.

Evening Sessions on Pedagogical Integration.
We propose to use informal evening homework sessions each night to focus on teaching module design. Nancy Trun will coordinate this part of the workshop, assisted by the facilitators of each breakout session. Teams will meet in a lounge area in their dormitory each evening. There they will adapt the materials from that day’s workshop session to their own research project and to the course goals they identified in their application. Workshop presenters will create teaching modules for lower-division students, designed as active-teaching laboratory or case-study experiences. Each module will contain goals, technical details and protocols, student activities, and assessments that participants can use to teach NextGen technology.
Some material may need to be adjusted to fit the experimental design and goals, and to account for considerations such as class size and level. On the last day of the workshop, participants will present an outline of their module to the entire group. The facilitators will make sure that the information in each presentation is accurate and that the presentation is designed to engage the other workshop participants. The evening sessions will proceed as follows:
- Day 1: editing the biological context and experimental design content
- Days 2 and 3: editing the wet lab and bioinformatic protocols
- Day 4: editing the student activities and assessment tools, and revising the teaching module as a whole
- Day 5 (morning): faculty teams present to the whole group for 15 to 20 minutes and submit their module to GCAT-SEEK for further peer review, editing, and publishing

Each presentation will focus on:
- how the team’s sequencing technology works
- what question the team is using the technology to address
- a brief overview of how the team’s data will be analyzed
- the limitations of the technology in addressing the research goal

Presentations will be video recorded and published so that network members can hear from the designers themselves how the modules are intended to work.

The evening homework sessions will have three purposes. First, they will reinforce what participants have learned during the workshop. Second, they will give the workshop and network participants an overview of all of the NextGen sequencing technologies. Third, they will produce accessible draft teaching modules that can be widely adopted.

Assessment group session.
A core objective of this project is to provide participating faculty with the skills and tools needed to bring research-based next-generation sequencing pedagogy into their classrooms.
We hypothesize that the research-based pedagogical innovations fostered by this project will enhance student learning, particularly in the core competencies outlined in the 2011 Vision and Change report. We intend to test this hypothesis by applying the GCAT-SEEK assessment instrument, which was developed from these core competencies, across all GCAT-SEEK projects. During the proposed workshops, participants will therefore learn to use the GCAT-SEEK assessment instrument to analyze student progress on both consortium-wide and course-specific learning goals. The resulting data can then be used to assess student progress and to modify and improve course modules. This assessment process will ensure that student learning continually improves. It will also ensure that faculty skills and course modules in the GCAT-SEEK consortium keep pace as technologies change.

By the end of the workshop, participants will have prepared their samples for sequencing and, ideally, sent them off for processing. They will have practiced bioinformatic procedures on datasets similar to those they are producing, or in some cases on their actual data. They will also have gained experience editing and customizing an online student assessment tool. They will have produced educational modules that fit their research interests and the data they will receive from the sequencing core. Network members will benefit from these efforts by having numerous completed, parallel projects available for engaging genomics students. Network members can either i) use the customized modules and data without performing wet lab experiments themselves, or ii) customize the generic modules themselves.