Clinical and Research Genomics
Assignment #4
From Lecture 10-11 (April 30th): Microbiome and Metagenome Characterizations and Cross-Species Analysis
_________________________________________________________________________________________________________________________
Assignment: Analyze and contextualize raw sequence data from the PathoMap project to characterize the microbiome and metagenome of your samples.
Due Date: 12:00PM on May 7th
This assignment has two sets of questions.
_________________________________________________________________________________________________________________________
Downloading Data
1) Download the raw, publicly available data: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/srainstant/reads//BySample/sra/SRS/SRS802/SRS802263/SRR1749190/
2) Download the SRA toolkit: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
3) Use sratoolkit2.4.4 to convert SRA to FASTQ, then FASTQ to FASTA. The SRA-to-FASTQ conversion looks like this:
>./fastq-dump.2.4.4 ~/Desktop/SRR1749190.sra

To complete this assignment you will need the following files:
P00606 – Station – SAMN03270567
P00427 – Station – SAMN03270390
PQiagenBeadWash928-1 (Culture01) – Culture Sample – SAMN3271460
P00046 – Sequence Sample – SAMN03270043
Any Gowanus sample (Sample ID GCSS-XX, SAMN03271470-1484)
Any abandoned station sample (Sample ID PABXX, SAMN03271423-52)

Running MetaPhlAn
Upload the data to the free Galaxy instance of MetaPhlAn 1.7 and run it with the default parameters: http://huttenhower.sph.harvard.edu/galaxy/

Running BLAST
The sequences will be too large for the web-based BLAST tool to handle, so you will need to take a subset of each sequence file and run that. If you are familiar with the command line, you can open a terminal and run:
head -n 1000 sequence.fasta > sequence_1000.fasta
This will take the first 1000 lines of your sequence file. If you are not familiar with the terminal, you can open the sequence file in a text editor, copy part of the sequence, and paste it into BLAST.
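The FASTQ-to-FASTA conversion and the subsetting step can also be done in one pass. The following is a minimal Python sketch, not part of the assignment's required workflow. It assumes a standard FASTQ file with exactly four lines per read (header, sequence, '+', qualities), and the function names and the max_records parameter are made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def fastq_to_fasta(lines: Iterable[str],
                   max_records: Optional[int] = None) -> Iterator[str]:
    """Yield FASTA lines for the first max_records reads of a FASTQ stream.

    Assumes four lines per record; an incomplete trailing record is dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping one iterator with itself groups the stream into 4-line records.
    records = zip(stripped, stripped, stripped, stripped)
    for header, sequence, plus, _quality in islice(records, max_records):
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"not a 4-line FASTQ record: {header!r}")
        yield ">" + header[1:]  # FASTA headers start with '>' instead of '@'
        yield sequence


def convert_file(fastq_path: str, fasta_path: str,
                 max_records: Optional[int] = None) -> None:
    """Write the first max_records reads of fastq_path to fasta_path."""
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        for line in fastq_to_fasta(src, max_records):
            dst.write(line + "\n")
```

With hypothetical file names, convert_file("SRR1749190.fastq", "sequence_500.fasta", max_records=500) would write 500 reads. Because this sketch writes two lines per read, that is a 1000-line FASTA file, comparable in size to the head example.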
To run BLAST, follow the tutorial posted on the website: http://physiology.med.cornell.edu/faculty/mason/lab/clinicalgenomics/how_to_use_blast.pdf
_________________________________________________________________________________________________________________________
Short Answer Questions
1. Summarize the MetaPhlAn and BLAST results. Do you see any similarities? Any differences? For each sample, list 5 species you found interesting. Give the species name as well as a brief description of the organism.
2. Did you find anything interesting in sample P00606? Hint: Check the MetaPhlAn results in the order Enterobacteriales. Now take that sample and run it on MetaPhlAn v2.0 on Galaxy. Have the results changed? If so, what could be a reason?
3. Did you find anything interesting in sample P00427? Hint: Check the MetaPhlAn results in the Bacillaceae family. Do you find this finding plausible? Why or why not?
4. We found interesting molecular echoes that make the abandoned station and Gowanus samples stand out against the rest of our dataset. Are there any organisms that you found only in the Gowanus sample, or only in the abandoned station sample, and not in the other samples? Can you explain why or how they were introduced there?
5. Compare and contrast the culture and sequencing methods. What are the pros and cons of each? Compare the results of Culture01 and P00046. They were taken from the same station, but one swab was cultured and then sequenced, while the other was sequenced directly. Do you find any organisms in both samples?
_________________________________________________________________________________________________________________________
Essay Questions
6. The findings of plague and anthrax in our dataset were scrutinized by some scientists and the media. In response, we posted a blog entry that highlighted further research into these findings. Read “The Long Road from Data to Wisdom, and from DNA to Pathogen” (http://www.pathomap.org/2015/03/10/long-roaddata-wisdom-dna-pathogen/).
Also read http://read-lab-confederation.github.io/nyc-subwayanthrax-study/. What do you think of the discussion? In your opinion, what challenges face the field of metagenomics?
7. Put on the hat of a metagenomics expert. You now have a massive metagenomic dataset of nearly 1500 samples collected from New York City’s subway system that represent all species in a city. What sort of analysis would you like to perform, and what research question would you like to delve into?
8. After reading the study and playing with the data yourself, has your perception of the subway changed at all? Some coverage included the comment, “It’s probably fine to lick the subway poles.” (http://gothamist.com/2015/02/05/quit_cryin_mama_loves_u.php) Would you? Why or why not?
_________________________________________________________________________________________________________________________
Please hand in the assignment on the day of the lecture, or email it if you cannot attend. For any questions, please contact Ebrahim Afshinnekoo (eba2001@med.cornell.edu), Priyanka Vijay (prv2004@med.cornell.edu), or Professor Mason (chm2042@med.cornell.edu).
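
For the questions that ask which organisms appear in only one sample or in both samples (Questions 4 and 5), the comparison can be scripted instead of done by eye. The following is a minimal Python sketch under stated assumptions: it assumes each MetaPhlAn result is a tab-separated text file whose first column is a clade path such as k__...|...|s__Species_name and whose second column is a relative abundance. Check the files you download from Galaxy before relying on it. The function names are made up for illustration.

```python
from typing import Dict, Iterable, Set


def parse_species(lines: Iterable[str]) -> Dict[str, float]:
    """Map species name to relative abundance for species-level rows only."""
    species: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        clade, _, value = line.partition("\t")
        leaf = clade.split("|")[-1]  # deepest taxonomic level on this row
        if leaf.startswith("s__"):   # assumed species-level prefix
            species[leaf[3:]] = float(value.split("\t")[0])
    return species


def only_in(target: Dict[str, float],
            others: Iterable[Dict[str, float]]) -> Set[str]:
    """Species reported in target but in none of the other profiles."""
    seen_elsewhere: Set[str] = set()
    for profile in others:
        seen_elsewhere.update(profile)
    return set(target) - seen_elsewhere


def shared(a: Dict[str, float], b: Dict[str, float]) -> Set[str]:
    """Species reported in both profiles."""
    return set(a) & set(b)
```

For example, with hypothetical file names, parse_species(open("gowanus.txt")) and parse_species(open("P00606.txt")) give two profiles that can be passed to only_in or shared. A species that is absent from a profile may simply fall below the detection limit, so treat the output as a starting point for your answer, not a conclusion.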