NGS Bioinformatics Workshop 1.2 Tutorial - Sequence Formats, Databases and Visualization Tools
March 15th, 2012
BioSci room B9242
Facilitator: Richard Bruskiewich, Adjunct Professor, MBB

Learning Objectives
  Linux revisited
  Quick dive into the Open-Bio pool (BioPython)
  A first look at NGS data: NCBI short read archive
  Processing NGS: FASTX tool kit et al.
  Visualization: IGV

Files and Permissions
  Linux user permissions apply to the owner, the group, or others
    Owner (user) is the person who created the file and "owns" the file or directory
    Group is a team of people associated together (group project, team work)
    Others is everyone else on the server
  Each file or directory can have its permissions set to (r)ead, (w)rite, or e(x)ecute

chmod: change file permissions
  Do a long listing (ls -l) to see a permission string such as dr-x-wxrw-
  The string separates into four sections: (d)(r-x)(-wx)(rw-)
    d     directory (a plain file shows - here)
    r-x   user (owner)
    -wx   group
    rw-   others
  Examples:
    chmod o+x foo.txt       grant 'execute' permission to 'others' on foo.txt
    chmod g-rw foo.txt      remove 'read' and 'write' permission from group
    chmod ugo+rwx foo.txt   grant all rights to everyone
  To change the user/group ('owner') of a file, use chown:
    chown ubuntu:ubuntu foo.txt

A few useful tips...
  Hitting "tab" will auto-complete file or program names (or suggest possible names)
  Up arrow lets you return to previous commands
  Editing of text files:
    "nano" is an easier alternative to "emacs", but less powerful
    Alternatively, use an SSH client to transfer files to your Windows desktop, edit them in Windows, then transfer them back
    BUT: make sure you use a text editor that knows the difference between a Windows and a Linux text file (e.g. Notepad++)

Some more useful basic Linux commands
  cd       changes your directory, e.g. cd /usr/local
  man      displays the manual for a command, e.g. man ls
  pwd      tells you the directory you are currently in (= working directory)
  history  lists recent commands, enumerated with line numbers. By typing an exclamation point followed by the line number (e.g. !123), you can redo that command

Accessing remote servers
  ssh: Secure Shell
    ssh -i private_keypair user@host
  scp: Secure CoPy
    scp -i private_keypair [user@host:]sourcefile [user@host:]targetfile
  where user is the account (default: local user) and host is the internet name of the computer (default: local host)

OpenBio Case Study: BioPython
  http://biopython.org/wiki/Biopython
  http://biopython.org/DIST/docs/tutorial/Tutorial.html

FIRST LOOK AT NGS DATA
  http://www.ncbi.nlm.nih.gov/sra/

http://hannonlab.cshl.edu/fastx_toolkit/
  Linux, MacOSX or Unix only
  Get the precompiled binary:
    wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
    bunzip2 fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
    tar -xvf fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar
    sudo mv bin/* /usr/local/bin

FASTX tool kit I
  FASTQ-to-FASTA converter   Converts FASTQ files to FASTA files
  FASTQ Information          Charts quality statistics and nucleotide distribution
  FASTQ/A Collapser          Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts)
  FASTQ/A Trimmer            Shortens reads in FASTQ or FASTA files (removing barcodes or noise)
  FASTQ/A Renamer            Renames the sequence identifiers in a FASTQ/A file
  FASTQ/A Clipper            Removes sequencing adapters / linkers

FASTX tool kit II
  FASTQ/A Reverse-Complement   Produces the reverse-complement of each sequence in a FASTQ/FASTA file
  FASTQ/A Barcode splitter     Splits a FASTQ/FASTA file containing multiple samples
  FASTA Formatter              Changes the width of sequence lines in a FASTA file
  FASTA Nucleotide Changer     Converts FASTA sequences from/to RNA/DNA
  FASTQ Quality Filter         Filters sequences based on quality
  FASTQ Quality Trimmer        Trims (cuts) sequences based on quality
  FASTQ Masker                 Masks nucleotides with 'N' (or other character) based on quality

www.bioinformatics.bbsrc.ac.uk/projects/download.html
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

Integrative Genomics Viewer
  http://www.broadinstitute.org/igv/
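
Worked example: FASTQ to FASTA
  The FASTQ-to-FASTA conversion listed under FASTX tool kit I can be illustrated with a small pure-Python sketch. This is not how the FASTX fastq_to_fasta program is implemented. It assumes the simple four-lines-per-record FASTQ layout (header, sequence, "+" separator, qualities). The function name fastq_to_fasta and the two example reads are made up for illustration.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy records from a four-line-per-record FASTQ stream to FASTA.

    Returns the number of records written. Hypothetical helper for
    illustration; it is not part of the FASTX toolkit or BioPython.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = fastq_handle.readline().rstrip("\r\n")
        separator = fastq_handle.readline().rstrip("\r\n")
        quality = fastq_handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: %r" % header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: %r" % header)
        # FASTA keeps the identifier and the bases; the qualities are dropped.
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        count += 1
    return count


# Made-up example reads (assumption: not real SRA data).
example_fastq = (
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2 sample=demo\nGGCCTTAA\n+read2\n!!!!####\n"
)
fasta_out = io.StringIO()
n_records = fastq_to_fasta(io.StringIO(example_fastq), fasta_out)
print(n_records)
print(fasta_out.getvalue(), end="")
```

  With real files, pass open file handles instead of io.StringIO objects. If BioPython is installed, Bio.SeqIO.convert("in.fastq", "fastq", "out.fasta", "fasta") should do the same job more robustly; the file names here are placeholders.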