NGS Bioinformatics Workshop
1.2 Tutorial – Sequence Formats, Databases and
Visualization Tools
March 15th, 2012
BioSci room B9242
Facilitator: Richard Bruskiewich
Adjunct Professor, MBB
Learning Objectives
Linux revisited
Quick dive into the Open-Bio pool (BioPython)
A first look at NGS data:
NCBI short read archive
Processing NGS: FASTX tool kit et al.
Visualization: IGV
Files and Permissions
 Linux user permissions: owner, group, or others
Owner/user is the person who created the file
“OWNS” the file / directory
Group is a team of people that’s associated together
GROUP project / Team work
Others is just other people on the server
 Each file / directory can have its permissions set
to (r)ead, (w)rite, or e(x)ecute
chmod: change file permissions
Do a long listing (ls -l)
 dr-x-wxrw- Separated into four sections

(d)(r - x)(- w x)(r w -)
directory or file (-)
user (owner)
group
others
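The four sections of the listing can be picked apart mechanically. A minimal Python sketch, assuming a 10-character mode string as printed by ls -l; the function name describe_mode is hypothetical and only the two file types named on the slide are recognised:

```python
def describe_mode(mode_string):
    """Split a 10-character `ls -l` mode string into its four sections."""
    if len(mode_string) != 10:
        raise ValueError("expected 10 characters, e.g. 'dr-x-wxrw-'")
    kinds = {"d": "directory", "-": "file"}  # only the types shown on the slide
    names = {"r": "read", "w": "write", "x": "execute"}
    result = {"type": kinds.get(mode_string[0], "other")}
    # Characters 1-3 belong to the user, 4-6 to the group, 7-9 to others.
    for who, start in (("user", 1), ("group", 4), ("others", 7)):
        triplet = mode_string[start:start + 3]
        result[who] = [names[c] for c in triplet if c in names]
    return result


print(describe_mode("dr-x-wxrw-"))
```

For the slide's example this reports a directory where the user may read and execute, the group may write and execute, and others may read and write.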
Examples:
chmod o+x foo.txt
 grant ‘execute’ permission to ‘others’ on foo.txt
chmod g-rw foo.txt
 remove ‘read’ and ‘write’ permission from group
chmod ugo+rwx foo.txt
 grant all rights to everyone
To change the user/group (‘owner’) of a file, use chown rather than chmod:
chown ubuntu:ubuntu foo.txt
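The same permission changes can be made from a script. A minimal Python sketch of the three chmod examples using the standard os and stat modules; the helper names add_bits, remove_bits and demo are hypothetical, the starting mode 0o640 is an arbitrary assumption, and the file lives in a temporary directory so nothing real is touched:

```python
import os
import stat
import tempfile


def add_bits(path, bits):
    """Grant the given permission bits (like `chmod who+perm`)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | bits)


def remove_bits(path, bits):
    """Remove the given permission bits (like `chmod who-perm`)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~bits)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "foo.txt")
        open(path, "w").close()
        os.chmod(path, 0o640)                           # assumed start: rw-r-----
        add_bits(path, stat.S_IXOTH)                    # chmod o+x foo.txt
        remove_bits(path, stat.S_IRGRP | stat.S_IWGRP)  # chmod g-rw foo.txt
        after_two = stat.filemode(os.stat(path).st_mode)
        add_bits(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)  # chmod ugo+rwx foo.txt
        after_all = stat.filemode(os.stat(path).st_mode)
        return after_two, after_all


print(demo())
```

stat.filemode renders the result in the same ten-character form that ls -l prints, which makes it easy to compare against a terminal session.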
a few useful tips…
 Hitting “tab” will auto-complete file or program names (or
suggest possible names)
 Up arrow will let you return to previous commands
 Editing of text files: “nano” is an easier alternative to “emacs”,
but less powerful
 alternatively, use an SSH client to transfer files to your Windows desktop, edit
them in Windows, then transfer them back
 BUT: make sure you use a text editor that knows the difference between a
Windows and a Linux text file (e.g. Notepad++)
Some more useful basic Linux commands
“cd” changes your directory, e.g. ‘cd /usr/local’
“man” displays the manual for a command, e.g. ‘man ls’
“pwd” tells you the directory you are currently
in (= working directory)
“history” will list recent commands,
enumerated with line numbers. By typing an
exclamation point with the line number (e.g.
!123), you can redo the command
Accessing remote servers
“ssh” – Secure Shell
ssh -i private_keypair user@host
“scp” – Secure CoPy
scp -i private_keypair [user@host:]sourcefile
[user@host:]targetfile
Where user is the account (default: local user)
and host is the internet name of the computer
(default: local host)
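When transfers are scripted, the optional user@host: prefix is the part that is easy to get wrong. A minimal Python sketch that only assembles the scp command line; the helper names remote_path and scp_command, the host example.org, the account ubuntu and the file reads.fastq are all hypothetical placeholders:

```python
import shlex


def remote_path(path, user=None, host=None):
    """Build the [user@host:]file form; with no host the path stays local."""
    if host is None:
        return path
    prefix = f"{user}@{host}" if user else host
    return f"{prefix}:{path}"


def scp_command(key_file, source, target):
    """Return the scp argument list for a key file, source and target."""
    return ["scp", "-i", key_file, source, target]


cmd = scp_command(
    "private_keypair",
    "reads.fastq",                                       # local source file
    remote_path("reads.fastq", "ubuntu", "example.org"),  # remote target
)
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd, check=True) to perform the copy; shlex.join is used here only to show the equivalent shell command.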
OpenBio Case Study: BioPython
http://biopython.org/wiki/Biopython
http://biopython.org/DIST/docs/tutorial/Tutorial.html
FIRST LOOK AT NGS DATA
http://www.ncbi.nlm.nih.gov/sra/
http://hannonlab.cshl.edu/fastx_toolkit/
Linux, MacOSX or Unix only
Get the precompiled binary
wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
bunzip2 fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
tar -xvf fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar
sudo mv bin/* /usr/local/bin
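The decompress and unpack steps can also be done in one go from Python, which is handy on machines without bunzip2. A minimal sketch using the standard tarfile module; the function name unpack_bz2_tarball is hypothetical, and the archive is assumed to have been downloaded already:

```python
import os
import tarfile


def unpack_bz2_tarball(archive_path, dest_dir):
    """Decompress and unpack a .tar.bz2 archive; return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    # Mode "r:bz2" combines the bunzip2 and tar steps.
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names
```

For example, unpack_bz2_tarball("fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2", ".") would leave the unpacked files in the current directory. Only unpack archives from sources you trust.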
FASTX tool kit I
 FASTQ-to-FASTA converter
Converts FASTQ files to FASTA files.
 FASTQ Information
Charts quality statistics and nucleotide distribution.
 FASTQ/A Collapser
Collapses identical sequences in a FASTQ/A file into a single
sequence (while maintaining read counts).
 FASTQ/A Trimmer
Shortens reads in a FASTQ or FASTA file (removing
barcodes or noise).
 FASTQ/A Renamer
Renames the sequence identifiers in a FASTQ/A file.
 FASTQ/A Clipper
Removes sequencing adapters / linkers.
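To make the idea behind the converter, trimmer and collapser concrete, here is a minimal pure-Python sketch. It is not how the FASTX tools are implemented; it assumes simple four-line FASTQ records, and the function names, the three sample reads and the rank-count naming of collapsed sequences are illustrative assumptions:

```python
from collections import Counter

SAMPLE_FASTQ = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
IIIIHHHH
@read3
TTGCA
+
IIIII
"""


def read_fastq(text):
    """Yield (identifier, sequence, quality) from four-line FASTQ records."""
    lines = [line for line in text.splitlines() if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near line {i + 1}")
        yield header[1:], seq, qual


def to_fasta(records):
    """FASTQ-to-FASTA: keep identifier and sequence, drop the qualities."""
    out = []
    for name, seq, _qual in records:
        out += [f">{name}", seq]
    return out


def trim(seq, first, last):
    """Keep bases first..last (1-based, inclusive)."""
    return seq[first - 1:last]


def collapse(records):
    """Merge identical sequences, recording how many reads each one had."""
    counts = Counter(seq for _name, seq, _qual in records)
    out = []
    for rank, (seq, count) in enumerate(counts.most_common(), start=1):
        out += [f">{rank}-{count}", seq]
    return out


records = list(read_fastq(SAMPLE_FASTQ))
print("\n".join(to_fasta(records)))
print("\n".join(collapse(records)))
print(trim("ACGTACGT", 3, 6))
```

On the sample data, the two identical reads collapse into one entry with a count of 2, and trimming positions 3 to 6 of ACGTACGT leaves GTAC.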
FASTX tool kit II
 FASTQ/A Reverse-Complement
Produces the reverse-complement of each sequence in a
FASTQ/FASTA file.
 FASTQ/A Barcode splitter
Splits a FASTQ/FASTA file containing multiple samples.
 FASTA Formatter
Changes the width of sequence lines in a FASTA file.
 FASTA Nucleotide Changer
Converts FASTA sequences from/to RNA/DNA.
 FASTQ Quality Filter
Filters sequences based on quality.
 FASTQ Quality Trimmer
Trims (cuts) sequences based on quality.
 FASTQ Masker
Masks nucleotides with 'N' (or other character) based on
quality.
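The quality-based tools all rest on decoding the FASTQ quality string into per-base scores. A minimal pure-Python sketch of reverse-complementing, filtering, trimming and masking; it is an illustration rather than the FASTX implementation, it assumes Phred+33 encoded qualities, and the function names and default thresholds are hypothetical:

```python
# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def phred_scores(qual, offset=33):
    """Decode a quality string into integer scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in qual]


def passes_quality_filter(qual, min_q=20, min_percent=80, offset=33):
    """True if at least min_percent of the bases score min_q or better."""
    scores = phred_scores(qual, offset)
    if not scores:
        return False
    good = sum(1 for score in scores if score >= min_q)
    return 100.0 * good / len(scores) >= min_percent


def quality_trim(seq, qual, min_q=20, offset=33):
    """Cut low-quality bases off the end of the read."""
    scores = phred_scores(qual, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def mask_low_quality(seq, qual, min_q=10, mask_char="N", offset=33):
    """Replace bases scoring below min_q with mask_char."""
    scores = phred_scores(qual, offset)
    return "".join(
        base if score >= min_q else mask_char
        for base, score in zip(seq, scores)
    )


print(reverse_complement("ACGTN"))
print(passes_quality_filter("II#I"))
print(quality_trim("ACGTAC", "IIII##"))
print(mask_low_quality("ACGT", "II#I"))
```

In Phred+33, the character I decodes to a score of 40 and # to a score of 2, so the third base of the last example is masked while the others are kept.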
www.bioinformatics.bbsrc.ac.uk/projects/download.html
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
Integrative Genomics Viewer
http://www.broadinstitute.org/igv/
Download