Extension_day_tutorial

School of Interdisciplinary Informatics
presents
Extension Day for the Bioinformatics Module
Part One: Installation
BioLinux is an open source system that can be run as a virtual machine. It is
based on Ubuntu (Linux, a Unix-like operating system) and has several hundred
freely available bioinformatics programs preinstalled. You can download it and
run it on your own computer, provided the computer has virtualization software
installed. We use Oracle VirtualBox because it is free and available for many
different operating systems.
Installation of BioLinux as a Guest OS
To install BioLinux 6.0 as a guest OS on Windows or Mac, first determine what
type of processor your machine has and, on Windows, which version of Windows is
currently installed. On Windows, this information can be found in the Computer
properties (right-click the “Computer” icon and select “Properties”). The
following table gives this information for the makes and models of machines we
have used.
Model             | Processor                 | Windows Version                               | BioLinux                            | Remarks
HP Pavilion       | AMD 64 bit                | Microsoft Windows 7 32bit Home Edition        | BioLinux 6 (VirtualBox Ubuntu64bit) | Hit F10 in startup, go to System Configuration and Enable SVM support [APPENDIX C]
Dell Vostro       | Intel i3                  | Microsoft Windows 7 64bit Home Edition        | BioLinux 6 (VirtualBox Ubuntu64bit) | Virtualization supported
Toshiba Satellite | AMD Vision Premium 64 bit | Microsoft Windows 7 32 Bit Enterprise Edition | BioLinux 6 (VirtualBox Ubuntu64bit) | Virtualization supported
Installation of VirtualBox
Install Oracle VirtualBox with recommended settings:
• The software can be downloaded from the following location:
http://www.virtualbox.org/wiki/Downloads
a) Select the choice for your OS.
b) Click “Save File”.
c) Go to the “Downloads” folder on your computer, and open (double-click) the
“VirtualBox-4…” file.
d) Click “Run” in the window.
• Proceed by selecting all the defaults for the VirtualBox installation:
a) The VirtualBox wizard will open. Click “Next”.
b) The defaults should be good. Click “Next”.
c) Create the shortcut icons. Click “Next”.
d) Click “Yes”.
e) Click “Install”.
f) Click “Install”.
g) Click “Finish”.
Installation of BioLinux
To download your own copy of BioLinux 6.0:
1. Go to the BioLinux downloads page
(http://nebc.nerc.ac.uk/tools/bio-linux/bl_download).
2. Fill out the form and download the image ISO to your “Downloads” folder.
(http://nebc.nerc.ac.uk/downloads/bio-linux-6-latest.iso).
3. Use your virtualization software to create a BioLinux virtual machine:
a) Run VirtualBox. In the VirtualBox screen, press the "New" Button.
b) Press the "Next" Button.
c) Enter a name for the virtual machine (e.g., Biolinux). Choose Linux in the
Operating System dropdown and Ubuntu (64bit) in the Version dropdown.
d) Choose the amount of RAM (memory) for the virtual OS (e.g., half of what is
available).
e) Click on “Create new hard disk” and click "Next”.
f) Click "Next".
g) Choose “Fixed-size Storage” and click "Next".
h) The location defaults to the name of the virtual machine from step c). Set
the disk size for the BioLinux operating system and click “Next”. 20 GB is a
good size; 6.7 GB is the minimum, and the size should be kept well under
100 GB.
i) Click on "Finish".
j) Wait while it formats the disk space.
k) Click on "Finish".
l) The new virtual machine now appears in the list in the VirtualBox window.
m) Click on "Settings" Button and Select "System" Tab. In “Boot Order”: Move
"Hard Disk" to top (Arrow Button on right), and de-select "Floppy".
n) Click on the "Storage" Tab. Select the “Empty” entry under IDE Controller and
click on the CDROM picture on the right, then click on “Choose a virtual CD/DVD
disk file…”.
o) Browse to the location where you have downloaded the ISO of BioLinux
and click "Open".
p) Click "OK".
q) The virtual machine is now configured to run the live Bio-Linux 6 image.
Click on "Start" to run it.
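For readers who prefer the command line, steps a) through p) can also be sketched with VBoxManage, the command-line tool installed with VirtualBox. This is a minimal sketch and not the procedure used in this tutorial. The 2048 MB memory size, the disk and ISO paths, and the controller name "IDE" are assumptions chosen for illustration. `createhd` is the VirtualBox 4.x name of the disk-creation subcommand (later releases call it `createmedium`). By default the script only prints the commands; setting DRY_RUN=0 would execute them.

```shell
#!/bin/sh
# Sketch: create a BioLinux VM from the command line (assumed values are marked).
VM_NAME="Biolinux"                                      # name from step c)
MEMORY_MB=2048                                          # assumption: half of a 4 GB host
DISK_MB=20480                                           # 20 GB, as suggested in step h)
DISK_FILE="$HOME/VirtualBox VMs/$VM_NAME/$VM_NAME.vdi"  # assumed disk location
ISO_FILE="$HOME/Downloads/bio-linux-6-latest.iso"       # assumed download location

# Print the command in dry-run mode (the default); otherwise run VBoxManage.
vbox() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "VBoxManage $*"
    else
        VBoxManage "$@"
    fi
}

create_biolinux_vm() {
    # Steps a) to d): register a 64-bit Ubuntu guest.
    vbox createvm --name "$VM_NAME" --ostype Ubuntu_64 --register || return 1
    # Steps d) and m): set the RAM; boot from hard disk first, then DVD, no floppy.
    vbox modifyvm "$VM_NAME" --memory "$MEMORY_MB" \
        --boot1 disk --boot2 dvd --boot3 none --boot4 none || return 1
    # Steps e) to k): create a fixed-size virtual disk (size is given in MB).
    vbox createhd --filename "$DISK_FILE" --size "$DISK_MB" --variant Fixed || return 1
    # Add an IDE controller and attach the new disk to it.
    vbox storagectl "$VM_NAME" --name "IDE" --add ide || return 1
    vbox storageattach "$VM_NAME" --storagectl "IDE" \
        --port 0 --device 0 --type hdd --medium "$DISK_FILE" || return 1
    # Steps n) to p): put the BioLinux ISO in the virtual DVD drive.
    vbox storageattach "$VM_NAME" --storagectl "IDE" \
        --port 1 --device 0 --type dvddrive --medium "$ISO_FILE" || return 1
}

create_biolinux_vm
```

The dry-run output is meant for reading only: paths that contain spaces are printed without quotes, so the printed lines should not be pasted back into a terminal unchanged.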
• Run the BioLinux virtual machine by clicking on "Start":
a) Click on "OK".
b) To install Bio-Linux, click on "Install Bio-Linux 6".
c) Select your time zone.
d) Make sure you select “USA” as your keyboard layout.
e) Click on "Forward".
f) Enter your name, a username for the OS, a password, and a computer name.
g) Click on "Install".
h) This step will take about an hour. (Time may vary for different machines.)
i) The installation is complete. Click on "Restart Now".
j) At this step just press the “ENTER” key.
k) After the system has restarted, the login screen appears. Click on your
username.
l) Enter your password and click "Log In".
Notes:
By default, a user called "nebc" is created. You can ignore that user, or you
can remove it from the system with:
sudo userdel -r nebc
Exercises
Access to the command line is achieved through terminals. You can open a
terminal by:
• clicking on the “terminal icon” in the top taskbar
• clicking on the “Applications” menu in the top taskbar, choosing
“Accessories”, and then choosing “Terminal”
• pressing Alt+F2 and typing gnome-terminal
• going into an already open terminal and typing the command
$ gnome-terminal &
Part Two: EXERCISES
Exercise 1: Use a sequence analysis tool that is available in BioLinux.
EMBOSS Command Line Tools
The European Molecular Biology Open Software Suite (EMBOSS) is a collection of
many computational tools used in bioinformatics. In this exercise we will use
some of these tools to retrieve and analyze DNA sequence data.
To use the EMBOSS command line programs, start BioLinux and open a terminal,
using one of the methods described in Part One. Then:
Step 1: Get a strawberry DNA sequence using “seqret” (sequence retrieval). Type
in the commands that are highlighted/shaded:
$ seqret
Reads and writes (returns) sequences.
Input (gapped) sequence(s): embl:af193789
output sequence(s) [af193789.fasta]:
$ view af193789.fasta
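The prompts above can also be answered on the command line, which is convenient when the same retrieval has to be repeated. The sketch below assumes the standard EMBOSS qualifiers -sequence and -outseq. `fasta_summary` is a hypothetical helper (not part of EMBOSS) that reports the number of records and the total sequence length, so the retrieved file can be checked without opening it. seqret is only run if it is installed.

```shell
#!/bin/sh
# Hypothetical helper: print "<records> <total sequence length>" for a FASTA file.
fasta_summary() {
    awk '/^>/ { n++; next }            # header lines start a new record
         { len += length($0) }         # every other line is sequence
         END { printf "%d %d\n", n, len }' "$1"
}

# Non-interactive form of the seqret dialogue shown above.
if command -v seqret >/dev/null 2>&1; then
    seqret -sequence embl:af193789 -outseq af193789.fasta &&
        fasta_summary af193789.fasta
fi
```

A single record is expected for one accession; a record count of 0 would suggest that the retrieval did not succeed.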
Step 2: To get the complete database entry for the sequence, including its
annotation, use “entret” (entry retrieval). Type in the commands that are highlighted/shaded:
$ entret
Retrieves sequence entries from flatfile databases and files.
Input sequence(s): embl:af193789
Output file: Full text of a sequence database entry
[af193789.entret]:
$ view af193789.entret
Step 3: Find the coding region (CDS) in the entret output file and write it down.
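Instead of scrolling through the entry, the coding region can be pulled out with a one-line filter. This is a sketch that assumes the entry is in EMBL flat-file format, where feature-table lines start with "FT" followed by the feature key and its location. `cds_location` is a hypothetical helper name, not an EMBOSS program.

```shell
#!/bin/sh
# Hypothetical helper: print the location of every CDS feature in an
# EMBL-format entry (lines of the form "FT   CDS   <location>").
cds_location() {
    awk '$1 == "FT" && $2 == "CDS" { print $3 }' "$1"
}

# Only run the filter if the file from Step 2 is present.
if [ -f af193789.entret ]; then
    cds_location af193789.entret
fi
```

The location is printed in the form start..end. The -regions option of transeq takes the same two numbers, written as start:end.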
Step 4: Translate the nucleic acid sequence to an amino acid sequence using
“transeq”. Type in the commands that are highlighted/shaded:
$ transeq
Translate nucleic acid sequences
Input nucleotide sequence(s): embl:af193789
protein output sequence(s) [af193789.pep]: protein1
$ view protein1
>AF193789_1 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
TYFAKMEKIEVSINSKHTIKPSTSSTPLQPYKLTLLDQLTPPAYVPIVFFYPITDHDFNL
PQTLADLRQALSETLTLYYPLSGRVKNNLYIDDFEEGVPYLEARVNCDMTDFLRLRKIEC
LNEFVPIKPFSMEAISDERYPLLGVQVNVFDSGIAIGVSVSHKLIDGGTADCFLKSWGAV
FRGCRENIIHPSLSEAALLFPPRDDLPEKYVDQMEALWFAGKKVATRRFVFGVKAISSIQ
DEAKSESVPKPSRVHAVTGFLWKHLIAASRALTSGTTSTRLSIAAQAVNLRTRMNMETVL
DNATGNLFWWAQAILELSHTTPEISDLKLCDLVNLLNGSVKQCNGDYFETFKGKEGYGRM
CEYLDFQRTMSSMEPAPDIYLFSSWTNFFNPLDFGWGRTSWIGVAGKIESASCKFIILVP
TQCGSGIEAWVNLEEEKMAMLEQDPHFLALASPKTLI*RY*LRKIMWLVQCFDFAVNKV*
ISSPANQ*NASMIDFVYVCYPNVFPYACNQYSSLL*QMLY*ASSYKVIYLLKIKLWKFYQ
KKKKKK
Step 5: Choose a region to translate. Type in the commands that are highlighted/shaded:
$ transeq -regions 16:1374
Translate nucleic acid sequences
Input nucleotide sequence(s): embl:af193789
protein output sequence(s) [af193789.pep]: protein2
$ view protein2
>AF193789_1 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
MEKIEVSINSKHTIKPSTSSTPLQPYKLTLLDQLTPPAYVPIVFFYPITDHDFNLPQTLA
DLRQALSETLTLYYPLSGRVKNNLYIDDFEEGVPYLEARVNCDMTDFLRLRKIECLNEFV
PIKPFSMEAISDERYPLLGVQVNVFDSGIAIGVSVSHKLIDGGTADCFLKSWGAVFRGCR
ENIIHPSLSEAALLFPPRDDLPEKYVDQMEALWFAGKKVATRRFVFGVKAISSIQDEAKS
ESVPKPSRVHAVTGFLWKHLIAASRALTSGTTSTRLSIAAQAVNLRTRMNMETVLDNATG
NLFWWAQAILELSHTTPEISDLKLCDLVNLLNGSVKQCNGDYFETFKGKEGYGRMCEYLD
FQRTMSSMEPAPDIYLFSSWTNFFNPLDFGWGRTSWIGVAGKIESASCKFIILVPTQCGS
GIEAWVNLEEEKMAMLEQDPHFLALASPKTLI*
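As a quick check on the output above, the expected length of the translation can be worked out from the region boundaries. The sketch below uses only shell arithmetic, and the variable names are illustrative.

```shell
#!/bin/sh
# Region 16:1374 is inclusive at both ends.
start=16
end=1374
nt=$(( end - start + 1 ))     # nucleotides in the region
codons=$(( nt / 3 ))          # complete codons
leftover=$(( nt % 3 ))        # bases that do not fill a codon
echo "nucleotides=$nt codons=$codons leftover=$leftover"
# prints: nucleotides=1359 codons=453 leftover=0
```

The 453 codons correspond to 452 amino acids plus the stop codon, which transeq writes as `*`. That agrees with the protein2 output, which is 453 characters long and ends in `*`.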
Step 6: Choose a different frame to translate. Type in the commands that
are highlighted/shaded:
$ transeq -regions 16:1374 -frame=2
Translate nucleic acid sequences
Input nucleotide sequence(s): embl:af193789
protein output sequence(s) [af193789.pep]: protein3
$ view protein3
>AF193789_2 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
WRKLRSV*IPNTPSNHQLPLHHFSLTSLPSWTSSLLRRMSPSCSSTPLLTMTSIFLKP*L
T*DKPFRRLSLCTIHSLEGSKTTYTSMILKKVSHTLRLE*IVT*LIF*GFGKSSALMSLF
Q*NHLVWKQYLMSVTPCLEFKSTFSILE*QSVSPSLTSSSMEERQTVFSSPGVLFFEGVV
KISYILVSLKQHCFSHREMTCLKSMSIRWKRYGLPEKKLLQGDLYLV*KPYLQFKMKRRA
SPCPSHHEFMPSLVFSGNI*SLLLGH*HQVLLQQDFL*RPRQ*T*EHG*TWRQCWIMPLE
TCSGGHRPY*S*VIQHQRSVILSCVTWLTCSMDLSNNVTVITLRLSRVKRDMEECASI*I
FRGL*VLWNQHRIFIYSRAGLIFSTHLILDGGGHHGLELQEKLNLQVASS*Y*FQHNAVL
ELKRG*I*KKRKWLC*NKIPIF*R*HLQRP*FX
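The many `*` (stop) characters show that frame 2 is not the reading frame of the gene. The length of this output can be checked with the same kind of arithmetic as before. The sketch below assumes that the frame is applied to the selected region, so that frame 2 skips the first base of region 16:1374. The variable names are illustrative.

```shell
#!/bin/sh
region_nt=$(( 1374 - 16 + 1 ))           # nucleotides in region 16:1374
frame=2
usable=$(( region_nt - (frame - 1) ))    # bases left after skipping frame-1 bases
codons=$(( usable / 3 ))                 # complete codons in this frame
leftover=$(( usable % 3 ))               # bases in the incomplete final codon
echo "codons=$codons leftover=$leftover"
# prints: codons=452 leftover=2
```

The protein3 output has 452 residue or stop symbols followed by a final `X`. This is consistent with the `X` standing for the two leftover bases that do not form a complete codon, although that reading is an interpretation of the output rather than something stated in this tutorial.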
Exercise 2: View Structure of “Haemoglobin” Protein
Step 1: Start BioLinux and open the Mozilla Firefox browser. Then go to the NCBI
website (http://www.ncbi.nlm.nih.gov/) and search for “Haemoglobin”.
Step 2: On the search results page, click on “Structure”.
Step 3: Click on any image that is listed.
Step 4: You will be taken to the structure page for this protein. On this page, click
on “Structure View in Cn3D”.
Step 5: Save the file.
Step 6: In the Bio-Linux menu in the top toolbar, click on “Applications”, then go
to “Bioinformatics”, then to “Cn3D”.
Step 7: This will run the Cn3D software and open its main window.
Step 8: Click on the “File” menu, then on “Open”.
Step 9: Browse to the recently downloaded file and press OK.
Step 10: The protein structure of haemoglobin is loaded onto the screen.