School of Interdisciplinary Informatics presents
Extension Day for the Bioinformatics Module

Part One: Installation

BioLinux is an open-source virtual machine. It runs Ubuntu (a Linux distribution, Unix-like) and has several hundred freely available bioinformatics programs preinstalled. You can download it and run it on your own computer, provided the computer has virtualization software installed. We use Oracle VirtualBox because it is free and available for many different operating systems.

Installation of BioLinux as a Guest OS

To install BioLinux 6.0 as a guest OS on Windows or Mac, first determine what type of processor your machine has and which version of Windows is currently installed. On Windows, this information is available under Computer Properties (right-click the “Computer” icon and select “Properties”). Examples of such information for the makes and models of machines we have used are given in the following table.

Model | Processor | Windows Version | BioLinux | Remarks
HP Pavilion | AMD 64 bit | Microsoft Windows 7 32bit Home Edition | BioLinux 6 (VirtualBox Ubuntu 64bit) | Hit F10 at startup, go to System Configuration, and enable SVM support [APPENDIX C]
Dell Vostro | Intel i3 | Microsoft Windows 7 64bit Home Edition | BioLinux 6 (VirtualBox Ubuntu 64bit) | Virtualization supported
Toshiba Satellite | AMD Vision Premium 64 bit | Microsoft Windows 7 32 Bit Enterprise Edition | BioLinux 6 (VirtualBox Ubuntu 64bit) | Virtualization supported

Installation of VirtualBox

Install Oracle VirtualBox with the recommended settings. The software can be downloaded from the following location: http://www.virtualbox.org/wiki/Downloads
a) Select the choice for your OS.
b) Click “Save File”.
c) Go to the “Downloads” folder on your computer, and open (double-click) the “VirtualBox-4…” file.
d) Click “Run” in the window.

We will proceed by accepting all the defaults for the VirtualBox installation.
a) The VirtualBox wizard will open. Click “Next”.
b) The defaults should be good. Click “Next”.
c) Create the shortcut icons. Click “Next”.
d) Click “Yes”.
e) Click “Install”.
f) Click “Install”.
g) Click “Finish”.

Installation of BioLinux

To download your own copy of BioLinux 6.0:
1. Go to the BioLinux downloads page (http://nebc.nerc.ac.uk/tools/bio-linux/bl_download).
2. Fill out the form and download the ISO image to your “Downloads” folder (http://nebc.nerc.ac.uk/downloads/bio-linux-6-latest.iso).
3. Use your virtualization software to create a BioLinux virtual machine:
a) Run VirtualBox. In the VirtualBox screen, press the “New” button.
b) Press the “Next” button.
c) Enter a name for the virtual machine (e.g., Biolinux). Choose “Linux” in the Operating System dropdown and “Ubuntu (64bit)” in the Version dropdown.
d) Choose the amount of RAM (memory) for the virtual OS (e.g., half of what is available).
e) Click on “Create new hard disk” and click “Next”.
f) Click “Next”.
g) Choose “Fixed-size Storage” and click “Next”.
h) By default, the disk location is the virtual machine name entered in step c). Give the disk size for the BioLinux operating system (20 GB is a good size), and click “Next”. 6.7 GB is the minimum size, but be careful to keep the size well under 100 GB.
i) Click on “Finish”.
j) Wait while it formats the disk space.
k) Click on “Finish”.
l) The new virtual machine now appears in the VirtualBox window.
m) Click on the “Settings” button and select the “System” tab. In “Boot Order”, move “Hard Disk” to the top (arrow button on the right) and de-select “Floppy”.
n) Click on the “Storage” tab. Select the “Empty” entry under the IDE Controller, click on the CD-ROM picture on the right, then click on “Choose a virtual CD/DVD disk file…”.
o) Browse to the location where you downloaded the BioLinux ISO and click “Open”.
p) Click “OK”.
q) The virtual machine is now configured to run live Bio-Linux 6. Click on “Start” to run it.

Run the BioLinux virtual machine by clicking on “Start”.
a) Click on “OK”.
b) To install Bio-Linux, click on “Install Bio-Linux 6”.
c) Select your time zone.
d) Make sure you select “USA” as your keyboard layout.
e) Click on “Forward”.
f) Enter your name, a username for the OS, a password, and a computer name.
g) Click on “Install”.
h) This step will take about an hour. (Time may vary for different machines.)
i) The installation is complete. Click on “Restart Now”.
j) At this step, just press the “ENTER” key.
k) This is the login screen after the system has restarted. Click on your username.
l) Enter your password and click “Log In”.

Notes: By default, a user called “nebc” is created. You can ignore that user, or you can remove it from the system with:

sudo userdel -r nebc

Exercises

Access to the command line is achieved through terminals. You can open a terminal by:
- clicking on the “terminal icon” in the top taskbar
- clicking on the “Applications” menu in the top taskbar, choosing “Accessories”, and then choosing “Terminal”
- pressing Alt+F2 and typing gnome-terminal
- going into an already open terminal and typing the command: gnome-terminal &

Part Two: EXERCISES

Exercise 1: Use a sequence analysis tool that is available in BioLinux.

EMBOSS Command Line Tools

The European Molecular Biology Open Software Suite (EMBOSS) is a collection of many computational tools used in bioinformatics. In this exercise we will use some of these tools to retrieve and analyze DNA sequence data. To use the EMBOSS command line programs, start BioLinux and open a terminal, using one of the methods described in Part One. Then:

Step 1: Get a Strawberry DNA Sequence using “seqret” (sequence retrieval).
Type in the commands shown below. What you type is the text after each “$” prompt and after the colon of each EMBOSS prompt; where nothing follows the colon, press Enter to accept the default shown in square brackets.

$ seqret
Reads and writes (returns) sequences.
Input (gapped) sequence(s): embl:af193789
output sequence(s) [af193789.fasta]:
$ view af193789.fasta

Step 2: To get sequence data containing complete information relating to the sequence, use “entret” (entry retrieval).
Type in the commands shown below (the text after each “$” prompt and after the colon of each EMBOSS prompt):

$ entret
Retrieves sequence entries from flatfile databases and files.
Input sequence(s): embl:af193789
Output file: Full text of a sequence database entry [af193789.entret]:
$ view af193789.entret

Step 3: Find the coding region in the entret file (the “CDS” line in the feature table) and write it down.

Step 4: Translate Nucleic Acid Sequence to Amino Acid Sequence using “transeq”.
Type in the commands shown below (the text after each “$” prompt and after the colon of each EMBOSS prompt):

$ transeq
Translate nucleic acid sequences
Input nucleotide sequence(s): embl:af193789
protein output sequence(s) [af193789.pep]: protein1
$ view protein1
>AF193789_1 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
TYFAKMEKIEVSINSKHTIKPSTSSTPLQPYKLTLLDQLTPPAYVPIVFFYPITDHDFNL
PQTLADLRQALSETLTLYYPLSGRVKNNLYIDDFEEGVPYLEARVNCDMTDFLRLRKIEC
LNEFVPIKPFSMEAISDERYPLLGVQVNVFDSGIAIGVSVSHKLIDGGTADCFLKSWGAV
FRGCRENIIHPSLSEAALLFPPRDDLPEKYVDQMEALWFAGKKVATRRFVFGVKAISSIQ
DEAKSESVPKPSRVHAVTGFLWKHLIAASRALTSGTTSTRLSIAAQAVNLRTRMNMETVL
DNATGNLFWWAQAILELSHTTPEISDLKLCDLVNLLNGSVKQCNGDYFETFKGKEGYGRM
CEYLDFQRTMSSMEPAPDIYLFSSWTNFFNPLDFGWGRTSWIGVAGKIESASCKFIILVP
TQCGSGIEAWVNLEEEKMAMLEQDPHFLALASPKTLI*RY*LRKIMWLVQCFDFAVNKV*
ISSPANQ*NASMIDFVYVCYPNVFPYACNQYSSLL*QMLY*ASSYKVIYLLKIKLWKFYQ
KKKKKK

Step 5: Choose a region to translate.
Type in the commands shown below (the text after each “$” prompt and after the colon of each EMBOSS prompt):

$ transeq -regions 16:1374
Translate nucleic acid sequences
Input nucleotide sequence(s): embl:af193789
protein output sequence(s) [af193789.pep]: protein2
$ view protein2
>AF193789_1 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
MEKIEVSINSKHTIKPSTSSTPLQPYKLTLLDQLTPPAYVPIVFFYPITDHDFNLPQTLA
DLRQALSETLTLYYPLSGRVKNNLYIDDFEEGVPYLEARVNCDMTDFLRLRKIECLNEFV
PIKPFSMEAISDERYPLLGVQVNVFDSGIAIGVSVSHKLIDGGTADCFLKSWGAVFRGCR
ENIIHPSLSEAALLFPPRDDLPEKYVDQMEALWFAGKKVATRRFVFGVKAISSIQDEAKS
ESVPKPSRVHAVTGFLWKHLIAASRALTSGTTSTRLSIAAQAVNLRTRMNMETVLDNATG
NLFWWAQAILELSHTTPEISDLKLCDLVNLLNGSVKQCNGDYFETFKGKEGYGRMCEYLD
FQRTMSSMEPAPDIYLFSSWTNFFNPLDFGWGRTSWIGVAGKIESASCKFIILVPTQCGS
GIEAWVNLEEEKMAMLEQDPHFLALASPKTLI*

Step 6: Choose a different frame to translate.
Type in the commands shown below (the text after each “$” prompt and after the colon of each EMBOSS prompt):

$ transeq -regions 16:1374 -frame=2
Translate nucleic acid sequences
Input nucleotide sequence(s): embl:af193789
protein output sequence(s) [af193789.pep]: protein3
$ view protein3
>AF193789_2 Fragaria x ananassa alcohol acyltransferase (AAT) mRNA, complete cds.
WRKLRSV*IPNTPSNHQLPLHHFSLTSLPSWTSSLLRRMSPSCSSTPLLTMTSIFLKP*L
T*DKPFRRLSLCTIHSLEGSKTTYTSMILKKVSHTLRLE*IVT*LIF*GFGKSSALMSLF
Q*NHLVWKQYLMSVTPCLEFKSTFSILE*QSVSPSLTSSSMEERQTVFSSPGVLFFEGVV
KISYILVSLKQHCFSHREMTCLKSMSIRWKRYGLPEKKLLQGDLYLV*KPYLQFKMKRRA
SPCPSHHEFMPSLVFSGNI*SLLLGH*HQVLLQQDFL*RPRQ*T*EHG*TWRQCWIMPLE
TCSGGHRPY*S*VIQHQRSVILSCVTWLTCSMDLSNNVTVITLRLSRVKRDMEECASI*I
FRGL*VLWNQHRIFIYSRAGLIFSTHLILDGGGHHGLELQEKLNLQVASS*Y*FQHNAVL
ELKRG*I*KKRKWLC*NKIPIF*R*HLQRP*FX

Exercise 2: View Structure of “Haemoglobin” Protein

Step 1: Start BioLinux and open the Mozilla Firefox browser. Then go to the NCBI website (http://www.ncbi.nlm.nih.gov/) and search for “Haemoglobin”.
Step 2: On the search results page, click on “Structure”.
Step 3: Click on any image that is listed.
Step 4: You will be taken to the structure page for this protein. On this page, click on “Structure View in Cn3D”.
Step 5: Save the file.
Step 6: In the BioLinux menu in the top toolbar, click on “Applications”, then go to “Bioinformatics”, then to “Cn3D”.
Step 7: This will start the Cn3D software and open its main window.
Step 8: Click on the “File” menu, then the “Open” submenu.
Step 9: Browse to the recently downloaded file and press OK.
Step 10: The protein structure of haemoglobin is loaded onto the screen.
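
Supplement to Exercise 1, Steps 4 to 6: those steps rely on the “-regions” and “-frame” options of transeq. As a rough illustration of what the two options do, the Python sketch below translates a 1-based, inclusive region of a DNA string in a chosen forward frame, using the standard genetic code. It is not how EMBOSS is implemented. The function name translate, its parameters, and the short TOY sequence are made up for illustration (TOY is not the AF193789 record). The sketch simply drops an incomplete trailing codon, whereas the transeq output in Step 6 ends with an “X”.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, start=1, end=None, frame=1):
    """Translate dna[start..end] (1-based, inclusive) in forward frame 1, 2 or 3.

    Hypothetical helper for illustration only: codons containing
    unrecognised letters become 'X', and an incomplete codon at the
    end of the region is dropped.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    dna = dna.upper()
    if end is None:
        end = len(dna)
    region = dna[start - 1:end]   # cut out the requested region
    region = region[frame - 1:]   # frame 2 skips one base, frame 3 skips two
    usable = len(region) - len(region) % 3
    return "".join(
        CODON_TABLE.get(region[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Made-up toy sequence: two flanking bases, a short coding region, two more bases.
TOY = "CCATGGAGAAAATTGAGTAAGG"

print(translate(TOY, 3, 20))            # region 3..20, frame 1
print(translate(TOY, 3, 20, frame=2))   # same region, shifted by one base
```

Shifting the frame by a single base regroups every codon, which is why the Step 6 output bears no resemblance to the Step 5 output even though both come from the same region.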