Introduction to Linux and HPC
Presented by: Al Ritacco, Shailender Nagpal

AGENDA
• Introduction to Linux
• How to request an HPC account
• How to Login to HPC
• Basic Linux commands
• Available resources
• How to submit a job to the cluster

What is UNIX
• Unix is an Operating System (OS), just like Microsoft "Windows" is an OS
  – Runs on many computer "servers" and provides a multi-user, multi-tasking environment
  – Orchestrates the various parts of the computer (the processor, the on-board memory, the disk drives, keyboards, video monitors, etc.) to perform useful tasks
• The Unix operating system comprises three parts: the kernel (with commands to interact with it), standard utility programs/services, and system configuration files

What is Linux
• Linux is “souped-up” Unix, and provides additional user-friendly programs
  – Both a command line interface (CLI) and a graphical user interface (GUI) are available to execute commands
• What exactly does this mean?
  – It means we can install and run scientific software as well as business applications

Why Unix/Linux?
• UNIX is good for automation of computer tasks:
  – performing complex operations with very few keystrokes
  – operating on a large number of objects, e.g.:
    • Parsing file contents (pattern matching)
    • Manipulating text files containing scientific data
• UNIX is fast
• Linux (≈ UNIX) is free and runs on PCs and Macs, plus specialty hardware for mobile devices
• Many scientific software packages are freely available on Linux

Getting an account
• To get started using the UMass Linux servers, you need to have an account.
  Fill out this form: https://ghpcc06.umassrc.org/hpc/index.php
• Your PI has to authorize the request
• To connect to the HPC server from Windows, use the PuTTY client; from a Mac, use SSH: http://wiki.umassrc.org/wiki/index.php/Connecting_to_the_Cluster

Working on a Linux Computer
• Linux as a personal workstation
• Linux/Unix as a central “server” (multi-user)
  – Logging in needs three pieces of information: user name, password, and server name or IP address
• “PuTTY” on Windows can be used to connect to UMass Research Computing servers
  – remote login may not allow for displaying graphics: text mode interaction only
  – graphics or "X" can be displayed using special tools (Xming)

Logging into Linux
• Why do we need to login?
  – Tracking who can login and what access they have
• Logging in
  – Use SSH client software
  – Login to a particular server which has a designated name:
    • Ex: hpcc01.umassmed.edu, ghpcc06.umassrc.org
    • User credentials: user name, password
  – SSH client for Windows: PuTTY
  – SSH client for Mac/Linux: Terminal

Connect to UMass Servers

How do I interact with Linux?
• Using a command line interface (CLI) where we explicitly type commands and have Linux execute them (using a command shell)
• What is a command shell?
  – A program that interprets the commands we wish to have executed by Linux
• Enter “bash”: the Bourne Again Shell

Logging out of Linux
• Logging out of Linux:
  – To end your session, use the “exit” command from the command prompt:
        [username@hpcc02 ~]$ exit
        Connection to hpcc02 closed
• You can also use the key sequence <ctrl>+D to close a session

Before we begin learning…
• We will use the terms Linux and UNIX interchangeably
• Many variants of Linux exist: Red Hat, Ubuntu, CentOS, Debian, etc.
• Commands between Linux distributions will be exactly (or almost exactly) the same
• Most of the commands we will be covering are applicable to other *NIX based operating systems

Files and Linux
• Linux users are working with
  – Applications
  – Files
• There are several different file types defined for different types of usage in Linux
  – Basic files: text or binary files (sequence files, etc.)
  – Executable files (programs), such as bowtie, gate, ls, cp, cd, etc.
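
The difference between a basic file and an executable file can be checked from the shell itself. The following is a minimal sketch, assuming a bash shell; the file name `notes.txt`, its contents and the temporary directory are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)

# A basic text file (hypothetical name and contents).
echo "ACGT" > "$workdir/notes.txt"

# -f is true for a regular file; -x is true if you are allowed to execute it.
if [ -f "$workdir/notes.txt" ] && [ ! -x "$workdir/notes.txt" ]; then
    kind_notes="basic file"
else
    kind_notes="executable file"
fi
echo "notes.txt is a $kind_notes"

# Programs such as ls are executable files; "command -v" prints where the shell finds them.
ls_path=$(command -v ls)
if [ -x "$ls_path" ]; then
    echo "ls is an executable file at $ls_path"
fi

# Clean up the throwaway directory.
rm -r "$workdir"
```

The same `-x` test can be pointed at any program name from the slides to see whether it is installed and executable on the server you are logged in to.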

Things we need to do on a shell
• Just like with a Windows PC, users need to:
  – Create, edit, move, rename and delete files
  – Organize files into folders and navigate the filesystem
  – Organize users and control permissions of what they can see and do
  – View and manage processes and services
  – Install and run programs and work with their output
• In Linux, you have to learn "commands" to get the above things done, typing them on the "shell" or "command line"

Filesystem: Relative and Absolute Path
• The Linux file system is hierarchical and resembles a tree structure
• A user in the “admin” directory can access the “steve” directory by specifying the relative path “steve” or the absolute path “/users/admin/steve”. Similarly, the parent directory “users” can be accessed by specifying “..” or “/users”

Linux Layout
• Linux commands are typically installed under:
    /bin           Linux commands
    /sbin          Typical system commands
    /usr/bin       User level commands (editors, etc.)
    /share/bin     Specific cluster software
    /share/apps    Specific genome based cluster tools

Basic command structure
• Basic form of a Unix command is:
      command [-options] [arguments]
• Example:
      ls -l /tmp
  – “ls” is the command. It lists contents of a directory
  – “-l” is the option, flag or modifier of the default behavior of the command. Try “ls” on its own to compare.
  – “/tmp” is the argument.
    Contents of this directory are shown
• Aborting a shell command
  – most Unix systems let you abort the current command by typing Control-C

Note on Linux and commands
• Linux commands are case sensitive, so:
  – Exit is not the same as exit
  – Bowtie is not the same as BOWTIE
  – Gate is not the same as gatE
• In Linux we use / as the directory separator
  – In Windows we use \ as the directory separator
• Linux file names can be descriptive and do not require a file extension

Basic Linux commands (List 1)
    ls         List the contents of a directory
    cp         Copy file(s)
    rm         Remove file(s)
    mv         Move file(s)
    cd         Change location to another directory
    mkdir      Make a new directory
    pwd        Display the path of the current directory
    rmdir      Remove a directory
    cat        Display contents of a file

Basic Linux commands (List 1 ..contd)
    head       Display beginning of a file
    tail       Display end of a file
    clear      Clear the shell window
    vi         Open a file for editing in the vi editor
    passwd     Change the password
    less       Display contents of a file with scrolling
    more       Display contents of a file with scrolling
    history    Display history of commands executed

Basic Linux commands (List 1 ..contd)
    date       Display the current date and time
    who        Display who is currently logged in
    whoami     Display your username
    last       Display recent login activity
    exit       Exit the shell
    wc         Count words and lines in a file
    grep       Search for a string pattern in a file
    man        Display the “manual” page for the chosen command

Determining Present Working Directory with “pwd”
• When a user logs in, they are placed in their HOME directory, which is usually under the “/home” directory
• The Linux shell account name and the home directory name are usually the same, so “/home/snagpal” would be the home directory location for user “snagpal”
• As users navigate the filesystem, they can check/confirm where they currently are by running the “pwd” command
      [snagpal@u15982204 ~]$ pwd
      /home/snagpal
• In Windows, you can view the same in the Windows Explorer address bar

Changing directories with “cd”
• Often, users
  need to go to another directory that is:
  – a sub-directory that can be accessed below in the tree hierarchy of the present working directory
  – a directory outside it that is reached through the parent of the present working directory
• In both cases, absolute and relative paths can be used. Let's say the user is currently in “/home/snagpal” and needs to access
  – A sub-directory of the home directory
        cd linuxcourse
        cd /home/snagpal/linuxcourse
  – A directory reached through the parents of the home directory
        cd ../../usr/local
        cd /usr/local

Listing files and directories with “ls”
• “ls” lists files and sub-directories in a chosen directory. Windows Explorer offers a rich, graphical equivalent
  – To list files in the current directory
        ls
  – To list files in another directory (absolute path)
        ls /usr/local
  – To modify the default view of the output to a long list
        ls -l /usr/local

Making Directories with “mkdir”
• To create new sub-directories in the home folder or elsewhere on the filesystem, use the “mkdir” command
• Absolute or relative paths can be specified
      mkdir linuxcourse
      mkdir /home/snagpal/linuxcourse

Removing Directories with “rmdir”
• To remove empty directories in the home folder or elsewhere on the filesystem, use the “rmdir” command
• Absolute or relative paths can be specified
      rmdir linuxcourse
      rmdir /home/snagpal/linuxcourse

Copying, Moving and Removing files
• Users needing to make duplicates of a file can easily do so using the “cp” command. It requires the source and destination location to be specified (absolute or relative path)
      cp /share/training/linux/test.txt /home/snagpal
      cp /share/training/linux/test.txt .
• The dot “.” represents the current working directory. Copying leaves a copy of the file in its original source location.
  Moving (“mv”) removes the file from its source location, and can also rename it
      mv /share/training/linux/test.txt /home/snagpal/file.txt
      mv /share/training/linux/test.txt file.txt
• To remove a file, use “rm”
      rm test.txt
      rm /home/snagpal/file.txt

File Naming conventions in Linux
• To name files and directories, use:
    characters    A-Z, a-z
    numbers       0-9
    period        .
    dash          -
    underscore    _
• File and directory names containing shell metacharacters should be avoided, such as:
      \ / < > ! $ % ^ & * | { } [ ] " ' ` ; ~

Creating and editing files
• Linux has many text editors, most commonly “vi”, but “emacs”, “pico” and “nano” can also be installed
• Most common syntax is:
      vi newfile.txt         # Creates new file
      vi existingfile.txt    # Opens existing file
• The filename is checked to see if it exists. If it does, the file is displayed. If not, a new file with that name is created
• By default, “vi” opens in command mode. Users can scroll in the file (up, down, page up, page down), move the cursor, delete lines, undo, etc.
• To enter the “write” or “insert” mode for adding text, users press the “i” or “a” key on the keyboard. To leave insert mode, press the “Esc” key

The “vi” editor (…contd)
• To exit the “vi” editor and return to the Linux prompt, you have to return to command mode by pressing the “Esc” key. Then use the “:” key to enter the command line mode
    :wq    saves the current changes and exits vi
    :w!    saves the current changes but doesn’t exit vi
    :q!    exits vi without saving any change
• There are many more commands to execute in the command mode and command line mode. A vi tutorial is suggested

Searching for patterns in text with “grep”
• Grep searches line-by-line for a specified pattern, and outputs any line that matches the pattern. Basic syntax for the grep command is:
      grep [options] pattern [files]
      cp /share/training/linux/seq.fasta .
      grep ">" seq.fasta
      grep TCGAAGA seq.fasta
• Many “options” are available. Grep also searches using regular expressions (patterns that describe the characteristics of one or more strings), e.g. te?xt (with “grep -E”) or .*omics

Counting words in file with “wc”
• The “wc” command counts words and lines in a file
      cp /share/training/linux/abstract.txt .
      cat abstract.txt
      wc abstract.txt
      wc -l abstract.txt

Text processing Linux Commands
    $ head -2 file_name                     List the first two lines
    $ tail -2 file_name                     List the last two lines
    $ head -5 file_name | tail -1           List the fifth line
    $ cat file_name | head -50 | tail -1    List the 50th line
    $ cat file | sort -rn | tail -5         List the last 5 items (sorted in reverse numerical order)
    $ sort -rn file | uniq -c               Sort a file, and count the number of line occurrences

Miscellaneous commands
• Displaying current date and time with "date"
      date
• Clearing the terminal with "clear"
      clear
• Displaying history of commands with "history"
      history

Getting Help in Unix
• Use the man command, followed by the name of the command you need help with
  – Type ‘man ls’ to see the manual page for the "ls" command
        man ls

User convenience features
• Shell tab completion with suggestions
• Shell expansion of wild-cards for specifying multiple arguments
      ls -l *.txt
• Combining options/flags
      ls -la *.txt
• Using flag names with "--"
• Copying and pasting the clipboard with left and right mouse clicks

Tying Linux commands together
• All commands are executed left -> right (LR)
  – Output is passed along in the same manner
• Linux pipes ‘|’ and commands
• Ex: determine how many sequences we have
      $ cat sequence.fastq | wc
  There are 4 lines per sequence in a fastq, so how can we determine the number of sequences (x/4)?
      $ wc -l sequence.fastq | awk '{print $1}' | xargs -I {} echo "scale=0; {}/4" | bc -l

Linux/UNIX Redirection
• What is redirection?
  – Linux uses the notion of < and > for redirection of input and output respectively.
  – A redirection using > allows the user to save the output to a file, for example.
  – In the same way that > redirects output, < redirects input, so that a command reads from a file instead of, for example, the keyboard.
  – Ex: echo "test" > file1    # writes "test" to file1
  – Ex: cat < file1            # outputs the contents of file1

Redirection (..contd)
• A word on redirection: be careful when redirecting to a file. A single > (redirect output from stdout to a file) will overwrite (or create) the file, whereas >> (two > signs in a row) will append to the file, thus preserving its existing contents.

Redirection (…contd)
• If we create two files (file1 and file2) with Line1 and Line2 in them respectively
• We can then create a new file using the > redirection operator
      $ cat file1 file2 > file3

Redirection (…contd)
• Using bowtie with redirection
  – Ex: analyze fastq files to look for all alignments per read (-a), with hits guaranteed best stratum and ties broken by quality (--best), reporting end-to-end hits with at most 2 mismatches (-v 2)
• In the bowtie example we are redirecting the output of the bowtie alignment to the file we have named ‘output_file’ in your scratch dir.
      $ bowtie -a --best -v 2 upstream_mate downstream_mate.fastq > ~/scratch/output_file

Shortcut BASH keystrokes
• Keyboard shortcut timesavers in BASH
    CTRL + A    Move cursor to start of line
    CTRL + C    Stop a program
    CTRL + D    Logout (same as ‘exit’ command)
    CTRL + E    Move cursor to end of line
    CTRL + Z    Suspend program
    TAB         Command completion (type part of command and hit tab to complete command)
    TAB TAB     Shows all commands available

Executing Commands
• PATH
  – Commands are found through your shell’s PATH
    • For example: when we type a command such as ‘ls’, the command will be run because it is in a directory listed in the search PATH
  – An example PATH is
        $ echo $PATH
        /bin:/sbin/:/home/ritaccoa
  – Commands which are not in your PATH will not be found, and therefore not executed, unless you type their full path

Calling external bioinformatics programs
• On our server, several bioinformatics software packages are installed
      $ module avail
• The general method for using a software package is to load its module
      $ module load bowtie/1.0.2
      $ bowtie --help

HPC infrastructure at UMass RC
• Massachusetts Green High Performance Computing Cluster
  – 10264 cores available, each node has 196 - 512 GB RAM. 12 GPU nodes available
  – 400 TB of high performance EMC Isilon X series storage
  – FDR based Infiniband (IB) network and a 10GE network for the storage environment
• Software related to research installed:
  – Physics, Medical Physics, Genomics, Chemistry…

Basic terminology
• What is a node?
• What is a CPU?
• What is a core?
• What is an Operating System?
  – What is a kernel?
• What is a process?
  – Single process OS and processes
  – Concurrent (Multi-tasking) OS and processes
  – Multiple cores (SMP) and Linux processes

Basic Terminology
• What is a Node?
  – A single computer/blade which contains X number of CPUs and Y number of cores per CPU
• What is a CPU?
  – The central processing unit (CPU) carries out all of the instructions that a computer system needs to execute to perform a given task
• What is a core?
  – A core is a processor within a CPU chip (there can be many cores on a given CPU)

Basic Terminology
• What is a process?
  – A process is a program that is executing (ex: iTunes)
• What is a Kernel?
  – The kernel is the glue between the hardware and the user. The kernel schedules processes.
  – The kernel can be thought of as a crossing guard directing traffic for optimal performance

Basic Terminology
• Processes and tasks
  – Single process OS and processes
    • Single processing OSs can run only one user process (a single task) at a given time
    • Each task runs to completion before another task is started
    • MS-DOS is an example of this type of single user execution OS.
• Linux Processes and Cores
  – A one to one relationship is optimal for performance

Basic Terminology
• Processes and tasks, cont.
  – Concurrent (Multi-tasking) OS and processes
    • A concurrent OS provides users the ability to execute many programs simultaneously
    • Linux provides users the ability to execute an editor, a music player, and other tasks simultaneously, thus allowing for multi-tasking
  – Multiple cores (SMP) and Linux processes
    • A process can take advantage of more than one core while running. Such processes are typically called multithreaded.

Short Review
• If a node has four CPUs and two cores per CPU, how many total cores are there?
• In Linux, can we execute an editor and a program to search a genome at the same time?
• How many processes should we execute on a node which has two CPUs with 8 cores each?

What is HPC?
• HPC = High Performance Computing
  – Infrastructure where hundreds or thousands of computers are networked together with shared common storage
  – Multiple users can login and use the infrastructure
  – More than 1 computer can be used to complete a computing task
  – Special tools/skills are required to leverage an HPC environment: Linux, LSF commands

Definitions
    HPC Term                Definition
    Node                    A single computer available to perform computing tasks
    Rack                    A cabinet in which multiple nodes can be stacked vertically and/or horizontally, allowing for efficient housing, networking and power management
    Cluster                 A collection of computer “nodes” that are on the same network for inter-node communication, shared storage and to execute jobs
    CPU                     A CPU is the electronic circuitry (microprocessor) within a computer that carries out the instructions of a computer program
    Core                    Independent processing unit within a CPU that can execute program instructions. A modern CPU can have multiple cores
    Head node               In a cluster, one or a few nodes can be designated as a head node where users typically are able to login and create/monitor jobs

Definitions (…contd)
    HPC Term                Definition
    Compute node            Compute nodes in a cluster execute a job created on a head node. Users cannot log in to a compute node
    Process                 A process is an instance of a computer program that is being executed.
                            It contains the program code and its current activity
    Thread                  A thread is the smallest sequence of programmed instructions that can be managed independently by the scheduler of an OS
    Job                     A job is a Linux command that is designated to be executed on a compute node rather than the head node
    Job array               Identical jobs that have a different iterator variable
    Parallel job            Jobs that break a complex computing task into smaller tasks, such that each task is executed on different nodes simultaneously
    Queue                   Designated “lanes” for submitting different types of jobs depending on priority, resources required or expected duration of execution

Definitions (…contd)
    HPC Term                Definition
    Scheduler               HPC software that allows for efficient utilization of cluster resources based on submitted job types
    Job Management          HPC software that keeps track of jobs submitted
    Research computing      One of the departments within UMass Medical School responsible for supporting the HPC infrastructure on campus. Not related to “IT”
    Cloud computing         A variant of HPC infrastructure which is not limited to a particular organization, where computing resources are requested on demand
    Distributed computing   Buzzword similar to High Performance Computing
    Parallel computing      Buzzword similar to High Performance Computing

Why do you need HPC?
• Needs assessment:
  – Use software that’s only available on Linux
    • Install it yourself on your own Linux PC?
    • RC already has it installed?
  – Automate data crunching tasks
    • Routine incoming data that needs to be crunched?
    • Workflow available within RC to handle it?
  – Simulations
    • Molecular dynamics simulations taking too much time?

HPC is not for these!
• Running Windows software with point-n-click interfaces
• Working with office documents: spreadsheets, slides, etc.
• Video games, music or general video
• Web browsing
• Emails

Policies for HPC use
• If you have a “need” to use HPC, the RC group can help, but there are expectations:
  – Understanding of the constraints of our HPC implementation: CPUs, memory, local and shared storage, networking, etc.
  – Good knowledge of the tasks/jobs that you are going to run: expected run times, utilization of memory, disk space and network bandwidth
  – Fair share policies

Typical HPC environment
[Figure: the cluster. A cluster head and a set of slave nodes are joined through a switch carrying internal cluster traffic (Ethernet, 1 Gb/s); a NAS/SAN storage unit is attached as NAS storage (Ethernet, 1 Gb/s); the cluster connects to the public network (Ethernet, 100 Mb/s).]

What is a computing “Job”?
• A computing “job” is an instruction to the HPC system to execute a command or script
  – Simple Linux commands that can be executed within milliseconds would probably not qualify to be submitted as a “job”
  – Any command that is expected to take up a big portion of CPU or memory for more than a few seconds on a node would qualify to be submitted as a “job”. Why?
    (Hint: multi-user environment)

How to submit a “job”
• The basic syntax is:
      bsub <valid linux command>
• bsub: LSF command for submitting a job
• Let's say a user wants to count the number of lines in a FASTQ file. On a Linux PC, the command is
      wc -l reads.fastq
• To submit a job to do the work, do
      bsub wc -l reads.fastq

Specifying more “job” options
• Jobs can be marked with options for better job tracking and resource management
  – A job should be submitted with parameters such as queue name, estimated runtime, job name, memory required, output and error files, etc.
• These can be passed on in the bsub command
      bsub -q short -W 1:00 -R "rusage[mem=2048]" -J "Myjob" -o hpc.out -e hpc.err wc -l reads.fastq

Job submission “options”
    Option flag    Description
    -q             Name of queue to use. On our systems, possible values are “short” (<=4 hrs execution time), “long” and “interactive”
    -W             Allocation of node time. Specify hours and minutes as HH:MM
    -J             Job name. Eg. “Myjob”
    -o             Output file. Eg. “hpc.out”
    -e             Error file. Eg. “hpc.err”
    -R             Resources requested from assigned node. Eg: -R "rusage[mem=1024]", -R "span[hosts=1]"
    -n             Number of cores to use on assigned node. Eg. “-n 8”

Why use the correct queue?
• Match requirements to resources
• Jobs dispatch quicker
• Better for entire cluster
• Help GHPCC staff determine when new resources are needed

Demo
Create a script “hello-job-array.sh”
      #!/bin/bash
      #BSUB -q short
      #BSUB -W 00:10
      #BSUB -n 1
      #BSUB -R "rusage[mem=1024]"
      #BSUB -J "myTask[1-80]"
      #BSUB -o logs/out.%J.%I
      echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX"
To execute on the shell, run:
      bsub < hello-job-array.sh

Learning to use HPC
• Linux is a pre-requisite to using any HPC system
  – Plenty of Linux tutorials on the internet
  – Attend our “Intro to linux” sessions when offered
• Our website is a good resource for learning to use HPC, visit www.umassrc.org
• Lots of examples provided

Disk usage best practices
• Archive your data
  – Make backups of your data on mid-long term storage
• Use local storage if possible
  – Local storage is always faster than network storage
• Don’t use farline for cluster processing

HPC Best practices
• When submitting a large number of jobs please consider:
  – Single CPU jobs versus multi CPU jobs
  – Correct amount of memory for your job
  – Job arrays
  – Job dependencies

HPC Best practices cont.
• The earlier your jobs are submitted, the earlier they will gain the LSF resources they need.
• Re-direct all LSF output to one directory for convenience
• Add one of the following to your LSF / job directives to redirect stdout/stderr (the second form, with the %I array index, suits job arrays):
      #BSUB -o $HOME/LSF_jobs_output/LSF_job.%J.out
      #BSUB -o $HOME/LSF_jobs_output/LSF_job.%J.%I.out

HPC Best practices cont.
LSF Queues and policies
• Fair share attempts to equalize CPU (slot) resources for labs and users at job submission.
• The priority of a job is calculated in relation to other submitted jobs.
  The priority for jobs will change as jobs complete and job slots become available
• All labs start with an equal weight
• Each lab member shares in this weight when submitting jobs
• Weights are measured from job submissions per user and per lab
• Weights are based on CPU time used and a decay time

Working with bioinformatics data files: A demo
• Log on to the UMass server using PuTTY on Windows or Terminal on Mac
• Request an interactive shell session on one of the compute nodes for this demo
      $ bsub -q interactive -W 4:00 -Is bash
• Navigate to the training directory or copy the examples to your home directory
      $ cd /share/training/linux-bioinformatics
      $ cp /share/training/linux-bioinformatics/* ~

Working with bioinformatics data files: A demo (…contd)
• We have a file with genomic sequence, called “sequence.fa”, and a file with NGS reads, “reads.fq”. Confirm that they are present
      $ ls
• We can examine a file using this Linux command
      $ file sequence.fa
      sequence.fa: ASCII text
• Let's look at the attributes of the files in this directory
      $ ls -l

Working with bioinformatics data files: A demo (…contd)
• The “cat” command can be used to display the contents of one or more files to the screen
      $ cat sequence.fa
• Maybe better to scroll through the file, as pages?
      $ less sequence.fa
• Display just the first line of the file (header)
      $ head -1 sequence.fa
• Display the last 3 lines of the file
      $ tail -3 sequence.fa

Working with bioinformatics data files: A demo (…contd)
• Determine the number of lines in the FASTQ file
      $ wc -l reads.fq
• Count the number of reads in the FASTQ file
      $ x=`wc -l reads.fq | cut -f 1 -d ' '`
      $ echo "$((x/4)) reads"
• Search for a pattern in the sequence file and count the matching lines
      $ grep -c ACGTCA sequence.fa
• Search for an adapter and count the reads that start with it
      $ grep ^ACGTCA reads.fq | wc -l

Innovagene Informatics.
All rights reserved

Working with bioinformatics data files: A demo (…contd)
• Case-insensitive search and count
      $ grep -i ^ACGTCA reads.fq
      $ grep -i ^ACGTCA reads.fq | wc -l
• Display all headers in the sequence file (the quotes stop the shell from treating > as a redirection)
      $ grep "^>" sequence.fa
• Count the number of bases in a single-sequence FASTA file (skipping the header line and the newlines)
      $ tail -n +2 sequence.fa | tr -d '\n' | wc -m

Working with bioinformatics data files: A demo (…contd)
• Now let's align the reads to the sequence file (chr19)
      module load bowtie2/2-2.1.0
      module load samtools/1.2
• If you still have enough time remaining on this compute node (interactive sessions can be requested for up to 8 hours), run bowtie2
      bowtie2-build sequence.fa reference
      bowtie2 -p 1 -x reference reads.fq -S reads.fq.sam
• You can also submit this alignment as a compute job

Working with bioinformatics data files: A demo (…contd)
• Create a bowtie script, “bowtie-align.sh”, with the following content
      #!/bin/bash
      module load bowtie2/2-2.1.0
      module load samtools/1.2
      bowtie2-build sequence.fa reference
      bowtie2 -p 8 -x reference reads.fq -S reads.fq.sam
      samtools view -b reads.fq.sam -o reads.fq.bam

Working with bioinformatics data files: A demo (…contd)
• Now submit this script as a compute job
      bsub -W 4:00 -q short -R "rusage[mem=4096]" -J "bowtie-job" -o ngs.out -e ngs.err ./bowtie-align.sh
• Another way of writing the script is to include all of the command line options in the script itself (next slide)
• Then submit the compute job as
      bsub < bowtie-align2.sh

Working with bioinformatics data files: A demo (…contd)
      #!/bin/bash
      #BSUB -J "SeqAlignJob"
      #BSUB -R rusage[mem=4096]
      #BSUB -q short
      #BSUB -W 4:00
      #BSUB -o ngs.out
      #BSUB -e ngs.err
      module load bowtie2/2-2.1.0; module load samtools/1.2
      bowtie2-build sequence.fa reference
      bowtie2 -p 8 -x reference reads.fq -S reads.fq.sam
      samtools view -b reads.fq.sam -o reads.fq.bam
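
Before submitting an alignment job like the one above, it can be worth checking that the FASTQ input looks sane, since a FASTQ file has 4 lines per read. The following is a minimal sketch in bash, not part of the original demo: the function name `count_reads` and the two made-up reads are hypothetical, used only so the example runs without the training files.

```shell
#!/bin/bash
# Hypothetical helper: count the reads in a FASTQ file (4 lines per read).
count_reads() {
    local fastq="$1"
    local lines
    # Reading from stdin makes wc print only the number, so no cut/awk is needed.
    lines=$(wc -l < "$fastq")
    if [ $((lines % 4)) -ne 0 ]; then
        echo "warning: $fastq has $lines lines, not a multiple of 4" >&2
    fi
    echo $((lines / 4))
}

# Tiny made-up FASTQ with two reads, written to a temporary file for illustration.
demo_fastq=$(mktemp)
printf '@read1\nACGTCA\n+\nIIIIII\n@read2\nTTGACA\n+\nIIIIII\n' > "$demo_fastq"

n_reads=$(count_reads "$demo_fastq")
echo "$n_reads reads"    # the demo file has 8 lines, so this prints "2 reads"

rm -f "$demo_fastq"
```

On the cluster, the same function could be pointed at reads.fq from the demo; a warning about the line count would suggest a truncated or malformed file.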