Introduction to Linux
Alan Orth
April 17, 2010
ILRI, Nairobi

What is Linux?

… an operating system (just like Windows and Mac!)
- Created in the 1990s by Linus Torvalds
- Microsoft DOS was too limiting
- UNIX was expensive and restrictive
- Linux was born

What is Linux?

Examples of Linux operating systems, called “distributions”:
- Ubuntu (obvious?)
- Debian
- Fedora
- Red Hat
- CentOS
- SuSE

Big list at: http://distrowatch.com

Why Linux?

- Linux is “free”
  - Free (money)
  - Free (freedom... “open source”)
- Peer reviewed
- … makes Linux a good match for science

Why Linux for Bioinformatics?

- Bioinformatics == the application of information technology and computer science to the field of molecular biology
- Data sets are getting bigger, so we need more processing power!
- … computers with that kind of power use Linux
  - extremely efficient and stable
  - excellent at text processing

Get Your Feet Wet

Most research institutions and universities have Linux servers. Use an SSH client like “PuTTY” to connect to our Linux server from Windows:

    Server: hpc.ilri.cgiar.org
    Username: user1
    Password: user1

Getting Familiar

Linux has a graphical environment like Windows, but the real power lies in its command line mode. In Linux, you type commands in the “shell.” After you have typed a command, press Enter to run it.

Familiarize yourself with your environment:

    whoami – print the name of the current user
    id – print information about the current user
    who – print a list of other users who are logged in
    date – print the current date and time on the server
    cal – print a calendar for the current month
    echo – print a text string to the screen

Getting Familiar

Linux commands come in various forms. Some are simple, and can be used by themselves:

    whoami
    cal

Other times you can add “arguments” to change the behavior of the command.
Arguments are separated by one or more spaces:

    cal 4 2009

Still other commands require arguments (they don't make sense to run by themselves).

Navigating the File System

Files and folders are organized in a hierarchical fashion. The top of the hierarchy is called the “root.” Here is the standard directory structure in Linux:

[Diagram: a directory tree with / at the top; bin, etc and home directly beneath it; james and alan under home; work and pics at the lowest level]

- the “root” directory is often represented by “/”
- “directory” is a fancy word for “folder”

Navigating the File System

Before we can start solving world hunger, we have to learn how to move around the file system comfortably. Explore your directory structure using some of the following commands:

    pwd – print the current “working” directory
    ls – list the contents of the current directory
    cd – change to another directory
    mkdir – create a new directory

Navigating the File System

Create some directories and get the hang of moving around them:

    mkdir one
    mkdir two
    mkdir two/three
    cd one

What if you want to move to two now? There is no two in the current directory (verify with ls). Our directory structure looks like this:

    user1
    |-- one      <-- You are here
    `-- two
        `-- three

Navigating the File System

If we want to move to the directory two, we first have to move back up in the directory hierarchy. Once we are back in user1 we will be able to move into two.

    cd ..
    cd two

In Linux “..” means “parent directory”; once we have moved to the parent, we can move into two. Other special directories include “.” (the current directory) and “~” (your home directory).

Working With Files & Folders

Commands used for managing files and folders:

    cp – copy a file
    mv – move a file (this is how you rename)
    rm – delete a file
    file – print the type of file
    more – read a text file
    less – read a text file (less is more, but better!)
    head – print the beginning of a file
    cat – print a file to the screen

Working With Files & Folders

Reference for some basic commands which use or require additional arguments:

    ls -lh              (“long” list of files)
    ls -la              (“long” list including hidden files)
    ls -lh file         (“long” list of file)
    mv file file1       (rename file to file1)
    cp file filecopy    (copy file to filecopy)
    rm file             (delete file)
    rm -i file          (delete file, but ask first)
    rm -r folder        (delete folder)

Working With Files & Folders

Copy files from Windows → Linux? Use WinSCP! SCP is the “secure copy” protocol, which uses the same username and password you use with PuTTY.

You can also download files from the Internet using the following commands:

    wget – “web get” utility for HTTP and FTP
    ftp – “file transfer protocol”
    links – simple, text-based web browser

Your First File

Use the text editor nano to create a new file named “hello”:

    cd ~
    nano hello

Type a simple message and then save the file by writing it to the disk:

    ^O (Control-O)

In the world of Linux, the “^” character in key combinations signifies pressing the Control key. Exit the text editor by pressing ^X (Control-X).

Working With Files & Folders

Make a copy of your new text file:

    cp hello hello2
    cat hello
    more hello

Press “q” when you're done to quit more. Do you see how the two are different? cat can also print several files one after another:

    cat hello hello2

cat simply prints a file to the screen, while more is used to interactively view a text file one page at a time. Programs like more are called “pagers.”

I/O Redirection

By default, command line programs print to “stdout” (standard out). I/O redirection manipulates the input/output of Linux programs, allowing you to capture it or send it somewhere else.

Make a copy of hello (without using cp):

    cat hello > hello3
    cat hello3

The “>” character performs a “redirect,” taking the output of the cat command and putting it into the file hello3.

I/O Redirection

Now try using echo:

    echo "My name is Alan" > hello3
    cat hello3

What happened to hello3?... It was overwritten!
The “>” operator creates a new file to store the output, but if the file already exists it will be overwritten! Use “>>” to append to a file:

    echo "Appended" >> hello3

I/O Redirection

Another useful technique is to redirect one program's output into another program's input; this is done using a “pipe.” For example, when a command produces a lot of output and you want to read it one page at a time:

    who | more

This is an important technique and will come in handy when you begin using Linux for text processing.

Text Processing Basics

See how many times a certain user is logged in. grep prints lines which match a given string:

    who | grep "aorth" | wc -l

wc counts words, but can also count lines if you pass it the “-l” argument. You can also do the same thing using grep's counting argument:

    who | grep -c "aorth"

Count the number of sequences in a fasta file:

    grep -c ">" Tutorial.fna

More Text Processing

sed, the stream editor, can do powerful things with text files. One common example is a search and replace:

    echo Hello
    echo Hello | sed 's/Hello/Goodbye/'

Delete blank lines from a file using sed:

    cat myfile | sed '/^$/d' > mynewfile

tr can also be used to translate text:

    echo "HELLO" | tr 'A-Z' 'a-z'

More Text Processing

But sed is the king of text substitution; we can use something called “regular expressions” to match complex text patterns and act on them. In this example, nucleotides were to be replaced with integers, A = 1, C = 2, G = 3, T = 4:

    sed -e 's/\bA\b/1/' autosomes.txt | less

Here we search for an “A” bordered by a word boundary on both sides (standing alone), and replace it with a “1”. Now add a substitution for the “C”, etc. Eventually you would want to redirect your output to a new file instead of less:

    sed -e 's/\bA\b/1/' -e 's/\bC\b/2/' autosomes.txt > autosomes_int.txt

Shell Scripts

A shell script is a text file with a list of commands inside. Shell scripts are good for automating tasks you use often, or running batch jobs.
Enter the following in a new file, script.sh:

    echo "Date and time is:"
    date
    echo "Your current directory is:"
    pwd

Run the script like this:

    sh script.sh

More Shell Scripts

A more advanced shell script utilizing a loop:

    for num in 1 2 3
    do
        echo "We are on $num…"
    done

Where to Get Help

You can always read the manual! To see the “man page” for the ls command:

    man ls

WWWeb resources:
- LinuxQuestions.org
- UbuntuForums.org

Me: a.orth@cgiar.org
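
Putting It Together

The redirection, text processing and loop slides above can be combined into one small script. What follows is only a sketch: the file names sample.fna and sample_clean.fna and the three short sequences are made up for illustration and are not part of the course data.

```shell
#!/bin/sh
# Sketch only: sample.fna and its contents are hypothetical example data.
# printf writes a tiny fasta file with one blank line in the middle.
printf '>seq1\nACGT\n\n>seq2\nGGCC\n>seq3\nTTAA\n' > sample.fna

# Count the sequences: every fasta record begins with a ">" header line.
count=$(grep -c '>' sample.fna)
echo "sample.fna holds $count sequences"

# Delete blank lines with sed and redirect the result to a new file.
sed '/^$/d' sample.fna > sample_clean.fna

# Loop over both files and report how many lines each one has.
for f in sample.fna sample_clean.fna
do
    lines=$(wc -l < "$f")
    echo "$f: $lines lines"
done
```

Save it as, say, together.sh and run it with sh together.sh. The cleaned file should come out one line shorter than the original, because the only blank line has been removed.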