Introduction to Unix
Christy Avery
March 17, 2008

There have been quite a few changes with regard to research computing at UNC. The previous clusters, Baobab and sunny, were discontinued in mid-2007. The current cluster, Emerald, has many of the same features found on Baobab and sunny (if not more), but is quite a bit different with regard to data storage and management. The main difference is that YOUR data, files, logs, etc. are now stored on a secured-access scratch space. What does this mean for you? It means that files are periodically deleted (generally after 30 days of inactivity). Every time you use Emerald, get in the habit of transferring all of your files back to your personal AFS space (i.e. the H drive) or your PC. The good thing is that the scratch space is huge, so you don’t have to worry about running out of disk space when running simulations, etc.

A good place to start if you get stuck is help.unc.edu. I also recommend buying “Unix for Dummies” or a similar reference if you need to use UNIX frequently.

To get started, you need to set up your Emerald profile in SSH Secure Shell:

Go to Profiles
Add Profile
Type “Emerald”, click “Add to Profiles”
Go to “Edit Profiles”
Select Emerald
    Host name: emerald.unc.edu
    User name: your ONYEN
    Terminal answerback: vt100
    Tunneling: check “Tunnel X11 connections”
    Keyboard: check “Backspace sends Delete” and “Line wrap”

*Note: SSH Secure Shell is available at https://software.unc.edu/available.php#SSH

How to add programs

Emerald doesn’t automatically register you as a user of all computing programs available at UNC. Instead, you need to request access. This is done using the “ipm” Unix command and only has to be done once.
Also remember that Unix is case sensitive and that all commands are executed through the SSH Secure Shell software (see link above).

If you want to                                      Command
query the available programs                        ipm query –all
query the available sas programs                    ipm query sas
add sas                                             ipm add sas
add stata                                           ipm add stata
see the programs you’ve subscribed to               ipm query –current
remove a program (e.g. an old version)              ipm remove sas-912

How to get to your scratch space, add a folder, and navigate between folders

Unix is command-line driven. This means that you don’t use a mouse to navigate between subdirectories (i.e. folders). Your net-scratch (netscr) space should be set up if you requested an account through onyen.unc.edu. For example, mine is /netscr/christya

If you want to                                      Command
see what (sub)directory you’re in                   pwd
change directories                                  cd
move up one directory                               cd ../
move up two directories                             cd ../../
move up three directories (and so on…)              cd ../../../
move down one directory                             cd directory1
move down two directories                           cd directory1/directory2
move up two, then down two                          cd ../../directory1/directory2

Let’s say you want to add a folder named “temp” under your ONYEN directory.

How to                                              Command
add a folder                                        mkdir temp
look at directory contents                          ls
go to temp (i.e. change directory to temp)          cd temp

How to add a file to your temp NETSCR folder

Copying programs from your H drive or PC to your scratch space can be done two ways. The easiest is through the SSH Secure File Transfer interface available through SSH Secure Shell.

Next, type /netscr/YOURONYEN (substituting your own ONYEN) in the right-hand box and H: in the left.

Now, make a new file in your H drive entitled “test.txt”. Right-click on the white space in your H drive folder and click New, then Text Document. Open it and type your ONYEN. Save and close. Rename the file test.txt. Since Unix is command-line driven and a space separates one argument from the next, avoid spaces in file names. I also recommend keeping file names as short as possible. Go back to your SSH Secure File Transfer window.
You’ll probably need to click “refresh” on the H drive side (the refresh button has two green arrows pointing in opposite horizontal directions). Now, click test.txt and drag it into the temp folder. Go back to your SSH Secure Shell window. Navigate to the temp subdirectory.

How to view your .txt file                          Command
from the temp subdirectory                          more test.txt
from the ONYEN subdirectory                         more temp/test.txt

While you can use “more” to view your file, you need to use a text editor if you want to change the file. Text editors are a more advanced topic and won’t be covered today. I often find that it is easier to edit the file on my PC and simply copy over a new version.

How to run SAS on Emerald

There are two ways to run SAS on Emerald. (Remember that you have to use the ipm add command first.) The program and dataset should be copied over to the same netscr subdirectory (for example: /netscr/YOURONYEN/sasProgs). If you have a short program (i.e. one that would only take 5 or so minutes to run), you may use the bsas command.

                                                    Command
How to run short sas programs                       bsas yourprogramname.sas

*Remember that you have to be in the same subdirectory as the program and dataset when you type the sas command.

However, if you have a longer program, you have to submit it to LSF (software that allocates resources fairly among all running jobs). Trust me, you will receive a phone call or email from the IT people if you submit a long sas program using the short bsas command. Instead, you have to submit it to the blade server (more info at http://help.unc.edu/4372).

                                                    Command
How to run a longer sas program                     bsub -R blade sas my_prog.sas

To run SAS on one of the high-memory AIX UNIX nodes, you need to specify the p5aix resource:

bsub -Ip -q int -R p5aix sas -memsize 7G -sortsize 7136M my_prog.sas

The above example allows SAS to access up to 7 gigabytes of memory on the compute node for your job. Whenever you set memsize, you should make sure to set sortsize at least 32 megabytes smaller.
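
The memsize/sortsize arithmetic above can be sketched in a few lines of shell. This is only an illustration, assuming 1G = 1024M (which is what the 7G and 7136M figures imply). The variable names are made up, and the script prints the bsub command instead of submitting it, so it is safe to try anywhere.

```shell
#!/bin/sh
# Hypothetical helper: derive a sortsize that is 32 MB below memsize.
memsize_gb=7                          # memory requested for SAS, in gigabytes
memsize_mb=$((memsize_gb * 1024))     # convert to megabytes (assumes 1G = 1024M)
sortsize_mb=$((memsize_mb - 32))      # keep the 32 MB margin described above

# Assemble the command from the text; echo it rather than running bsub here.
cmd="bsub -Ip -q int -R p5aix sas -memsize ${memsize_gb}G -sortsize ${sortsize_mb}M my_prog.sas"
echo "$cmd"
```

Changing memsize_gb to another value gives the matching sortsize without redoing the subtraction by hand.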
Note that bsub options come after the word "bsub" but before the word "sas", and that SAS options come after the word "sas".

UNIX/Linux is unforgiving about writing over existing files. If you ask SAS to write a dataset or file that already exists, it will simply write over the old one. If you run SAS jobs without changing to a different directory, all files will be written in your working directory. To avoid cluttering up your working directory, create a subdirectory using the mkdir command, and run all SAS jobs from there instead. Then all your SAS files will be nicely organized.

Viewing a job's log or output files with an editor while it is running may cause the job to, at best, halt or, at worst, crash. Instead, view these files with the command more (which pages forward from the beginning) or less (which also lets you scroll backward; press G to jump to the end). Similarly, if your job uses any other files, such as permanent datasets or raw data files, do not alter these while the job is running.

LSF will send you an email when your SAS program is complete. Although you will see a .log and .lst file while the program is running, as mentioned above you can safely open them in an editor ONLY after the program is finished. I typically transfer these files to my H drive and view them in SAS.

Checking the status of jobs on Emerald

                                                        Command
How to check the status of a job on Emerald             bhist OR bjobs
How to cancel a job (using any program) on Emerald      bkill jobid

Other considerations

Both SAS and Stata require a command to point to the starting dataset (i.e. a libname or use statement, respectively). Remember that you first need to transfer your dataset over to the Unix folder and assign the correct location. This is when the pwd command is handy. For example, my libname/use statement could read:

libname christy "/netscr/christya/temp";
use "/netscr/christya/temp/temp.dta", clear

For additional help with UNIX: http://help.unc.edu/?id=5288

REMEMBER TO TRANSFER ALL FILES YOU WANT TO KEEP BACK TO YOUR H DRIVE OR PC.
THEY WILL BE DELETED AFTER A MONTH OF INACTIVITY.
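
One way to spot files that are getting close to the deletion window is to list those that have not changed recently. The sketch below assumes inactivity is judged by modification time (this handout does not say how Emerald measures it) and uses a throwaway directory in place of your netscr space. The 20-day threshold and the file names are made up for illustration.

```shell
#!/bin/sh
# Throwaway directory standing in for your scratch space (hypothetical).
scratch=$(mktemp -d)

# Create one old file and one fresh file to demonstrate.
touch -t 200801010000 "$scratch/old_results.log"   # modified Jan 1, 2008
touch "$scratch/new_results.log"                   # modified just now

# List regular files last modified more than 20 days ago.
stale=$(find "$scratch" -type f -mtime +20)
echo "$stale"

# Clean up the demonstration directory.
rm -rf "$scratch"
```

On Emerald you would point find at your own netscr directory instead and copy anything it lists back to your H drive or PC.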