Graduate Student Survival Guide: using the cluster, gnuplot and LaTeX

Janardhan Rao Doppa
School of EECS, Oregon State University
doppa@eecs.oregonstate.edu
http://web.engr.oregonstate.edu/~doppa

EECS Cluster: what?
• A computing resource to run your jobs
    Offshore your computing
    Experiments or simulations for research
    Handy when you have to run a large number of experiments
    You don’t want to use your DELL (read as delicate) laptop
• Web
    http://engr.oregonstate.edu/computing/cluster/

EECS Cluster: how?
• Connection: connect to one of the “submit” hosts
    submit32 or submit64
• Availability: check the availability of slots in each queue
    i386, em64t, amd64-low, eecs1
• Compile: compile your code on the remote machine
• Script: prepare the “submit script”
    The command to run your program, which queue to use, where to store the output and errors
• Submit: submit the job using the “submit script”
• Monitor: monitor the status
    Automatic email, or check the status manually

EECS Cluster: how?
• Connection: connect to one of the “submit” hosts
    ssh <user>@submit32.eecs.oregonstate.edu (or submit64)
• Availability: check the availability of slots in each queue
    qstat command: learn the usage with
        qstat -help
    List the slots of one queue with
        qstat -f -q <queue>
    where <queue> = i386, em64t, amd64-low or eecs1
    Sample output (2/2 = # occupied / # total slots):
        em64t@exec-em64t-01.hpc.engr.o  BIP  2/2  2.02  lx24-amd64
            1402020  0.50500  run09_26.s  matthchr  r  10/28/2010 20:05:08  1
            1402032  0.50500  closfc      mathewm   r  10/28/2010 21:03:08  1

EECS Cluster: how?
• Script: prepare the “submit script”
    #!/bin/csh
    # Job name
    #$ -N job_name
    # Current working directory
    #$ -cwd
    # Resource request for the faster bees
    #$ -soft -l mem_total=3.00G
    # Specify the hardware platform to run the job on.
    # Options are: amd64, em64t, i386, volumejob (use em64t if you don't care)
    #$ -q i386
    # Output/error file (merged)
    #$ -o output_file.out
    #$ -j y
    # Command sequence
    ./source_file

EECS Cluster: how?
• Submit: submit the job using the “submit script”
    Change the permissions of the script:
        chmod u+x script.csh
    Submit it:
        qsub script.csh
• Monitor: monitor the status
    qstat -u <user>

Cautions:
    You should have enough disk space (logs and outputs) and main memory (RAM) to run the program
    Don’t monopolize the cluster – think of others also!
    Budget your experimental design based on the available resources (slots), hard deadlines (time), etc.

gnuplot: what?
• A command-line program to generate 2D and 3D plots
    Better than Excel – no more frustrating clicks!
    Specify style, fonts and legends as commands
    Reuse the code for modifications or similar plots
    Generates very good PS or EPS figures, which are highly compatible with LaTeX
    “gnu” is not the same as “GNU”!! (gnuplot is not part of the GNU project)
• Web
    http://www.gnuplot.info/
    Available for both Linux and Windows

gnuplot: how?
• Data file: create the data file to be used for the plot
    Space-separated, column-wise data
• Code file: create the gnuplot code file
    Specify the title of the plot, axis names and ranges, legends, thickness of lines, colors, etc.
    Specify the output format (PNG, PS or EPS), along with the filename
• Run: run your code on the gnuplot command line
    Copy and paste your code on the command line and press ENTER

gnuplot: how?
• Data file: create the data file to be used for the plot
    Space-separated, column-wise data (column 1 is the x value, columns 2 to 4 are the three curves)
        0.1  100  73.13  70.14
        0.2  100  70.14  73.13
        0.3  100  70.14  73.13
        0.4  100  74.62  73.13
        0.5  100  74.62  73.13
        0.6  84   64.17  70.89

gnuplot: how?
• Code file: create the gnuplot code file
    set terminal postscript eps enhanced color "Helvetica" 18
    set key at graph 0.75, 0.9
    set size 0.9, 0.9
    set title "Bayes-EM vs Ripper on NFL data \n (Novelty missingness model)"
    set ylabel "Accuracy (%)"
    set xlabel "Percentage of missing values"
    set xrange [0.1:0.6]
    set yrange [50:100]
    set output 'EM_comparison_novelty.eps'
    plot \
        'EM_comparison_novelty.txt' using 1:2 t 'Bayes-EM' with linespoints lt 2 lw 1 pt 7, \
        'EM_comparison_novelty.txt' using 1:3 t 'RIPPER-conservative' with linespoints lt 3 lw 3, \
        'EM_comparison_novelty.txt' using 1:4 t 'RIPPER-aggressive' with linespoints lt 4 lw 3

gnuplot: how?
• Run: run your code on the gnuplot command line
    Copy and paste your code on the command line and press ENTER

gnuplot: resources
• Short and quick reference guide
    http://sparky.rice.edu/gnuplot.html
• Web resources
    http://www.gnuplot.info/
    Demos, tutorials, sample code and scripts
    Lots of useful sample plots are available at:
    http://www.cse.iitb.ac.in/silmaril/br/lib/exe/fetch.php?id=students&cache=cache&media=students:gnuplot.tgz
    Thanks to Bhaskaran Raman and Kameshwari Chebrolu.

LaTeX: what?
• A manuscript preparation system
    Better than Word – no more equation editors!
    Math formulas and equations are easier to write
    Bibliography and cross-referencing are much easier
    Almost all conference and journal papers are written using LaTeX
    Default standard in academia – get used to it!
• Web
    http://en.wikibooks.org/wiki/LaTeX
    Windows editors: TeXnicCenter and WinEdt
    Linux editors: LyX and Kile

LaTeX: basic files
• LaTeX code
    .tex – LaTeX input code file
    .sty – style file
• Bibliography
    .bib – bibliography file
    .bst – bibliography style file
• Output
    .dvi – device-independent file
    .ps – PostScript file

LaTeX: writing code file
• Start with an existing template
• Basic commands
    \section, \subsection, \subsubsection
    Text mode vs. math mode ($ $)
    Math symbols: \alpha, \beta, \gamma
    \begin{environment} and \end{environment}
        • \begin{itemize} and \end{itemize}
        • \begin{equation} and \end{equation}
        • \begin{figure} and \end{figure}
        • \begin{table} and \end{table}

LaTeX: bibliography file
• Sample bibliography entries
    @inproceedings{CRF-ICML:01,
      author    = {John Lafferty and Andrew McCallum and Fernando Pereira},
      title     = {Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data},
      booktitle = {ICML'01: Proceedings of the 18th International Conference on Machine Learning},
      year      = {2001},
    }

    @article{TRITRAINING-TKDE:05,
      author  = {Zhi-Hua Zhou and Ming Li},
      title   = {Tri-Training: Exploiting Unlabeled Data Using Three Classifiers},
      journal = {IEEE Transactions on Knowledge and Data Engineering},
      volume  = {17},
      number  = {11},
      year    = {2005},
    }

LaTeX: compiling
• LaTeX code with “latex” or “pdflatex”
• BibTeX code with “bibtex”
    latex <code>
    bibtex <code>
    latex <code>
    latex <code>
    bibtex reads <code>.aux, which the first latex run writes, so it takes the same name as the .tex file
    Two-pass algorithm! Rerun latex after bibtex; the last run resolves the remaining citations and cross-references
• Collaborative writing
    Use a CVS or SVN repository – much easier!

LaTeX: resources
• LaTeX cheat sheet
    • http://www.ctan.org/texarchive/info/latexcheat/latexcheat/latexsheet.pdf
• LaTeX wiki book
    • http://en.wikibooks.org/wiki/LaTeX/
• Learn tips and tricks
    • From expert users
    • From online forums
• Grow your bag of tricks – it will save you time at deadlines!

LaTeX in PowerPoint
• TeXPoint – a LaTeX add-on for PowerPoint and Word
    http://texpoint.necula.org/
    http://web.engr.oregonstate.edu/~mehtane/latex/index.html
• TeXclip – LaTeX to image
    http://maru.bonyari.jp/texclip/texclip.php
• Beamer – slides using LaTeX
    http://bitbucket.org/rivanvx/beamer/wiki/Home

MS students: Advice
• Hard to fund all the MS students
    Bad economy, low grant money, etc.
    Short time investment – faculty will choose their bets carefully!
• Look for alternative funding sources
    BSG, Media Services, Library, science laboratories (e.g., chemistry, biology), etc.
• Bottom line: grad school is costly, but a very good long-term investment!!

MS students: Advice
• Immediate reward vs. long-term average reward
    • Worst case: you finish graduate school on your own money
    • Concentrate on your education and develop skills
    • Go for a summer internship – money and experience
    • Specialize in something – good job market!
    • You can pay off your loans in less than 6 months!!
• Don'ts
    • Finishing classes quickly and graduating with an ME – bad idea!
    • Worrying about money while in school – you won’t be productive

Questions?
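
Backup: submitting many cluster jobs
The cluster slides describe writing one submit script and passing it to qsub. When you have a large number of experiments, one possible way to avoid writing each script by hand is sketched below. It is only a sketch under stated assumptions: the function names make_submit_script and generate_sweep, the DRY_RUN switch and the run_<n> job names are made up for illustration, and it assumes ./source_file accepts a single argument. The #$ directives are the ones from the submit script slide.

```shell
#!/bin/sh
# Sketch only: generate one submit script per experiment setting.
# Assumption (not from the slides): ./source_file takes one argument.

# Print a csh submit script for one job to stdout.
#   $1 = job name, $2 = queue, $3 = command line to run
make_submit_script() {
    cat <<EOF
#!/bin/csh
#\$ -N $1
#\$ -cwd
#\$ -q $2
#\$ -o $1.out
#\$ -j y
$3
EOF
}

# Write run_<n>.csh for each setting and submit it.
#   $1 = queue, remaining arguments = experiment settings
# With DRY_RUN=1 (the default in this sketch) the qsub commands are
# only printed, so the script can be tried away from the cluster.
generate_sweep() {
    queue=$1
    shift
    for p in "$@"; do
        name="run_$p"
        make_submit_script "$name" "$queue" "./source_file $p" > "$name.csh"
        chmod u+x "$name.csh"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "qsub $name.csh"
        else
            qsub "$name.csh"
        fi
    done
}
```

For example, after sourcing the file, generate_sweep em64t 1 2 3 writes run_1.csh, run_2.csh and run_3.csh in the current directory and prints the three qsub commands; setting DRY_RUN=0 on a submit host would actually submit them. Check the free slots with qstat first, and keep the sweep small enough that you don’t monopolize the cluster.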