Graduate Student Survival Guide:
using cluster, gnuplot and LaTeX
Janardhan Rao Doppa
School of EECS, Oregon State University
doppa@eecs.oregonstate.edu
http://web.engr.oregonstate.edu/~doppa
EECS Cluster: what?
• A computing resource to run your jobs
• Off-shore your computing
• Experiments or simulations for research
• Handy when you have to run a large number of experiments
• You don’t want to use your DELL (read as delicate) laptop
• Web
• http://engr.oregonstate.edu/computing/cluster/
EECS Cluster: how?
• Connection: Connect to one of the “submit” hosts
  ➢ submit32 or submit64
• Availability: Check the availability of slots in each queue
  ➢ i386, em64t, amd64-low, eecs1
• Compile: Compile your code on the remote machine
• Script: Prepare the “submit script”
  ➢ the command to run your program, which queue to use, and where to store the output and errors
• Submit: Submit the job using the “submit script”
• Monitor: Monitor the status
  ➢ automatic e-mail notification, or check the status manually
EECS Cluster: how?
• Connection: Connect to one of the “submit” hosts
  ➢ ssh <user>@<host>.eecs.oregonstate.edu, where <host> = submit32 or submit64
• Availability: Check the availability of slots in each queue
  ➢ qstat command: learn the usage with “qstat --help”
  ➢ “qstat -f -q <queue>”, where <queue> = i386, em64t, amd64-low or eecs1
  ➢ Sample output; the “2/2” field on the queue line is # occupied / # total slots:
em64t@exec-em64t-01.hpc.engr.o BIP 2/2 2.02 lx24-amd64
1402020 0.50500 run09_26.s matthchr r 10/28/2010 20:05:08 1
1402032 0.50500 closfc mathewm r 10/28/2010 21:03:08 1
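Reading the occupied/total field by eye gets tedious when a queue has many hosts. The sketch below pulls that field out of saved `qstat -f` text. It assumes queue lines look like the sample on the slide (queue instance, type, then `used/total`); the second queue line in `sample` is hypothetical, added only to show a host with a free slot.

```python
import re

# Matches a queue-instance line such as
# "em64t@exec-em64t-01.hpc.engr.o BIP 2/2 2.02 lx24-amd64".
# Assumption: the third field is "used/total", as annotated on the slide.
QUEUE_LINE = re.compile(r"^(?P<queue>\S+@\S+)\s+\S+\s+(?P<used>\d+)/(?P<total>\d+)\b")

def free_slots(qstat_text):
    """Return {queue_instance: free_slot_count} parsed from `qstat -f` text."""
    free = {}
    for line in qstat_text.splitlines():
        match = QUEUE_LINE.match(line.strip())
        if match:  # job lines have no "@" in the first field and are skipped
            used, total = int(match["used"]), int(match["total"])
            free[match["queue"]] = total - used
    return free

sample = """\
em64t@exec-em64t-01.hpc.engr.o BIP 2/2 2.02 lx24-amd64
1402020 0.50500 run09_26.s matthchr r 10/28/2010 20:05:08 1
em64t@exec-em64t-02.hpc.engr.o BIP 1/2 0.98 lx24-amd64
"""
print(free_slots(sample))
```

The column layout of `qstat -f` can differ between scheduler versions, so check the pattern against real output from the cluster before relying on it.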
EECS Cluster: how?
• Script: Prepare the “submit script”
#!/bin/csh
#Job name
#$ -N job_name
#Current Working Directory
#$ -cwd
# Resource request for the faster bees
#$ -soft -l mem_total=3.00G
# specify the hardware platform to run the job on.
# options are: amd64, em64t, i386, volumejob (use em64t if you don't care)
#$ -q i386
# Output/error file (merged)
#$ -o output_file.out
#$ -j y
# Command sequence
./source_file
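When you run many experiments, writing one submit script per job by hand is error-prone. The following is a minimal sketch that generates such scripts from a template, using only the directives shown in the script above. The job names and the seed argument passed to `./source_file` are hypothetical, and the demo writes into a temporary directory so it leaves nothing behind.

```python
import stat
import tempfile
from pathlib import Path

# Only directives that appear in the slide's script are used here.
TEMPLATE = """#!/bin/csh
#$ -N {name}
#$ -cwd
#$ -q {queue}
#$ -o {name}.out
#$ -j y
{command}
"""

def write_submit_script(directory, name, queue, command):
    """Write <name>.csh into directory and make it executable for the owner."""
    path = Path(directory) / f"{name}.csh"
    path.write_text(TEMPLATE.format(name=name, queue=queue, command=command))
    path.chmod(path.stat().st_mode | stat.S_IXUSR)  # same effect as chmod u+x
    return path

# Demo in a temporary directory; the seed argument is hypothetical.
with tempfile.TemporaryDirectory() as workdir:
    for seed in (1, 2, 3):
        script = write_submit_script(
            workdir, f"run_seed{seed}", "i386", f"./source_file {seed}"
        )
        print("qsub", script.name)
```

In real use you would write the scripts into your working directory on the submit host and pass each one to qsub.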
EECS Cluster: how?
• Submit: Submit the job using the “submit script”
  ➢ Make the script executable: “chmod u+x script.csh”
  ➢ “qsub script.csh”
• Monitor: Monitor the status
  ➢ “qstat -u <user>”
• Cautions:
  ➢ Make sure you have enough disk space (for logs and outputs) and main memory (RAM) to run the program
  ➢ Don’t monopolize the cluster – think of others too!
  ➢ Budget your experimental design based on the available resources (slots), hard deadlines (time), etc.
gnuplot: what?
• A command-line program to generate 2D and 3D plots
  ➢ better than Excel – no more frustrating clicks!
  ➢ specify style, fonts and legends as commands
  ➢ reuse the code for modifications or similar plots
  ➢ generates very good PS or EPS figures, which work well with LaTeX
  ➢ the “gnu” in gnuplot has nothing to do with the GNU project!
• Web
  ➢ http://www.gnuplot.info/
  ➢ Available for both Linux and Windows
gnuplot: how?
• Data file: Create the data file to be used for the plot
  ➢ Space-separated, column-wise data
• Code file: Create the gnuplot code file
  ➢ Specify the title of the plot, axis names and ranges, legends, line thickness, color, etc.
  ➢ Specify the output format (PNG, PS or EPS), along with the filename
• Run: Run your code on the gnuplot command line
  ➢ Copy and paste your code on the command line and press ENTER
gnuplot: how?
• Data file: Create the data file to be used for the plot
  ➢ Space-separated, column-wise data
0.1 100 73.13 70.14
0.2 100 70.14 73.13
0.3 100 70.14 73.13
0.4 100 74.62 73.13
0.5 100 74.62 73.13
0.6 84 64.17 70.89
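Results usually come out of an experiment script rather than being typed by hand, so it helps to emit the data file in this format directly. A minimal sketch, assuming the four columns on the slide are the x value followed by three accuracy series:

```python
# Rows copied from the slide: x value followed by three accuracy columns.
ROWS = [
    (0.1, 100, 73.13, 70.14),
    (0.2, 100, 70.14, 73.13),
    (0.3, 100, 70.14, 73.13),
    (0.4, 100, 74.62, 73.13),
    (0.5, 100, 74.62, 73.13),
    (0.6, 84, 64.17, 70.89),
]

def to_columns(rows):
    """Format rows as space-separated columns, one row per line."""
    return "\n".join(" ".join(str(value) for value in row) for row in rows) + "\n"

print(to_columns(ROWS), end="")
```

Saving the returned text under the name used in the plot commands (EM_comparison_novelty.txt on the slides) gives gnuplot a file it can read with `using 1:2` and so on.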
gnuplot: how?
• Code file: Create the gnuplot code file

# EPS output in color, enhanced text, Helvetica 18 pt
set terminal postscript eps enhanced color "Helvetica" 18
set key at graph 0.75,0.9
set size 0.9, 0.9
set title "Bayes-EM vs Ripper on NFL data \n (Novelty missingness model)"
set ylabel "Accuracy (%)"
set xlabel "Percentage of missing values"
set xrange [0.1:0.6]
set yrange [50:100]
set output 'EM_comparison_novelty.eps'
# column 1 is x; columns 2 to 4 are the three series
plot \
'EM_comparison_novelty.txt' using 1:2 title 'Bayes-EM' with linespoints lt 2 lw 1 pt 7,\
'EM_comparison_novelty.txt' using 1:3 title 'RIPPER-conservative' with linespoints lt 3 lw 3,\
'EM_comparison_novelty.txt' using 1:4 title 'RIPPER-aggressive' with linespoints lt 4 lw 3
gnuplot: how?
• Run: Run your code on the gnuplot command line
  ➢ Copy and paste your code on the command line and press ENTER
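As an alternative to pasting, gnuplot also runs a saved script when the script file is given as its argument, which is convenient for regenerating figures from a build script. A minimal sketch; the script name `EM_comparison_novelty.gp` is hypothetical and stands for a file holding the plot commands.

```python
import shutil
import subprocess

def gnuplot_command(script_path, executable="gnuplot"):
    """Build the argument list that runs a gnuplot script in batch mode."""
    return [executable, str(script_path)]

def run_gnuplot(script_path):
    """Run the script, failing loudly if gnuplot is missing or reports an error."""
    if shutil.which("gnuplot") is None:
        raise RuntimeError("gnuplot was not found on PATH")
    subprocess.run(gnuplot_command(script_path), check=True)

# Hypothetical script file containing the plot commands.
print(gnuplot_command("EM_comparison_novelty.gp"))
```

Calling `run_gnuplot("EM_comparison_novelty.gp")` would then write whatever file the script names in its `set output` line.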
gnuplot: resources
• Short and quick reference guide
  ➢ http://sparky.rice.edu/gnuplot.html
• Web resources
  ➢ http://www.gnuplot.info/
  ➢ Demos, tutorials, sample code and scripts
  ➢ Many useful sample plots are available at: http://www.cse.iitb.ac.in/silmaril/br/lib/exe/fetch.php?id=students&cache=cache&media=students:gnuplot.tgz
  ➢ Thanks to Bhaskaran Raman and Kameshwari Chebrolu.
LaTeX: what?
• A manuscript preparation system
  ➢ better than Word – no more equation editors!
  ➢ Math formulas and equations are easier to write
  ➢ Bibliography and cross-referencing are much easier
  ➢ Almost all conference and journal papers are written using LaTeX
  ➢ De facto standard in academia – get used to it!
• Web
  ➢ http://en.wikibooks.org/wiki/LaTeX
  ➢ Windows editors: TeXnicCenter and WinEdt
  ➢ Linux editors: LyX and Kile
LaTeX: basic files
• LaTeX code
  ➢ .tex – LaTeX input code file
  ➢ .sty – style file
• Bibliography
  ➢ .bib – bibliography file
  ➢ .bst – bibliography style file
• Output
  ➢ .dvi – device-independent file
  ➢ .ps – PostScript file
LaTeX: writing code file
• Start with an existing template
• Basic commands
  ➢ \section, \subsection, \subsubsection
  ➢ Text mode vs. math mode ($ $)
  ➢ Math symbols: \alpha, \beta, \gamma
  ➢ \begin{environment} and \end{environment}
    • \begin{itemize} and \end{itemize}
    • \begin{equation} and \end{equation}
    • \begin{figure} and \end{figure}
    • \begin{table} and \end{table}
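The table environment pairs naturally with experiment results: instead of typing numbers into the .tex file, you can generate the table from the same data you plot. The sketch below builds a plain table/tabular block as a string. The function name, caption, label and column headers are all illustrative choices, and cell text is inserted as-is, so characters special to LaTeX (such as %) would need escaping.

```python
def latex_table(header, rows, caption, label):
    """Return a LaTeX table environment with right-aligned columns."""
    lines = [
        r"\begin{table}",
        r"\centering",
        r"\begin{tabular}{" + "r" * len(header) + "}",
        " & ".join(header) + r" \\",
        r"\hline",
    ]
    for row in rows:
        lines.append(" & ".join(str(value) for value in row) + r" \\")
    lines += [
        r"\end{tabular}",
        rf"\caption{{{caption}}}",
        rf"\label{{{label}}}",
        r"\end{table}",
    ]
    return "\n".join(lines)

# Two rows of the plot data from the gnuplot slides; headers are illustrative.
print(latex_table(
    ["Missing", "Bayes-EM", "RIPPER-cons.", "RIPPER-aggr."],
    [(0.1, 100, 73.13, 70.14), (0.6, 84, 64.17, 70.89)],
    caption="Accuracy on NFL data",
    label="tab:accuracy",
))
```

The output can be pasted into the paper or written to a file that the main document pulls in with \input.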
LaTeX: bibliography file
• A sample bibliography entry
@inproceedings{CRF-ICML:01,
author = {John Lafferty and Andrew McCallum and Fernando Pereira},
title = {Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data},
booktitle = {ICML'01: Proceedings of the 18th International Conference on Machine Learning},
year = {2001},
}
@article{TRITRAINING-TKDE:05,
author = {Zhi-Hua Zhou and Ming Li},
title = {Tri-Training: Exploiting Unlabeled Data Using Three Classifiers},
journal = {IEEE Transactions on Knowledge and Data Engineering},
volume = {17},
number = {11},
year = {2005},
}
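If you keep paper metadata in a spreadsheet or script, entries in this shape can be generated rather than typed. A minimal sketch that renders one entry from a dictionary; the function name is illustrative, and values are wrapped in braces without any escaping.

```python
def bibtex_entry(entry_type, key, fields):
    """Render a BibTeX entry such as @article{key, field = {value}, ...}."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)

print(bibtex_entry("article", "TRITRAINING-TKDE:05", {
    "author": "Zhi-Hua Zhou and Ming Li",
    "title": "Tri-Training: Exploiting Unlabeled Data Using Three Classifiers",
    "journal": "IEEE Transactions on Knowledge and Data Engineering",
    "volume": 17,
    "number": 11,
    "year": 2005,
}))
```

The citation key (here TRITRAINING-TKDE:05) is what you pass to \cite in the .tex file.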
LaTeX: compiling
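• Compile LaTeX code with “latex” or “pdflatex”
• Compile BibTeX code with “bibtex”
  ➢ latex <code>
  ➢ bibtex <code> (bibtex reads the .aux file written by the first latex run)
  ➢ latex <code>
  ➢ latex <code>
  ➢ the repeated latex passes resolve citations and cross-references!
• Collaborative writing
  ➢ Use a CVS or SVN repository – much easier!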
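The compile steps above are easy to get out of order near a deadline, so many people wrap them in a small script. The sketch below builds the usual command sequence (latex, bibtex, then latex twice so that citations and cross-references settle). The base name `paper` is hypothetical, and nothing is executed unless you call `build`.

```python
import subprocess

def build_commands(basename, engine="latex"):
    """Return the command sequence for a document with a BibTeX bibliography."""
    return [
        [engine, basename],    # writes basename.aux with the citation keys
        ["bibtex", basename],  # reads basename.aux, writes basename.bbl
        [engine, basename],    # pulls in the bibliography
        [engine, basename],    # resolves remaining cross-references
    ]

def build(basename, engine="latex"):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(basename, engine):
        subprocess.run(command, check=True)

for command in build_commands("paper"):
    print(" ".join(command))
```

Passing `engine="pdflatex"` gives the same sequence with PDF output.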
LaTeX: resources
• LaTeX cheat sheet
• http://www.ctan.org/texarchive/info/latexcheat/latexcheat/latexsheet.pdf
• LaTeX wiki book
• http://en.wikibooks.org/wiki/LaTeX/
• Learn tips and tricks
• From expert users
• From online forums
• Grow your bag of tricks – will save your time at deadlines!
LaTeX in PowerPoint
• TeXPoint – a LaTeX add-on for PowerPoint and Word
  ➢ http://texpoint.necula.org/
  ➢ http://web.engr.oregonstate.edu/~mehtane/latex/index.html
• TeXclip – LaTeX to image
  ➢ http://maru.bonyari.jp/texclip/texclip.php
• Beamer slides using LaTeX
  ➢ http://bitbucket.org/rivanvx/beamer/wiki/Home
MS students: Advice
• Hard to fund all the MS students
  ➢ bad economy, low grant money, etc.
  ➢ Short-time investment – faculty will choose their bets carefully!
• Look for alternative funding sources
  ➢ BSG, Media Services, Library, science laboratories (e.g., chemistry, biology), etc.
• Bottom line: Grad school is costly, but a very good long-term investment!!
MS students: Advice
• Immediate reward vs. long-term average reward
• Worst case: you pay for graduate school with your own money
• Concentrate on your education and develop skills
• Go for a summer internship – money and experience
• Specialize in something – good job market!
• You can pay off your loans in less than 6 months!!
• Don'ts
• Finish classes quickly and graduate with an ME – bad idea!
• Worry about money while in school – you won’t be productive
Questions?