NEEShub Simulation Capabilities
February 17, 2012 Webinar
http://nees.org/resources/4079
George E. Brown, Jr. Network for Earthquake Engineering Simulation
Gregory Rodgers Ph.D.
NEESComm IT
Purdue University, West Lafayette, IN
Post-webinar updates
Webinar Introduction
Audience
Simulation tool developers and
NEES power users who:
• have very large simulations, or many simulations that each run for more than 30 minutes.
• need to script parameter sweeps.
• run a structural analysis with a large suite of ground motions.
Prerequisite
An understanding of command line interfaces such as Linux bash
Summary
• This webinar will introduce advanced users to new NEEShub capabilities in the area of
simulation and batch processing. Power users often write a script to orchestrate a set of
simulation runs to cover many different test cases. Batch processing services have recently
been added to NEEShub to make this easy and to provide access to large scratch
space. Upon completion of this webinar, a user will be able to write scripts that submit one
or more jobs to multiple execution venues and so use the high performance computing
resources available to NEES.
Agenda
HOUR 1
• How Simulation fits into the NEES CyberInfrastructure
• Introduction to the Linux workspace tool on NEEShub
• Manual execution (command line) of applications
• Manual execution (command line) of the opensees simulator
• Use of the new batchsubmit command to run opensees
• Use of batchsubmit to run other applications
• The batchstatus command
• Demonstration of how HOME directory space is linked to scratch space
• Advanced batchsubmit options and scripting the execution of batchsubmit
HOUR 2 (advanced)
• How to build a bash command file, including editors available on Linux
• Simple parallel execution (the --ncpus argument to batchsubmit)
• Parallel opensees (how to modify sequential input to be parallel input)
• How to use batchsubmit for other venues
• Overview of various NEES execution High Performance Computing (HPC) venues:
  local hub execution, osg, hansen, steele, kraken, and ranger
• How the openseeslab user interface uses batchsubmit
• Advanced batchsubmit options review
• Scratch cleanup algorithm
NEES Cyber Infrastructure
[Figure: diagram of the NEES cyberinfrastructure. Labeled components: A. Site Operations; B. NEEShub; C. Cloud / Simulation Environment; D. The NEES Project Warehouse; E. EOT; F. Spreadsheet DBs. Other labels on the diagram include hub tool sessions, scratch space, web server, project editor, group space, personal space, NSF Xsede, Open Science Grid, and Purdue Hansen.]
Introduction to the linux workspace tool on NEEShub.
• Start a workspace from this page
http://nees.org/resources/workspace
Click “Launch”
• You must be part of a special group. If you are not in this group, open
a ticket stating that you need workspace access, provide
justification, and we will add you to the group.
• The window can be resized and popped out of the browser.
• Multiple terminals can be opened in the same window.
• A workspace session is persistent. You can close the
browser and return to your existing workspace from
the myneeshub page at any time.
• This session can also be shared with other users or
administrators.
Execution of Applications
from the command line
• Simple utilities
date             Print the date
env              List environment variables
ls (and ls -l)   Show list of files (long list)
cd               Change working directory
pwd              Show working directory
mkdir            Make a directory
rm (rmdir)       Remove a file (directory)
cat              Write contents of a file on the screen
cp               Copy a file
man <command>    Show help about a command (man pages)
exit             Terminate your session
• Use the arrow keys to recall previous commands
Putting commands in a script file
• A list of commands can be put into a script
– Avoid retyping
– Loop through commands
– To make executable use command:
chmod 755 <filename>
• 3 important scripting languages to consider
bash      Linux commands; also csh
Tcl/Tk    The language for opensees
Python    Advanced high performance language
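As a rough illustration of the points above, the sketch below writes a small bash script, makes it executable with chmod 755, and runs it. The file name passes.sh and the commands inside it are made up for this example.

```bash
# Work in a throwaway directory so nothing in HOME is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Put a list of commands, including a loop, into a script file.
cat > passes.sh <<'EOF'
#!/bin/bash
# Loop through commands instead of retyping them.
for i in 1 2 3; do
    echo "pass $i"
done
pwd
EOF

# Make the script executable, then run it.
chmod 755 passes.sh
./passes.sh
```

The script prints "pass 1" through "pass 3" followed by the working directory.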
Manual Execution of OpenSees
• Opensees tcl prompt versus Linux command prompt
• Start opensees with a tcl prompt (no argument)
• Start opensees to execute a file of tcl commands (one
argument)
• The binary OpenSees versus the wrapper script called
opensees
opensees <input TCL file>
The binary is spelled OpenSees; opensees is a wrapper
that sets up the environment correctly and then calls
OpenSees.
High Volume Batch on NEEShub
• Consistent and asynchronous submission to multiple
venues: local, osg, steele, hansen, kraken. The last
three are part of the new xsede system that replaces
Teragrid.
• Asynchronous: job is submitted without waiting for
job to complete before returning control to
submitter.
• Your run directories in $HOME/scratch will be
symbolic links to a large (>30TB) shared space. Runs
will be compressed or purged with a cleanup
algorithm as needed.
• Only you will have access to your run directories
batchsubmit
• The batchsubmit command is a wrapper around any command to
execute an asynchronous batch job.
batchsubmit <batchsubmitoptions> command <command options>
• batchsubmitoptions begin with a double dash.
• batchsubmit prints one line of output: the name of the newly created
directory where BOTH job input is located and output will be found.
• The help for batchsubmit gives an example of how to run opensees
batchsubmit –h
batchsubmit –h | more
batchsubmit date
batchsubmit opensees /apps/demo/sine/sine.tcl
batchsubmit --appdir /apps/openseesbuild/osg OpenSees /apps/demo/sine/sine.tcl
batchsubmit --jobtype sine --onlyinfile opensees /apps/demo/sine/sine.tcl
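Because batchsubmit prints a single line naming the new run directory, a script can capture that line and use it later. The following is a minimal sketch of that idea. The stand-in function exists only so the sketch runs away from NEEShub, and the path it prints is made up; on the hub the real command is used.

```bash
# Off the hub there is no batchsubmit, so define a stand-in that only
# prints a made-up run directory. On NEEShub the real command is found.
if ! command -v batchsubmit >/dev/null 2>&1; then
    batchsubmit() { echo "$HOME/scratch/job/job_example"; }
fi

# batchsubmit returns without waiting for the job and prints the run
# directory on one line; capture it for later use.
rundir=$(batchsubmit opensees /apps/demo/sine/sine.tcl)
echo "Input is in, and output will appear in: $rundir"
```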
Input Processing
• Default: The first argument after the application command is considered an
input file. All files from this directory are copied to the scratch run
directory. Two other options:
--onlyinfile Only copy the input file
--rcopyindir Recursive copy all files and directories from the same
directory as the inputfile.
• Note: the input file may not sit directly in your home directory unless --onlyinfile
is specified, because by default every file in the input file's directory is copied.
Create a directory for your opensees tcl file; a separate directory for each
simulation is recommended.
• What if you have an application where the first argument is NOT the input
file (unlike opensees)?
--infilearg Indicates which argument is the input file
--infile Use this file as the input file when it is implied by the application
command and so is not one of its arguments.
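The three copy behaviours can be compared side by side. This sketch only prints the batchsubmit commands unless DRY_RUN=0 is set; the run helper and the directory sims/sine are hypothetical names introduced for the example, not part of batchsubmit.

```bash
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One directory per simulation (this path is hypothetical).
simdir=${SIMDIR:-$HOME/sims/sine}

# Default: every file in $simdir is copied to the scratch run directory.
run batchsubmit opensees "$simdir/sine.tcl"

# Copy only the input file itself.
run batchsubmit --onlyinfile opensees "$simdir/sine.tcl"

# Copy the input file's directory recursively, subdirectories included.
run batchsubmit --rcopyindir opensees "$simdir/sine.tcl"
```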
batchsubmit files/dirs
• Job input exists in new scratch directory upon completion of the
batchsubmit command. One scratch directory for each batchsubmit
command (each job). The directory name has this template.
$HOME/scratch/<jobtype>/<jobname>/
• Job output exists when the job is completed.
– You will get an email when the job starts and when it completes unless you specify --nonotify
• Review the various output files generated in a job run directory
<jobname>.stdout    Standard output: what would have been printed to the screen.
<jobname>.stderr    Standard error.
The run directory   Same directory name as where the input file was found.
                    Note: your input file is in this directory.
@STATUS             Job status (see Job lifecycle).
joblog              Interesting info about the environment the job was run in.
.log                Statistics recorded about this job.
.born_on_date       Used for scratch cleanup.
Job lifecycle
• System uses the file @STATUS to store the job status.
• States:
Presubmit – only for remote venues
Submitted – Waiting to start. Only for remote venues.
Started – application is running. Remote venues will actually update this file
Completed – All results are returned.
Deleted – Job has been removed from the shared scratch space but your
scratch directory still shows it.
Saved – Job was moved to your HOME directory. Symbolic link to shared
scratch space is gone. Job is taking up your quota when it is saved.
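A script that needs to wait for results could look at the @STATUS file directly. The sketch below assumes that the file holds the state name as plain text, which the slides do not confirm; the function names and the polling interval are hypothetical. The batchstatus command remains the supported way to check jobs.

```bash
# Print the state stored in a run directory's @STATUS file.
# ASSUMPTION: the file holds the state name as plain text.
job_state() {
    cat "$1/@STATUS" 2>/dev/null || echo "unknown"
}

# Succeed once the job in run directory $1 reports Completed.
# The comparison ignores case (Completed, COMPLETED).
is_completed() {
    state=$(job_state "$1" | tr '[:upper:]' '[:lower:]')
    case "$state" in
        completed*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll until the job is done; 60 seconds is an arbitrary default.
wait_for_job() {
    until is_completed "$1"; do
        sleep "${POLL_SECONDS:-60}"
    done
}
```

Typical use would be wait_for_job "$rundir" followed by reading the .stdout file in that directory.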
batchstatus and batchcancel
• Other batchsubmit utilities
batchstatus – shows the status of each of your
jobs.
batchcancel – Cancel a job. This command is not
released yet.
batchsave – Remove a job from scratch space
and save it to your HOME directory space.
NEEShub Disk Space
HOME space      /home/neeshub/<youruserid>
Groups space    /data/groups/<groupname>
Scratch space   $HOME/scratch
Warehouse       /nees/home/<PROJ-DIR>
• Use of synchronees to upload and
download between your workstation
and NEEShub spaces
• Advice: Use relative names for input
and output files so your job can run on
venues other than “local”
• NEEShub data locations:
[Figure: diagram of NEEShub data locations, showing workstation data, HOME space, group space, and scratch space, with synchronees, webdav, and batchsubmit as the paths between them.]
Advanced batchsubmit options
--wait     Only for venues local and osg. batchsubmit does not return
           until the job is COMPLETED. Standard output and standard error
           are printed on the screen.
--appdir   A directory containing the application, with bin and lib subdirs.
           The application command must be in the appdir/bin subdirectory.
           Both bin and lib directories are sent to the execution machine for
           every run, so be careful not to specify a large installed
           application directory. This option eliminates the need to install
           apps on venues other than local. See your application provider.
--envars   List of environment variable names separated by commas.
           Only specify names here; values must be set before calling
           batchsubmit, which allows values with special characters.
           For local execution, all environment variables are passed to the job.
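A parameter sweep can use --envars to hand each job its own value. In this sketch the variable name SCALE, the input path model/run.tcl, and the sweep and run helpers are all hypothetical; a Tcl input file could read the value as $env(SCALE). Commands are only printed unless DRY_RUN=0 is set.

```bash
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Submit one job per scale factor. The value is exported first;
# only the variable NAME is given to --envars.
sweep() {
    for SCALE in 0.25 0.5 0.75 1.0; do
        export SCALE
        run batchsubmit --envars SCALE opensees model/run.tcl
    done
}

sweep
```

The input file is given by a relative name, following the advice about running on venues other than local.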
HOUR 2 Agenda
• Simple parallel execution (The --ncpus argument to batchsubmit)
• Parallel opensees (how to modify sequential input to be parallel
input)
• How to use batchsubmit for other venues.
• Overview of various NEES execution High Performance Computing
(HPC) venues:
Venue    Guidance                                       ncpus
local    Use for testing small jobs less than 4 hours   ncpus<16
osg      Use for many moderate size jobs                ncpus=1
hansen   Use for large parallel jobs                    ncpus<=48
steele   Use for many parallel jobs                     ncpus<=8
kraken and ranger (pending)
• Advanced batchsubmit options and scripting the execution of
batchsubmit.
• Building bash scripts to save typing.
• Scratch cleanup algorithm
Simple Parallel Execution
--ncpus <value>
• This option causes your application command to
execute <value> times in parallel.
Example:
batchsubmit --ncpus 4 date
• What good is it to run the same thing ncpus times? None,
unless your application is aware that it is running in parallel.
• A parallel-aware application running on N processors does only
1/N of the work, knowing that the other processors will do the
other parts.
• It is not hard to make your application parallel aware,
especially with a scripting language like Tcl.
Simple Parallel Execution
Example: Run the same model through 27 ground motions.
We want to divide the ground motions among 8 processors, PID = 0..7
P0  P1  P2  P3  P4  P5  P6  P7
 0   1   2   3   4   5   6   7
 8   9  10  11  12  13  14  15
16  17  18  19  20  21  22  23
24  25  26
/* PID is the processor number (0..7); the same loop runs on all 8 processors */
for count = 0 to 26
    if (count % 8) == PID then    /* % gets the remainder from division */
        do analysis for ground motion #count
    else
        skip
    end if
end for
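The round-robin rule in the pseudocode can be checked with a short shell sketch. The function name motions_for is made up for this illustration; it lists which ground motion numbers a given processor would handle.

```bash
# List the ground motion numbers handled by one processor.
# Arguments: processor id, number of processors, number of motions.
# The body runs in a subshell so its variables do not leak out.
motions_for() (
    pid=$1
    nprocs=$2
    total=$3
    out=""
    count=0
    while [ "$count" -lt "$total" ]; do
        # Same test as the pseudocode: remainder equals processor id.
        if [ $((count % nprocs)) -eq "$pid" ]; then
            out="$out $count"
        fi
        count=$((count + 1))
    done
    echo "${out# }"
)

motions_for 0 8 27   # prints: 0 8 16 24
motions_for 7 8 27   # prints: 7 15 23
```

With 27 motions on 8 processors, processors 0 to 2 each get four motions and the rest get three.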
# Parallel-specific: processor id and processor count (OpenSeesMP commands)
set pid [getPID]
set numP [getNP]
set count 0
source ReadRecord.tcl
set g 384.4
foreach scaleFactor {0.25 0.5 0.75 1.0} {
    foreach gMotion [glob -nocomplain -directory GM *.AT2] {
        # Parallel-specific: each processor takes every numP-th case
        if {[expr $count % $numP] == $pid} {
            source model.tcl
            source analysis.tcl
            set ok [doGravity]
            loadConst -time 0.0
            if {$ok == 0} {
                # Strip the .AT2 extension from the record name
                set gMotionName [string range $gMotion 0 end-4]
                ReadRecord ./$gMotionName.AT2 ./$gMotionName$scaleFactor.dat dT nPts
                timeSeries Path 1 -filePath $gMotionName$scaleFactor.dat -dt $dT -factor [expr $g*$scaleFactor]
                if {$nPts != 0} {
                    recorder EnvelopeNode -file $gMotionName$scaleFactor.out -node 3 4 -dof 1 2 3 disp
                    doDynamic $dT $nPts
                    file delete $gMotionName$scaleFactor.dat
                    # NOTE: ok still holds the doGravity result here; capture the
                    # return value of doDynamic if analysis.tcl provides one.
                    if {$ok == 0} {
                        puts "$gMotionName with factor: $scaleFactor OK"
                    } else {
                        puts "$gMotionName with factor: $scaleFactor FAILED"
                    }
                } else {
                    puts "$gMotion - NO RECORD"
                }
            }
            wipe
        }
        incr count 1
    }
}
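The code highlighted in yellow on the slide (the getPID and getNP calls
and the test on $count % $numP) only works in OpenSeesMP.
You can remove the highlighted code and run the script in OpenSees,
but it will take much longer.
The value of numP will be the --ncpus value provided
to batchsubmit.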
How to use batchsubmit for other
execution venues
--venue hansen | steele | osg
Future values will include kraken and ranger.
Note:
The batchsubmit options --nn and --ppn are not yet functional.
In the future, they will allow extremely large values of --ncpus:
--ncpus will be the product of --nn and --ppn.
--mpiargs This option specifies additional arguments to mpirun.
Wrap these arguments in single quotes.
Typically no additional mpi arguments are needed.
Sample parallel jobs
To save typing, I created the following scripts:
/apps/demo/bin/ex1
/apps/demo/bin/ex2
These just print the batchsubmit examples without running them.
The following scripts print and run the commands:
/apps/demo/bin/ex1.sh
/apps/demo/bin/ex2.sh
Let's take time to study these examples.
Venue guidelines
Venue    Guidance                                       --ncpus
------   --------------------------------------------   ---------
local    Use for testing small jobs less than 4 hours   ncpus<16
osg      Use for many moderate size jobs.               ncpus=1
hansen   Use for large parallel jobs                    ncpus<=48
steele   Use for many parallel jobs                     ncpus<=8
• Future venues to include kraken and ranger.
• Xsede (formerly teragrid) venues are steele, kraken, and
ranger. Xsede and hansen use PBS for job submission.
PBS jobs submission is automated by batchsubmit.
• This batchsubmit option can change the pbs queue
--xdqueue
The default queue for steele is "standby".
The default queue for hansen is "nees".
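The venue guidelines can be encoded in a small check that a script runs before submitting. This is only a sketch: max_ncpus and check_ncpus are hypothetical helper names, the limits are copied from the table above, and the batchsubmit command is printed rather than executed.

```bash
# Largest --ncpus value the guidance table allows for a venue.
max_ncpus() {
    case "$1" in
        local)  echo 15 ;;   # table says ncpus<16
        osg)    echo 1 ;;
        hansen) echo 48 ;;
        steele) echo 8 ;;
        *)      return 1 ;;  # kraken and ranger are not available yet
    esac
}

# Succeed only if the requested count fits the venue's limit.
check_ncpus() {
    limit=$(max_ncpus "$1") || return 1
    [ "$2" -ge 1 ] && [ "$2" -le "$limit" ]
}

venue=steele
ncpus=8
if check_ncpus "$venue" "$ncpus"; then
    # Print the command that would be submitted.
    echo "batchsubmit --venue $venue --ncpus $ncpus date"
else
    echo "ncpus=$ncpus is outside the guidance for venue $venue" >&2
fi
```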
Advanced batchsubmit options
--jnpref   Job name prefix; the default is "job". Blanks are not allowed.
           The environment variable JNPREF also overrides the default.
           Try: export JNPREF="run_" before calling batchsubmit.
--jobname  Specify the jobname, overriding the auto-incremented jobname.
           Not recommended, to avoid jobname collisions.
           However, if a collision occurs with an existing scratch dir,
           batchsubmit will create a new directory.
--xdqueue  Queue for xsede machines (steele or hansen).
           The default queue for steele is "standby".
           The default queue for hansen is "nees".
Building bash scripts
• Commands can be stored in a file and these
files can be executed
• A file can be “sourced” or executed.
• Recommend you store your personal scripts in
$HOME/bin
• Text Editors available on NEEShub
gedit
nano
vi
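The difference between executing and sourcing a file shows up with environment variables. In this sketch the file name setprefix.sh is made up; JNPREF is the job name prefix variable described under the advanced batchsubmit options.

```bash
# Create a tiny script in a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/setprefix.sh" <<'EOF'
#!/bin/bash
# Set the job name prefix used by batchsubmit.
export JNPREF="run_"
EOF
chmod 755 "$workdir/setprefix.sh"

unset JNPREF

# Executed: runs in a child shell, so the variable does not survive.
"$workdir/setprefix.sh"
after_exec=${JNPREF:-}
echo "after executing: JNPREF='$after_exec'"    # prints JNPREF=''

# Sourced: runs in the current shell, so the variable stays set.
. "$workdir/setprefix.sh"
after_source=${JNPREF:-}
echo "after sourcing:  JNPREF='$after_source'"  # prints JNPREF='run_'
```

Source a file when it is meant to change the current session; execute it when it only needs to do work.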
Scratch Cleanup Algorithm
1.  IF used < 75% THEN EXIT report no activity required
2.  Delete all jobs > 1yr old, log action
3.  FOR ACTION = compress, archive (phase 1, phase 2)
4.  |  FOR T = 6m, 5m, 4m, 3m, 2m, 4w, 3w (pass 1, pass 2, …)
5.  |  |  FOR X = 5, 10, 20, 40, ALL
6.  |  |  |  Calculate set of top X users of scratch space
7.  |  |  |  FOR SIZE = 128G, 32G, 8G, 2G, 512M, 128M, 32M
8.  |  |  |  |  FOREACH rundirectory
9.  |  |  |  |  |  IF rundirectory size > SIZE AND
10. |  |  |  |  |     rundirectory is owned by X AND
11. |  |  |  |  |     rundirectory lifetime > T
12. |  |  |  |  |  THEN ACTION rundirectory, log action
13. |  |  |  |  IF used < 50% THEN
14. |  |  |  |     EXIT report SIZE, X, T, A thresholds
15. IF used > 50% THEN report policy failure and revise policy
The numeric values (shown in red on the slide) are policy parameters that can be revised by management as needed
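To see the order in which the nested loops relax the thresholds, the combinations can simply be enumerated. This sketch covers only the loop structure of steps 3 to 7, using the lists from the pseudocode; it does not touch any files, and the function name thresholds is made up.

```bash
# Enumerate threshold combinations in the order the nested loops visit them.
thresholds() {
    for action in compress archive; do
        for age in 6m 5m 4m 3m 2m 4w 3w; do
            for top in 5 10 20 40 ALL; do
                for size in 128G 32G 8G 2G 512M 128M 32M; do
                    echo "$action age>$age top=$top size>$size"
                done
            done
        done
    done
}

# The first combinations hit only the largest, oldest jobs of the top 5 users.
thresholds | head -n 3
# Total number of combinations: 2 * 7 * 5 * 7 = 490
thresholds | wc -l
```

The first line printed is "compress age>6m top=5 size>128G". Each later line loosens one threshold, which is why small or recent jobs are reached last.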
GC Algorithm Lemmas
• No jobs < 3 weeks old will ever be deleted or compressed without a policy change.
• Very small jobs (< 32MB compressed) will never be deleted by the system.
  – Worst case: 500,000 32MB jobs would consume 50% of a 32TB scratch.
• Process largest to smallest jobs for a fixed set of users and older than a specific age
  (inner loop)
• Process sets of large users with jobs older than a specific age (middle loop)
• Outer loop
  – Pass 1: Process jobs > 6 months
  – Pass 2: Process jobs > 5 months …
• No jobs are deleted until all jobs > 3 weeks old and > 32MB have been compressed.
  Compression is phase 1, deletion is phase 2.
• Example report stream:
  – Day 1:   No activity, 74% used
  – Day 2:   >2GB, Top 10 users, >3 months old, compressed, 50% used
  – Day 3:   >32MB, All users, >3 weeks old, compressed, 50% used (closest call to deletion)
  – Day 4:   >32GB, Top 5 users, >2 months old, deleted, 45% used
  – Day 5:   No activity, 65% used
  – Day X-1: >2GB, All users, >3 weeks old, deleted, 50% used (close to policy failure)
  – Day X:   >32MB, All users, >3 weeks old, deleted, 60% used, POLICY FAILURE
             Policy parameters need adjustment
Topics Not covered in this webinar
• Use of batchsubmit to build a user interface
• Use of pegasus for workflow management
– This is in development and test. A single pegasus job can
submit many jobs that have inter-job dependencies.
• Creation of an appdir for portable applications. The only
functional appdir today is /apps/openseesbuild/osg
• Modification of OpenSees source to create a personal
copy of OpenSees with custom materials and
models.
– Process in development with Prof. Elwood’s graduate
student.