Using Gaussian 03 at MCSR

Brian W. Hopkins
Mississippi Center for Supercomputing Research
19 February 2009

What We’re Doing Here

• General information: license, &c.
• Input file structure
• The g03sub utility
• Job naming
• Reading the output file

Gaussian 03

• “Starting from the basic laws of quantum mechanics, Gaussian predicts the energies, molecular structures, and vibrational frequencies of molecular systems, along with numerous molecular properties derived from these basic computation types. It can be used to study molecules and reactions under a wide range of conditions, including both stable species and compounds which are difficult or impossible to observe experimentally such as short-lived intermediates and transition structures.” – Gaussian.com

Site License

• Gaussian 03 is a commercial program.
• UM and MCSR have a complete site license for Gaussian 03.
  – available for all UM/MCSR computers
  – source code and precompiled binaries
  – latest revisions when they are released
  – support
    • through Brian: assist@mcsr.olemiss.edu

Parallel Versions & Licenses

• G03 has a native OpenMP parallelization layer for SMP systems.
• For distributed-memory systems, G03 relies on Linda for parallelism.
• Linda is distributed under a separate and more restrictive license.
  – Ask about Linda-enabled G03 at assist@mcsr.olemiss.edu

Installing Gaussian Locally

• The site license means that you can build G03 on your lab computers if you want it.
• Contact MCSR staff for media.
  – Source code
  – Binary versions
• Be aware that processor, OS and compiler combinations are very restrictive.

Using Gaussian at MCSR

• G03 is installed on all MCSR systems:
  – Sweetgum (older SGI Origin 2000)
  – Mimosa (cluster of P4 Linux boxes)*
  – Redwood (SGI Altix 3700)
  – Sequoia (SGI Altix XE hybrid cluster)
• Any MS researcher can get accounts and use Gaussian.
• Accounts are set up to run G03 w/o modification when they are created.
Doing Science with G03 at MCSR

• Login
• Move to /ptmp/$USER
• Set up desired directory tree
• Build input files
• Submit jobs with g03sub
• Watch queue with qstat
• Monitor progress until job completes

G03, UNIX, and You

• All MCSR systems have some UNIX-like operating system.
  – IRIX on sweetgum
  – openSUSE on mimosa
  – SLES on redwood, sequoia
• The interface to G03 happens in text files and the UNIX command line.
• MCSR staff offer training courses for users new to UNIX environments.

Disk Space on MCSR Systems

• We try to set up the filesystem environment as uniformly as possible.
• All systems have home areas (~), permanent temporary areas (/ptmp) and some form of scratch space (/tmp, /work).
• The precise structure depends on the machine.

~ and /ptmp

• Disk quotas in home areas are small.
  – Files to customize the environment
    • .cshrc
    • .alias
  – Some scripts, &c.
  – Symbolic links to /ptmp
• /ptmp areas are larger.
  – Data storage
  – Program code & executables

Building Input Files

    %nprocl=4
    %mem=900mb
    %chk=methane.chk
    #P MP2(FC)/6-31++G** 5D OPT(tight) SCF(tight)

    5d MP2/6-31++G** optimization of methane

    0,1
    C    0.0000000000    0.0000000000   -0.0000000318
    H    0.0000000000    0.0000000000    1.0840336982
    H    1.0220367932    0.0000000000   -0.3613445025
    H   -0.5110183966   -0.8851098265   -0.3613445025
    H   -0.5110183966    0.8851098265   -0.3613445025
    (blank line)

Program Control Options

    %nprocl=4           The number of processors
    %mem=900mb          The amount of memory needed
    %chk=methane.chk    The name of the checkpoint file

Requesting Multiple Processors

• Gaussian 03 has 2 different ways to request multiple processors:
  – %nprocs=X is used to request X processors of a shared-memory machine (sweetgum, redwood, sequoia*)
  – %nprocl=X is used to request X nodes of a distributed-memory machine (mimosa)
• In principle they can be used together, but we don’t have any machines on which that would be appropriate.

Memory Request

• Can be specified in words or bytes:
  – KW, MW, GW
  – KB, MB, GB
  – 1W = 8B
• If unitless, the default unit is single words.
• Different treatments for parallel jobs:
  – If %nprocs, give the total memory for all procs.
  – If %nprocl, give the memory for each node.

The Checkpoint File

• Stores information needed to restart the calculation
  – Information that defines the current wavefunction and derivatives thereof
  – Most recent geometry
• Especially useful in geometry optimizations
  – Restarting timed-out opts
  – Speeding opts with the opt-freq-opt cycle
• If no path is given, the program will look for a file with this name in the working directory.
• If no such file exists:
  – If nothing on the command line requires a checkpoint file, the job will start from scratch and create a new file.
  – If something on the command line (geom=check; guess=read) needs the file, the job will crash.
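Memory Units: A Sketch

Returning to the memory request, the unit arithmetic (1W = 8B, bare numbers read as single words) can be sketched in a few lines of Python. This is only an illustration: the helper name mem_to_bytes is made up here, and treating K/M/G as powers of 1024 is an assumption not stated in these slides, so check the Gaussian manual before relying on it.

```python
import re

WORD_BYTES = 8  # 1W = 8B, as stated on the Memory Request slide

# ASSUMPTION: K/M/G are binary prefixes (powers of 1024).
PREFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def mem_to_bytes(spec: str) -> int:
    """Convert a %mem-style value such as '900mb' or '100MW' to bytes.

    Hypothetical helper for illustration; not part of Gaussian or g03sub.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)([WB]?)\s*", spec.upper())
    if match is None:
        raise ValueError(f"unrecognized memory request: {spec!r}")
    count, prefix, unit = match.groups()
    if unit == "":
        if prefix:
            # Something like '900M' names a prefix but no unit.
            raise ValueError(f"missing W or B in: {spec!r}")
        unit = "W"  # a unitless number is read as single words
    unit_bytes = WORD_BYTES if unit == "W" else 1
    return int(count) * PREFIX[prefix] * unit_bytes


if __name__ == "__main__":
    for request in ("900mb", "100MW", "900"):
        print(request, "->", mem_to_bytes(request), "bytes")
```

Remember that the same number means different things in parallel jobs: with %nprocs it is the total for all processors, with %nprocl it is the amount per node.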
The Command Line

In the methane input file:

    #P MP2(FC)/6-31++G** 5D OPT(tight) SCF(tight)      <- the command line

General Features

• Starts w/ #
• Next is a 1-letter code to specify verbosity
• After that is a long string to direct the job
• Not case sensitive
• Job parameters can be in any order
  – Best to choose a structure and stick with it

The Comment Line

In the methane input file:

    #P MP2(FC)/6-31++G** 5D OPT(tight) SCF(tight)
                                                       <- 1 blank line
    5d MP2/6-31++G** optimization of methane           <- the comment line
                                                       <- 1 blank line
    0,1

• Must be present
• Can’t be blank
• Will be echoed into output, and so can serve as a sort of “label” for an output file
• Usually more trouble than it’s worth

Charge and Multiplicity

In the methane input file:

    0,1                                                <- charge, multiplicity

• Charges w/o sign will be considered positive.
• Program will automatically try to reconcile structure, charge, multiplicity, and reference (if specified).
  – Failure will kill the job:

    The combination of multiplicity 2 and 16 electrons is impossible.
    Error termination via Lnk1e in /usr/local/apps/g03/l301.exe.

• Especially for open-shell systems, be sure to check state after the job.
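Charge and Multiplicity: A Sanity Check

The "impossible combination" error above comes down to parity: a multiplicity of 2S+1 implies 2S unpaired electrons, and the remaining electrons must pair up. A minimal Python sketch of that check follows. It is not Gaussian's own logic; the function names and the small atomic-number table are introduced here purely for illustration.

```python
# Tiny lookup table for illustration only; extend as needed.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def electron_count(symbols, charge):
    """Total electrons: sum of atomic numbers minus the net charge."""
    return sum(ATOMIC_NUMBER[s] for s in symbols) - charge


def multiplicity_is_possible(n_electrons, multiplicity):
    """Necessary condition only: unpaired count fits and the rest pair up."""
    unpaired = multiplicity - 1
    if multiplicity < 1 or unpaired > n_electrons:
        return False
    return (n_electrons - unpaired) % 2 == 0


if __name__ == "__main__":
    methane = ["C", "H", "H", "H", "H"]
    n = electron_count(methane, charge=0)      # 10 electrons
    print(n, multiplicity_is_possible(n, 1))   # singlet methane: allowed
    print(16, multiplicity_is_possible(16, 2)) # the case from the error message
```

Passing this check only means the job will not die on electron counting; it says nothing about whether the state you get is the state you wanted.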
The Molecule

In the methane input file:

    C    0.0000000000    0.0000000000   -0.0000000318   <- molecular structure
    H    0.0000000000    0.0000000000    1.0840336982
    H    1.0220367932    0.0000000000   -0.3613445025
    H   -0.5110183966   -0.8851098265   -0.3613445025
    H   -0.5110183966    0.8851098265   -0.3613445025

Molecular Structure

• Can be given in various coordinate systems:
  – Cartesian
  – Internal (Z-Matrix)
  – Redundant Internal (Z-Matrix + additional coords)
• Generally no requirement to orient structure in any particular way, or to define internals wrt symmetry elements

Symmetry Elements?

• Consider two structures:

    O
    H 1 oh1
    H 1 oh1 2 hoh1

    oh1 = 0.95
    hoh1 = 109.5

    O
    H 1 oh1
    H 1 oh2 2 hoh1

    oh1 = 0.95
    oh2 = 0.95
    hoh1 = 109.5

    Recognized as de facto C2v symmetry

• Both will be recognized by G03 as C2v; optimizations will be automatically constrained accordingly.

Blank Line At The End

• Required if molecule given as Cartesians
• The only way for G03 to know that the molecule is done.
• One of the most common problems for new users of G03.
• Marked by this error message:

    End of file in ZSymb.
    Error termination via Lnk1e in /usr/local/apps/g03/l101.exe

Specialty Topics For Another Time

• Custom basis sets
• Redundant coordinates
• Constrained opts & PES scans
• CP corrections
• Symmetry controls

Submitting Jobs With g03sub

• Because MCSR machines are shared by large numbers of users, we use a batch system to control the use of processors, memory, &c.
• The batch system is called PBS; users run jobs by writing special scripts that are then submitted to a queue.
• Due to the large volume of G03 jobs, we have created a script that takes a simple command line syntax and automatically generates and submits a PBS job script.
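Before You Submit: Checking the Blank Line

Since a missing blank line at the end of the input is such a common failure, it can be worth checking for it before a job goes to the queue. The following is a minimal Python sketch under that assumption; both function names are hypothetical, and nothing here is part of g03sub or Gaussian.

```python
def has_trailing_blank_line(text: str) -> bool:
    """True if the final line of the input text is empty or whitespace only."""
    lines = text.splitlines()
    return len(lines) >= 2 and lines[-1].strip() == ""


def ensure_trailing_blank_line(text: str) -> str:
    """Return the text with exactly one blank line after the last content line."""
    return text.rstrip() + "\n\n"


if __name__ == "__main__":
    # A deliberately broken fragment: the geometry ends without a blank line.
    broken = "0,1\nC 0.0 0.0 0.0\nH 0.0 0.0 1.084\n"
    print(has_trailing_blank_line(broken))
    fixed = ensure_trailing_blank_line(broken)
    print(has_trailing_blank_line(fixed))
```

A check like this could be run over a directory of input files before calling g03sub on each one.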
The Basic g03sub Syntax

    g03sub -n procs -m mem -t time <-d disk> file.ext

Where:
• procs is the number of nodes or processors requested
• mem is the amount of memory requested
• time is the amount of time requested
• disk (optional) is the amount of disk needed
• file.ext is the name of the job input file

Scratch Directories

• Certain G03 jobs will require scratch space to read and write data during the job.
• This data is not kept after job completion.
• Performance of the scratch disk system can be a limiting factor in total job performance.
• All systems have default settings for scratch disk that can be overridden as needed.
• Invoking the -d flag followed by an estimate of the disk space needed by the job will help our systems direct jobs to the best scratch systems.

Consistency Checks

• g03sub automatically checks to make sure some features of your job are consistent and system appropriate:
  – %nprocl vs. %nprocs
  – total # of processors (limit 4)
  – -n vs. %nproc
  – -m vs. %mem
• Jobs that fail the checks will return intuitive error messages.

.out and .OUT

• A G03 job with input file file.ext is going to create output files file.out and file.log.
• To protect preexisting data, g03sub scans the working directory and moves files to safety:
  – file.out is moved to file.OUT
  – file.log is moved to file.LOG
• WARNING: There is no third file name; further job submissions will overwrite data.

Script Creation and Submission

• Once the consistency checks have been passed, g03sub automatically creates a PBS script and submits it.
• Accepted submissions will report job numbers:

    7244.mimosa.mcsr.olemiss.edu

• Some submissions may be rejected by PBS itself, usually because the user lacks the proper queue access:

    qsub: Job rejected by all possible destinations

Checking on a Job

• Once a job is submitted, it can be checked on using qstat.
  – qstat -u user

    mimosa(no_blank_line)% qstat -u r1130

    mimosa.mcsr.olemiss.edu:
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    7166.mimosa.mcs r1130    MCSR-2N  DX101pcbZO   9291   2   2  400mb 12:00 R 00:42
    7167.mimosa.mcs r1130    MCSR-2N  DX002pcbZO     --   2   2  400mb 12:00 Q   --

  – qstat -f jobnumber
    • returns all info for a job

The Job Naming Scheme

• Jobs are assigned a jobname by PBS that’s different from the name of the input file.
• g03sub assigns job names by combining a two-character descriptive code and the name of the input file (truncated as needed):

    7175.mimosa    E132a11-15    r1221    0    Q    MCSR-2N

A Quick Guide to the Naming Convention

• The first character indicates the broad theoretical method you’re using:
  – H = Hartree-Fock
  – D = Density Functional Theory
  – P = Perturbation Theory
  – I = Configuration Interaction Theory
  – C = Coupled-Cluster Theory
  – E = Empirical Theories
  – X = Method not recognized by g03sub
• The second character indicates the order of the derivative taken:
  – 0 = energy only
  – 1 = gradient (opt, &c.)
  – 2 = 2nd derivative (freq, &c.)
  – X = syntax not recognized by g03sub

Reading the Output File

• Once a job is finished, it will produce an output file with the same root name as the input file and a .out extension.
• This file contains all the information produced by the job:
  – Orbital energies and occupations
  – Reference and correlated energies
  – Optimized geometries
  – Vibrational frequencies
  – Thermochemical data
• Be careful that you’re extracting the right information from the file!

Some Important Output Features: Energy Points

• The reference energy:

    SCF Done:  E(RHF) =  -76.0307499791     A.U. after   10 cycles
               Convg  =    0.8565D-08             -V/T =  2.0020
               S**2   =   0.0000

• Correlated energies (from the archive entry at the end of the output):

    \1\GINC-NODE4-1\SP\RCCSD-FC\6-31++G(d,p)\H2O1\BWHOPKIN\19-Feb-2009\0\\
    #CCSD/6-31++G** 5D scf(tight)\\wah da tah!\\0,1\O\H,1,0.95\H,1,0.95,2,
    109.5\\Version=IA32LG03RevE.01\State=1A1\HF=-76.03075\MP2=-76.2303342\
    MP3=-76.2352666\MP4D=-76.2387725\MP4DQ=-76.2371357\MP4SDQ=-76.2384984\
    CCSD=-76.2384892\RMSD=8.565e-09\Thermal=0.\PG=C02V [C2(O1),SGV(H2)]\\@

Some Important Output Features: Optimizations

• Convergence information:

            Item               Value     Threshold  Converged?
    Maximum Force            0.000013     0.000015     YES
    RMS     Force            0.000012     0.000010     NO
    Maximum Displacement     0.000029     0.000060     YES
    RMS     Displacement     0.000027     0.000040     YES

• Final geometry:

    Final structure in terms of initial Z-matrix:
    O
    H,1,oh1
    H,1,oh2,2,hoh1
         Variables:
    oh1=0.94306581
    oh2=0.94306581
    hoh1=107.088157

Some Important Output Features: Frequency Calculations

• Imaginary modes:

    Full mass-weighted force constant matrix:
    Low frequencies --- -304.6041  -22.3249  -16.3689  -16.3688    0.0002    0.0004
    Low frequencies ---    0.0008  996.8348  996.8348
    ******    1 imaginary frequencies (negative Signs) ******
    Diagonal vibrational polarizability:
           0.4660701       0.4660701       0.2475471

• Thermochemical data:

                        E (Thermal)             CV                S
                         KCal/Mol        Cal/Mol-Kelvin    Cal/Mol-Kelvin
    Total                  16.291              5.995             44.947
    Electronic              0.000              0.000              0.000
    Translational           0.889              2.981             34.608
    Rotational              0.889              2.981             10.334
    Vibrational            14.513              0.033              0.004

                          Q            Log10(Q)             Ln(Q)
    Total Bot       0.280164D-02         -2.552587         -5.877549
    Total V=0       0.121603D+09          8.084945         18.616274
    Vib (Bot)       0.230447D-10        -10.637429        -24.493586
    Vib (V=0)       0.100024D+01          0.000103          0.000237
    Electronic      0.100000D+01          0.000000          0.000000
    Translational   0.300432D+07          6.477746         14.915562
    Rotational      0.404665D+02          1.607096          3.700475
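Pulling Numbers Out of the Output: A Sketch

For more than a handful of jobs, the reference energy and the optimization convergence table are easier to extract with a script than by eye. The Python sketch below works on text shaped like the excerpts above. The function names and regular expressions are written for this illustration and are assumptions about the layout, not an official parser, so spot-check the results against the file.

```python
import re

# Matches e.g. "SCF Done:  E(RHF) =  -76.0307499791     A.U. after   10 cycles"
SCF_RE = re.compile(r"SCF Done:\s+E\(([^)]+)\)\s*=\s*(-?\d+\.\d+)")

# Matches one row of the four-row optimization convergence table.
CONV_RE = re.compile(
    r"^\s*(Maximum|RMS)\s+(Force|Displacement)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(YES|NO)\s*$",
    re.MULTILINE,
)


def last_scf_energy(text):
    """Return (method label, energy in a.u.) for the last SCF Done line, or None."""
    matches = SCF_RE.findall(text)
    if not matches:
        return None
    label, value = matches[-1]
    return label, float(value)


def last_convergence_table(text):
    """Return {item: (value, threshold, converged)} for the last four table rows."""
    rows = CONV_RE.findall(text)[-4:]
    return {
        f"{kind} {quantity}": (float(value), float(threshold), flag == "YES")
        for kind, quantity, value, threshold, flag in rows
    }


SAMPLE = """\
 SCF Done:  E(RHF) =  -76.0307499791     A.U. after   10 cycles
         Item               Value     Threshold  Converged?
 Maximum Force            0.000013     0.000015     YES
 RMS     Force            0.000012     0.000010     NO
 Maximum Displacement     0.000029     0.000060     YES
 RMS     Displacement     0.000027     0.000040     YES
"""

if __name__ == "__main__":
    print(last_scf_energy(SAMPLE))
    table = last_convergence_table(SAMPLE)
    for item, (value, threshold, ok) in table.items():
        print(f"{item:22s} {value:.6f} {threshold:.6f} {'YES' if ok else 'NO'}")
    print("fully converged:", all(ok for _, _, ok in table.values()))
```

Taking the last match in each case is deliberate: an optimization prints one SCF energy and one convergence table per step, and only the final ones describe the finished structure. In the sample above the RMS force has not yet met its threshold, so the step shown is not the converged one.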