pbqff

This page contains the complete documentation for pbqff as a single HTML page. For a quick reference see the man directory in the main repository, where you can find a manual page in both Unix man and PDF formats.

Introduction

pbqff is a package (really a collection of packages) for computing anharmonic vibrational and rotational spectroscopic constants for small molecules. It does this through the use of quartic force fields (QFFs) to approximate the molecular potential energy function and second-order vibrational and rotational perturbation theory, which utilizes the derivative information from the function approximation to compute the actual spectroscopic constants.

Because this is a complex task, pbqff is broken into several reusable components, only the final piece of which is included in this repository. Other major components include:

- psqs, which handles writing input files, reading output files, and interacting with the queuing system for the supported quantum chemistry programs
- spectro, which performs the second-order vibrational and rotational perturbation theory (VPT2) calculations on the resulting force constants

pbqff itself is a relatively thin wrapper around these packages that primarily handles tasks like generating the grid of displaced geometries to map out the QFF in various coordinate systems. This fact isn't really important directly, but it may help to explain some of the potential error messages you encounter, if they reference source code in psqs, for example.

Program Input

The primary input for pbqff is a TOML file. TOML as a format is widely used in the Rust community and is pretty easy for humans to write, although you will end up having to type a lot of quotes compared to something like YAML. To start, here's an example input file for a MOPAC QFF on a PBS queue:

Example Input File

geometry = """
O
H 1 OH
H 1 OH 2 HOH

OH = 1.0
HOH = 109.5
"""
optimize = true
charge = 0
step_size = 0.005
sleep_int = 2
job_limit = 2048
chunk_size = 1
coord_type = "norm"
template = "scfcrt=1.D-21 aux(precision=14 comp xp xs xw) PM6 THREADS=1"
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o pts/{{.basename}}.out
#PBS -W umask=022
#PBS -l walltime=100:00:00
#PBS -l cput=100:00:00
#PBS -l ncpus=1
#PBS -l mem=16gb
#PBS -q workq

module load openpbs

export WORKDIR=$PBS_O_WORKDIR
export TMPDIR=/tmp/$USER/$PBS_JOBID
export MOPAC_CMD=/ddnlus/r2518/Packages/mopac/build/mopac
export LD_LIBRARY_PATH=/ddnlus/r2518/Packages/mopac/build
cd $WORKDIR
mkdir -p $TMPDIR
trap 'rm -rf $TMPDIR' EXIT
"""
program = "mopac"
queue = "slurm"
findiff = true
check_int = 0

In addition to the options specified here, there are a few other optional values like hybrid_template for specifying a separate quantum chemistry program template for the cubic and quartic force constants, weights for specifying atomic weights, dummy_atoms for specifying a number of trailing dummy atoms, and the very specialized norm_resume_hff for resuming a normal coordinate QFF from the finished Cartesian harmonic force field (HFF). A summary of all the possible input options is given in the table below, followed by a more detailed description of each option.

Summary of Input Options

Option | Type | Description
------ | ---- | -----------
geometry | String | The initial molecular geometry as XYZ or Z-matrix
optimize | bool | Whether or not to optimize the geometry before running the QFF
charge | isize | The molecular charge
step_size | f64 | The displacement size for the QFF in Å
sleep_int | usize | Interval in seconds to sleep between job polling iterations
job_limit | usize | Maximum number of jobs to submit to the queue at once
chunk_size | usize | Number of jobs to combine into a single queue submission
coord_type | CoordType | Displacement type for the QFF. Supported values are sic, cart, and norm
template | TemplateSrc | Quantum chemistry program template. Can either be a literal String or a map with the key file
hybrid_template | Option<TemplateSrc> | Separate quantum chemistry program template for cubic and quartic force constants
queue_template | Option<TemplateSrc> | Queuing system template
program | Program | A quantum chemistry program name. Supported values are dftb+, mopac, molpro, and cfour
queue | Queue | A queuing system. Supported values are pbs, slurm, and local
findiff | Option<bool> | Whether to use finite differences for derivative evaluations. Not all coordinate types support this
check_int | usize | Interval in polling iterations to write checkpoints. A value of 0 disables checkpoints
weights | Option<Vec<f64>> | An optional sequence of atomic masses for use in normal coordinate QFFs and spectroscopic calculations
dummy_atoms | Option<usize> | An optional number of trailing dummy atoms
norm_resume_hff | Option<bool> | Resume a normal coordinate QFF from a previously-finished HFF checkpoint

Geometry Specification and Optimization

The geometry can be specified as either an XYZ geometry with an optional number of atoms and comment line or as a Z-matrix. Currently, pbqff cannot automatically convert between Z-matrix and XYZ geometries, so optimize = true is required if the geometry is provided as a Z-matrix. This allows pbqff to extract the Cartesian geometry from the quantum chemistry program's output file and use that for the geometrical displacements. For completeness, here is an example XYZ geometry for cyclopropenylidene:

5
c3h2 geometry
C          0.0000000000        0.0000000000       -0.8887390447
C          0.0000000000       -0.6627588746        0.3682596138
C          0.0000000000        0.6627588746        0.3682596138
H          0.0000000000       -1.5952720351        0.9069548903
H          0.0000000000        1.5952720351        0.9069548903

Again, the first two lines are optional, but if either one is present then both must be present. Here is an example Z-matrix geometry for water:

O
H 1 OH
H 1 OH 2 HOH

OH = 1.0
HOH = 109.5

Aside from the caveats above about Z-matrix input, the optimize keyword tells pbqff whether or not to optimize the geometry before building the rest of the QFF. Because the quality of the Taylor series approximation underlying the QFF is tied to the assumption that the input geometry is a minimum on the potential energy surface (PES), any geometries provided with optimize = false should definitely have been optimized prior to the pbqff run and at the same level of theory as the QFF.

Charge

charge is possibly the most straightforward input option as it is simply the molecular charge as a whole number. As we'll see in later sections, this won't necessarily be used if you omit a {{.charge}} entry in your program template, but the option must be provided in the pbqff input file even in those cases.

Step Size

step_size sets the displacement size for the QFF in units of Ångstroms. 0.005 is a reasonable default, with larger values like 0.010 or even 0.020 helping occasionally with floppy systems.

Sleep Interval

pbqff uses a polling model for running jobs in parallel. On each iteration of a big loop that runs until all jobs have finished, pbqff checks each output file for the currently-running jobs to see if any have finished. For very short single-point energy calculations (like PM6 or HF/STO-3G) this makes a lot of sense because many jobs are likely to finish constantly and each iteration will make some progress. On the other hand, for long-running individual jobs (like CCSD(T)-F12/cc-pCVTZ-F12 on a large molecule), pbqff will waste a lot of CPU time constantly checking the same output files that may not have changed meaningfully since the last polling interval. Thus, sleep_int is an interval in seconds for pbqff to sleep between such polling attempts. Higher values should cause pbqff to spend less CPU time reading the same files over and over, but extremely high values could potentially lead to periods of no jobs running because submitting more jobs as old ones finish is also part of the polling loop. In practice, I usually use a minimum of 60 seconds for this and go up to ~600 seconds for what I consider really long jobs. The options in the next section also have some effect on what to choose here.
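
For example, a reasonable setting for fairly cheap single points sits at the low end of that range:

sleep_int = 60

while something closer to sleep_int = 600 suits long-running jobs.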

Job Limit and Chunk Size

These two options, job_limit and chunk_size, are in the same section because they are very closely related. Early versions of pbqff submitted each single-point energy calculation as a separate queue submission, limited by the job_limit setting. However, this was inefficient enough to attract the attention of the HPC administrators, who recommended combining multiple program invocations into a single submission script. This led to the addition of the chunk_size option, which sets the number of invocations per submission script (chunk). Hopefully this is already clear, but below is an illustration of a chunk of MOPAC jobs for a local queue, where "job submission" is handled by a very simple bash script. This would correspond to a chunk_size of 5.

/opt/mopac/mopac job.0000000.mop
/opt/mopac/mopac job.0000001.mop
/opt/mopac/mopac job.0000002.mop
/opt/mopac/mopac job.0000003.mop
/opt/mopac/mopac job.0000004.mop

Because of this legacy, job_limit still corresponds to individual calculations, despite the fact that they are now grouped into chunks. As a result, if you want to restrict the number of jobs as reported by your queue manager you should consider the ratio job_limit / chunk_size rather than job_limit alone. For example, to target a maximum number of queued jobs of 500 with a chunk size of 25, you could set a job_limit of 500 * 25 = 12_500. Note the use of the underscore as a digit separator. This is not required in TOML but can be handy for large numbers.
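
For example, the scenario above corresponds to the following input fragment:

chunk_size = 25
# 500 chunks visible to the queue * 25 single points per chunk
job_limit = 12_500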

The main consideration when choosing a chunk_size is the amount of time spent submitting jobs versus actually running jobs. According to our system administrators, the cutoff (likely pretty conservative) is around 5 minutes of job time. This can lead to pretty crazy chunk sizes, especially for PM6 calculations on small molecules, where single-point energy computations can take on the order of 0.01 seconds (5 minutes = 300 seconds, and 300 seconds ÷ 0.01 seconds per job = 30_000 jobs per chunk!). However, more expensive calculations can easily take five minutes or more each, allowing you to use a chunk size of 1. Individual jobs give pbqff the greatest freedom to queue new jobs as old ones finish, but, again, it's possible to spend more time writing and submitting separate jobs (and taxing the queue system) than actually running calculations.

Of course, the preceding paragraphs all assume that you're running pbqff on a large molecule with many single-point calculations and that you're on some kind of distributed compute cluster. If, on the other hand, you're running pbqff on a personal computer or a single node on a cluster using the local queue option, you will only be able to run a very small number of chunks at once anyway (roughly one per CPU or thread you allowed pbqff to use). In this case, there's no real incentive to shrink your chunk_size to allow more flexible submission. As long as your chunk size is small enough to send one chunk to each core you requested, all of your compute will be saturated already. For example, you wouldn't want to request a chunk size of 250 for a 1000-job QFF and request 8 cores because only 4 cores would get chunks to work on, but there's no real reason to set a chunk size of 25 either. It will just lead to a lot more shell scripts being written. The logic is similar for a small molecule like a diatomic where the total number of points may be smaller than a reasonable chunk size for other calculations.

Rules of Thumb

This table contains my general heuristics for job_limit and chunk_size requests on our compute cluster, where the hard limit on queued jobs is around 1000, but most of the time I can only get ~50 jobs to run at once anyway.

Level of Theory | Time Per Job (s) | Job Limit | Chunk Size
--------------- | ---------------- | --------- | ----------
PM6 (5 atoms) | 0.01 | 1_500_000 | 30_000
F12-DZ (5 atoms) | 1 | 15_000 | 300
TZ-cCR (8 atoms) | 300 | 50 | 1
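
As a concrete example, the F12-DZ row of the table translates to something like the following input fragment; the sleep_int value is my usual choice for jobs of that length rather than part of the table:

sleep_int = 60
job_limit = 15_000
chunk_size = 300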

Coordinate Types

pbqff originally grew out of two separate projects: go-cart, which was my first implementation of finite-difference Cartesian-coordinate QFFs, and pbqff proper, which automated our group's traditional least-squares fitted symmetry-internal coordinate (SIC) QFFs. These two (SICs and Cartesian coordinates) are still supported, along with the newest but already most-used normal-coordinate QFFs. Fully documenting the construction of a symmetry-internal coordinate system is beyond the scope of this manual (at least for now but likely forever) because it involves a lot of practice and manual construction. Thus, the two recommended coordinate systems for pbqff are Cartesian coordinates, requested by coord_type = "cart" and normal coordinates, requested by coord_type = "norm".

Cartesian Coordinates

Cartesian coordinates are the most straightforward coordinate system. The initial geometry is either already in Cartesian (XYZ) form or can be converted from a Z-matrix by the quantum chemistry program. Then, each of these coordinates is displaced in all of the two-, three-, and four-coordinate combinations needed to map out the QFF. It is theoretically possible to approximate the PES with a least-squares fit in Cartesian coordinates, but this is not currently implemented. I tried this in a previous version of pbqff, and it led to significant bottlenecks in the least-squares fit due to the much larger number of Cartesian points compared to SICs without any improvement (and occasional worsening) of the results. In other words, Cartesian coordinates imply findiff = true.
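
Since Cartesian QFFs always use finite differences, the only coordinate-related option you need is the coordinate type itself:

coord_type = "cart"
# findiff has no effect here; Cartesian QFFs always use finite differences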

Normal Coordinates

Normal coordinates are generally a clear improvement over Cartesian coordinates. First, the VPT2 implementation in spectro only requires the semi-diagonal normal coordinate force constants (Φ_iijj and Φ_iiii), but computing these from a Cartesian QFF requires the fully off-diagonal Cartesian force constants like F_ijkl. Similarly, and even more straightforwardly, normal coordinates discard the rotational and translational degrees of freedom present in Cartesian coordinates, lowering the number of coordinates from 3N to 3N − 6 (or 3N − 5 for linear molecules); for the cyclopropenylidene example above, that means 9 coordinates instead of 15. Combined, these factors allow far fewer single-point energies to be evaluated in normal coordinates.

Further, normal coordinates share the convenience of Cartesian coordinates: they can be derived automatically from the input geometry. This may not be clear from this manual because of the limited treatment of SICs, but this automatic generation is the primary contrast between SICs and normal coordinates. SICs also require far fewer points than Cartesians, putting them on more equal footing with normal coordinates in terms of computational cost (though still worse for even medium-sized molecules), but certainly not in terms of convenience.

Of course, the similarity to Cartesian coordinates is no accident. Normal coordinates can only be derived automatically by computing a Cartesian harmonic force field (HFF) to obtain the Hessian matrix; diagonalizing the mass-weighted Hessian then yields the normal coordinates as its eigenvectors. This separation of the HFF and QFF leads to two additional options for normal coordinate QFFs: hybrid_template and norm_resume_hff. These are both discussed more fully in later sections, but in brief, hybrid_template allows you to use the plain template as the template for the HFF points and the hybrid_template for the cubic and quartic points, and norm_resume_hff allows you to resume a normal coordinate QFF from a checkpoint after the initial HFF.

Normal coordinates are also the only coordinate type for which the findiff option actually has any effect. SICs are assumed to be fitted (findiff = false) and Cartesians are assumed to be finite-difference. However, because of the way normal coordinates are constructed (via the initial HFF), fitted normal coordinates are not refit to zero the approximated function gradient as was typical with SICs. Again, this is likely not to make much sense unless you're familiar with the SIC procedure itself, but in short, this means normal coordinates don't really capture the benefits SICs got from performing this fitting, and finite differences are more economical, so stick with finite differences (findiff = true) unless you really know what you're doing.
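
In practice, a typical normal coordinate setup therefore looks like this:

coord_type = "norm"
findiff = true   # recommended; the default is false, which triggers the fitted path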

Finally, normal coordinates are not without their limitations. Their main shortcoming relates to the mass-weighting involved in their definition, which causes the atomic masses to be baked into the resulting force constants in a way that is not easy to reverse. Practically, this means that computing isotopic spectroscopic data with normal coordinate QFFs requires recomputing the QFF for each combination of atomic masses. Second, because they are based on the Cartesian HFF, any numerical issues in the initial HFF can be dramatically compounded in the QFF. This mostly arises with highly symmetrical molecules (symmetric and spherical tops) with many resonances, but it can also pop up for other tricky molecules. It should be pretty obvious when this happens (anharmonic frequencies of 10,000 cm⁻¹ or more), but it's something to keep in the back of your mind.

Symmetry-Internal Coordinates

As mentioned above, fully documenting the SIC code could double the size of this document. All I'll say here is that running an SIC QFF in pbqff requires the top of an INTDER input file (the simple- and symmetry-internal coordinate definitions) in a file named intder.in in the directory where you invoke pbqff. If there's a resurgence of interest in SICs, I'd like to handle this kind of input in a more compact way and without requiring a separate file. Feel free to send me an email or open a GitHub issue if you are interested in this or try using it and encounter any problems.

Quantum Chemistry Program Types

As mentioned in the introduction, pbqff offloads the interface to both chemistry programs and queuing systems to the psqs library. As listed in the psqs README, it currently supports writing input files and reading output files for the following programs:

- MOPAC
- Molpro
- CFOUR
- DFTB+

These are listed roughly in order of support quality. MOPAC and Molpro have been tested the most thoroughly, while CFOUR and DFTB+ have been tested but to a much more limited extent. Each of these can be requested in the pbqff input file using the program keyword, with valid options being the lowercase versions of each program name as a TOML string: "mopac", "molpro", "cfour", "dftb+".

Queue Types

Like the program option above, the queue option depends on the values supported by psqs, which are the following:

- PBS
- Slurm
- local

PBS and Slurm are popular queuing systems on HPC clusters, while the local queue is a way to run pbqff on a single computer. Again, these are listed roughly in order of support. Hundreds of QFFs have been run with pbqff on a PBS cluster, while a much smaller number have been run on a smaller Slurm cluster. The local queue is used in all of the pbqff unit tests, so by that metric it is arguably the best-tested, but it essentially has no error handling, making it suited really only for these tests and other short test calculations rather than "production" QFFs.

These can be selected by their lowercase TOML string forms with the queue keyword: "pbs", "slurm", and "local".

Quantum Chemistry Program Templates

The program template is the main way for you to customize how pbqff runs the single-point energies that compose the QFF. The template syntax is inspired by Go's template library, where patterns like {{.geom}} are replaced by the appropriate information (somewhat obviously the geometry in this example). Currently each supported program has its own set of supported options, but most of them are the same at least.

Most users of pbqff specify both these template values and the queue_template discussed in the next section as a literal TOML string, but you can also specify an external file using the syntax below.

template = { file = "/path/to/template" }

All of the examples in the following sections will work for either plain template input or for the special hybrid_template option. As mentioned in the section on normal coordinates, hybrid_template allows you to run the HFF portion of a normal coordinate QFF with one level of theory and the cubic and quartic portions of the QFF at a second level of theory. As far as I know, this has only been tested to work with Molpro, so see that section below for an example input file.
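
Both template and hybrid_template accept the same TemplateSrc forms, so the file syntax shown above should work for either; the file names below are just placeholders:

template = { file = "tz-ccr.tmpl" }
hybrid_template = { file = "f12-dz.tmpl" }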

MOPAC

MOPAC has the simplest template format because it uses a single line at the start of the file to specify all options, followed by two comment lines and the geometry. As such, pbqff does not handle any template parameter replacements for MOPAC (the curly brace stuff from the paragraphs above). Options like the charge and whether or not to run an optimization are simply appended to the first line of the template, and the geometry for each single-point energy calculation is appended to the file after two arbitrary comment lines. This means the template value in a MOPAC QFF should be a single-line TOML string like those in the examples below.

template = "scfcrt=1.D-21 aux(precision=14 comp xp xs xw) PM6 THREADS=1"
template = """gnorm=0.0 scfcrt=1.D-21 aux(precision=14 comp xp xs xw) \
PM6 SINGLET THREADS=1 external=params.dat"""

As shown in the second example, you can also use the multi-line TOML string syntax (triple quotes) to allow very long templates to be broken across multiple lines with a trailing backslash. Unfortunately, this line continuation is only supported in multi-line strings, not in ordinary single-line TOML strings.

MOPAC defaults to running a geometry optimization, so the input option 1SCF is added to the input line to disable this for the single-point energy computations. As far as I know it's safe to include any geometry-optimization-specific options (like gnorm above) in these single-point calculations, so pbqff doesn't have to do any further processing on the template.

Molpro

For Molpro, pbqff supports two template parameters: {{.geom}} and {{.charge}}, although it's often easier to supply the charge literally in your template. The charge option is quite straightforward as the input charge option is simply substituted directly for {{.charge}} in the template string. The geometry handling, on the other hand, is slightly more complicated. Notice that there is no closing curly brace in the templates below. The reason for this is simply programming convenience, at a slight risk of confusion for the user. For Z-matrix geometries, Molpro expects the main Z-matrix to be inside the curly braces and the variable definitions to follow the closing curly brace. In contrast, XYZ geometries should be fully enclosed in curly braces. Omitting the closing curly brace makes it easier for pbqff to handle this distinction.

Here's a somewhat complicated template example that captures the flexibility of the template input for a TZ-cCR calculation:

template = """memory,8,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;

geometry={
{{.geom}}
basis={
default,cc-pCVTZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10,nocheck;core}
{optg,grms=1.d-8,srms=1.d-8}
etz=energy

basis=cc-pvtz-dk
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edk=energy

basis=cc-pvtz-dk
dkroll=1
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edkr=energy

pbqff=etz(2)+edkr-edk
show[1,f20.12],pbqff"""

In addition to the note about the missing curly brace, also note that pbqff will only read the output energy if it is named pbqff and displayed in the output using the show directive. For simple calculations, this can be achieved using an expression like pbqff=energy, followed by the show line in the example above. However, for F12 calculations in particular, it's important to remember that the F12 energy in Molpro is a vector containing the F12a and F12b energies. The example above uses the F12b energy (etz(2)). This might look unusual if you're a programmer used to indices starting at zero, but Fortran indices actually start at 1, meaning etz(1) is the F12a energy and etz(2) is the F12b energy.

Another important feature of Molpro template handling is how it adds or subtracts geometry optimization keywords depending on the type of calculation requested. There are four possibilities here:

1. optimize = true and the template already contains an optg directive
2. optimize = true but the template does not contain an optg directive
3. optimize = false but the template contains an optg directive
4. optimize = false and the template does not contain an optg directive

In the first and last cases, pbqff doesn't have to make any changes to your template. However, in the second and third cases, pbqff needs to bring the request for an initial geometry optimization into alignment with the provided template. It does this by either adding a default optg directive ({optg,grms=1.d-8,srms=1.d-8}) in the second case, or by deleting a line matching the regular expression (?i)^.*optg(,|\s*$) in the third. If you're not familiar with the intricacies of regular expressions, this looks for a line starting with some optional junk (such as a curly brace), followed by the literal word optg, followed by either a comma or optional whitespace and the end of the line. While writing this, it doesn't seem quite right, but it has worked for a couple of years in practice to identify typical optg lines like {optg,grms=1.d-8,srms=1.d-8} and skip lines like gthresh,optgrad=1.d-8,optstep=1.d-8;, which also contain optg but not in the right way.

In summary, if you're picky about how the optimization should be run, be sure to include a single-line optg directive of your own choosing, and pbqff will remove it as needed for single-point energy calculations. Otherwise you can omit an optg line entirely and pbqff will include a default one as needed too.

As one final example, here's a combination template and hybrid_template input file section for a TZ-cCR+DZ calculation:

template = """memory,1,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;

geometry={
{{.geom}}
basis={
default,cc-pCVTZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10,nocheck;core}
{optg,grms=1.d-8,srms=1.d-8}
etz=energy

basis=cc-pvtz-dk
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edk=energy

basis=cc-pvtz-dk
dkroll=1
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edkr=energy

pbqff=etz(2)+edkr-edk
show[1,f20.12],pbqff"""

hybrid_template = """memory,1,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;

geometry={
{{.geom}}
basis={
default,cc-pVDZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10}
{optg,grms=1.d-8,srms=1.d-8}

pbqff=energy(2)
show[1,f20.12],pbqff
"""

If renamed just to template, the hybrid_template would serve as an F12-DZ template itself.

CFOUR

Like Molpro, CFOUR supports the {{.geom}} and {{.charge}} template parameters. Additionally, pbqff recognizes a {{.keywords}} template parameter, which tells it where to insert options like COORD=CARTESIAN for the single-point energy calculations. One admittedly strange difference for the CFOUR {{.charge}} parameter is that pbqff also adds the CHARGE= prefix expected by CFOUR. With these considerations in mind, an example CFOUR template for a CCSD/cc-pVTZ QFF is shown below.

comment line
{{.geom}}

*CFOUR(CALC=CCSD,BASIS=PVTZ,MEMORY_SIZE=8,MEM_UNIT=GB,REF=RHF,MULT=1
{{.charge}}
{{.keywords}})

Unlike Molpro, there is no special handling of geometry optimization. If you request a geometry optimization, pbqff assumes that you have marked all of the variables in the Z-matrix to optimize with * as expected by CFOUR and will extract the optimized Cartesian geometry from the MOLDEN output file in the same directory as the main CFOUR output.

DFTB+

Again, like Molpro and CFOUR, DFTB+ supports the {{.geom}} and {{.charge}} template parameters. Also like Molpro, DFTB+ requires a rather elaborate geometry optimization input section, so if you don't provide a section matching the regular expression (?i)Driver = GeometryOptimization (basically just that literal text but ignoring case), pbqff will add the default value shown below.

Driver = GeometryOptimization {
	Optimizer = Rational {}
	MovedAtoms = 1:-1
	MaxSteps = 100
	OutputPrefix = "geom.out"
	Convergence {
		Energy   = 1e-8
		GradElem = 1e-7
		GradNorm = 1e-7
		DispElem = 1e-7
		DispNorm = 1e-7
	}
}

And here is a full DFTB+ example template with a custom geometry optimization block that is probably a more sensible default.

template = """Geometry = xyzFormat {
{{.geom}}
}

Hamiltonian = DFTB {
  Scc = Yes
  SlaterKosterFiles = Type2FileNames {
    Prefix = "/home/brent/chem/dftb/matsci/"
    Separator = "-"
    Suffix = ".skf"
  }
  MaxAngularMomentum {
    C = "p"
    H = "s"
  }
  Charge = {{.charge}}
  PolynomialRepulsive = SetForAll { Yes }
}

Options {
}

Analysis {
  CalculateForces = Yes
}

ParserOptions {
  ParserVersion = 12
}

Driver = GeometryOptimization {
  Optimizer = LBFGS { }
  MovedAtoms = 1:-1
  MaxSteps = 500
  OutputPrefix = "geom.out"
  Convergence {
    Energy = 1e-8
    GradElem = 1e-8
    GradNorm = 1e-8
    DispElem = 1e-7
    DispNorm = 1e-7
  }
}
"""

Queue Templates

Just like the plain template keyword allows you to customize the chemistry program input, the queue_template keyword allows you to tell pbqff how to submit your jobs to the three kinds of queues it knows about: PBS, Slurm, and what it calls a "local" queue.

One thing to keep in mind with the queue templates is how the chemistry program is invoked for each input job. The primary means of customizing this is through the PROGRAM_CMD variables. The table below shows the command that pbqff writes to a submission script for each job in a chunk. Don't worry about {filename} here, that's a Rust format string parameter, not a template parameter like we saw in the previous sections. pbqff will automatically replace this with the proper filename.

Program | Template
------- | --------
MOPAC | $MOPAC_CMD {filename}.mop
Molpro | $MOLPRO_CMD {filename}.inp
CFOUR | (cd {filename} && $CFOUR_CMD)
DFTB+ | (cd {filename} && $DFTB_CMD > out)

Note that CFOUR and DFTB+ expect a single filename for each input file (ZMAT and dftb_in.hsd, respectively), so their "filenames" actually correspond to directories. Again, that is effectively an implementation detail of pbqff, not something you should have to worry about.

In addition to using these command variables, all of the queue implementations have access to the template parameters {{.basename}} and {{.filename}}. These both refer to the filename of the submission script itself (usually something like main0.pbs or main0.slurm). basename just strips off any directory information, which makes it a bit neater to use as the job name, as shown in the examples below.

The structure of each section below is the same. First, tables containing the default CMD values for each of the supported programs on that queue are shown, followed by example queue_template values for each program.

PBS

Program | Default Command
------- | ---------------
MOPAC | /ddnlus/r2518/Packages/mopac/build/mopac
Molpro | "molpro -t $NCPUS --no-xml-output"
CFOUR | "/ddnlus/r2518/bin/c4ext_new.sh $NCPUS"
DFTB+ | /ddnlus/r2518/.conda/envs/dftb/bin/dftb+

MOPAC and DFTB+ have simple commands (just paths to executables that don't require quotes), but Molpro and CFOUR require additional arguments separated by spaces.

MOPAC

This example is repeated from the initial input file example. One feature to point out here, which will be shared with later templates, is the use of the trap shell command to run any commands that need to happen after the single-point energy calculations. pbqff just appends each CMD line to the template, so there's no other opportunity to append commands after that. trap allows you to associate an action with a signal received by the shell process, in this case the EXIT pseudo-signal triggered when the shell exits. If you need to run multiple commands you can either separate them with semicolons or &&, or define a simple shell cleanup function and invoke it like this: trap cleanup EXIT.

queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o pts/{{.basename}}.out
#PBS -W umask=022
#PBS -l walltime=100:00:00
#PBS -l cput=100:00:00
#PBS -l ncpus=1
#PBS -l mem=16gb
#PBS -q workq

module load openpbs

export WORKDIR=$PBS_O_WORKDIR
export TMPDIR=/tmp/$USER/$PBS_JOBID
cd $WORKDIR
mkdir -p $TMPDIR
trap 'rm -rf $TMPDIR' EXIT

export LD_LIBRARY_PATH=/ddnlus/r2518/Packages/mopac/build
export MOPAC_CMD=/ddnlus/r2518/Packages/mopac/build/mopac
"""

Note the addition of the LD_LIBRARY_PATH variable along with MOPAC_CMD. This is not a pbqff requirement but something to keep in mind if your command points to a dynamically-linked executable like this version of MOPAC.
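
If your post-run cleanup grows beyond a single rm, the cleanup-function approach mentioned above keeps the trap line readable. Here's a minimal sketch in the same style as the template above; the copy of auxiliary files back to the working directory is purely a hypothetical example of an extra step:

queue_template = """
#!/bin/sh
# ... PBS directives and exports as in the full example above ...

cleanup() {
    # hypothetical: keep any auxiliary files before removing the scratch directory
    cp $TMPDIR/*.aux $WORKDIR
    rm -rf $TMPDIR
}
trap cleanup EXIT
"""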

Molpro
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o {{.basename}}.out
#PBS -W umask=022
#PBS -l walltime=1000:00:00
#PBS -l ncpus=1
#PBS -l mem=8gb
#PBS -q workq

module load openpbs molpro

export WORKDIR=$PBS_O_WORKDIR
export TMPDIR=/tmp/$USER/$PBS_JOBID
cd $WORKDIR
mkdir -p $TMPDIR
trap 'rm -rf $TMPDIR' EXIT

export MOLPRO_CMD=\"molpro -t $NCPUS --no-xml-output\"
"""

One potentially surprising part of this template is the use of basename instead of filename in the #PBS -o line. Molpro PBS jobs are written into a single directory, unlike CFOUR and DFTB+ jobs, but the Molpro module on Maple no longer allows Molpro jobs to be submitted from a separate directory. As a result, pbqff changes to the appropriate directory before submitting each job, leading to this template discrepancy.

CFOUR

This template and the following are pretty much the same, as were the previous two, so there's not much more to say. In general, the top portion of each template just consists of normal PBS directives with optional template parameters, followed by a single command or couple of commands (for MOPAC) to set the variable used by pbqff.

queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o {{.filename}}.out
#PBS -W umask=022
#PBS -l walltime=1000:00:00
#PBS -l ncpus=1
#PBS -l mem=8gb
#PBS -q workq

module load openpbs

export WORKDIR=$PBS_O_WORKDIR
cd $WORKDIR

CFOUR_CMD=\"/ddnlus/r2518/bin/c4ext_new.sh $NCPUS\"
"""
DFTB+
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o {{.filename}}.out
#PBS -W umask=022
#PBS -l walltime=1000:00:00
#PBS -l ncpus=1
#PBS -l mem=8gb
#PBS -q workq

module load openpbs

export WORKDIR=$PBS_O_WORKDIR
cd $WORKDIR

export DFTB_CMD=/ddnlus/r2518/.conda/envs/dftb/bin/dftb+
"""

Slurm

The general approach for Slurm and the local queue is identical to PBS, so you may want to read the previous section even if you're intending to use one of these other queues. Here I will just present some example templates for quick reference.

MOPAC

This template and the Molpro one below are embedded directly from psqs, where the templates for pbqff are defined. psqs doesn't currently define default templates for CFOUR or DFTB+, so you'll have to provide a queue_template of your own for those programs.

#!/bin/bash
#SBATCH --job-name=semp
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o {{.filename}}.out
#SBATCH --no-requeue
#SBATCH --mem=1gb
export LD_LIBRARY_PATH=/home/qc/mopac2016/
export MOPAC_CMD=/home/qc/mopac2016/MOPAC2016.exe
echo $SLURM_JOB_ID
date
hostname

Molpro

#!/bin/bash
#SBATCH --job-name={{.filename}}
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o {{.filename}}.out
#SBATCH --no-requeue
#SBATCH --mem=8gb

MOLPRO_CMD="/home/qc/bin/molpro2020.sh 1 1"

CFOUR

#!/bin/bash
#SBATCH --job-name={{.filename}}
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o {{.filename}}.out
#SBATCH --no-requeue
#SBATCH --mem=8gb

CFOUR_CMD="cfour"

DFTB+

#!/bin/bash
#SBATCH --job-name={{.filename}}
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o {{.filename}}.out
#SBATCH --no-requeue
#SBATCH --mem=8gb

DFTB_CMD="dftb+"

Local

Like for Slurm, pbqff includes only two default templates for the local queue option, which is primarily used for testing. Because the local queue simply "submits" jobs using bash scripts rather than PBS or Slurm submission scripts, a local queue template can be much simpler than a PBS or Slurm template, essentially just defining the path to the executable to be run. Of course, you can add any additional commands you want, but this is all that pbqff requires.

MOPAC

export MOPAC_CMD=/opt/mopac/mopac
export LD_LIBRARY_PATH=/opt/mopac/

DFTB+

DFTB_CMD=/opt/dftb+/dftb+

Molpro

MOLPRO_CMD=molpro

CFOUR

CFOUR_CMD=cfour

Derivative Types

As alluded to in the coordinate type discussion, pbqff supports two types of derivative calculations for the force constants used in the VPT2 calculations. The first, and more traditional, of these involves a least-squares fit of the QFF energies to the Taylor series expansion of the potential energy surface using the ANPASS program. This works well for small molecules and can help to smooth over minor numerical issues because the initial guess at the PES can be refit to the function minimum to yield zero-gradient force constants. On the other hand, for large molecules, assembling the matrix of energies and performing the least-squares solution actually becomes a fairly significant bottleneck in terms of both time and memory requirements. This is compounded by larger coordinate systems like Cartesians.

As such, most of our recent calculations have relied instead on finite difference derivatives, specifically central finite differences. These are easier to compute because the individual energies comprising each force constant can be accumulated directly into their corresponding force constant rather than being kept around for a big calculation at the end.

Again, as described in the normal coordinate section, normal coordinates are the only type for which the findiff option is used. Cartesian coordinates assume findiff = true (they always use finite differences), and SICs assume findiff = false (they always use an ANPASS fitting). Normal coordinates also default to findiff = false, but because of the benefits mentioned above, you will usually want to use them with findiff = true, so this default should probably be updated.

Checkpoints

Checkpoints are configured through the check_int option. As the name somewhat implies, this sets the checkpoint interval in units of polling iterations (see Sleep Interval for more details). A value of 0 disables checkpoints, while a positive value will write a psqs checkpoint file (chk.json) at that interval.

In addition to the chk.json file written by psqs, which contains all of the information needed to recreate the queue of running chemistry jobs, pbqff writes its own checkpoint to the res.chk file. This file contains the rest of the state needed to pick up after psqs resumes from its own checkpoint. Despite the strange file extension, this is also a JSON file that you can inspect if needed, although modifying either of these files could cause some very strange issues unless you are very careful.

To resume a QFF from these two checkpoint files, provide the pbqff binary with the -c or --checkpoint flag.
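
For example, to write checkpoints every 100 polling iterations:

# write chk.json and res.chk every 100 polling iterations
check_int = 100

A later pbqff run started with -c in the same directory will then pick up from those files.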

As described in the normal coordinate section, there is also a configuration option called norm_resume_hff, which allows resuming a normal coordinate QFF from within the initial HFF (that is, before it moves on to the QFF portion). This relies on the same res.chk file as the normal checkpoints, but its contents are very different during the HFF. To trigger this kind of resume, you actually need to omit the -c command line flag and only include this option in the input file itself.

Atomic Masses

The optional weights keyword is used to specify the atomic masses to use for normal coordinate generation and mass-dependent spectroscopic calculations. If this is not provided, the mass of the most abundant isotope from the NIST atomic weight database is used. See below for an example of how to include this in your input file for water. Currently the number of weights must equal the number of atoms. There is no way to specify weights for a subset of atoms.

Example

geometry = """
O
H 1 OH
H 1 OH 2 HOH

OH = 1.0
HOH = 109.5
"""

# one-line version
weights = [15.99491461957, 1.00782503223, 1.00782503223]

# multiline version, useful for larger molecules
weights = [
    15.99491461957,
    1.00782503223,
    1.00782503223,
]

# use one deuterium
weights = [
    15.99491461957,
    2.01410177812, # <- D
    1.00782503223,
]

Dummy Atoms

This is possibly the worst-named pbqff option because it has nothing to do with the kind of dummy atoms typically included in a Z-matrix, for example, which do not require any special treatment in the pbqff input file (they should be erased by the chemistry program before pbqff sees the Cartesian geometry). Instead this is used to specify a number of atoms that should not be displaced in the course of the QFF. This is a somewhat experimental option that I have currently only used in Molpro to compute QFFs for small molecules in the presence of a non-bonded atom. This takes advantage of Molpro's Z-matrix input, which allows including some Cartesian coordinates too. These Cartesian coordinates are frozen in geometry optimizations, allowing for the type of calculation described here. I consider it unlikely that this option will be useful or even usable in other packages, but it exists if you need it.
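
In the input file, the option itself is a single line; which atoms it freezes is determined positionally, counting back from the end of the geometry:

# the last atom in the geometry is a non-bonded partner that should not be displaced
dummy_atoms = 1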

Resuming Normal Coordinate QFFs

Finally, the norm_resume_hff option allows you to resume a (presumably failed) normal coordinate QFF from the initial HFF stage. This was a much-requested feature by users of pbqff, but the separation of normal coordinate QFFs into the HFF and QFF stages made it difficult to integrate with the existing checkpoint mechanism, leading to this separate option. Indeed, it is fully separate from the command line --checkpoint option, which must be omitted when using norm_resume_hff.
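
In other words, a resume after a failed HFF uses essentially the original input file with one extra line, and the job is relaunched without -c:

coord_type = "norm"
norm_resume_hff = true   # pick up from the res.chk written during the Cartesian HFF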

Command Line Options

In addition to the settings in the input file described in the previous section, pbqff takes several command line options. To some extent, the most important of these is the name of the input file, but even this has a default value of pbqff.toml, allowing it to be omitted if you use that as your input file name. The other options are summarized in the table below.

Summary of command line options
Option | Type | Description
------ | ---- | -----------
-c/--checkpoint | bool | Resume from the checkpoint files in the current directory (chk.json and res.chk). Defaults to false.
-o/--overwrite | bool | Overwrite existing output from a previous run. Defaults to false.
-v/--version | bool | Print the git commit hash and exit. Defaults to false.
-t/--threads | usize | Set the maximum number of threads to use. Defaults to 0, which means to use as many threads as there are CPUs.
-j/--json | bool | Serialize the input file to JSON and exit. For use by qffbuddy.
-n/--no_del | bool | Don't delete any files when running the single-point energies. Defaults to false.

As indicated by the slash, each of these options can either be provided as a short option with a single dash and its first letter or as a long option with two dashes and the full name.

Of these options, -t/--threads is probably the most important on a regular basis since it controls the number of threads spawned by pbqff. If you're running pbqff on a local computer, you may want to limit this to ~half of the CPUs on the machine to allow other programs to run smoothly. Otherwise, if you're running pbqff on some kind of HPC cluster, you'll likely want to set this to the number of CPUs you request in your job submission script.

The -o/--overwrite option is also commonly used, especially if you make a typo in your first submission of a pbqff job. By default, pbqff will refuse to overwrite any output files in the current directory, even if they just contain an error message. Passing the -o flag will bypass this check and allow the run to continue.

The -c/--checkpoint flag can be used to resume from a previously-written checkpoint file. See the Checkpoints section above for more details.

Finally, the -n/--no_del option prevents pbqff from cleaning up the (sometimes numerous) single-point energy input and output files as it runs. This is primarily useful if you're debugging some kind of issue in pbqff and need to see all of the output files at the end of a run. This is not the default because large QFFs produce a very large quantity of very similar files that HPC administrators don't always like to keep around.

As noted in the table, the -j/--json flag is intended primarily for use by qffbuddy and is hidden from pbqff's built-in -h/--help output. However, it is documented here in case it proves useful for other interfaces to pbqff.

qffbuddy

You may know qffbuddy as a Python GUI for writing and even running pbqff jobs. This section contains a reimplementation of qffbuddy as an HTML form. While you can't use it to run pbqff directly, submitting the form will copy the generated TOML file to your clipboard for pasting into an actual TOML file.

[Interactive form omitted; its visible fields include the coordinate type, chemistry program, and queuing system.]