This page contains the complete documentation for pbqff as a single HTML
page. For a quick reference see
the man
directory in the main repository, where you can find a manual page in both
Unix man
and PDF formats.
pbqff is a package (really a collection of packages) for computing anharmonic vibrational and rotational spectroscopic constants for small molecules. It does this by using quartic force fields (QFFs) to approximate the molecular potential energy function and second-order vibrational and rotational perturbation theory (VPT2), which uses the derivative information from that approximation to compute the actual spectroscopic constants.
Because this is a complex task, pbqff is broken into several reusable components, only the final piece of which is included in this repository. Other major components include:
- psqs, which handles the interfaces to the queuing systems and quantum chemistry programs
- a Molecule data structure and symmetry operations
- spectro, which provides the VPT2 implementation used to compute the final spectroscopic constants
pbqff itself is a relatively thin wrapper around these packages that
primarily handles tasks like generating the grid of displaced geometries
to map out the QFF in various coordinate systems. This fact isn't really
important directly, but it may help to explain some of the potential error
messages you encounter, if they reference source code
in psqs
, for example.
The primary input for pbqff is a TOML file. TOML as a format is widely used in the Rust community and is pretty easy for humans to write, although you will end up having to type a lot of quotes compared to something like YAML. To start, here's an example input file for a MOPAC QFF on a PBS queue:
geometry = """
O
H 1 OH
H 1 OH 2 HOH
OH = 1.0
HOH = 109.5
"""
optimize = true
charge = 0
step_size = 0.005
sleep_int = 2
job_limit = 2048
chunk_size = 1
coord_type = "norm"
template = "scfcrt=1.D-21 aux(precision=14 comp xp xs xw) PM6 THREADS=1"
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o pts/{{.basename}}.out
#PBS -W umask=022
#PBS -l walltime=100:00:00
#PBS -l cput=100:00:00
#PBS -l ncpus=1
#PBS -l mem=16gb
#PBS -q workq
module load openpbs
export WORKDIR=$PBS_O_WORKDIR
export TMPDIR=/tmp/$USER/$PBS_JOBID
export MOPAC_CMD=/ddnlus/r2518/Packages/mopac/build/mopac
export LD_LIBRARY_PATH=/ddnlus/r2518/Packages/mopac/build
cd $WORKDIR
mkdir -p $TMPDIR
trap 'rm -rf $TMPDIR' EXIT
"""
program = "mopac"
queue = "slurm"
findiff = true
check_int = 0
In addition to the options specified here, there are a few other optional
values like hybrid_template
for specifying a separate quantum
chemistry program template for the cubic and quartic force
constants, weights
for specifying atomic
weights, dummy_atoms
for specifying a number of trailing
dummy atoms, and the very specialized norm_resume_hff
for
resuming a normal coordinate QFF from the finished Cartesian harmonic
force field (HFF). A summary of all the possible input options is given in
the table below, followed by a more detailed description of each option.
Option | Type | Description |
---|---|---|
geometry | String | The initial molecular geometry as XYZ or Z-matrix |
optimize | bool | Whether or not to optimize the geometry before running the QFF |
charge | isize | The molecular charge |
step_size | f64 | The displacement size for the QFF in Å |
sleep_int | usize | Interval in seconds to sleep between job polling iterations |
job_limit | usize | Maximum number of jobs to submit to the queue at once |
chunk_size | usize | Number of jobs to combine into a single queue submission |
coord_type | CoordType | Displacement type for the QFF. Supported values are sic, cart, and norm |
template | TemplateSrc | Quantum chemistry program template. Can either be a literal String or a map with the key file |
hybrid_template | Option<TemplateSrc> | Separate quantum chemistry program template for cubic and quartic force constants |
queue_template | Option<TemplateSrc> | Queuing system template |
program | Program | A quantum chemistry program name. Supported values are dftb+, mopac, molpro, and cfour |
queue | Queue | A queuing system. Supported values are pbs, slurm, and local |
findiff | Option<bool> | Whether to use finite differences for derivative evaluations. Not all coordinate types support this |
check_int | usize | Interval in polling iterations to write checkpoints. A value of 0 disables checkpoints |
weights | Option<Vec<f64>> | An optional sequence of atomic masses for use in normal coordinate QFFs and spectroscopic calculations |
dummy_atoms | Option<usize> | An optional number of trailing dummy atoms |
norm_resume_hff | Option<bool> | Resume a normal coordinate QFF from a previously-finished HFF checkpoint |
The geometry
can be specified as either an XYZ geometry with
an optional number of atoms and comment line or as a Z-matrix. Currently,
pbqff cannot automatically convert between Z-matrix and XYZ geometries,
so optimize = true
is required if the geometry is provided as
a Z-matrix. This allows pbqff to extract the Cartesian geometry from the
quantum chemistry program's output file and use that for the geometrical
displacements. For completeness, here is an example XYZ geometry for
cyclopropenylidene:
5
c3h2 geometry
C 0.0000000000 0.0000000000 -0.8887390447
C 0.0000000000 -0.6627588746 0.3682596138
C 0.0000000000 0.6627588746 0.3682596138
H 0.0000000000 -1.5952720351 0.9069548903
H 0.0000000000 1.5952720351 0.9069548903
Again, the first two lines are optional, but if either one is present then both must be present. And an example Z-matrix geometry for water:
O
H 1 OH
H 1 OH 2 HOH
OH = 1.0
HOH = 109.5
Aside from the caveats above about Z-matrix input,
the optimize
keyword tells pbqff whether or not to optimize
the geometry before building the rest of the QFF. Because the quality of
the Taylor series approximation underlying the QFF is tied to the
assumption that the input geometry is a minimum on the potential energy
surface (PES), any geometries provided with optimize = false
should definitely have been optimized prior to the pbqff run and at the
same level of theory as the QFF.
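For example, reusing the cyclopropenylidene geometry from above (and assuming those coordinates were already optimized at the QFF's level of theory), the relevant input excerpt would be:
geometry = """
5
c3h2 geometry
C 0.0000000000 0.0000000000 -0.8887390447
C 0.0000000000 -0.6627588746 0.3682596138
C 0.0000000000 0.6627588746 0.3682596138
H 0.0000000000 -1.5952720351 0.9069548903
H 0.0000000000 1.5952720351 0.9069548903
"""
optimize = false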
charge
is possibly the most straightforward input option as
it is simply the molecular charge as a whole number. As we'll see in later
sections, this won't necessarily be used if you omit
a {{.charge}}
entry in your program template, but the
option must be provided in the pbqff input file even in those cases.
step_size
sets the displacement size for the QFF in units of
Ångstroms. 0.005 is a reasonable default, with larger values like 0.010 or
even 0.020 helping occasionally with floppy systems.
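For example, to use a slightly larger displacement for a floppy system:
step_size = 0.010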
pbqff uses a polling model for running jobs in parallel. On each iteration
of a big loop that runs until all jobs have finished, pbqff checks each
output file for the currently-running jobs to see if any have finished.
For very short single-point energy calculations (like PM6 or HF/STO-3G)
this makes a lot of sense because many jobs are likely to finish between
polls and each iteration will make some progress. On the other hand,
for long-running individual jobs (like CCSD(T)-F12/cc-pCVTZ-F12 on a large
molecule), pbqff will waste a lot of CPU time constantly checking the same
output files that may not have changed meaningfully since the last polling
interval. Thus, sleep_int
is an interval in seconds for pbqff
to sleep between such polling attempts. Higher values should cause pbqff
to spend less CPU time reading the same files over and over, but extremely
high values could potentially lead to periods of no jobs running because
submitting more jobs as old ones finish is also part of the polling loop.
In practice, I usually use a minimum of 60 seconds for this and go up to
~600 seconds for what I consider really long jobs. The options in the next
section also have some effect on what to choose here.
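As a concrete starting point based on the values above:
sleep_int = 60  # raise toward ~600 for very long-running jobs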
These two options, job_limit and chunk_size, are in the same section because
they are very closely related. Early versions
of pbqff submitted each single-point energy calculation as a separate
queue submission limited by the job_limit
setting. However,
this was inefficient enough to attract the attention of the HPC
administrators who recommended combining multiple program invocations into
a single submission script. This led to the addition of
the chunk_size
option, which sets the number of invocations
per submission script (chunk). Hopefully this is already clear, but below
is an illustration of a chunk of MOPAC jobs for a local queue, where "job
submission" is handled by a very simple bash script. This would correspond
to a chunk_size
of 5.
/opt/mopac/mopac job.0000000.mop
/opt/mopac/mopac job.0000001.mop
/opt/mopac/mopac job.0000002.mop
/opt/mopac/mopac job.0000003.mop
/opt/mopac/mopac job.0000004.mop
Because of this legacy, job_limit
still corresponds to
individual calculations, despite the fact that they are now grouped into
chunks. As a result, if you want to restrict the number of jobs as
reported by your queue manager you should consider the
ratio job_limit / chunk_size
rather
than job_limit
alone. For example, to target a maximum number
of queued jobs of 500 with a chunk size of 25, you could set
a job_limit
of 500 * 25 = 12_500
. Note the use
of the underscore as a digit separator. This is not required in TOML but
can be handy for large numbers.
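In input-file form, that example looks like:
# allow up to 500 queued chunks of 25 single-point jobs each
job_limit = 12_500
chunk_size = 25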
The main consideration when choosing a chunk_size
is the
amount of time spent submitting jobs versus actually running jobs.
According to our system administrators, the cutoff (likely pretty
conservative) is around 5 minutes of job time. This can lead to pretty
crazy chunk sizes, especially for PM6 calculations on small molecules,
where single-point energy computations can take on the order of 0.01
seconds (5 minutes = 300 seconds, and at 1 job per 0.01 seconds that works
out to 30_000 jobs per chunk!). However, more expensive calculations can easily
take five minutes or more each, allowing you to use a chunk size of 1.
Individual jobs give pbqff the greatest freedom to queue new jobs as old
ones finish, but, again, it's possible to spend more time writing and
submitting separate jobs (and taxing the queue system) than actually
running calculations.
Of course, the preceding paragraphs all assume that you're running pbqff
on a large molecule with many single-point calculations and that
you're on some kind of distributed compute cluster. If, on the other hand,
you're running pbqff on a personal computer or a single node on a cluster
using the local queue option, you will only be able to run a very small
number of chunks at once anyway (roughly one per CPU or thread you allowed
pbqff to use). In this case, there's no real incentive to shrink
your chunk_size
to allow more flexible submission. As long as
your chunk size is small enough to send one chunk to each core you
requested, all of your compute will be saturated already. For example, you
wouldn't want to request a chunk size of 250 for a 1000-job QFF and
request 8 cores because only 4 cores would get chunks to work on, but
there's no real reason to set a chunk size of 25 either. It will just lead
to a lot more shell scripts being written. The logic is similar for a
small molecule like a diatomic where the total number of points may be
smaller than a reasonable chunk size for other calculations.
This table contains my general heuristics for job_limit
and chunk_size
requests on our compute cluster, where the
hard limit on queued jobs is around 1000, but most of the time I can only
get ~50 jobs to run at once anyway.
Level of Theory | Time Per Job (s) | Job Limit | Chunk Size |
---|---|---|---|
PM6 (5 atoms) | 0.01 | 1_500_000 | 30_000 |
F12-DZ (5 atoms) | 1 | 15_000 | 300 |
TZ-cCR (8 atoms) | 300 | 50 | 1 |
pbqff originally grew out of two separate projects: go-cart
,
which was my first implementation of finite-difference
Cartesian-coordinate QFFs, and pbqff proper, which automated our group's
traditional least-squares fitted symmetry-internal coordinate (SIC) QFFs.
These two (SICs and Cartesian coordinates) are still supported, along with
the newest but already most-used normal-coordinate QFFs. Fully documenting
the construction of a symmetry-internal coordinate system is beyond the
scope of this manual (at least for now but likely forever) because it
involves a lot of practice and manual construction. Thus, the two
recommended coordinate systems for pbqff are Cartesian coordinates,
requested by coord_type = "cart"
and normal coordinates,
requested by coord_type = "norm"
.
Cartesian coordinates are the most straightforward coordinate system. The
initial geometry is either already in Cartesian (XYZ) form or can be
converted from a Z-matrix by the quantum chemistry program. Then, each of
these coordinates is displaced in all of the two-, three-, and
four-coordinate combinations needed to map out the QFF. It is
theoretically possible to approximate the PES with a least-squares fit in
Cartesian coordinates, but this is not currently implemented. I tried this
in a previous version of pbqff, and it led to significant bottlenecks in
the least-squares fit due to the much larger number of Cartesian points
compared to SICs without any improvement (and occasional worsening) of the
results. In other words, Cartesian coordinates imply findiff =
true
.
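Requesting a Cartesian QFF therefore only requires the coordinate type itself; the finite-difference behavior follows automatically:
coord_type = "cart"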
Normal coordinates are generally a clear improvement over Cartesian
coordinates. First, the VPT2 implementation in spectro only requires the
semi-diagonal normal coordinate force constants (Φ_iijj and
Φ_iiii), but computing these requires the fully off-diagonal Cartesian
coordinate force constants like F_ijkl. Similarly,
and even more straightforwardly, normal coordinates discard the rotational
and translational degrees of freedom present in Cartesian coordinates,
lowering the number of coordinates from 3N to 3N − 6
. Combined, these factors allow for far fewer single-point
energies to be evaluated in normal coordinates.
Further, normal coordinates share the main convenience of Cartesian coordinates: they can be derived automatically from the input geometry. This may not be clear from this manual because of the limited treatment of SICs, but convenience is the primary contrast between SICs and normal coordinates: SICs also require far fewer points than Cartesians, putting them on more equal footing with normal coordinates in terms of computational cost (though still worse for even medium-sized molecules), but certainly not in terms of convenience.
Of course, the similarity to Cartesian coordinates is no accident. Normal
coordinates can only be derived automatically by computing a Cartesian
harmonic force field (HFF) to obtain the Hessian matrix; the eigenvectors of
the mass-weighted Hessian then yield the normal
coordinates. This separation of the HFF and QFF leads to two additional
options for normal coordinate QFFs: hybrid_template
and norm_resume_hff
. These are both discussed more fully in
later sections, but in brief, hybrid_template
allows you to
use the plain template
as the template for the HFF points and
the hybrid_template
for the cubic and quartic points;
and norm_resume_hff
allows you to resume a normal coordinate
QFF from a checkpoint after the initial HFF.
Normal coordinates are also the only coordinate type for which
the findiff
option actually has any effect. SICs are assumed
to be fitted (findiff = false
) and Cartesians are assumed to
be finite-difference. However, because of the way normal coordinates are
constructed (via the initial HFF), fitted normal coordinates are not refit
to zero the approximated function gradient as was typical with SICs.
Again, this is likely not to make much sense unless you're familiar with
the SIC procedure itself, but in short, this means normal coordinates
don't really capture the benefits SICs got from performing this fitting,
and finite differences are more economical, so stick with finite
differences (findiff = true
) unless you really know what
you're doing.
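In other words, a typical normal coordinate request looks like:
coord_type = "norm"
findiff = true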
Finally, normal coordinates are not without their limitations. Their main shortcoming relates to the mass-weighting involved in their definition, which causes the atomic masses to be baked into the resulting force constants in a way that is not easy to reverse. Practically, this means that computing isotopic spectroscopic data with normal coordinate QFFs requires recomputing the QFF for each combination of atomic masses. Second, because they are based on the Cartesian HFF, any numerical issues in the initial HFF can be dramatically compounded in the QFF. This mostly arises with highly symmetrical molecules (symmetric and spherical tops) with many resonances, but it can also pop up for other tricky molecules. It should be pretty obvious when this happens (anharmonic frequencies of 10,000 cm−1 or more), but it's something to keep in the back of your mind.
As mentioned above, fully documenting the SIC code could double the size
of this document. All I'll say here is that running an SIC QFF in pbqff
requires the top of an INTDER input file (the simple- and
symmetry-internal coordinate definitions) in a file
named intder.in
in the directory where you invoke pbqff. If
there's a resurgence of interest in SICs, I'd like to handle this kind of
input in a more compact way and without requiring a separate file. Feel
free to send me an email or open a GitHub issue if you are interested in
this or try using it and encounter any problems.
As mentioned in the introduction, pbqff offloads the interface to both chemistry programs and queuing systems to the psqs library. As listed in the psqs README, it currently supports writing input files and reading output files for the following programs:
- MOPAC
- Molpro
- CFOUR
- DFTB+
These are listed roughly in order of support quality. MOPAC and
Molpro have been tested the most thoroughly, while CFOUR and DFTB+ have
been tested but to a much more limited extent. Each of these can be
requested in the pbqff input file using the program
keyword
with valid options being the lowercase versions of each program
name as a TOML
string: "mopac"
, "molpro"
, "cfour"
, "dftb+"
.
Like the program
option above, the queue
option
depends on the values supported by psqs, which are the following:
- PBS
- Slurm
- local
PBS and Slurm are popular queuing systems on HPC clusters, while the local queue is a way to run pbqff on a single computer. Again, these are listed roughly in order of support. Hundreds of QFFs have been run with pbqff on a PBS cluster, while a much smaller number have been run on a smaller Slurm cluster. The local queue is used in all of the pbqff unit tests, so by that metric it is arguably the best-tested, but it essentially has no error handling, making it really only suited for these tests and other short test calculations rather than "production" QFFs.
These can be selected by their lowercase TOML string forms with
the queue
keyword: "pbs"
, "slurm"
,
and "local"
.
The program template
is the main way for you to customize how
pbqff runs the single-point energies that compose the QFF. The template
syntax is inspired by
Go's template library,
where patterns like {{.geom}}
are replaced by the
appropriate information (somewhat obviously the geometry in this example).
Currently each supported program has its own set of supported template
parameters, but most of them are shared between programs.
Most users of pbqff specify both these template
values and
the queue_template
discussed in the next section as a literal
TOML string, but you can also specify an external file using the syntax
below.
template = { file = "/path/to/template" }
All of the examples in the following sections will work for either
plain template
input or for the
special hybrid_template
option. As mentioned in the section
on normal
coordinates, hybrid_template
allows you to run the HFF
portion of a normal coordinate QFF with one level of theory and the cubic
and quartic portions of the QFF at a second level of theory. As far as I
know, this has only been tested to work with Molpro,
so see that section below for an example input file.
MOPAC has the simplest template
format because MOPAC uses a
single line at the start of the file to specify all options, followed by
two comment lines and the geometry. As such, pbqff does not handle any
template parameter replacements for MOPAC (the curly brace stuff from the
paragraphs above). Options like the charge and whether or not to run an
optimization are simply appended to the first line of the template, and
the geometry for each single-point energy calculation is appended to the
file after two arbitrary comment lines. As such, the template
value in a MOPAC QFF should be a single-line TOML string like those in the
examples below.
template = "scfcrt=1.D-21 aux(precision=14 comp xp xs xw) PM6 THREADS=1"
template = """gnorm=0.0 scfcrt=1.D-21 aux(precision=14 comp xp xs xw) \
PM6 SINGLET THREADS=1 external=params.dat"""
As shown in the second example, you can also use the multi-line TOML string syntax (triple quotes) to allow very long templates to be broken across multiple lines with a backslash. Unfortunately, line continuations like this are not supported in single-line TOML strings (a single pair of quotes).
MOPAC defaults to running a geometry optimization, so the input
option 1SCF
is added to the input line to disable this for
the single-point energy computations. As far as I know it's safe to
include any geometry-optimization-specific options
(like gnorm
above) in these single-point calculations, so
pbqff doesn't have to do any further processing on the template.
For Molpro, pbqff supports two template
parameters: {{.geom}}
and {{.charge}}
,
although it's often easier to supply the charge literally in
your template
. The charge
option is quite
straightforward as the input charge
option is simply
substituted directly for {{.charge}}
in the template
string. The geometry handling, on the other hand, is slightly more
complicated. Notice that there is no closing curly brace in the templates
below. The reason for this is simply programming convenience, at a slight
risk of confusion for the user. For Z-matrix geometries, Molpro expects
the main Z-matrix to be inside the curly braces and the variable
definitions to follow the closing curly brace. In contrast, XYZ geometries
should be fully enclosed in curly braces. Omitting the closing curly brace
makes it easier for pbqff to handle this distinction.
Here's a somewhat complicated template example that captures the flexibility of the template input for a TZ-cCR calculation:
template = """memory,8,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;
geometry={
{{.geom}}
basis={
default,cc-pCVTZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10,nocheck;core}
{optg,grms=1.d-8,srms=1.d-8}
etz=energy
basis=cc-pvtz-dk
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edk=energy
basis=cc-pvtz-dk
dkroll=1
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edkr=energy
pbqff=etz(2)+edkr-edk
show[1,f20.12],pbqff"""
In addition to the note about the missing curly brace, also note that
pbqff will only read the output energy if it is named pbqff
and displayed in the output using the show
directive. For
simple calculations, this can be achieved using an expression
like pbqff=energy
, followed by the show
line in
the example above. However, for F12 calculations in particular, it's
important to remember that the F12 energy in Molpro is a vector, with the
F12a and F12b energies in it. The example above uses the F12b energy
(etz(2)
). This might look unusual if you're a programmer used
to indices starting at zero, but Fortran indices actually start at 1,
meaning etz(1)
is the F12a energy and etz(2)
is
the F12b energy.
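For reference, here is a minimal sketch of a template following the same pbqff/show convention for a plain single-energy calculation; the basis and correlation method lines are only placeholders, not a recommendation:
template = """geometry={
{{.geom}}
basis=cc-pvtz
hf,accuracy=16,energy=1.0d-10
ccsd(t)
pbqff=energy
show[1,f20.12],pbqff"""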
Another important feature of Molpro template handling is how it adds or removes geometry optimization keywords depending on the type of calculation requested. There are four possibilities here:
1. optimize = true with an optg directive included in the template
2. optimize = true with no optg directive included
3. optimize = false with an optg directive included
4. optimize = false with no optg directive included
In the first and last cases, pbqff doesn't have to make any changes to
your template. However, in the second and third cases, pbqff needs to
bring the request for an initial geometry optimization into alignment with
the provided template. It does this by either adding a
default optg
directive
({optg,grms=1.d-8,srms=1.d-8}
) in the second case, or by
deleting a line matching the regular
expression (?i)^.*optg(,|\s*$)
. If you're not familiar with
the intricacies of regular expressions, this looks for a line starting
with some optional junk (such as a curly brace), followed by the literal
word optg
, followed by either a comma or some number of
spaces and a newline. While writing this, it doesn't seem quite right, but
it has worked for a couple of years in practice to identify
typical optg
lines
like {optg,grms=1.d-8,srms=1.d-8}
and skip lines
like gthresh,optgrad=1.d-8,optstep=1.d-8;
, which also
includes optg
but not in the right way.
In summary, if you're picky about how the optimization should be run, be
sure to include a single-line optg
directive of your own
choosing, and pbqff will remove it as needed for single-point energy
calculations. Otherwise you can omit an optg
line entirely
and pbqff will include a default one as needed too.
As one final example, here's a combination template
and hybrid_template
input file section for a TZ-cCR+DZ
calculation:
template = """memory,1,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;
geometry={
{{.geom}}
basis={
default,cc-pCVTZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10,nocheck;core}
{optg,grms=1.d-8,srms=1.d-8}
etz=energy
basis=cc-pvtz-dk
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edk=energy
basis=cc-pvtz-dk
dkroll=1
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edkr=energy
pbqff=etz(2)+edkr-edk
show[1,f20.12],pbqff"""
hybrid_template = """memory,1,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;
geometry={
{{.geom}}
basis={
default,cc-pVDZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10}
{optg,grms=1.d-8,srms=1.d-8}
pbqff=energy(2)
show[1,f20.12],pbqff
"""
If renamed just to template
, the hybrid_template
would serve as an F12-DZ template itself.
Like with Molpro, pbqff also supports the {{.geom}}
and {{.charge}}
keywords for CFOUR. Additionally, pbqff
recognizes a {{.keywords}}
template parameter, which tells it where to
insert options like COORD=CARTESIAN
for
the single-point energy calculations. One, admittedly strange, difference
with the CFOUR {{.charge}}
parameter is that it also adds
the prefix CHARGE=
as expected by CFOUR. With these
considerations in mind, an example CFOUR template for a CCSD/cc-pVTZ QFF
is shown below.
comment line
{{.geom}}
*CFOUR(CALC=CCSD,BASIS=PVTZ,MEMORY_SIZE=8,MEM_UNIT=GB,REF=RHF,MULT=1
{{.charge}}
{{.keywords}})
Unlike Molpro, there is no special handling of geometry optimization. If
you request a geometry optimization, pbqff assumes that you have marked
all of the variables in the Z-matrix to optimize with *
as
expected by CFOUR and will extract the optimized Cartesian geometry from
the MOLDEN output file in the same directory as the main CFOUR output.
Again, like Molpro and CFOUR, DFTB+ supports
the {{.geom}}
and {{.charge}}
template
parameters. Also like Molpro, DFTB+ requires a rather elaborate geometry
optimization input section, so if you don't provide a section matching the
regular expression (?i)Driver = GeometryOptimization
(basically just that literal text but ignoring case), pbqff will add the
default value shown below.
Driver = GeometryOptimization {
Optimizer = Rational {}
MovedAtoms = 1:-1
MaxSteps = 100
OutputPrefix = "geom.out"
Convergence {
Energy = 1e-8
GradElem = 1e-7
GradNorm = 1e-7
DispElem = 1e-7
DispNorm = 1e-7
}
}
And here is a full DFTB+ example template with a custom geometry optimization block that is probably a more sensible default.
template = """Geometry = xyzFormat {
{{.geom}}
}
Hamiltonian = DFTB {
Scc = Yes
SlaterKosterFiles = Type2FileNames {
Prefix = "/home/brent/chem/dftb/matsci/"
Separator = "-"
Suffix = ".skf"
}
MaxAngularMomentum {
C = "p"
H = "s"
}
Charge = {{.charge}}
PolynomialRepulsive = SetForAll { Yes }
}
Options {
}
Analysis {
CalculateForces = Yes
}
ParserOptions {
ParserVersion = 12
}
Driver = GeometryOptimization {
Optimizer = LBFGS { }
MovedAtoms = 1:-1
MaxSteps = 500
OutputPrefix = "geom.out"
Convergence {
Energy = 1e-8
GradElem = 1e-8
GradNorm = 1e-8
DispElem = 1e-7
DispNorm = 1e-7
}
}
"""
Just like the plain template
keyword allows you to customize
the chemistry program input, the queue_template
keyword
allows you to tell pbqff how to submit your jobs to the three kinds of
queues it knows about: PBS, Slurm, and what it calls a "local" queue.
One thing to keep in mind with the queue templates is how the chemistry
program is invoked for each input job. The primary means of customizing
this is through the PROGRAM_CMD
variables. The table below
shows the command that pbqff writes to a submission script for each job in
a chunk. Don't worry about {filename}
here, that's a Rust
format string parameter, not a template parameter like we saw in the
previous sections. pbqff will automatically replace this with the proper
filename.
Program | Template |
---|---|
MOPAC | $MOPAC_CMD {filename}.mop |
Molpro | $MOLPRO_CMD {filename}.inp |
CFOUR | (cd {filename} && $CFOUR_CMD) |
DFTB+ | (cd {filename} && $DFTB_CMD > out) |
Note that CFOUR and DFTB+ expect a single filename for each input file (ZMAT and dftb_in.hsd, respectively), so their "filenames" actually correspond to directories. Again, that is effectively an implementation detail of pbqff, not something you should have to worry about.
In addition to using these command variables, all of the queue
implementations have access to the template
parameters {{.basename}}
and {{.filename}}
. These both refer to the filename of
the submission script itself (usually something
like main0.pbs
or main0.slurm
). basename
just strips off any
directory information, which makes it a bit neater to use as the job name,
as shown in the examples below.
The structure of each section below is the same. First, tables containing
the default CMD
values for each of the supported programs on
that queue are shown, followed by example queue_template
values for each program.
Program | Default Command |
---|---|
MOPAC | /ddnlus/r2518/Packages/mopac/build/mopac |
Molpro | "molpro -t $NCPUS --no-xml-output" |
CFOUR | "/ddnlus/r2518/bin/c4ext_new.sh $NCPUS" |
DFTB+ | /ddnlus/r2518/.conda/envs/dftb/bin/dftb+ |
MOPAC and DFTB+ have simple commands (just paths to executables that don't require quotes), but Molpro and CFOUR require additional arguments separated by spaces, hence the surrounding quotes.
This example is repeated from the initial input file example. One feature
to point out here that will be shared with later templates is the use of
the trap
shell command to run any commands that need to be
run after the single-point energy calculations. pbqff just appends
each CMD
line to the template, so there's no opportunity
to append your own commands after that. trap
allows you to associate
an action with a signal received by the shell process, in this case the
EXIT signal sent when the process exits. If you need to run multiple
commands you can either separate them with semicolons or &&
, or
define a simple shell cleanup
function and invoke it like
this: trap cleanup EXIT
.
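For example, the tail end of a queue template might contain a sketch like the following; fill the body with whatever post-run commands you actually need:
cleanup() {
    rm -rf $TMPDIR
    # any other post-run commands go here
}
trap cleanup EXIT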
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o pts/{{.basename}}.out
#PBS -W umask=022
#PBS -l walltime=100:00:00
#PBS -l cput=100:00:00
#PBS -l ncpus=1
#PBS -l mem=16gb
#PBS -q workq
module load openpbs
export WORKDIR=$PBS_O_WORKDIR
export TMPDIR=/tmp/$USER/$PBS_JOBID
cd $WORKDIR
mkdir -p $TMPDIR
trap 'rm -rf $TMPDIR' EXIT
export LD_LIBRARY_PATH=/ddnlus/r2518/Packages/mopac/build
export MOPAC_CMD=/ddnlus/r2518/Packages/mopac/build/mopac
"""
Note the addition of the LD_LIBRARY_PATH
variable along
with MOPAC_CMD
. This is not a pbqff requirement but something
to keep in mind if your command points to a dynamically-linked executable
like this version of MOPAC.
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o {{.basename}}.out
#PBS -W umask=022
#PBS -l walltime=1000:00:00
#PBS -l ncpus=1
#PBS -l mem=8gb
#PBS -q workq
module load openpbs molpro
export WORKDIR=$PBS_O_WORKDIR
export TMPDIR=/tmp/$USER/$PBS_JOBID
cd $WORKDIR
mkdir -p $TMPDIR
trap 'rm -rf $TMPDIR' EXIT
export MOLPRO_CMD=\"molpro -t $NCPUS --no-xml-output\"
"""
One potentially surprising part of this template is the use
of basename
instead of filename
in
the #PBS -o
line. Although Molpro PBS jobs get written into a
single directory (unlike CFOUR and DFTB+ jobs), the Molpro module on Maple no
longer allows Molpro jobs to be submitted from a separate
directory. As a result, pbqff will change to the appropriate directory
before submitting a job, leading to this template discrepancy.
This template and the following are pretty much the same, as were the previous two, so there's not much more to say. In general, the top portion of each template just consists of normal PBS directives with optional template parameters, followed by a single command or couple of commands (for MOPAC) to set the variable used by pbqff.
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o {{.filename}}.out
#PBS -W umask=022
#PBS -l walltime=1000:00:00
#PBS -l ncpus=1
#PBS -l mem=8gb
#PBS -q workq
module load openpbs
export WORKDIR=$PBS_O_WORKDIR
cd $WORKDIR
CFOUR_CMD=\"/ddnlus/r2518/bin/c4ext_new.sh $NCPUS\"
"""
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o {{.filename}}.out
#PBS -W umask=022
#PBS -l walltime=1000:00:00
#PBS -l ncpus=1
#PBS -l mem=8gb
#PBS -q workq
module load openpbs
export WORKDIR=$PBS_O_WORKDIR
cd $WORKDIR
export DFTB_CMD=/ddnlus/r2518/.conda/envs/dftb/bin/dftb+
"""
The general approach for Slurm and the local queue is identical to PBS, so you may want to read the previous section even if you're intending to use one of these other queues. Here I will just present some example templates for quick reference.
This template and the Molpro one below are embedded directly from psqs,
where the templates for pbqff are defined. psqs doesn't currently define
default templates for CFOUR or DFTB+, so you'll have to provide
a queue_template
of your own for those programs.
#!/bin/bash
#SBATCH --job-name=semp
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o {{.filename}}.out
#SBATCH --no-requeue
#SBATCH --mem=1gb
export LD_LIBRARY_PATH=/home/qc/mopac2016/
export MOPAC_CMD=/home/qc/mopac2016/MOPAC2016.exe
echo $SLURM_JOB_ID
date
hostname
#!/bin/bash
#SBATCH --job-name={{.filename}}
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o {{.filename}}.out
#SBATCH --no-requeue
#SBATCH --mem=8gb
MOLPRO_CMD="/home/qc/bin/molpro2020.sh 1 1"
#!/bin/bash
#SBATCH --job-name={{.filename}}
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o {{.filename}}.out
#SBATCH --no-requeue
#SBATCH --mem=8gb
CFOUR_CMD="cfour"
#!/bin/bash
#SBATCH --job-name={{.filename}}
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o {{.filename}}.out
#SBATCH --no-requeue
#SBATCH --mem=8gb
DFTB_CMD="dftb+"
Like for Slurm, pbqff includes only two default templates for the local queue option, which is primarily used for testing. Because the local queue simply "submits" jobs using bash scripts rather than PBS or Slurm submission scripts, a local queue template can be much simpler than a PBS or Slurm template, essentially just defining the path to the executable to be run. Of course, you can add any additional commands you want, but this is all that pbqff requires.
export MOPAC_CMD=/opt/mopac/mopac
export LD_LIBRARY_PATH=/opt/mopac/
DFTB_CMD=/opt/dftb+/dftb+
MOLPRO_CMD=molpro
CFOUR_CMD=cfour
As alluded to in the coordinate type discussion, pbqff supports two types of derivative calculations for the force constants used in the VPT2 calculations. The first, and more traditional, of these involves a least-squares fit of the QFF energies to the Taylor series expansion of the potential energy surface using the ANPASS program. This works well for small molecules and can help to smooth over minor numerical issues because the initial guess at the PES can be refit to the function minimum to yield zero-gradient force constants. On the other hand, for large molecules, assembling the matrix of energies and performing the least-squares solution actually becomes a fairly significant bottleneck in terms of both time and memory requirements. This is compounded by larger coordinate systems like Cartesians.
As such, most of our recent calculations have relied instead on finite difference derivatives, specifically central finite differences. These are easier to compute because the individual energies comprising each force constant can be accumulated directly into their corresponding force constant rather than being kept around for a big calculation at the end.
Again, as described in the normal
coordinate section, normal coordinates are the only type for which
the findiff
option is used. Cartesian coordinates
assume findiff = true
(they always use finite differences),
and SICs assume findiff = false
(they always use an ANPASS
fitting). Normal coordinates also default to findiff = false
,
but because of the benefits mentioned above, you will usually want to use
them with findiff = true
, so this default should probably be
updated.
Checkpoints are configured through the check_int
option. As
the name somewhat implies, this sets the checkpoint interval in units of
polling iterations (see Sleep Interval for
more details). A value of 0
disables checkpoints, while a
positive value will write a psqs checkpoint file (chk.json
)
at that interval.
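For example, with sleep_int = 60, the following writes a checkpoint roughly every 100 minutes:
check_int = 100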
In addition to the chk.json
file written by psqs, which
contains all of the information needed to recreate the queue of running
chemistry jobs, pbqff writes its own checkpoint to
the res.chk
file. This file contains the rest of the state
needed to pick up after psqs resumes from its own checkpoint. Despite the
strange file extension, this is also a JSON file that you can inspect if
needed, although modifying either of these files could cause some very
strange issues unless you are very careful.
To resume a QFF from these two checkpoint files, provide the pbqff binary
with the -c
or --checkpoint flag.
As described in the normal coordinate
section, there is also a configuration option
called norm_resume_hff
, which allows resuming a normal
coordinate QFF from within the initial HFF (as in before it moves on to
the QFF portion). This relies on the same res.chk
file as the
normal checkpoints, but its contents are very different during the HFF. To
trigger this kind of resume, you actually need to omit
the -c
command line flag and only include this option in the
input file itself.
The optional weights
keyword is used to specify the atomic
masses to use for normal coordinate generation and mass-dependent
spectroscopic calculations. If this is not provided, the mass of the most
abundant isotope from
the NIST
atomic weight database is used. See below for an example of how to
include this in your input file for water. Currently the number of weights
must equal the number of atoms. There is no way to specify weights for a
subset of atoms.
geometry = """
O
H 1 OH
H 1 OH 2 HOH
OH = 1.0
HOH = 109.5
"""
# one-line version
weights = [15.99491461957, 1.00782503223, 1.00782503223]
# multiline version, useful for larger molecules
weights = [
15.99491461957,
1.00782503223,
1.00782503223,
]
# use one deuterium
weights = [
15.99491461957,
2.01410177812, # <- D
1.00782503223,
]
This is possibly the worst-named pbqff option because it has nothing to do with the kind of dummy atoms typically included in a Z-matrix, for example, which do not require any special treatment in the pbqff input file (they should be erased by the chemistry program before pbqff sees the Cartesian geometry). Instead this is used to specify a number of atoms that should not be displaced in the course of the QFF. This is a somewhat experimental option that I have currently only used in Molpro to compute QFFs for small molecules in the presence of a non-bonded atom. This takes advantage of Molpro's Z-matrix input, which allows including some Cartesian coordinates too. These Cartesian coordinates are frozen in geometry optimizations, allowing for the type of calculation described here. I consider it unlikely that this option will be useful or even usable in other packages, but it exists if you need it.
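For example, if the last atom in your input geometry is such a non-bonded atom that should be held fixed during the displacements, you would add:
dummy_atoms = 1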
Finally, the norm_resume_hff
option allows you to resume a
(presumably failed) normal coordinate QFF from the initial HFF stage. This
was a much-requested feature by users of pbqff, but the separation of
normal coordinate QFFs into the HFF and QFF stages made it difficult to
integrate with the existing checkpoint mechanism, leading to this separate
option. Indeed, it is fully separate from the command
line --checkpoint
option, which must be omitted when
using norm_resume_hff
.
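In practice, resuming from the HFF just means adding the option to the original input file and rerunning pbqff without the -c flag:
norm_resume_hff = true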
In addition to the settings in the input file described in the previous
section, pbqff takes several command line options. To some extent, the
most important of these is the name of the input file, but even this has a
default value of pbqff.toml
, allowing it to be omitted if you
use that as your input file name. The other options are summarized in the
table below.
Option | Type | Description |
---|---|---|
-c/--checkpoint | bool | Resume from the checkpoint files in the current directory (chk.json and res.chk). Defaults to false. |
-o/--overwrite | bool | Overwrite existing output from a previous run. Defaults to false. |
-v/--version | bool | Print the git commit hash and exit. Defaults to false. |
-t/--threads | usize | Set the maximum number of threads to use. Defaults to 0, which means to use as many threads as there are CPUs. |
-j/--json | bool | Serialize the input file to JSON and exit. For use by qffbuddy. |
-n/--no_del | bool | Don't delete any files when running the single-point energies. Defaults to false. |
As indicated by the slash, each of these options can either be provided as a short option with a single dash and its first letter or as a long option with two dashes and the full name.
Of these options, -t/--threads
is probably the most important
on a regular basis since it controls the number of threads spawned by
pbqff. If you're running pbqff on a local computer, you may want to limit
this to ~half of the CPUs on the machine to allow other programs to run
smoothly. Otherwise, if you're running pbqff on some kind of HPC cluster,
you'll likely want to set this to the number of CPUs you request in your
job submission script.
The -o/--overwrite
option is also commonly used, especially
if you make a typo in your first submission of a pbqff job. By default,
pbqff will refuse to overwrite any output files in the current directory,
even if they just contain an error message. Passing the -o
flag will bypass this check and allow the run to continue.
The -c/--checkpoint
flag can be used to resume from a
previously-written checkpoint file. See
the Checkpoints section above for more details.
Finally, the -n/--no_del
option prevents pbqff from cleaning
up the (sometimes numerous) single-point energy input and output files as
it runs. This is primarily useful if you're debugging some kind of issue
in pbqff and need to see all of the output files at the end of a run. This
is not the default because large QFFs produce a very large quantity of
very similar files that HPC administrators don't always like to keep
around.
As noted in the table, the -j/--json
flag is intended
primarily for use by qffbuddy and is hidden from the
built-in -h/--help
output. However, it is
documented here in case it proves useful for other interfaces to pbqff.
You may know qffbuddy as a Python GUI for writing and even running pbqff jobs. This section contains a reimplementation of qffbuddy as an HTML form. While you can't use it to run pbqff directly, submitting the form will copy the generated TOML file to your clipboard for pasting into an actual TOML file.