fpcat command-line options
The following comes from fpcat --help:
usage: fpcat [-h] [--in FORMAT] [--merge] [-o FILENAME] [--out FORMAT]
             [--level LEVEL] [--reorder] [--preserve-order] [--alignment N]
             [--show-progress] [--max-spool-size SIZE] [--tmpdir DIRNAME]
             [--version] [--license-check]
             [filename ...]
Combine multiple fingerprint files into a single file
positional arguments:
  filename              input fingerprint filenames (default: use stdin)
options:
  -h, --help            show this help message and exit
  --in FORMAT           input fingerprint format. One of fps or fpb (with
                        optional gz or zst compression), or flush. (default
                        guesses from filename or is fps)
  --merge               assume the input fingerprint files are in popcount
                        order and do a merge sort
  -o FILENAME, --output FILENAME
                        save the fingerprints to FILENAME (default=stdout)
  --out FORMAT          output fingerprint format. One of fps, fps.gz,
                        fps.zst, fpb, or flush. (default guesses from output
                        filename, or is 'fps')
  --level LEVEL         compression level. Must be a positive integer or one
                        of 'min', 'default', or 'max'.
  --reorder             reorder the output fingerprints by popcount (default
                        for FPB output)
  --preserve-order      save the output fingerprints in the same order as the
                        input (default for FPS output)
  --alignment N         alignment size when saving a FPB file (default=8)
  --show-progress       show progress
  --max-spool-size SIZE
                        use temporary files for extra storage space for huge
                        FPB files (default uses RAM)
  --tmpdir DIRNAME      directory for the temporary files (default uses the
                        system temp directory)
  --version             show program's version number and exit
  --license-check       Check the license and report results to stdout.
Examples:
fpcat can be used to convert between FPS and FPB formats. This is
handy if you want to see what's inside of an FPB file:
fpcat fingerprints.fpb
You can also use fpcat to make an FPB file from an FPS file:
fpcat fingerprints.fps -o fingerprints.fpb
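In that command fpcat writes FPB because of the .fpb extension on the -o filename. Per the help text, when --in or --out is not given the format is guessed from the filename, falling back to fps. The Python sketch below shows one way such a suffix-based guess could work. The function name guess_format, the case-insensitive matching, and the suffix table (the --out formats minus flush) are assumptions for illustration, not chemfp's code.

```python
# Hypothetical sketch: guess a fingerprint format from a filename suffix,
# falling back to "fps" as the fpcat help text describes. The suffix table
# is an assumption based on the --out formats (flush omitted).
KNOWN_SUFFIXES = {
    ".fps.gz": "fps.gz",
    ".fps.zst": "fps.zst",
    ".fpb": "fpb",
    ".fps": "fps",
}

def guess_format(filename, default="fps"):
    """Return the format implied by filename, or default if none matches."""
    if filename is None:  # stdin/stdout: there is no name to inspect
        return default
    name = filename.lower()  # case-insensitive match (an assumption)
    for suffix, fmt in KNOWN_SUFFIXES.items():
        if name.endswith(suffix):
            return fmt
    return default

print(guess_format("fingerprints.fpb"))   # fpb
print(guess_format("Compound_1.fps.gz"))  # fps.gz
print(guess_format("notes.txt"))          # fps
```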
You might have generated a set of FPS files which you want to merge
into a single FPB file. For example, you might have used GNU parallel to
generate an FPS file for each of the PubChem files, and now want to
combine them into one file:
fpcat Compound_*.fps -o pubchem.fpb
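Reading several FPS inputs amounts to walking each one in turn and skipping its header lines. The Python sketch below does that and yields records in input order, the order that --preserve-order keeps. It assumes the FPS layout of '#' header lines followed by data lines holding a hex-encoded fingerprint, a tab, and an identifier. The helper names are hypothetical, and the sketch ignores compression and does not check that the input headers agree.

```python
# Hypothetical sketch: read (fingerprint, id) records from several FPS
# inputs in order. Assumes "#" header lines followed by
# "<hex fingerprint>\t<id>" data lines; not chemfp's reader.
import io

def read_fps_records(lines):
    """Yield (hex_fingerprint, id) pairs, skipping headers and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        hex_fp, _, fp_id = line.partition("\t")
        yield hex_fp, fp_id

def concatenate(inputs):
    """Yield every record of the first input, then the second, and so on."""
    for lines in inputs:
        yield from read_fps_records(lines)

# In-memory stand-ins for two FPS files.
part1 = io.StringIO("#FPS1\n#num_bits=8\n0f\tA1\nff\tA2\n")
part2 = io.StringIO("#FPS1\n#num_bits=8\n01\tB1\n")
print(list(concatenate([part1, part2])))
# [('0f', 'A1'), ('ff', 'A2'), ('01', 'B1')]
```

With real files, each input would be a file object from open(path) instead of a StringIO.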
By default, FPB output is sorted by popcount. (Use --preserve-order if
you really want to keep the input order.) For PubChem, the sort
overhead is about 10 GB of RAM. If you don't have that much memory,
ask fpcat to use less:
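Reordering by popcount means holding the fingerprints and sorting them by their number of 1 bits, which is why memory use grows with the input size. A minimal in-memory Python sketch follows, assuming (hex fingerprint, id) records; the helper names are hypothetical and this is not chemfp's implementation.

```python
# Minimal sketch of reordering fingerprints by popcount in memory.
# Records are assumed to be (hex_fingerprint, id) tuples.
def popcount(hex_fp):
    """Number of set bits in a hex-encoded fingerprint."""
    return bin(int(hex_fp, 16)).count("1")

def sort_by_popcount(records):
    # sorted() is stable, so records with equal popcounts keep input order.
    return sorted(records, key=lambda rec: popcount(rec[0]))

records = [("ff", "A"), ("01", "B"), ("0f", "C"), ("10", "D")]
print(sort_by_popcount(records))
# [('01', 'B'), ('10', 'D'), ('0f', 'C'), ('ff', 'A')]
```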
fpcat --max-spool-size 1GB Compound_*.fps -o pubchem.fpb
This uses about 2 GB of RAM and puts the rest in temporary files in
the --tmpdir directory (by default, the system temp directory). (Yes,
it would be nice if I could get those two memory size numbers to
match.)
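Trading RAM for temporary files is the classic external merge sort: sort bounded chunks in memory, spill each sorted chunk to a temporary file, then merge the files. The Python sketch below illustrates that technique under loudly stated assumptions: the limit counts records rather than bytes, the tmpdir argument only mirrors --tmpdir, and the function names and tab-separated spool format are hypothetical. It does not describe how fpcat actually spools.

```python
# Sketch of an external merge sort by popcount, in the spirit of
# --max-spool-size. Hypothetical: the limit counts records (fpcat's is a
# byte size) and the tab-separated spool format is invented for this example.
import heapq
import tempfile

def popcount(hex_fp):
    """Number of set bits in a hex-encoded fingerprint."""
    return bin(int(hex_fp, 16)).count("1")

def _by_popcount(rec):
    return popcount(rec[0])

def _spill(chunk, tmpdir):
    """Sort one chunk in memory, write it to a temp file, and rewind."""
    chunk.sort(key=_by_popcount)
    f = tempfile.TemporaryFile("w+", dir=tmpdir)
    for hex_fp, fp_id in chunk:
        f.write(f"{hex_fp}\t{fp_id}\n")
    f.seek(0)
    return f

def _read_spool(f):
    """Yield (hex_fingerprint, id) records back from a spool file."""
    for line in f:
        hex_fp, _, fp_id = line.rstrip("\n").partition("\t")
        yield hex_fp, fp_id

def external_sort(records, max_in_memory=100_000, tmpdir=None):
    """Yield records in popcount order, holding at most max_in_memory
    records in RAM while the sorted chunks are built."""
    spools, chunk = [], []
    try:
        for rec in records:
            chunk.append(rec)
            if len(chunk) >= max_in_memory:
                spools.append(_spill(chunk, tmpdir))
                chunk = []
        if chunk:
            spools.append(_spill(chunk, tmpdir))
        # heapq.merge breaks ties by input position, so the result matches
        # a stable in-memory sort.
        yield from heapq.merge(*(_read_spool(f) for f in spools),
                               key=_by_popcount)
    finally:
        for f in spools:  # temporary files are deleted when closed
            f.close()

records = [("ff", "A"), ("01", "B"), ("0f", "C"), ("10", "D"), ("03", "E")]
print(list(external_sort(records, max_in_memory=2)))
# [('01', 'B'), ('10', 'D'), ('03', 'E'), ('0f', 'C'), ('ff', 'A')]
```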
The --merge option is experimental. Use it if the input fingerprint
files are already in popcount order; the sorted output is then a simple
merge of the individually sorted inputs, with no full sort needed.
However, this option opens all of the input files at the same time,
which may exceed your resource limit on open file descriptors. The
current implementation also requires many disk seeks, so it is slow
when there are many files.
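The idea behind --merge is that popcount-ordered inputs only need a k-way merge, not a full sort. Python's heapq.merge shows this in a few lines. Here each input list stands in for one open fingerprint file, and the helper names are hypothetical rather than part of chemfp.

```python
# Hypothetical sketch of the idea behind --merge: when each input is
# already sorted by popcount, a k-way merge yields sorted output without
# re-sorting. Each input stands in for one open fingerprint file.
import heapq

def popcount(hex_fp):
    """Number of set bits in a hex-encoded fingerprint."""
    return bin(int(hex_fp, 16)).count("1")

def merge_sorted_inputs(*inputs):
    # heapq.merge keeps only one pending record per input in its heap,
    # but every input must be available (every file open) at once.
    return heapq.merge(*inputs, key=lambda rec: popcount(rec[0]))

input_a = [("01", "A1"), ("0f", "A2")]  # popcounts 1, 4
input_b = [("03", "B1"), ("ff", "B2")]  # popcounts 2, 8
print(list(merge_sorted_inputs(input_a, input_b)))
# [('01', 'A1'), ('03', 'B1'), ('0f', 'A2'), ('ff', 'B2')]
```

Because every input stays open until it is exhausted, merging thousands of files can hit the per-process open-file limit, which ulimit -n reports in Unix-like shells.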
The flush format is only available if the chemfp_converter package is
installed.