fpcat command-line options
The following comes from fpcat --help:
    Usage: fpcat [OPTIONS] FILENAME

      Combine multiple fingerprint files into a single file.

    Options:
      --in FORMAT                     Input fingerprint format. One of fps or fpb
                                      (with optional gz or zst compression), or
                                      flush. (default guesses from filename or is
                                      fps)
      --merge                         Assume the input fingerprint files are in
                                      popcount order and do a merge sort.
      -o, --output FILENAME           Save the fingerprints to FILENAME
                                      (default=stdout)
      --out FORMAT                    Output format, one of 'fps', 'fps.gz',
                                      'fps.zst', 'fpb', or 'flush' (default
                                      guesses from output filename, or is 'fps')
      --include-metadata / --no-metadata
                                      With --no-metadata, do not include the
                                      header metadata for FPS output.
      --no-date                       Do not include the 'date' metadata in the
                                      output header
      --date STR                      An ISO 8601 date (like
                                      '2022-02-07T11:10:15') to use for the 'date'
                                      metadata in the output header
      --level LEVEL                   Compression level. Must be a positive
                                      integer or one of 'min', 'default', or
                                      'max'.
      --reorder                       Reorder the output fingerprints by popcount.
                                      (default for FPB output)
      --preserve-order                Save the output fingerprints in the same
                                      order as the input. (default for FPS output)
      --alignment [1|2|4|8|16|32|64|128|256]
                                      Alignment size when saving a FPB file.
                                      (default=8)
      --show-progress                 Show progress.
      --max-spool-size SIZE           Use temporary files for extra storage space
                                      for huge FPB files (default uses RAM).
      --tmpdir DIRNAME                Directory for the temporary files (default
                                      uses the system temp directory).
      --version                       Show the version and exit.
      --license-check                 Check the license and report results to
                                      stdout.
      --license-file FILENAME         Specify a chemfp license file
      --traceback                     Print the traceback on KeyboardInterrupt
      --version                       Show the version and exit.
      --help                          Show this message and exit.

Examples:

fpcat can be used to convert between FPS and FPB formats.
This is handy if you want to see what's inside of an FPB file:

    fpcat fingerprints.fpb

You can also use fpcat to make an FPB file from an FPS file:

    fpcat fingerprints.fps -o fingerprints.fpb

You might have generated a set of FPS files which you want to merge into a single FPB file. For example, you might have used GNU parallel to generate an FPS file for each of the PubChem files:

    fpcat Compound_*.fps -o pubchem.fpb

By default the FPB format sorts the fingerprints by popcount. (Use --preserve-order if you really want to preserve the input order.) The sort overhead for PubChem uses about 10 GB of RAM. If you don't have that much memory then ask fpcat to use less memory:

    fpcat --max-spool-size 1GB Compound_*.fps -o pubchem.fpb

This will use about 2 GB of RAM and the --tmpdir for the rest. (Yes, it would be nice if I could get those two memory size numbers to match.)

The --merge option is experimental. Use it if the input fingerprints are already in popcount order, because the sorted output is then a simple merge of the individually sorted inputs. However, this option opens all input files at the same time, which may exceed your resource limit on file descriptors. The current implementation also requires a lot of disk seeks, so it is slow for many files.

The flush format is only available if the chemfp_converter package is installed.
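The idea behind --merge, described above, can be illustrated with a small Python sketch. This is not chemfp's implementation and it does not use the chemfp API; the function names `popcount` and `merge_by_popcount` and the sample records are hypothetical, chosen only to show how inputs that are each already in popcount order can be combined with a k-way merge instead of a full sort.

```python
import heapq


def popcount(fp: bytes) -> int:
    """Return the number of set bits in a fingerprint given as raw bytes."""
    return bin(int.from_bytes(fp, "big")).count("1")


def merge_by_popcount(*sorted_inputs):
    """Lazily merge inputs that are each already in ascending popcount order.

    Each input is an iterable of (id, fingerprint_bytes) pairs.
    """
    return heapq.merge(*sorted_inputs, key=lambda record: popcount(record[1]))


# Hypothetical records standing in for two popcount-ordered fingerprint files.
file_a = [("a1", bytes.fromhex("01")), ("a2", bytes.fromhex("0f"))]
file_b = [("b1", bytes.fromhex("03")), ("b2", bytes.fromhex("ff"))]

merged = list(merge_by_popcount(file_a, file_b))
print([(name, popcount(fp)) for name, fp in merged])
# → [('a1', 1), ('b1', 2), ('a2', 4), ('b2', 8)]
```

A k-way merge like this keeps one pending record per input, so every input has to stay open until it is exhausted. That is consistent with the note above that --merge opens all input files at once and can run into the file descriptor limit.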