chemfp spherex command-line options
The following comes from chemfp spherex --help:
Usage: chemfp spherex [OPTIONS] CANDIDATES

  Diversity selection using the sphere exclusion algorithm.

Options:
  -t, --threshold FLOAT           Maximum similarity (default: 1.0)
  -n, --num-picks N               Number of picks (default: 'all')
  --dise                          Use directed sphere exclusion
  --dise-references FILENAME      DISE reference structures or fingerprints
                                  (default uses the Gobbi & Lee structures)
  --dise-format FORMAT            Format of the DISE reference file (default
                                  uses the file extension, else 'fps')
  --ranks PATH                    File containing fingerprint rank values
  --ranks-default N               Default rank value if candidate id not found
                                  in the ranks file (default: 2**32-1)
                                  [0<=x<=4294967296]
  --ranks-format FORMAT           Format for the ranks file (can be 'tsv' or a
                                  fingerprint format)
  --ranks-has-header / --ranks-no-header
                                  Skip the first line of the ranks file
  --in, --candidates-format TEXT  Format of the candidates file (default uses
                                  filename extension, or 'fps')
  --references PATH               Fingerprint file containing reference
                                  fingerprints to avoid (the fingerprints you
                                  have)
  --references-format FORMAT      Format of the references file (default uses
                                  filename extension, or 'fps')
  --pick-id STR                   Initial candidate id (if no reference file).
                                  Can be used more than once.
  --pick-id-file PATH             File containing initial candidate ids, one
                                  per line
  --randomize / --no-randomize    Use --randomize (default for undirected
                                  picking) to randomly pick from the available
                                  candidates, or --no-randomize (default for
                                  directed picking) to pick the candidate with
                                  the smallest arena index.
  --seed N                        Specify the random number generator seed
                                  between 0 and 2**64-1, inclusive, or use -1
                                  to have one picked at random (default: -1)
  --mmap / --no-mmap              Don't use mmap to read uncompressed FPB
                                  files. May give better performance on
                                  networked file systems, at the expense of
                                  higher memory use.
  -j, --num-threads N             The number of threads to use. The default
                                  (-1) uses the default value for this
                                  computer (8). 0 and 1 both mean
                                  single-threaded.
  --include-members / --no-members
                                  Include ids and scores for fingerprint members
                                  in each sphere
  --save-picks-format PATH        Specify the format for the picked
                                  fingerprints.
  --save-candidates PATH          Write remaining candidate fingerprints to the
                                  named file.
  --save-candidates-format FORMAT
                                  Specify the format for the remaining candidate
                                  fingerprints.
  --save-picks PATH               Write picked fingerprints to the named file.
  --precision [1|2|3|4|5|6|7|8|9|10]
                                  Number of digits in Tanimoto score (default:
                                  based on the fingerprint size)
  -o, --output PATH               Write output to the named file instead of
                                  stdout.
  --out TEXT                      Output format. Must be one of 'chemfp' (the
                                  default), 'csv', or 'tsv' with optional
                                  compression
  --include-metadata / --no-metadata
                                  With --no-metadata, do not include header
                                  metadata in 'spherex' or 'centroid' output
                                  formats.
  --include-empty / --no-include-empty
                                  In csv and tsv format with --include-hits,
                                  include picks with no hits (the default)
  --empty-hit-id TEXT             The hit id if --include-empty outputs a pick
                                  with no hits (default: '*')
  --empty-score TEXT              The score if --include-empty outputs a pick
                                  with no hits (default: 'NaN')
  --pick-time / --no-pick-time    Include the elapsed time for each pick
  --no-date                       Do not include the 'date' metadata in the
                                  output header
  --date STR                      An ISO 8601 date (like '2022-02-07T11:10:15')
                                  to use for the 'date' metadata in the output
                                  header
  --times / --no-times            Write timing information to stderr
  --progress / --no-progress      Show a progress bar (default: show unless the
                                  output is a terminal)
  --help                          Show this message and exit.

Select diverse fingerprints using the sphere exclusion algorithm (Hudson et al. (1996) QSAR, https://doi.org/10.1002/qsar.19960150402) with optional ranking for directed sphere exclusion (Gobbi and Lee (2003) JCICS, https://doi.org/10.1021/ci025554v).
This method iteratively picks `--num-picks` / `-n` fingerprints from a set of candidates such that each picked fingerprint is not within a given threshold of similarity to any previously selected fingerprint. The default `--threshold` of 1.0 means that only identical fingerprints are excluded.

= Undirected picking =

When no ranks are specified, the fingerprints are picked at random from the remaining candidate fingerprints. Use `--no-randomize` to select the fingerprints in fingerprint index order, which is based on the number of bits set in the fingerprint.

= Directed picking =

In directed picking, the fingerprints are picked in rank order, from smallest rank to largest. If multiple fingerprints have the same rank then by default the first one (in index order) is used. Use `--randomize` to randomize the order within a rank.

There are several ways to specify the ranks. The `--dise` option uses the three SMILES from the DISE paper by Gobbi and Lee to generate reference fingerprints and rank the candidate fingerprints by successive similarity to the references. Use a `--dise-references` file to specify different reference structures or fingerprints.

The ranks can also be specified in a `--ranks` file, in one of several formats. The 'tsv' format contains two tab-separated columns and an optional header. The first column is the candidate fingerprint id and the second column is its associated rank, which must be an integer or float. The 'txt' format contains one id per line and an optional header. The rank is 1 for the first id, 2 for the second, and so on. If the ranks file is a fingerprint file then the rank is 1 for the id of the first fingerprint, 2 for the second, and so on.

= References =

Use a `--references` fingerprint file to remove all candidate fingerprints which are within `--threshold` similarity of any of the reference fingerprints.
= Initial picks =

The initial picks can be specified by id (this cannot be combined with `--references`), either by using one `--pick-id` option per id or by using `--pick-id-file`, with one id per line.

NOTE: `--pick-id-file` and a "txt"-formatted `--ranks` file are similar but not identical. When a pick id is specified, it is always included in the output, even if that fingerprint was included in an earlier picked sphere. (In that case its count is 0, because its sphere doesn't even include itself.) In addition, after the specified pick ids are used, the remaining picks are by default made at random, while candidates without a rank in a ranks file are by default picked in index order. (These defaults can be changed with `--randomize` and `--no-randomize`.)

= Output options =

The picks can be saved in one of several `--out` output formats.

The default "spherex" format writes the information about each sphere on a single line. By default this includes the center id and the number of members in the sphere. Use `--include-members` to include the member ids and scores.

The "centroid" format is similar to the "spherex" format, but with different column headers. This format matches the default "centroid" output format for the "chemfp butina" command, which should make it easy to swap one command for the other. NOTE: a future version of spherex will likely default to "centroid" output.

The "csv" and "tsv" formats print one sphere hit on each line, in comma- or tab-delimited columns. By default this is only the sphere center id and its count. With `--include-members` each line contains the sphere center id, the hit id, and its score. If a sphere contains no members (which may occur if a pick id is specified but the fingerprint is in another sphere) then a synthetic record is generated with an id of `--empty-hit-id` and a score of `--empty-score`. Use `--no-include-empty` to skip this record.
After sphere picking finishes, the remaining candidate fingerprints can be saved to the `--save-candidates` file.
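As a rough illustration of the undirected loop described in the help text above, the sketch below implements sphere exclusion in plain Python. It is not chemfp's implementation and uses no chemfp API. It assumes fingerprints are Python ints used as bitsets, that a Tanimoto score >= threshold puts a candidate inside a sphere, and that sorting by the number of bits set approximates arena index order. The names `popcount`, `tanimoto`, and `sphere_exclusion` are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# (center_id, [(member_id, score), ...]); members normally include the center itself.
Sphere = Tuple[str, List[Tuple[str, float]]]


def popcount(fp: int) -> int:
    """Number of bits set in an int-encoded fingerprint (hypothetical helper)."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two int bitsets; two empty fingerprints score 0.0 here."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def sphere_exclusion(
    fps: Dict[str, int],
    threshold: float = 1.0,
    num_picks: Optional[int] = None,
    randomize: bool = True,
    rng: Optional[random.Random] = None,
) -> List[Sphere]:
    """Undirected sphere exclusion sketch (not chemfp's implementation)."""
    rng = rng or random.Random()
    # Assumption: approximate arena index order by sorting on bits set (stable for ties).
    remaining = sorted(fps, key=lambda fp_id: popcount(fps[fp_id]))
    spheres: List[Sphere] = []
    while remaining and (num_picks is None or len(spheres) < num_picks):
        # Like --randomize: a random candidate; like --no-randomize: the smallest index.
        center = rng.choice(remaining) if randomize else remaining[0]
        members: List[Tuple[str, float]] = []
        kept: List[str] = []
        for fp_id in remaining:
            score = tanimoto(fps[center], fps[fp_id])
            if score >= threshold:
                members.append((fp_id, score))  # inside the sphere, so excluded
            elif fp_id != center:
                kept.append(fp_id)  # still a candidate for later picks
        remaining = kept
        spheres.append((center, members))
    return spheres
```

Passing `randomize=False` mirrors the `--no-randomize` description, and a fixed `random.Random(seed)` loosely plays the role of `--seed` (chemfp's own random number generator will produce different picks).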
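Directed picking, as described in the help text, changes only how the next center is chosen: the smallest rank wins, and ties go to the first candidate in index order unless randomized. The self-contained sketch below assumes int-bitset fingerprints, a dict of ranks where missing ids get the documented default of 2**32-1, input order standing in for arena index order, and `>=` threshold semantics. `choose_center` and `directed_sphere_exclusion` are hypothetical names, not chemfp functions.

```python
import random
from typing import Dict, List, Optional, Tuple, Union

Rank = Union[int, float]


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two int bitsets (0.0 if both are empty)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def choose_center(remaining: List[str], ranks: Dict[str, Rank], default_rank: Rank,
                  randomize: bool, rng: random.Random) -> str:
    """Pick the smallest-rank candidate; ties resolved by index order or at random."""
    best = min(ranks.get(fp_id, default_rank) for fp_id in remaining)
    tied = [fp_id for fp_id in remaining if ranks.get(fp_id, default_rank) == best]
    return rng.choice(tied) if randomize else tied[0]


def directed_sphere_exclusion(fps: Dict[str, int], ranks: Dict[str, Rank],
                              threshold: float = 1.0, default_rank: Rank = 2**32 - 1,
                              randomize: bool = False,
                              rng: Optional[random.Random] = None
                              ) -> List[Tuple[str, List[Tuple[str, float]]]]:
    """Hypothetical directed sphere exclusion sketch."""
    rng = rng or random.Random()
    remaining = list(fps)  # assumption: dict order stands in for arena index order
    spheres = []
    while remaining:
        center = choose_center(remaining, ranks, default_rank, randomize, rng)
        scored = [(fp_id, tanimoto(fps[center], fps[fp_id])) for fp_id in remaining]
        members = [(fp_id, score) for fp_id, score in scored if score >= threshold]
        excluded = {fp_id for fp_id, _ in members} | {center}
        remaining = [fp_id for fp_id in remaining if fp_id not in excluded]
        spheres.append((center, members))
    return spheres
```

Setting `randomize=True` corresponds to using `--randomize` with directed picking, which shuffles only among candidates that share the smallest rank.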
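The 'tsv' and 'txt' ranks layouts described in the help text are simple enough to parse by hand. The reader below is a hypothetical sketch, not chemfp's parser: it assumes the header is declared explicitly (as with `--ranks-has-header`), that blank lines can be skipped, and that a rank is parsed as an int when possible and otherwise as a float. Ids missing from the table fall back to the documented `--ranks-default` of 2**32-1.

```python
from typing import Dict, Iterable, Union

Rank = Union[int, float]
DEFAULT_RANK = 2**32 - 1  # documented default for --ranks-default


def parse_rank(text: str) -> Rank:
    """A rank must be an integer or a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def read_ranks(lines: Iterable[str], fmt: str = "tsv",
               has_header: bool = False) -> Dict[str, Rank]:
    """Hypothetical reader for the 'tsv' and 'txt' ranks layouts."""
    rows = (line.rstrip("\r\n") for line in lines)
    if has_header:
        next(rows, None)  # skip the first line, like --ranks-has-header
    ranks: Dict[str, Rank] = {}
    next_rank = 1
    for row in rows:
        if not row:
            continue  # assumption: blank lines are ignored
        if fmt == "tsv":
            # First column: candidate id; second column: its rank.
            fp_id, rank_text = row.split("\t")[:2]
            ranks[fp_id] = parse_rank(rank_text)
        elif fmt == "txt":
            # One id per line: rank 1 for the first id, 2 for the second, ...
            ranks[row] = next_rank
            next_rank += 1
        else:
            raise ValueError(f"unsupported ranks format: {fmt!r}")
    return ranks


def rank_of(ranks: Dict[str, Rank], fp_id: str, default: Rank = DEFAULT_RANK) -> Rank:
    """Look up a candidate's rank, falling back to the default."""
    return ranks.get(fp_id, default)
```

Because `read_ranks` accepts any iterable of lines, an open text file can be passed directly in place of the in-memory lists used for testing.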
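The `--references` option in the help text can be pictured as a pre-pass that drops any candidate too similar to a fingerprint you already have. The sketch below is an illustration only: it assumes int-bitset fingerprints and that a score >= threshold counts as "within" the threshold, and `remove_near_references` is a hypothetical name.

```python
from typing import Dict, Iterable


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two int bitsets (0.0 if both are empty)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def remove_near_references(candidates: Dict[str, int], references: Iterable[int],
                           threshold: float) -> Dict[str, int]:
    """Keep only candidates below `threshold` similarity to every reference."""
    refs = list(references)  # materialize so each candidate sees every reference
    return {
        fp_id: fp
        for fp_id, fp in candidates.items()
        if all(tanimoto(fp, ref) < threshold for ref in refs)
    }
```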
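The pick-id behavior described under "Initial picks" can be sketched the same way: each requested id is reported as a sphere center even if an earlier sphere already absorbed it, and any later picks are made at random by default. This is a hypothetical illustration, not chemfp code; it assumes int-bitset fingerprints, `>=` threshold semantics, and input order as index order. Following the help text, an already-absorbed pick is reported with a count of 0; the help text does not say whether such a pick still excludes other candidates, and this sketch assumes it does not.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Sphere = Tuple[str, List[Tuple[str, float]]]


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two int bitsets (0.0 if both are empty)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def sphere_exclusion_with_pick_ids(fps: Dict[str, int], pick_ids: Sequence[str],
                                   threshold: float = 1.0, randomize: bool = True,
                                   rng: Optional[random.Random] = None) -> List[Sphere]:
    """Hypothetical sketch: forced initial picks, then undirected picking."""
    rng = rng or random.Random()
    remaining = list(fps)  # assumption: dict order stands in for index order
    spheres: List[Sphere] = []

    def take_sphere(center: str) -> None:
        nonlocal remaining
        scored = [(fp_id, tanimoto(fps[center], fps[fp_id])) for fp_id in remaining]
        members = [(fp_id, score) for fp_id, score in scored if score >= threshold]
        excluded = {fp_id for fp_id, _ in members} | {center}
        remaining = [fp_id for fp_id in remaining if fp_id not in excluded]
        spheres.append((center, members))

    for pick_id in pick_ids:
        if pick_id in remaining:
            take_sphere(pick_id)
        else:
            # Already inside an earlier sphere: still reported, with a count of 0.
            # Assumption: it does not claim any other remaining candidates.
            spheres.append((pick_id, []))
    # Later picks: random by default (--randomize), else index order.
    while remaining:
        take_sphere(rng.choice(remaining) if randomize else remaining[0])
    return spheres
```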
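The csv/tsv layout that the help text describes for `--include-members` (one row per center/hit pair, plus a placeholder row for an empty sphere) can be mimicked with Python's `csv` module. This is an illustrative sketch rather than chemfp's writer: the column names, the fixed-decimal score formatting, and the function names are hypothetical, while the "*" and "NaN" defaults come from the `--empty-hit-id` and `--empty-score` descriptions.

```python
import csv
import io
from typing import List, Sequence, TextIO, Tuple

Sphere = Tuple[str, Sequence[Tuple[str, float]]]


def write_member_rows(spheres: List[Sphere], out: TextIO, delimiter: str = ",",
                      include_empty: bool = True, empty_hit_id: str = "*",
                      empty_score: str = "NaN", precision: int = 3) -> None:
    """Write one row per (center, hit, score); pass a tab delimiter for tsv."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["center_id", "hit_id", "score"])  # hypothetical header
    for center, members in spheres:
        if not members:
            # Synthetic record for an empty sphere, unless include_empty is False.
            if include_empty:
                writer.writerow([center, empty_hit_id, empty_score])
            continue
        for hit_id, score in members:
            writer.writerow([center, hit_id, f"{score:.{precision}f}"])


def member_rows_text(spheres: List[Sphere], **options) -> str:
    """Convenience wrapper that returns the rows as a string."""
    buffer = io.StringIO()
    write_member_rows(spheres, buffer, **options)
    return buffer.getvalue()
```

Calling `member_rows_text(spheres, include_empty=False)` corresponds to `--no-include-empty`, which drops the placeholder row.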