simsearch command-line options
The following comes from simsearch --help:
usage: simsearch [-h] [-k K_NEAREST] [-t THRESHOLD] [--alpha ALPHA]
                 [--beta BETA] [--queries QUERIES] [--NxN] [--query QUERY]
                 [--hex-query HEX_QUERY] [--query-id QUERY_ID]
                 [--query-format FORMAT] [--target-format FORMAT]
                 [--query-type STRING] [--id-tag NAME]
                 [--errors {strict,report,ignore}] [-R NAME=VALUE]
                 [--delimiter {tab,whitespace,to-eol,space}] [--has-header]
                 [-o FILENAME] [--out FORMAT]
                 [--include-empty | --no-include-empty]
                 [--empty-target-id EMPTY_TARGET_ID]
                 [--empty-score EMPTY_SCORE] [--precision N] [-c]
                 [-b BATCH_SIZE] [--scan] [--memory] [--no-mmap] [--times]
                 [--version] [--license-check]
                 target_filename

Search an FPS or FPB file for similar fingerprints

positional arguments:
  target_filename       target filename

options:
  -h, --help            show this help message and exit
  -k K_NEAREST, --k-nearest K_NEAREST
                        select the k nearest neighbors (use 'all' for all
                        neighbors)
  -t THRESHOLD, --threshold THRESHOLD
                        minimum similarity score threshold
  --alpha ALPHA         Tversky alpha parameter (default: 1.0)
  --beta BETA           Tversky beta parameter (default: the value of --alpha)
  --queries QUERIES, -q QUERIES
                        filename containing the query fingerprints
  --NxN                 use the targets as the queries, and exclude the self-
                        similarity term
  --query QUERY         query as a structure record (default format: 'smi')
  --hex-query HEX_QUERY
                        query in hex
  --query-id QUERY_ID   id for the query or hex-query (default: 'Query1'
  --query-format FORMAT, --in FORMAT
                        input query format (default uses the file extension,
                        else 'fps')
  --target-format FORMAT
                        input target format (default uses the file extension,
                        else 'fps')
  --query-type STRING   fingerprint type string if the queries are structures
                        (default: use the target fingerprint type)
  --id-tag NAME         tag containing the record id if --query-format is an
                        SD file)
  --errors {strict,report,ignore}
                        how should structure parse errors be handled?
                        (default=ignore)
  -R NAME=VALUE         specify a reader argument
  --delimiter {tab,whitespace,to-eol,space}
                        delimiter style for SMILES and InChI files. Alias for
                        '-R delimiter=VALUE'.
  --has-header          Skip the first line of a SMILES or InChI file Alias
                        for '-R has_header=1'
  -o FILENAME, --output FILENAME
                        output filename (default is stdout)
  --out FORMAT          Output format. One of 'chemfp', 'csv', or 'tsv'
                        (default: based on filename, or 'chemfp')
  --include-empty, --no-include-empty
                        In csv or tsv output, include a line for queries with
                        no hits (the default) (default: True)
  --empty-target-id EMPTY_TARGET_ID
                        In csv or tsv output, the target id for a query with
                        no hits (default: '*')
  --empty-score EMPTY_SCORE
                        In csv or tsv output, the score for a query with no
                        hits (default: 'NaN')
  --precision N         Number of digits in Tanimoto score (default: based on
                        the fingerprint size)
  -c, --count           report counts
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        batch size
  --scan                scan the file to find matches (low memory overhead)
  --memory              build and search an in-memory data structure (faster
                        for multiple queries)
  --no-mmap             don't use mmap to read uncompressed FPB files. May
                        give better performance on networked file systems, at
                        the expense of higher memory use.
  --times               report load and execution times to stderr
  --version             show program's version number and exit
  --license-check       Check the license and report results to stdout.
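One natural reading of -k and -t used together is that a hit must score at least the threshold, and at most the k best such hits are kept (all of them with "-k all"). The Python sketch below illustrates that selection using plain integers as fingerprints and the Tanimoto score. search() is a hypothetical helper, not chemfp's API. The default k=3 is only a choice for this sketch, and the order of hits with equal scores is not something the help text specifies.

```python
import heapq
from typing import List, Sequence, Tuple, Union


def tanimoto(a: int, b: int) -> float:
    """Tanimoto score of two integer fingerprints (0.0 if both are empty)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def search(query: int, targets: Sequence[Tuple[str, int]],
           k: Union[int, str] = 3,
           threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Hypothetical helper: up to k (id, score) hits with score >= threshold, best first."""
    hits = [(tid, tanimoto(query, fp)) for tid, fp in targets]
    hits = [(tid, score) for tid, score in hits if score >= threshold]
    if k == "all":
        # keep every hit at or above the threshold
        return sorted(hits, key=lambda hit: hit[1], reverse=True)
    # keep only the k highest-scoring hits
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])


targets = [("A", 0b1111), ("B", 0b0111), ("C", 0b0011), ("D", 0b0001)]
print(search(0b1111, targets, k=2))                     # [('A', 1.0), ('B', 0.75)]
print(search(0b1111, targets, k="all", threshold=0.5))  # [('A', 1.0), ('B', 0.75), ('C', 0.5)]
```

heapq.nlargest avoids sorting every hit when k is small compared with the number of targets.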
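The --alpha and --beta options set the parameters of a Tversky similarity. The help text does not state the formula. The standard Tversky index for a query A and a target B is |A∩B| / (|A∩B| + alpha*|A−B| + beta*|B−A|). With alpha = beta = 1 it reduces to the Tanimoto score, which matches the defaults listed above. This Python sketch makes several assumptions:

- Fingerprints are modeled as plain integers.
- The alpha term weights the query-only bits.
- tversky() is a hypothetical helper, not part of chemfp.
- Returning 0.0 for two empty fingerprints is an assumed convention.

```python
from typing import Optional


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(x).count("1")


def tversky(query: int, target: int, alpha: float = 1.0,
            beta: Optional[float] = None) -> float:
    """Hypothetical helper: Tversky similarity; beta defaults to alpha, like --beta."""
    if beta is None:
        beta = alpha
    common = popcount(query & target)        # bits set in both
    query_only = popcount(query & ~target)   # bits set only in the query (assumed alpha side)
    target_only = popcount(target & ~query)  # bits set only in the target (assumed beta side)
    denom = common + alpha * query_only + beta * target_only
    if denom == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return common / denom


query = 0b1110   # bits 1, 2, 3
target = 0b0111  # bits 0, 1, 2
print(tversky(query, target))                       # Tanimoto: 2 / (2 + 1 + 1) = 0.5
print(tversky(query, target, alpha=1.0, beta=0.0))  # 2 / (2 + 1), about 0.667
```

Setting beta=0 ignores bits that appear only in the target. This gives a substructure-like score that rewards targets containing all of the query's bits.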
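--NxN uses the target file as its own query set and excludes the self-similarity term. The sketch below interprets "self-similarity term" as the comparison of a record with itself, matched by position. Under that reading, two different records with identical fingerprints still match each other with a score of 1.0. Both the interpretation and the nxn_search() name are assumptions for illustration. A real implementation could exploit the symmetry of the score and compute each pair only once; this sketch does not.

```python
from typing import List, Sequence, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto score of two integer fingerprints (0.0 if both are empty)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def nxn_search(fps: Sequence[Tuple[str, int]],
               threshold: float = 0.0) -> List[Tuple[str, List[Tuple[str, float]]]]:
    """Hypothetical helper: search each record against all the others."""
    results = []
    for i, (qid, qfp) in enumerate(fps):
        hits = []
        for j, (tid, tfp) in enumerate(fps):
            if i == j:
                continue  # skip the diagonal: a record compared with itself
            score = tanimoto(qfp, tfp)
            if score >= threshold:
                hits.append((tid, score))
        hits.sort(key=lambda hit: hit[1], reverse=True)  # stable, so ties keep file order
        results.append((qid, hits))
    return results


fps = [("a", 0b11), ("b", 0b11), ("c", 0b01)]
for qid, hits in nxn_search(fps):
    print(qid, hits)
# a [('b', 1.0), ('c', 0.5)]
# b [('a', 1.0), ('c', 0.5)]
# c [('a', 0.5), ('b', 0.5)]
```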
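--hex-query supplies the query fingerprint directly as a hex string, rather than as a structure that must first be fingerprinted. The query is labeled by --query-id (default 'Query1'). Because the Tanimoto score depends only on bit counts, it can be computed byte by byte without deciding which bit within a byte counts as "bit 0". This works as long as the query and target use the same encoding and length. hex_tanimoto() is a hypothetical helper, not chemfp's API, and rejecting mismatched lengths is this sketch's own choice.

```python
def hex_tanimoto(query_hex: str, target_hex: str) -> float:
    """Hypothetical helper: Tanimoto score of two equal-length hex fingerprints."""
    query = bytes.fromhex(query_hex)
    target = bytes.fromhex(target_hex)
    if len(query) != len(target):
        raise ValueError("fingerprints must have the same number of bytes")
    # popcount of AND and OR, summed byte by byte; bit order within a byte is irrelevant
    common = sum(bin(q & t).count("1") for q, t in zip(query, target))
    union = sum(bin(q | t).count("1") for q, t in zip(query, target))
    return common / union if union else 0.0


print(hex_tanimoto("ff00", "0f00"))  # 4 common bits / 8 bits in the union = 0.5
```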
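The --include-empty, --empty-target-id, and --empty-score options control how csv or tsv output reports a query with no hits. The sketch below shows the idea using Python's csv module. The following details are assumptions made for illustration, not chemfp's documented output format:

- each row has three columns: query id, target id, score;
- there is no header line;
- write_hits() is a hypothetical name;
- precision is fixed at 3 (the real --precision default depends on the fingerprint size).

```python
import csv
import io
from typing import Dict, List, Tuple


def write_hits(results: Dict[str, List[Tuple[str, float]]], fmt: str = "csv",
               include_empty: bool = True, empty_target_id: str = "*",
               empty_score: str = "NaN", precision: int = 3) -> str:
    """Hypothetical helper: one (query id, target id, score) row per hit."""
    out = io.StringIO()
    delimiter = "\t" if fmt == "tsv" else ","
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for query_id, hits in results.items():
        if not hits:
            if include_empty:
                # placeholder row so the query still appears in the output
                writer.writerow([query_id, empty_target_id, empty_score])
            continue
        for target_id, score in hits:
            writer.writerow([query_id, target_id, f"{score:.{precision}f}"])
    return out.getvalue()


results = {"Q1": [("T1", 0.9)], "Q2": []}
print(write_hits(results), end="")
# Q1,T1,0.900
# Q2,*,NaN
```

Including a placeholder row means every query appears in the output. Downstream tools can then tell "searched, no hits" apart from "never searched".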
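-c/--count reports how many targets match each query instead of listing the matches. In the sketch below, counting needs neither sorting nor a per-hit list, only a running total per query. The sketch assumes the count is taken against the -t threshold; count_hits() is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto score of two integer fingerprints (0.0 if both are empty)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def count_hits(queries: Sequence[Tuple[str, int]], targets: Sequence[int],
               threshold: float) -> List[Tuple[str, int]]:
    """Hypothetical helper: per query, count targets scoring >= threshold."""
    return [(qid, sum(1 for fp in targets if tanimoto(qfp, fp) >= threshold))
            for qid, qfp in queries]


targets = [0b1111, 0b0111, 0b0011, 0b0001]
print(count_hits([("Q1", 0b1111), ("Q2", 0b0001)], targets, threshold=0.5))
# [('Q1', 3), ('Q2', 2)]
```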
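--scan and --memory name two search strategies. Scanning reads the targets once, in order, and keeps only the hits, so memory overhead stays low. Building an in-memory structure costs time and memory up front but is faster for multiple queries. The scan strategy is easy to sketch for FPS input. In an FPS file, header lines start with '#', and each data line holds a hex fingerprint, a tab, and an identifier. read_fps() and scan_search() are hypothetical helpers that ignore the header contents and any extra columns; this is not how chemfp implements --scan.

```python
import io
from typing import Iterable, Iterator, List, Tuple


def read_fps(lines: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical helper: yield (id, fingerprint bytes), skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[1], bytes.fromhex(fields[0])


def tanimoto(a: bytes, b: bytes) -> float:
    """Tanimoto score of two equal-length byte fingerprints."""
    common = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return common / union if union else 0.0


def scan_search(query: bytes, lines: Iterable[str],
                threshold: float) -> List[Tuple[str, float]]:
    """Hypothetical helper: stream the targets once, storing only the hits."""
    hits = []
    for fp_id, fp in read_fps(lines):
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((fp_id, score))
    return hits


fps_text = "#FPS1\n#num_bits=8\nff\tA\n0f\tB\n01\tC\n"
print(scan_search(bytes.fromhex("ff"), io.StringIO(fps_text), threshold=0.5))
# [('A', 1.0), ('B', 0.5)]
```

Because read_fps() is a generator, only one target record is held at a time. This is the property the "low memory overhead" description of --scan refers to.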