chemfp API

This chapter contains the docstrings for the public portion of the chemfp API.

chemfp module

The following functions and classes are in the chemfp module.

open

chemfp.open(source, format=None)

Read fingerprints from a fingerprint file

Read fingerprints from ‘source’, using the given format. If ‘source’ is a string then it is treated as a filename. If ‘source’ is None then fingerprints are read from stdin. Otherwise, ‘source’ must be a Python file object supporting ‘read’ and ‘readline’.

If ‘format’ is None then the fingerprint file format and compression type are derived from the source filename, or from the name attribute of the source file object. If the source is None then stdin is assumed to be uncompressed data in “fps” format.

The supported format strings are:

fps, fps.gz - fingerprints are in FPS format
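As a rough illustration of deriving the format string from a filename, here is a minimal sketch. The helper name guess_fps_format is hypothetical and not part of chemfp, and returning None for an unrecognized name is an assumption of this sketch rather than documented behavior.

```python
def guess_fps_format(filename):
    """Hypothetical helper: map a filename to "fps", "fps.gz", or None."""
    name = filename.lower()
    if name.endswith(".fps.gz"):
        return "fps.gz"
    if name.endswith(".fps"):
        return "fps"
    # Assumption: the caller decides what to do with unrecognized names.
    return None
```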

The result is an FPSReader. Here’s an example of printing the contents of the file:

import chemfp

reader = chemfp.open("example.fps.gz")
for id, fp in reader:
    print id, fp.encode("hex")
Parameters:
  • source (A filename string, a file object, or None) – The fingerprint source.
  • format (string, or None) – The file format and optional compression.
Returns:

an FPSReader

load_fingerprints

chemfp.load_fingerprints(reader, metadata=None, reorder=True, alignment=None)

Load all of the fingerprints into an in-memory FingerprintArena data structure

The FingerprintArena data structure reads all of the fingerprints and identifiers from ‘reader’ and stores them in an in-memory data structure which supports fast similarity searches.

If ‘reader’ is a string or implements “read” then the contents will be parsed with the ‘chemfp.open’ function. Otherwise it must support iteration returning (id, fingerprint) pairs. ‘metadata’ contains the metadata for the arena. If not specified then ‘reader.metadata’ is used.

The loader may reorder the fingerprints for better search performance. To prevent ordering, use reorder=False.

The ‘alignment’ option specifies the data alignment and padding size for each fingerprint. A value of 8 means that each fingerprint will start on an 8 byte alignment, and use storage space which is a multiple of 8 bytes long. The default value of None determines the best alignment based on the fingerprint size and available popcount methods.
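The padding arithmetic can be sketched as follows. The function names are hypothetical, and the sketch assumes the storage block itself begins on an aligned address; it only shows how a fingerprint size is rounded up to a multiple of the alignment.

```python
def padded_size(num_bytes, alignment):
    """Smallest multiple of `alignment` that can hold `num_bytes` bytes."""
    return ((num_bytes + alignment - 1) // alignment) * alignment


def start_offsets(count, num_bytes, alignment):
    """Byte offset of each fingerprint when stored back to back with padding."""
    step = padded_size(num_bytes, alignment)
    return [i * step for i in range(count)]
```

For example, a 21 byte fingerprint with alignment=8 would occupy 24 bytes, so successive fingerprints start at offsets 0, 24, 48, and so on.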

Parameters:
  • reader (a string, file object, or (id, fingerprint) iterator) – An iterator over (id, fingerprint) pairs
  • metadata (Metadata) – The metadata for the arena, if other than reader.metadata
  • reorder (True or False) – Specify if fingerprints should be reordered for better performance
  • alignment – Alignment size (both data alignment and padding)
Returns:

FingerprintArena

read_structure_fingerprints

chemfp.read_structure_fingerprints(type, source=None, format=None, id_tag=None, errors="strict")

Read structures from ‘source’ and return the corresponding ids and fingerprints

This returns a FingerprintReader which can be iterated over to get the id and fingerprint for each read structure record. The fingerprint generated depends on the value of ‘type’. Structures are read from ‘source’, which can either be the structure filename, or None to read from stdin.

‘type’ contains the information about how to turn a structure into a fingerprint. It can be a string or a metadata instance. String values look like “OpenBabel-FP2/1”, “OpenEye-Path”, and “OpenEye-Path/1 min_bonds=0 max_bonds=5 atype=DefaultAtom btype=DefaultBond”. Default values are used for unspecified parameters. Use a Metadata instance with ‘type’ and ‘aromaticity’ values set in order to pass aromaticity information to OpenEye.

If ‘format’ is None then the structure file format and compression are determined by the filename’s extension(s), defaulting to uncompressed SMILES if that is not possible. Otherwise ‘format’ may be “smi” or “sdf” optionally followed by “.gz” or “.bz2” to indicate compression. The OpenBabel and OpenEye toolkits also support additional formats.

If ‘id_tag’ is None, then the record id is based on the title field for the given format. If the input format is “sdf” then ‘id_tag’ specifies the tag field containing the identifier. (Only the first line is used for multi-line values.) For example, ChEBI omits the title from the SD files and stores the id after the “> <ChEBI ID>” line. In that case, use id_tag = “ChEBI ID”.
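To make the ‘id_tag’ lookup concrete, here is a minimal pure-Python sketch of pulling the first line of a tag value out of one SD record. The helper get_sd_tag and the sample record are made up for illustration; chemfp's own parser is not shown here and may differ in detail.

```python
def get_sd_tag(record, tag):
    """Return the first line of the value stored under '> <tag>', or None."""
    lines = record.splitlines()
    header = "<" + tag + ">"
    for i, line in enumerate(lines):
        # Data headers start with ">" and carry the tag name in angle brackets.
        if line.startswith(">") and header in line:
            if i + 1 < len(lines):
                return lines[i + 1]
            return None
    return None


# A made-up record with an empty title line and the id stored in a tag.
record = "\n".join([
    "",
    "  sketch",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <ChEBI ID>",
    "CHEBI:776",
    "",
    "$$$$",
])
```

Here get_sd_tag(record, "ChEBI ID") picks up the identifier even though the title line is empty.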

‘aromaticity’ specifies the aromaticity model, and is only appropriate for OEChem. It must be a string like “openeye” or “daylight”.

Here is an example of using fingerprints generated from a structure file:

fp_reader = chemfp.read_structure_fingerprints("OpenBabel-FP4/1", "example.sdf.gz")
print "Each fingerprint has", fp_reader.metadata.num_bits, "bits"
for (id, fp) in fp_reader:
    print id, fp.encode("hex")
Parameters:
  • type (string or Metadata) – information about how to convert the input structure into a fingerprint
  • source (A filename (as a string), a file object, or None to read from stdin) – The structure data source.
  • format (string, or None to autodetect based on the source) – The file format and optional compression. Examples: ‘smi’ and ‘sdf.gz’
  • id_tag (string, or None to use the default title for the given format) – The tag containing the record id. Example: ‘ChEBI ID’. Only valid for SD files.
Returns:

a FingerprintReader

count_tanimoto_hits

chemfp.count_tanimoto_hits(queries, targets, threshold=0.7, arena_size=100)

Count the number of targets within ‘threshold’ of each query term

For each query in ‘queries’, count the number of targets in ‘targets’ which are at least ‘threshold’ similar to the query. This function returns an iterator containing the (query_id, count) pairs.

Example:

queries = chemfp.open("queries.fps")
targets = chemfp.load_fingerprints("targets.fps.gz")
for (query_id, count) in chemfp.count_tanimoto_hits(queries, targets, threshold=0.9):
    print query_id, "has", count, "neighbors with at least 0.9 similarity"

Internally, queries are processed in batches of size ‘arena_size’. A small batch size uses less overall memory and has lower processing latency, while a large batch size has better overall performance. Use arena_size=None to process the input as a single batch.

Note: the FPSReader may be used as a target but it can only process one batch, and searching a FingerprintArena is faster if you have more than a few queries.

Parameters:
  • queries (any fingerprint container) – The query fingerprints.
  • targets (FingerprintArena or the slower FPSReader) – The target fingerprints.
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
  • arena_size (a positive integer, or None) – The number of queries to process in a batch
Returns:

An iterator containing (query_id, count) pairs, one for each query
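The counting itself can be illustrated without chemfp. The following is a slow pure-Python sketch, not the library's implementation; the function names are hypothetical, fingerprints are assumed to be equal-length byte strings, and scoring two all-zero fingerprints as 0.0 is an assumption of the sketch.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte fingerprints."""
    pairs = list(zip(bytearray(fp1), bytearray(fp2)))
    both = sum(bin(x & y).count("1") for x, y in pairs)
    either = sum(bin(x | y).count("1") for x, y in pairs)
    # Assumption: two all-zero fingerprints score 0.0 rather than raising.
    return float(both) / either if either else 0.0


def count_hits(queries, targets, threshold=0.7):
    """Yield (query_id, count) for each (id, fingerprint) query."""
    targets = list(targets)  # the targets are scanned once per query
    for query_id, query_fp in queries:
        count = sum(1 for _, target_fp in targets
                    if tanimoto(query_fp, target_fp) >= threshold)
        yield query_id, count
```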

Metadata

class chemfp.Metadata(num_bits=None, num_bytes=None, type=None, aromaticity=None, software=None, sources=None, date=None)

Store information about a set of fingerprints

The metadata attributes are:
num_bits:
number of bits in the fingerprint
num_bytes:
number of bytes in the fingerprint
type:
fingerprint type
aromaticity:
aromaticity model (only used with OEChem)
software:
software used to make the fingerprints
sources:
list of sources used to make the fingerprint
date:
timestamp of when the fingerprints were made

FingerprintReader (base class)

class chemfp.FingerprintReader(metadata)

Initialize with a Metadata instance

Base class for all chemfp objects holding fingerprint records

All FingerprintReader instances have a ‘metadata’ attribute containing a Metadata and can be iterated over to get the (id, fingerprint) for each record.

iter(arena)

iterate over the (id, fingerprint) pairs

iter_arenas

FingerprintReader.iter_arenas(arena_size=1000)

iterate through ‘arena_size’ fingerprints at a time

This iterates through the fingerprints ‘arena_size’ at a time, yielding a FingerprintArena for each group. Working with arenas is often faster than processing one fingerprint at a time, and more memory efficient than processing all fingerprints at once.

If arena_size=None then this makes an iterator containing a single arena containing all of the input.

Parameters: arena_size (positive integer, or None) – The number of fingerprints to put into an arena.
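The grouping behavior can be sketched with plain lists standing in for arenas. The function iter_batches is a hypothetical stand-in, not a chemfp function; it only shows how an input is cut into groups of at most a given size, with None meaning a single group.

```python
from itertools import islice


def iter_batches(records, batch_size=1000):
    """Yield lists of up to `batch_size` records; None means one big batch."""
    iterator = iter(records)
    if batch_size is None:
        yield list(iterator)
        return
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```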

chemfp.arena module

FingerprintArena instances are returned as part of the public API but should not be constructed directly.

FingerprintArena

Implements the FingerprintReader interface.

class chemfp.arena.FingerprintArena(... do not call directly ...)

Stores fingerprints in a contiguous block of memory

The public attributes are:
metadata
Metadata about the fingerprints
ids
list of identifiers, ordered by position

arena.ids

A list of the fingerprint identifiers, in the same order as the fingerprints.

len(arena)

Number of fingerprint records in the FingerprintArena

arena[i]

Return the (id, fingerprint) at position i

copy

FingerprintArena.copy(indices=None, reorder=None)

Create a new arena using either all or some of the fingerprints in this arena

By default this creates a new arena. The fingerprint data block and ids may be shared with the original arena, which makes this a shallow copy. If the original arena is a slice, or “sub-arena” of an arena, then the copy will allocate new space to store just the fingerprints in the slice and use its own list for the ids.

The indices parameter, if not None, is an iterable which contains the indices of the fingerprint records to copy. Duplicates are allowed, though discouraged.

If indices are specified then the default reorder=None or a reorder=True will reorder the fingerprints for the new arena by popcount. This improves overall search performance. With reorder=False, the fingerprints will be in order given by the indices.

If indices are not given, then the default is to preserve the order type of the original arena. Otherwise reorder=True will always reorder and reorder=False will leave them in the current order.

Parameters:
  • indices (iterable containing integers, or None) – indices of the records to copy into the new arena
  • reorder (True to reorder, False to leave in input order, None for default action) – describes how to order the fingerprints
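The case where indices are given can be sketched on a plain list of (id, fingerprint) pairs. This is an illustration only: select_records is a hypothetical name, and sorting by increasing popcount is an assumption, since the text above says only that the fingerprints are reordered by popcount.

```python
def popcount(fp):
    """Number of bits set in a byte fingerprint."""
    return sum(bin(byte).count("1") for byte in bytearray(fp))


def select_records(records, indices, reorder=None):
    """Pick records by index, then sort by popcount unless reorder is False."""
    selected = [records[i] for i in indices]
    if reorder is None or reorder:
        # Assumption: ascending popcount order; sorted() keeps ties stable.
        selected = sorted(selected, key=lambda rec: popcount(rec[1]))
    return selected
```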

get_by_id

FingerprintArena.get_by_id(id)

Given the record identifier, return the (id, fingerprint) tuple or None if not present

get_fingerprint_by_id

FingerprintArena.get_fingerprint_by_id(id)

Given the record identifier, return its fingerprint or None if not present

get_index_by_id

FingerprintArena.get_index_by_id(id)

Given the record identifier, return the record index or None if not present

iter(arena)

Iterate over the (id, fingerprint) contents of the arena

iter_arenas

FingerprintArena.iter_arenas(arena_size=1000)

iterate through arena_size fingerprints at a time

This iterates through the fingerprints arena_size at a time, yielding a FingerprintArena for each group. Working with arenas is often faster than processing one fingerprint at a time, and more memory efficient than processing all fingerprints at once.

If arena_size=None then this makes an iterator containing a single arena containing all of the input.

Parameters: arena_size (positive integer, or None) – The number of fingerprints to put into an arena.

save

FingerprintArena.save(destination)

Save the arena contents to the given filename or file object

count_tanimoto_hits_fp

FingerprintArena.count_tanimoto_hits_fp(query_fp, threshold=0.7)

Count the fingerprints which are similar enough to the query fingerprint

DEPRECATED: Use chemfp.search.count_tanimoto_hits_fp instead.

Return the number of fingerprints in this arena which are at least threshold similar to the query fingerprint query_fp.

Parameters:
  • query_fp (byte string) – query fingerprint
  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)
Returns:

integer count

count_tanimoto_hits_arena

FingerprintArena.count_tanimoto_hits_arena(query_arena, threshold=0.7)

Count the fingerprints which are similar enough to each query fingerprint

DEPRECATED: Use chemfp.search.count_tanimoto_hits_arena or chemfp.search.count_tanimoto_hits_symmetric instead.

Returns an iterator containing the (query_id, count) for each fingerprint in queries, where query_id is the query fingerprint id and count is the number of fingerprints found which are at least threshold similar to the query.

The order of results is the same as the order of the queries. For efficiency reasons, arena_size queries are processed at a time.

Parameters:
  • queries – query fingerprints
  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)
  • arena_size (positive integer) – number of queries to process at a time (default: 100)
Returns:

list of (query_id, integer count) pairs, one for each query

threshold_tanimoto_search_fp

FingerprintArena.threshold_tanimoto_search_fp(query_fp, threshold=0.7)

Find the fingerprints which are similar enough to the query fingerprint

DEPRECATED: Use chemfp.search.threshold_tanimoto_search_fp instead.

Find all of the fingerprints in this arena which are at least threshold similar to the query fingerprint query_fp. The hits are returned as a list containing (id, score) tuples in arbitrary order.

Parameters:
  • query_fp (byte string) – query fingerprint
  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)
Returns:

list of (id, score) tuples

threshold_tanimoto_search_arena

FingerprintArena.threshold_tanimoto_search_arena(query_arena, threshold=0.7)

Find the fingerprints which are similar to each of the query fingerprints

DEPRECATED: Use chemfp.search.threshold_tanimoto_search_arena or chemfp.search.threshold_tanimoto_search_symmetric instead.

For each fingerprint in the query_arena, find all of the fingerprints in this arena which are at least threshold similar. The hits are returned as a SearchResults instance.

Parameters:
  • query_arena (FingerprintArena) – query arena
  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)
Returns:

SearchResults

knearest_tanimoto_search_fp

FingerprintArena.knearest_tanimoto_search_fp(query_fp, k=3, threshold=0.7)

Find the k-nearest fingerprints which are similar to the query fingerprint

DEPRECATED: Use chemfp.search.knearest_tanimoto_search_fp instead.

Find the k fingerprints in this arena which are most similar to the query fingerprint query_fp and which are at least threshold similar to the query. The hits are returned as a list of (id, score) tuples sorted with the highest similarity first. Ties are broken arbitrarily.

Parameters:
  • query_fp (byte string) – query fingerprint
  • k (positive integer) – number of nearest neighbors to find (default: 3)
  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)
Returns:

SearchResults

knearest_tanimoto_search_arena

FingerprintArena.knearest_tanimoto_search_arena(query_arena, k=3, threshold=0.7)

Find the k-nearest fingerprints which are similar to each of the query fingerprints

DEPRECATED: Use chemfp.search.knearest_tanimoto_search_arena or chemfp.search.knearest_tanimoto_search_symmetric instead.

For each fingerprint in the query_arena, find the k fingerprints in this arena which are most similar and which are at least threshold similar to the query fingerprint. The hits are returned as a SearchResults instance, where the hits for each query are sorted with the highest similarity first. Ties are broken arbitrarily.

Parameters:
  • query_arena (FingerprintArena) – query arena
  • k (positive integer) – number of nearest neighbors to find (default: 3)
  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)
Returns:

SearchResults

chemfp.search module

The following functions and classes are in the chemfp.search module.

Module functions

The *_fp functions search a query fingerprint against a target arena. The *_arena functions search a query arena against a target arena. The *_symmetric functions use the same arena as query and target, and exclude matching a fingerprint against itself.

count_tanimoto_hits_fp

chemfp.search.count_tanimoto_hits_fp(query_fp, target_arena, threshold=0.7)

Count the number of hits in target_arena at least threshold similar to the query_fp

Example:

query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
targets = chemfp.load_fingerprints("targets.fps")
print chemfp.search.count_tanimoto_hits_fp(query_fp, targets, threshold=0.1)
Parameters:
  • query_fp (a byte string) – the query fingerprint
  • target_arena – the target arena
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
Returns:

an integer count

count_tanimoto_hits_arena

chemfp.search.count_tanimoto_hits_arena(query_arena, target_arena, threshold=0.7)

For each fingerprint in query_arena, count the number of hits in target_arena at least threshold similar to it

Example:

queries = chemfp.load_fingerprints("queries.fps")
targets = chemfp.load_fingerprints("targets.fps")
counts = chemfp.search.count_tanimoto_hits_arena(queries, targets, threshold=0.1)
print counts[:10]

The result is implementation specific. You’ll always be able to get its length and do an index lookup to get an integer count. Currently it’s a ctypes array of longs, but it could be an array.array or Python list in the future.

Parameters:
  • query_arena (a FingerprintArena) – The query fingerprints.
  • target_arena – The target fingerprints.
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
Returns:

an array of counts

count_tanimoto_hits_symmetric

chemfp.search.count_tanimoto_hits_symmetric(arena, threshold=0.7, batch_size=100)

For each fingerprint in the arena, count the number of other fingerprints at least threshold similar to it

A fingerprint never matches itself.

The computation can take a long time, and Python cannot respond to a ^C until the underlying call returns. To keep the search interruptible, only batch_size rows are processed at a time, with a check for ^C between batches.

Example:

arena = chemfp.load_fingerprints("targets.fps")
counts = chemfp.search.count_tanimoto_hits_symmetric(arena, threshold=0.2)
print counts[:10]

The result object is implementation specific. You’ll always be able to get its length and do an index lookup to get an integer count. Currently it’s a ctypes array of longs, but it could be an array.array or Python list in the future.

Parameters:
  • arena (a FingerprintArena) – the set of fingerprints
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
  • batch_size (integer) – the number of rows to process before checking for a ^C
Returns:

an array of counts

threshold_tanimoto_search_fp

chemfp.search.threshold_tanimoto_search_fp(query_fp, target_arena, threshold=0.7)

Search for fingerprint hits in target_arena which are at least threshold similar to query_fp

The hits in the returned SearchResult are in arbitrary order.

Example:

query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
targets = chemfp.load_fingerprints("targets.fps")
print list(chemfp.search.threshold_tanimoto_search_fp(query_fp, targets, threshold=0.15))
Parameters:
  • query_fp (a byte string) – the query fingerprint
  • target_arena – the target arena
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
Returns:

a SearchResult

threshold_tanimoto_search_arena

chemfp.search.threshold_tanimoto_search_arena(query_arena, target_arena, threshold=0.7)

Search for the hits in the target_arena at least threshold similar to the fingerprints in query_arena

The hits in the returned SearchResults are in arbitrary order.

Example:

queries = chemfp.load_fingerprints("queries.fps")
targets = chemfp.load_fingerprints("targets.fps")
results = chemfp.search.threshold_tanimoto_search_arena(queries, targets, threshold=0.5)
for query_id, query_hits in zip(queries.ids, results):
    if len(query_hits) > 0:
        print query_id, "->", ", ".join(query_hits.get_ids())
Parameters:
  • query_arena (a FingerprintArena) – The query fingerprints.
  • target_arena (a FingerprintArena) – The target fingerprints.
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
Returns:

a SearchResults instance

threshold_tanimoto_search_symmetric

chemfp.search.threshold_tanimoto_search_symmetric(arena, threshold=0.7, include_lower_triangle=True, batch_size=100)

Search for the hits in the arena at least threshold similar to the fingerprints in the arena

When include_lower_triangle is True, compute the upper-triangle similarities, then copy the results to get the full set of results. When include_lower_triangle is False, only compute the upper triangle.

The computation can take a long time, and Python cannot respond to a ^C until the underlying call returns. To keep the search interruptible, only batch_size rows are processed at a time, with a check for ^C between batches.

The hits in the returned SearchResults are in arbitrary order.

Example:

arena = chemfp.load_fingerprints("queries.fps")
full_result = chemfp.search.threshold_tanimoto_search_symmetric(arena, threshold=0.2)
upper_triangle = chemfp.search.threshold_tanimoto_search_symmetric(
          arena, threshold=0.2, include_lower_triangle=False)
assert sum(map(len, full_result)) == sum(map(len, upper_triangle))*2
Parameters:
  • arena (a FingerprintArena) – the set of fingerprints
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
  • include_lower_triangle (boolean) – if False, compute only the upper triangle, otherwise use symmetry to compute the full matrix
  • batch_size (integer) – the number of rows to process before checking for a ^C
Returns:

a SearchResults instance
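The relationship between the upper triangle and the full result can be sketched in pure Python. This is a slow illustration with hypothetical names, not chemfp's implementation; each row is a plain list of (target index, score) pairs, and a fingerprint is never compared with itself.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte fingerprints."""
    pairs = list(zip(bytearray(fp1), bytearray(fp2)))
    both = sum(bin(x & y).count("1") for x, y in pairs)
    either = sum(bin(x | y).count("1") for x, y in pairs)
    return float(both) / either if either else 0.0  # assumption: 0/0 -> 0.0


def threshold_search_symmetric(fps, threshold=0.7, include_lower_triangle=True):
    """Return one list of (index, score) hits per fingerprint."""
    rows = [[] for _ in fps]
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):  # upper triangle only, so i != j
            score = tanimoto(fps[i], fps[j])
            if score >= threshold:
                rows[i].append((j, score))
                if include_lower_triangle:
                    rows[j].append((i, score))  # mirror the hit
    return rows
```

Because every upper-triangle hit is mirrored exactly once, the full result holds twice as many hits as the upper triangle alone.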

knearest_tanimoto_search_fp

chemfp.search.knearest_tanimoto_search_fp(query_fp, target_arena, k=3, threshold=0.7)

Search for k-nearest hits in target_arena which are at least threshold similar to query_fp

The hits in the SearchResult are ordered by decreasing similarity score.

Example:

query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
targets = chemfp.load_fingerprints("targets.fps")
print list(chemfp.search.knearest_tanimoto_search_fp(query_fp, targets, k=3, threshold=0.0))
Parameters:
  • query_fp (a byte string) – the query fingerprint
  • target_arena – the target arena
  • k (positive integer) – the number of nearest neighbors to find.
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
Returns:

a SearchResult
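How ‘k’ and ‘threshold’ combine can be sketched with the standard library. The function knearest below is a hypothetical pure-Python stand-in that returns a plain list of (target index, score) pairs instead of a SearchResult; it first drops targets below the threshold and then keeps the k best.

```python
import heapq


def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte fingerprints."""
    pairs = list(zip(bytearray(fp1), bytearray(fp2)))
    both = sum(bin(x & y).count("1") for x, y in pairs)
    either = sum(bin(x | y).count("1") for x, y in pairs)
    return float(both) / either if either else 0.0  # assumption: 0/0 -> 0.0


def knearest(query_fp, targets, k=3, threshold=0.7):
    """Return up to k (index, score) hits, highest score first."""
    hits = []
    for index, target_fp in enumerate(targets):
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            hits.append((index, score))
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])
```

Fewer than k hits come back when not enough targets reach the threshold.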

knearest_tanimoto_search_arena

chemfp.search.knearest_tanimoto_search_arena(query_arena, target_arena, k=3, threshold=0.7)

Search for the k nearest hits in the target_arena at least threshold similar to the fingerprints in query_arena

The hits in the SearchResults are ordered by decreasing similarity score.

Example:

queries = chemfp.load_fingerprints("queries.fps")
targets = chemfp.load_fingerprints("targets.fps")
results = chemfp.search.knearest_tanimoto_search_arena(queries, targets, k=3, threshold=0.5)
for query_id, query_hits in zip(queries.ids, results):
    if len(query_hits) >= 2:
        print query_id, "->", ", ".join(query_hits.get_ids())
Parameters:
  • query_arena (a FingerprintArena) – The query fingerprints.
  • target_arena (a FingerprintArena) – The target fingerprints.
  • k (positive integer) – the number of nearest neighbors to find.
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
Returns:

a SearchResults instance

knearest_tanimoto_search_symmetric

chemfp.search.knearest_tanimoto_search_symmetric(arena, k=3, threshold=0.7, batch_size=100)

Search for the k-nearest hits in the arena at least threshold similar to the fingerprints in the arena

The computation can take a long time, and Python cannot respond to a ^C until the underlying call returns. To keep the search interruptible, only batch_size rows are processed at a time, with a check for ^C between batches.

The hits in the SearchResults are ordered by decreasing similarity score.

Example:

arena = chemfp.load_fingerprints("queries.fps")
results = chemfp.search.knearest_tanimoto_search_symmetric(arena, k=3, threshold=0.8)
for (query_id, hits) in zip(arena.ids, results):
    print query_id, "->", ", ".join(("%s %.2f" % hit) for hit in  hits.get_ids_and_scores())
Parameters:
  • arena (a FingerprintArena) – the set of fingerprints
  • k (positive integer) – the number of nearest neighbors to find.
  • threshold (float between 0.0 and 1.0, inclusive) – The minimum score threshold.
  • include_lower_triangle (boolean) – if False, compute only the upper triangle, otherwise use symmetry to compute the full matrix
  • batch_size (integer) – the number of rows to process before checking for a ^C
Returns:

a SearchResults instance

SearchResults

class chemfp.search.SearchResults(... do not call directly ...)

Search results for a list of query fingerprints against a target arena

This acts like a list of SearchResult elements, with the ability to iterate over each search result, look them up by index, and get the number of rows.

In addition, there are helper methods to iterate over each hit and to get the hit indices, scores, and identifiers directly as Python lists, sort the list contents, and more.

len(results)

The number of rows in the SearchResults

results[i]

Get the ‘i’th SearchResult

clear_all

SearchResults.clear_all()

Remove all hits from all of the search results

count_all

SearchResults.count_all(min_score=None, max_score=None, interval="[]")

Count the number of hits in all rows with a score between min_score and max_score

cumulative_score_all

SearchResults.cumulative_score_all(min_score=None, max_score=None, interval="[]")

The sum of all scores in all rows which are between min_score and max_score

Using the default parameters this returns the sum of all of the scores in all of the results. With a specified range this returns the sum of all of the scores in that range. The cumulative score is also known as the raw score.

The default min_score of None is equivalent to -infinity. The default max_score of None is equivalent to +infinity.

The interval parameter describes the interval end conditions. The default of “[]” uses a closed interval, where min_score <= score <= max_score. The interval “()” uses the open interval where min_score < score < max_score. The half-open/half-closed intervals “(]” and “[)” are also supported.

Parameters:
  • min_score (a float, or None for -infinity) – the minimum score in the range.
  • max_score (a float, or None for +infinity) – the maximum score in the range.
  • interval (one of “[]”, “()”, “(]”, “[)”) – specify if the end points are open or closed.
Returns:

a floating point value
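The interval rules can be written out as a small pure-Python sketch. The function names are hypothetical and operate on a plain list of scores rather than on a SearchResults instance.

```python
def in_interval(score, min_score=None, max_score=None, interval="[]"):
    """True if score lies in the range, honoring open and closed end points."""
    if interval not in ("[]", "()", "(]", "[)"):
        raise ValueError("unsupported interval: %r" % (interval,))
    if min_score is not None:
        if score < min_score or (interval[0] == "(" and score == min_score):
            return False
    if max_score is not None:
        if score > max_score or (interval[1] == ")" and score == max_score):
            return False
    return True


def count(scores, min_score=None, max_score=None, interval="[]"):
    """Number of scores inside the interval."""
    return sum(1 for s in scores if in_interval(s, min_score, max_score, interval))


def cumulative_score(scores, min_score=None, max_score=None, interval="[]"):
    """Sum of the scores inside the interval."""
    return sum(s for s in scores if in_interval(s, min_score, max_score, interval))
```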

iter(results)

Iterate over each SearchResult hit

iter_ids

SearchResults.iter_ids()

For each hit, yield the list of target identifiers

iter_ids_and_scores

SearchResults.iter_ids_and_scores()

For each hit, yield the list of (target id, score) tuples

iter_indices

SearchResults.iter_indices()

For each hit, yield the list of target indices

iter_indices_and_scores

SearchResults.iter_indices_and_scores()

For each hit, yield the list of (target index, score) tuples

iter_scores

SearchResults.iter_scores()

For each hit, yield the list of target scores

iter_hits

REMOVED: Renamed to iter_ids_and_scores for 1.1.

reorder_all

SearchResults.reorder_all()

Reorder the hits for all of the rows based on the requested order.

The available orderings are:
  • increasing-score – sort by increasing score
  • decreasing-score – sort by decreasing score
  • increasing-index – sort by increasing target index
  • decreasing-index – sort by decreasing target index
  • move-closest-first – move the hit with the highest score to the first position
  • reverse – reverse the current ordering
Parameters: ordering – the name of the ordering to use

SearchResult

class chemfp.search.SearchResult(... do not call directly ...)

Search results for a query fingerprint against a target arena.

The result contains a list of hits. Hits contain a target index, score, and optional target ids. The hits can be reordered based on score or index.

len(result)

The number of hits

iter(result)

Iterate through the pairs of (target index, score) using the current ordering

clear

SearchResult.clear()

Remove all hits from this result

count

SearchResult.count(min_score=None, max_score=None, interval="[]")

Count the number of hits with a score between min_score and max_score

Using the default parameters this returns the number of hits in the result.

The default min_score of None is equivalent to -infinity. The default max_score of None is equivalent to +infinity.

The interval parameter describes the interval end conditions. The default of “[]” uses a closed interval, where min_score <= score <= max_score. The interval “()” uses the open interval where min_score < score < max_score. The half-open/half-closed intervals “(]” and “[)” are also supported.

Parameters:
  • min_score (a float, or None for -infinity) – the minimum score in the range.
  • max_score (a float, or None for +infinity) – the maximum score in the range.
  • interval (one of “[]”, “()”, “(]”, “[)”) – specify if the end points are open or closed.
Returns:

an integer count

cumulative_score

SearchResult.cumulative_score(min_score=None, max_score=None, interval="[]")

The sum of the scores which are between min_score and max_score

Using the default parameters this returns the sum of all of the scores in the result. With a specified range this returns the sum of all of the scores in that range. The cumulative score is also known as the raw score.

The default min_score of None is equivalent to -infinity. The default max_score of None is equivalent to +infinity.

The interval parameter describes the interval end conditions. The default of “[]” uses a closed interval, where min_score <= score <= max_score. The interval “()” uses the open interval where min_score < score < max_score. The half-open/half-closed intervals “(]” and “[)” are also supported.

Parameters:
  • min_score (a float, or None for -infinity) – the minimum score in the range.
  • max_score (a float, or None for +infinity) – the maximum score in the range.
  • interval (one of “[]”, “()”, “(]”, “[)”) – specify if the end points are open or closed.
Returns:

a floating point value

get_ids

SearchResult.get_ids()

The list of target identifiers (if available), in the current ordering

get_ids_and_scores

SearchResult.get_ids_and_scores()

The list of (target identifier, target score) pairs, in the current ordering

Raises a TypeError if the target IDs are not available.

get_indices

SearchResult.get_indices()

The list of target indices, in the current ordering.

get_indices_and_scores

SearchResult.get_indices_and_scores()

The list of (target index, score) pairs, in the current ordering

get_scores

SearchResult.get_scores()

The list of target scores, in the current ordering

reorder

SearchResult.reorder(ordering="decreasing-score")

Reorder the hits based on the requested ordering.

The available orderings are:
  • increasing-score – sort by increasing score
  • decreasing-score – sort by decreasing score
  • increasing-index – sort by increasing target index
  • decreasing-index – sort by decreasing target index
  • move-closest-first – move the hit with the highest score to the first position
  • reverse – reverse the current ordering
Parameters: ordering – the name of the ordering to use
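The orderings can be illustrated on a plain list of (target index, score) pairs. This sketch uses a hypothetical function reorder_hits that returns a new list, whereas the method above reorders the result itself. Treating move-closest-first as a swap between the best hit and the first hit is an assumption; the description only says the best hit ends up first.

```python
def reorder_hits(hits, ordering="decreasing-score"):
    """Return a new list of (index, score) hits in the requested order."""
    if ordering == "increasing-score":
        return sorted(hits, key=lambda hit: hit[1])
    if ordering == "decreasing-score":
        return sorted(hits, key=lambda hit: hit[1], reverse=True)
    if ordering == "increasing-index":
        return sorted(hits, key=lambda hit: hit[0])
    if ordering == "decreasing-index":
        return sorted(hits, key=lambda hit: hit[0], reverse=True)
    if ordering == "reverse":
        return list(reversed(hits))
    if ordering == "move-closest-first":
        result = list(hits)
        if result:
            best = max(range(len(result)), key=lambda i: result[i][1])
            # Assumption: swap the best hit with whatever is in first place.
            result[0], result[best] = result[best], result[0]
        return result
    raise ValueError("unknown ordering: %r" % (ordering,))
```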

chemfp.bitops module

The following functions are in the chemfp.bitops module. They provide low-level bit operations on byte and hex fingerprints.

byte_popcount

chemfp.bitops.byte_popcount(fp)

Return the number of bits set in a byte fingerprint

byte_intersect_popcount

chemfp.bitops.byte_intersect_popcount(fp1, fp2)

Return the number of bits set in the intersection of the two byte fingerprints

byte_tanimoto

chemfp.bitops.byte_tanimoto(fp1, fp2)

Compute the Tanimoto similarity between two byte fingerprints

byte_contains

chemfp.bitops.byte_contains(super_fp, sub_fp)

Return 1 if the on bits of sub_fp are also 1 bits in super_fp
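For readers who want to see what the byte operations compute, here is a slow pure-Python sketch. The py_ prefixed names are hypothetical stand-ins, not the chemfp.bitops functions; the sketch assumes equal-length byte strings and assumes that two all-zero fingerprints have a Tanimoto of 0.0.

```python
def py_byte_popcount(fp):
    """Number of bits set in a byte fingerprint."""
    return sum(bin(byte).count("1") for byte in bytearray(fp))


def py_byte_intersect_popcount(fp1, fp2):
    """Number of bits set in both fingerprints."""
    return sum(bin(x & y).count("1")
               for x, y in zip(bytearray(fp1), bytearray(fp2)))


def py_byte_tanimoto(fp1, fp2):
    """Shared bits divided by the bits set in either fingerprint."""
    both = py_byte_intersect_popcount(fp1, fp2)
    either = py_byte_popcount(fp1) + py_byte_popcount(fp2) - both
    return float(both) / either if either else 0.0  # assumption: 0/0 -> 0.0


def py_byte_contains(super_fp, sub_fp):
    """1 if every on bit of sub_fp is also on in super_fp, otherwise 0."""
    return int(all((x & y) == y
                   for x, y in zip(bytearray(super_fp), bytearray(sub_fp))))
```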

hex_isvalid

chemfp.bitops.hex_isvalid(s)

Return 1 if the string is a valid hex fingerprint, otherwise 0

hex_popcount

chemfp.bitops.hex_popcount(fp)

Return the number of bits set in a hex fingerprint, or -1 for non-hex strings

hex_intersect_popcount

chemfp.bitops.hex_intersect_popcount(fp1, fp2)

Return the number of bits set in the intersection of the two hex fingerprints, or -1 if either string is a non-hex string

hex_tanimoto

chemfp.bitops.hex_tanimoto(fp1, fp2)

Compute the Tanimoto similarity between two hex fingerprints. Return a float between 0.0 and 1.0, or -1.0 if either string is not a hex fingerprint
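The error-code convention of the hex functions can be sketched in pure Python. The py_ prefixed names are hypothetical stand-ins for illustration. The sketch treats a string as valid when binascii.unhexlify accepts it, which is an assumption about what counts as a hex fingerprint, and it assumes both inputs have the same length.

```python
import binascii


def _decode(s):
    """Return the bytes for a hex string, or None if it cannot be decoded."""
    try:
        return bytearray(binascii.unhexlify(s))
    except (TypeError, ValueError):
        return None


def py_hex_isvalid(s):
    """1 if the string decodes as hex, otherwise 0."""
    return 0 if _decode(s) is None else 1


def py_hex_popcount(fp):
    """Number of bits set, or -1 for a non-hex string."""
    data = _decode(fp)
    if data is None:
        return -1
    return sum(bin(byte).count("1") for byte in data)


def py_hex_intersect_popcount(fp1, fp2):
    """Number of bits set in both, or -1 if either string is non-hex."""
    a, b = _decode(fp1), _decode(fp2)
    if a is None or b is None:
        return -1
    return sum(bin(x & y).count("1") for x, y in zip(a, b))


def py_hex_tanimoto(fp1, fp2):
    """Tanimoto similarity, or -1.0 if either string is non-hex."""
    a, b = _decode(fp1), _decode(fp2)
    if a is None or b is None:
        return -1.0
    both = sum(bin(x & y).count("1") for x, y in zip(a, b))
    either = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return float(both) / either if either else 0.0  # assumption: 0/0 -> 0.0
```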

hex_contains