
chemfp.fps_search module

FPS file similarity search and search result implementations.

Chemfp implements similarity search methods that work directly on FPS files. These might be useful in a streaming environment, where the FPS data is generated on the fly and not saved, and where you have at most a handful of queries. In that case an FPS search is faster than an arena-based search. The FPS parsing overhead is about the same, but the FPS search does not have the arena creation or memory overhead that an in-memory search has.
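
The streaming idea can be sketched in pure Python. This is a minimal illustration, not chemfp's parser. It assumes the usual FPS layout: "#" header lines, followed by data lines that hold a hex-encoded fingerprint, a tab, and an id. The helper name `iter_fps_records` is hypothetical. Each record is decoded and passed on one line at a time, so no arena is ever built.

```python
from typing import Iterable, Iterator, Tuple


def iter_fps_records(lines: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (id, fingerprint bytes) pairs, one FPS data line at a time.

    Hypothetical helper: only the current line is held in memory.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip header lines such as "#FPS1" or "#num_bits=16", and blank lines.
        if not line or line.startswith("#"):
            continue
        # The first field is the hex fingerprint and the second is the id.
        hex_fp, fp_id = line.split("\t")[:2]
        yield fp_id, bytes.fromhex(hex_fp)


fps_text = "#FPS1\n#num_bits=16\n0f00\tmol1\nff01\tmol2\n"
for fp_id, fp in iter_fps_records(fps_text.splitlines(keepends=True)):
    print(fp_id, fp.hex())  # prints "mol1 0f00", then "mol2 ff01"
```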

class chemfp.fps_search.FPSSearchResult(ids, scores, query_id=None)

Bases: object

Search results for a query fingerprint against a target FPS reader.

The result contains a list of hits. Each hit has a target id and a score. The hits can be reordered by id or by score.

__getitem__(i)

Return the (id, score) pair for the given index.

__iter__()

Iterate through the (target id, score) pairs using the current ordering.

__len__()

Return the number of hits.

clear()

Remove all ids and scores from this result.

Deprecated since version 4.0: This function will likely be removed in a future version of chemfp as it doesn’t seem useful.

get_ids()

The list of target identifiers in the current ordering.

This returns the same list each time.

get_ids_and_scores()

The list of (target identifier, target score) pairs, in the current ordering.

get_scores()

The list of target scores, in the current ordering.

This returns the same list each time.

query_id = None

The id of the query fingerprint, if available, otherwise None.

reorder(order='decreasing-score')

Reorder the hits based on the requested ordering.

The available orderings are:
  • increasing-score - sort by increasing score
  • decreasing-score - sort by decreasing score
  • increasing-score-plus - sort by increasing score, break ties by increasing index
  • decreasing-score-plus - sort by decreasing score, break ties by increasing index
  • increasing-id - sort by increasing target id
  • decreasing-id - sort by decreasing target id
  • move-closest-first - move the hit with the highest score to the first position
  • reverse - reverse the current ordering
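
The orderings can be illustrated with a small pure-Python sketch that works on a list of (id, score) pairs. It is not chemfp's implementation, and `reorder_hits` is a hypothetical name. An FPS reader has no arena index, so for the "-plus" orderings the sketch assumes that "index" means the hit's current position in the list. For freshly found hits, that position is the file order.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (target id, score)


def reorder_hits(hits: List[Hit], order: str = "decreasing-score") -> List[Hit]:
    """Return a reordered copy of the hits (illustrative sketch only)."""
    # Remember each hit's current position for the "-plus" tie-breaking rule.
    indexed = list(enumerate(hits))
    if order == "increasing-score":
        return sorted(hits, key=lambda hit: hit[1])
    if order == "decreasing-score":
        return sorted(hits, key=lambda hit: hit[1], reverse=True)
    if order == "increasing-score-plus":
        # Ties on score are broken by increasing position.
        return [hit for _, hit in sorted(indexed, key=lambda p: (p[1][1], p[0]))]
    if order == "decreasing-score-plus":
        # Negating the score lets one ascending sort also keep positions ascending.
        return [hit for _, hit in sorted(indexed, key=lambda p: (-p[1][1], p[0]))]
    if order == "increasing-id":
        return sorted(hits, key=lambda hit: hit[0])
    if order == "decreasing-id":
        return sorted(hits, key=lambda hit: hit[0], reverse=True)
    if order == "move-closest-first":
        if not hits:
            return []
        # Only the best hit moves. The others keep their relative order.
        best = max(range(len(hits)), key=lambda i: hits[i][1])
        return [hits[best]] + hits[:best] + hits[best + 1:]
    if order == "reverse":
        return hits[::-1]
    raise ValueError(f"unsupported order: {order!r}")


hits = [("a", 0.5), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(reorder_hits(hits, "decreasing-score-plus"))
```

Python's sort is stable, so in this sketch the plain score orderings also keep tied hits in their original order. The documentation above only promises that tie-break for the "-plus" variants.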

scores = None

The similarity scores for the hits.

to_pandas(*, columns=['target_id', 'score'])

Return a pandas DataFrame with the target ids and scores.

The first column contains the ids and the second column contains the scores. The default column headers are “target_id” and “score”. Use columns to specify different headers.

Parameters: columns (a list of two strings) – column names for the returned DataFrame
Returns: a pandas DataFrame

class chemfp.fps_search.FPSSearchResults(query_ids, results)

Bases: object

Search results for a query arena against a target FPS reader.

__getitem__(i)

Return the search result (an FPSSearchResult) at the given index.

__iter__()

Iterate through the search results.

__len__()

Return the number of search results in this collection.

clear_all()

Remove all hits from all of the search results.

Deprecated since version 4.0: This function will likely be removed in a future version of chemfp as it doesn’t seem useful.

iter_ids()

For each search result, yield the list of target identifiers.

iter_ids_and_scores()

For each search result, yield the list of target (id, score) tuples.

iter_scores()

For each search result, yield the list of target scores.

query_ids = None

A list of query ids, one for each result. This comes from the query arena’s ids.

reorder_all(order='decreasing-score')

Reorder the hits for all of the rows based on the requested order.

The available orderings are:

  • increasing-score - sort by increasing score
  • decreasing-score - sort by decreasing score
  • increasing-id - sort by increasing target id
  • decreasing-id - sort by decreasing target id
  • move-closest-first - move the hit with the highest score to the first position
  • reverse - reverse the current ordering

to_pandas(*, columns=['query_id', 'target_id', 'score'], empty=('*', None))

Return a pandas DataFrame with query_id, target_id and score columns.

Each query has zero or more hits. Each hit becomes a row in the output table, with the query id in the first column, the hit target id in the second, and the hit score in the third.

If a query has no hits then by default a row is added with the query id, ‘*’ as the target id, and None as the score (which pandas will treat as a NA value).

Use empty to specify different behavior for queries with no hits. If empty is None then no row is added to the table. If empty is a 2-element tuple the first element is used as the target id and the second is used as the score.

Parameters:
  • columns (a list of three strings) – column names for the returned DataFrame
  • empty (a 2-element tuple, or None) – the target id and score used for queries with no hits, or None to not include a row for that case
Returns:

a pandas DataFrame
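
The row-building rule described above, including the `empty` placeholder, can be sketched without pandas. `result_rows` is a hypothetical helper, and each result is modeled as a plain list of (target id, score) pairs rather than a chemfp object. The returned rows could then be passed to `pandas.DataFrame(rows, columns=columns)`.

```python
from typing import List, Optional, Sequence, Tuple

Hit = Tuple[str, float]
Row = Tuple[str, Optional[str], Optional[float]]


def result_rows(
    query_ids: Sequence[str],
    results: Sequence[List[Hit]],
    empty: Optional[Tuple[Optional[str], Optional[float]]] = ("*", None),
) -> List[Row]:
    """Flatten per-query hits into (query_id, target_id, score) rows."""
    rows: List[Row] = []
    for query_id, hits in zip(query_ids, results):
        if hits:
            # One row per hit, in the hit's current order.
            rows.extend((query_id, target_id, score) for target_id, score in hits)
        elif empty is not None:
            # A query with no hits gets a single placeholder row.
            empty_id, empty_score = empty
            rows.append((query_id, empty_id, empty_score))
    return rows


query_ids = ["q1", "q2"]
results = [[("t1", 0.8), ("t2", 0.75)], []]
print(result_rows(query_ids, results))              # "q2" gets a ("q2", "*", None) row
print(result_rows(query_ids, results, empty=None))  # "q2" is left out
```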

chemfp.fps_search.count_tanimoto_hits_fp(query_fp, target_reader, threshold=0.7)

Count the number of hits in target_reader at least threshold similar to the query_fp.

This uses Tanimoto similarity.
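
Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below is illustrative and not chemfp's optimized code. It treats fingerprints as equal-length byte strings and counts the targets at or above the threshold. Scoring two all-zero fingerprints as 0.0 is an assumption made here to avoid division by zero.

```python
from typing import Iterable


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")   # bits set in either fingerprint
    common = bin(a & b).count("1")  # bits set in both fingerprints
    # Assumption: two all-zero fingerprints score 0.0 instead of raising.
    return common / union if union else 0.0


def count_tanimoto_hits(query_fp: bytes, target_fps: Iterable[bytes], threshold: float = 0.7) -> int:
    """Count targets whose similarity to the query is at least the threshold."""
    return sum(1 for target_fp in target_fps if tanimoto(query_fp, target_fp) >= threshold)


query = bytes.fromhex("0f")
targets = [bytes.fromhex(h) for h in ("0f", "07", "f0", "ff")]
print(count_tanimoto_hits(query, targets))  # scores are 1.0, 0.75, 0.0, 0.5, so this prints 2
```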

chemfp.fps_search.count_tanimoto_hits_arena(query_arena, target_reader, threshold=0.7)

For each fingerprint in query_arena, count the number of hits in target_reader at least threshold similar to it.

This uses Tanimoto similarity.

chemfp.fps_search.threshold_tanimoto_search_fp(query_fp, target_reader, threshold=0.7)

Find matches in the target reader which are at least threshold similar to the query fingerprint.

This uses Tanimoto similarity.

Returns: an FPSSearchResult instance containing the result.
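
A threshold search uses the same scan, but it keeps each qualifying id and score. In this hypothetical sketch, the hits are returned as parallel id and score lists, which mirror the `ids` and `scores` arguments that FPSSearchResult is constructed from. The hits come out in target order, which is one reason a result may need `reorder()` afterwards.

```python
from typing import Iterable, List, Tuple


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    a, b = int.from_bytes(fp1, "big"), int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    # Assumption: two all-zero fingerprints score 0.0.
    return bin(a & b).count("1") / union if union else 0.0


def threshold_search(
    query_fp: bytes,
    targets: Iterable[Tuple[str, bytes]],
    threshold: float = 0.7,
) -> Tuple[List[str], List[float]]:
    """Scan (id, fingerprint) pairs once and keep the hits at or above threshold."""
    ids: List[str] = []
    scores: List[float] = []
    for target_id, target_fp in targets:
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            ids.append(target_id)
            scores.append(score)
    return ids, scores


targets = [("t1", bytes.fromhex("07")), ("t2", bytes.fromhex("f0")), ("t3", bytes.fromhex("0f"))]
print(threshold_search(bytes.fromhex("0f"), targets))  # (['t1', 't3'], [0.75, 1.0])
```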

chemfp.fps_search.threshold_tanimoto_search_arena(query_arena, target_reader, threshold)

Find matches in the target reader which are at least threshold similar to the query arena fingerprints.

This uses Tanimoto similarity.

Returns: an FPSSearchResults instance containing a list of query results.

chemfp.fps_search.knearest_tanimoto_search_fp(query_fp, target_reader, k=3, threshold=0.0)

Find the nearest k matches in the target reader which are at least threshold similar to the query fingerprint.

This uses Tanimoto similarity.

Returns: an FPSSearchResult instance containing the result.
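
A k-nearest search only has to remember the best k hits seen so far, which suits a single pass over an FPS stream. This sketch uses Python's `heapq.nlargest` to keep that bounded set. It is an assumption about the general approach, not a description of chemfp's internals. The function `knearest_search` is hypothetical and returns hits in decreasing score order.

```python
import heapq
from typing import Iterable, List, Tuple


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    a, b = int.from_bytes(fp1, "big"), int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    # Assumption: two all-zero fingerprints score 0.0.
    return bin(a & b).count("1") / union if union else 0.0


def knearest_search(
    query_fp: bytes,
    targets: Iterable[Tuple[str, bytes]],
    k: int = 3,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (id, score) hits at or above threshold, best first."""
    hits = ((target_id, tanimoto(query_fp, target_fp)) for target_id, target_fp in targets)
    above = (hit for hit in hits if hit[1] >= threshold)
    # nlargest holds at most k candidates while it consumes the stream.
    return heapq.nlargest(k, above, key=lambda hit: hit[1])


pairs = [("t1", "ff"), ("t2", "0f"), ("t3", "f0"), ("t4", "07")]
targets = [(name, bytes.fromhex(h)) for name, h in pairs]
print(knearest_search(bytes.fromhex("0f"), targets, k=2))  # [('t2', 1.0), ('t4', 0.75)]
```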

chemfp.fps_search.knearest_tanimoto_search_arena(query_arena, target_reader, k=3, threshold=0.0)

Find the nearest k matches in the target reader which are at least threshold similar to the query arena fingerprints.

This uses Tanimoto similarity.

Returns: an FPSSearchResults instance containing a list of query results.

chemfp.fps_search.count_tversky_hits_fp(query_fp, target_reader, threshold, alpha=1.0, beta=1.0)

Count the number of hits in target_reader at least threshold similar to the query_fp.

This uses Tversky similarity with the specified values of alpha and beta.
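
Tversky similarity generalizes Tanimoto by weighting the bits that are unique to each fingerprint. The sketch below assumes the common definition c / (alpha*q + beta*t + c). Here, c counts shared bits, q counts bits set only in the query, and t counts bits set only in the target. With alpha = beta = 1.0 (the defaults above), it reduces to Tanimoto. With alpha = beta = 0.5, it equals Dice. The function name and the zero-denominator rule are assumptions, so check chemfp's own documentation for its exact conventions.

```python
def tversky(query_fp: bytes, target_fp: bytes, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky similarity of two equal-length byte-string fingerprints."""
    q = int.from_bytes(query_fp, "big")
    t = int.from_bytes(target_fp, "big")
    common = bin(q & t).count("1")        # bits set in both
    query_only = bin(q & ~t).count("1")   # bits set only in the query
    target_only = bin(t & ~q).count("1")  # bits set only in the target
    denominator = alpha * query_only + beta * target_only + common
    # Assumption: a zero denominator scores 0.0 instead of raising.
    return common / denominator if denominator else 0.0


query, target = bytes.fromhex("0f"), bytes.fromhex("07")
print(tversky(query, target))            # alpha = beta = 1.0: same as Tanimoto, 0.75
print(tversky(query, target, 0.5, 0.5))  # same as Dice, 2*3/(4+3)
```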

chemfp.fps_search.count_tversky_hits_arena(query_arena, target_reader, threshold, alpha=1.0, beta=1.0)

For each fingerprint in query_arena, count the number of hits in target_reader at least threshold similar to it.

This uses Tversky similarity with the specified values of alpha and beta.

chemfp.fps_search.threshold_tversky_search_fp(query_fp, target_reader, threshold, alpha=1.0, beta=1.0)

Find matches in the target reader which are at least threshold similar to the query fingerprint.

This uses Tversky similarity with the specified values of alpha and beta.

Returns: an FPSSearchResult instance containing the result.

chemfp.fps_search.threshold_tversky_search_arena(query_arena, target_reader, threshold, alpha=1.0, beta=1.0)

Find matches in the target reader which are at least threshold similar to the query arena fingerprints.

This uses Tversky similarity with the specified values of alpha and beta.

Returns: an FPSSearchResults instance containing a list of query results.

chemfp.fps_search.knearest_tversky_search_fp(query_fp, target_reader, k=3, threshold=0.0, alpha=1.0, beta=1.0)

Find the nearest k matches in the target reader which are at least threshold similar to the query fingerprint.

This uses Tversky similarity with the specified values of alpha and beta.

Returns: an FPSSearchResult instance containing the result.

chemfp.fps_search.knearest_tversky_search_arena(query_arena, target_reader, k=3, threshold=0.0, alpha=1.0, beta=1.0)

Find the nearest k matches in the target reader which are at least threshold similar to the query arena fingerprints.

This uses Tversky similarity with the specified values of alpha and beta.

Returns: an FPSSearchResults instance containing a list of query results.

© Copyright 2010-2022, Andrew Dalke Revision d7fe9e71f535.
