chemfp.fps_search module¶
FPS file similarity search and search result implementations.
Chemfp implements similarity search methods that work directly on FPS files. This can be useful in a streaming environment (where the FPS data is generated on the fly and not saved) when you have at most a handful of queries. In that case, an FPS search is faster than an arena-based search: the FPS parsing overhead is about the same, but the FPS search does not have the arena creation or memory overhead that an in-memory search would have.
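To make the streaming idea concrete, here is a minimal pure-Python sketch of a single-pass threshold search over FPS-style lines. It is not chemfp's implementation and does not use the chemfp API. The helper names (iter_fps_records, tanimoto, stream_threshold_search) are hypothetical, and the sketch assumes each data line is a hex-encoded fingerprint, a tab, and an id, with header lines starting with "#".

```python
from typing import Iterable, Iterator, List, Tuple


def iter_fps_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Lazily yield (target id, fingerprint as int) from FPS-style lines.

    Hypothetical helper: assumes "<hex fingerprint>\t<id>" data lines.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        hex_fp, target_id = line.split("\t")[:2]
        yield target_id, int(hex_fp, 16)


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit sets stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0.0
    return bin(a & b).count("1") / union


def stream_threshold_search(
    query_fp: int, lines: Iterable[str], threshold: float = 0.7
) -> Tuple[List[str], List[float]]:
    """Score each record as it is read and keep only the hits.

    No in-memory collection of all targets is ever built.
    """
    ids, scores = [], []
    for target_id, target_fp in iter_fps_records(lines):
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            ids.append(target_id)
            scores.append(score)
    return ids, scores


# Usage: any iterable of lines works, including an open file or a pipe.
fps_lines = ["#FPS1\n", "ff\tA\n", "0f\tB\n", "00\tC\n"]
hit_ids, hit_scores = stream_threshold_search(0x0F, fps_lines, threshold=0.5)
```

Because the targets are consumed one record at a time, memory use depends on the number of hits rather than on the size of the target data set.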
-
class
chemfp.fps_search.
FPSSearchResult
(ids, scores, query_id=None)¶ Bases:
object
Search results for a query fingerprint against a target FPS reader.
The result contains a list of hits. Each hit has a target id and a score. The hits can be reordered based on id or score.
-
__getitem__
(item)¶ Return the (id, score) pair for the given index, or pairs if item is a slice
-
__iter__
()¶ Iterate through the pairs of (target id, score) using the current ordering
-
__len__
()¶ Return the number of hits.
-
get_ids
()¶ The list of target identifiers in the current ordering.
This returns the same list each time.
-
get_ids_and_scores
()¶ The list of (target identifier, target score) pairs, in the current ordering
-
get_scores
()¶ The list of target scores, in the current ordering.
This returns the same list each time.
-
query_id
= None¶ The id of the query fingerprint, if available, otherwise None.
-
reorder
(order='decreasing-score')¶ Reorder the hits based on the requested ordering.
The available orderings are:
- increasing-score - sort by increasing score
- decreasing-score - sort by decreasing score
- increasing-score-plus - sort by increasing score, break ties by increasing index
- decreasing-score-plus - sort by decreasing score, break ties by increasing index
- increasing-id - sort by increasing target id
- decreasing-id - sort by decreasing target id
- move-closest-first - move the hit with the highest score to the first position
- reverse - reverse the current ordering
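The tie-breaking orderings are easiest to see on a small example. The following is an illustrative sketch on plain (id, score) tuples, not chemfp's code. The function reorder_hits is hypothetical, it covers only some of the orderings listed above, and it assumes that "index" means the hit's position in the current list.

```python
def reorder_hits(hits, order="decreasing-score-plus"):
    """Return a new list of (id, score) pairs in the requested order.

    Hypothetical helper; the tie-break "index" is assumed to be the
    hit's position in the input list.
    """
    indexed = list(enumerate(hits))  # (index, (id, score))
    if order == "increasing-score-plus":
        indexed.sort(key=lambda item: (item[1][1], item[0]))
    elif order == "decreasing-score-plus":
        indexed.sort(key=lambda item: (-item[1][1], item[0]))
    elif order == "increasing-id":
        indexed.sort(key=lambda item: item[1][0])
    elif order == "decreasing-id":
        indexed.sort(key=lambda item: item[1][0], reverse=True)
    elif order == "reverse":
        indexed.reverse()
    else:
        raise ValueError("unsupported ordering in this sketch: %r" % (order,))
    return [hit for _, hit in indexed]


hits = [("c", 0.5), ("a", 0.9), ("b", 0.5)]
by_score = reorder_hits(hits, "decreasing-score-plus")
by_id = reorder_hits(hits, "increasing-id")
```

In by_score the two hits with score 0.5 keep their original relative order ("c" before "b"), which is what makes the "-plus" orderings reproducible.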
-
scores
= None¶ The similarity scores for the hits.
-
to_pandas
(*, columns=['target_id', 'score'])¶ Return a pandas DataFrame with the target ids and scores
The first column contains the ids, the second column contains the scores. The default column headers are “target_id” and “score”. Use columns to specify different headers.
Parameters: columns (a list of two strings) – column names for the returned DataFrame Returns: a pandas DataFrame
-
-
class
chemfp.fps_search.
FPSSearchResults
(query_ids, results)¶ Bases:
object
Search results for a query arena against a target FPS reader.
-
__getitem__
(i)¶ Return a
FPSSearchResult
by index
-
__iter__
()¶ Iterate through the search results
-
__len__
()¶ The number of search results in this collection
-
iter_ids
()¶ For each search result, yield the list of target identifiers
-
iter_ids_and_scores
()¶ For each search result, yield the list of target (id, score) tuples
-
iter_scores
()¶ For each search result, yield the list of target scores
-
query_ids
= None¶ A list of query ids, one for each result. This comes from the query arena’s ids.
-
reorder_all
(order='decreasing-score')¶ Reorder the hits for all of the rows based on the requested order.
The available orderings are:
- increasing-score - sort by increasing score
- decreasing-score - sort by decreasing score
- increasing-id - sort by increasing target id
- decreasing-id - sort by decreasing target id
- move-closest-first - move the hit with the highest score to the first position
- reverse - reverse the current ordering
-
to_pandas
(*, columns=['query_id', 'target_id', 'score'], empty=('*', None))¶ Return a pandas DataFrame with query_id, target_id and score columns.
Each query has zero or more hits. Each hit becomes a row in the output table, with the query id in the first column, the hit target id in the second, and the hit score in the third.
If a query has no hits then by default a row is added with the query id, ‘*’ as the target id, and None as the score (which pandas will treat as a NA value).
Use empty to specify different behavior for queries with no hits. If empty is None then no row is added to the table. If empty is a 2-element tuple the first element is used as the target id and the second is used as the score.
Parameters: - columns (a list of three strings) – column names for the returned DataFrame
- empty (a 2-element tuple, or None) – the target id and score used for queries with no hits, or None to not include a row for that case
Returns: a pandas DataFrame
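The row layout and the effect of empty can be sketched without pandas. The function below is a hypothetical stand-in that builds the rows as plain tuples from per-query lists of (target id, score) pairs; it only mirrors the behavior described above and is not how chemfp builds the DataFrame.

```python
def flatten_results(query_ids, results, empty=("*", None)):
    """Build (query_id, target_id, score) rows, one per hit.

    Hypothetical helper. `results` is a list with one list of
    (target_id, score) pairs per query. Queries without hits get a
    placeholder row unless `empty` is None.
    """
    rows = []
    for query_id, hits in zip(query_ids, results):
        if hits:
            rows.extend((query_id, tid, score) for tid, score in hits)
        elif empty is not None:
            placeholder_id, placeholder_score = empty
            rows.append((query_id, placeholder_id, placeholder_score))
    return rows


query_ids = ["q1", "q2"]
results = [[("t1", 0.9), ("t2", 0.8)], []]

default_rows = flatten_results(query_ids, results)
no_placeholder_rows = flatten_results(query_ids, results, empty=None)
```

Rows in this shape could be passed to a DataFrame constructor together with the three column names.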
-
-
chemfp.fps_search.
count_tanimoto_hits_fp
(query_fp, target_reader, threshold=0.7)¶ Count the number of hits in target_reader at least threshold similar to the query_fp
This uses Tanimoto similarity.
-
chemfp.fps_search.
count_tanimoto_hits_arena
(query_arena, target_reader, threshold=0.7)¶ For each fingerprint in query_arena, count the number of hits in target_reader at least threshold similar to it
This uses Tanimoto similarity.
-
chemfp.fps_search.
threshold_tanimoto_search_fp
(query_fp, target_reader, threshold=0.7)¶ Find matches in the target reader which are at least threshold similar to the query fingerprint
Returns: an FPSSearchResult
instance containing the result.
-
chemfp.fps_search.
threshold_tanimoto_search_arena
(query_arena, target_reader, threshold)¶ Find matches in the target reader which are at least threshold similar to the query arena fingerprints
Returns: an FPSSearchResults
instance containing a list of query results.
-
chemfp.fps_search.
knearest_tanimoto_search_fp
(query_fp, target_reader, k=3, threshold=0.0)¶ Find the nearest k matches in the target reader which are at least threshold similar to the query fingerprint
This uses Tanimoto similarity.
Returns: an FPSSearchResult
instance containing the result.
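A k-nearest search over a stream can keep only the best k candidates seen so far. The sketch below shows one way to do that with the standard library's heapq module. It is an illustration under stated assumptions, not chemfp's algorithm: knearest and tanimoto are hypothetical helpers, and the targets are given as (id, int) pairs rather than a chemfp reader.

```python
import heapq


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit sets stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0.0
    return bin(a & b).count("1") / union


def knearest(query_fp, targets, k=3, threshold=0.0):
    """Return up to k (id, score) pairs with score >= threshold,
    ordered by decreasing score.

    `targets` may be any iterable (for example a generator), so the
    full target set never has to be held in memory.
    """
    scored = ((tanimoto(query_fp, fp), target_id) for target_id, fp in targets)
    kept = (pair for pair in scored if pair[0] >= threshold)
    best = heapq.nlargest(k, kept, key=lambda pair: pair[0])
    return [(target_id, score) for score, target_id in best]


targets = [("A", 0xFF), ("B", 0x0F), ("C", 0x07), ("D", 0x00)]
top2 = knearest(0x0F, targets, k=2)
top3_filtered = knearest(0x0F, targets, k=3, threshold=0.6)
```

Note that the threshold can return fewer than k hits, as in top3_filtered, where only two targets score at least 0.6.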
-
chemfp.fps_search.
knearest_tanimoto_search_arena
(query_arena, target_reader, k=3, threshold=0.0)¶ Find the nearest k matches in the target reader which are at least threshold similar to the query arena fingerprints
This uses Tanimoto similarity.
Returns: an FPSSearchResults
instance containing a list of query results.
-
chemfp.fps_search.
count_tversky_hits_fp
(query_fp, target_reader, threshold, alpha=1.0, beta=1.0)¶ Count the number of hits in target_reader at least threshold similar to the query_fp
This uses Tversky similarity with the specified values of alpha and beta.
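For reference, the Tversky index generalizes Tanimoto by weighting the bits unique to each fingerprint. The sketch below uses the standard definition on Python ints. The function name tversky is hypothetical, and two points are assumptions rather than documented chemfp behavior: that alpha weights the query-only bits and beta the target-only bits, and that a zero denominator scores 0.0.

```python
def popcount(value: int) -> int:
    """Number of set bits in a non-negative int."""
    return bin(value).count("1")


def tversky(query_fp: int, target_fp: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky index: c / (c + alpha * only_query + beta * only_target).

    Assumption: alpha applies to bits found only in the query and
    beta to bits found only in the target.
    """
    common = popcount(query_fp & target_fp)
    only_query = popcount(query_fp & ~target_fp)
    only_target = popcount(target_fp & ~query_fp)
    denominator = common + alpha * only_query + beta * only_target
    if denominator == 0:
        return 0.0  # assumption: undefined cases score 0.0
    return common / denominator


query, target = 0x3C, 0x0F  # 2 bits in common, 2 unique to each side

as_tanimoto = tversky(query, target, alpha=1.0, beta=1.0)  # 2 / (2 + 2 + 2)
as_dice = tversky(query, target, alpha=0.5, beta=0.5)      # 2 / (2 + 1 + 1)
```

With alpha = beta = 1.0 the formula reduces to Tanimoto similarity, and with alpha = beta = 0.5 it reduces to the Dice coefficient.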
-
chemfp.fps_search.
count_tversky_hits_arena
(query_arena, target_reader, threshold, alpha=1.0, beta=1.0)¶ For each fingerprint in query_arena, count the number of hits in target_reader at least threshold similar to it
This uses Tversky similarity with the specified values of alpha and beta.
-
chemfp.fps_search.
threshold_tversky_search_fp
(query_fp, target_reader, threshold, alpha=1.0, beta=1.0)¶ Find matches in the target reader which are at least threshold similar to the query fingerprint
This uses Tversky similarity with the specified values of alpha and beta.
Returns: an FPSSearchResult
instance containing the result.
-
chemfp.fps_search.
threshold_tversky_search_arena
(query_arena, target_reader, threshold, alpha=1.0, beta=1.0)¶ Find matches in the target reader which are at least threshold similar to the query arena fingerprints
This uses Tversky similarity with the specified values of alpha and beta.
Returns: an FPSSearchResults
instance containing a list of query results.
-
chemfp.fps_search.
knearest_tversky_search_fp
(query_fp, target_reader, k=3, threshold=0.0, alpha=1.0, beta=1.0)¶ Find the nearest k matches in the target reader which are at least threshold similar to the query fingerprint
This uses Tversky similarity with the specified values of alpha and beta.
Returns: an FPSSearchResult
instance containing the result.
-
chemfp.fps_search.
knearest_tversky_search_arena
(query_arena, target_reader, k=3, threshold=0.0, alpha=1.0, beta=1.0)¶ Find the nearest k matches in the target reader which are at least threshold similar to the query arena fingerprints
This uses Tversky similarity with the specified values of alpha and beta.
Returns: an FPSSearchResults
instance containing a list of query results.