chemfp.bitops module

Low-level fingerprint functions and global configuration.

The bitops module contains functions that work on byte-encoded and/or hex-encoded fingerprints, for example to compute the Tanimoto similarity between two byte fingerprints or to count the number of bits set in a fingerprint.

It also contains functions to change the internal configuration of how chemfp does its bit-wise operations, and to report that configuration. Currently only two of those are part of the public API.

chemfp.bitops.byte_popcount(fp)

Return the number of bits set in the byte fingerprint fp

chemfp.bitops.byte_intersect_popcount(fp1, fp2)

Return the number of bits set in the intersection of the two byte fingerprints fp1 and fp2

chemfp.bitops.byte_union_popcount(fp1, fp2)

Return the number of bits set in the union of the two byte fingerprints fp1 and fp2

chemfp.bitops.byte_xor_popcount(fp1, fp2)

Return the number of bits set in the xor of the two byte fingerprints fp1 and fp2 (also called the Hamming, Manhattan, or taxicab distance)
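The four popcount functions above can be illustrated with a plain-Python sketch. It does not call chemfp, the helper names are made up for this example, and it assumes both fingerprints have the same length:

```python
def popcount(fp):
    # Sum the number of 1 bits in each byte of the fingerprint.
    return sum(bin(byte).count("1") for byte in fp)

def intersect_popcount(fp1, fp2):
    # Bits that are on in both fingerprints.
    return sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))

def union_popcount(fp1, fp2):
    # Bits that are on in at least one fingerprint.
    return sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))

def xor_popcount(fp1, fp2):
    # Bits that differ between the fingerprints (the Hamming distance).
    return sum(bin(a ^ b).count("1") for a, b in zip(fp1, fp2))

fp1 = bytes([0b00001111, 0b00000001])
fp2 = bytes([0b00111100, 0b00000001])

print(popcount(fp1))                 # 5
print(intersect_popcount(fp1, fp2))  # 3
print(union_popcount(fp1, fp2))      # 7
print(xor_popcount(fp1, fp2))        # 4
```

Note that the xor popcount equals the union popcount minus the intersection popcount.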

chemfp.bitops.byte_tanimoto(fp1, fp2)

Compute the Tanimoto similarity between the two byte fingerprints fp1 and fp2

chemfp.bitops.byte_tversky(fp1, fp2, alpha=1.0, beta=1.0)

Compute the Tversky index between the two byte fingerprints fp1 and fp2
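As a rough illustration of the two similarity measures, here is a plain-Python sketch using the standard definitions. It does not call chemfp. The helper names are made up for this example, and returning 0.0 when the denominator is zero is an assumption of the sketch, not a statement about how chemfp handles that case:

```python
def popcount(fp):
    return sum(bin(byte).count("1") for byte in fp)

def tversky(fp1, fp2, alpha=1.0, beta=1.0):
    a = popcount(fp1)  # on bits in fp1
    b = popcount(fp2)  # on bits in fp2
    c = sum(bin(x & y).count("1") for x, y in zip(fp1, fp2))  # on bits in common
    denominator = alpha * (a - c) + beta * (b - c) + c
    if denominator == 0:
        return 0.0  # assumption of this sketch for the 0/0 case
    return c / denominator

def tanimoto(fp1, fp2):
    # With alpha = beta = 1.0 the Tversky index reduces to c / (a + b - c).
    return tversky(fp1, fp2, 1.0, 1.0)

fp1 = bytes([0b00001111, 0b00000001])
fp2 = bytes([0b00111100, 0b00000001])

print(tanimoto(fp1, fp2))           # 3/7
print(tversky(fp1, fp2, 0.5, 0.5))  # 3/5, the Dice coefficient
```

With alpha = beta = 0.5 the Tversky index is the Dice coefficient, 2c / (a + b).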

chemfp.bitops.byte_contains(sub_fp, super_fp)

Return 1 if all of the on bits of sub_fp are also on bits in super_fp, that is, if super_fp contains sub_fp.
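The containment test amounts to checking that sub_fp & super_fp == sub_fp. A plain-Python sketch of that check follows. It does not call chemfp, the helper name is made up for this example, and it returns a bool rather than 1:

```python
def contains(sub_fp, super_fp):
    # Every on bit of sub_fp must also be on in super_fp.
    return all((a & b) == a for a, b in zip(sub_fp, super_fp))

sub_fp = bytes([0b00000001, 0b00010000])
super_fp = bytes([0b00000011, 0b00110000])

print(contains(sub_fp, super_fp))  # True
print(contains(super_fp, sub_fp))  # False
```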

chemfp.bitops.byte_contains_bit(fp, bit_index)

Return True if the given bit position is on, otherwise False

chemfp.bitops.byte_to_bitlist(fp)

Return a sorted list of the on-bit positions in the byte fingerprint

chemfp.bitops.byte_from_bitlist(bitlist[, num_bits=1024])

Convert a list of bit positions into a byte fingerprint, including modulo folding
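The following plain-Python sketch shows what "modulo folding" means for the bitlist conversions. It does not call chemfp and the helper names are made up for this example. The bit layout (bit 0 is the least significant bit of the first byte) is an assumption of the sketch:

```python
def from_bitlist(bitlist, num_bits=1024):
    fp = bytearray((num_bits + 7) // 8)
    for bit in bitlist:
        bit %= num_bits                 # modulo folding: wrap large positions
        fp[bit // 8] |= 1 << (bit % 8)  # assumed layout: least significant bit first
    return bytes(fp)

def to_bitlist(fp):
    # Walk the bytes in order, so the result is already sorted.
    return [
        i * 8 + j
        for i, byte in enumerate(fp)
        for j in range(8)
        if (byte >> j) & 1
    ]

fp = from_bitlist([0, 9, 17], num_bits=16)  # 17 folds to 17 % 16 == 1
print(fp)              # b'\x03\x02'
print(to_bitlist(fp))  # [0, 1, 9]
```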

chemfp.bitops.byte_intersect(fp1, fp2)

Return the intersection of the two byte strings, fp1 & fp2

chemfp.bitops.byte_union(fp1, fp2)

Return the union of the two byte strings, fp1 | fp2

chemfp.bitops.byte_xor(fp1, fp2)

Return the xor (absolute difference) between the two byte strings, fp1 ^ fp2
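Unlike the popcount variants, these three functions return a new fingerprint. A plain-Python sketch of the byte-by-byte operation follows. It does not call chemfp, the `combine` helper is made up for this example, and raising ValueError on a length mismatch is an assumption of the sketch:

```python
import operator

def combine(fp1, fp2, op):
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    # Apply the bitwise operator to each pair of bytes.
    return bytes(op(a, b) for a, b in zip(fp1, fp2))

fp1 = b"\x0f\x01"
fp2 = b"\x3c\x01"

print(combine(fp1, fp2, operator.and_))  # b'\x0c\x01'
print(combine(fp1, fp2, operator.or_))   # b'?\x01' (0x3f is the ASCII '?')
print(combine(fp1, fp2, operator.xor))   # b'3\x00' (0x33 is the ASCII '3')
```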

chemfp.bitops.hex_isvalid(s)

Return 1 if the string s is a valid hex fingerprint, otherwise 0

chemfp.bitops.hex_popcount(fp)

Return the number of bits set in a hex fingerprint fp, or -1 for non-hex strings

chemfp.bitops.hex_intersect_popcount(fp1, fp2)

Return the number of bits set in the intersection of the two hex fingerprints fp1 and fp2, or raise a ValueError if either string is a non-hex string

chemfp.bitops.hex_tanimoto(fp1, fp2)

Compute the Tanimoto similarity between two hex fingerprints. Return a float between 0.0 and 1.0, or raise a ValueError if either string is not a hex fingerprint

chemfp.bitops.hex_tversky(fp1, fp2, alpha=1.0, beta=1.0)

Compute the Tversky index between two hex fingerprints. Return a float between 0.0 and 1.0, or raise a ValueError if either string is not a hex fingerprint

chemfp.bitops.hex_contains(sub_fp, super_fp)

Return 1 if the on bits of sub_fp are also on bits in super_fp, otherwise 0. Return -1 if either string is not a hex fingerprint

chemfp.bitops.hex_contains_bit(fp, bit_index)

Return True if the given bit position is on, otherwise False.

This function does not validate that the hex fingerprint is actually in hex.

chemfp.bitops.byte_hex_tanimoto(fp1, fp2)

Compute the Tanimoto similarity between the byte fingerprint fp1 and the hex fingerprint fp2. Return a float between 0.0 and 1.0, or raise a ValueError if fp2 is not a hex fingerprint

chemfp.bitops.byte_hex_tversky(fp1, fp2, alpha=1.0, beta=1.0)

Compute the Tversky index between the byte fingerprint fp1 and the hex fingerprint fp2. Return a float between 0.0 and 1.0, or raise a ValueError if fp2 is not a hex fingerprint

chemfp.bitops.hex_to_bitlist(fp)

Return a sorted list of the on-bit positions in the hex fingerprint

chemfp.bitops.hex_from_bitlist(bitlist[, num_bits=1024])

Convert a list of bit positions into a hex fingerprint, including modulo folding

chemfp.bitops.hex_intersect(fp1, fp2)

Return the intersection of the two hex strings, fp1 & fp2. Raises a ValueError for non-hex fingerprints.

chemfp.bitops.hex_union(fp1, fp2)

Return the union of the two hex strings, fp1 | fp2. Raises a ValueError for non-hex fingerprints.

chemfp.bitops.hex_xor(fp1, fp2)

Return the xor (absolute difference) between the two hex strings, fp1 ^ fp2. Raises a ValueError for non-hex fingerprints.

chemfp.bitops.hex_encode(s)

Encode the byte string or ASCII string to hex. Returns a text string.

chemfp.bitops.hex_encode_as_bytes(s)

Encode the byte string or ASCII string to hex. Returns a byte string.

chemfp.bitops.hex_decode(s)

Decode the hex-encoded value to a byte string
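The difference between a text-string result and a byte-string result can be seen with the Python standard library, which offers comparable conversions. This sketch uses only the standard library, not chemfp:

```python
import binascii

raw = b"\x01\xab"

as_text = raw.hex()                # hex encoding as a text string (str)
as_bytes = binascii.hexlify(raw)   # hex encoding as a byte string (bytes)
decoded = bytes.fromhex(as_text)   # back to the original byte string

print(as_text)   # 01ab
print(as_bytes)  # b'01ab'
print(decoded == raw)  # True
```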

chemfp.bitops.get_tanimoto_precision(num_bits: float) → int

Return the minimum precision needed to distinguish all Tanimoto values with num_bits bits

Given two Tanimoto values from fingerprints of length num_bits, stored as a Python 64-bit float, how many decimal digits are needed to ensure they are distinct?

For example, for 2048-bit fingerprints you need at least 7 digits:

>>> "'%.6f' and '%.6f'" % (1/1023, 1/1022)
"'0.000978' and '0.000978'"
>>> "'%.7f' and '%.7f'" % (1/1023, 1/1022)
"'0.0009775' and '0.0009785'"

This function returns the minimum number of required decimal digits, given 1 <= num_bits <= 2**18. For example:

>>> bitops.get_tanimoto_precision(2048)
7

This might be used as ('%.7f' % score) or f'{score:.7f}'.
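When the precision is held in a variable rather than written into the format string, a nested format field can be used. The sketch below uses only built-in Python formatting and reuses the two scores from the example above. The `format_score` helper is made up for this example:

```python
def format_score(score, precision):
    # The inner {precision} is filled in first, giving e.g. '.7f'.
    return f"{score:.{precision}f}"

a, b = 1 / 1023, 1 / 1022

print(format_score(a, 6), format_score(b, 6))  # 0.000978 0.000978
print(format_score(a, 7), format_score(b, 7))  # 0.0009775 0.0009785
```

In practice the precision value would come from get_tanimoto_precision(num_bits).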

Parameters: num_bits (integer between 1 and 2**18) – The number of bits in the fingerprint
Returns: the precision, as an integer

chemfp.bitops.print_report(out=sys.stdout)

Print the configuration report to the given file (default: stdout)

chemfp.bitops.use_environment_variables(environ=None, outfile=sys.stderr)

Set the chemfp configuration using environment variables or a dictionary

By default, process os.environ to find chemfp environment variables (which all start with “CHEMFP_”) and use them to configure chemfp internals.

This is meant to be used by any program which wants the same configuration system as the core chemfp components.
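To illustrate the "environment variables or a dictionary" idea, here is a plain-Python sketch of selecting the variables with the "CHEMFP_" prefix from a mapping. It does not call chemfp. The `select_chemfp_variables` helper and the variable name CHEMFP_EXAMPLE_OPTION are hypothetical placeholders, not real chemfp names:

```python
import os

def select_chemfp_variables(environ=None):
    # Fall back to the process environment when no dictionary is given.
    if environ is None:
        environ = os.environ
    return {
        name: value
        for name, value in environ.items()
        if name.startswith("CHEMFP_")
    }

# CHEMFP_EXAMPLE_OPTION is a made-up name used only for this illustration.
example = {"CHEMFP_EXAMPLE_OPTION": "1", "PATH": "/usr/bin"}
print(select_chemfp_variables(example))  # {'CHEMFP_EXAMPLE_OPTION': '1'}
```

Passing a dictionary instead of relying on os.environ makes it possible to test a configuration without changing the process environment.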