chemfp.rdkit_types module

This module should not be imported directly.

It contains internal implementation details of RDKit fingerprint generation.

This module is included in the documentation because parts of this module are returned to the user, and are part of the public API.

class chemfp.rdkit_types.RDKitBaseFingerprintType(fingerprint_kwargs)

Bases: chemfp.types.ThreadsafeFingerprinterMixin, chemfp.types.FingerprintType

from_inchi(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = None, errors: str = 'strict')

Generate a fingerprint from an InChI string and its id

This is equivalent to calling:

mol = fptype.toolkit.from_inchi(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
  • removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
  • logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
  • treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
  • delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a fingerprint byte string
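The equivalence above, combined with the errors parameter, can be sketched in plain Python. This is a minimal illustration rather than chemfp's implementation: parse_or_none, fingerprint_from_content, toy_parse, and toy_fingerprint are hypothetical stand-ins for the toolkit parser and from_mol, and the behavior shown for "log" (report the problem, then return None) is an assumption.

```python
import logging

logger = logging.getLogger("fingerprint_sketch")


def parse_or_none(parse, content, errors="strict"):
    """Call parse(content); on failure, follow the chosen error policy."""
    if errors not in ("strict", "ignore", "log"):
        raise ValueError(f"unsupported errors value: {errors!r}")
    try:
        return parse(content)
    except ValueError as err:
        if errors == "strict":
            raise  # propagate the parse failure to the caller
        if errors == "log":
            # Assumption: "log" reports the problem and then continues.
            logger.warning("could not parse %r: %s", content, err)
        return None  # "ignore" and "log" both yield no molecule


def fingerprint_from_content(parse, make_fp, content, errors="strict"):
    """Parse first, then fingerprint only when a molecule was produced."""
    mol = parse_or_none(parse, content, errors)
    return make_fp(mol) if (mol is not None) else None


# Toy stand-ins (hypothetical): a "molecule" is a non-empty stripped string,
# and its "fingerprint" is a single byte holding the string length.
def toy_parse(content):
    text = content.strip()
    if not text:
        raise ValueError("empty record")
    return text


def toy_fingerprint(mol):
    return bytes([len(mol) % 256])
```

With errors="ignore" or errors="log" a caller has to check for None before using the result; with the default "strict" the parse failure surfaces as an exception instead.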

from_inchistring(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int] = None, treatWarningAsError: bool = False, errors: str = 'strict')

Generate a fingerprint from an InChI string

This is equivalent to calling:

mol = fptype.toolkit.from_inchistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
  • removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
  • logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
  • treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a fingerprint byte string

from_molfile(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, errors: str = 'strict')

Generate a fingerprint from a molfile

This is equivalent to calling:

mol = fptype.toolkit.from_molfile(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
  • removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
  • strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a fingerprint byte string

from_sdf(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, includeTags: bool = True, errors: str = 'strict')

Generate a fingerprint from an SDF record

This is equivalent to calling:

mol = fptype.toolkit.from_sdf(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
  • removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
  • strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
  • includeTags (Boolean (default: True)) – if true, extract the structure data tag fields
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a fingerprint byte string

from_smi(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = None, errors: str = 'strict')

Generate a fingerprint from a SMILES string and its id

This is equivalent to calling:

mol = fptype.toolkit.from_smi(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
  • cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
  • delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a fingerprint byte string
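The delimiter parameter controls how a record is divided into the SMILES and its id. The following is a rough sketch of one plausible reading of the delimiter names, not chemfp's parser: split_smiles_record is a hypothetical helper, treating None like 'to_eol' is an assumption, and 'native' is left unsupported because it depends on the underlying toolkit.

```python
def split_smiles_record(line, delimiter=None):
    """Split one record into (smiles, id); id is None when absent."""
    line = line.rstrip("\r\n")
    if delimiter is None or delimiter == "to_eol":
        # Assumption: None behaves like "to_eol", where the id is
        # everything after the first run of whitespace.
        parts = line.split(None, 1)
    elif delimiter == "whitespace":
        # The id is the second whitespace-separated field; the rest is dropped.
        parts = line.split(None, 2)[:2]
    elif delimiter in ("space", " "):
        parts = line.split(" ", 2)[:2]
    elif delimiter in ("tab", "\t"):
        parts = line.split("\t", 2)[:2]
    elif delimiter == "comma":
        parts = line.split(",", 2)[:2]
    else:
        # "native" defers to the toolkit's own rules, which are not modeled here.
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    if not parts or not parts[0]:
        raise ValueError("record does not contain a SMILES")
    smiles = parts[0]
    ident = parts[1] if len(parts) > 1 else None
    return smiles, ident
```

The practical difference shows up when an id contains spaces: under the 'to_eol' reading the whole remainder of the line is the id, while 'whitespace' keeps only the first word.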

from_smiles(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, errors: str = 'strict')

Generate a fingerprint from a SMILES string

This is equivalent to calling:

mol = fptype.toolkit.from_smistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
  • cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a fingerprint byte string

from_smistring(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, errors: str = 'strict')

Generate a fingerprint from a SMILES string

This is equivalent to calling:

mol = fptype.toolkit.from_smistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
  • cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a fingerprint byte string

module = <module 'chemfp.rdkit_toolkit'>
software = ...

a description of the RDKit and chemfp software packages used

class chemfp.rdkit_types.VariableSizeFingerprint(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBaseFingerprintType

This is a variable-size fingerprint type, with the size specified by the user

class chemfp.rdkit_types.FixedSizeFingerprint(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBaseFingerprintType

This is a fixed-size fingerprint type

class chemfp.rdkit_types.RDKitFingerprintType_v1(fingerprint_kwargs)

Bases: chemfp.rdkit_types.VariableSizeFingerprint

RDKit’s Daylight-like fingerprint based on linear path and branched tree enumeration, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.RDKFingerprint

The RDKit-Fingerprint/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • minPath - minimum number of bonds (default: 1)
  • maxPath - maximum number of bonds (default: 7)
  • nBitsPerHash - number of bits to set for each path hash (default: 2)
  • useHs - include information about the number of hydrogens on each atom? (default: True)

Note: this version is only available in ancient (pre-2014) versions of RDKit.

name = 'RDKit-Fingerprint/1'

class chemfp.rdkit_types.RDKitFingerprintType_v2(fingerprint_kwargs)

Bases: chemfp.rdkit_types.VariableSizeFingerprint

RDKit’s Daylight-like fingerprint based on linear path and branched tree enumeration, version 2

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.RDKFingerprint

The RDKit-Fingerprint/2 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • minPath - minimum number of bonds (default: 1)
  • maxPath - maximum number of bonds (default: 7)
  • nBitsPerHash - number of bits to set for each path hash (default: 2)
  • useHs - include information about the number of hydrogens on each atom? (default: True)
  • branchedPaths - include both branched and unbranched paths (default: True)
  • useBondOrder - use both bond orders in the path hashes (default: True)
  • fromAtoms - a comma-separated list of atom indices which must be part of the path enumeration
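The parameters above are typically supplied as overrides on top of the listed defaults. A minimal sketch of that merge, which also renders the result as a name followed by key=value fields: RDKIT_FP2_DEFAULTS copies the defaults listed above (fromAtoms is left out because no default is given), merge_parameters and build_type_string are hypothetical helpers, and the rendered layout, including writing booleans as 0 or 1, is an assumption rather than chemfp's canonical output.

```python
# Defaults copied from the parameter list above.
RDKIT_FP2_DEFAULTS = {
    "fpSize": 2048,
    "minPath": 1,
    "maxPath": 7,
    "nBitsPerHash": 2,
    "useHs": True,
    "branchedPaths": True,
    "useBondOrder": True,
}


def merge_parameters(defaults, **overrides):
    """Return a copy of defaults updated with overrides; reject unknown names."""
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise TypeError(f"unsupported parameter(s): {', '.join(unknown)}")
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def build_type_string(name, params):
    """Render 'name key=value ...' with booleans written as 0 or 1."""
    fields = [name]
    for key, value in params.items():
        if isinstance(value, bool):
            value = int(value)
        fields.append(f"{key}={value}")
    return " ".join(fields)


params = merge_parameters(RDKIT_FP2_DEFAULTS, fpSize=1024, maxPath=5)
type_string = build_type_string("RDKit-Fingerprint/2", params)
```

Rejecting unknown names early is a deliberate choice in the sketch: a misspelled parameter would otherwise silently fall back to its default and produce a different fingerprint than intended.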
name = 'RDKit-Fingerprint/2'

class chemfp.rdkit_types.RDKitMACCSFingerprintType_v1(fingerprint_kwargs)

Bases: chemfp.types.NoFingerprintParametersMixin, chemfp.rdkit_types.FixedSizeFingerprint

RDKit’s implementation of the 166 MACCS keys, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetMACCSKeysFingerprint

The RDKit-MACCS166/1 fingerprints have no parameters.
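Since fingerprints are returned as byte strings, a 166-bit fingerprint needs 21 bytes when bits are packed eight per byte, which leaves 2 padding bits unused. A short worked sketch of that arithmetic: bytes_needed and set_bit are hypothetical helpers, and the bit order within each byte (least-significant bit first) is an assumption made for the illustration.

```python
def bytes_needed(num_bits):
    """Smallest number of whole bytes that can hold num_bits bits."""
    return (num_bits + 7) // 8


def set_bit(fp, bit):
    """Set one bit in a bytearray, least-significant bit first within a byte."""
    fp[bit // 8] |= 1 << (bit % 8)


num_bits = 166
fp = bytearray(bytes_needed(num_bits))  # 21 zero bytes
set_bit(fp, 0)      # first bit: byte 0, mask 0x01
set_bit(fp, 165)    # last bit: byte 20, mask 0x20
padding_bits = len(fp) * 8 - num_bits  # bits in the last byte that are never set
```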

This comes from an ancient version of RDKit which does not support MACCS key 44 (“OTHER”).

name = 'RDKit-MACCS166/1'
num_bits = 166

class chemfp.rdkit_types.RDKitMACCSFingerprintType_v2(fingerprint_kwargs)

Bases: chemfp.types.NoFingerprintParametersMixin, chemfp.rdkit_types.FixedSizeFingerprint

RDKit’s implementation of the 166 MACCS keys, version 2

See https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetMACCSKeysFingerprint

The RDKit-MACCS166/2 fingerprints have no parameters. RDKit added this version in late 2014 to support MACCS key 44 (“OTHER”).

name = 'RDKit-MACCS166/2'
num_bits = 166

class chemfp.rdkit_types.RDKitMorganFingerprintType_v1(fingerprint_kwargs)

Bases: chemfp.rdkit_types.VariableSizeFingerprint

RDKit Morgan (ECFP-like) fingerprints, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect

The RDKit-Morgan/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • radius - radius for the Morgan algorithm (default: 2)
  • useFeatures - use chemical-feature invariants (default: 0)
  • useChirality - use chirality information (default: 0)
  • useBondTypes - include bond type information (default: 1)
  • includeRedundantEnvironments - if 1, do not check for redundant environments (added in RDKit 2020-3) (default: 0)
  • fromAtoms - a comma-separated list of atom indices to use as centers
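The fromAtoms value is a comma-separated string of atom indices. A minimal sketch of turning such a string into a validated list of integers: parse_from_atoms is a hypothetical helper, the optional num_atoms bound check is an addition for the example, and treating an empty string as "no restriction" is an assumption.

```python
def parse_from_atoms(text, num_atoms=None):
    """Turn '0,3,5' into [0, 3, 5]; an empty string yields an empty list."""
    text = text.strip()
    if not text:
        return []  # assumption: no indices means no restriction
    indices = []
    for field in text.split(","):
        index = int(field)  # raises ValueError for a non-integer field
        if index < 0:
            raise ValueError(f"atom index must not be negative: {index}")
        if num_atoms is not None and index >= num_atoms:
            raise ValueError(f"atom index {index} is out of range")
        indices.append(index)
    return indices
```

Atom indices in RDKit start at 0, so for a molecule with 5 atoms the valid values are 0 through 4.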
name = 'RDKit-Morgan/1'

class chemfp.rdkit_types.RDKitBaseAtomPairFingerprintType(fingerprint_kwargs)

Bases: chemfp.rdkit_types.VariableSizeFingerprint

Base class for the RDKitAtomPair fingerprint types

class chemfp.rdkit_types.RDKitAtomPairFingerprint_v1(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBaseAtomPairFingerprintType

RDKit atom pair fingerprints, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect

The RDKit-AtomPair/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • minLength - minimum bond count for a pair (default: 1)
  • maxLength - maximum bond count for a pair (default: 30)

Note: this version was only available in ancient (pre-2012) versions of RDKit. Chemfp no longer supports those versions of RDKit.

name = 'RDKit-AtomPair/1'

class chemfp.rdkit_types.RDKitAtomPairFingerprint_v2(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBaseAtomPairFingerprintType

RDKit atom pair fingerprints, version 2

See https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect

The RDKit-AtomPair/2 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • minLength - minimum bond count for a pair (default: 1)
  • maxLength - maximum bond count for a pair (default: 30, max: 63)
  • nBitsPerEntry - number of bits to use in simulating counts (default: 4)
  • includeChirality - if 1, chirality will be used in the atom invariants (default: 0)
  • use2D - if 1, use a 2D distance matrix, if 0 use the 3D matrix from the first
    set of conformers, or return an empty fingerprint if no conformers (default: 1)
  • fromAtoms - a comma-separated list of atom indices which must be in the pair
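The idea behind nBitsPerEntry, using several bits to simulate a count in a binary fingerprint, can be illustrated with a thermometer-style encoding. This is a sketch of the general technique only: simulate_counts is a hypothetical helper, and both the thresholds (bit i is set when the count exceeds i) and the modulo folding of feature ids into blocks are assumptions that may differ from what RDKit does.

```python
def simulate_counts(feature_counts, fp_size=2048, n_bits_per_entry=4):
    """Return the sorted bit positions set for a {feature_id: count} mapping."""
    if fp_size % n_bits_per_entry:
        raise ValueError("fp_size must be a multiple of n_bits_per_entry")
    num_blocks = fp_size // n_bits_per_entry
    bits = set()
    for feature_id, count in feature_counts.items():
        block = feature_id % num_blocks  # fold the feature id into a block
        for i in range(n_bits_per_entry):
            if count > i:  # assumption: one more bit per additional occurrence
                bits.add(block * n_bits_per_entry + i)
    return sorted(bits)
```

In this encoding a feature seen once sets one bit of its block and a feature seen three times sets three, so two molecules that share a feature but differ in how often it occurs no longer look identical for that feature. Counts above n_bits_per_entry saturate.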
name = 'RDKit-AtomPair/2'

class chemfp.rdkit_types.RDKitBaseTorsionFingerprintType(fingerprint_kwargs)

Bases: chemfp.rdkit_types.VariableSizeFingerprint

class chemfp.rdkit_types.RDKitTorsionFingerprintType_v1(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBaseTorsionFingerprintType

RDKit torsion fingerprints, version 1

See https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html

An implementation of Topological-torsion fingerprints, as described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).

The RDKit-Torsion/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • targetSize - number of bonds per torsion (default: 4)

Note: this version is only available in older (pre-2014) versions of RDKit. Chemfp no longer supports those versions of RDKit.

name = 'RDKit-Torsion/1'

class chemfp.rdkit_types.RDKitTorsionFingerprintType_v2(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBaseTorsionFingerprintType

RDKit torsion fingerprints, version 2

See https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html

An implementation of Topological-torsion fingerprints, as described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).

The RDKit-Torsion/2 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • targetSize - number of bonds per torsion (default: 4)
  • nBitsPerEntry - number of bits to set per entry (default: 4)
  • includeChirality - include chirality information (default: 0)
  • fromAtoms - a comma-separated list of atom indices which must be part of the torsion
name = 'RDKit-Torsion/2'

class chemfp.rdkit_types.RDKitTorsionFingerprintType_v3(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBaseTorsionFingerprintType

RDKit torsion fingerprints, version 3

See https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html

An implementation of Topological-torsion fingerprints, as described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).

This version started with RDKit 2023.03.1, which changed how includeChirality=1 works.

The RDKit-Torsion/3 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • targetSize - number of bonds per torsion (default: 4)
  • nBitsPerEntry - number of bits to set per entry (default: 4)
  • includeChirality - include chirality information (default: 0)
  • fromAtoms - a comma-separated list of atom indices which must be part of the torsion
name = 'RDKit-Torsion/3'

class chemfp.rdkit_types.RDKitBasePatternFingerprint(fingerprint_kwargs)

Bases: chemfp.rdkit_types.VariableSizeFingerprint

class chemfp.rdkit_types.RDKitPatternFingerprint_v1(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBasePatternFingerprint

RDKit’s experimental substructure screen fingerprint, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint

The RDKit-Pattern/1 fingerprint has no parameters.

Note: this version is only available in ancient versions of RDKit. Chemfp no longer supports those versions of RDKit.

name = 'RDKit-Pattern/1'

class chemfp.rdkit_types.RDKitPatternFingerprint_v2(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBasePatternFingerprint

RDKit’s experimental substructure screen fingerprint, version 2

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint

The RDKit-Pattern/2 fingerprint has no parameters.

Note: this version is only available in ancient versions of RDKit. Chemfp no longer supports those versions of RDKit.

name = 'RDKit-Pattern/2'

class chemfp.rdkit_types.RDKitPatternFingerprint_v3(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBasePatternFingerprint

RDKit’s experimental substructure screen fingerprint, version 3

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint

The RDKit-Pattern/3 fingerprint has no parameters. This version was released with RDKit 2017.03.1.

Note: Chemfp no longer supports those versions of RDKit.

name = 'RDKit-Pattern/3'

class chemfp.rdkit_types.RDKitPatternFingerprint_v4(fingerprint_kwargs)

Bases: chemfp.rdkit_types.RDKitBasePatternFingerprint

RDKit’s experimental substructure screen fingerprint, version 4

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint

The RDKit-Pattern/4 fingerprint has no parameters. This version was introduced in August 2017 for the 2017.09.1 release.

name = 'RDKit-Pattern/4'

class chemfp.rdkit_types.RDKitAvalonFingerprintType_v1(fingerprint_kwargs)

Bases: chemfp.rdkit_types.VariableSizeFingerprint

Avalon fingerprints

The Avalon Cheminformatics toolkit is available from https://sourceforge.net/projects/avalontoolkit/ . It is not part of the core RDKit distribution. Instead, RDKit has a compile-time option to download and include it as part of the build process.

The Avalon fingerprints are described in the supplemental information for “QSAR - How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets”, Peter Gedeck, Bernhard Rohde, and Christian Bartels, J. Chem. Inf. Model., 2006, 46 (5), pp 1924-1936, DOI: 10.1021/ci050413p. The supplemental information is available from https://pubs.acs.org/doi/suppl/10.1021/ci050413p

It uses a set of feature classes which “have been fine-tuned to provide good screen-out for the set of substructure queries encountered at Novartis while limiting redundancy.” The classes are ATOM_COUNT, ATOM_SYMBOL_PATH, AUGMENTED_ATOM, AUGMENTED_BOND, HCOUNT_PAIR, HCOUNT_PATH, RING_PATH, BOND_PATH, HCOUNT_CLASS_PATH, ATOM_CLASS_PATH, RING_PATTERN, RING_SIZE_COUNTS, DEGREE_PATHS, CLASS_SPIDERS, FEATURE_PAIRS and ALL_PATTERNS.

name = 'RDKit-Avalon/1'

class chemfp.rdkit_types.RDKitSECFPFingerprintType_v1(fingerprint_kwargs)

Bases: chemfp.rdkit_types.VariableSizeFingerprint

SECFP fingerprints

The SMILES Extended Connectivity Fingerprint, as described in:
Probst, D., Reymond, J. A probabilistic molecular fingerprint for big data settings. J Cheminform 10, 66 (2018). https://doi.org/10.1186/s13321-018-0321-8 https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0321-8

These are circular fingerprints which encode the circular region as a fragment SMILES, which is then hashed to produce the fingerprint bits.
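The "hash a fragment SMILES to a bit" step can be sketched as follows. This only illustrates the general idea: hash_fragments_to_bits and bits_to_bytes are hypothetical helpers, the fragment strings are hand-written examples rather than toolkit output, and SHA-1 is a stand-in for whatever hash function the real implementation uses.

```python
import hashlib


def hash_fragments_to_bits(fragment_smiles, fp_size=2048):
    """Map each fragment SMILES to one bit position in [0, fp_size)."""
    bits = set()
    for smiles in fragment_smiles:
        digest = hashlib.sha1(smiles.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
        bits.add(value % fp_size)  # fold the hash into the fingerprint size
    return bits


def bits_to_bytes(bits, fp_size):
    """Pack bit positions into a byte string, least-significant bit first."""
    fp = bytearray((fp_size + 7) // 8)
    for bit in bits:
        fp[bit // 8] |= 1 << (bit % 8)
    return bytes(fp)


# Hand-written fragment strings standing in for the circular environments.
fragments = ["CO", "CCO", "CC"]
bits = hash_fragments_to_bits(fragments, fp_size=2048)
fingerprint = bits_to_bytes(bits, 2048)
```

Because different fragments can hash to the same position, the number of set bits is at most the number of distinct fragments; a smaller fpSize makes such collisions more likely.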

The RDKit-SECFP/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)
  • radius - analogous to the radius for the Morgan algorithm (default: 3)
  • rings - include ring membership (default: 1)
  • isomeric - use isomeric SMILES (default: 0)
  • kekulize - Kekulize the molecule and use Kekule SMILES (default: 0)
  • min_radius - minimum radius for the Morgan algorithm (default: 1)
name = 'RDKit-SECFP/1'