chemfp.rdkit_types module¶
This module should not be imported directly.
It contains internal implementation details of RDKit fingerprint generation.
This module is included in the documentation because parts of this module are returned to the user, and are part of the public API.
-
class
chemfp.rdkit_types.
RDKitBaseFingerprintType
(fingerprint_kwargs)¶ Bases:
chemfp.types.ThreadsafeFingerprinterMixin
,chemfp.types.FingerprintType
-
from_inchi
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Generate a fingerprint from an InChI string and its id
This is equivalent to calling:
mol = fptype.toolkit.from_inchi(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a fingerprint byte string
-
from_inchistring
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, logLevel: Optional[int, None] = None, treatWarningAsError: bool = False, errors: str = 'strict')¶ Generate a fingerprint from an InChI string
This is equivalent to calling:
mol = fptype.toolkit.from_inchistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API
- treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a fingerprint byte string
-
from_molfile
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, errors: str = 'strict')¶ Generate a fingerprint from a molfile
This is equivalent to calling:
mol = fptype.toolkit.from_molfile(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a fingerprint byte string
-
from_sdf
(content: Union[str, bytes], *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, includeTags: bool = True, errors: str = 'strict')¶ Generate a fingerprint from an SDF record
This is equivalent to calling:
mol = fptype.toolkit.from_sdf(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph
- strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification
- includeTags (Boolean (default: True)) – If true, extract the structure data tag fields
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a fingerprint byte string
-
from_smi
(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, delimiter: Optional[Literal['to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'], None] = None, errors: str = 'strict')¶ Generate a fingerprint from a SMILES string and its id
This is equivalent to calling:
mol = fptype.toolkit.from_smi(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a fingerprint byte string
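The delimiter names listed above describe how a SMILES record is split into the structure and its id. The following is a minimal sketch of one plausible reading of those names, not chemfp's implementation. The function name split_smi_record is hypothetical, the treatment of None as equivalent to 'to_eol' is an assumption, and 'native' is left out because it depends on the underlying toolkit.

```python
def split_smi_record(line, delimiter=None):
    """Split one SMILES record into (smiles, id); id is None if absent.

    Hypothetical helper for illustration only.
    """
    line = line.rstrip("\r\n")
    if delimiter in (None, "to_eol"):
        # Assumption: the id is everything after the first whitespace run.
        parts = line.split(None, 1)
    elif delimiter == "whitespace":
        # The id is only the second whitespace-separated field.
        parts = line.split()
    elif delimiter in ("space", " "):
        parts = line.split(" ")
    elif delimiter in ("tab", "\t"):
        parts = line.split("\t")
    elif delimiter == "comma":
        parts = line.split(",")
    else:
        # 'native' is toolkit-specific and not modelled in this sketch.
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    if not parts or not parts[0]:
        raise ValueError("record has no SMILES field")
    return parts[0], (parts[1] if len(parts) > 1 else None)
```

Under these assumptions the record "CCO ethyl alcohol" yields the id "ethyl alcohol" with the default delimiter but only "ethyl" with 'whitespace', which is the practical difference between the two options.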
-
from_smiles
(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, errors: str = 'strict')¶ Generate a fingerprint from a SMILES string
This is equivalent to calling:
mol = fptype.toolkit.from_smistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a fingerprint byte string
-
from_smistring
(content: Union[str, bytes], *, sanitize: bool = True, cxsmiles: bool = True, errors: str = 'strict')¶ Generate a fingerprint from a SMILES string
This is equivalent to calling:
mol = fptype.toolkit.from_smistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
- sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing
- cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a fingerprint byte string
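Each of the from_* methods above follows the same pattern: parse the record, then fingerprint the molecule only if parsing produced one. The sketch below imitates that pattern with stand-in functions so it runs without chemfp or RDKit. All names in it (ToyParseError, toy_parse, toy_fingerprint, fingerprint_from_text) are hypothetical, and the assumption that "ignore" and "log" both return None on failure is inferred from the "if (mol is not None) else None" expression rather than confirmed by the source.

```python
import logging

logger = logging.getLogger("fingerprint_sketch")


class ToyParseError(ValueError):
    """Hypothetical error raised by the stand-in parser below."""


def toy_parse(content):
    # Stand-in for a toolkit parser: reject empty input and '?' characters.
    if not content or "?" in content:
        raise ToyParseError(f"cannot parse {content!r}")
    return content


def toy_fingerprint(mol):
    # Stand-in for from_mol(): one byte holding the text length modulo 256.
    return bytes([len(mol) % 256])


def parse_with_errors(content, errors="strict"):
    if errors not in ("strict", "ignore", "log"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    try:
        return toy_parse(content)
    except ToyParseError as err:
        if errors == "strict":
            raise  # propagate the failure to the caller
        if errors == "log":
            logger.warning("skipping record: %s", err)
        return None  # assumed behaviour for "ignore" and "log"


def fingerprint_from_text(content, errors="strict"):
    mol = parse_with_errors(content, errors=errors)
    return toy_fingerprint(mol) if (mol is not None) else None
```

With this structure a caller processing many records can pass errors="ignore" and filter out the None results, while errors="strict" stops at the first bad record.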
-
module
= <module 'chemfp.rdkit_toolkit'>¶
-
software
= ...¶ a description of the RDKit and chemfp software packages used
-
-
class
chemfp.rdkit_types.
VariableSizeFingerprint
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBaseFingerprintType
This is a variable-size fingerprint type, specified by the user
-
class
chemfp.rdkit_types.
FixedSizeFingerprint
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBaseFingerprintType
This is a fixed-size fingerprint type
-
class
chemfp.rdkit_types.
RDKitFingerprintType_v1
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.VariableSizeFingerprint
RDKit’s Daylight-like fingerprint based on linear path and branched tree enumeration, version 1
See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.RDKFingerprint
The RDKit-Fingerprint/1 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- minPath - minimum number of bonds (default: 1)
- maxPath - maximum number of bonds (default: 7)
- nBitsPerHash - number of bits to set for each path hash (default: 2)
- useHs - include information about the number of hydrogens on each atom? (default: True)
Note: this version is only available in ancient (pre-2014) versions of RDKit
-
name
= 'RDKit-Fingerprint/1'¶
-
class
chemfp.rdkit_types.
RDKitFingerprintType_v2
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.VariableSizeFingerprint
RDKit’s Daylight-like fingerprint based on linear path and branched tree enumeration, version 2
See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.RDKFingerprint
The RDKit-Fingerprint/2 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- minPath - minimum number of bonds (default: 1)
- maxPath - maximum number of bonds (default: 7)
- nBitsPerHash - number of bits to set for each path hash (default: 2)
- useHs - include information about the number of hydrogens on each atom? (default: True)
- branchedPaths - include both branched and unbranched paths (default: True)
- useBondOrder - use both bond orders in the path hashes (default: True)
- fromAtoms - a comma-separated list of atom indices which must be part of the path enumeration
-
name
= 'RDKit-Fingerprint/2'¶
-
class
chemfp.rdkit_types.
RDKitMACCSFingerprintType_v1
(fingerprint_kwargs)¶ Bases:
chemfp.types.NoFingerprintParametersMixin
,chemfp.rdkit_types.FixedSizeFingerprint
RDKit’s implementation of the 166 MACCS keys, version 1
The RDKit-MACCS166/1 fingerprints have no parameters.
This comes from an ancient version of RDKit which does not support MACCS key 44 (“OTHER”).
-
name
= 'RDKit-MACCS166/1'¶
-
num_bits
= 166¶
-
-
class
chemfp.rdkit_types.
RDKitMACCSFingerprintType_v2
(fingerprint_kwargs)¶ Bases:
chemfp.types.NoFingerprintParametersMixin
,chemfp.rdkit_types.FixedSizeFingerprint
RDKit’s implementation of the 166 MACCS keys, version 2
The RDKit-MACCS166/2 fingerprints have no parameters. RDKit added this version in late 2014 to support MACCS key 44 (“OTHER”).
-
name
= 'RDKit-MACCS166/2'¶
-
num_bits
= 166¶
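Because the from_* methods return a fingerprint byte string, a 166-bit fingerprint does not fill a whole number of bytes. The sketch below shows the arithmetic and a simple way to set and test bits. The helper names are hypothetical, and the bit layout (bit 0 is the least significant bit of the first byte) is an assumption made for illustration.

```python
def fingerprint_num_bytes(num_bits):
    # Round up to a whole number of bytes.
    return (num_bits + 7) // 8


def set_bit(fp, bit):
    # Return a copy of the byte string with the given bit turned on.
    data = bytearray(fp)
    data[bit // 8] |= 1 << (bit % 8)
    return bytes(data)


def bit_is_set(fp, bit):
    return bool(fp[bit // 8] & (1 << (bit % 8)))


# An all-zero byte string large enough for num_bits = 166.
empty_maccs = bytes(fingerprint_num_bytes(166))
```

Here 166 bits need 21 bytes (168 bits), so the last two bits of the final byte are padding and stay zero.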
-
-
class
chemfp.rdkit_types.
RDKitMorganFingerprintType_v1
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.VariableSizeFingerprint
RDKit Morgan (ECFP-like) fingerprints, version 1
The RDKit-Morgan/1 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- radius - radius for the Morgan algorithm (default: 2)
- useFeatures - use chemical-feature invariants (default: 0)
- useChirality - use chirality information (default: 0)
- useBondTypes - include bond type information (default: 1)
- includeRedundantEnvironments - if 1, do not check for redundant environments (added in RDKit 2020-3) (default: 0)
- fromAtoms - a comma-separated list of atom indices to use as centers
-
name
= 'RDKit-Morgan/1'¶
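The name and parameter defaults listed above can be combined into a single descriptive string. The sketch below assumes a "name key=value key=value" layout for such a string; the helper names, the parameter ordering, and the omission of fromAtoms (which has no listed default) are choices made for this illustration and may not match what chemfp itself produces.

```python
# Defaults copied from the RDKit-Morgan/1 parameter list above.
MORGAN_V1_DEFAULTS = {
    "fpSize": 2048,
    "radius": 2,
    "useFeatures": 0,
    "useChirality": 0,
    "useBondTypes": 1,
    "includeRedundantEnvironments": 0,
}


def format_type_string(name, defaults, **overrides):
    """Build 'name key=value ...' from defaults plus any overrides."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**defaults, **overrides}
    return " ".join([name] + [f"{key}={value}" for key, value in params.items()])


def parse_type_string(type_string):
    """Split 'name key=value ...' into (name, {key: value-as-text})."""
    name, *fields = type_string.split()
    return name, dict(field.split("=", 1) for field in fields)
```

For example, format_type_string("RDKit-Morgan/1", MORGAN_V1_DEFAULTS, radius=3) keeps every default except the radius, and parse_type_string recovers the name and the parameters as text.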
-
class
chemfp.rdkit_types.
RDKitBaseAtomPairFingerprintType
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.VariableSizeFingerprint
Base class for the RDKitAtomPair fingerprint types
-
class
chemfp.rdkit_types.
RDKitAtomPairFingerprint_v1
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBaseAtomPairFingerprintType
RDKit atom pair fingerprints, version 1
The RDKit-AtomPair/1 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- minLength - minimum bond count for a pair (default: 1)
- maxLength - maximum bond count for a pair (default: 30)
Note: this version was only available in ancient (pre-2012) versions of RDKit. Chemfp no longer supports those versions of RDKit.
-
name
= 'RDKit-AtomPair/1'¶
-
class
chemfp.rdkit_types.
RDKitAtomPairFingerprint_v2
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBaseAtomPairFingerprintType
RDKit atom pair fingerprints, version 2
The RDKit-AtomPair/2 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- minLength - minimum bond count for a pair (default: 1 bond)
- maxLength - maximum bond count for a pair (default: 30, max: 63)
- nBitsPerEntry - number of bits to use in simulating counts (default: 4)
- includeChirality - if 1, chirality will be used in the atom invariants (default: 0)
- use2D - if 1, use a 2D distance matrix, if 0 use the 3D matrix from the first set of conformers, or return an empty fingerprint if no conformers (default: 1)
- fromAtoms - a comma-separated list of atom indices which must be in the pair
-
name
= 'RDKit-AtomPair/2'¶
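The nBitsPerEntry parameter above is described as the number of bits used to simulate counts in a bit fingerprint. One common way to do this is to give each hashed feature a block of bits and turn on more of them as the count grows. The sketch below illustrates that idea only; the function name, the input format, and the thresholds 1, 2, 4 and 8 are assumptions for this example and are not taken from the source.

```python
# Assumed thresholds: bit i of a block is set when count >= COUNT_BOUNDS[i].
COUNT_BOUNDS = (1, 2, 4, 8)


def simulate_counts(hashed_counts, fp_size=2048, n_bits_per_entry=4):
    """Map {slot: count} to a sorted list of set bit positions.

    Each slot owns n_bits_per_entry consecutive bits, so there are
    fp_size // n_bits_per_entry slots in total.
    """
    if not 1 <= n_bits_per_entry <= len(COUNT_BOUNDS):
        raise ValueError("this sketch only supports 1 to 4 bits per entry")
    num_slots = fp_size // n_bits_per_entry
    bits = set()
    for slot, count in hashed_counts.items():
        if not 0 <= slot < num_slots:
            raise ValueError(f"slot {slot} out of range")
        for i in range(n_bits_per_entry):
            if count >= COUNT_BOUNDS[i]:
                bits.add(slot * n_bits_per_entry + i)
    return sorted(bits)
```

Under these assumptions a feature seen once sets one bit of its block and a feature seen five times sets three, so two molecules that share a feature but differ in how often it occurs still differ in their bit patterns.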
-
class
chemfp.rdkit_types.
RDKitBaseTorsionFingerprintType
(fingerprint_kwargs)¶
-
class
chemfp.rdkit_types.
RDKitTorsionFingerprintType_v1
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBaseTorsionFingerprintType
RDKit torsion fingerprints, version 1
See https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html
An implementation of Topological-torsion fingerprints, as described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).
The RDKit-Torsion/1 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- targetSize - number of bonds per torsion (default: 4)
Note: this version is only available in older (pre-2014) versions of RDKit. Chemfp no longer supports those versions of RDKit.
-
name
= 'RDKit-Torsion/1'¶
-
class
chemfp.rdkit_types.
RDKitTorsionFingerprintType_v2
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBaseTorsionFingerprintType
RDKit torsion fingerprints, version 2
See https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html
An implementation of Topological-torsion fingerprints, as described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).
The RDKit-Torsion/2 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- targetSize - number of bonds per torsion (default: 4)
- nBitsPerEntry - number of bits to set per entry (default: 4)
- includeChirality - include chirality information (default: 0)
- fromAtoms - a comma-separated list of atom indices which must be part of the torsion
-
name
= 'RDKit-Torsion/2'¶
-
class
chemfp.rdkit_types.
RDKitTorsionFingerprintType_v3
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBaseTorsionFingerprintType
RDKit torsion fingerprints, version 3
See https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html
An implementation of Topological-torsion fingerprints, as described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).
This version started with RDKit 2023.03.1, which changed how includeChirality=1 works.
The RDKit-Torsion/3 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- targetSize - number of bonds per torsion (default: 4)
- nBitsPerEntry - number of bits to set per entry (default: 4)
- includeChirality - include chirality information (default: 0)
- fromAtoms - a comma-separated list of atom indices which must be part of the torsion
-
name
= 'RDKit-Torsion/3'¶
-
class
chemfp.rdkit_types.
RDKitBasePatternFingerprint
(fingerprint_kwargs)¶
-
class
chemfp.rdkit_types.
RDKitPatternFingerprint_v1
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBasePatternFingerprint
RDKit’s experimental substructure screen fingerprint, version 1
See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint
The RDKit-Pattern/1 fingerprint has no parameters.
Note: this version is only available in ancient versions of RDKit. Chemfp no longer supports those versions of RDKit.
-
name
= 'RDKit-Pattern/1'¶
-
-
class
chemfp.rdkit_types.
RDKitPatternFingerprint_v2
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBasePatternFingerprint
RDKit’s experimental substructure screen fingerprint, version 2
See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint
The RDKit-Pattern/2 fingerprint has no parameters.
Note: this version is only available in ancient versions of RDKit. Chemfp no longer supports those versions of RDKit.
-
name
= 'RDKit-Pattern/2'¶
-
-
class
chemfp.rdkit_types.
RDKitPatternFingerprint_v3
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBasePatternFingerprint
RDKit’s experimental substructure screen fingerprint, version 3
See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint
The RDKit-Pattern/3 fingerprint has no parameters. This version was released in RDKit 2017.03.1.
Note: Chemfp no longer supports those versions of RDKit.
-
name
= 'RDKit-Pattern/3'¶
-
-
class
chemfp.rdkit_types.
RDKitPatternFingerprint_v4
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.RDKitBasePatternFingerprint
RDKit’s experimental substructure screen fingerprint, version 4
See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint
The RDKit-Pattern/4 fingerprint has no parameters. This version was introduced in August 2017 for the 2017.09.1 release.
-
name
= 'RDKit-Pattern/4'¶
-
-
class
chemfp.rdkit_types.
RDKitAvalonFingerprintType_v1
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.VariableSizeFingerprint
Avalon fingerprints
The Avalon Cheminformatics toolkit is available from https://sourceforge.net/projects/avalontoolkit/ . It is not part of the core RDKit distribution. Instead, RDKit has a compile-time option to download and include it as part of the build process.
The Avalon fingerprints are described in the supplemental information for “QSAR - How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets”, Peter Gedeck, Bernhard Rohde, and Christian Bartels, J. Chem. Inf. Model., 2006, 46 (5), pp 1924-1936, DOI: 10.1021/ci050413p. The supplemental information is available from https://pubs.acs.org/doi/suppl/10.1021/ci050413p
It uses a set of feature classes which “have been fine-tuned to provide good screen-out for the set of substructure queries encounted at Novartis while limiting redundancy.” The classes are ATOM_COUNT, ATOM_SYMBOL_PATH, AUGMENTED_ATOM, AUGMENTED_BOND, HCOUNT_PAIR, HCOUNT_PATH, RING_PATH, BOND_PATH, HCOUNT_CLASS_PATH, ATOM_CLASS_PATH, RING_PATTERN, RING_SIZE_COUNTS, DEGREE_PATHS, CLASS_SPIDERS, FEATURE_PAIRS and ALL_PATTERNS.
-
name
= 'RDKit-Avalon/1'¶
-
-
class
chemfp.rdkit_types.
RDKitSECFPFingerprintType_v1
(fingerprint_kwargs)¶ Bases:
chemfp.rdkit_types.VariableSizeFingerprint
SECFP fingerprints
The SMILES Extended Connectivity Fingerprint, as described in: Probst, D., Reymond, J. A probabilistic molecular fingerprint for big data settings. J Cheminform 10, 66 (2018). https://doi.org/10.1186/s13321-018-0321-8 https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0321-8
These are circular fingerprints which encode the circular region as a fragment SMILES, which is then hashed to produce the fingerprint bits.
The RDKit-SECFP/1 FingerprintType parameters are:
- fpSize - number of bits in the fingerprint (default: 2048)
- radius - analogous to the radius for the Morgan algorithm (default: 3)
- rings - include ring membership (default: 1)
- isomeric - use isomeric SMILES (default: 0)
- kekulize - Kekulize the molecule and use Kekule SMILES (default: 0)
- min_radius - minimum radius for the Morgan algorithm (default: 1)
-
name
= 'RDKit-SECFP/1'¶