chemfp.cdk_toolkit module¶
The chemfp toolkit API wrapper for the CDK toolkit.
This module is also available as chemfp.cdk.
-
chemfp.cdk_toolkit.
is_licensed
()¶ Return True - CDK is always licensed
Returns: True
-
chemfp.cdk_toolkit.
get_formats
(include_unavailable=False)¶ Get the list of structure formats that CDK supports
If include_unavailable is True then also include CDK formats which aren’t available to this specific version of CDK.
Parameters: include_unavailable (True or False) – include unavailable formats? Returns: a list of Format objects
-
chemfp.cdk_toolkit.
get_input_formats
()¶ Get the list of supported CDK input formats
Returns: a list of chemfp.base_toolkit.Format
objects
-
chemfp.cdk_toolkit.
get_output_formats
()¶ Get the list of supported CDK output formats
Returns: a list of chemfp.base_toolkit.Format
objects
-
chemfp.cdk_toolkit.
get_format
(format)¶ Get the named format, or raise a ValueError
This will raise a ValueError if CDK does not implement the format format_name or that format is not available.
Parameters: format_name (a string) – the format name Returns: a chemfp.base_toolkit.Format object
-
chemfp.cdk_toolkit.
get_input_format
(format)¶ Get the named input format, or raise a ValueError
This will raise a ValueError if CDK does not implement the format format_name or that format is not an input format.
Parameters: format_name (a string) – the format name Returns: a chemfp.base_toolkit.Format object
-
chemfp.cdk_toolkit.
get_output_format
(format)¶ Get the named output format, or raise a ValueError
This will raise a ValueError if CDK does not implement the format format_name or that format is not an output format.
Parameters: format_name (a string) – the format name Returns: a chemfp.base_toolkit.Format object
-
chemfp.cdk_toolkit.
get_input_format_from_source
(source=None, format=None)¶ Get the most appropriate format given the available source and format information
If format is a
chemfp.base_toolkit.Format
then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object. If format is None, use the source to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.
Parameters: - source (a filename (as a string), a file object, or None to read from stdin) – the structure data source.
- format (a Format(-like) object, string, or None) – format information, if known.
Returns: a
chemfp.base_toolkit.Format
object
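The resolution order described above can be illustrated with a small stand-alone sketch. It does not call chemfp: `Format` here is a hypothetical namedtuple standing in for chemfp.base_toolkit.Format, `resolve_format` is a made-up helper name, and detecting the format from the filename extension (with "gz" as the only recognized compression) is an assumption about how auto-detection might behave.

```python
from collections import namedtuple

# Hypothetical stand-in for chemfp.base_toolkit.Format (illustration only).
Format = namedtuple("Format", ["name", "compression"])

# Assumption: only "gz" is treated as a compression suffix in this sketch.
KNOWN_COMPRESSIONS = ("gz",)


def format_from_string(text):
    """Split a format string like 'sdf.gz' into Format('sdf', 'gz')."""
    name, dot, ext = text.rpartition(".")
    if dot and ext in KNOWN_COMPRESSIONS:
        return Format(name, ext)
    return Format(text, "")


def resolve_format(source=None, format=None):
    """Mimic the documented lookup order for an input format."""
    if isinstance(format, Format):
        # Already a real Format: return it unchanged.
        return format
    if isinstance(format, str):
        return format_from_string(format)
    if format is not None:
        # Format-like object: copy its "name" and "compression" attributes.
        return Format(format.name, format.compression)
    # format is None: try to auto-detect from a filename (assumed behavior).
    if isinstance(source, str):
        parts = source.rsplit("/", 1)[-1].split(".")
        compression = ""
        if len(parts) > 1 and parts[-1] in KNOWN_COMPRESSIONS:
            compression = parts.pop()
        if len(parts) > 1:
            return Format(parts[-1], compression)
    # Auto-detection was not possible: assume uncompressed SMILES.
    return Format("smi", "")
```

For example, `resolve_format("data/input.sdf.gz")` gives the "sdf" format with "gz" compression in this sketch, while `resolve_format(None)` falls back to uncompressed SMILES.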
-
chemfp.cdk_toolkit.
get_output_format_from_destination
(destination=None, format=None)¶ Get the most appropriate format given the available destination and format information
If format is a
chemfp.base_toolkit.Format
then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object. If format is None, use the destination to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.
Parameters: - destination (a filename (as a string), a file object, or None to write to stdout) – the structure data destination.
- format (a Format(-like) object, string, or None) – format information, if known.
Returns: a
chemfp.base_toolkit.Format
object
-
chemfp.cdk_toolkit.
read_molecules
(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶ Return an iterator that reads CDK molecules from a structure file
Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. For SD files, use id_tag to get the record id from the given SD tag instead of the title line. (read_molecules() will ignore the id_tag. It exists to make it easier to switch between reader functions.)
Note: the reader returns a new CDK molecule each time.
The reader_args dictionary parameters depend on the format. These include:
- SMILES
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- has_header - True or False
- sanitize - True or default sanitizes; False for unsanitized processing
- InChI
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- sanitize - True or default sanitizes; False for unsanitized processing
- removeHs - True or default removes explicit hydrogens; False leaves them in the structure
- logLevel - an integer log level
- treatWarningAsError - True raises an exception on error; False or default keeps processing
- SDF
- sanitize - True or default sanitizes; False for unsanitized processing
- removeHs - True or default removes explicit hydrogens; False leaves them in the structure
- strictParsing - True or default for strict parsing; False for lenient parsing
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.
See chemfp.cdk_toolkit.read_ids_and_molecules() if you want (id, molecule) pairs instead of just the molecules.
Parameters: - source (a filename, file object, or None to read from stdin) – the structure source
- format (a format name string, or Format object, or None to auto-detect) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader parameters passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
Returns: a chemfp.base_toolkit.MoleculeReader iterating CDK molecules
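The three documented values of the errors parameter can be illustrated with a stand-alone sketch. This is not chemfp's implementation: `iter_parsed` and its `parse` callback are hypothetical names, and treating ValueError as the parse failure is an assumption made only for the example.

```python
import sys


def iter_parsed(records, parse, errors="strict"):
    """Yield parse(record) for each record, handling failures per `errors`."""
    if errors not in ("strict", "report", "ignore"):
        raise ValueError("errors must be 'strict', 'report', or 'ignore'")
    for recno, record in enumerate(records, 1):
        try:
            value = parse(record)
        except ValueError as err:
            if errors == "strict":
                # "strict": stop at the first bad record.
                raise
            if errors == "report":
                # "report": describe the problem on stderr, then continue.
                print(f"ERROR: record {recno}: {err}", file=sys.stderr)
            # "report" and "ignore" both go on to the next record.
            continue
        yield value
```

With `parse=int` and the records `["1", "x", "3"]`, the "ignore" and "report" modes yield 1 and 3, while "strict" raises when it reaches "x".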
-
chemfp.cdk_toolkit.
read_molecules_from_string
(content, format, id_tag=None, reader_args=None, errors='strict', location=None)¶ Return an iterator that reads CDK molecules from a string containing structure records
content is a string containing 0 or more records in the format format. See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.read_ids_and_molecules_from_string() if you want to read (id, CDK molecule) pairs instead of just molecules.
Parameters: - content (a string) – the string containing structure records
- format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating CDK molecules
-
chemfp.cdk_toolkit.
read_ids_and_molecules
(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶ Return an iterator that reads (id, CDK molecule) pairs from a structure file
See chemfp.cdk_toolkit.read_molecules() for full parameter details. The major difference is that this returns an iterator of (id, CDK molecule) pairs instead of just the molecules.
Parameters: - source (a filename, file object, or None to read from stdin) – the structure source
- format (a format name string, or Format object, or None to auto-detect) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating (id, CDK molecule) pairs
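A minimal usage sketch, assuming chemfp is installed with working CDK support. The helper name `collect_ids` and its optional `toolkit` argument are hypothetical; the argument exists only so that the function can be exercised with a stand-in object when CDK is not available.

```python
def collect_ids(source, format=None, toolkit=None):
    """Return the list of record ids found in a structure file."""
    if toolkit is None:
        # Imported lazily; needs chemfp with CDK support installed.
        from chemfp import cdk_toolkit as toolkit
    # errors="ignore" skips records which cannot be parsed.
    reader = toolkit.read_ids_and_molecules(source, format, errors="ignore")
    return [id for (id, mol) in reader]
```

Passing `format=None` leaves format detection to the toolkit, as described for read_molecules().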
-
chemfp.cdk_toolkit.
read_ids_and_molecules_from_string
(content, format, id_tag=None, reader_args=None, errors='strict', location=None)¶ Return an iterator that reads (id, CDK molecule) pairs from a string containing structure records
content is a string containing 0 or more records in the format format. See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.read_molecules_from_string() if you just want to read the CDK molecules instead of (id, molecule) pairs.
Parameters: - content (a string) – the string containing structure records
- format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track parser state information
Returns: a
chemfp.base_toolkit.IdAndMoleculeReader
iterating (id, CDK molecule) pairs
-
chemfp.cdk_toolkit.
make_id_and_molecule_parser
(format, id_tag=None, reader_args=None, errors='strict')¶ Create a specialized function which takes a record and returns an (id, CDK molecule) pair
The returned function is optimized for reading many records from individual strings because it only does parameter validation once. However, I haven’t really noticed much of a performance difference between this and chemfp.cdk_toolkit.parse_id_and_molecule(), so I suggest you use that function directly instead of making a specialized function. (Let me know if making a specialized function is useful.)
See chemfp.cdk_toolkit.read_molecules() for details about the other parameters.
Parameters: - format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: a function of the form
parser(record string) -> (id, CDK molecule)
-
chemfp.cdk_toolkit.
parse_molecule
(content, format, id_tag=None, reader_args=None, errors='strict')¶ Parse the first structure record from the content string and return a CDK molecule.
content is a string containing a single structure record in format format. (Additional records are ignored). See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.parse_id_and_molecule() if you want the (id, CDK molecule) pair instead of just the molecule.
Parameters: - content (a string) – the string containing a structure record
- format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: a CDK molecule
-
chemfp.cdk_toolkit.
parse_id_and_molecule
(content, format, id_tag=None, reader_args=None, errors='strict')¶ Parse the first structure record from content and return the (id, CDK molecule) pair.
content is a string containing a single structure record in format format. (Additional records are ignored). See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.parse_molecule() if you just want the CDK molecule and not the (id, CDK molecule) pair.
Parameters: - content (a string) – the string containing a structure record
- format (a format name string, or Format object) – the input structure format
- id_tag (string, or None to use the record title) – SD tag containing the record id
- reader_args (a dictionary) – reader arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: an (id, CDK molecule) pair
-
chemfp.cdk_toolkit.
create_string
(mol, format, id=None, writer_args=None, errors='strict')¶ Convert a CDK molecule into a structure record in the given format as a Unicode string
If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.
Parameters: - mol (a CDK molecule) – the molecule to use for the output
- format (a format name string, or Format object) – the output structure format
- id (a string, or None to use the molecule's own id) – an alternate record id
- writer_args (a dictionary) – writer arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns: a Unicode string
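As a sketch of how parse_molecule() and create_string() fit together, the following hypothetical helper converts one record from one format to another. It assumes chemfp is installed with CDK support; the `toolkit` argument is an illustration device so that a stand-in object can be supplied for testing.

```python
def convert_record(content, in_format, out_format, new_id=None, toolkit=None):
    """Parse the first record in `content` and re-emit it in `out_format`."""
    if toolkit is None:
        # Imported lazily; needs chemfp with CDK support installed.
        from chemfp import cdk_toolkit as toolkit
    mol = toolkit.parse_molecule(content, in_format)
    # id=None keeps the molecule's own title as the record id.
    return toolkit.create_string(mol, out_format, id=new_id)
```

For instance, `convert_record("c1ccccc1O phenol", "smi", "sdf")` would be expected to return an SDF record as a Unicode string.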
-
chemfp.cdk_toolkit.
create_bytes
(mol, format, id=None, writer_args=None, errors='strict', level=None)¶ Convert a CDK molecule into a structure record in the given format as a byte string
If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.
Parameters: - mol (a CDK molecule) – the molecule to use for the output
- format (a format name string, or Format object) – the output structure format
- id (a string, or None to use the molecule's own id) – an alternate record id
- writer_args (a dictionary) – writer arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns: a byte string
-
chemfp.cdk_toolkit.
open_molecule_writer
(destination=None, format=None, writer_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict', level=None)¶ Return a MoleculeWriter which can write CDK molecules to a destination.
A chemfp.base_toolkit.MoleculeWriter has the methods write_molecule, write_molecules, and write_ids_and_molecules, which are ways to write a CDK molecule, a CDK molecule iterator, or an (id, CDK molecule) pair iterator to a file.
Molecules are written to destination. The output format can be a string like “sdf.gz” or “smi”, a chemfp.base_toolkit.Format, or Format-like object with “name” and “compression” attributes, or None to auto-detect based on the destination. If auto-detection is not possible, the output will be written as uncompressed SMILES.
The writer_args dictionary parameters depend on the format. These include:
- SMILES
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- isomericSmiles - True to generate isomeric SMILES
- kekuleSmiles - True to generate SMILES in Kekule form
- canonical - True to generate a canonical SMILES
- allBondsExplicit - True to write explicit ‘-’ and ‘:’ bonds, even if they can be inferred; default is False
- allHsExplicit - True to write explicit hydrogen counts; default is False
- cxsmiles - True to include CXSMILES annotations; default is False
- InChI and InChIKey
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- include_id - True or default to include the id as the second column; False has no id column
- options - an options string passed to the underlying InChI library
- logLevel - an integer log level
- treatWarningAsError - True raises an exception on error; False or default keeps processing
- SDF
- includeStereo - True includes stereo information; False or default does not
- kekulize - True or default creates the connection table with bonds in Kekule form
- v3k - True to always export in V3000 format
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.
Parameters: - destination (a filename, file object, or None to write to stdout) – the structure destination
- format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
- writer_args (a dictionary) – writer parameters passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a chemfp.io.Location object, or None) – object used to track writer state information
- level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns: a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules
-
chemfp.cdk_toolkit.
open_molecule_writer_to_string
(format, writer_args=None, errors='strict', location=None)¶ Return a MoleculeStringWriter which can write molecule records in the given format to a string.
See chemfp.cdk_toolkit.open_molecule_writer() for full parameter details.
Use the writer’s chemfp.base_toolkit.MoleculeStringWriter.getvalue() to get the output as a Unicode string.
Parameters: - format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
- writer_args (a dictionary) – writer arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a
chemfp.io.Location
object, or None) – object used to track writer state information
Returns: a
chemfp.base_toolkit.MoleculeStringWriter
expecting CDK molecules
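A minimal sketch of writing to a string, assuming chemfp is installed with CDK support. It uses only the writer methods named in this section (write_ids_and_molecules and getvalue); the helper name and the `toolkit` argument are hypothetical and exist to keep the example testable without CDK.

```python
def ids_and_molecules_to_text(pairs, format="smi", toolkit=None):
    """Write (id, CDK molecule) pairs as records and return the text."""
    if toolkit is None:
        # Imported lazily; needs chemfp with CDK support installed.
        from chemfp import cdk_toolkit as toolkit
    writer = toolkit.open_molecule_writer_to_string(format)
    writer.write_ids_and_molecules(pairs)
    # getvalue() returns everything written so far as a Unicode string.
    return writer.getvalue()
```

The `pairs` argument could come straight from read_ids_and_molecules(), which makes this a simple way to re-serialize a file in memory.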
-
chemfp.cdk_toolkit.
open_molecule_writer_to_bytes
(format, writer_args=None, errors='strict', location=None, level=None)¶ Return a MoleculeStringWriter which can write molecule records in the given format to a byte string.
See chemfp.cdk_toolkit.open_molecule_writer() for full parameter details.
Use the writer’s chemfp.base_toolkit.MoleculeStringWriter.getvalue() to get the output as a byte string.
Parameters: - format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
- writer_args (a dictionary) – writer arguments passed to the underlying toolkit
- errors (one of "strict", "report", or "ignore") – specify how to handle errors
- location (a chemfp.io.Location object, or None) – object used to track writer state information
- level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns: a
chemfp.base_toolkit.MoleculeStringWriter
expecting CDK molecules
-
chemfp.cdk_toolkit.
copy_molecule
(mol)¶ Return a new CDK molecule which is a copy of the given molecule
Parameters: mol (a CDK molecule) – the molecule to copy Returns: a new CDK molecule
-
chemfp.cdk_toolkit.
add_tag
(mol, tag, value)¶ Add an SD tag value to the CDK molecule
Parameters: - mol (a CDK molecule) – the molecule
- tag (string) – the SD tag name
- value (string) – the text for the tag
Returns: None
-
chemfp.cdk_toolkit.
get_tag
(mol, tag)¶ Get the named SD tag value, or None if it doesn’t exist
Parameters: - mol (a CDK molecule) – the molecule
- tag (string) – the SD tag name
Returns: a string, or None
-
chemfp.cdk_toolkit.
get_tag_pairs
(mol)¶ Get a list of all SD tag (name, value) pairs for the molecule
Parameters: mol (a CDK molecule) – the molecule Returns: a list of (string name, string value) pairs
-
chemfp.cdk_toolkit.
get_id
(mol)¶ Get the molecule’s id from CDK’s “cdk:Title” property
Parameters: mol (a CDK molecule) – the molecule Returns: a string
-
chemfp.cdk_toolkit.
set_id
(mol, id)¶ Set the molecule’s id as CDK’s “cdk:Title” property
Parameters: - mol (a CDK molecule) – the molecule
- id (string) – the new id
Returns: None
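The id and SD tag functions above can be combined as in the following sketch, which assumes chemfp is installed with CDK support. The helper name `annotate` and the `toolkit` argument are hypothetical; values are converted with str() because add_tag() documents the value as a string.

```python
def annotate(mol, new_id, tags, toolkit=None):
    """Set the molecule's id, add SD tags, and return all (name, value) pairs."""
    if toolkit is None:
        # Imported lazily; needs chemfp with CDK support installed.
        from chemfp import cdk_toolkit as toolkit
    toolkit.set_id(mol, new_id)
    for name, value in tags.items():
        # add_tag() expects the tag value as text.
        toolkit.add_tag(mol, name, str(value))
    return toolkit.get_tag_pairs(mol)
```

Note that this modifies the molecule in place; use copy_molecule() first if the original must stay unchanged.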
-
chemfp.cdk_toolkit.
from_smistring
(content: str, *, kekulise: bool = True, errors: str = 'strict')¶ Parse a SMILES string using the CDK toolkit
- This is equivalent to calling:
- parse_molecule(content, “smistring”, reader_args={…}, errors=errors)
Parameters: - kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a CDK molecule object
-
chemfp.cdk_toolkit.
to_smistring
(mol: Any, *, id: Optional[str, None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')¶ Generate a SMILES string from a CDK molecule
- This is equivalent to calling:
- create_string(mol, “smistring”, id=id, writer_args={…}, errors=errors)
- Available bit flag flavors are:
- ‘Canonical’ = 1 (in default bit flags)
- ‘InChILabelling’ = 3
- ‘AtomAtomMap’ = 4
- ‘AtomicMass’ = 8 (in default bit flags)
- ‘UseAromaticSymbols’ = 16
- ‘StereoTetrahedral’ = 256 (in default bit flags)
- ‘StereoCisTrans’ = 512 (in default bit flags)
- ‘StereoExTetrahedral’ = 1024 (in default bit flags)
- ‘StereoExCisTrans’ = 1280 (in default bit flags)
- ‘AtomicMassStrict’ = 2048
- ‘Stereo’ = 1792 (in default bit flags)
- ‘Cx2dCoordinates’ = 4096
- ‘Cx3dCoordinates’ = 8192
- ‘CxCoordinates’ = 12288
- ‘CxAtomLabel’ = 32768
- ‘CxAtomValue’ = 65536
- ‘CxRadical’ = 131072
- ‘CxMulticenter’ = 262144
- ‘CxPolymer’ = 524288
- ‘CxFragmentGroup’ = 1048576
- ‘AtomAtomMapRenumber’ = 33554437
- ‘CxSmiles’ = 12550400
- ‘CxSmilesWithCoords’ = 12562688
- ‘Unique’ = 1 (in default bit flags)
- ‘Isomeric’ = 1800 (in default bit flags)
- ‘Absolute’ = 1801 (in default bit flags)
- ‘UniversalSmiles’ = 1803
- ‘Default’ = 1801 (in default bit flags)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a Unicode string
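The flavor parameter accepts an integer or a string of "|"- or ","-separated terms. The following stand-alone sketch shows how such a string could map onto the bit flag values listed above. It is an illustration only, not chemfp's parser: `flavor_to_int` is a hypothetical helper, and exact, case-sensitive name matching is an assumption.

```python
import re

# Bit flag values copied from the list of available flavors above.
FLAVORS = {
    "Canonical": 1, "InChILabelling": 3, "AtomAtomMap": 4, "AtomicMass": 8,
    "UseAromaticSymbols": 16, "StereoTetrahedral": 256, "StereoCisTrans": 512,
    "StereoExTetrahedral": 1024, "StereoExCisTrans": 1280,
    "AtomicMassStrict": 2048, "Stereo": 1792, "Cx2dCoordinates": 4096,
    "Cx3dCoordinates": 8192, "CxCoordinates": 12288, "CxAtomLabel": 32768,
    "CxAtomValue": 65536, "CxRadical": 131072, "CxMulticenter": 262144,
    "CxPolymer": 524288, "CxFragmentGroup": 1048576,
    "AtomAtomMapRenumber": 33554437, "CxSmiles": 12550400,
    "CxSmilesWithCoords": 12562688, "Unique": 1, "Isomeric": 1800,
    "Absolute": 1801, "UniversalSmiles": 1803, "Default": 1801,
}


def flavor_to_int(flavor):
    """Combine an int or a '|'/','-separated flavor string into bit flags."""
    if isinstance(flavor, int):
        return flavor
    value = 0
    for term in re.split(r"[|,]", flavor):
        # Bitwise OR accumulates the flags; unknown names raise KeyError.
        value |= FLAVORS[term.strip()]
    return value
```

In this sketch `flavor_to_int("Canonical|AtomicMass|Stereo")` gives 1801, the same value as "Default", and `flavor_to_int("Default,UseAromaticSymbols")` gives 1817.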
-
chemfp.cdk_toolkit.
from_smi
(content: str, *, has_header: bool = False, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', kekulise: bool = True, errors: str = 'strict')¶ Parse a SMILES string and id using the CDK toolkit
- This is equivalent to calling:
- parse_molecule(content, “smi”, reader_args={…}, errors=errors)
Parameters: - has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
- implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
- kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a CDK molecule object
-
chemfp.cdk_toolkit.
to_smi
(mol: Any, *, id: Optional[str, None] = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')¶ Generate a SMILES string and id from a CDK molecule
- This is equivalent to calling:
- create_string(mol, “smi”, id=id, writer_args={…}, errors=errors)
- Available bit flag flavors are:
- ‘Canonical’ = 1 (in default bit flags)
- ‘InChILabelling’ = 3
- ‘AtomAtomMap’ = 4
- ‘AtomicMass’ = 8 (in default bit flags)
- ‘UseAromaticSymbols’ = 16
- ‘StereoTetrahedral’ = 256 (in default bit flags)
- ‘StereoCisTrans’ = 512 (in default bit flags)
- ‘StereoExTetrahedral’ = 1024 (in default bit flags)
- ‘StereoExCisTrans’ = 1280 (in default bit flags)
- ‘AtomicMassStrict’ = 2048
- ‘Stereo’ = 1792 (in default bit flags)
- ‘Cx2dCoordinates’ = 4096
- ‘Cx3dCoordinates’ = 8192
- ‘CxCoordinates’ = 12288
- ‘CxAtomLabel’ = 32768
- ‘CxAtomValue’ = 65536
- ‘CxRadical’ = 131072
- ‘CxMulticenter’ = 262144
- ‘CxPolymer’ = 524288
- ‘CxFragmentGroup’ = 1048576
- ‘AtomAtomMapRenumber’ = 33554437
- ‘CxSmiles’ = 12550400
- ‘CxSmilesWithCoords’ = 12562688
- ‘Unique’ = 1 (in default bit flags)
- ‘Isomeric’ = 1800 (in default bit flags)
- ‘Absolute’ = 1801 (in default bit flags)
- ‘UniversalSmiles’ = 1803
- ‘Default’ = 1801 (in default bit flags)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a Unicode string
-
chemfp.cdk_toolkit.
from_smi_file
(source: Union[None, str, BinaryIO], *, has_header: bool = False, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', kekulise: bool = True, errors: str = 'strict')¶ Parse a SMILES string and id file using the CDK toolkit
- This is mostly equivalent to calling:
- read_molecules(source, “smi”, reader_args={…}, errors=errors)
Parameters: - has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
- implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
- kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating CDK molecules
-
chemfp.cdk_toolkit.
to_smi_file
(destination: Union[None, str, BinaryIO], *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')¶ Return a MoleculeWriter which writes CDK molecules to destination as SMILES string and id records
- This is mostly equivalent to calling:
- open_molecule_writer(destination, “smi”, writer_args={…}, errors=errors)
- Available bit flag flavors are:
- ‘Canonical’ = 1 (in default bit flags)
- ‘InChILabelling’ = 3
- ‘AtomAtomMap’ = 4
- ‘AtomicMass’ = 8 (in default bit flags)
- ‘UseAromaticSymbols’ = 16
- ‘StereoTetrahedral’ = 256 (in default bit flags)
- ‘StereoCisTrans’ = 512 (in default bit flags)
- ‘StereoExTetrahedral’ = 1024 (in default bit flags)
- ‘StereoExCisTrans’ = 1280 (in default bit flags)
- ‘AtomicMassStrict’ = 2048
- ‘Stereo’ = 1792 (in default bit flags)
- ‘Cx2dCoordinates’ = 4096
- ‘Cx3dCoordinates’ = 8192
- ‘CxCoordinates’ = 12288
- ‘CxAtomLabel’ = 32768
- ‘CxAtomValue’ = 65536
- ‘CxRadical’ = 131072
- ‘CxMulticenter’ = 262144
- ‘CxPolymer’ = 524288
- ‘CxFragmentGroup’ = 1048576
- ‘AtomAtomMapRenumber’ = 33554437
- ‘CxSmiles’ = 12550400
- ‘CxSmilesWithCoords’ = 12562688
- ‘Unique’ = 1 (in default bit flags)
- ‘Isomeric’ = 1800 (in default bit flags)
- ‘Absolute’ = 1801 (in default bit flags)
- ‘UniversalSmiles’ = 1803
- ‘Default’ = 1801 (in default bit flags)
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
- flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting CDK molecules
-
chemfp.cdk_toolkit.
from_sdf
(content: str, *, ForceReadAs3DCoordinates: bool = False, mode: Literal[RELAXED, STRICT] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', errors: str = 'strict')¶ Parse an SDF record using the CDK toolkit
- This is equivalent to calling:
- parse_molecule(content, “sdf”, reader_args={…}, errors=errors)
Parameters: - ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
- mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
- AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
- InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
- implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a CDK molecule object
-
chemfp.cdk_toolkit.
to_sdf
(mol: Any, *, id: Optional[str, None] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')¶ Generate an SDF record from a CDK molecule
- This is equivalent to calling:
- create_string(mol, “sdf”, id=id, writer_args={…}, errors=errors)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
- WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
- writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
- WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
- TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
- ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
- ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
- WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
- writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a Unicode string
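The keyword-only options of from_sdf() and to_sdf() can be combined as in this sketch, which re-writes an SDF record in V3000 form. It assumes chemfp is installed with CDK support; the helper name `sdf_record_to_v3000` and the `toolkit` argument are hypothetical and are there only to allow testing with a stand-in object.

```python
def sdf_record_to_v3000(record, new_id=None, toolkit=None):
    """Parse one SDF record strictly and re-emit it in V3000 format."""
    if toolkit is None:
        # Imported lazily; needs chemfp with CDK support installed.
        from chemfp import cdk_toolkit as toolkit
    # mode="STRICT" does not attempt to recover from malformed records.
    mol = toolkit.from_sdf(record, mode="STRICT")
    # writeV3000=True always writes the record in V3000 format.
    return toolkit.to_sdf(mol, id=new_id, writeV3000=True)
```

The same effect should be available through to_sdf3k(), whose writeV3000 default is already True.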
-
chemfp.cdk_toolkit.
from_sdf_file
(source: Union[None, str, BinaryIO], *, ForceReadAs3DCoordinates: bool = False, mode: Literal[RELAXED, STRICT] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', errors: str = 'strict')¶ Parse an SDF record file using the CDK toolkit
- This is mostly equivalent to calling:
- read_molecules(source, “sdf”, reader_args={…}, errors=errors)
Parameters: - ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
- mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
- AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
- InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
- implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating CDK molecules
-
chemfp.cdk_toolkit.
to_sdf_file
(destination: Union[None, str, BinaryIO], *, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')¶ Return a MoleculeWriter which writes CDK molecules to destination as SDF records
- This is mostly equivalent to calling:
- open_molecule_writer(destination, “sdf”, writer_args={…}, errors=errors)
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
- WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
- writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
- WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
- TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
- ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
- ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
- WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
- writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting CDK molecules
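The ProgramName limit of 8 characters comes from the fixed-width layout of the second molfile header line: 2 characters of user initials, 8 characters of program name, 10 characters of date and time (MMDDYYHHmm), then a 2-character dimension code. The following sketch builds such a line in plain Python. The function name `molfile_header_line2` and its arguments are hypothetical, and the output is not guaranteed to match what CDK writes character for character.

```python
from datetime import datetime


def molfile_header_line2(program_name="CDK", initials="", when=None, dim_code="2D"):
    """Build header line 2: initials (2), program name (8), timestamp (10), dimension code (2)."""
    if len(program_name) > 8:
        raise ValueError("program name must be at most 8 characters")
    if when is None:
        when = datetime.now()
    timestamp = when.strftime("%m%d%y%H%M")  # MMDDYYHHmm
    # Pad each field to its fixed width with trailing spaces
    return f"{initials[:2]:<2}{program_name:<8}{timestamp}{dim_code[:2]:<2}"


line = molfile_header_line2("CDK", when=datetime(2024, 1, 2, 3, 4))
```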
-
chemfp.cdk_toolkit.
to_sdf3k
(mol: Any, *, id: Optional[str] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict')¶ Generate an SDF record in V3000 format from a CDK molecule
- This is equivalent to calling:
- create_string(mol, "sdf3k", id=id, writer_args={…}, errors=errors)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
- WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
- writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
- WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
- TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
- ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
- ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
- WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
- writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: the SDF record as a string
-
chemfp.cdk_toolkit.
to_sdf3k_file
(destination: Union[None, str, BinaryIO], *, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict')¶ Open a file for writing CDK molecules as SDF records in V3000 format
- This is mostly equivalent to calling:
- open_molecule_writer(destination, "sdf3k", writer_args={…}, errors=errors)
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
- WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
- writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
- WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
- TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
- ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
- ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
- WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
- writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting CDK molecules
-
chemfp.cdk_toolkit.
from_molfile
(content: str, *, ForceReadAs3DCoordinates: bool = False, mode: Literal['RELAXED', 'STRICT'] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal['cdk', 'chemfp']] = 'cdk', errors: str = 'strict')¶ Parse a molfile using the CDK toolkit
- This is equivalent to calling:
- parse_molecule(content, "molfile", reader_args={…}, errors=errors)
Parameters: - ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
- mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
- AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
- InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
- implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a CDK molecule object
-
chemfp.cdk_toolkit.
to_molfile
(mol: Any, *, id: Optional[str] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')¶ Generate a molfile from a CDK molecule
- This is equivalent to calling:
- create_string(mol, "molfile", id=id, writer_args={…}, errors=errors)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
- WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
- WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
- TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
- ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
- ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
- WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
- writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: the molfile as a string
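The writeV3000 option decides which connection table version is written. One way to check which version a generated molfile uses is to look at the counts line (the fourth line), which ends with a `V2000` or `V3000` tag. The sketch below does this in plain Python; `molfile_version` is a hypothetical helper, not part of chemfp or CDK.

```python
def molfile_version(molfile):
    """Return 'V2000' or 'V3000' based on the tag at the end of the counts line."""
    lines = molfile.splitlines()
    if len(lines) < 4:
        raise ValueError("a molfile needs three header lines and a counts line")
    fields = lines[3].split()
    version = fields[-1] if fields else ""
    if version not in ("V2000", "V3000"):
        raise ValueError(f"unrecognized version tag: {version!r}")
    return version


# A minimal single-atom V2000 molfile used only to exercise the helper
v2000_molfile = (
    "water\n"
    "  CDK\n"
    "\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 O   0  0\n"
    "M  END\n"
)
v2000_version = molfile_version(v2000_molfile)
v3000_version = molfile_version("x\n\n\n  0  0  0  0  0  0  0  0  0  0999 V3000\n")
```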
-
chemfp.cdk_toolkit.
from_inchi
(content: str, *, delimiter: Optional[Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = 'to-eol', errors: str = 'strict')¶ Parse an InChI string and id using the CDK toolkit
- This is equivalent to calling:
- parse_molecule(content, "inchi", reader_args={…}, errors=errors)
Parameters: - delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a CDK molecule object
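The delimiter setting controls how an input line is divided into the InChI and its id. The sketch below shows one plausible reading of the named styles in plain Python. It assumes that "to-eol" treats everything after the first run of whitespace as the id, and that the other styles take the second delimited field. The function `split_inchi_line` is hypothetical, the "native" style is deliberately left unsupported, and chemfp's actual parsing may differ in edge cases.

```python
# Map delimiter names (and the literal characters) to a split character
SEPARATORS = {"tab": "\t", "\t": "\t", "space": " ", " ": " ", "comma": ","}


def split_inchi_line(line, delimiter="to-eol"):
    """Return (inchi, id) for one input line; id is None when it is missing."""
    line = line.rstrip("\r\n")
    if delimiter in (None, "to-eol"):
        # The id is the rest of the line after the first whitespace run
        fields = line.split(None, 1)
    elif delimiter == "whitespace":
        # The id is the second whitespace-delimited field
        fields = line.split()
    elif delimiter in SEPARATORS:
        fields = line.split(SEPARATORS[delimiter])
    else:
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    if not fields or not fields[0]:
        raise ValueError("missing InChI")
    record_id = fields[1] if len(fields) > 1 else None
    return fields[0], record_id


to_eol = split_inchi_line("InChI=1S/H2O/h1H2 water molecule\n")
by_whitespace = split_inchi_line("InChI=1S/H2O/h1H2 water molecule", "whitespace")
by_tab = split_inchi_line("InChI=1S/H2O/h1H2\twater molecule\textra", "tab")
```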
-
chemfp.cdk_toolkit.
to_inchi
(mol: Any, *, id: Optional[str] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = None, include_id: bool = True, errors: str = 'strict')¶ Generate an InChI string and id from a CDK molecule
- This is equivalent to calling:
- create_string(mol, "inchi", id=id, writer_args={…}, errors=errors)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- RecMet (Boolean (default: None)) – Reconnect metals
- FixedH (Boolean (default: None)) – Use fixed hydrogens
- DoNotAddH (Boolean (default: None)) – Do not add hydrogens
- options (space separated strings) – Configuration string to pass to the InChI API
- delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: the InChI record as a string
-
chemfp.cdk_toolkit.
from_inchi_file
(source: Union[None, str, BinaryIO], *, delimiter: Optional[Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = 'to-eol', errors: str = 'strict')¶ Parse a file of InChI strings and ids using the CDK toolkit
- This is mostly equivalent to calling:
- read_molecules(source, "inchi", reader_args={…}, errors=errors)
Parameters: - delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeReader
iterating CDK molecules
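The errors parameter matters most for file readers, where a single bad record should not always stop the whole run. The sketch below illustrates the three policies with a toy line parser. It assumes that "strict" raises at the first failure, "ignore" silently skips the record, and "log" reports the failure and then skips it. The names `parse_inchi_line` and `read_records` are hypothetical and do not reflect how chemfp implements error handling.

```python
import io
import sys


def parse_inchi_line(line):
    """Toy parser: accept only lines that start with 'InChI='."""
    text = line.strip()
    if not text.startswith("InChI="):
        raise ValueError(f"not an InChI: {text!r}")
    return text


def read_records(lines, errors="strict", log=None):
    """Yield parsed records, handling failures according to the errors policy."""
    if errors not in ("strict", "ignore", "log"):
        raise ValueError(f"unsupported errors value: {errors!r}")
    if log is None:
        log = sys.stderr
    for lineno, line in enumerate(lines, 1):
        try:
            yield parse_inchi_line(line)
        except ValueError as err:
            if errors == "strict":
                raise  # stop at the first bad record
            if errors == "log":
                print(f"line {lineno}: {err}", file=log)
            # both "ignore" and "log" skip the record and continue


lines = ["InChI=1S/H2O/h1H2", "garbage", "InChI=1S/CH4/h1H4"]
kept = list(read_records(lines, errors="ignore"))
log_buffer = io.StringIO()
logged = list(read_records(lines, errors="log", log=log_buffer))
```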
-
chemfp.cdk_toolkit.
to_inchi_file
(destination: Union[None, str, BinaryIO], *, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = None, include_id: bool = True, errors: str = 'strict')¶ Open a file for writing CDK molecules as InChI strings and ids
- This is mostly equivalent to calling:
- open_molecule_writer(destination, "inchi", writer_args={…}, errors=errors)
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- RecMet (Boolean (default: None)) – Reconnect metals
- FixedH (Boolean (default: None)) – Use fixed hydrogens
- DoNotAddH (Boolean (default: None)) – Do not add hydrogens
- options (space separated strings) – Configuration string to pass to the InChI API
- delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting CDK molecules
-
chemfp.cdk_toolkit.
from_inchistring
(content: str, *, delimiter: Optional[Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = 'to-eol', errors: str = 'strict')¶ Parse an InChI string using the CDK toolkit
- This is equivalent to calling:
- parse_molecule(content, "inchistring", reader_args={…}, errors=errors)
Parameters: - delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a CDK molecule object
-
chemfp.cdk_toolkit.
to_inchistring
(mol: Any, *, id: Optional[str] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, errors: str = 'strict')¶ Generate an InChI string from a CDK molecule
- This is equivalent to calling:
- create_string(mol, "inchistring", id=id, writer_args={…}, errors=errors)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- RecMet (Boolean (default: None)) – Reconnect metals
- FixedH (Boolean (default: None)) – Use fixed hydrogens
- DoNotAddH (Boolean (default: None)) – Do not add hydrogens
- options (space separated strings) – Configuration string to pass to the InChI API
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: the InChI as a string
-
chemfp.cdk_toolkit.
to_inchikey
(mol: Any, *, id: Optional[str] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = None, include_id: bool = True, errors: str = 'strict')¶ Generate an InChIKey string and id from a CDK molecule
- This is equivalent to calling:
- create_string(mol, "inchikey", id=id, writer_args={…}, errors=errors)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- RecMet (Boolean (default: None)) – Reconnect metals
- FixedH (Boolean (default: None)) – Use fixed hydrogens
- DoNotAddH (Boolean (default: None)) – Do not add hydrogens
- options (space separated strings) – Configuration string to pass to the InChI API
- delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: the InChIKey record as a string
-
chemfp.cdk_toolkit.
to_inchikey_file
(destination: Union[None, str, BinaryIO], *, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t']] = None, include_id: bool = True, errors: str = 'strict')¶ Open a file for writing CDK molecules as InChIKey strings and ids
- This is mostly equivalent to calling:
- open_molecule_writer(destination, "inchikey", writer_args={…}, errors=errors)
Parameters: - destination (None, a filename string, or a file-like object) – where to write the molecules
- RecMet (Boolean (default: None)) – Reconnect metals
- FixedH (Boolean (default: None)) – Use fixed hydrogens
- DoNotAddH (Boolean (default: None)) – Do not add hydrogens
- options (space separated strings) – Configuration string to pass to the InChI API
- delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
- include_id (Boolean (default: True)) – if true, include the molecule id in the output
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: a
chemfp.base_toolkit.MoleculeWriter
expecting CDK molecules
-
chemfp.cdk_toolkit.
to_inchikeystring
(mol: Any, *, id: Optional[str] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, errors: str = 'strict')¶ Generate an InChIKey string from a CDK molecule
- This is equivalent to calling:
- create_string(mol, "inchikeystring", id=id, writer_args={…}, errors=errors)
Parameters: - mol (a CDK molecule) – a molecule object
- id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
- RecMet (Boolean (default: None)) – Reconnect metals
- FixedH (Boolean (default: None)) – Use fixed hydrogens
- DoNotAddH (Boolean (default: None)) – Do not add hydrogens
- options (space separated strings) – Configuration string to pass to the InChI API
- errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns: the InChIKey as a string
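An InChIKey has a fixed shape: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final uppercase letter, for 27 characters in total. A quick sanity check on generated keys can therefore be written without any toolkit. The helper `looks_like_inchikey` below is a hypothetical name, and it checks only the layout, not whether the key corresponds to a real structure.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the 27-character InChIKey layout."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


water_key_ok = looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N")
inchi_rejected = not looks_like_inchikey("InChI=1S/H2O/h1H2")
```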