chemfp.cdk_toolkit module

The chemfp toolkit API wrapper for the CDK toolkit.

This module is also available as chemfp.cdk.

chemfp.cdk_toolkit.is_licensed()

Return True - CDK is always licensed

Returns:True
chemfp.cdk_toolkit.get_formats(include_unavailable=False)

Get the list of structure formats that CDK supports

If include_unavailable is True then also include CDK formats which aren’t available to this specific version of CDK.

Parameters:include_unavailable (True or False) – include unavailable formats?
Returns:a list of Format objects
chemfp.cdk_toolkit.get_input_formats()

Get the list of supported CDK input formats

Returns:a list of chemfp.base_toolkit.Format objects
chemfp.cdk_toolkit.get_output_formats()

Get the list of supported CDK output formats

Returns:a list of chemfp.base_toolkit.Format objects
chemfp.cdk_toolkit.get_format(format)

Get the named format, or raise a ValueError

This will raise a ValueError if CDK does not implement the format named by format or that format is not available.

Parameters:format (a string) – the format name
Returns:a chemfp.base_toolkit.Format object
chemfp.cdk_toolkit.get_input_format(format)

Get the named input format, or raise a ValueError

This will raise a ValueError if CDK does not implement the format named by format or that format is not an input format.

Parameters:format (a string) – the format name
Returns:a chemfp.base_toolkit.Format object
chemfp.cdk_toolkit.get_output_format(format)

Get the named output format, or raise a ValueError

This will raise a ValueError if CDK does not implement the format named by format or that format is not an output format.

Parameters:format (a string) – the format name
Returns:a chemfp.base_toolkit.Format object
chemfp.cdk_toolkit.get_input_format_from_source(source=None, format=None)

Get the most appropriate format given the available source and format information

If format is a chemfp.base_toolkit.Format then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object.

If format is None, use the source to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.

Parameters:
  • source (a filename (as a string), a file object, or None to read from stdin) – the structure data source.
  • format (a Format(-like) object, string, or None) – format information, if known.
Returns:

a chemfp.base_toolkit.Format object
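
The resolution order described above (Format object, Format-like object, string, then auto-detection with a SMILES fallback) can be sketched in plain Python. This is only an illustration of the decision logic, not chemfp's implementation: the Format namedtuple, the KNOWN_COMPRESSIONS list, the helper names, and the extension-based auto-detection are all assumptions made for the example.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for chemfp.base_toolkit.Format. Only the two
# attributes named in the text ("name" and "compression") are modeled.
Format = namedtuple("Format", ["name", "compression"])

# Assumption: an illustrative list, not chemfp's actual list.
KNOWN_COMPRESSIONS = ("gz", "zst")


def format_from_string(text):
    """Split a format string like 'sdf.gz' into Format('sdf', 'gz')."""
    name, _, ext = text.rpartition(".")
    if name and ext in KNOWN_COMPRESSIONS:
        return Format(name, ext)
    return Format(text, "")


def resolve_format(source=None, format=None):
    """Pick a Format using the order given in the documentation."""
    if isinstance(format, Format):
        return format                      # already a real Format
    if isinstance(format, str):
        return format_from_string(format)  # e.g. "smi" or "sdf.gz"
    if format is not None:
        # Format-like object: copy its "name" and "compression".
        return Format(format.name, format.compression)
    # format is None: try to auto-detect from a filename (assumption:
    # detection here looks only at the filename extensions).
    if isinstance(source, str):
        parts = os.path.basename(source).split(".")
        compression = ""
        if len(parts) > 1 and parts[-1] in KNOWN_COMPRESSIONS:
            compression = parts.pop()
        if len(parts) > 1:
            return Format(parts[-1], compression)
    # Auto-detection not possible: assume uncompressed SMILES.
    return Format("smi", "")
```

With the real toolkit, the corresponding call would be chemfp.cdk_toolkit.get_input_format_from_source(source, format), which returns a chemfp.base_toolkit.Format rather than this stand-in.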

chemfp.cdk_toolkit.get_output_format_from_destination(destination=None, format=None)

Get the most appropriate format given the available destination and format information

If format is a chemfp.base_toolkit.Format then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object.

If format is None, use the destination to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.

Parameters:
  • destination (a filename (as a string), a file object, or None to write to stdout) – the structure data destination.
  • format (a Format(-like) object, string, or None) – format information, if known.
Returns:

a chemfp.base_toolkit.Format object

chemfp.cdk_toolkit.read_molecules(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')

Return an iterator that reads CDK molecules from a structure file

Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. For SD files, use id_tag to get the record id from the given SD tag instead of the title line. (read_molecules() will ignore the id_tag. It exists to make it easier to switch between reader functions.)

Note: the reader returns a new CDK molecule each time.

The reader_args dictionary parameters depend on the format. These include:

  • SMILES
    • delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
    • has_header - True or False
    • sanitize - True or default sanitizes; False for unsanitized processing
  • InChI
    • delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
    • sanitize - True or default sanitizes; False for unsanitized processing
    • removeHs - True or default removes explicit hydrogens; False leaves them in the structure
    • logLevel - an integer log level
    • treatWarningAsError - True raises an exception on error; False or default keeps processing
  • SDF
    • sanitize - True or default sanitizes; False for unsanitized processing
    • removeHs - True or default removes explicit hydrogens; False leaves them in the structure
    • strictParsing - True or default for strict parsing; False for lenient parsing

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.

See chemfp.cdk_toolkit.read_ids_and_molecules() if you want (id, molecule) pairs instead of just the molecules.

Parameters:
  • source (a filename, file object, or None to read from stdin) – the structure source
  • format (a format name string, or Format object, or None to auto-detect) – the input structure format
  • id_tag (string, or None to use the record title) – SD tag containing the record id
  • reader_args (a dictionary) – reader parameters passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
  • location (a chemfp.io.Location object, or None) – object used to track parser state information
Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules
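
The three values of the errors parameter can be pictured with a small stand-alone sketch. It is not chemfp code: iter_parsed and the parse callback are hypothetical names, and a ValueError stands in for whatever a real record parser would raise.

```python
import sys


def iter_parsed(records, parse, errors="strict"):
    """Yield parse(record) for each record, applying the errors policy.

    "strict" re-raises the exception, "report" writes a message to
    stderr and moves to the next record, "ignore" silently moves on.
    """
    if errors not in ("strict", "report", "ignore"):
        raise ValueError("unsupported errors value: %r" % (errors,))
    for recno, record in enumerate(records, 1):
        try:
            result = parse(record)
        except ValueError as err:
            if errors == "strict":
                raise
            if errors == "report":
                print("record %d: %s" % (recno, err), file=sys.stderr)
            continue  # "report" and "ignore" both skip the bad record
        yield result
```

For example, list(iter_parsed(["1", "x", "3"], int, errors="ignore")) skips the unparsable middle record, while the default "strict" stops at it with an exception.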

chemfp.cdk_toolkit.read_molecules_from_string(content, format, id_tag=None, reader_args=None, errors='strict', location=None)

Return an iterator that reads CDK molecules from a string containing structure records

content is a string containing 0 or more records in the format format. See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.read_ids_and_molecules_from_string() if you want to read (id, CDK molecule) pairs instead of just molecules.

Parameters:
  • content (a string) – the string containing structure records
  • format (a format name string, or Format object) – the input structure format
  • id_tag (string, or None to use the record title) – SD tag containing the record id
  • reader_args (a dictionary) – reader arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
  • location (a chemfp.io.Location object, or None) – object used to track parser state information
Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_ids_and_molecules(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')

Return an iterator that reads (id, CDK molecule) pairs from a structure file

See chemfp.cdk_toolkit.read_molecules() for full parameter details. The major difference is that this returns an iterator of (id, CDK molecule) pairs instead of just the molecules.

Parameters:
  • source (a filename, file object, or None to read from stdin) – the structure source
  • format (a format name string, or Format object, or None to auto-detect) – the input structure format
  • id_tag (string, or None to use the record title) – SD tag containing the record id
  • reader_args (a dictionary) – reader arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
  • location (a chemfp.io.Location object, or None) – object used to track parser state information
Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.read_ids_and_molecules_from_string(content, format, id_tag=None, reader_args=None, errors='strict', location=None)

Return an iterator that reads (id, CDK molecule) pairs from a string containing structure records

content is a string containing 0 or more records in the format format. See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.read_molecules_from_string() if you just want to read the CDK molecules instead of (id, molecule) pairs.

Parameters:
  • content (a string) – the string containing structure records
  • format (a format name string, or Format object) – the input structure format
  • id_tag (string, or None to use the record title) – SD tag containing the record id
  • reader_args (a dictionary) – reader arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
  • location (a chemfp.io.Location object, or None) – object used to track parser state information
Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.make_id_and_molecule_parser(format, id_tag=None, reader_args=None, errors='strict')

Create a specialized function which takes a record and returns an (id, CDK molecule) pair

The returned function is optimized for reading many records from individual strings because it only does parameter validation once. However, I haven’t really noticed much of a performance difference between this and chemfp.cdk_toolkit.parse_id_and_molecule(), so I suggest you use that function directly instead of making a specialized function. (Let me know if making a specialized function is useful.)

See chemfp.cdk_toolkit.read_molecules() for details about the other parameters.

Parameters:
  • format (a format name string, or Format object) – the input structure format
  • id_tag (string, or None to use the record title) – SD tag containing the record id
  • reader_args (a dictionary) – reader arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns:

a function of the form parser(record string) -> (id, CDK molecule)
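
As a rough illustration of the two approaches compared above, the sketch below parses a list of record strings either through a parser made once or by calling parse_id_and_molecule() for every record. The helper names are hypothetical, and the toolkit is passed in as an argument so the same code works with any object exposing these two functions; demo() assumes chemfp is installed with CDK support.

```python
def parse_records(toolkit, records, format, errors="strict"):
    """Parse each record string into an (id, molecule) pair.

    The format and errors arguments are checked once, when the
    specialized parser is created, and then reused for every record.
    """
    parser = toolkit.make_id_and_molecule_parser(format, errors=errors)
    return [parser(record) for record in records]


def parse_records_directly(toolkit, records, format, errors="strict"):
    """Same result, calling parse_id_and_molecule() for each record."""
    return [toolkit.parse_id_and_molecule(record, format, errors=errors)
            for record in records]


def demo():
    # Requires chemfp with a working CDK installation.
    from chemfp import cdk_toolkit
    records = ["CCO ethanol", "c1ccccc1 benzene"]
    for id, mol in parse_records(cdk_toolkit, records, "smi"):
        print(id, cdk_toolkit.create_string(mol, "smistring"))
```

Both helpers should return the same pairs; the only difference is how often the arguments are validated.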

chemfp.cdk_toolkit.parse_molecule(content, format, id_tag=None, reader_args=None, errors='strict')

Parse the first structure record from the content string and return a CDK molecule.

content is a string containing a single structure record in format format. (Additional records are ignored). See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.parse_id_and_molecule() if you want the (id, CDK molecule) pair instead of just the molecule.

Parameters:
  • content (a string) – the string containing a structure record
  • format (a format name string, or Format object) – the input structure format
  • id_tag (string, or None to use the record title) – SD tag containing the record id
  • reader_args (a dictionary) – reader arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns:

a CDK molecule

chemfp.cdk_toolkit.parse_id_and_molecule(content, format, id_tag=None, reader_args=None, errors='strict')

Parse the first structure record from content and return the (id, CDK molecule) pair.

content is a string containing a single structure record in format format. (Additional records are ignored.) See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.parse_molecule() if you just want the CDK molecule and not the (id, CDK molecule) pair.

Parameters:
  • content (a string) – the string containing a structure record
  • format (a format name string, or Format object) – the input structure format
  • id_tag (string, or None to use the record title) – SD tag containing the record id
  • reader_args (a dictionary) – reader arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns:

an (id, CDK molecule) pair

chemfp.cdk_toolkit.create_string(mol, format, id=None, writer_args=None, errors='strict')

Convert a CDK molecule into a structure record in the given format as a Unicode string

If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.

Parameters:
  • mol (a CDK molecule) – the molecule to use for the output
  • format (a format name string, or Format object) – the output structure format
  • id (a string, or None to use the molecule's own id) – an alternate record id
  • writer_args (a dictionary) – writer arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
Returns:

a Unicode string

chemfp.cdk_toolkit.create_bytes(mol, format, id=None, writer_args=None, errors='strict', level=None)

Convert a CDK molecule into a structure record in the given format as a byte string

If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.

Parameters:
  • mol (a CDK molecule) – the molecule to use for the output
  • format (a format name string, or Format object) – the output structure format
  • id (a string, or None to use the molecule's own id) – an alternate record id
  • writer_args (a dictionary) – writer arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
  • level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns:

a byte string

chemfp.cdk_toolkit.open_molecule_writer(destination=None, format=None, writer_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict', level=None)

Return a MoleculeWriter which can write CDK molecules to a destination.

A chemfp.base_toolkit.MoleculeWriter has the methods write_molecule, write_molecules, and write_ids_and_molecules, which are ways to write a CDK molecule, a CDK molecule iterator, or an (id, CDK molecule) pair iterator to a file.

Molecules are written to destination. The output format can be a string like “sdf.gz” or “smi”, a chemfp.base_toolkit.Format, or Format-like object with “name” and “compression” attributes, or None to auto-detect based on the destination. If auto-detection is not possible, the output will be written as uncompressed SMILES.

The writer_args dictionary parameters depend on the format. These include:

  • SMILES
    • delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
    • isomericSmiles - True to generate isomeric SMILES
    • kekuleSmiles - True to generate SMILES in Kekule form
    • canonical - True to generate a canonical SMILES
    • allBondsExplicit - True to write explicit ‘-’ and ‘:’ bonds, even if they can be inferred; default is False
    • allHsExplicit - True to write explicit hydrogen counts; default is False
    • cxsmiles - True to include CXSMILES annotations; default is False
  • InChI and InChIKey
    • delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
    • include_id - True or default to include the id as the second column; False has no id column
    • options - an options string passed to the underlying InChI library
    • logLevel - an integer log level
    • treatWarningAsError - True raises an exception on error; False or default keeps processing
  • SDF
    • includeStereo - True includes stereo information; False or default does not
    • kekulize - True or default creates the connection table with bonds in Kekule form
    • v3k - True to always export in V3000 format

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.

Parameters:
  • destination (a filename, file object, or None to write to stdout) – the structure destination
  • format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
  • writer_args (a dictionary) – writer parameters passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
  • location (a chemfp.io.Location object, or None) – object used to track writer state information
  • level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.open_molecule_writer_to_string(format, writer_args=None, errors='strict', location=None)

Return a MoleculeStringWriter which can write molecule records in the given format to a string.

See chemfp.cdk_toolkit.open_molecule_writer() for full parameter details.

Use the writer’s chemfp.base_toolkit.MoleculeStringWriter.getvalue() to get the output as a Unicode string.

Parameters:
  • format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
  • writer_args (a dictionary) – writer arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
  • location (a chemfp.io.Location object, or None) – object used to track writer state information
Returns:

a chemfp.base_toolkit.MoleculeStringWriter expecting CDK molecules
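
A minimal sketch of converting records held in a string from one format to another, combining read_ids_and_molecules_from_string() with a string writer. The helper name convert_records is hypothetical, and the toolkit is passed in as an argument. Calling close() before getvalue() is an assumption about the writer's interface (this section only documents getvalue()), so check it against the MoleculeStringWriter documentation.

```python
def convert_records(toolkit, content, in_format, out_format):
    """Re-write the records in `content` using another format.

    Returns the converted records as a Unicode string.
    """
    reader = toolkit.read_ids_and_molecules_from_string(content, in_format)
    writer = toolkit.open_molecule_writer_to_string(out_format)
    # write_ids_and_molecules() accepts an (id, molecule) pair iterator.
    writer.write_ids_and_molecules(reader)
    # Assumption: the writer has close(), and getvalue() still works
    # after the writer has been closed.
    writer.close()
    return writer.getvalue()


def demo():
    # Requires chemfp with a working CDK installation.
    from chemfp import cdk_toolkit
    print(convert_records(cdk_toolkit, "CCO ethanol\n", "smi", "sdf"))
```

Passing the toolkit module as a parameter keeps the helper independent of which chemfp toolkit wrapper is in use.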

chemfp.cdk_toolkit.open_molecule_writer_to_bytes(format, writer_args=None, errors='strict', location=None, level=None)

Return a MoleculeStringWriter which can write molecule records in the given format to a byte string.

See chemfp.cdk_toolkit.open_molecule_writer() for full parameter details.

Use the writer’s chemfp.base_toolkit.MoleculeStringWriter.getvalue() to get the output as a byte string.

Parameters:
  • format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
  • writer_args (a dictionary) – writer arguments passed to the underlying toolkit
  • errors (one of "strict", "report", or "ignore") – specify how to handle errors
  • location (a chemfp.io.Location object, or None) – object used to track writer state information
  • level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats
Returns:

a chemfp.base_toolkit.MoleculeStringWriter expecting CDK molecules

chemfp.cdk_toolkit.copy_molecule(mol)

Return a new CDK molecule which is a copy of the given molecule

Parameters:mol (a CDK molecule) – the molecule to copy
Returns:a new CDK molecule
chemfp.cdk_toolkit.add_tag(mol, tag, value)

Add an SD tag value to the CDK molecule

Parameters:
  • mol (a CDK molecule) – the molecule
  • tag (string) – the SD tag name
  • value (string) – the text for the tag
Returns:

None

chemfp.cdk_toolkit.get_tag(mol, tag)

Get the named SD tag value, or None if it doesn’t exist

Parameters:
  • mol (a CDK molecule) – the molecule
  • tag (string) – the SD tag name
Returns:

a string, or None

chemfp.cdk_toolkit.get_tag_pairs(mol)

Get a list of all SD tag (name, value) pairs for the molecule

Parameters:mol (a CDK molecule) – the molecule
Returns:a list of (string name, string value) pairs
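
The tag functions can be combined as in the sketch below, which stores several values as SD tags and then reads everything back. The helper name annotate is hypothetical; values are converted with str() because add_tag() is documented as taking a string. demo() assumes chemfp is installed with CDK support.

```python
def annotate(toolkit, mol, tags):
    """Add each (name, value) in `tags` as an SD tag on `mol`.

    Returns the molecule's full list of (name, value) tag pairs.
    """
    for name, value in tags.items():
        toolkit.add_tag(mol, name, str(value))  # tag values are strings
    return toolkit.get_tag_pairs(mol)


def demo():
    # Requires chemfp with a working CDK installation.
    from chemfp import cdk_toolkit
    mol = cdk_toolkit.parse_molecule("CCO ethanol", "smi")
    print(annotate(cdk_toolkit, mol, {"SOURCE": "example"}))
    # get_tag() returns None for a tag which does not exist.
    print(cdk_toolkit.get_tag(mol, "NO_SUCH_TAG"))
```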
chemfp.cdk_toolkit.get_id(mol)

Get the molecule’s id from CDK’s “cdk:Title” property

Parameters:mol (a CDK molecule) – the molecule
Returns:a string
chemfp.cdk_toolkit.set_id(mol, id)

Set the molecule’s id as CDK’s “cdk:Title” property

Parameters:
  • mol (a CDK molecule) – the molecule
  • id (string) – the new id
Returns:

None

chemfp.cdk_toolkit.from_smistring(content: str, *, kekulise: bool = True, errors: str = 'strict')

Parse a SMILES string using the CDK toolkit

This is equivalent to calling:
parse_molecule(content, "smistring", reader_args={…}, errors=errors)
Parameters:
  • kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a CDK molecule object

chemfp.cdk_toolkit.to_smistring(mol: Any, *, id: Optional[str, None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')

Generate a SMILES string from a CDK molecule

This is equivalent to calling:
create_string(mol, "smistring", id=id, writer_args={…}, errors=errors)
Available bit flag flavors are:
  • ‘Canonical’ = 1 (in default bit flags)
  • ‘InChILabelling’ = 3
  • ‘AtomAtomMap’ = 4
  • ‘AtomicMass’ = 8 (in default bit flags)
  • ‘UseAromaticSymbols’ = 16
  • ‘StereoTetrahedral’ = 256 (in default bit flags)
  • ‘StereoCisTrans’ = 512 (in default bit flags)
  • ‘StereoExTetrahedral’ = 1024 (in default bit flags)
  • ‘StereoExCisTrans’ = 1280 (in default bit flags)
  • ‘AtomicMassStrict’ = 2048
  • ‘Stereo’ = 1792 (in default bit flags)
  • ‘Cx2dCoordinates’ = 4096
  • ‘Cx3dCoordinates’ = 8192
  • ‘CxCoordinates’ = 12288
  • ‘CxAtomLabel’ = 32768
  • ‘CxAtomValue’ = 65536
  • ‘CxRadical’ = 131072
  • ‘CxMulticenter’ = 262144
  • ‘CxPolymer’ = 524288
  • ‘CxFragmentGroup’ = 1048576
  • ‘AtomAtomMapRenumber’ = 33554437
  • ‘CxSmiles’ = 12550400
  • ‘CxSmilesWithCoords’ = 12562688
  • ‘Unique’ = 1 (in default bit flags)
  • ‘Isomeric’ = 1800 (in default bit flags)
  • ‘Absolute’ = 1801 (in default bit flags)
  • ‘UniversalSmiles’ = 1803
  • ‘Default’ = 1801 (in default bit flags)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a SMILES string
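
The flavor values are bit flags, so a "|"- or ","-separated flavor string corresponds to the bitwise OR of the named values. The sketch below works through that arithmetic with a subset of the values listed above. combine_flavors and the FLAVORS table are hypothetical helpers for illustration, not part of chemfp, and the sketch does not cover flavor=None.

```python
import re

# A subset of the flavor values listed in the documentation above.
FLAVORS = {
    "Canonical": 1,
    "InChILabelling": 3,
    "AtomicMass": 8,
    "UseAromaticSymbols": 16,
    "Stereo": 1792,
    "Isomeric": 1800,
    "Absolute": 1801,
    "UniversalSmiles": 1803,
    "Default": 1801,
}


def combine_flavors(flavor):
    """Convert an integer or a '|'/','-separated string to bit flags."""
    if isinstance(flavor, int):
        return flavor
    value = 0
    for term in re.split(r"[|,]", flavor):
        term = term.strip()
        if term:
            value |= FLAVORS[term]  # KeyError for an unknown name
    return value


# 'Default' is Canonical (1) | AtomicMass (8) | Stereo (1792) = 1801.
assert combine_flavors("Canonical|AtomicMass|Stereo") == FLAVORS["Default"]
# 'Isomeric' is AtomicMass (8) | Stereo (1792) = 1800.
assert combine_flavors("AtomicMass,Stereo") == FLAVORS["Isomeric"]
# Adding InChILabelling (3) to Absolute (1801) gives UniversalSmiles.
assert combine_flavors("Absolute|InChILabelling") == 1803
```

This is why several names are marked "in default bit flags": each of those values is a subset of the bits in 1801.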

chemfp.cdk_toolkit.from_smi(content: str, *, has_header: bool = False, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', kekulise: bool = True, errors: str = 'strict')

Parse a SMILES string and id using the CDK toolkit

This is equivalent to calling:
parse_molecule(content, "smi", reader_args={…}, errors=errors)
Parameters:
  • has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
  • delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
  • implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
  • kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a CDK molecule object

chemfp.cdk_toolkit.to_smi(mol: Any, *, id: Optional[str, None] = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')

Generate a SMILES string and id from a CDK molecule

This is equivalent to calling:
create_string(mol, "smi", id=id, writer_args={…}, errors=errors)
Available bit flag flavors are:
  • ‘Canonical’ = 1 (in default bit flags)
  • ‘InChILabelling’ = 3
  • ‘AtomAtomMap’ = 4
  • ‘AtomicMass’ = 8 (in default bit flags)
  • ‘UseAromaticSymbols’ = 16
  • ‘StereoTetrahedral’ = 256 (in default bit flags)
  • ‘StereoCisTrans’ = 512 (in default bit flags)
  • ‘StereoExTetrahedral’ = 1024 (in default bit flags)
  • ‘StereoExCisTrans’ = 1280 (in default bit flags)
  • ‘AtomicMassStrict’ = 2048
  • ‘Stereo’ = 1792 (in default bit flags)
  • ‘Cx2dCoordinates’ = 4096
  • ‘Cx3dCoordinates’ = 8192
  • ‘CxCoordinates’ = 12288
  • ‘CxAtomLabel’ = 32768
  • ‘CxAtomValue’ = 65536
  • ‘CxRadical’ = 131072
  • ‘CxMulticenter’ = 262144
  • ‘CxPolymer’ = 524288
  • ‘CxFragmentGroup’ = 1048576
  • ‘AtomAtomMapRenumber’ = 33554437
  • ‘CxSmiles’ = 12550400
  • ‘CxSmilesWithCoords’ = 12562688
  • ‘Unique’ = 1 (in default bit flags)
  • ‘Isomeric’ = 1800 (in default bit flags)
  • ‘Absolute’ = 1801 (in default bit flags)
  • ‘UniversalSmiles’ = 1803
  • ‘Default’ = 1801 (in default bit flags)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
  • flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a string containing the SMILES record

chemfp.cdk_toolkit.from_smi_file(source: Union[None, str, BinaryIO], *, has_header: bool = False, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', kekulise: bool = True, errors: str = 'strict')

Read CDK molecules from a SMILES file (SMILES string and id records)

This is mostly equivalent to calling:
read_molecules(source, "smi", reader_args={…}, errors=errors)
Parameters:
  • has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
  • delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
  • implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
  • kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.to_smi_file(destination: Union[None, str, BinaryIO], *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')

Open a writer for saving CDK molecules to a SMILES file (SMILES string and id records)

This is mostly equivalent to calling:
open_molecule_writer(destination, "smi", writer_args={…}, errors=errors)
Available bit flag flavors are:
  • ‘Canonical’ = 1 (in default bit flags)
  • ‘InChILabelling’ = 3
  • ‘AtomAtomMap’ = 4
  • ‘AtomicMass’ = 8 (in default bit flags)
  • ‘UseAromaticSymbols’ = 16
  • ‘StereoTetrahedral’ = 256 (in default bit flags)
  • ‘StereoCisTrans’ = 512 (in default bit flags)
  • ‘StereoExTetrahedral’ = 1024 (in default bit flags)
  • ‘StereoExCisTrans’ = 1280 (in default bit flags)
  • ‘AtomicMassStrict’ = 2048
  • ‘Stereo’ = 1792 (in default bit flags)
  • ‘Cx2dCoordinates’ = 4096
  • ‘Cx3dCoordinates’ = 8192
  • ‘CxCoordinates’ = 12288
  • ‘CxAtomLabel’ = 32768
  • ‘CxAtomValue’ = 65536
  • ‘CxRadical’ = 131072
  • ‘CxMulticenter’ = 262144
  • ‘CxPolymer’ = 524288
  • ‘CxFragmentGroup’ = 1048576
  • ‘AtomAtomMapRenumber’ = 33554437
  • ‘CxSmiles’ = 12550400
  • ‘CxSmilesWithCoords’ = 12562688
  • ‘Unique’ = 1 (in default bit flags)
  • ‘Isomeric’ = 1800 (in default bit flags)
  • ‘Absolute’ = 1801 (in default bit flags)
  • ‘UniversalSmiles’ = 1803
  • ‘Default’ = 1801 (in default bit flags)
Parameters:
  • destination (None, a filename string, or a file-like object) – where to write the molecules
  • delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
  • flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.from_sdf(content: str, *, ForceReadAs3DCoordinates: bool = False, mode: Literal[RELAXED, STRICT] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', errors: str = 'strict')

Parse an SDF record using the CDK toolkit

This is equivalent to calling:
parse_molecule(content, "sdf", reader_args={…}, errors=errors)
Parameters:
  • ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
  • mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
  • AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
  • InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
  • implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a CDK molecule object

chemfp.cdk_toolkit.to_sdf(mol: Any, *, id: Optional[str, None] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')

Generate an SDF record from a CDK molecule

This is equivalent to calling:
create_string(mol, "sdf", id=id, writer_args={…}, errors=errors)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
  • WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
  • writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
  • WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
  • TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
  • ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
  • ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
  • WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
  • writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a string containing the SDF record

chemfp.cdk_toolkit.from_sdf_file(source: Union[None, str, BinaryIO], *, ForceReadAs3DCoordinates: bool = False, mode: Literal[RELAXED, STRICT] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', errors: str = 'strict')

Read CDK molecules from an SDF file

This is mostly equivalent to calling:
read_molecules(source, "sdf", reader_args={…}, errors=errors)
Parameters:
  • ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
  • mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
  • AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
  • InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
  • implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.to_sdf_file(destination: Union[None, str, BinaryIO], *, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')

Open a writer that generates SDF records from CDK molecules

This is mostly equivalent to calling:
open_molecule_writer(destination, "sdf", writer_args={…}, errors=errors)
Parameters:
  • destination (None, a filename string, or a file-like object) – where to write the molecules
  • WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
  • WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
  • writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
  • WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
  • TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
  • ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
  • ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
  • WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
  • writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules
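
As a rough illustration of the writer workflow, the sketch below checks the 8-character ProgramName limit on the caller's side before opening the writer. The helper names are hypothetical. The write_molecule() and close() methods are assumed from chemfp's MoleculeWriter API, which this section does not describe, and what CDK itself does with an over-long name is not stated here.

```python
def check_program_name(name):
    """Reject program names longer than the documented 8 characters."""
    if len(name) > 8:
        raise ValueError(
            "ProgramName must be at most 8 characters: %r" % (name,))
    return name


def write_sdf(path, molecules, program_name="CDK"):
    """Write CDK molecules to an SD file."""
    # Imported lazily so check_program_name works without chemfp installed.
    from chemfp import cdk_toolkit

    writer = cdk_toolkit.to_sdf_file(
        path, ProgramName=check_program_name(program_name))
    try:
        for mol in molecules:
            # Assumption: MoleculeWriter provides write_molecule().
            writer.write_molecule(mol)
    finally:
        # Assumption: close() flushes and releases the destination.
        writer.close()
```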

chemfp.cdk_toolkit.to_sdf3k(mol: Any, *, id: Optional[str, None] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict')

Generate an SDF record in V3000 format from a CDK molecule

This is equivalent to calling:
create_string(mol, "sdf3k", id=id, writer_args={…}, errors=errors)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
  • WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
  • writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
  • WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
  • TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
  • ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
  • ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
  • WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
  • writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

the SDF record in V3000 format as a string

chemfp.cdk_toolkit.to_sdf3k_file(destination: Union[None, str, BinaryIO], *, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict')

Open a writer that generates SDF records in V3000 format from CDK molecules

This is mostly equivalent to calling:
open_molecule_writer(destination, "sdf3k", writer_args={…}, errors=errors)
Parameters:
  • destination (None, a filename string, or a file-like object) – where to write the molecules
  • WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
  • WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
  • writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
  • WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
  • TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
  • ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
  • ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
  • WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
  • writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.from_molfile(content: str, *, ForceReadAs3DCoordinates: bool = False, mode: Literal[RELAXED, STRICT] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', errors: str = 'strict')

Parse a molfile using the CDK toolkit

This is equivalent to calling:
parse_molecule(content, "molfile", reader_args={…}, errors=errors)
Parameters:
  • content (a string) – the molfile record to parse
  • ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
  • mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
  • AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
  • InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
  • implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a CDK molecule object
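
One way to use the mode option is to try 'STRICT' first and fall back to 'RELAXED' only when strict parsing fails. This is a sketch rather than a recommended practice. The helper names are hypothetical, and it assumes that a parse failure under errors="strict" surfaces as a ValueError (or a subclass of it).

```python
def parse_with_fallback(content, parse, modes=("STRICT", "RELAXED")):
    """Try each mode in order; return (mode, molecule) for the first success."""
    last_error = None
    for mode in modes:
        try:
            return mode, parse(content, mode=mode, errors="strict")
        except ValueError as err:
            # Assumption: parse failures are ValueError subclasses.
            last_error = err
    if last_error is None:
        raise ValueError("no parsing modes given")
    raise last_error


def parse_molfile(content):
    """Parse a molfile with CDK, preferring strict mode."""
    # Imported lazily so parse_with_fallback works without chemfp installed.
    from chemfp import cdk_toolkit

    return parse_with_fallback(content, cdk_toolkit.from_molfile)
```

The returned mode tells the caller whether the record needed the more forgiving parser.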

chemfp.cdk_toolkit.to_molfile(mol: Any, *, id: Optional[str, None] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')

Generate a molfile from a CDK molecule

This is equivalent to calling:
create_string(mol, "molfile", id=id, writer_args={…}, errors=errors)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
  • WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
  • WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
  • TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
  • ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
  • ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
  • WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
  • writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

the molfile as a string

chemfp.cdk_toolkit.from_inchi(content: str, *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', errors: str = 'strict')

Parse an InChI string and id using the CDK toolkit

This is equivalent to calling:
parse_molecule(content, "inchi", reader_args={…}, errors=errors)
Parameters:
  • content (a string) – the InChI record to parse
  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a CDK molecule object
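
The delimiter names describe how a record line is divided into the InChI and its id. The pure-Python sketch below shows one reading of four of those names. It is an approximation for illustration, not chemfp's implementation: the function name is hypothetical, and the 'comma', 'native', and None settings are deliberately left out because their exact behavior is not described in this section.

```python
def split_inchi_record(line, delimiter="to-eol"):
    """Split one 'InChI id' line into (inchi, id); id is None if absent."""
    line = line.rstrip("\r\n")
    if delimiter == "to-eol":
        # The id is everything after the first run of whitespace.
        fields = line.split(None, 1)
    elif delimiter == "whitespace":
        # The id is the second whitespace-separated field.
        fields = line.split()[:2]
    elif delimiter in ("space", " "):
        fields = line.split(" ")[:2]
    elif delimiter in ("tab", "\t"):
        fields = line.split("\t")[:2]
    else:
        raise ValueError("unsupported delimiter: %r" % (delimiter,))
    if not fields or not fields[0]:
        raise ValueError("missing InChI field")
    record_id = fields[1] if len(fields) > 1 else None
    return fields[0], record_id
```

Under this reading, "to-eol" keeps spaces inside the id, while "whitespace" stops at the first one.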

chemfp.cdk_toolkit.to_inchi(mol: Any, *, id: Optional[str, None] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, include_id: bool = True, errors: str = 'strict')

Generate an InChI string and id from a CDK molecule

This is equivalent to calling:
create_string(mol, "inchi", id=id, writer_args={…}, errors=errors)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • RecMet (Boolean (default: None)) – Reconnect metals
  • FixedH (Boolean (default: None)) – Use fixed hydrogens
  • DoNotAddH (Boolean (default: None)) – Do not add hydrogens
  • options (space separated strings) – Configuration string to pass to the InChI API
  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
  • include_id (Boolean (default: True)) – if true, include the molecule id in the output
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a string containing the InChI and id

chemfp.cdk_toolkit.from_inchi_file(source: Union[None, str, BinaryIO], *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', errors: str = 'strict')

Parse an InChI string and id file using the CDK toolkit

This is mostly equivalent to calling:
read_molecules(source, "inchi", reader_args={…}, errors=errors)
Parameters:
  • source (None, a filename string, or a file-like object) – where to read the molecules
  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.to_inchi_file(destination: Union[None, str, BinaryIO], *, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, include_id: bool = True, errors: str = 'strict')

Open a writer that generates InChI strings and ids from CDK molecules

This is mostly equivalent to calling:
open_molecule_writer(destination, "inchi", writer_args={…}, errors=errors)
Parameters:
  • destination (None, a filename string, or a file-like object) – where to write the molecules
  • RecMet (Boolean (default: None)) – Reconnect metals
  • FixedH (Boolean (default: None)) – Use fixed hydrogens
  • DoNotAddH (Boolean (default: None)) – Do not add hydrogens
  • options (space separated strings) – Configuration string to pass to the InChI API
  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
  • include_id (Boolean (default: True)) – if true, include the molecule id in the output
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.from_inchistring(content: str, *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', errors: str = 'strict')

Parse an InChI string using the CDK toolkit

This is equivalent to calling:
parse_molecule(content, "inchistring", reader_args={…}, errors=errors)
Parameters:
  • content (a string) – the InChI string to parse
  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a CDK molecule object

chemfp.cdk_toolkit.to_inchistring(mol: Any, *, id: Optional[str, None] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, errors: str = 'strict')

Generate an InChI string from a CDK molecule

This is equivalent to calling:
create_string(mol, "inchistring", id=id, writer_args={…}, errors=errors)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • RecMet (Boolean (default: None)) – Reconnect metals
  • FixedH (Boolean (default: None)) – Use fixed hydrogens
  • DoNotAddH (Boolean (default: None)) – Do not add hydrogens
  • options (space separated strings) – Configuration string to pass to the InChI API
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

the InChI as a string

chemfp.cdk_toolkit.to_inchikey(mol: Any, *, id: Optional[str, None] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, include_id: bool = True, errors: str = 'strict')

Generate an InChIKey string and id from a CDK molecule

This is equivalent to calling:
create_string(mol, "inchikey", id=id, writer_args={…}, errors=errors)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • RecMet (Boolean (default: None)) – Reconnect metals
  • FixedH (Boolean (default: None)) – Use fixed hydrogens
  • DoNotAddH (Boolean (default: None)) – Do not add hydrogens
  • options (space separated strings) – Configuration string to pass to the InChI API
  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
  • include_id (Boolean (default: True)) – if true, include the molecule id in the output
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a string containing the InChIKey and id

chemfp.cdk_toolkit.to_inchikey_file(destination: Union[None, str, BinaryIO], *, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, include_id: bool = True, errors: str = 'strict')

Open a writer that generates InChIKey strings and ids from CDK molecules

This is mostly equivalent to calling:
open_molecule_writer(destination, "inchikey", writer_args={…}, errors=errors)
Parameters:
  • destination (None, a filename string, or a file-like object) – where to write the molecules
  • RecMet (Boolean (default: None)) – Reconnect metals
  • FixedH (Boolean (default: None)) – Use fixed hydrogens
  • DoNotAddH (Boolean (default: None)) – Do not add hydrogens
  • options (space separated strings) – Configuration string to pass to the InChI API
  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
  • include_id (Boolean (default: True)) – if true, include the molecule id in the output
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.to_inchikeystring(mol: Any, *, id: Optional[str, None] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, errors: str = 'strict')

Generate an InChIKey string from a CDK molecule

This is equivalent to calling:
create_string(mol, "inchikeystring", id=id, writer_args={…}, errors=errors)
Parameters:
  • mol (a CDK molecule) – a molecule object
  • id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
  • RecMet (Boolean (default: None)) – Reconnect metals
  • FixedH (Boolean (default: None)) – Use fixed hydrogens
  • DoNotAddH (Boolean (default: None)) – Do not add hydrogens
  • options (space separated strings) – Configuration string to pass to the InChI API
  • errors (one of "strict", "ignore", or "log") – specify how to handle errors
Returns:

the InChIKey as a string
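
A standard InChIKey is 27 characters long: blocks of 14, 10, and 1 uppercase letters joined by hyphens. The sketch below uses that layout to sanity-check the output of to_inchikeystring. The helper names are hypothetical, and stripping surrounding whitespace from the result is a precaution, not documented behavior.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
_INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(text):
    """Return True if text has the layout of a standard InChIKey."""
    return _INCHIKEY.fullmatch(text) is not None


def inchikey_for(mol):
    """Return the InChIKey of a CDK molecule, checking its layout."""
    # Imported lazily so is_inchikey works without chemfp installed.
    from chemfp import cdk_toolkit

    key = cdk_toolkit.to_inchikeystring(mol).strip()
    if not is_inchikey(key):
        raise ValueError("unexpected InChIKey: %r" % (key,))
    return key
```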