chemfp.cdk_toolkit module¶

The chemfp toolkit API wrapper for the CDK toolkit.

This module is also available as chemfp.cdk.

chemfp.cdk_toolkit.is_licensed()¶

Return True - CDK is always licensed

Returns:	True

chemfp.cdk_toolkit.get_formats(include_unavailable=False)¶

Get the list of structure formats that CDK supports

If include_unavailable is True then also include CDK formats which aren’t available to this specific version of CDK.

Parameters:	include_unavailable (True or False) – include unavailable formats?
Returns:	a list of Format objects

chemfp.cdk_toolkit.get_input_formats()¶

Get the list of supported CDK input formats

Returns:	a list of `chemfp.base_toolkit.Format` objects

chemfp.cdk_toolkit.get_output_formats()¶

Get the list of supported CDK output formats

Returns:	a list of `chemfp.base_toolkit.Format` objects

chemfp.cdk_toolkit.get_format(format)¶

Get the named format, or raise a ValueError

This will raise a ValueError if CDK does not implement the named format or if that format is not available.

Parameters:	format (a string) – the format name
Returns:	a `chemfp.base_toolkit.Format` object

chemfp.cdk_toolkit.get_input_format(format)¶

Get the named input format, or raise a ValueError

This will raise a ValueError if CDK does not implement the named format or if that format is not an input format.

Parameters:	format (a string) – the format name
Returns:	a `chemfp.base_toolkit.Format` object

chemfp.cdk_toolkit.get_output_format(format)¶

Get the named output format, or raise a ValueError

This will raise a ValueError if CDK does not implement the named format or if that format is not an output format.

Parameters:	format (a string) – the format name
Returns:	a `chemfp.base_toolkit.Format` object

chemfp.cdk_toolkit.get_input_format_from_source(source=None, format=None)¶

Get the most appropriate format given the available source and format information

If format is a chemfp.base_toolkit.Format then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object.

If format is None, use the source to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.

Parameters:

source (a filename (as a string), a file object, or None to read from stdin) – the structure data source
format (a Format(-like) object, string, or None) – format information, if known

Returns:

a `chemfp.base_toolkit.Format` object
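The fallback rule described above can be pictured with a small standalone sketch. It does not call chemfp. The helper name `guess_format` and the two suffix tables are hypothetical, and chemfp's real detection rules may recognize more formats and compression types than shown here.

```python
import os

# Hypothetical suffix tables, for illustration only.
COMPRESSION_SUFFIXES = {".gz": "gz"}
FORMAT_SUFFIXES = {".smi": "smi", ".sdf": "sdf", ".inchi": "inchi"}


def guess_format(filename):
    """Return a (format name, compression) pair guessed from a filename."""
    if not isinstance(filename, str):
        # None (stdin) or a file object: nothing to inspect, so fall back
        # to uncompressed SMILES.
        return ("smi", "")
    base, ext = os.path.splitext(filename.lower())
    compression = COMPRESSION_SUFFIXES.get(ext, "")
    if compression:
        # Strip the compression suffix and look at the next extension.
        base, ext = os.path.splitext(base)
    # An unknown or missing extension is treated as SMILES.
    return (FORMAT_SUFFIXES.get(ext, "smi"), compression)


print(guess_format("library.sdf.gz"))
print(guess_format(None))
```

In this sketch a name such as "library.sdf.gz" is split into a format name and a compression type, which mirrors the "name" and "compression" attributes of a Format object.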

chemfp.cdk_toolkit.get_output_format_from_destination(destination=None, format=None)¶

Get the most appropriate format given the available destination and format information

If format is a chemfp.base_toolkit.Format then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object.

If format is None, use the destination to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.

Parameters:

destination (a filename (as a string), a file object, or None to write to stdout) – the structure data destination
format (a Format(-like) object, string, or None) – format information, if known

Returns:

a `chemfp.base_toolkit.Format` object

chemfp.cdk_toolkit.read_molecules(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶

Return an iterator that reads CDK molecules from a structure file

Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. For SD files, use id_tag to get the record id from the given SD tag instead of the title line. (read_molecules() will ignore the id_tag. It exists to make it easier to switch between reader functions.)

Note: the reader returns a new CDK molecule each time.

The reader_args dictionary parameters depend on the format. These include:

SMILES
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- has_header - True or False
- sanitize - True or default sanitizes; False for unsanitized processing
InChI
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- sanitize - True or default sanitizes; False for unsanitized processing
- removeHs - True or default removes explicit hydrogens; False leaves them in the structure
- logLevel - an integer log level
- treatWarningAsError - True raises an exception on error; False or default keeps processing
SDF
- sanitize - True or default sanitizes; False for unsanitized processing
- removeHs - True or default removes explicit hydrogens; False leaves them in the structure
- strictParsing - True or default for strict parsing; False for lenient parsing

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.

See chemfp.cdk_toolkit.read_ids_and_molecules() if you want (id, molecule) pairs instead of just the molecules.

Parameters:

source (a filename, file object, or None to read from stdin) – the structure source
format (a format name string, or Format object, or None to auto-detect) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track parser state information

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules
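The three errors modes can be illustrated without any toolkit. The following is a minimal sketch of the behavior described above, not chemfp's implementation. The function name `parse_records` and the use of ValueError as the failure signal are assumptions made for the example.

```python
import sys


def parse_records(records, parse, errors="strict"):
    """Yield parsed records, handling failures according to `errors`."""
    if errors not in ("strict", "report", "ignore"):
        raise ValueError("unsupported errors value: %r" % (errors,))
    for recno, record in enumerate(records, 1):
        try:
            yield parse(record)
        except ValueError as err:
            if errors == "strict":
                # Stop at the first bad record.
                raise
            if errors == "report":
                sys.stderr.write("record %d: %s\n" % (recno, err))
            # "report" and "ignore" both continue with the next record.


# int() stands in for a structure parser; "x" is the bad record.
print(list(parse_records(["1", "x", "3"], int, errors="ignore")))
```

With "strict" the same input raises on the second record, and with "report" the result is the same as "ignore" plus one message on stderr.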

chemfp.cdk_toolkit.read_molecules_from_string(content, format, id_tag=None, reader_args=None, errors='strict', location=None)¶

Return an iterator that reads CDK molecules from a string containing structure records

content is a string containing 0 or more records in the format format. See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.read_ids_and_molecules_from_string() if you want to read (id, CDK molecule) pairs instead of just molecules.

Parameters:

content (a string) – the string containing structure records
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track parser state information

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_ids_and_molecules(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶

Return an iterator that reads (id, CDK molecule) pairs from a structure file

See chemfp.cdk_toolkit.read_molecules() for full parameter details. The major difference is that this returns an iterator of (id, CDK molecule) pairs instead of just the molecules.

Parameters:

source (a filename, file object, or None to read from stdin) – the structure source
format (a format name string, or Format object, or None to auto-detect) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track parser state information

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.read_ids_and_molecules_from_string(content, format, id_tag=None, reader_args=None, errors='strict', location=None)¶

Return an iterator that reads (id, CDK molecule) pairs from a string containing structure records

content is a string containing 0 or more records in the format format. See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.read_molecules_from_string() if you just want to read the CDK molecules instead of (id, molecule) pairs.

Parameters:

content (a string) – the string containing structure records
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track parser state information

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.make_id_and_molecule_parser(format, id_tag=None, reader_args=None, errors='strict')¶

Create a specialized function which takes a record and returns an (id, CDK molecule) pair

The returned function is optimized for reading many records from individual strings because it only does parameter validation once. However, I haven’t really noticed much of a performance difference between this and chemfp.cdk_toolkit.parse_id_and_molecule(), so I suggest you use that function directly instead of making a specialized function. (Let me know if making a specialized function is useful.)

See chemfp.cdk_toolkit.read_molecules() for details about the other parameters.

Parameters:

format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a function of the form `parser(record string) -> (id, CDK molecule)`
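The "validate once, then parse many times" idea can be sketched as a closure. This is a toolkit-free illustration. The name `make_line_parser` is hypothetical, and the returned function yields an (id, SMILES string) pair rather than a CDK molecule.

```python
def make_line_parser(delimiter="tab", errors="strict"):
    """Validate the parameters once and return a per-record function."""
    separators = {"tab": "\t", "space": " "}
    if delimiter not in separators:
        raise ValueError("unsupported delimiter: %r" % (delimiter,))
    if errors not in ("strict", "report", "ignore"):
        raise ValueError("unsupported errors value: %r" % (errors,))
    sep = separators[delimiter]

    def parser(record):
        # No parameter checks here; they were all done above.
        smiles, _, rest = record.rstrip("\n").partition(sep)
        if not smiles:
            if errors == "strict":
                raise ValueError("missing SMILES field")
            return (None, None)
        record_id = rest.split(sep, 1)[0] or None
        return (record_id, smiles)

    return parser


parser = make_line_parser("tab")
print(parser("CCO\tethanol\n"))
```

The per-call saving is only the skipped validation, which is consistent with the note above that the difference is rarely noticeable.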

chemfp.cdk_toolkit.parse_molecule(content, format, id_tag=None, reader_args=None, errors='strict')¶

Parse the first structure record from the content string and return a CDK molecule.

content is a string containing a single structure record in format format. (Additional records are ignored). See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.parse_id_and_molecule() if you want the (id, CDK molecule) pair instead of just the molecule.

Parameters:

content (a string) – the string containing a structure record
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a CDK molecule

chemfp.cdk_toolkit.parse_id_and_molecule(content, format, id_tag=None, reader_args=None, errors='strict')¶

Parse the first structure record from content and return the (id, CDK molecule) pair.

content is a string containing a single structure record in format format. (Additional records are ignored). See chemfp.cdk_toolkit.read_molecules() for details about the other parameters.

See chemfp.cdk_toolkit.parse_molecule() if you just want the CDK molecule and not the (id, CDK molecule) pair.

Parameters:

content (a string) – the string containing a structure record
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

an (id, CDK molecule) pair

chemfp.cdk_toolkit.create_string(mol, format, id=None, writer_args=None, errors='strict')¶

Convert a CDK molecule into a structure record in the given format as a Unicode string

If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.

Parameters:

mol (a CDK molecule) – the molecule to use for the output
format (a format name string, or Format object) – the output structure format
id (a string, or None to use the molecule's own id) – an alternate record id
writer_args (a dictionary) – writer arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a Unicode string

chemfp.cdk_toolkit.create_bytes(mol, format, id=None, writer_args=None, errors='strict', level=None)¶

Convert a CDK molecule into a structure record in the given format as a byte string

If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.

Parameters:

mol (a CDK molecule) – the molecule to use for the output
format (a format name string, or Format object) – the output structure format
id (a string, or None to use the molecule's own id) – an alternate record id
writer_args (a dictionary) – writer arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats

Returns:

a byte string

chemfp.cdk_toolkit.open_molecule_writer(destination=None, format=None, writer_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict', level=None)¶

Return a MoleculeWriter which can write CDK molecules to a destination.

A chemfp.base_toolkit.MoleculeWriter has the methods write_molecule, write_molecules, and write_ids_and_molecules, which are ways to write a CDK molecule, a CDK molecule iterator, or an (id, CDK molecule) pair iterator to a file.

Molecules are written to destination. The output format can be a string like “sdf.gz” or “smi”, a chemfp.base_toolkit.Format, or Format-like object with “name” and “compression” attributes, or None to auto-detect based on the destination. If auto-detection is not possible, the output will be written as uncompressed SMILES.

The writer_args dictionary parameters depend on the format. These include:

SMILES
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- isomericSmiles - True to generate isomeric SMILES
- kekuleSmiles - True to generate SMILES in Kekule form
- canonical - True to generate a canonical SMILES
- allBondsExplicit - True to write explicit ‘-’ and ‘:’ bonds, even if they can be inferred; default is False
- allHsExplicit - True to write explicit hydrogen counts; default is False
- cxsmiles - True to include CXSMILES annotations; default is False

InChI and InChIKey
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- include_id - True or default to include the id as the second column; False has no id column
- options - an options string passed to the underlying InChI library
- logLevel - an integer log level
- treatWarningAsError - True raises an exception on error; False or default keeps processing
SDF
- includeStereo - True includes stereo information; False or default does not
- kekulize - True or default creates the connection table with bonds in Kekule form
- v3k - True to always export in V3000 format

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.

Parameters:

destination (a filename, file object, or None to write to stdout) – the structure destination
format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
writer_args (a dictionary) – writer parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track writer state information
level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules
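As a usage sketch, a reader and a writer from this module can be combined to convert one structure file into another. The function name `convert_structures` and the file names in the comment are hypothetical. The example also assumes the writer can be used as a context manager, which is not stated in this section.

```python
def convert_structures(source, destination):
    """Copy every record from `source` to `destination`, keeping the ids."""
    # Imported lazily so that defining this function does not require
    # chemfp, a Java bridge, or the CDK jar to be installed.
    from chemfp import cdk_toolkit

    # Both formats are auto-detected from the file names.
    reader = cdk_toolkit.read_ids_and_molecules(source, errors="report")
    # Assumption: the writer supports the "with" statement.
    with cdk_toolkit.open_molecule_writer(destination) as writer:
        writer.write_ids_and_molecules(reader)


# Hypothetical call:
#   convert_structures("library.sdf.gz", "library.smi")
```

Passing format and writer_args explicitly, as described above, would override the auto-detection and the default output options.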

chemfp.cdk_toolkit.open_molecule_writer_to_string(format, writer_args=None, errors='strict', location=None)¶

Return a MoleculeStringWriter which can write molecule records in the given format to a string.

See chemfp.cdk_toolkit.open_molecule_writer() for full parameter details.

Use the writer’s chemfp.base_toolkit.MoleculeStringWriter.getvalue() to get the output as a Unicode string.

Parameters:

format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
writer_args (a dictionary) – writer arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track writer state information

Returns:

a chemfp.base_toolkit.MoleculeStringWriter expecting CDK molecules

chemfp.cdk_toolkit.open_molecule_writer_to_bytes(format, writer_args=None, errors='strict', location=None, level=None)¶

Return a MoleculeStringWriter which can write molecule records in the given format to a byte string.

See chemfp.cdk_toolkit.open_molecule_writer() for full parameter details.

Use the writer’s chemfp.base_toolkit.MoleculeStringWriter.getvalue() to get the output as a byte string.

Parameters:

format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
writer_args (a dictionary) – writer arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track writer state information
level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats

Returns:

a chemfp.base_toolkit.MoleculeStringWriter expecting CDK molecules

chemfp.cdk_toolkit.copy_molecule(mol)¶

Return a new CDK molecule which is a copy of the given molecule

Parameters:	mol (a CDK molecule) – the molecule to copy
Returns:	a new CDK molecule

chemfp.cdk_toolkit.add_tag(mol, tag, value)¶

Add an SD tag value to the CDK molecule

Parameters:

mol (a CDK molecule) – the molecule
tag (string) – the SD tag name
value (string) – the text for the tag

Returns:

None

chemfp.cdk_toolkit.get_tag(mol, tag)¶

Get the named SD tag value, or None if it doesn’t exist

Parameters:

mol (a CDK molecule) – the molecule
tag (string) – the SD tag name

Returns:

a string, or None

chemfp.cdk_toolkit.get_tag_pairs(mol)¶

Get a list of all SD tag (name, value) pairs for the molecule

Parameters:	mol (a CDK molecule) – the molecule
Returns:	a list of (string name, string value) pairs

chemfp.cdk_toolkit.get_id(mol)¶

Get the molecule’s id from CDK’s “cdk:Title” property

Parameters:	mol (a CDK molecule) – the molecule
Returns:	a string

chemfp.cdk_toolkit.set_id(mol, id)¶

Set the molecule’s id as CDK’s “cdk:Title” property

Parameters:

mol (a CDK molecule) – the molecule
id (string) – the new id

Returns:

None

chemfp.cdk_toolkit.from_smistring(content: str, *, kekulise: bool = True, errors: str = 'strict')¶

Parse a SMILES string using the CDK toolkit

This is equivalent to calling: parse_molecule(content, "smistring", reader_args={…}, errors=errors)

Parameters:

kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object

chemfp.cdk_toolkit.to_smistring(mol: Any, *, id: Optional[str, None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')¶

Generate a SMILES string from a CDK molecule

This is equivalent to calling: create_string(mol, "smistring", id=id, writer_args={…}, errors=errors)

Available bit flag flavors are:

  'Canonical' = 1 (in default bit flags)
  'InChILabelling' = 3
  'AtomAtomMap' = 4
  'AtomicMass' = 8 (in default bit flags)
  'UseAromaticSymbols' = 16
  'StereoTetrahedral' = 256 (in default bit flags)
  'StereoCisTrans' = 512 (in default bit flags)
  'StereoExTetrahedral' = 1024 (in default bit flags)
  'StereoExCisTrans' = 1280 (in default bit flags)
  'AtomicMassStrict' = 2048
  'Stereo' = 1792 (in default bit flags)
  'Cx2dCoordinates' = 4096
  'Cx3dCoordinates' = 8192
  'CxCoordinates' = 12288
  'CxAtomLabel' = 32768
  'CxAtomValue' = 65536
  'CxRadical' = 131072
  'CxMulticenter' = 262144
  'CxPolymer' = 524288
  'CxFragmentGroup' = 1048576
  'AtomAtomMapRenumber' = 33554437
  'CxSmiles' = 12550400
  'CxSmilesWithCoords' = 12562688
  'Unique' = 1 (in default bit flags)
  'Isomeric' = 1800 (in default bit flags)
  'Absolute' = 1801 (in default bit flags)
  'UniversalSmiles' = 1803
  'Default' = 1801 (in default bit flags)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string
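The flavor values are bit flags, so named terms combine with bitwise OR. The sketch below shows how a "|"- or ","-separated flavor string could be reduced to an integer. The helper `parse_flavor` is hypothetical, the table copies only a subset of the values listed for this function, and the None case is left out because its meaning is not described here.

```python
# A subset of the flavor values listed for to_smistring().
FLAVORS = {
    "Canonical": 1,
    "InChILabelling": 3,
    "AtomAtomMap": 4,
    "AtomicMass": 8,
    "UseAromaticSymbols": 16,
    "StereoTetrahedral": 256,
    "StereoCisTrans": 512,
    "StereoExTetrahedral": 1024,
    "Stereo": 1792,
    "Isomeric": 1800,
    "Absolute": 1801,
    "UniversalSmiles": 1803,
    "Default": 1801,
}


def parse_flavor(flavor):
    """Convert an integer or a separated string of names to bit flags."""
    if isinstance(flavor, int):
        return flavor
    value = 0
    # Accept both "|" and "," as separators.
    for term in flavor.replace(",", "|").split("|"):
        value |= FLAVORS[term.strip()]
    return value


# 'Stereo' is the OR of its three component flags: 256 | 512 | 1024.
assert FLAVORS["Stereo"] == 256 | 512 | 1024
# 'Default' decomposes into Canonical, AtomicMass, and Stereo.
assert parse_flavor("Canonical|AtomicMass|Stereo") == FLAVORS["Default"]
print(parse_flavor("Default,UseAromaticSymbols"))
```

The decomposition explains the "(in default bit flags)" notes: a flavor carries that note when all of its bits are set in 1801.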

chemfp.cdk_toolkit.from_smi(content: str, *, has_header: bool = False, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', kekulise: bool = True, errors: str = 'strict')¶

Parse a SMILES string and id using the CDK toolkit

This is equivalent to calling: parse_molecule(content, "smi", reader_args={…}, errors=errors)

Parameters:

has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object
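The delimiter setting decides where the SMILES ends and the id begins. The sketch below shows one plausible reading of four of the delimiter names. The helper `split_smiles_line` is hypothetical, and the exact semantics (including treating None like "to-eol") are assumptions rather than statements about chemfp or CDK.

```python
def split_smiles_line(line, delimiter="to-eol"):
    """Split one SMILES file line into a (SMILES, id) pair."""
    line = line.rstrip("\r\n")
    if delimiter in (None, "to-eol"):
        # Assumed: the id is everything after the first run of
        # whitespace, so it may itself contain spaces.
        fields = line.split(None, 1)
    elif delimiter == "whitespace":
        # Assumed: the id is only the second whitespace-separated field.
        fields = line.split()
    elif delimiter in ("tab", "\t"):
        fields = line.split("\t")
    elif delimiter in ("space", " "):
        fields = line.split(" ")
    else:
        raise ValueError("unsupported delimiter: %r" % (delimiter,))
    smiles = fields[0] if fields else ""
    record_id = fields[1] if len(fields) > 1 else None
    return (smiles, record_id)


print(split_smiles_line("CCO ethyl alcohol"))
print(split_smiles_line("CCO ethyl alcohol", "whitespace"))
```

Under these assumptions the same line gives the id "ethyl alcohol" with "to-eol" but only "ethyl" with "whitespace", which is why the choice matters for ids that contain spaces.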

chemfp.cdk_toolkit.to_smi(mol: Any, *, id: Optional[str, None] = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')¶

Generate a SMILES string and id from a CDK molecule

This is equivalent to calling: create_string(mol, "smi", id=id, writer_args={…}, errors=errors)

Available bit flag flavors are:

  'Canonical' = 1 (in default bit flags)
  'InChILabelling' = 3
  'AtomAtomMap' = 4
  'AtomicMass' = 8 (in default bit flags)
  'UseAromaticSymbols' = 16
  'StereoTetrahedral' = 256 (in default bit flags)
  'StereoCisTrans' = 512 (in default bit flags)
  'StereoExTetrahedral' = 1024 (in default bit flags)
  'StereoExCisTrans' = 1280 (in default bit flags)
  'AtomicMassStrict' = 2048
  'Stereo' = 1792 (in default bit flags)
  'Cx2dCoordinates' = 4096
  'Cx3dCoordinates' = 8192
  'CxCoordinates' = 12288
  'CxAtomLabel' = 32768
  'CxAtomValue' = 65536
  'CxRadical' = 131072
  'CxMulticenter' = 262144
  'CxPolymer' = 524288
  'CxFragmentGroup' = 1048576
  'AtomAtomMapRenumber' = 33554437
  'CxSmiles' = 12550400
  'CxSmilesWithCoords' = 12562688
  'Unique' = 1 (in default bit flags)
  'Isomeric' = 1800 (in default bit flags)
  'Absolute' = 1801 (in default bit flags)
  'UniversalSmiles' = 1803
  'Default' = 1801 (in default bit flags)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string

chemfp.cdk_toolkit.from_smi_file(source: Union[None, str, BinaryIO], *, has_header: bool = False, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', kekulise: bool = True, errors: str = 'strict')¶

Parse a SMILES string and id file using the CDK toolkit

This is mostly equivalent to calling: read_molecules(source, "smi", reader_args={…}, errors=errors)

Parameters:

has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.to_smi_file(destination: Union[None, str, BinaryIO], *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, flavor: Union[int, str, None] = 'Default', errors: str = 'strict')¶

Open a writer that saves CDK molecules as SMILES string and id records

This is mostly equivalent to calling: open_molecule_writer(destination, "smi", writer_args={…}, errors=errors)

Available bit flag flavors are:

  'Canonical' = 1 (in default bit flags)
  'InChILabelling' = 3
  'AtomAtomMap' = 4
  'AtomicMass' = 8 (in default bit flags)
  'UseAromaticSymbols' = 16
  'StereoTetrahedral' = 256 (in default bit flags)
  'StereoCisTrans' = 512 (in default bit flags)
  'StereoExTetrahedral' = 1024 (in default bit flags)
  'StereoExCisTrans' = 1280 (in default bit flags)
  'AtomicMassStrict' = 2048
  'Stereo' = 1792 (in default bit flags)
  'Cx2dCoordinates' = 4096
  'Cx3dCoordinates' = 8192
  'CxCoordinates' = 12288
  'CxAtomLabel' = 32768
  'CxAtomValue' = 65536
  'CxRadical' = 131072
  'CxMulticenter' = 262144
  'CxPolymer' = 524288
  'CxFragmentGroup' = 1048576
  'AtomAtomMapRenumber' = 33554437
  'CxSmiles' = 12550400
  'CxSmilesWithCoords' = 12562688
  'Unique' = 1 (in default bit flags)
  'Isomeric' = 1800 (in default bit flags)
  'Absolute' = 1801 (in default bit flags)
  'UniversalSmiles' = 1803
  'Default' = 1801 (in default bit flags)

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
delimiter (One of None, 'to_eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.from_sdf(content: str, *, ForceReadAs3DCoordinates: bool = False, mode: Literal[RELAXED, STRICT] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', errors: str = 'strict')¶

Parse an SDF record using the CDK toolkit

This is equivalent to calling: parse_molecule(content, "sdf", reader_args={…}, errors=errors)

Parameters:

ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object
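The implementation option chooses which code identifies the record boundaries. In an SD file each record ends with a line containing only "$$$$", so record identification can be sketched in a few lines. This is an illustration of the idea and not the record reader used by either chemfp or CDK. The helper name `iter_sdf_records` is hypothetical.

```python
def iter_sdf_records(text):
    """Yield each SDF record, including its "$$$$" terminator line."""
    record = []
    for line in text.splitlines(keepends=True):
        record.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(record)
            record = []
    # Keep a final record that lacks a terminator, unless it is blank.
    if any(part.strip() for part in record):
        yield "".join(record)


# Two stub records; real records also carry a counts line and atom block.
sample = "first\n\n\nM  END\n$$$$\nsecond\n\n\nM  END\n$$$$\n"
records = list(iter_sdf_records(sample))
print(len(records))
```

Each yielded string is one complete record, the kind of single-record content that from_sdf() parses.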

chemfp.cdk_toolkit.to_sdf(mol: Any, *, id: Optional[str, None] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')¶

Generate an SDF record from a CDK molecule

This is equivalent to calling: create_string(mol, "sdf", id=id, writer_args={…}, errors=errors)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string

chemfp.cdk_toolkit.from_sdf_file(source: Union[None, str, BinaryIO], *, ForceReadAs3DCoordinates: bool = False, mode: Literal[RELAXED, STRICT] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', errors: str = 'strict')¶

Parse an SDF record file using the CDK toolkit

This is mostly equivalent to calling: read_molecules(source, "sdf", reader_args={…}, errors=errors)

Parameters:

ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.to_sdf_file(destination: Union[None, str, BinaryIO], *, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')¶

Open a writer that saves CDK molecules as SDF records

This is mostly equivalent to calling: open_molecule_writer(destination, "sdf", writer_args={…}, errors=errors)

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules
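Because ProgramName is limited to 8 characters, it can help to check the value before opening the writer. The following is a minimal sketch: `sdf_writer_options` is a hypothetical helper, and raising ValueError for a long name is this sketch's own choice, since the text above does not say how chemfp or CDK treats one.

```python
def sdf_writer_options(program_name="CDK", v3000=False, truncate_long_data=False):
    """Build keyword arguments for to_sdf_file(), checking ProgramName up front."""
    if len(program_name) > 8:
        raise ValueError(
            f"ProgramName must be at most 8 characters: {program_name!r}"
        )
    return {
        "ProgramName": program_name,
        "writeV3000": v3000,
        "TruncateLongData": truncate_long_data,
    }


# Hypothetical use, assuming chemfp and CDK are installed:
#   from chemfp import cdk_toolkit
#   writer = cdk_toolkit.to_sdf_file("output.sdf", **sdf_writer_options("myprog"))
```

Keeping the options in a dictionary also makes it easy to reuse the same settings for several output files.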

chemfp.cdk_toolkit.to_sdf3k(mol: Any, *, id: Optional[str, None] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict')¶

Generate an SDF record in V3000 format from a CDK molecule

This is equivalent to calling `create_string(mol, "sdf3k", id=id, writer_args={…}, errors=errors)`.

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

the SDF record, in V3000 format, as a string
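One way to check which format a record string uses, such as one generated by `to_sdf3k`, is to look at the counts line, which is the fourth line of a molfile and ends with the version tag. The sketch below is independent of chemfp; `is_v3000_record` is a hypothetical helper, and the two sample records are hand-written illustrations, not CDK output.

```python
def is_v3000_record(record):
    """Return True if the record's counts line declares the V3000 format."""
    lines = record.splitlines()
    # A molfile starts with three header lines; the counts line is the fourth.
    return len(lines) >= 4 and lines[3].rstrip().endswith("V3000")


# Hand-written sample headers (title, program line, comment, counts line).
v3000_record = "\n".join([
    "methane",
    "  CDK",
    "",
    "  0  0  0     0  0            999 V3000",
    "M  V30 BEGIN CTAB",
    "M  V30 END CTAB",
    "M  END",
    "$$$$",
])
v2000_record = "\n".join([
    "methane",
    "  CDK",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "$$$$",
])

print(is_v3000_record(v3000_record), is_v3000_record(v2000_record))
```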

chemfp.cdk_toolkit.to_sdf3k_file(destination: Union[None, str, BinaryIO], *, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict')¶

Open a writer that saves CDK molecules as SDF records in V3000 format

This is mostly equivalent to calling `open_molecule_writer(destination, "sdf3k", writer_args={…}, errors=errors)`.

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.from_molfile(content: str, *, ForceReadAs3DCoordinates: bool = False, mode: Literal[RELAXED, STRICT] = 'RELAXED', AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, implementation: Optional[Literal[cdk, chemfp], None] = 'cdk', errors: str = 'strict')¶

Parse a molfile using the CDK toolkit

This is equivalent to calling `parse_molecule(content, "molfile", reader_args={…}, errors=errors)`.

Parameters:

ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object
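The three values of errors describe a policy for what happens when a record cannot be handled. The sketch below shows one plausible reading of that policy in plain Python. It is not chemfp's implementation: the function name, the use of ValueError, the use of the logging module for "log", and returning None after a non-fatal failure are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("parse-sketch")


def report_parse_failure(message, errors="strict"):
    """Apply an error policy to a failed parse and return the fallback result."""
    if errors == "strict":
        # Stop immediately and let the caller deal with the problem.
        raise ValueError(message)
    if errors == "log":
        # Report the problem, then carry on.
        logger.warning(message)
    elif errors != "ignore":
        raise ValueError(f"unsupported errors value: {errors!r}")
    # "ignore" and "log" both continue, with no molecule for this record.
    return None
```

Under this reading, code that passes errors="ignore" or errors="log" should be prepared for a missing result instead of an exception.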

chemfp.cdk_toolkit.to_molfile(mol: Any, *, id: Optional[str, None] = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')¶

Generate a molfile from a CDK molecule

This is equivalent to calling `create_string(mol, "molfile", id=id, writer_args={…}, errors=errors)`.

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

the molfile as a string

chemfp.cdk_toolkit.from_inchi(content: str, *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', errors: str = 'strict')¶

Parse an InChI string and id using the CDK toolkit

This is equivalent to calling `parse_molecule(content, "inchi", reader_args={…}, errors=errors)`.

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object
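To make the delimiter names concrete, here is a plain-Python sketch of how a line holding a structure and an id could be split under each named style. It is an illustration of my reading of the option names, not chemfp's code: `split_record` and `SEPARATORS` are hypothetical, and the None and 'native' values are deliberately left out because their behavior is not described here.

```python
SEPARATORS = {"tab": "\t", "\t": "\t", "space": " ", " ": " ", "comma": ","}


def split_record(line, delimiter="to-eol"):
    """Split one line into (structure, id); id is None when the line has no id."""
    line = line.rstrip("\r\n")
    if delimiter == "to-eol":
        # The structure ends at the first whitespace; the id is the rest of the line.
        fields = line.split(None, 1)
    elif delimiter == "whitespace":
        # Any run of whitespace separates fields; the id is the second field.
        fields = line.split()
    elif delimiter in SEPARATORS:
        # Split on exactly one character; the id is the second field.
        fields = line.split(SEPARATORS[delimiter])
    else:
        raise ValueError(f"delimiter not covered by this sketch: {delimiter!r}")
    structure = fields[0] if fields else ""
    record_id = fields[1] if len(fields) > 1 else None
    return structure, record_id


print(split_record("InChI=1S/CH4/h1H4 methane gas"))
print(split_record("InChI=1S/CH4/h1H4 methane gas", "whitespace"))
```

The two calls differ only in the id: "to-eol" keeps "methane gas" while "whitespace" keeps "methane". Note that InChI strings can themselves contain commas, so a comma separator is a poor fit for this format.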

chemfp.cdk_toolkit.to_inchi(mol: Any, *, id: Optional[str, None] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, include_id: bool = True, errors: str = 'strict')¶

Generate an InChI string and id from a CDK molecule

This is equivalent to calling `create_string(mol, "inchi", id=id, writer_args={…}, errors=errors)`.

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
RecMet (Boolean (default: None)) – Reconnect metals
FixedH (Boolean (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

the InChI record as a string

chemfp.cdk_toolkit.from_inchi_file(source: Union[None, str, BinaryIO], *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', errors: str = 'strict')¶

Parse a file of InChI strings and ids using the CDK toolkit

This is mostly equivalent to calling `read_molecules(source, "inchi", reader_args={…}, errors=errors)`.

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a `chemfp.base_toolkit.MoleculeReader` iterating CDK molecules

chemfp.cdk_toolkit.to_inchi_file(destination: Union[None, str, BinaryIO], *, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, include_id: bool = True, errors: str = 'strict')¶

Open a writer that saves CDK molecules as InChI strings and ids

This is mostly equivalent to calling `open_molecule_writer(destination, "inchi", writer_args={…}, errors=errors)`.

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
RecMet (Boolean (default: None)) – Reconnect metals
FixedH (Boolean (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.from_inchistring(content: str, *, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = 'to-eol', errors: str = 'strict')¶

Parse an InChI string using the CDK toolkit

This is equivalent to calling `parse_molecule(content, "inchistring", reader_args={…}, errors=errors)`.

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object

chemfp.cdk_toolkit.to_inchistring(mol: Any, *, id: Optional[str, None] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, errors: str = 'strict')¶

Generate an InChI string from a CDK molecule

This is equivalent to calling `create_string(mol, "inchistring", id=id, writer_args={…}, errors=errors)`.

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
RecMet (Boolean (default: None)) – Reconnect metals
FixedH (Boolean (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

the InChI as a string

chemfp.cdk_toolkit.to_inchikey(mol: Any, *, id: Optional[str, None] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, include_id: bool = True, errors: str = 'strict')¶

Generate an InChIKey string and id from a CDK molecule

This is equivalent to calling `create_string(mol, "inchikey", id=id, writer_args={…}, errors=errors)`.

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
RecMet (Boolean (default: None)) – Reconnect metals
FixedH (Boolean (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

the InChIKey record as a string

chemfp.cdk_toolkit.to_inchikey_file(destination: Union[None, str, BinaryIO], *, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, delimiter: Optional[Literal[to_eol, space, tab, comma, whitespace, native, , ], None] = None, include_id: bool = True, errors: str = 'strict')¶

Open a writer that saves CDK molecules as InChIKey strings and ids

This is mostly equivalent to calling `open_molecule_writer(destination, "inchikey", writer_args={…}, errors=errors)`.

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
RecMet (Boolean (default: None)) – Reconnect metals
FixedH (Boolean (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.to_inchikeystring(mol: Any, *, id: Optional[str, None] = None, RecMet: bool = None, FixedH: bool = None, DoNotAddH: bool = None, options: str = None, errors: str = 'strict')¶

Generate an InChIKey string from a CDK molecule

This is equivalent to calling `create_string(mol, "inchikeystring", id=id, writer_args={…}, errors=errors)`.

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
RecMet (Boolean (default: None)) – Reconnect metals
FixedH (Boolean (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

the InChIKey as a string
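The InChI functions above can be chained, for example to turn an InChI string into an InChIKey. The following sketch assumes a working chemfp installation with CDK and InChI support; `inchi_to_inchikey` and `looks_like_inchikey` are hypothetical helper names. The format check relies only on the standard InChIKey layout of 14, 10, and 1 uppercase letters joined by hyphens, and it runs without chemfp.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if `text` has the shape of an InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


def inchi_to_inchikey(inchi):
    """Parse an InChI string with CDK and return its InChIKey."""
    # Imported lazily so looks_like_inchikey() works without chemfp installed.
    from chemfp import cdk_toolkit

    mol = cdk_toolkit.from_inchistring(inchi)
    return cdk_toolkit.to_inchikeystring(mol)


# The InChIKey for water has the expected shape; an InChI string does not.
print(looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
print(looks_like_inchikey("InChI=1S/H2O/h1H2"))
```

A shape check like this only catches formatting problems; it does not confirm that a key belongs to a particular structure.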