chemfp.base_toolkit module

Support code which is shared by the toolkit wrappers and the text_toolkit.

This is an internal chemfp module. It should not be imported by programs which use the public API. (Let me know if anything else should be part of the public API.)

This module contains class definitions for objects which are returned as part of the public API.

A Format contains information about a toolkit format, along with methods to get information about format-specific parameters.

A FormatMetadata contains metadata about the structure file reader or writer, including the record format and any format-specific parameters.

The BaseMoleculeReader is the base class for IdAndMoleculeReader, IdAndRecordReader, MoleculeReader, and RecordReader, which are returned by the different ways to read from a structure file.

The BaseMoleculeWriter is the base class for MoleculeWriter and MoleculeStringWriter, which are used to write molecules (or records) to a file or to a string, respectively.

class chemfp.base_toolkit.Format(toolkit_name, format_config, compression=None)

Bases: object

Information about a toolkit format.

Use the toolkit’s get_format and related functions to return a Format instance.

compression = None

the compression type, “” for uncompressed, “gz” for gzip, etc.

extensions

Return a list of appropriate filename extensions for this format

Returns an empty list if this format does not support I/O.

get_default_reader_args()

Return a dictionary of the default reader arguments

The keys are unqualified (i.e., without dots).

>>> from chemfp import openbabel_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_default_reader_args()
{'has_header': False, 'delimiter': None, 'options': None}
Returns: a dictionary of string keys and Python objects for values

get_default_writer_args()

Return a dictionary of the default writer arguments

The keys are unqualified (i.e., without dots).

>>> from chemfp import openbabel_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_default_writer_args()
{'explicit_hydrogens': False, 'isomeric': True, 'delimiter': None, 'options': None, 'canonicalization': 'default'}
Returns: a dictionary of string keys and Python objects for values

get_reader_args_from_text_settings(reader_settings)

Process the reader_settings and return the reader_args for this format.

This function exists to help convert string settings, e.g., from the command line or a configuration file, into usable reader_args.

Setting names may be fully-qualified names like “rdkit.sdf.sanitize”, partially qualified names like “rdkit.*.sanitize” or “openeye.smi.delimiter”, or unqualified names like “delimiter”. The qualifiers act as a namespace so the settings can be specified without needing to know the actual toolkit or format.

The function turns the format-appropriate qualified names into unqualified ones and converts the string values into usable Python objects. For example:

>>> from chemfp import rdkit_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_reader_args_from_text_settings({"rdkit.*.sanitize": "true", "delimiter": "to-eol"})
{'delimiter': 'to-eol', 'sanitize': True}
Parameters: reader_settings (a dictionary with string keys and values) – the reader settings
Returns: a dictionary of unqualified argument names as keys and processed Python values as values

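The second half of that job, turning strings into Python values, can be sketched as below. This is a hypothetical illustration, not chemfp's code: it assumes the names have already been unqualified, and the converter table, the parse_bool() helper, and the accepted boolean spellings are all invented for the example.

```python
# Hypothetical sketch of the "convert string values" step only.
# The converter table and accepted boolean spellings are assumptions,
# not chemfp's actual rules.

def parse_bool(text):
    """Interpret a setting string such as "true" or "0" as a bool."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes", "on"):
        return True
    if lowered in ("false", "0", "no", "off"):
        return False
    raise ValueError("cannot interpret %r as a boolean" % (text,))


# Hypothetical converters for an RDKit-like "smi" reader.
READER_CONVERTERS = {
    "delimiter": str,
    "has_header": parse_bool,
    "sanitize": parse_bool,
}


def convert_text_settings(unqualified_settings, converters=READER_CONVERTERS):
    """Convert already-unqualified string settings into Python values.

    Names without a converter are skipped, mirroring how irrelevant
    parameters are ignored.
    """
    return {
        name: converters[name](value)
        for name, value in unqualified_settings.items()
        if name in converters
    }


print(convert_text_settings({"sanitize": "true", "delimiter": "to-eol"}))
# {'sanitize': True, 'delimiter': 'to-eol'}
```

Keeping the conversion in a per-parameter table means each format only needs to declare which of its parameters are booleans, strings, and so on.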
get_unqualified_reader_args(reader_args)

Convert possibly qualified reader args into unqualified reader args for this format

The reader_args dictionary can be confusing because of the priority rules in how to resolve qualifiers, and because it can include irrelevant parameters, which are ignored.

The get_unqualified_reader_args function applies the qualifier resolution algorithm and removes irrelevant parameters to return a dictionary containing the equivalent unqualified reader args dictionary for this format.

>>> from chemfp import rdkit_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_unqualified_reader_args({"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"})
{'delimiter': 'tab', 'has_header': False, 'sanitize': False}
>>> fmt = T.get_format("can")
>>> fmt.get_unqualified_reader_args({"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"})
{'delimiter': 'tab', 'has_header': False, 'sanitize': True}
Parameters: reader_args – reader arguments, which can contain qualified and unqualified arguments
Returns: a dictionary of reader arguments, containing only unqualified arguments appropriate for this format.

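The qualifier resolution can be pictured with the following minimal sketch. It is hypothetical, not chemfp's implementation: the unqualify_args() function, the assumed priority order (toolkit.format.name, then toolkit.*.name, then format.name, then the bare name), and the SMI_DEFAULTS table are assumptions, chosen so that the sketch reproduces the two RDKit examples above. Only names the format knows about are considered, which is why "X" disappears, and unset parameters fall back to the format's defaults.

```python
# Hypothetical sketch of qualifier resolution. The priority order is an
# assumption chosen to match the documented examples, not chemfp's code.

def unqualify_args(args, toolkit_name, format_name, defaults):
    """Resolve names like "rdkit.*.delimiter" for one toolkit and format."""

    def candidates(name):
        # Spellings of one parameter, from highest to lowest priority.
        return (
            f"{toolkit_name}.{format_name}.{name}",
            f"{toolkit_name}.*.{name}",
            f"{format_name}.{name}",
            name,
        )

    result = {}
    # Only parameters this format knows about (its defaults) are relevant;
    # anything else in args, such as "X", is silently ignored.
    for name, default in defaults.items():
        for key in candidates(name):
            if key in args:
                result[name] = args[key]
                break
        else:
            result[name] = default  # no spelling matched
    return result


# Assumed defaults for an RDKit-like SMILES reader.
SMI_DEFAULTS = {"delimiter": None, "has_header": False, "sanitize": True}

args = {"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"}
print(unqualify_args(args, "rdkit", "smi", SMI_DEFAULTS))
# {'delimiter': 'tab', 'has_header': False, 'sanitize': False}
print(unqualify_args(args, "rdkit", "can", SMI_DEFAULTS))
# {'delimiter': 'tab', 'has_header': False, 'sanitize': True}
```

The same lookup, with a writer-specific defaults table, would cover get_unqualified_writer_args().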
get_unqualified_writer_args(writer_args)

Convert possibly qualified writer args into unqualified writer args for this format

The writer_args dictionary can be confusing because of the priority rules in how to resolve qualifiers, and because it can include irrelevant parameters, which are ignored.

The get_unqualified_writer_args function applies the qualifier resolution algorithm and removes irrelevant parameters to return a dictionary containing the equivalent unqualified writer args dictionary for this format.

>>> from chemfp import rdkit_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_unqualified_writer_args({"rdkit.*.delimiter": "tab", "smi.kekuleSmiles": True, "X": "Y"})
{'isomericSmiles': True, 'delimiter': 'tab', 'kekuleSmiles': True, 'allBondsExplicit': False, 'canonical': True}
>>> fmt = T.get_format("can")
>>> fmt.get_unqualified_writer_args({"rdkit.*.delimiter": "tab", "smi.kekuleSmiles": True, "X": "Y"})
{'isomericSmiles': False, 'delimiter': 'tab', 'kekuleSmiles': False, 'allBondsExplicit': False, 'canonical': True}
Parameters: writer_args – writer arguments, which can contain qualified and unqualified arguments
Returns: a dictionary of writer arguments, containing only unqualified arguments appropriate for this format.

get_writer_args_from_text_settings(writer_settings)

Process writer_settings and return the writer_args for this format.

This function exists to help convert string settings, e.g., from the command line or a configuration file, into usable writer_args.

Setting names may be fully-qualified names like “rdkit.sdf.kekulize”, partially qualified names like “rdkit.*.delimiter” or “openeye.smi.delimiter”, or unqualified names like “delimiter”. The qualifiers act as a namespace so the settings can be specified without needing to know the actual toolkit or format.

The function turns the format-appropriate qualified names into unqualified ones and converts the string values into usable Python objects. For example:

>>> from chemfp import rdkit_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_writer_args_from_text_settings({"rdkit.*.kekuleSmiles": "true", "canonical": "false"})
{'kekuleSmiles': True, 'canonical': False}
Parameters: writer_settings (a dictionary with string keys and values) – the writer settings
Returns: a dictionary of unqualified argument names as keys and processed Python values as values

is_available

Return True if this version of the toolkit understands this format

For example, if your version of RDKit does not support InChI then this would return False for the “inchi” and “inchikey” formats.

is_input_format

Return True if this toolkit can read molecules in this format

is_output_format

Return True if this toolkit can write molecules in this format

name = None

the format name, without any compression information

prefix

Return the prefix to turn an unqualified parameter into a fully qualified parameter

Returns: a string like “rdkit.smi” or “openbabel.sdf”

supports_io

Return True if this format supports reading or writing records

This will return False for formats like “smistring” and “inchikeystring” because those are not record-based formats.

Note: I don’t like this name. I may change it to is_record_format. Let me know if you have ideas, or if changing the name will be a problem.

toolkit_name = None

the toolkit name; either “cdk”, “openeye”, “openbabel”, or “rdkit”

class chemfp.base_toolkit.FormatMetadata(filename, record_format, args)

Bases: object

Information about the reader or writer

args = None

the final reader_args or writer_args, after all processing, and as used by the reader and writer

filename = None

the source or destination filename, the string “<string>” for string-based I/O, or None if not known

record_format = None

the normalized record format name. All SMILES formats are reported as “smi”, and the name does not include compression information

class chemfp.base_toolkit.BaseMoleculeReader(metadata, structure_reader, location)

Bases: object

Base class for the toolkit readers

A Reader is an iterator, so iter(reader) returns the reader itself. Calling next(reader) returns either a single object or an (id, object) pair, depending on the reader class.

A Reader is also a context manager, and calls self.close() when exiting the context.

close()

Close the reader

If the reader wasn’t previously closed then close it. This will set the location properties to their final values, close any files that the reader may have opened, and set self.closed to True.

closed = None

False if the reader is open, otherwise True

location = None

a chemfp.io.Location instance

metadata = None

a chemfp.base_toolkit.FormatMetadata instance
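The iterator and context-manager behavior described for readers can be sketched with two toy classes. Everything here is hypothetical and is not chemfp code: ToyRecordReader and ToyIdAndRecordReader read SMILES-like lines from an in-memory list, and the closed attribute follows the description above (False while open, True afterwards).

```python
# Hypothetical toy readers illustrating the reader protocol; not chemfp code.

class ToyRecordReader:
    """Iterate over records (here, lines of text) as strings."""

    def __init__(self, lines):
        self._it = iter(lines)
        self.closed = False  # False while open, True after close()

    def __iter__(self):
        return self  # a reader is its own iterator

    def __next__(self):
        # next() on an exhausted (or closed) source raises StopIteration
        return self._convert(next(self._it))

    def _convert(self, line):
        return line  # a plain record reader returns the record itself

    def close(self):
        if not self.closed:
            self._it = iter(())  # drop the underlying source
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()  # runs even if the with-block raised


class ToyIdAndRecordReader(ToyRecordReader):
    """Same protocol, but next() returns an (id, record) pair."""

    def _convert(self, line):
        # Assume a SMILES-like record whose second field is the id.
        fields = line.split()
        return (fields[1] if len(fields) > 1 else None, line)


with ToyIdAndRecordReader(["CCO ethanol", "c1ccccc1 benzene"]) as reader:
    for rec_id, record in reader:
        print(rec_id, "->", record)
print(reader.closed)  # True, because the with-block called close()
```

Putting the iteration logic in the base class and only varying the per-record conversion mirrors how one base class can serve readers that return single objects and readers that return pairs.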

class chemfp.base_toolkit.IdAndMoleculeReader(metadata, structure_reader, location)

Bases: chemfp.base_toolkit.BaseMoleculeReader

Read structures from a file and iterate over the (id, toolkit molecule) pairs

Note: the toolkit implementation is free to reuse a molecule instead of returning a new one each time.

class chemfp.base_toolkit.IdAndRecordReader(metadata, structure_reader, location)

Bases: chemfp.base_toolkit.BaseMoleculeReader

Read records from a file and iterate over the (id, record string) pairs

class chemfp.base_toolkit.MoleculeReader(metadata, structure_reader, location)

Bases: chemfp.base_toolkit.BaseMoleculeReader

Read structures from a file and iterate over the toolkit molecules

Note: the toolkit implementation is free to reuse a molecule instead of returning a new one each time.

class chemfp.base_toolkit.RecordReader(metadata, structure_reader, location)

Bases: chemfp.base_toolkit.BaseMoleculeReader

Read and iterate over records as strings

class chemfp.base_toolkit.BaseMoleculeWriter(metadata, structure_writer, location)

Bases: object

The base molecule writer API, implemented by MoleculeWriter and MoleculeStringWriter

A writer is a context manager, and calls self.close() when the context exits.

close()

Close the writer

If the writer wasn’t previously closed then close it. This will set the location properties to their final values, close any files that the writer may have opened, and set self.closed to True.

closed = None

False if the writer is open, otherwise True

location = None

a chemfp.io.Location instance

metadata = None

a chemfp.base_toolkit.FormatMetadata instance

write_id_and_molecule(id, mol)

Write an identifier and toolkit molecule

If id is None then the output uses the molecule’s own id/title. Specifying the id may modify the molecule’s id/title, depending on the format and toolkit.

Parameters:
  • id (string, or None) – the identifier to use for the molecule
  • mol (a toolkit molecule) – the molecule to write
write_ids_and_molecules(ids_and_mols)

Write a sequence of (id, molecule) pairs

This function works well with chemfp.toolkit.read_ids_and_molecules(), for example, to convert an SD file to a SMILES file while using an alternate id_tag to specify a different identifier.

Parameters: ids_and_mols (an iterator of (id string, toolkit molecule) pairs) – the identifiers and molecules to write

write_molecule(mol)

Write a toolkit molecule

Parameters: mol (a toolkit molecule) – the molecule to write

write_molecules(mols)

Write a sequence of molecules

Parameters: mols (a toolkit molecule iterator) – the molecules to write

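To make the relationship between the four write methods concrete, here is a toy writer. It is a hypothetical sketch, not chemfp's implementation: ToyMol stands in for a toolkit molecule, each record is a SMILES-like "smiles id" line, and ToySmilesWriter writes to any text file object. As described for write_id_and_molecule(), an id of None falls back to the molecule's own title; unlike some toolkits, this sketch never modifies the molecule.

```python
import io
from dataclasses import dataclass


@dataclass
class ToyMol:
    """Hypothetical stand-in for a toolkit molecule."""
    smiles: str
    title: str


class ToySmilesWriter:
    """Hypothetical writer using the BaseMoleculeWriter method names."""

    def __init__(self, outfile):
        self._outfile = outfile
        self.closed = False

    def write_id_and_molecule(self, id, mol):
        # id=None means "use the molecule's own title"
        title = mol.title if id is None else id
        self._outfile.write("%s %s\n" % (mol.smiles, title))

    def write_molecule(self, mol):
        self.write_id_and_molecule(None, mol)

    def write_molecules(self, mols):
        for mol in mols:
            self.write_molecule(mol)

    def write_ids_and_molecules(self, ids_and_mols):
        for id, mol in ids_and_mols:
            self.write_id_and_molecule(id, mol)

    def close(self):
        # A file-backed writer would also close its output file here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


out = io.StringIO()
with ToySmilesWriter(out) as writer:
    writer.write_molecule(ToyMol("CCO", "ethanol"))
    # Replace the title, e.g., with an id taken from an SD tag.
    writer.write_ids_and_molecules([("id-1", ToyMol("CCO", "ethanol"))])
print(out.getvalue(), end="")
```

Implementing the bulk methods in terms of write_id_and_molecule() keeps the id-override rule in one place.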
class chemfp.base_toolkit.MoleculeWriter(metadata, structure_writer, location)

Bases: chemfp.base_toolkit.BaseMoleculeWriter

A BaseMoleculeWriter which writes molecules to a file.

A writer is a context manager, and calls self.close() when the context exits.

class chemfp.base_toolkit.MoleculeStringWriter(details, structure_writer, getvalue, location)

Bases: chemfp.base_toolkit.BaseMoleculeWriter

A BaseMoleculeWriter which writes molecules to a string.

A writer is a context manager, and calls self.close() when the context exits.

close()

Close the writer

If the writer wasn’t previously closed then close it. This will set the location properties to their final values, close any files that the writer may have opened, and set self.closed to True.

self.getvalue() will still work after the writer is closed.

getvalue()

Get the string containing all of the written records.

This function can also be called after the writer is closed.

Returns: a string
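One way to let getvalue() keep working after close() is to capture the text before releasing the buffer, because io.StringIO.getvalue() raises ValueError once the StringIO itself has been closed. The following is a hypothetical sketch of that design choice, not chemfp's implementation; ToyStringWriter and its write_record() method are invented for illustration.

```python
import io


class ToyStringWriter:
    """Hypothetical string writer whose getvalue() survives close()."""

    def __init__(self):
        self._buffer = io.StringIO()
        self._value = None  # filled in by close()
        self.closed = False

    def write_record(self, record):
        if self.closed:
            raise ValueError("writer is closed")
        self._buffer.write(record)

    def getvalue(self):
        if self.closed:
            return self._value  # saved copy; the buffer is gone
        return self._buffer.getvalue()

    def close(self):
        if not self.closed:
            # Save the text first: StringIO.getvalue() fails after close().
            self._value = self._buffer.getvalue()
            self._buffer.close()
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


with ToyStringWriter() as writer:
    writer.write_record("CCO ethanol\n")
    writer.write_record("c1ccccc1 benzene\n")
print(writer.getvalue(), end="")  # still available after the with-block
```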