.. py:module:: chemfp

.. _chemfp-api: 

####################
chemfp API
####################

This chapter contains the docstrings for the public portion of the
chemfp API.

.. _chemfp-toplevel-api: 

chemfp top-level API
====================

The following functions and classes are in the top-level chemfp module.


is_licensed
-----------

.. py:function:: is_licensed()

   Return True if the chemfp license is valid, otherwise return False.
   
   :returns: True or False


New in chemfp 3.2.1.


get_license_date
----------------

.. py:function:: get_license_date()

   Return the license expiration date as a 3-element tuple in the form (year, month, day).
   
   If the license key is not found or does not pass the security check then the
   function returns None. If this version of chemfp does not need a license key
   then it returns (9999, 12, 25).
   
   :returns: a 3-element tuple or None


New in chemfp 3.2.1.
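
As a hedged illustration of working with the (year, month, day) tuple, the
sketch below converts such a value into the number of days remaining. The
helper name ``days_until_expiration`` is hypothetical and not part of chemfp.
In real use the tuple would come from :func:`chemfp.get_license_date`; literal
values are used here so the example runs without chemfp installed.

```python
import datetime

def days_until_expiration(license_date, today=None):
    """Return the days left before *license_date*, or None if there is no date.

    *license_date* is a (year, month, day) tuple or None, the same shape
    that get_license_date() is documented to return.
    """
    if license_date is None:
        return None
    if today is None:
        today = datetime.date.today()
    return (datetime.date(*license_date) - today).days

# Fixed dates are used so the result does not depend on the current day.
print(days_until_expiration((2025, 1, 31), today=datetime.date(2025, 1, 1)))
print(days_until_expiration(None))
```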


open
----

.. py:function:: open(source, format=None, location=None)

   Read fingerprints from a fingerprint file
   
   Read fingerprints from *source*, using the given format. If
   *source* is a string then it is treated as a filename. If *source*
   is None then fingerprints are read from stdin. Otherwise, *source*
   must be a Python file object supporting the ``read`` and
   ``readline`` methods.
   
   If *format* is None then the fingerprint file format and
   compression type are derived from the source filename, or from the
   ``name`` attribute of the source file object. If *source* is None
   then stdin is assumed to be uncompressed data in "fps" format.
   
   The supported format strings are:
   
      * "fps", "fps.gz", or "fps.zst" for fingerprints in FPS format
      * "fpb", "fpb.gz" or "fpb.zst"  for fingerprints in FPB format
   
   The optional *location* is a :class:`chemfp.io.Location` instance.
   It will only be used if the source is in FPS format.
   
   If the source is in FPS format then ``open`` will return a
   :class:`chemfp.fps_io.FPSReader`, which will use the *location*
   if specified.
   
   If the source is in FPB format then ``open`` will return a
   :class:`chemfp.arena.FingerprintArena` and the *location* will
   not be used.
   
   Here's an example of printing the contents of the file::
   
       from chemfp.bitops import hex_encode
       reader = chemfp.open("example.fps.gz")
       for id, fp in reader:
           print(id, hex_encode(fp))
       
   :param source: The fingerprint source.
   :type source: A filename string, a file object, or None
   :param format: The file format and optional compression.
   :type format: string, or None
   
   :returns: a :class:`chemfp.fps_io.FPSReader` or :class:`chemfp.arena.FingerprintArena`
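
The format detection described above can be pictured with a small,
self-contained sketch. This is not chemfp's implementation: the function name
``guess_fingerprint_format`` is hypothetical, and falling back to "fps" for an
unrecognized extension is an assumption of the sketch.

```python
def guess_fingerprint_format(filename, default="fps"):
    """Guess a chemfp-style format string from a filename's extension(s)."""
    if filename is None:
        # No filename (for example, stdin): use uncompressed FPS.
        return default
    parts = filename.lower().split(".")
    compression = ""
    if len(parts) > 1 and parts[-1] in ("gz", "zst"):
        compression = "." + parts.pop()
    if len(parts) > 1 and parts[-1] in ("fps", "fpb"):
        return parts[-1] + compression
    # Assumption of this sketch: unknown extensions are treated as FPS.
    return default + compression

for name in ("example.fps.gz", "targets.fpb", "queries.fps.zst", None):
    print(name, "->", guess_fingerprint_format(name))
```

When calling :func:`chemfp.open` it is usually simpler to pass *format*
explicitly whenever the filename does not carry a recognizable extension.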


load_fingerprints
-----------------

.. py:function:: load_fingerprints(reader, metadata=None, reorder=True, alignment=None, format=None)

   Load all of the fingerprints into an in-memory FingerprintArena data structure
   
   The function reads all of the fingerprints and identifiers from *reader*
   and stores them into an in-memory :class:`chemfp.arena.FingerprintArena`
   data structure which supports fast similarity searches.
   
   If *reader* is a string or has a ``read`` attribute then it will be
   passed to the :func:`chemfp.open` function and the result used as the reader.
   If that returns a FingerprintArena then the *reorder* and *alignment*
   parameters are ignored and the arena is returned.
   
   If *reader* is a FingerprintArena then the *reorder* and *alignment*
   parameters are ignored. If *metadata* is None then the input reader
   is returned without modifications, otherwise a new FingerprintArena
   is created, whose metadata attribute is *metadata*.
   
   Otherwise the *reader* or the result of opening the file must be an
   iterator which returns (id, fingerprint) pairs. These will be used
   to create a new arena.
   
   *metadata* specifies the metadata for all returned arenas. If not given
   the default comes from the source file or from ``reader.metadata``.
   
   The loader may reorder the fingerprints for better search performance.
   To prevent ordering, use ``reorder=False``. The *reorder* parameter
   is ignored if the reader is an arena or FPB file.
   
   The *alignment* option specifies the data alignment and
   padding size for each fingerprint. A value of 8 means that each
   fingerprint will start on an 8 byte alignment and use storage space
   which is a multiple of 8 bytes long. The default value of None will
   determine the best alignment based on the fingerprint size and available
   popcount methods. This parameter is ignored if the reader is an
   arena or FPB file.
   
   :param reader: An iterator over (id, fingerprint) pairs
   :type reader: a string, file object, or (id, fingerprint) iterator
   :param metadata: The metadata for the arena, if other than reader.metadata
   :type metadata: Metadata
   :param reorder: Specify if fingerprints should be reordered for better performance
   :type reorder: True or False
   :param alignment: Alignment size in bytes (both data alignment and padding); None
      autoselects the best alignment.
   :type alignment: a positive integer, or None
   :param format: The file format name if the reader is a string
   :type format: None, "fps", "fps.gz", "fps.zst", "fpb", "fpb.gz" or "fpb.zst"
   :returns: :class:`chemfp.arena.FingerprintArena`
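
The padding arithmetic behind *alignment* can be sketched in a few lines. The
helper names ``padded_size`` and ``pad_fingerprint`` are hypothetical; they
only illustrate how a fingerprint's storage size is rounded up to a multiple
of the alignment, and say nothing about how chemfp lays out memory internally.

```python
def padded_size(num_bytes, alignment):
    """Round *num_bytes* up to the next multiple of *alignment*."""
    if alignment < 1:
        raise ValueError("alignment must be a positive integer")
    return ((num_bytes + alignment - 1) // alignment) * alignment

def pad_fingerprint(fp, alignment):
    """Append zero bytes so that len(result) is a multiple of *alignment*."""
    return fp + b"\x00" * (padded_size(len(fp), alignment) - len(fp))

# A 166-bit fingerprint needs 21 bytes; with alignment=8 it occupies 24.
print(padded_size(21, 8))
print(len(pad_fingerprint(b"\xff" * 21, 8)))
```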


read_molecule_fingerprints
--------------------------

.. py:function:: read_molecule_fingerprints(type, source=None, format=None, id_tag=None, reader_args=None, errors="strict")

   Read structures from *source* and return the corresponding ids and fingerprints
   
   This returns a :class:`chemfp.FingerprintReader` which can be iterated
   over to get the id and fingerprint for each structure record that is read.
   The fingerprint generated depends on the value of *type*. Structures are
   read from *source*, which can either be the structure filename, or
   None to read from stdin.
   
   *type* contains the information about how to turn a structure
   into a fingerprint. It can be a string or a metadata instance.
   String values look like ``OpenBabel-FP2/1``, ``OpenEye-Path``, and
   ``OpenEye-Path/1 min_bonds=0 max_bonds=5 atype=DefaultAtom btype=DefaultBond``.
   Default values are used for unspecified parameters. Use a
   Metadata instance with *type* and *aromaticity* values set
   in order to pass aromaticity information to OpenEye.
   
   If *format* is None then the structure file format and compression
   are determined by the filename's extension(s), defaulting to
   uncompressed SMILES if that is not possible. Otherwise *format* may
   be "smi" or "sdf" optionally followed by ".gz" or ".bz2" to indicate
   compression. The OpenBabel and OpenEye toolkits also support
   additional formats.
   
   If *id_tag* is None, then the record id is based on the title
   field for the given format. If the input format is "sdf" then *id_tag*
   specifies the tag field containing the identifier. (Only the first
   line is used for multi-line values.) For example, ChEBI omits the
   title from the SD files and stores the id after the ">  <ChEBI ID>"
   line. In that case, use ``id_tag = "ChEBI ID"``.
   
   The *reader_args* is a dictionary with additional structure reader
   parameters. The parameters depend on the toolkit and the format.
   Unknown parameters are ignored.
   
   *errors* specifies how to handle errors. The value "strict" raises
   an exception if there are any detected errors. The value "report"
   sends an error message to stderr and skips to the next record. The
   value "ignore" skips to the next record.
   
   Here is an example of using fingerprints generated from a structure file::
   
     from chemfp.bitops import hex_encode
     fp_reader = chemfp.read_molecule_fingerprints("OpenBabel-FP4/1", "example.sdf.gz")
     print("Each fingerprint has", fp_reader.metadata.num_bits, "bits")
     for (id, fp) in fp_reader:
       print(id, hex_encode(fp))
   
   See also :func:`chemfp.read_molecule_fingerprints_from_string`.
   
   :param type: information about how to convert the input structure into a fingerprint
   :type type: string or Metadata
   :param source: The structure data source.
   :type source: A filename (as a string), a file object, or None to read from stdin
   :param format: The file format and optional compression.
           Examples: "smi" and "sdf.gz"
   :type format: string, or None to autodetect based on the source
   :param id_tag: The tag containing the record id. Example: "ChEBI ID".
           Only valid for SD files.
   :type id_tag: string, or None to use the default title for the given format
   :param reader_args: additional parameters for the structure reader
   :type reader_args: dict, or None to use the default arguments
   :param errors: specify how to handle parse errors
   :type errors: one of "strict", "report", or "ignore"
   :returns: a :class:`chemfp.FingerprintReader`
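
To make the *id_tag* behavior concrete, here is a minimal sketch of pulling
the first line of a tag value out of one SD record. The function
``get_sd_tag_value`` is hypothetical and the record is made up for the
example; the toolkits that chemfp uses have their own SD parsers.

```python
def get_sd_tag_value(record, tag):
    """Return the first line of the value for *tag* in an SD record, or None."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        # A data header line starts with ">" and names the tag inside "<...>".
        if line.startswith(">") and "<" + tag + ">" in line:
            if i + 1 < len(lines):
                return lines[i + 1]
            return None
    return None

# A made-up record with an empty title line, as described for ChEBI.
record = "\n".join([
    "",
    "  made-up record",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    ">  <ChEBI ID>",
    "CHEBI:12345",
    "a second line, which is ignored",
    "",
    "$$$$",
])
print(get_sd_tag_value(record, "ChEBI ID"))
```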


read_molecule_fingerprints_from_string
--------------------------------------

.. py:function:: read_molecule_fingerprints_from_string(type, content, format, id_tag=None, reader_args=None, errors="strict")

   Read structures from the content string and return the corresponding ids and fingerprints
   
   The parameters are identical to :func:`chemfp.read_molecule_fingerprints`
   except that the entire content is passed through as a *content* string,
   rather than as a *source* filename. See that function for details.
   
   You must specify the format! As there is no *source* filename, it's not
   possible to guess the format based on the extension, and there is no
   support for auto-detecting the format by looking at the string content.
   
   :param type: information about how to convert the input structure into a fingerprint
   :type type: string or Metadata
   :param content: The structure data as a string.
   :type content: string
   :param format: The file format and optional compression.
           Examples: "smi" and "sdf.gz"
   :type format: string
   :param id_tag: The tag containing the record id. Example: "ChEBI ID".
           Only valid for SD files.
   :type id_tag: string, or None to use the default title for the given format
   :param reader_args: additional parameters for the structure reader
   :type reader_args: dict, or None to use the default arguments
   :param errors: specify how to handle parse errors
   :type errors: one of "strict" (raise exception), "report" (send a message
       to stderr and continue processing), or "ignore" (continue processing)
   :returns: a :class:`chemfp.FingerprintReader`
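
The three *errors* modes can be illustrated with a toy parser. This sketch
does not read chemical structures; it parses made-up ``hex<TAB>id`` lines so
that the "strict", "report", and "ignore" behaviors are easy to see. The
function name ``parse_toy_records`` and its *log* parameter are hypothetical.

```python
import io
import sys

def parse_toy_records(lines, errors="strict", log=None):
    """Yield (id, fingerprint) pairs from toy 'hex<TAB>id' lines."""
    if errors not in ("strict", "report", "ignore"):
        raise ValueError("unsupported errors value: %r" % (errors,))
    if log is None:
        log = sys.stderr
    for lineno, line in enumerate(lines, 1):
        try:
            hex_fp, record_id = line.rstrip("\n").split("\t")
            fp = bytes.fromhex(hex_fp)
        except ValueError as err:
            if errors == "strict":
                raise
            if errors == "report":
                log.write("line %d: %s\n" % (lineno, err))
            continue  # "report" and "ignore" both skip the bad record
        yield record_id, fp

lines = ["0f\tA\n", "zz\tB\n", "ff\tC\n"]   # the second record is malformed
log = io.StringIO()
print(list(parse_toy_records(lines, errors="report", log=log)))
print(log.getvalue(), end="")
```

Because the sketch is a generator, the "strict" exception is raised while
iterating, not when the function is first called.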


open_fingerprint_writer
-----------------------

.. py:function:: open_fingerprint_writer(destination, metadata=None, format=None, alignment=8, reorder=True, level=None, tmpdir=None, max_spool_size=None, errors="strict", location=None)

   Create a fingerprint writer for the given destination
   
   The fingerprint writer is an object with methods to write fingerprints
   to the given *destination*. The output format is based on the *format*.
   If that's None then the format depends on the *destination*, or is
   "fps" if the attempts at format detection fail.
   
   The *metadata*, if given, is a :class:`Metadata` instance, and used to
   fill the header of an FPS file or META block of an FPB file.
   
   If the output format is "fps", "fps.gz", or "fps.zst" then
   *destination* may be a filename, a file object, or None for
   stdout. If the output format is "fpb" then *destination* must be a
   filename or seekable file object. A fingerprint writer with
   compressed FPB output is not supported; use arena.save() instead, or
   post-process the file.
   
   Use *level* to change the compression level. The default is 9 for
   gzip and 3 for zstd. Use "min", "default", or "max" as aliases for
   the minimum, default, and maximum values for each range.
   
   Some options only apply to FPB output. The *alignment* specifies the
   arena byte alignment. By default the fingerprints are reordered
   by popcount, which enables sublinear similarity search. Set *reorder*
   to ``False`` to preserve the input fingerprint order.
   
   The default FPB writer stores everything into memory before writing
   the file, which may cause performance problems if there isn't
   enough available free memory. In that case, set *max_spool_size*
   to the number of bytes of memory to use before spooling intermediate
   data to a file. (Note: there are two independent spools so this
   may use up to roughly twice as much memory as specified.)
   
   Use *tmpdir* to specify where to write the temporary spool files
   if you don't want to use the operating system default. You may
   also set the TMPDIR, TEMP or TMP environment variables.
   
   Some options only apply to FPS output. *errors* specifies how to handle
   recoverable write errors. The value "strict" raises an exception if
   there are any detected errors. The value "report" sends an error message
   to stderr and skips to the next record. The value "ignore" skips to the
   next record.
   
   The *location* is a :class:`Location` instance. It lets the caller
   access state information such as the number of records that have
   been written.
   
   :param destination: the output destination
   :type destination: a filename, file object, or None
   :param metadata: the fingerprint metadata
   :type metadata: a Metadata instance, or None
   :param format: the output format
   :type format: None, "fps", "fps.gz", "fps.zst", or "fpb"
   :param alignment: arena byte alignment for FPB files
   :type alignment: positive integer
   :param reorder: True reorders the fingerprints by popcount, False leaves them in input order
   :type reorder: True or False
   :param level: the compression level to use when writing gzip or zstd output
   :type level: an integer, the strings "min", "default" or "max", or None for default
   :param tmpdir: the directory to use for temporary files, when max_spool_size is specified
   :type tmpdir: string or None
   :param max_spool_size: number of bytes to store in memory before using a temporary file. If None, use memory for everything.
   :type max_spool_size: integer, or None
   :param location: a location object used to access output state information
   :type location: a Location instance, or None
   :returns: a :class:`chemfp.FingerprintWriter`
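
The idea of reordering by popcount can be sketched as a plain sort. The
helpers ``popcount`` and ``order_by_popcount`` are hypothetical names for
this illustration and do not describe the FPB writer's actual code. The
ordering helps because the Tanimoto score of two fingerprints cannot exceed
the ratio of the smaller popcount to the larger one, so a search can skip
whole popcount ranges.

```python
def popcount(fp):
    """Count the number of 1 bits in a byte string fingerprint."""
    return sum(bin(byte).count("1") for byte in bytearray(fp))

def order_by_popcount(id_fp_pairs):
    """Return the pairs sorted by increasing popcount (ties keep input order)."""
    return sorted(id_fp_pairs, key=lambda pair: popcount(pair[1]))

pairs = [("a", b"\xff\x00"), ("b", b"\x01\x00"), ("c", b"\x0f\x01")]
print([record_id for record_id, fp in order_by_popcount(pairs)])
```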


ChemFPError
-----------

.. py:class:: ChemFPError

   Base class for all of the chemfp exceptions


ParseError
----------

.. py:class:: ParseError

   Exception raised by the molecule and fingerprint parsers and writers
   
   The public attributes are:
   
   .. py:attribute:: msg
   
      a string describing the exception
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance, or None


Metadata
--------

.. py:class:: Metadata

   Store information about a set of fingerprints
   
   The public attributes are:
   
   .. py:attribute:: num_bits
   
      the number of bits in the fingerprint
   
   .. py:attribute:: num_bytes
   
      the number of bytes in the fingerprint
   
   .. py:attribute:: type
   
      the fingerprint type string
   
   .. py:attribute:: aromaticity
   
      aromaticity model (only used with OEChem, and now deprecated)
   
   .. py:attribute:: software
   
      software used to make the fingerprints
   
   .. py:attribute:: sources
   
      list of sources used to make the fingerprint
   
   .. py:attribute:: date
   
      a `datetime <https://docs.python.org/2/library/datetime.html#module-datetime>`_
      timestamp of when the fingerprints were made


  .. py:method:: __repr__()

     Return a string like ``Metadata(num_bits=1024, num_bytes=128, type='OpenBabel/FP2', ....)``


  .. py:method:: __str__()

     Show the metadata in FPS header format


  .. py:method:: copy(num_bits=None, num_bytes=None, type=None, aromaticity=None, software=None, sources=None, date=None)

     Return a new Metadata instance based on the current attributes and optional new values
     
     When called with no parameter, make a new Metadata instance with the
     same attributes as the current instance.
     
     If a given call parameter is not None then it will be used instead of
     the current value. If you want to change a current value to None then
     you will have to modify the new Metadata after you created it.
     
     :param num_bits: the number of bits in the fingerprint
     :type num_bits: an integer, or None
     :param num_bytes: the number of bytes in the fingerprint
     :type num_bytes: an integer, or None
     :param type: the fingerprint type description
     :type type: string or None
     :param aromaticity: obsolete
     :type aromaticity: None
     :param software: a description of the software
     :type software: string or None
     :param sources: source filenames
     :type sources: list of strings, a string (interpreted as a list with one string), or None
     :param date: creation or processing date for the contents
     :type date: a datetime instance, or None
     :returns: a new Metadata instance
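
The "None means keep the current value" rule for ``copy`` can be shown with a
small stand-in class. ``ToyMetadata`` is hypothetical and carries only three
of the documented attributes; it is meant to mirror the documented behavior,
not chemfp's source code.

```python
class ToyMetadata(object):
    """Hypothetical stand-in with three of the documented attributes."""

    def __init__(self, num_bytes=None, type=None, software=None):
        self.num_bytes = num_bytes
        self.type = type
        self.software = software

    def copy(self, num_bytes=None, type=None, software=None):
        # A None argument means "keep the current value".
        return ToyMetadata(
            num_bytes=self.num_bytes if num_bytes is None else num_bytes,
            type=self.type if type is None else type,
            software=self.software if software is None else software,
        )

original = ToyMetadata(num_bytes=128, type="Example/1", software="toy/1.0")
changed = original.copy(type="Example/2")   # only the type is replaced
kept = original.copy(software=None)         # None keeps "toy/1.0"
cleared = original.copy()
cleared.software = None                     # clear a value after copying
print(changed.type, kept.software, cleared.software)
```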


FingerprintReader
-----------------

.. py:class:: FingerprintReader

   Base class for all chemfp objects holding fingerprint records
   
   All FingerprintReader instances have a ``metadata`` attribute
   containing a Metadata and can be iterated over to get the (id,
   fingerprint) for each record.


  .. py:method:: __iter__()

     iterate over the (id, fingerprint) pairs


  .. py:method:: iter_arenas(arena_size=1000)

     iterate through *arena_size* fingerprints at a time, as subarenas
     
     Iterate through *arena_size* fingerprints at a time, returned
     as :class:`chemfp.arena.FingerprintArena` instances. The arenas are in input
     order and not reordered by popcount.
     
     This method helps trade off between performance and memory
     use. Working with arenas is often faster than processing one
     fingerprint at a time, but if the file is very large then you
     might run out of memory, or get bored while waiting to process
     all of the fingerprints before getting the first answer.
     
     If *arena_size* is None then this makes an iterator which
     returns a single arena containing all of the fingerprints.
     
     :param arena_size: The number of fingerprints to put into each arena.
     :type arena_size: positive integer, or None
     :returns: an iterator of :class:`chemfp.arena.FingerprintArena` instances


  .. py:method:: save(destination, format=None, level=None)

     Save the fingerprints to a given destination and format
     
     The output format is based on the *format*. If the format
     is None then the format depends on the *destination* file
     extension. If the extension isn't recognized then the
     fingerprints will be saved in "fps" format.
     
     If the output format is "fps", "fps.gz", or "fps.zst" then
     *destination* may be a filename, a file object, or None; None
     writes to stdout.
     
     If the output format is "fpb" then *destination* must be
     a filename or seekable file object. Chemfp cannot save
     to compressed FPB files.
     
     :param destination: the output destination
     :type destination: a filename, file object, or None
     :param format: the output format
     :type format: None, "fps", "fps.gz", "fps.zst", or "fpb"
     :param level: compression level when writing .gz or .zst files
     :type level: an integer, or "min", "default", or "max" for compressor-specific values
     :returns: None


  .. py:method:: get_fingerprint_type()

     Get the fingerprint type object based on the metadata's type field
     
     This uses ``self.metadata.type`` to get the fingerprint type
     string then calls :func:`chemfp.get_fingerprint_type` to get and return
     a :class:`chemfp.types.FingerprintType` instance.
     
     This will raise a TypeError if there is no metadata, and
     a ValueError if the type field was invalid or the fingerprint
     type isn't available.
     
     :returns: a :class:`chemfp.types.FingerprintType`
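
The batching behind ``iter_arenas`` can be pictured with plain lists. In this
sketch the hypothetical ``iter_batches`` yields lists of (id, fingerprint)
pairs instead of FingerprintArena instances, which is enough to show the
input-order grouping and the ``None`` case.

```python
import itertools

def iter_batches(id_fp_pairs, batch_size=1000):
    """Yield lists of at most *batch_size* pairs, in input order."""
    iterator = iter(id_fp_pairs)
    if batch_size is None:
        # A single batch holding everything.
        yield list(iterator)
        return
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch

pairs = [("id%d" % i, bytes([i])) for i in range(5)]
print([len(batch) for batch in iter_batches(pairs, 2)])
print([len(batch) for batch in iter_batches(pairs, None)])
```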


FingerprintIterator
-------------------

.. py:class:: FingerprintIterator

   A :class:`chemfp.FingerprintReader` for an iterator of (id, fingerprint) pairs
   
   This is often used as an adapter container to hold the metadata
   and (id, fingerprint) iterator. It supports an optional location,
   and can call a close function when the iterator has completed.
   
   A FingerprintIterator is a context manager which will close the
   underlying iterator if it's given a close handler.
   
   Like all iterators you can use next() to get the next
   (id, fingerprint) pair.


  .. py:method:: __init__(metadata, id_fp_iterator, location=None, close=None)

     Initialize with a Metadata instance and the (id, fingerprint) iterator
     
     The *metadata* is a :class:`Metadata` instance. The *id_fp_iterator*
     is an iterator which returns (id, fingerprint) pairs.
     
     The optional *location* is a :class:`chemfp.io.Location`. The optional
     *close* callable is called (as ``close()``) whenever ``self.close()``
     is called and when the context manager exits.


  .. py:method:: __iter__()

     Iterate over the (id, fingerprint) pairs


  .. py:method:: close()

     Close the iterator
     
     The call will be forwarded to the ``close`` callable passed to the
     constructor. If that ``close`` is None then this does nothing.
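
A minimal sketch of the adapter pattern described for FingerprintIterator is
shown below. ``ToyFingerprintIterator`` is a hypothetical class written for
this illustration; it omits the *location* and only demonstrates how a close
handler is forwarded and how the context manager triggers it.

```python
class ToyFingerprintIterator(object):
    """Hold a metadata object, an (id, fingerprint) iterator, and a close handler."""

    def __init__(self, metadata, id_fp_iterator, close=None):
        self.metadata = metadata
        self._iterator = iter(id_fp_iterator)
        self._close = close

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iterator)

    def close(self):
        # Forward to the handler, or do nothing if none was given.
        if self._close is not None:
            self._close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

events = []
pairs = [("A", b"\x01"), ("B", b"\x02")]
with ToyFingerprintIterator(None, pairs, close=lambda: events.append("closed")) as reader:
    first = next(reader)
    rest = list(reader)
print(first, rest, events)
```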


Fingerprints
------------

.. py:class:: Fingerprints

   A :class:`chemfp.FingerprintReader` containing metadata and a list of (id, fingerprint) pairs.
   
   This is typically used as an adapter when you have a list of (id, fingerprint)
   pairs and you want to pass it (and the metadata) to the rest of the chemfp API.
   
   This implements a simple list-like collection of fingerprints. It supports:
     - for (id, fingerprint) in fingerprints: ...
     - id, fingerprint = fingerprints[1]
     - len(fingerprints)
   
   More features, like slicing, will be added as needed or when requested.


  .. py:method:: __init__(metadata, id_fp_pairs)

     Initialize with a Metadata instance and the (id, fingerprint) pair list
     
     The *metadata* is a :class:`Metadata` instance. The *id_fp_pairs*
     is a list of (id, fingerprint) pairs.


FingerprintWriter
-----------------

.. py:class:: FingerprintWriter

   Base class for the fingerprint writers
   
   The three fingerprint writer classes are:
   
   * :class:`chemfp.fps_io.FPSWriter` - write an FPS file
   * :class:`chemfp.fpb_io.OrderedFPBWriter` - write an FPB file, sorted by popcount
   * :class:`chemfp.fpb_io.InputOrderFPBWriter` - write an FPB file, preserving input order
   
   If the chemfp_converters package is available then its
   FlushFingerprintWriter will be used to write fingerprints in flush
   format.
   
   Use :func:`chemfp.open_fingerprint_writer` to create a fingerprint
   writer instance; do not instantiate the writer classes directly.
   
   All classes have the following attributes:
   
   * metadata - a :class:`chemfp.Metadata` instance
   * format - a string describing the base format type (without compression); either 'fps' or 'fpb'
   * closed - False when the file is open, else True
   
   Fingerprint writers are also their own context manager, and
   close the writer on context exit.


  .. py:method:: write_fingerprint(id, fp)

     Write a single fingerprint record with the given id and fp to the destination
     
     :param string id: the record identifier
     :param fp: the fingerprint
     :type fp: byte string


  .. py:method:: write_fingerprints(id_fp_pairs)

     Write a sequence of (id, fingerprint) pairs to the destination
     
     :param id_fp_pairs: An iterable of (id, fingerprint) pairs. *id* is a string
       and *fingerprint* is a byte string.


  .. py:method:: close()

     Close the writer
     
     This will set self.closed to True.
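
The writer protocol (``write_fingerprint``, ``write_fingerprints``, ``close``,
the ``closed`` attribute, and context-manager use) can be sketched with a toy
class. ``ToyWriter`` is hypothetical and writes simple ``hex<TAB>id`` lines to
a text file object; real code should obtain a writer from
:func:`chemfp.open_fingerprint_writer`.

```python
import io

class ToyWriter(object):
    """Write 'hex<TAB>id' lines to a text file object."""

    def __init__(self, outfile):
        self._outfile = outfile
        self.closed = False  # False while the writer is open

    def write_fingerprint(self, id, fp):
        if self.closed:
            raise ValueError("writer is closed")
        self._outfile.write("%s\t%s\n" % (fp.hex(), id))

    def write_fingerprints(self, id_fp_pairs):
        for id, fp in id_fp_pairs:
            self.write_fingerprint(id, fp)

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

output = io.StringIO()
with ToyWriter(output) as writer:
    writer.write_fingerprint("A", b"\x0f")
    writer.write_fingerprints([("B", b"\xff"), ("C", b"\x00")])
print(writer.closed)
print(output.getvalue(), end="")
```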


ChemFPProblem
-------------

.. py:class:: ChemFPProblem

   Information about a compatibility problem between a query and target.
   
   Instances are generated by :func:`chemfp.check_fingerprint_problems`
   and :func:`chemfp.check_metadata_problems`.
   
   The public attributes are:
   
   .. py:attribute:: severity
   
       one of "info", "warning", or "error"
       
   .. py:attribute:: error_level
   
       5 for "info", 10 for "warning", and 20 for "error"
       
   .. py:attribute:: category
   
       a string used as a category name. This string will not change over time.
       
   .. py:attribute:: description
   
       a more detailed description of the error, including details of the mismatch.
       The description depends on *query_name* and *target_name* and may change over time.
   
   The current category names are:
     * "num_bits mismatch" (error)
     * "num_bytes mismatch" (error)
     * "type mismatch" (warning)
     * "aromaticity mismatch" (info)
     * "software mismatch" (info)


check_fingerprint_problems
--------------------------

.. py:function:: check_fingerprint_problems(query_fp, target_metadata, query_name="query", target_name="target")

   Return a list of compatibility problems between a fingerprint and a metadata
   
   If there are no problems then this returns an empty list. If there is a
   bit length or byte length mismatch between the *query_fp* byte string
   and the *target_metadata* then it will return a list containing a
   :class:`ChemFPProblem` instance, with a severity level "error" and
   category "num_bytes mismatch".
   
   This function is usually used to check if a query fingerprint is
   compatible with the target fingerprints. In case of a problem, the
   default message looks like::
   
       >>> from chemfp import check_fingerprint_problems, Metadata
       >>> problems = check_fingerprint_problems(b"A"*64, Metadata(num_bytes=128))
       >>> problems[0].description
       'query contains 64 bytes but target has 128 byte fingerprints'
   
   You can change the error message with the *query_name* and *target_name*
   parameters::
   
       >>> import chemfp
       >>> problems = chemfp.check_fingerprint_problems(b"z"*64, chemfp.Metadata(num_bytes=128),
       ...      query_name="input", target_name="database")
       >>> problems[0].description
       'input contains 64 bytes but database has 128 byte fingerprints'
   
   :param query_fp: a fingerprint (usually the query fingerprint)
   :type query_fp: byte string
   :param target_metadata: the metadata to check against (usually the target metadata)
   :type target_metadata: Metadata instance
   :param query_name: the text used to describe the fingerprint, in case of problem
   :type query_name: string
   :param target_name: the text used to describe the metadata, in case of problem
   :type target_name: string
   :return: a list of :class:`ChemFPProblem` instances


check_metadata_problems
-----------------------

.. py:function:: check_metadata_problems(query_metadata, target_metadata, query_name="query", target_name="target")

   Return a list of compatibility problems between two metadata instances.
   
   If there are no problems then this returns an empty list. Otherwise it
   returns a list of :class:`ChemFPProblem` instances, with a severity level
   ranging from "info" to "error".
   
   Bit length and byte length mismatches produce an "error". Fingerprint type
   and aromaticity mismatches produce a "warning". Software version mismatches
   produce an "info".
   
   This is usually used to check if the query metadata is incompatible with
   the target metadata. In case of a problem the messages look like::
   
     >>> import chemfp
     >>> m1 = chemfp.Metadata(num_bytes=128, type="Example/1")
     >>> m2 = chemfp.Metadata(num_bytes=256, type="Counter-Example/1")
     >>> problems = chemfp.check_metadata_problems(m1, m2)
     >>> len(problems)
     2
     >>> print(problems[1].description)
     query has fingerprints of type 'Example/1' but target has fingerprints of type 'Counter-Example/1'
   
   You can change the error message with the *query_name* and *target_name*
   parameters::
   
     >>> problems = chemfp.check_metadata_problems(m1, m2, query_name="input", target_name="database")
     >>> print(problems[1].description)
     input has fingerprints of type 'Example/1' but database has fingerprints of type 'Counter-Example/1'
   
   :param query_metadata: the metadata to check (usually the query metadata)
   :type query_metadata: Metadata instance
   :param target_metadata: the metadata to check against (usually the target metadata)
   :type target_metadata: Metadata instance
   :param query_name: the text used to describe the query metadata, in case of problem
   :type query_name: string
   :param target_name: the text used to describe the target metadata, in case of problem
   :type target_name: string
   :return: a list of :class:`ChemFPProblem` instances
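
The severity rules can be sketched with plain dictionaries. Everything in
this example is hypothetical: ``Problem`` and ``toy_check_metadata`` are
stand-ins for illustration, the description text is made up, and skipping a
field when either side is None is an assumption of the sketch. Aromaticity
is left out.

```python
from collections import namedtuple

Problem = namedtuple("Problem", "severity error_level category description")

# The numeric levels documented for ChemFPProblem.error_level.
ERROR_LEVELS = {"info": 5, "warning": 10, "error": 20}

CHECKS = [
    ("num_bits", "error"),
    ("num_bytes", "error"),
    ("type", "warning"),
    ("software", "info"),
]

def toy_check_metadata(query, target, query_name="query", target_name="target"):
    """Compare two metadata-like dicts and return a list of Problem tuples."""
    problems = []
    for field, severity in CHECKS:
        q, t = query.get(field), target.get(field)
        # Assumption: a field is compared only when both sides define it.
        if q is not None and t is not None and q != t:
            description = "%s has %s %r but %s has %s %r" % (
                query_name, field, q, target_name, field, t)
            problems.append(Problem(severity, ERROR_LEVELS[severity],
                                    field + " mismatch", description))
    return problems

m1 = {"num_bytes": 128, "type": "Example/1"}
m2 = {"num_bytes": 256, "type": "Counter-Example/1"}
problems = toy_check_metadata(m1, m2, query_name="input", target_name="database")
for problem in problems:
    print(problem.severity, problem.category)
# A caller might refuse to search when any problem reaches the "error" level.
print(any(p.error_level >= ERROR_LEVELS["error"] for p in problems))
```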


count_tanimoto_hits
-------------------

.. py:function:: count_tanimoto_hits(queries, targets, threshold=0.7, arena_size=100)

   Count the number of targets within *threshold* of each query term
   
   For each query in *queries*, count the number of targets in *targets*
   which are at least *threshold* similar to the query. This function
   returns an iterator containing the (query_id, count) pairs.
   
   Example::
   
     queries = chemfp.open("queries.fps")
     targets = chemfp.load_fingerprints("targets.fps.gz")
     for (query_id, count) in chemfp.count_tanimoto_hits(queries, targets, threshold=0.9):
       print(query_id, "has", count, "neighbors with at least 0.9 similarity")
   
   Internally, queries are processed in batches with *arena_size*
   elements. A small batch size uses less overall memory and has lower
   processing latency, while a large batch size has better overall
   performance. Use arena_size=None to process the input as a single batch.
   
   Note: an :class:`chemfp.fps_io.FPSReader` may be used as a target but
   it will only process one batch and not reset for the next batch. It's
   faster to search a :class:`chemfp.arena.FingerprintArena`, but if
   you have an FPS file then that takes extra time to load. At times,
   if there is a small number of queries, loading the arena from an
   FPS file may take longer than searching directly with an FPSReader.
   
   If you know the targets are in an arena then you may want to use
   :func:`chemfp.search.count_tanimoto_hits_fp` or
   :func:`chemfp.search.count_tanimoto_hits_arena`.
   
   :param queries: The query fingerprints.
   :type queries: any fingerprint container
   :param targets: The target fingerprints.
   :type targets: :class:`chemfp.arena.FingerprintArena` or the slower :class:`chemfp.fps_io.FPSReader`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param arena_size: The number of queries to process in a batch
   :type arena_size: a positive integer, or None
   :returns: iterator of the (query_id, count) pairs, one for each query
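
For readers who want to see what is being counted, here is a brute-force
sketch of Tanimoto hit counting on byte-string fingerprints. The names
``tanimoto`` and ``count_hits`` are hypothetical, the code is far slower than
chemfp's search, and returning 0.0 when both fingerprints have no bits set is
an assumption of the sketch.

```python
def popcount(fp):
    """Count the number of 1 bits in a byte string."""
    return sum(bin(byte).count("1") for byte in bytearray(fp))

def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte strings."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    both = sum(bin(a & b).count("1")
               for a, b in zip(bytearray(fp1), bytearray(fp2)))
    either = popcount(fp1) + popcount(fp2) - both
    # Assumption: two empty fingerprints score 0.0.
    return both / either if either else 0.0

def count_hits(queries, targets, threshold=0.7):
    """Yield (query_id, count) for targets scoring at least *threshold*."""
    targets = list(targets)
    for query_id, query_fp in queries:
        count = sum(1 for _, target_fp in targets
                    if tanimoto(query_fp, target_fp) >= threshold)
        yield query_id, count

queries = [("q1", b"\x0f")]
targets = [("t1", b"\x0f"), ("t2", b"\x07"), ("t3", b"\xf0")]
print(list(count_hits(queries, targets, threshold=0.7)))
```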


count_tanimoto_hits_symmetric
-----------------------------

.. py:function:: count_tanimoto_hits_symmetric(fingerprints, threshold=0.7)

   Find the number of other fingerprints within *threshold* of each fingerprint
   
   For each fingerprint in the *fingerprints* arena, find the number
   of other fingerprints in the same arena which are at least
   *threshold* similar to it. The arena must have pre-computed
   popcounts. A fingerprint never matches itself.
   
   This function returns an iterator of (fingerprint_id, count) pairs.
   
   Example::
   
     arena = chemfp.load_fingerprints("targets.fps.gz")
     for (fp_id, count) in chemfp.count_tanimoto_hits_symmetric(arena, threshold=0.6):
         print(fp_id, "has", count, "neighbors with at least 0.6 similarity")
   
   You may also be interested in :func:`chemfp.search.count_tanimoto_hits_symmetric`.
   
   :param fingerprints: The arena containing the fingerprints.
   :type fingerprints: a FingerprintArena with precomputed popcount_indices
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns:
     An iterator of (fp_id, count) pairs, one for each fingerprint


threshold_tanimoto_search
-------------------------

.. py:function:: threshold_tanimoto_search(queries, targets, threshold=0.7, arena_size=100)

   Find all targets within *threshold* of each query term
   
   For each query in *queries*, find all the targets in *targets* which
   are at least *threshold* similar to the query. This function returns
   an iterator containing the (query_id, hits) pairs. The hits are stored
   as a list of (target_id, score) pairs.
   
   Example::
   
     queries = chemfp.open("queries.fps")
     targets = chemfp.load_fingerprints("targets.fps.gz")
     for (query_id, hits) in chemfp.threshold_tanimoto_search(queries, targets, threshold=0.8):
         print(query_id, "has", len(hits), "neighbors with at least 0.8 similarity")
         non_identical = [target_id for (target_id, score) in hits if score != 1.0]
         print("  The non-identical hits are:", non_identical)
   
   Internally, queries are processed in batches with *arena_size* elements.
   A small batch size uses less overall memory and has lower processing
   latency, while a large batch size has better overall performance. Use
   ``arena_size=None`` to process the input as a single batch.
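
   The batching idea can be sketched as follows. This is a simplified
   stand-in, not chemfp's code: ``iter_batches`` is a hypothetical
   helper that groups any iterable into lists of at most *arena_size*
   items, and treats ``None`` as "one single batch"::

     from itertools import islice

     def iter_batches(items, arena_size=100):
         it = iter(items)
         if arena_size is None:
             # Read everything at once as a single batch.
             yield list(it)
             return
         while True:
             batch = list(islice(it, arena_size))
             if not batch:
                 return
             yield batch

     batches = list(iter_batches(range(5), arena_size=2))
     single = list(iter_batches(range(3), arena_size=None))

   The first batch is available after reading only *arena_size* queries,
   which is why a small batch size lowers latency.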
   
   Note: a :class:`chemfp.fps_io.FPSReader` may be used as a target but
   it will only process one batch and not reset for the next batch. It's
   faster to search a :class:`chemfp.arena.FingerprintArena`, but
   loading an arena from an FPS file takes extra time. When there are
   only a few queries, loading the arena from an FPS file may take
   longer than searching the FPSReader directly.
   
   If you know the targets are in an arena then you may want to use
   :func:`chemfp.search.threshold_tanimoto_search_fp` or
   :func:`chemfp.search.threshold_tanimoto_search_arena`.
   
   :param queries: The query fingerprints.
   :type queries: any fingerprint container
   :param targets: The target fingerprints.
   :type targets: :class:`chemfp.arena.FingerprintArena` or the slower :class:`chemfp.fps_io.FPSReader`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param arena_size: The number of queries to process in a batch
   :type arena_size: positive integer, or None
   :returns:
     An iterator containing (query_id, hits) pairs, one for each query.
     'hits' contains a list of (target_id, score) pairs.


threshold_tanimoto_search_symmetric
-----------------------------------

.. py:function:: threshold_tanimoto_search_symmetric(fingerprints, threshold=0.7)

   Find the other fingerprints within *threshold* of each fingerprint
   
   For each fingerprint in the *fingerprints* arena, find the other
   fingerprints in the same arena which are at least *threshold*
   similar to it. The arena must have pre-computed popcounts. A
   fingerprint never matches itself.
   
   This function returns an iterator of (fingerprint_id, SearchResult) pairs.
   The :class:`chemfp.search.SearchResult` hit order is arbitrary.
   
   Example::
   
     arena = chemfp.load_fingerprints("targets.fps.gz")
     for (fp_id, hits) in chemfp.threshold_tanimoto_search_symmetric(arena, threshold=0.75):
         print(fp_id, "has", len(hits), "neighbors:")
         for (other_id, score) in hits.get_ids_and_scores():
             print("   %s  %.2f" % (other_id, score))
   
   You may also be interested in the :func:`chemfp.search.threshold_tanimoto_search_symmetric`
   function.
   
   :param fingerprints: The arena containing the fingerprints.
   :type fingerprints: a FingerprintArena with precomputed popcount_indices
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: An iterator of (fp_id, SearchResult) pairs, one for each fingerprint


knearest_tanimoto_search
------------------------

.. py:function:: knearest_tanimoto_search(queries, targets, k=3, threshold=0.7, arena_size=100)

   Find the *k*-nearest targets within *threshold* of each query term
   
   For each query in *queries*, find the *k*-nearest of all the targets
   in *targets* which are at least *threshold* similar to the query. Ties
   are broken arbitrarily and hits with scores equal to the smallest value
   may have been omitted.
   
   This function returns an iterator containing the (query_id, hits) pairs,
   where hits is a list of (target_id, score) pairs, sorted so that the
   highest scores are first. The order of ties is arbitrary.
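
   The selection rule for a single query can be sketched in plain
   Python. This illustrates the definition only and is not chemfp's
   search algorithm: fingerprints are modeled as integers, and
   ``tanimoto`` and ``knearest`` are hypothetical helper names::

     import heapq

     def tanimoto(fp1, fp2):
         union = bin(fp1 | fp2).count("1")
         if union == 0:
             return 0.0  # assumption: two empty fingerprints score 0.0
         return bin(fp1 & fp2).count("1") / union

     def knearest(query, targets, k=3, threshold=0.7):
         # Score every target and keep those at or above the threshold.
         passing = []
         for target_id, fp in targets:
             score = tanimoto(query, fp)
             if score >= threshold:
                 passing.append((target_id, score))
         # Take at most k hits, highest score first.
         return heapq.nlargest(k, passing, key=lambda pair: pair[1])

     targets = [("t1", 0b1111), ("t2", 0b0111), ("t3", 0b0011), ("t4", 0b1000)]
     hits = knearest(0b1111, targets, k=2, threshold=0.5)

   Three targets pass the 0.5 threshold, but only the best two are kept.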
   
   Example::
   
     # Use the first 5 fingerprints as the queries 
     queries = next(chemfp.open("pubchem_subset.fps").iter_arenas(5))
     targets = chemfp.load_fingerprints("pubchem_subset.fps")
     
     # Find the 3 nearest hits with a similarity of at least 0.8
     for (query_id, hits) in chemfp.knearest_tanimoto_search(queries, targets, k=3, threshold=0.8):
         print(query_id, "has", len(hits), "neighbors with at least 0.8 similarity")
         if hits:
             target_id, score = hits[-1]
             print("    The least similar is", target_id, "with score", score)
   
   Internally, queries are processed in batches with *arena_size* elements.
   A small batch size uses less overall memory and has lower processing
   latency, while a large batch size has better overall performance. Use
   ``arena_size=None`` to process the input as a single batch.
   
   Note: a :class:`chemfp.fps_io.FPSReader` may be used as a target but
   it will only process one batch and not reset for the next batch. It's
   faster to search a :class:`chemfp.arena.FingerprintArena`, but
   loading an arena from an FPS file takes extra time. When there are
   only a few queries, loading the arena from an FPS file may take
   longer than searching the FPSReader directly.
   
   If you know the targets are in an arena then you may want to use
   :func:`chemfp.search.knearest_tanimoto_search_fp` or
   :func:`chemfp.search.knearest_tanimoto_search_arena`.
   
   :param queries: The query fingerprints.
   :type queries: any fingerprint container
   :param targets: The target fingerprints.
   :type targets: :class:`chemfp.arena.FingerprintArena` or the slower :class:`chemfp.fps_io.FPSReader`
   :param k: The maximum number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param arena_size: The number of queries to process in a batch
   :type arena_size: positive integer, or None
   :returns:
     An iterator containing (query_id, hits) pairs, one for each query.
     The *hits* are a list of (target_id, score) pairs, sorted by score.


knearest_tanimoto_search_symmetric
----------------------------------

.. py:function:: knearest_tanimoto_search_symmetric(fingerprints, k=3, threshold=0.7)

   Find the *k*-nearest fingerprints within *threshold* of each fingerprint
   
   For each fingerprint in the *fingerprints* arena, find the nearest
   *k* fingerprints in the same arena which are at least *threshold*
   similar to it. The arena must have pre-computed popcounts. A
   fingerprint never matches itself.
   
   This function returns an iterator of (fingerprint_id, SearchResult) pairs.
   The :class:`chemfp.search.SearchResult` hits are ordered from highest
   score to lowest, with ties broken arbitrarily.
   
   Example::
   
     arena = chemfp.load_fingerprints("targets.fps.gz")
     for (fp_id, hits) in chemfp.knearest_tanimoto_search_symmetric(arena, k=5, threshold=0.5):
         print(fp_id, "has", len(hits), "neighbors, with scores", end=" ")
         print(", ".join("%.2f" % x for x in hits.get_scores()))
   
   You may also be interested in the :func:`chemfp.search.knearest_tanimoto_search_symmetric`
   function.
   
   :param fingerprints: The arena containing the fingerprints.
   :type fingerprints: a FingerprintArena with precomputed popcount_indices
   :param k: The maximum number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: An iterator of (fp_id, SearchResult) pairs, one for each fingerprint


count_tversky_hits
------------------

.. py:function:: count_tversky_hits(queries, targets, threshold=0.7, alpha=1.0, beta=1.0, arena_size=100)

   Count the number of targets within *threshold* of each query term
   
   For each query in *queries*, count the number of targets in *targets*
   which are at least *threshold* similar to the query. This function
   returns an iterator containing the (query_id, count) pairs.
   
   Example::
   
     queries = chemfp.open("queries.fps")
     targets = chemfp.load_fingerprints("targets.fps.gz")
     for (query_id, count) in chemfp.count_tversky_hits(
               queries, targets, threshold=0.9, alpha=0.5, beta=0.5):
       print(query_id, "has", count, "neighbors with at least 0.9 Dice similarity")
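
   The example calls the result "Dice similarity" because the Tversky
   index with ``alpha=0.5, beta=0.5`` is the Dice coefficient, while
   ``alpha=1.0, beta=1.0`` gives the Tanimoto score. A minimal sketch of
   the standard Tversky formula, with fingerprints modeled as Python
   integers (``tversky`` is a hypothetical helper, and chemfp's handling
   of edge cases such as empty fingerprints may differ)::

     def tversky(fp1, fp2, alpha=1.0, beta=1.0):
         both = bin(fp1 & fp2).count("1")      # bits set in both
         only1 = bin(fp1 & ~fp2).count("1")    # bits set only in fp1
         only2 = bin(fp2 & ~fp1).count("1")    # bits set only in fp2
         denominator = both + alpha * only1 + beta * only2
         if denominator == 0:
             return 0.0  # assumption: score 0.0 when nothing can be compared
         return both / denominator

     query, target = 0b1111, 0b0111
     tanimoto_score = tversky(query, target, alpha=1.0, beta=1.0)
     dice_score = tversky(query, target, alpha=0.5, beta=0.5)

   Here the Tanimoto score is 3/4 and the Dice score is 6/7, so the same
   threshold admits more hits under Dice than under Tanimoto.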
   
   Internally, queries are processed in batches with *arena_size*
   elements. A small batch size uses less overall memory and has lower
   processing latency, while a large batch size has better overall
   performance. Use ``arena_size=None`` to process the input as a single batch.
   
   Note: a :class:`chemfp.fps_io.FPSReader` may be used as a target but
   it will only process one batch and not reset for the next batch. It's
   faster to search a :class:`chemfp.arena.FingerprintArena`, but
   loading an arena from an FPS file takes extra time. When there are
   only a few queries, loading the arena from an FPS file may take
   longer than searching the FPSReader directly.
   
   If you know the targets are in an arena then you may want to use
   :func:`chemfp.search.count_tversky_hits_fp` or
   :func:`chemfp.search.count_tversky_hits_arena`.
   
   :param queries: The query fingerprints.
   :type queries: any fingerprint container
   :param targets: The target fingerprints.
   :type targets: :class:`chemfp.arena.FingerprintArena` or the slower :class:`chemfp.fps_io.FPSReader`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param arena_size: The number of queries to process in a batch
   :type arena_size: a positive integer, or None
   :returns: an iterator of (query_id, count) pairs, one for each query


count_tversky_hits_symmetric
----------------------------

.. py:function:: count_tversky_hits_symmetric(fingerprints, threshold=0.7, alpha=1.0, beta=1.0)

   Find the number of other fingerprints within *threshold* of each fingerprint
   
   For each fingerprint in the *fingerprints* arena, find the number
   of other fingerprints in the same arena which are at least
   *threshold* similar to it. The arena must have pre-computed
   popcounts. A fingerprint never matches itself.
   
   This function returns an iterator of (fingerprint_id, count) pairs.
   
   Example::
   
     arena = chemfp.load_fingerprints("targets.fps.gz")
     for (fp_id, count) in chemfp.count_tversky_hits_symmetric(
             arena, threshold=0.6, alpha=0.5, beta=0.5):
         print(fp_id, "has", count, "neighbors with at least 0.6 Dice similarity")
   
   You may also be interested in :func:`chemfp.search.count_tversky_hits_symmetric`.
   
   :param fingerprints: The arena containing the fingerprints.
   :type fingerprints: a FingerprintArena with precomputed popcount_indices
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns:
     An iterator of (fp_id, count) pairs, one for each fingerprint


threshold_tversky_search
------------------------

.. py:function:: threshold_tversky_search(queries, targets, threshold=0.7, alpha=1.0, beta=1.0, arena_size=100)

   Find all targets within *threshold* of each query term
   
   For each query in *queries*, find all the targets in *targets* which
   are at least *threshold* similar to the query. This function returns
   an iterator containing the (query_id, hits) pairs. The hits are stored
   as a list of (target_id, score) pairs.
   
   Example::
   
     queries = chemfp.open("queries.fps")
     targets = chemfp.load_fingerprints("targets.fps.gz")
     for (query_id, hits) in chemfp.threshold_tversky_search(
                queries, targets, threshold=0.8, alpha=0.5, beta=0.5):
         print(query_id, "has", len(hits), "neighbors with at least 0.8 Dice similarity")
         non_identical = [target_id for (target_id, score) in hits if score != 1.0]
         print("  The non-identical hits are:", non_identical)
   
   Internally, queries are processed in batches with *arena_size* elements.
   A small batch size uses less overall memory and has lower processing
   latency, while a large batch size has better overall performance. Use
   ``arena_size=None`` to process the input as a single batch.
   
   Note: a :class:`chemfp.fps_io.FPSReader` may be used as a target but
   it will only process one batch and not reset for the next batch. It's
   faster to search a :class:`chemfp.arena.FingerprintArena`, but
   loading an arena from an FPS file takes extra time. When there are
   only a few queries, loading the arena from an FPS file may take
   longer than searching the FPSReader directly.
   
   If you know the targets are in an arena then you may want to use
   :func:`chemfp.search.threshold_tversky_search_fp` or
   :func:`chemfp.search.threshold_tversky_search_arena`.
   
   :param queries: The query fingerprints.
   :type queries: any fingerprint container
   :param targets: The target fingerprints.
   :type targets: :class:`chemfp.arena.FingerprintArena` or the slower :class:`chemfp.fps_io.FPSReader`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param arena_size: The number of queries to process in a batch
   :type arena_size: positive integer, or None
   :returns:
     An iterator containing (query_id, hits) pairs, one for each query.
     'hits' contains a list of (target_id, score) pairs.


threshold_tversky_search_symmetric
----------------------------------

.. py:function:: threshold_tversky_search_symmetric(fingerprints, threshold=0.7, alpha=1.0, beta=1.0)

   Find the other fingerprints within *threshold* of each fingerprint
   
   For each fingerprint in the *fingerprints* arena, find the other
   fingerprints in the same arena which are at least *threshold*
   similar to it. The arena must have pre-computed popcounts. A
   fingerprint never matches itself.
   
   This function returns an iterator of (fingerprint_id, SearchResult) pairs.
   The :class:`chemfp.search.SearchResult` hit order is arbitrary.
   
   Example::
   
     arena = chemfp.load_fingerprints("targets.fps.gz")
     for (fp_id, hits) in chemfp.threshold_tversky_search_symmetric(
                arena, threshold=0.75, alpha=0.5, beta=0.5):
         print(fp_id, "has", len(hits), "Dice neighbors:")
         for (other_id, score) in hits.get_ids_and_scores():
             print("   %s  %.2f" % (other_id, score))
   
   You may also be interested in the :func:`chemfp.search.threshold_tversky_search_symmetric`
   function.
   
   :param fingerprints: The arena containing the fingerprints.
   :type fingerprints: a FingerprintArena with precomputed popcount_indices
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: An iterator of (fp_id, SearchResult) pairs, one for each fingerprint


knearest_tversky_search
-----------------------

.. py:function:: knearest_tversky_search(queries, targets, k=3, threshold=0.7, alpha=1.0, beta=1.0, arena_size=100)

   Find the *k*-nearest targets within *threshold* of each query term
   
   For each query in *queries*, find the *k*-nearest of all the targets
   in *targets* which are at least *threshold* similar to the query. Ties
   are broken arbitrarily and hits with scores equal to the smallest value
   may have been omitted.
   
   This function returns an iterator containing the (query_id, hits) pairs,
   where hits is a list of (target_id, score) pairs, sorted so that the
   highest scores are first. The order of ties is arbitrary.
   
   Example::
   
     # Use the first 5 fingerprints as the queries 
     queries = next(chemfp.open("pubchem_subset.fps").iter_arenas(5))
     targets = chemfp.load_fingerprints("pubchem_subset.fps")
     
     # Find the 3 nearest hits with a similarity of at least 0.8
     for (query_id, hits) in chemfp.knearest_tversky_search(
               queries, targets, k=3, threshold=0.8, alpha=0.5, beta=0.5):
         print(query_id, "has", len(hits), "neighbors with at least 0.8 Dice similarity")
         if hits:
             target_id, score = hits[-1]
             print("    The least similar is", target_id, "with score", score)
   
   Internally, queries are processed in batches with *arena_size* elements.
   A small batch size uses less overall memory and has lower processing
   latency, while a large batch size has better overall performance. Use
   ``arena_size=None`` to process the input as a single batch.
   
   Note: a :class:`chemfp.fps_io.FPSReader` may be used as a target but
   it will only process one batch and not reset for the next batch. It's
   faster to search a :class:`chemfp.arena.FingerprintArena`, but
   loading an arena from an FPS file takes extra time. When there are
   only a few queries, loading the arena from an FPS file may take
   longer than searching the FPSReader directly.
   
   If you know the targets are in an arena then you may want to use
   :func:`chemfp.search.knearest_tversky_search_fp` or
   :func:`chemfp.search.knearest_tversky_search_arena`.
   
   :param queries: The query fingerprints.
   :type queries: any fingerprint container
   :param targets: The target fingerprints.
   :type targets: :class:`chemfp.arena.FingerprintArena` or the slower :class:`chemfp.fps_io.FPSReader`
   :param k: The maximum number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param arena_size: The number of queries to process in a batch
   :type arena_size: positive integer, or None
   :returns:
     An iterator containing (query_id, hits) pairs, one for each query.
     The *hits* are a list of (target_id, score) pairs, sorted by score.


knearest_tversky_search_symmetric
---------------------------------

.. py:function:: knearest_tversky_search_symmetric(fingerprints, k=3, threshold=0.7, alpha=1.0, beta=1.0)

   Find the *k*-nearest fingerprints within *threshold* of each fingerprint
   
   For each fingerprint in the *fingerprints* arena, find the nearest
   *k* fingerprints in the same arena which are at least *threshold*
   similar to it. The arena must have pre-computed popcounts. A
   fingerprint never matches itself.
   
   This function returns an iterator of (fingerprint_id, SearchResult) pairs.
   The :class:`chemfp.search.SearchResult` hits are ordered from highest
   score to lowest, with ties broken arbitrarily.
   
   Example::
   
     arena = chemfp.load_fingerprints("targets.fps.gz")
     for (fp_id, hits) in chemfp.knearest_tversky_search_symmetric(
             arena, k=5, threshold=0.5, alpha=0.5, beta=0.5):
         print(fp_id, "has", len(hits), "neighbors, with Dice scores", end=" ")
         print(", ".join("%.2f" % x for x in hits.get_scores()))
   
   You may also be interested in the :func:`chemfp.search.knearest_tversky_search_symmetric`
   function.
   
   :param fingerprints: The arena containing the fingerprints.
   :type fingerprints: a FingerprintArena with precomputed popcount_indices
   :param k: The maximum number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: An iterator of (fp_id, SearchResult) pairs, one for each fingerprint


get_fingerprint_families
------------------------

.. py:function:: get_fingerprint_families(toolkit_name=None)

   Return a list of available fingerprint families
   
   :param string toolkit_name: restrict fingerprints to the named toolkit
   :returns: a list of :class:`chemfp.types.FingerprintFamily` instances


get_fingerprint_family
----------------------

.. py:function:: get_fingerprint_family(family_name)

   Return the named fingerprint family, or raise a ValueError if not available
   
   Given a *family_name* like ``OpenBabel-FP2`` or ``OpenEye-MACCS166``
   return the corresponding :class:`chemfp.types.FingerprintFamily`.
   
   :param string family_name: the family name
   :returns: a :class:`chemfp.types.FingerprintFamily` instance


get_fingerprint_family_names
----------------------------

.. py:function:: get_fingerprint_family_names(include_unavailable=False, toolkit_name=None)

   Return a set of fingerprint family name strings
   
   The function tries to load each known fingerprint family. The
   names of the families which could be loaded are returned as
   a set of strings.
   
   If *include_unavailable* is True then this will return a
   set of all of the fingerprint family names, including those
   which could not be loaded.
   
   The set contains both the versioned and unversioned family names,
   so both ``OpenBabel-FP2/1`` and ``OpenBabel-FP2`` may be returned.
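
   Judging from the names shown above, a versioned name is the
   unversioned name followed by ``/`` and the version. Assuming that
   convention, the two parts can be separated like this
   (``split_family_name`` is a hypothetical helper, not part of chemfp)::

     def split_family_name(name):
         # "OpenBabel-FP2/1" -> ("OpenBabel-FP2", "1")
         # "OpenBabel-FP2"   -> ("OpenBabel-FP2", None)
         base_name, sep, version = name.partition("/")
         return base_name, (version if sep else None)

     versioned = split_family_name("OpenBabel-FP2/1")
     unversioned = split_family_name("OpenBabel-FP2")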
   
   :param include_unavailable: Should unavailable family names be included in the result set?
   :type include_unavailable: True or False
   :returns: a set of strings


get_fingerprint_type
--------------------

.. py:function:: get_fingerprint_type(type, fingerprint_kwargs=None)

   Get the fingerprint type based on its type string and optional keyword arguments
   
   Given a fingerprint *type* string like ``OpenBabel-FP2``, or
   ``RDKit-Fingerprint/1 fpSize=1024``, return the corresponding
   :class:`chemfp.types.FingerprintType`.
   
   The fingerprint type string may include fingerprint parameters.
   Parameters can also be specified through the *fingerprint_kwargs*
   dictionary, where the dictionary values are native Python values.
   If the same parameter is specified in the type string and the
   kwargs dictionary then the *fingerprint_kwargs* takes precedence.
   
   For example:
   
       >>> fptype = get_fingerprint_type("RDKit-Fingerprint fpSize=1024 minPath=3", {"fpSize": 4096})
       >>> fptype.get_type()
       'RDKit-Fingerprint/2 minPath=3 maxPath=7 fpSize=4096 nBitsPerHash=2 useHs=1'
   
   Use :func:`get_fingerprint_type_from_text_settings` if your fingerprint
   parameter values are all string-encoded, eg, from the command-line
   or a configuration file.
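
   The precedence rule can be sketched without chemfp. The helper
   below is hypothetical and much simpler than the real parser: it
   assumes a type string of the form ``Name key=value ...`` with
   integer values only, and lets the keyword dictionary override
   anything given in the string::

     def merge_parameters(type_string, fingerprint_kwargs=None):
         fields = type_string.split()
         name, params = fields[0], {}
         for field in fields[1:]:
             key, _, value = field.partition("=")
             params[key] = int(value)  # assumption: integer parameters only
         # Values from fingerprint_kwargs take precedence.
         params.update(fingerprint_kwargs or {})
         return name, params

     name, params = merge_parameters(
         "RDKit-Fingerprint fpSize=1024 minPath=3", {"fpSize": 4096})

   As in the example above, ``fpSize`` ends up as 4096 while ``minPath``
   keeps the value from the type string.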
   
   :param string type: a fingerprint type string
   :param fingerprint_kwargs: fingerprint type parameters
   :type fingerprint_kwargs: a dictionary of string names and Python types for values
   :returns: a :class:`chemfp.types.FingerprintType`


get_fingerprint_type_from_text_settings
---------------------------------------

.. py:function:: get_fingerprint_type_from_text_settings(type, settings=None)

   Get the fingerprint type based on its type string and optional settings arguments
   
   Given a fingerprint *type* string like ``OpenBabel-FP2``, or
   ``RDKit-Fingerprint/1 fpSize=1024``, return the corresponding
   :class:`chemfp.types.FingerprintType`.
   
   The fingerprint type string may include fingerprint parameters.
   Parameters can also be specified through the *settings* dictionary,
   where the dictionary values are string-encoded values. If the same
   parameter is specified in the *type* string and the *settings*
   dictionary then the *settings* take precedence.
   
   For example:
   
       >>> fptype = get_fingerprint_type_from_text_settings("RDKit-Fingerprint fpSize=1024 minPath=3",
       ...                                                  {"fpSize": "4096"})
       >>> fptype.get_type()
       'RDKit-Fingerprint/2 minPath=3 maxPath=7 fpSize=4096 nBitsPerHash=2 useHs=1'
   
   This function is for string settings from a configuration file or
   command-line. Use :func:`get_fingerprint_type` if your fingerprint
   parameters are Python values.
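
   The difference between string-encoded settings and native Python
   values can be illustrated with a small decoder. This is a sketch
   under stated assumptions, not chemfp's code: the ``DECODERS`` table
   and ``decode_settings`` are hypothetical, and real fingerprint
   families define their own parameters and conversions::

     # Hypothetical table mapping each parameter name to its decoder.
     DECODERS = {"fpSize": int, "minPath": int, "maxPath": int, "useHs": int}

     def decode_settings(settings):
         kwargs = {}
         for key, text in settings.items():
             if key not in DECODERS:
                 raise ValueError("Unknown setting: %r" % (key,))
             kwargs[key] = DECODERS[key](text)
         return kwargs

     kwargs = decode_settings({"fpSize": "4096", "minPath": "3"})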
   
   :param type: a fingerprint type string
   :type type: string
   :param settings: fingerprint type parameters
   :type settings: a dictionary of string names and string-encoded values
   :returns: a :class:`chemfp.types.FingerprintType`


has_fingerprint_family
----------------------

.. py:function:: has_fingerprint_family(family_name)

   Test if the fingerprint family is available
   
   Return True if the fingerprint *family_name* is available,
   otherwise False. The *family_name* may be versioned or
   unversioned, like "OpenBabel-FP2/1" or "OpenEye-MACCS166".
   
   :param string family_name: the family name
   :returns: True or False


get_max_threads
---------------

.. py:function:: get_max_threads()

   Return the maximum number of threads available.
   
   WARNING: this likely doesn't do what you think it does. Do not use!
   
   If OpenMP is not available then this will return 1. Otherwise it
   returns the maximum number of threads available, as reported by
   omp_get_num_threads().


get_num_threads
---------------

.. py:function:: get_num_threads()

   Return the number of OpenMP threads to use in searches
   
   Initially this is the value returned by omp_get_max_threads(),
   which is typically the number of available processor cores unless
   you set the environment variable OMP_NUM_THREADS to some other value.
   
   It may be any value in the range 1 to get_max_threads(), inclusive.
   
   :returns: the current number of OpenMP threads to use


set_num_threads
---------------

.. py:function:: set_num_threads(num_threads)

   Set the number of OpenMP threads to use in searches
   
   If *num_threads* is less than one then it is treated as one, and a
   value greater than get_max_threads() is treated as get_max_threads().
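
   The clamping rule amounts to the following, where
   ``clamp_num_threads`` is a hypothetical helper and *max_threads*
   stands in for the value of get_max_threads()::

     def clamp_num_threads(num_threads, max_threads):
         # Values below 1 become 1; values above the maximum become the maximum.
         return max(1, min(num_threads, max_threads))

     too_low = clamp_num_threads(0, 8)
     in_range = clamp_num_threads(4, 8)
     too_high = clamp_num_threads(100, 8)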
   
   :param int num_threads: the new number of OpenMP threads to use


get_toolkit
-----------

.. py:function:: get_toolkit(toolkit_name)

   Return the named toolkit, if available, or raise a ValueError
   
   If *toolkit_name* is one of "openbabel", "openeye", or "rdkit"
   and the named toolkit is available, then it will return
   :mod:`chemfp.openbabel_toolkit`, :mod:`chemfp.openeye_toolkit`,
   or :mod:`chemfp.rdkit_toolkit`, respectively::
   
     >>> import chemfp
     >>> chemfp.get_toolkit("openeye")
     <module 'chemfp.openeye_toolkit' from 'chemfp/openeye_toolkit.py'>
     >>> chemfp.get_toolkit("rdkit")
     Traceback (most recent call last):
          ...
     ValueError: Unable to get toolkit 'rdkit': No module named rdkit
   
   :param toolkit_name: the toolkit name
   :type toolkit_name: string
   :returns: the chemfp toolkit
   :raises: ValueError if *toolkit_name* is unknown or the toolkit does not exist


get_toolkit_names
-----------------

.. py:function:: get_toolkit_names()

   Return a set of available toolkit names
   
   The function checks if each supported toolkit is available by
   trying to import its corresponding module. It returns a set of
   toolkit names::
   
     >>> import chemfp
     >>> chemfp.get_toolkit_names()
     set(['openeye', 'rdkit', 'openbabel'])
   
   :returns: a set of toolkit names, as strings


has_toolkit
-----------

.. py:function:: has_toolkit(toolkit_name)

   Return True if the named toolkit is available, otherwise False
   
   If *toolkit_name* is one of "openbabel", "openeye", or "rdkit"
   then this function will test to see if the given toolkit is
   available, and if so return True. Otherwise it returns False.
   
     >>> import chemfp
     >>> chemfp.has_toolkit("openeye")
     True
     >>> chemfp.has_toolkit("openbabel")
     False
   
   The initial test for a toolkit can be slow, especially
   if the underlying toolkit loads a lot of shared libraries.
   The test is only done once, and cached.
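
   A once-only, cached availability test of this kind could look
   something like the sketch below. It is not chemfp's implementation:
   ``module_is_available`` is a hypothetical helper, and it probes
   ordinary module names rather than chemfp toolkit names::

     import importlib
     from functools import lru_cache

     @lru_cache(maxsize=None)
     def module_is_available(module_name):
         # The import is attempted once per name; later calls with the
         # same name reuse the cached True/False result.
         try:
             importlib.import_module(module_name)
         except ImportError:
             return False
         return True

     first = module_is_available("json")
     second = module_is_available("json")  # answered from the cache
     missing = module_is_available("no_such_toolkit_module")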
   
   :param toolkit_name: the toolkit name
   :type toolkit_name: string
   :returns: True or False


.. py:module:: chemfp.types


chemfp.types - fingerprint families and types
=============================================

A "fingerprint type" is an object which knows how to convert a
molecule into a fingerprint. A "fingerprint family" is an object which
uses a set of parameters to make a specific fingerprint type. ::

  >>> import chemfp
  >>> fpfamily = chemfp.get_fingerprint_family("RDKit-Fingerprint")
  >>> fpfamily.get_defaults()
  {'maxPath': 7, 'fpSize': 2048, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}
  >>> 
  >>> fptype = fpfamily()  # create the default fingerprint type
  >>> fptype.get_type()
  'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1'
  >>> 
  >>> fptype = fpfamily(fpSize=1024)   # use a non-default value
  >>> fptype.get_type()
  'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=1024 nBitsPerHash=2 useHs=1'
  >>> mol = fptype.toolkit.parse_molecule("c1ccccc1O", "smistring")
  >>> fptype.compute_fingerprint(mol)
  '\x04\x00\x00\x00\x00\x00\x10\x00\x00\x00  ... x00\x00\x00\x00\x00'


FingerprintFamily
-----------------

.. py:class:: FingerprintFamily

   A FingerprintFamily is used to create a FingerprintType or get information about its parameters
   
   Two reasons to use a FingerprintFamily (instead of using
   :func:`chemfp.get_fingerprint_type` or :func:`chemfp.get_fingerprint_type_from_text_settings`) are:
   
   * figure out the default arguments;
   * given a text settings or parameter dictionary, use the keys of the
     default arguments to remove other parameters before creating a
     FingerprintType (otherwise the creation function will raise an exception)
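
   The second reason can be sketched with plain dictionaries. The
   ``defaults`` below copy the RDKit-Fingerprint defaults shown at the
   start of this section, and the ``params`` dictionary with its extra
   keys is made up for illustration::

     # Defaults as returned by get_defaults() in the earlier example.
     defaults = {"maxPath": 7, "fpSize": 2048, "nBitsPerHash": 2,
                 "minPath": 1, "useHs": 1}

     # A parameter dictionary which also carries unrelated keys.
     params = {"fpSize": 1024, "output": "out.fps", "verbose": True}

     # Keep only the keys that the fingerprint family understands.
     fingerprint_kwargs = {key: value for (key, value) in params.items()
                           if key in defaults}

   With a real family, the defaults would come from ``get_defaults()``
   and the filtered dictionary could then be passed to ``from_kwargs``.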
   
   All fingerprint families have the following attributes:
   
   * name - the type name, including version
   * toolkit - the toolkit API for the underlying chemistry toolkit, or None


  .. py:method:: __repr__()

     Return a string like 'FingerprintFamily(<RDKit-Fingerprint/2>)'


  .. py:attribute:: FingerprintFamily.name

     Read-only attribute.

     The full fingerprint name, including the version


  .. py:attribute:: FingerprintFamily.base_name

     Read-only attribute.

     The base fingerprint name, without the version


  .. py:attribute:: FingerprintFamily.version

     Read-only attribute.

     The fingerprint version


  .. py:attribute:: FingerprintFamily.toolkit

     Read-only attribute.

     The toolkit used to implement this fingerprint, or None


  .. py:method:: __call__(**fingerprint_kwargs)

     Create a fingerprint type; keyword arguments can override the defaults
     
     The argument values are native Python values, not string-encoded values::
     
       >>> import chemfp
       >>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
       >>> fptype = family()
       >>> fptype.get_type()
       'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1'
       >>> fptype = family(fpSize=1024)
       >>> fptype.get_type()
       'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=1024 nBitsPerHash=2 useHs=1'
     
     The function will raise an exception for unknown arguments.
     
     :param fingerprint_kwargs: the fingerprint parameters
     :returns: an object implementing the :class:`chemfp.types.FingerprintType` API


  .. py:method:: from_kwargs(fingerprint_kwargs=None)

     Create a fingerprint type; items in the *fingerprint_kwargs* dictionary can override the defaults
     
     The dictionary values are native Python values, not string-encoded values::
     
       >>> import chemfp
       >>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
       >>> fptype = family()
       >>> fptype.get_type()
       'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1'
       >>> fptype = family.from_kwargs({"fpSize": 1024})
       >>> fptype.get_type()
       'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=1024 nBitsPerHash=2 useHs=1'
     
     The function will raise an exception for unknown arguments.
     
     :param fingerprint_kwargs: the fingerprint parameters
     :type fingerprint_kwargs: a dictionary where the values are Python objects
     :returns: an object implementing the :class:`chemfp.types.FingerprintType` API


  .. py:method:: from_text_settings(settings=None)

     Create a fingerprint type; *settings* is a dictionary with string-encoded values that can override the defaults
     
     The dictionary values are string-encoded values, not native Python values.
     This function exists to help handle command-line arguments and settings files::
     
       >>> import chemfp
       >>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
       >>> fptype = family.from_text_settings()
       >>> fptype.get_type()
       'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1'
       >>> fptype = family.from_text_settings({"fpSize": "1024"})
       >>> fptype.get_type()
       'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=1024 nBitsPerHash=2 useHs=1'
     
     The function will raise an exception for unknown arguments.
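
     One way to collect such text settings from a command-line is sketched
     below, using only the standard library. The option names are
     hypothetical (one per RDKit-Fingerprint parameter shown above), and
     ``get_text_settings`` is an illustrative helper, not part of chemfp.
     The resulting dictionary has the shape expected by *settings*::

       import argparse

       # Hypothetical option names, one per parameter in the example above.
       PARAMETER_NAMES = ("minPath", "maxPath", "fpSize", "nBitsPerHash", "useHs")

       def make_parser():
           parser = argparse.ArgumentParser()
           for name in PARAMETER_NAMES:
               # No type= converter, so each value stays a string.
               parser.add_argument("--" + name)
           return parser

       def get_text_settings(argv):
           args = make_parser().parse_args(argv)
           # Keep only the options which were given on the command-line.
           return {name: value for name, value in vars(args).items()
                   if value is not None}

       settings = get_text_settings(["--fpSize", "1024"])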
     
     :param settings: the fingerprint text settings
     :type settings: a dictionary where the values are string-encoded
     :returns: an object implementing the :class:`chemfp.types.FingerprintType` API


  .. py:method:: get_kwargs_from_text_settings(settings=None)

     Convert a dictionary of string-encoded fingerprint parameters into native Python values
     
     String-encoded values ("text settings") can come from the command-line,
     a configuration file, a web request, or other text sources. The fingerprint
     types need actual Python values. This method converts the first to the second::
     
       >>> import chemfp
       >>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
       >>> family.get_kwargs_from_text_settings()
       {'maxPath': 7, 'fpSize': 2048, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}
       >>> family.get_kwargs_from_text_settings({"fpSize": "128", "maxPath": "5"})
       {'maxPath': 5, 'fpSize': 128, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}
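
     The conversion can be thought of as applying one decoder per parameter
     on top of the defaults. The following is only a sketch of that idea;
     the ``DECODERS`` table, the ``decode_text_settings`` helper, and the
     ``ValueError`` for unknown names are assumptions, not chemfp internals::

       # Hypothetical defaults and decoders for the example parameters.
       DEFAULTS = {"minPath": 1, "maxPath": 7, "fpSize": 2048,
                   "nBitsPerHash": 2, "useHs": 1}
       DECODERS = dict.fromkeys(DEFAULTS, int)  # every example value is an int

       def decode_text_settings(settings=None):
           """Return native Python values for the string-encoded *settings*."""
           kwargs = dict(DEFAULTS)
           for name, text in (settings or {}).items():
               if name not in DECODERS:
                   raise ValueError("Unknown fingerprint parameter: %r" % (name,))
               kwargs[name] = DECODERS[name](text)
           return kwargs

       decoded = decode_text_settings({"fpSize": "128", "maxPath": "5"})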
     
     :param settings: the fingerprint text settings
     :type settings: a dictionary where the values are string-encoded
     :returns: a dictionary of (decoded) fingerprint parameters


  .. py:method:: get_defaults()

     Return the default parameters as a dictionary
     
     The dictionary values are native Python objects::
     
       >>> import chemfp
       >>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
       >>> family.get_defaults()
       {'maxPath': 7, 'fpSize': 2048, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}
     
     :returns: a dictionary of fingerprint parameters


.. py:module:: chemfp.types


FingerprintType
---------------

.. py:class:: FingerprintType

   The base class for all fingerprint types
   
   A fingerprint type has the following public attributes:
   
   .. py:attribute:: name
   
     the fingerprint name, including the version
   
   .. py:attribute:: base_name
   
     the fingerprint name, without the version
   
   .. py:attribute:: version
   
     the fingerprint version
   
   .. py:attribute:: toolkit
   
      the toolkit API for the underlying chemistry toolkit, or None
   
   .. py:attribute:: software
   
      a string which characterizes the toolkit, including version information
   
   .. py:attribute:: num_bits
   
      the number of bits in this fingerprint type
      
   .. py:attribute:: fingerprint_kwargs
   
      a dictionary of the fingerprint arguments
   
   The built-in fingerprint types are:
   
   * :class:`chemfp.openbabel_types.OpenBabelFP2FingerprintType_v1` - ``OpenBabel-FP2/1`` -
     Open Babel FP2
   * :class:`chemfp.openbabel_types.OpenBabelFP3FingerprintType_v1` - ``OpenBabel-FP3/1`` -
     Open Babel FP3
   * :class:`chemfp.openbabel_types.OpenBabelFP4FingerprintType_v1` - ``OpenBabel-FP4/1`` -
     Open Babel FP4
   * :class:`chemfp.openbabel_types.OpenBabelMACCSFingerprintType_v1` - ``OpenBabel-MACCS/1`` -
     Open Babel 166 MACCS keys
   * :class:`chemfp.openbabel_types.OpenBabelMACCSFingerprintType_v2` - ``OpenBabel-MACCS/2`` -
     Open Babel 166 MACCS keys
   * :class:`chemfp.openbabel_patterns.SubstructOpenBabelFingerprinter_v1` - ``ChemFP-Substruct-OpenBabel/1`` -
     chemfp's 881 CACTVS/PubChem-like keys implemented with Open Babel
   * :class:`chemfp.openbabel_patterns.RDMACCSOpenBabelFingerprinter_v1` - ``RDMACCS-OpenBabel/1`` -
     chemfp's own 166 MACCS keys implemented with Open Babel (does not include key 44)
   * :class:`chemfp.openbabel_patterns.RDMACCSOpenBabelFingerprinter_v2` - ``RDMACCS-OpenBabel/2`` -
     chemfp's own 166 MACCS keys implemented with Open Babel
   
   * :class:`chemfp.openeye_types.OpenEyeCircularFingerprintType_v2` - ``OpenEye-Circular/2`` -
     OEGraphSim circular fingerprints
   * :class:`chemfp.openeye_types.OpenEyeMACCSFingerprintType_v2` - ``OpenEye-MACCS166/2`` -
     OEGraphSim 166 MACCS keys
   * :class:`chemfp.openeye_types.OpenEyePathFingerprintType_v2` - ``OpenEye-Path/2`` -
     OEGraphSim path fingerprints
   * :class:`chemfp.openeye_types.OpenEyeTreeFingerprintType_v2` - ``OpenEye-Tree/2`` -
     OEGraphSim tree fingerprints
   
   * :class:`chemfp.openeye_patterns.SubstructOpenEyeFingerprinter_v1` - ``ChemFP-Substruct-OpenEye/1`` -
     chemfp's 881 CACTVS/PubChem-like keys implemented with OEChem
   * :class:`chemfp.openeye_patterns.RDMACCSOpenEyeFingerprinter_v1` - ``RDMACCS-OpenEye/1`` -
     chemfp's own 166 MACCS keys implemented with OEChem (does not include key 44)
   * :class:`chemfp.openeye_patterns.RDMACCSOpenEyeFingerprinter_v2` - ``RDMACCS-OpenEye/2`` -
     chemfp's own 166 MACCS keys implemented with OEChem
   
   * :class:`chemfp.rdkit_types.RDKitFingerprintType_v1` - ``RDKit-Fingerprint/1`` -
     RDKit path and tree fingerprint
   * :class:`chemfp.rdkit_types.RDKitFingerprintType_v2` - ``RDKit-Fingerprint/2`` -
     RDKit path and tree fingerprint
   * :class:`chemfp.rdkit_types.RDKitMACCSFingerprintType_v1` - ``RDKit-MACCS/1`` -
     RDKit 166 MACCS keys (does not include key 44)
   * :class:`chemfp.rdkit_types.RDKitMACCSFingerprintType_v2` - ``RDKit-MACCS/2`` -
     RDKit 166 MACCS keys
   * :class:`chemfp.rdkit_types.RDKitMorganFingerprintType_v1` - ``RDKit-Morgan/1`` -
     RDKit circular fingerprints
   * :class:`chemfp.rdkit_types.RDKitAtomPairFingerprint_v1` - ``RDKit-AtomPair/1`` -
     RDKit atom pair fingerprints
   * :class:`chemfp.rdkit_types.RDKitAtomPairFingerprint_v2` - ``RDKit-AtomPair/2`` -
     RDKit atom pair fingerprints
   * :class:`chemfp.rdkit_types.RDKitTorsionFingerprintType_v1` - ``RDKit-Torsion/1`` -
     RDKit torsion fingerprints
   * :class:`chemfp.rdkit_types.RDKitTorsionFingerprintType_v2` - ``RDKit-Torsion/2`` -
     RDKit torsion fingerprints
   * :class:`chemfp.rdkit_types.RDKitTorsionFingerprintType_v3` - ``RDKit-Torsion/3`` -
     RDKit torsion fingerprints
   
   * :class:`chemfp.rdkit_patterns.SubstructRDKitFingerprintType_v1` - ``ChemFP-Substruct-RDKit/1`` -
     chemfp's 881 CACTVS/PubChem-like keys implemented with RDKit
   * :class:`chemfp.rdkit_patterns.RDMACCSRDKitFingerprinter_v1` - ``RDMACCS-RDKit/1`` -
     chemfp's own 166 MACCS keys implemented with RDKit (does not include key 44)
   * :class:`chemfp.rdkit_patterns.RDMACCSRDKitFingerprinter_v2` - ``RDMACCS-RDKit/2`` -
     chemfp's own 166 MACCS keys implemented with RDKit


  .. py:method:: get_type()

     Get the full type string (name and parameters) for this fingerprint type
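
     Judging from the examples in this chapter, a type string is the
     fingerprint name followed by space-separated ``key=value`` terms. The
     sketch below composes and splits such a string. It assumes values never
     contain whitespace and that parameters appear in dictionary order;
     both helpers are hypothetical and the real encoding rules may differ::

       def format_type_string(name, fingerprint_kwargs):
           """Join the name and the key=value terms with single spaces."""
           terms = [name]
           for key, value in fingerprint_kwargs.items():
               terms.append("%s=%s" % (key, value))
           return " ".join(terms)

       def parse_type_string(type_string):
           """Return (name, text settings); the values stay string-encoded."""
           terms = type_string.split()
           text_settings = dict(term.split("=", 1) for term in terms[1:])
           return terms[0], text_settings

       type_string = format_type_string(
           "RDKit-Fingerprint/2",
           {"minPath": 1, "maxPath": 7, "fpSize": 2048,
            "nBitsPerHash": 2, "useHs": 1})
       name, text_settings = parse_type_string(type_string)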
     
     :returns: a canonical fingerprint type string, including its parameters


  .. py:method:: get_metadata(sources=None)

     Return a Metadata appropriate for the given fingerprint type.
     
     This is most commonly used to make a :class:`chemfp.Metadata`
     that can be passed into a :class:`chemfp.FingerprintWriter`.
     
     If *sources* is a string or a list of strings then it will be passed
     to the newly created Metadata instance. It should contain filenames
     or other description of the fingerprint sources.
     
     :param sources: fingerprint source filenames or other description
     :type sources: None, a string, or list of strings
     :returns: a :class:`chemfp.Metadata`


  .. py:method:: make_fingerprinter()

     Make a 'fingerprinter'; a callable which takes a molecule and returns a fingerprint
     
     :returns: a function object which takes a molecule and returns a fingerprint


  .. py:method:: read_molecule_fingerprints(source, format=None, id_tag=None, reader_args=None, errors="strict", location=None)

     Read fingerprints from a structure source as a FingerprintIterator
     
     Iterate through the *format* structure records in *source*. If *format*
     is None then auto-detect the format based on the *source*. Use the
     fingerprint type to compute the fingerprint. For SD files, use *id_tag*
     to get the record id from the given SD tag instead of the title line.
     
     The *reader_args* dictionary parameters depend on the toolkit and format.
     For details see the docstring for ``self.toolkit.read_molecules``.
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
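
     The three *errors* policies can be illustrated without a chemistry
     toolkit. In this sketch, which is not chemfp code, ``bytes.fromhex``
     stands in for the fingerprint computation and a ``ValueError`` stands
     in for a record that cannot be processed; the helper name and the
     message format are made up::

       import sys

       def iter_id_and_fingerprints(records, compute_fingerprint, errors="strict"):
           """Yield (id, fingerprint) pairs, applying the *errors* policy."""
           if errors not in ("strict", "report", "ignore"):
               raise ValueError("Unsupported errors value: %r" % (errors,))
           for record_id, record in records:
               try:
                   fp = compute_fingerprint(record)
               except ValueError as err:
                   if errors == "strict":
                       raise
                   if errors == "report":
                       sys.stderr.write("ERROR: %s. Skipping.\n" % (err,))
                   continue  # "report" and "ignore" both go to the next record
               yield record_id, fp

       records = [("first", "00ff"), ("second", "zz"), ("third", "10")]
       pairs = list(iter_id_and_fingerprints(records, bytes.fromhex, errors="ignore"))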
     
     The *location* parameter takes a Location instance. If None then a default
     Location will be created.
     
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader parameters passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a Location object, or None
     :returns: a :class:`chemfp.FingerprintIterator` which iterates over the (id, fingerprint) pairs


  .. py:method:: read_molecule_fingerprints_from_string(content, format=None, id_tag=None, reader_args=None, errors="strict", location=None)

     Read fingerprints from structure records in a string, as a FingerprintIterator
     
     Iterate through the *format* structure records in *content*. Use the
     fingerprint type to compute the fingerprint. For SD files, use *id_tag*
     to get the record id from the given SD tag instead of the title line.
     
     The *reader_args* dictionary parameters depend on the toolkit and format.
     For details see the docstring for ``self.toolkit.read_molecules``.
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a Location instance. If None then a default
     Location will be created.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader parameters passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a Location object, or None
     :returns: a :class:`chemfp.FingerprintIterator` which iterates over the (id, fingerprint) pairs


  .. py:method:: parse_molecule_fingerprint(content, format, reader_args=None, errors="strict")

     Parse the first molecule record of the content then compute and return the fingerprint
     
     Read the first molecule from *content*, which contains records
     in the given *format*. Compute and return its fingerprint.
     
     The *reader_args* dictionary parameters depend on the toolkit and format.
     For details see the docstring for ``self.toolkit.read_molecules``.
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and returns None for
     the fingerprint, and "ignore" returns None for the fingerprint without
     any extra message.
     
     :param content: the string containing at least one structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param reader_args: reader parameters passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: the fingerprint as a byte string


  .. py:method:: parse_id_and_molecule_fingerprint(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first molecule record of the content then compute and return the id and fingerprint
     
     Read the first molecule from *content*, which contains records in
     the given *format*. Compute its fingerprint and get the molecule id.
     For an SD record use *id_tag* to get the record id from the given
     SD tag instead of from the title line.
     
     Return the id and fingerprint as the (id, fingerprint) pair.
     
     The *reader_args* dictionary parameters depend on the toolkit and format.
     For details see the docstring for ``self.toolkit.read_molecules``.
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and returns None for
     values it cannot compute, and "ignore" is like "report" but without
     the error message. For "report" and "ignore", if the molecule cannot
     be parsed then the result will be (None, None). If the fingerprint
     cannot be computed then the result will be (id, None).
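
     A caller can rely on these return conventions to separate successes
     from failures. The sketch below calls the method as documented here,
     but ``collect_fingerprints`` is a hypothetical helper and
     ``StubFingerprintType`` is a toy stand-in (records look like
     ``"<hex fingerprint> <id>"``) so that the example runs without a
     chemistry toolkit. The ``"smi"`` format name is an assumption::

       class StubFingerprintType(object):
           """Toy stand-in which mimics the documented return conventions."""
           def parse_id_and_molecule_fingerprint(self, content, format, id_tag=None,
                                                 reader_args=None, errors="strict"):
               fields = content.split()
               if len(fields) != 2:
                   if errors == "strict":
                       raise ValueError("Cannot parse record: %r" % (content,))
                   return None, None  # the record could not be parsed
               text, record_id = fields
               try:
                   return record_id, bytes.fromhex(text)
               except ValueError:
                   if errors == "strict":
                       raise
                   return record_id, None  # the fingerprint could not be computed

       def collect_fingerprints(fptype, records, format="smi"):
           """Return ([(id, fingerprint), ...], number of failed records)."""
           results = []
           num_failed = 0
           for content in records:
               record_id, fp = fptype.parse_id_and_molecule_fingerprint(
                   content, format, errors="ignore")
               if fp is None:
                   num_failed += 1
               else:
                   results.append((record_id, fp))
           return results, num_failed

       results, num_failed = collect_fingerprints(
           StubFingerprintType(), ["00ff first", "", "zz third"])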
     
     :param content: the string containing at least one structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader parameters passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a pair of (id string, fingerprint byte string)


  .. py:method:: make_id_and_molecule_fingerprint_parser(format, id_tag=None, reader_args=None, errors="strict")

     Make a function which parses a molecule from a record and returns the id and computed fingerprint
     
     This is a very specialized function, designed for performance, but it
     doesn't appear to give any advantage. You likely don't need it.
     
     Return a function which parses a content string containing structure
     records in the given *format* to get a molecule. Use the molecule to
     compute the fingerprint and get its id. For an SD record use *id_tag*
     to get the record id from the given SD tag instead of from the
     title line.
     
     The new function will return the (id, fingerprint) pair.
     
     The *reader_args* dictionary parameters depend on the toolkit and format.
     For details see the docstring for ``self.toolkit.read_molecules``.
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and returns None for
     values it cannot compute, and "ignore" is like "report" but without
     the error message. For "report" and "ignore", if the molecule cannot
     be parsed then the result will be (None, None). If the fingerprint
     cannot be computed then the result will be (id, None).
     
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader parameters passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a function which takes a content string and returns an (id, fingerprint) pair


  .. py:method:: compute_fingerprint(mol)

     Compute and return the fingerprint byte string for the toolkit molecule
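
     Because the result is an ordinary byte string, it can be inspected with
     plain Python. The helpers below are illustrative only and are not part
     of chemfp; they show the hex form of a fingerprint and count its set
     bits, using a made-up two-byte value (Python 3)::

       def to_hex(fp):
           """Return the hex-encoded form of a fingerprint byte string."""
           return fp.hex()

       def popcount(fp):
           """Return the number of bits set to 1 in a fingerprint byte string."""
           return sum(bin(byte).count("1") for byte in bytearray(fp))

       fp = b"\x0f\x01"  # made-up fingerprint, not computed from a molecule
       hex_fp = to_hex(fp)
       num_on_bits = popcount(fp)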
     
     :param mol: a toolkit molecule
     :returns: the fingerprint as a byte string


  .. py:method:: compute_fingerprints(mols)

     Compute and return the fingerprint for each toolkit molecule in an iterator
     
     This function is a slightly optimized version of::
     
       for mol in mols:
         yield self.compute_fingerprint(mol)
     
     :param mols: an iterable of toolkit molecules
     :returns: a generator of fingerprints, one per molecule


  .. py:method:: get_fingerprint_family()

     Return the fingerprint family for this fingerprint type
     
     :returns: a :class:`FingerprintFamily`


Open Babel fingerprints
-----------------------

Open Babel implements four fingerprint families and chemfp implements
two fingerprint families using the Open Babel toolkit. These are:

* OpenBabel-FP2 - Indexes linear fragments up to 7 atoms.
* OpenBabel-FP3 - SMARTS patterns specified in the file patterns.txt
* OpenBabel-FP4 - SMARTS patterns specified in the file SMARTS_InteLigand.txt
* OpenBabel-MACCS - SMARTS patterns specified in the file MACCS.txt, which
  implements nearly all of the 166 MACCS keys
* RDMACCS-OpenBabel - a chemfp implementation of nearly all of the
  MACCS keys
* ChemFP-Substruct-OpenBabel - an experimental chemfp implementation
  of the PubChem keys

Most people use FP2 and MACCS.

Note: chemfp-2.0 implements both RDMACCS-OpenBabel/1 and
RDMACCS-OpenBabel/2. Version 1 did not have a definition for key 44.


.. py:module:: chemfp.openbabel_types


OpenBabelFP2FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelFP2FingerprintType_v1

   OpenBabel FP2 fingerprint based on path enumeration
   
   See http://openbabel.org/wiki/FP2
   
   This is a Daylight-like path enumeration fingerprint with 1021 bits.
   
   The OpenBabel-FP2/1 :class:`.FingerprintType` has no parameters.


OpenBabelFP3FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelFP3FingerprintType_v1

   OpenBabel FP3 fingerprint
   
   See http://openbabel.org/wiki/FP3
   
   55 bit fingerprints based on a set of SMARTS patterns defining functional groups.
   
   The OpenBabel-FP3/1 :class:`.FingerprintType` has no parameters.


OpenBabelFP4FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelFP4FingerprintType_v1

   OpenBabel FP4 fingerprint
   
   See http://openbabel.org/wiki/FP4
   
   307 bit fingerprints based on a set of SMARTS patterns defining functional groups.
   
   The OpenBabel-FP4/1 :class:`.FingerprintType` has no parameters.


OpenBabelMACCSFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelMACCSFingerprintType_v1

   Open Babel's implementation of the 166 MACCS keys
   
   WARNING: This implementation contains serious bugs! All of
   the ring sizes are wrong.
   
   See http://openbabel.org/wiki/Tutorial:Fingerprints and
   https://github.com/openbabel/openbabel/blob/master/data/MACCS.txt .
   
   The OpenBabel-MACCS/1 :class:`.FingerprintType` has no parameters.
   
   Note: this version is only available in older (pre-2012) versions of Open Babel.


OpenBabelMACCSFingerprintType_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelMACCSFingerprintType_v2

   Open Babel's implementation of the 166 MACCS keys
   
   See http://openbabel.org/wiki/Tutorial:Fingerprints and
   https://github.com/openbabel/openbabel/blob/master/data/MACCS.txt .
   
   Note: Open Babel added support for key 44 on 20 October 2014.  This
   should have been version 3. However, I didn't notice until 1 May
   2017 that there was no chemfp test for it. Since everyone has been
   using it as v2, and very few people used the older version, I
   won't change the version number.
   
   The OpenBabel-MACCS/2 :class:`.FingerprintType` has no parameters.


OpenBabelECFP0FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelECFP0FingerprintType_v1

   Open Babel's implementation of the ECFP0 fingerprint
   
   This is a circular fingerprint of diameter 0.
   
   The OpenBabel-ECFP0/1 :class:`.FingerprintType` parameter is:
   
   * nBits - the number of bits in the fingerprint; must be a power of 2
     (default: 4096)
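
   A value for nBits can be checked in advance with the usual bit trick. This
   is a generic sketch, not code taken from chemfp or Open Babel, and
   ``is_power_of_two`` is a hypothetical helper name::

     def is_power_of_two(n):
         """Return True if the integer n is 1, 2, 4, 8, ..."""
         # A power of two has exactly one bit set, so clearing the lowest
         # set bit with n & (n - 1) must leave zero.
         return n > 0 and (n & (n - 1)) == 0

     valid_sizes = [n for n in (1000, 1024, 2048, 4096, 5000) if is_power_of_two(n)]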


OpenBabelECFP2FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelECFP2FingerprintType_v1

   Open Babel's implementation of the ECFP2 fingerprint
   
   This is a circular fingerprint of diameter 2.
   
   The OpenBabel-ECFP2/1 :class:`.FingerprintType` parameter is:
   
   * nBits - the number of bits in the fingerprint; must be a power of 2
     (default: 4096)


OpenBabelECFP4FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelECFP4FingerprintType_v1

   Open Babel's implementation of the ECFP4 fingerprint
   
   This is a circular fingerprint of diameter 4.
   
   The OpenBabel-ECFP4/1 :class:`.FingerprintType` parameter is:
   
   * nBits - the number of bits in the fingerprint; must be a power of 2
     (default: 4096)


OpenBabelECFP6FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelECFP6FingerprintType_v1

   Open Babel's implementation of the ECFP6 fingerprint
   
   This is a circular fingerprint of diameter 6.
   
   The OpenBabel-ECFP6/1 :class:`.FingerprintType` parameter is:
   
   * nBits - the number of bits in the fingerprint; must be a power of 2
     (default: 4096)


OpenBabelECFP8FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelECFP8FingerprintType_v1

   Open Babel's implementation of the ECFP8 fingerprint
   
   This is a circular fingerprint of diameter 8.
   
   The OpenBabel-ECFP8/1 :class:`.FingerprintType` parameter is:
   
   * nBits - the number of bits in the fingerprint; must be a power of 2
     (default: 4096)


OpenBabelECFP10FingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenBabelECFP10FingerprintType_v1

   Open Babel's implementation of the ECFP10 fingerprint
   
   This is a circular fingerprint of diameter 10.
   
   The OpenBabel-ECFP10/1 :class:`.FingerprintType` parameter is:
   
   * nBits - the number of bits in the fingerprint; must be a power of 2
     (default: 4096)


.. py:module:: chemfp.openbabel_patterns


SubstructOpenBabelFingerprinter_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: SubstructOpenBabelFingerprinter_v1

   chemfp's Substruct fingerprint implementation for Open Babel, version 1
   
   WARNING: these fingerprints have not been validated.
   
   The Substruct fingerprints are CACTVS/PubChem-like fingerprints designed
   for use across multiple toolkits.
   
   The ChemFP-Substruct-OpenBabel/1 :class:`.FingerprintType` has no parameters.


RDMACCSOpenBabelFingerprinter_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDMACCSOpenBabelFingerprinter_v1

   chemfp's RDMACCS fingerprint implementation for Open Babel, version 1
   
   The RDMACCS keys are MACCS-166-like fingerprints based on RDKit's
   MACCS166 definition, but designed to be (slightly) more portable
   across multiple chemistry toolkits.
   
   This version does not define key 44.
   
   The RDMACCS-OpenBabel/1 :class:`.FingerprintType` has no parameters.


RDMACCSOpenBabelFingerprinter_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDMACCSOpenBabelFingerprinter_v2

   chemfp's RDMACCS fingerprint implementation for Open Babel, version 2
   
   The RDMACCS keys are MACCS-166-like fingerprints based on RDKit's
   MACCS166 definition, but designed to be (slightly) more portable
   across multiple chemistry toolkits.
   
   This version defines key 44.
   
   The RDMACCS-OpenBabel/2 :class:`.FingerprintType` has no parameters.


OpenEye fingerprints
--------------------


OpenEye's OEGraphSim library implements four bitstring-based
fingerprint families, and chemfp implements two fingerprint families
based on OEChem. These are:

* OpenEye-Path - exhaustive enumeration of all linear fragments
  up to a given size
* OpenEye-Circular - exhaustive enumeration of all circular
  fragments grown radially from each heavy atom up to a given radius
* OpenEye-Tree - exhaustive enumeration of all trees up to
  a given size
* OpenEye-MACCS166 - an implementation of the 166 MACCS keys
* RDMACCS-OpenEye - a chemfp implementation of the 166 MACCS keys
* ChemFP-Substruct-OpenEye - an experimental chemfp implementation
  of the PubChem keys

Note: chemfp-2.0 implements both RDMACCS-OpenEye/1 and
RDMACCS-OpenEye/2. Version 1 did not have a definition for key 44.


.. py:module:: chemfp.openeye_types


OpenEyeCircularFingerprintType_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenEyeCircularFingerprintType_v2

   OEGraphSim fingerprint based on circular fingerprints around heavy atoms, version 2
   
   See https://docs.eyesopen.com/toolkits/cpp/graphsimtk/fingerprint.html#section-fingerprint-circular
   
   The OpenEye-Circular/2 :class:`.FingerprintType` parameters are:
   
   * numbits - the number of bits in the fingerprint (default: 4096)
   * minradius - the minimum radius (default: 0)
   * maxradius - the maximum radius (default: 5)
   * atype - the atom type (default: "Default")
   * btype - the bond type (default: "Default")
   
   The atype is either 0 or a '|' separated string containing one
   or more of the following: Aromaticity, AtomicNumber, Chiral,
   EqHBondAcceptor, EqHBondDonor, EqHalogen, FormalCharge, HCount,
   HvyDegree, Hybridization, InRing, EqAromatic.
   
   The btype is either 0 or a '|' separated string containing one
   or more of the following: BondOrder, Chiral, InRing.
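
   The '|' separated form can be split and checked against the names listed
   above. The helper below is a hypothetical sketch and not part of chemfp or
   OEGraphSim; it covers only 0 and the listed names, not special values such
   as "Default"::

     ATYPE_NAMES = ("Aromaticity", "AtomicNumber", "Chiral", "EqHBondAcceptor",
                    "EqHBondDonor", "EqHalogen", "FormalCharge", "HCount",
                    "HvyDegree", "Hybridization", "InRing", "EqAromatic")
     BTYPE_NAMES = ("BondOrder", "Chiral", "InRing")

     def split_type_names(value, allowed_names):
         """Return the list of names in a '|' separated atype or btype value."""
         if value in (0, "0"):
             return []  # 0 means that no properties are used
         names = value.split("|")
         for name in names:
             if name not in allowed_names:
                 raise ValueError("Unsupported name: %r" % (name,))
         return names

     atype_names = split_type_names("Aromaticity|AtomicNumber|InRing", ATYPE_NAMES)
     btype_names = split_type_names("0", BTYPE_NAMES)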


OpenEyeMACCSFingerprintType_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenEyeMACCSFingerprintType_v2

   OEGraphSim implementation of the 166 MACCS keys, version 2
   
   See https://docs.eyesopen.com/toolkits/cpp/graphsimtk/fingerprint.html#maccs .
   
   The OpenEye-MACCS166/2 :class:`.FingerprintType` has no parameters.
   
   This corresponds to GraphSim version '2.0.0'.


OpenEyeMACCSFingerprintType_v3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenEyeMACCSFingerprintType_v3

   OEGraphSim implementation of the 166 MACCS keys, version 3
   
   See https://docs.eyesopen.com/toolkits/cpp/graphsimtk/fingerprint.html#maccs .
   
   The OpenEye-MACCS166/3 :class:`.FingerprintType` has no parameters.
   
   This corresponds to GraphSim version '2.2.0', with fixes for bits 91 and 92.


OpenEyePathFingerprintType_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenEyePathFingerprintType_v2

   OEGraphSim fingerprint based on path-based enumeration, version 2
   
   See https://docs.eyesopen.com/toolkits/cpp/graphsimtk/fingerprint.html#section-fingerprint-path
   
   The OpenEye-Path/2 :class:`.FingerprintType` parameters are:
   
   * numbits - the number of bits in the fingerprint (default: 4096)
   * minbonds - the minimum number of bonds (default: 0)
   * maxbonds - the maximum number of bonds (default: 5)
   * atype - the atom type (default: "Default")
   * btype - the bond type (default: "Default")
   
   The atype is either 0 or a '|' separated string containing one
   or more of the following: Aromaticity, AtomicNumber, Chiral,
   EqHBondAcceptor, EqHBondDonor, EqHalogen, FormalCharge, HCount,
   HvyDegree, Hybridization, InRing, EqAromatic.
   
   The btype is either 0 or a '|' separated string containing one
   or more of the following: BondOrder, Chiral, InRing.


OpenEyeTreeFingerprintType_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenEyeTreeFingerprintType_v2

   OEGraphSim fingerprint based on tree fingerprints, version 2
   
   See https://docs.eyesopen.com/toolkits/cpp/graphsimtk/fingerprint.html#section-fingerprint-tree
   
   The OpenEye-Tree/2 :class:`.FingerprintType` parameters are:
   
   * numbits - the number of bits in the fingerprint (default: 4096)
   * minbonds - minimum number of bonds in the tree
   * maxbonds - maximum number of bonds in the tree
   * atype - the atom type (default: "Default")
   * btype - the bond type (default: "Default")
   
   The atype is either 0 or a '|' separated string containing one
   or more of the following: Aromaticity, AtomicNumber, Chiral,
   EqHBondAcceptor, EqHBondDonor, EqHalogen, FormalCharge, HCount,
   HvyDegree, Hybridization, InRing, EqAromatic.
   
   The btype is either 0 or a '|' separated string containing one
   or more of the following: BondOrder, Chiral, InRing.


OpenEyeMoleculeScreenFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenEyeMoleculeScreenFingerprintType_v1

   OEChem molecule screen using OESubSearchScreenType::Molecule
   
   See https://docs.eyesopen.com/toolkits/cpp/oechemtk/OEChemClasses/OESubSearchScreen.html
   
   This OpenEyeMoleculeScreenFingerprintType_v1 :class:`.FingerprintType`
   takes no parameters. Calling the fingerprinter with a QMol returns the
   query screen; calling with an OEMol returns a target screen.


OpenEyeSMARTSScreenFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenEyeSMARTSScreenFingerprintType_v1

   OEChem SMARTS screen using OESubSearchScreenType::SMARTS
   
   See https://docs.eyesopen.com/toolkits/cpp/oechemtk/OEChemClasses/OESubSearchScreen.html
   
   This OpenEyeSMARTSScreenFingerprintType_v1 :class:`.FingerprintType`
   takes no parameters. Calling the fingerprinter with a QMol returns the
   query screen; calling with an OEMol returns a target screen.


OpenEyeMDLScreenFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: OpenEyeMDLScreenFingerprintType_v1

   OEChem MDL screen using OESubSearchScreenType::MDL
   
   See https://docs.eyesopen.com/toolkits/cpp/oechemtk/OEChemClasses/OESubSearchScreen.html
   
   This OpenEyeMDLScreenFingerprintType_v1 :class:`.FingerprintType`
   takes no parameters. Calling the fingerprinter with a QMol returns the
   query screen; calling with an OEMol returns a target screen.


.. py:module:: chemfp.openeye_patterns


SubstructOpenEyeFingerprinter_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: SubstructOpenEyeFingerprinter_v1

   chemfp's Substruct fingerprint implementation for OEChem, version 1
   
   WARNING: these fingerprints have not been validated.
   
   The Substruct fingerprints are CACTVS/PubChem-like fingerprints designed
   for use across multiple toolkits.
   
   The ChemFP-Substruct-OpenEye/1 :class:`.FingerprintType` has no parameters.


RDMACCSOpenEyeFingerprinter_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDMACCSOpenEyeFingerprinter_v1

   chemfp's RDMACCS fingerprint implementation for OEChem, version 1
   
   The RDMACSS keys are MACCS-166-like fingerprints based on RDKit's
   MACCS116 definition, but designed to be (slightly) more portable
   across multiple chemistry toolkits.
   
   This version does not define key 44.
   
   The RDMACCS-OpenEye/1 :class:`.FingerprintType` has no parameters.


RDMACCSOpenEyeFingerprinter_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDMACCSOpenEyeFingerprinter_v2

   chemfp's RDMACCS fingerprint implementation for OEChem, version 2
   
   The RDMACCS keys are MACCS-166-like fingerprints based on RDKit's
   MACCS166 definition, but designed to be (slightly) more portable
   across multiple chemistry toolkits.
   
   This version defines key 44.
   
   The RDMACCS-OpenEye/2 :class:`.FingerprintType` has no parameters.

RDKit fingerprints
------------------

RDKit implements six fingerprint families, and chemfp implements two
fingerprint families based on RDKit. These are:

* RDKit-Fingerprint - exhaustive enumeration of linear and branched trees
* RDKit-MACCS166 - The RDKit implementation of the MACCS keys
* RDKit-Morgan - ECFP-like circular fingerprints
* RDKit-AtomPair - atom pair fingerprints
* RDKit-Torsion - topological-torsion fingerprints
* RDKit-Pattern - substructure screen fingerprint
* RDMACCS-RDKit - a chemfp implementation of the 166 MACCS keys
* ChemFP-Substruct-RDKit - an experimental chemfp implementation
  of the PubChem keys

Note: chemfp-2.0 implements both RDMACCS-RDKit/1 and
RDMACCS-RDKit/2. Version 1 did not have a definition for key 44.


.. py:module:: chemfp.rdkit_types


RDKitFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitFingerprintType_v1

   RDKit's Daylight-like fingerprint based on linear path and branched tree enumeration, version 1
   
   See http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RDKFingerprint
   
   The RDKit-Fingerprint/1 :class:`.FingerprintType` parameters are:
   
   * fpSize - number of bits in the fingerprint (default: 2048)
   * minPath - minimum number of bonds (default: 1)
   * maxPath - maximum number of bonds (default: 7)
   * nBitsPerHash - number of bits to set for each path hash (default: 2)
   * useHs - include information about the number of hydrogens on each atom? (default: True)
   
   Note: this version is only available in older (pre-2014) versions of RDKit 


RDKitFingerprintType_v2
^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitFingerprintType_v2

   RDKit's Daylight-like fingerprint based on linear path and branched tree enumeration, version 2
   
   See http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RDKFingerprint
   
   The RDKit-Fingerprint/2 :class:`.FingerprintType` parameters are:
   
   * fpSize - number of bits in the fingerprint (default: 2048)
   * minPath - minimum number of bonds (default: 1)
   * maxPath - maximum number of bonds (default: 7)
   * nBitsPerHash - number of bits to set for each path hash (default: 2)
   * useHs - include information about the number of hydrogens on each atom? (default: True)
   * branchedPaths - include both branched and unbranched paths (default: True)
   * useBondOrder - use bond orders in the path hashes (default: True)
   * fromAtoms - a comma-separated list of atom indices which must be part of the path enumeration
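
   As a rough illustration of how parameters like these are usually
   written down, the sketch below splits a type string made of a
   ``name/version`` term followed by space-separated ``key=value``
   terms. The helper ``parse_type_string`` is hypothetical and not part
   of chemfp, and the example string is an assumed spelling, not output
   copied from chemfp:

   .. code-block:: python

      def parse_type_string(type_string):
          """Split 'Name/version key=value ...' into (name, params)."""
          terms = type_string.split()
          name = terms[0]
          params = {}
          for term in terms[1:]:
              key, _, value = term.partition("=")
              params[key] = value  # values are kept as strings
          return name, params

      # Assumed example string; only the parameter names come from the list above.
      name, params = parse_type_string("RDKit-Fingerprint/2 fpSize=1024 maxPath=6")

   Parameters which are left out of the string would then fall back to
   the defaults listed above.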


RDKitMACCSFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitMACCSFingerprintType_v1

   RDKit's implementation of the 166 MACCS keys, version 1
   
   See http://rdkit.org/Python_Docs/rdkit.Chem.rdMolDescriptors-module.html#GetMACCSKeysFingerprint
   
   The RDKit-MACCS166/1 fingerprints have no parameters.
   
   This version of RDKit does not support MACCS key 44 ("OTHER").


RDKitMACCSFingerprintType_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitMACCSFingerprintType_v2

   RDKit's implementation of the 166 MACCS keys, version 2
   
   See http://rdkit.org/Python_Docs/rdkit.Chem.rdMolDescriptors-module.html#GetMACCSKeysFingerprint
   
   The RDKit-MACCS166/2 fingerprints have no parameters. RDKit added
   this version in late 2014.


RDKitMorganFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitMorganFingerprintType_v1

   RDKit Morgan (ECFP-like) fingerprints, version 1
   
   See http://rdkit.org/Python_Docs/rdkit.Chem.rdMolDescriptors-module.html#GetMorganFingerprintAsBitVect
   
   The RDKit-Morgan/1 :class:`.FingerprintType` parameters are:
   
   * fpSize - number of bits in the fingerprint (default: 2048)
   * radius - radius for the Morgan algorithm (default: 2)
   * useFeatures - use chemical-feature invariants (default: 0)
   * useChirality - use chirality information (default: 0)
   * useBondTypes - include bond type information (default: 1)
   * includeRedundantEnvironments - if set, the check for redundant atom
        environments will not be done (added in RDKit 2020-3) (default: 0)
   * fromAtoms - a comma-separated list of atom indices to use as centers


RDKitAtomPairFingerprint_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitAtomPairFingerprint_v1

   RDKit atom pair fingerprints, version 1
   
   See http://rdkit.org/Python_Docs/rdkit.Chem.rdMolDescriptors-module.html#GetHashedAtomPairFingerprintAsBitVect
   
   The RDKit-AtomPair/1 :class:`.FingerprintType` parameters are:
   
   * fpSize - number of bits in the fingerprint (default: 2048)
   * minLength - minimum bond count for a pair (default: 1)
   * maxLength - maximum bond count for a pair (default: 30)
   
   Note: this version is only available in older (pre-2012) versions of RDKit 


RDKitAtomPairFingerprint_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitAtomPairFingerprint_v2

   RDKit atom pair fingerprints, version 2
   
   See http://rdkit.org/Python_Docs/rdkit.Chem.rdMolDescriptors-module.html#GetHashedAtomPairFingerprintAsBitVect
   
   The RDKit-AtomPair/2 :class:`.FingerprintType` parameters are:
   
   * fpSize - number of bits in the fingerprint (default: 2048)
   * minLength - minimum bond count for a pair (default: 1 bond)
   * maxLength - maximum bond count for a pair (default: 30, max: 63)
   * nBitsPerEntry - number of bits to use in simulating counts (default: 4)
   * includeChirality - if set, chirality will be used in the atom invariants (default: 0)
   * use2D - if 1, use a 2D distance matrix, if 0 use the 3D matrix from the first
       set of conformers, or return an empty fingerprint if no conformers (default: 1)
   * fromAtoms - a comma-separated list of atom indices which must be in the pair


RDKitTorsionFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitTorsionFingerprintType_v1

   RDKit torsion fingerprints, version 1
   
   See http://www.rdkit.org/Python_Docs/rdkit.Chem.AtomPairs.Torsions-module.html
   
   An implementation of Topological-torsion fingerprints, as
   described in: R. Nilakantan, N. Bauman, J. S. Dixon,
   R. Venkataraghavan; "Topological Torsion: A New Molecular
   Descriptor for SAR Applications.  Comparison with Other
   Descriptors" JCICS 27, 82-85 (1987).
   
   The RDKit-Torsion/1 :class:`.FingerprintType` parameters are:
   
   * fpSize - number of bits in the fingerprint (default: 2048)
   * targetSize - number of bonds per torsion (default: 4)
   
   Note: this version is only available in older (pre-2014) versions of RDKit 


RDKitTorsionFingerprintType_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitTorsionFingerprintType_v2

   RDKit torsion fingerprints, version 2
   
   See http://www.rdkit.org/Python_Docs/rdkit.Chem.AtomPairs.Torsions-module.html
   
   An implementation of Topological-torsion fingerprints, as
   described in: R. Nilakantan, N. Bauman, J. S. Dixon,
   R. Venkataraghavan; "Topological Torsion: A New Molecular
   Descriptor for SAR Applications.  Comparison with Other
   Descriptors" JCICS 27, 82-85 (1987).
   
   The RDKit-Torsion/2 :class:`.FingerprintType` parameters are:
   
   * fpSize - number of bits in the fingerprint (default: 2048)
   * targetSize - number of bonds per torsion (default: 4)
   * nBitsPerEntry - number of bits to set per entry (default: 4)
   * includeChirality - include chirality information (default: 0)
   * fromAtoms - a comma-separated list of atom indices which must be part of the torsion


RDKitPatternFingerprint_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitPatternFingerprint_v1

   RDKit's experimental substructure screen fingerprint, version 1
   
   See http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#PatternFingerprint
   
   The RDKit-Pattern/1 fingerprint has no parameters.


RDKitPatternFingerprint_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitPatternFingerprint_v2

   RDKit's experimental substructure screen fingerprint, version 2
   
   See http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#PatternFingerprint
   
   The RDKit-Pattern/2 fingerprint has no parameters.


RDKitPatternFingerprint_v3
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitPatternFingerprint_v3

   RDKit's experimental substructure screen fingerprint, version 3
   
   See http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#PatternFingerprint
   
   The RDKit-Pattern/3 fingerprint has no parameters. This version was
   released in RDKit 2017.03.1.


RDKitSECFPFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitSECFPFingerprintType_v1

   SECFP fingerprints
   
   The SMILES Extended Connectivity Fingerprint, as described in:
     Probst, D., Reymond, J. A probabilistic molecular fingerprint for
     big data settings. J Cheminform 10, 66
     (2018). https://doi.org/10.1186/s13321-018-0321-8
     https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0321-8
     
   These are circular fingerprints which encode the circular region as
   a fragment SMILES, which is then hashed to produce the fingerprint bits.
   
   The RDKit-SECFP/1 :class:`.FingerprintType` parameters are:
   
   * fpSize - number of bits in the fingerprint (default: 2048)
   * radius - analogous to the radius for the Morgan algorithm (default: 3)
   * rings - include ring membership (default: 1)
   * isomeric - use isomeric SMILES (default: 0)
   * kekulize - Kekulize the molecule and use Kekule SMILES (default: 1)
   * min_radius - minimum radius for the Morgan algorithm (default: 1)


RDKitAvalonFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDKitAvalonFingerprintType_v1

   Avalon fingerprints
   
   The Avalon Cheminformatics toolkit is available from
   https://sourceforge.net/projects/avalontoolkit/ . It is not part of
   the core RDKit distribution. Instead, RDKit has a compile-time
   option to download and include it as part of the build process.
   
   The Avalon fingerprints are described in the supplemental information
   for "QSAR - How Good Is It in Practice? Comparison of Descriptor
   Sets on an Unbiased Cross Section of Corporate Data Sets", Peter
   Gedeck, Bernhard Rohde, and Christian Bartels, J. Chem. Inf. Model.,
   2006, 46 (5), pp 1924-1936, DOI: 10.1021/ci050413p. The supplemental
   information is available from
   http://pubs.acs.org/doi/suppl/10.1021/ci050413p
   
   It uses a set of feature classes which "have been fine-tuned to
   provide good screen-out for the set of substructure queries
   encounted at Novartis while limiting redundancy." The classes are
   ATOM_COUNT, ATOM_SYMBOL_PATH, AUGMENTED_ATOM, AUGMENTED_BOND,
   HCOUNT_PAIR, HCOUNT_PATH, RING_PATH, BOND_PATH, HCOUNT_CLASS_PATH,
   ATOM_CLASS_PATH, RING_PATTERN, RING_SIZE_COUNTS, DEGREE_PATHS,
   CLASS_SPIDERS, FEATURE_PAIRS and ALL_PATTERNS.


.. py:module:: chemfp.rdkit_patterns


SubstructRDKitFingerprintType_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: SubstructRDKitFingerprintType_v1

   chemfp's Substruct fingerprint implementation for RDKit, version 1
   
   WARNING: these fingerprints have not been validated.
   
   The Substruct fingerprints are CACTVS/PubChem-like fingerprints designed
   for use across multiple toolkits.
   
   The ChemFP-Substruct-RDKit/1 :class:`.FingerprintType` has no parameters.


RDMACCSRDKitFingerprinter_v1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDMACCSRDKitFingerprinter_v1

   chemfp's RDMACCS fingerprint implementation for RDKit, version 1
   
   The RDMACCS keys are MACCS-166-like fingerprints based on RDKit's
   MACCS166 definition, but designed to be (slightly) more portable
   across multiple chemistry toolkits.
   
   This version does not define key 44.
   
   The RDMACCS-RDKit/1 :class:`.FingerprintType` has no parameters.


RDMACCSRDKitFingerprinter_v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. py:class:: RDMACCSRDKitFingerprinter_v2

   chemfp's RDMACCS fingerprint implementation for RDKit, version 2
   
   The RDMACCS keys are MACCS-166-like fingerprints based on RDKit's
   MACCS166 definition, but designed to be (slightly) more portable
   across multiple chemistry toolkits.
   
   This version defines key 44.
   
   The RDMACCS-RDKit/2 :class:`.FingerprintType` has no parameters.


chemfp.arena module
===================

There should be no reason for you to import this module yourself. It
contains the :class:`.FingerprintArena`
implementation. FingerprintArena instances are returned as part of the
public API but should not be constructed directly. Instead, use
:func:`chemfp.load_fingerprints` to create an arena.

.. py:module:: chemfp.arena


FingerprintArena
----------------

.. py:class:: FingerprintArena

   Store fingerprints in a contiguous block of memory for fast searches
   
   A fingerprint arena implements the :class:`chemfp.FingerprintReader` API.
   
   A fingerprint arena stores all of the fingerprints in a contiguous
   block of memory, so the per-molecule overhead is very low.
   
   The fingerprints can be sorted by popcount, so the fingerprints
   with no bits set come first, followed by those with 1 bit, etc.
   If ``self.popcount_indices`` is a non-empty string then the string
   contains information about the start and end offsets for all the
   fingerprints with a given popcount. This information is used for
   the sublinear search methods.
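
   To see why popcount ordering helps, note that the Tanimoto score of
   two fingerprints with popcounts *a* and *b* can never exceed
   ``min(a, b) / max(a, b)``. The sketch below uses that bound to compute
   which popcounts can possibly reach a threshold. The function
   ``popcount_bounds`` is hypothetical, written only for illustration,
   and is not chemfp's implementation:

   .. code-block:: python

      import math

      def popcount_bounds(query_popcount, threshold, num_bits):
          """Range of target popcounts which could reach `threshold`."""
          if threshold <= 0.0:
              return 0, num_bits  # every fingerprint qualifies
          lowest = math.ceil(query_popcount * threshold)
          highest = min(num_bits, math.floor(query_popcount / threshold))
          return lowest, highest

      # A query with 40 bits set, searched at threshold 0.5.
      lowest, highest = popcount_bounds(40, 0.5, 166)

   With the fingerprints sorted by popcount, a search only needs to
   visit the offsets for popcounts in that range. A production
   implementation would also need to guard against floating-point
   rounding at the range boundaries.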
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      :class:`chemfp.Metadata` about the fingerprints
   
   .. py:attribute:: ids
   
      list of identifiers, in index order
   
   .. py:attribute:: fingerprints
   
      *Added in version 3.3.*
   
      a :class:`.FingerprintList` list-like view of the fingerprints, in index order
   
   Other attributes, which might be subject to change, and which I won't fully explain, are:
     * arena - a contiguous block of memory, which contains the fingerprints
     * start_padding - number of bytes to the first fingerprint in the block
     * end_padding - number of bytes after the last fingerprint in the block
     * storage_size - number of bytes used to store a fingerprint
     * num_bytes - number of bytes in each fingerprint (must be <= storage_size)
     * num_bits - number of bits in each fingerprint
     * alignment - the fingerprint alignment
     * start - the index for the first fingerprint in the arena/subarena
     * end - the index for the last fingerprint in the arena/subarena
     * arena_ids - all of the identifiers for the parent arena
   
   The FingerprintArena is its own context manager, but it does
   nothing on context exit. The derived FPBFingerprintArena may 
   use a memory-mapped FPB file, which will be closed by the
   context manager or by an explicit call to close().


  .. py:method:: __len__()

     Number of fingerprint records in the FingerprintArena


  .. py:method:: __getitem__(i)

     Return the (id, fingerprint) pair at index i


  .. py:method:: __iter__()

     Iterate over the (id, fingerprint) contents of the arena


  .. py:method:: get_fingerprint_type()

     Get the fingerprint type object based on the metadata's type field
     
     This uses ``self.metadata.type`` to get the fingerprint type
     string then calls :func:`chemfp.get_fingerprint_type` to get and return
     a :class:`chemfp.types.FingerprintType` instance.
     
     This will raise a TypeError if there is no metadata, and
     a ValueError if the type field was invalid or the fingerprint
     type isn't available.
     
     :returns: a :class:`chemfp.types.FingerprintType`


  .. py:method:: get_fingerprint(i)

     Return the fingerprint at index *i*
     
     Raises an IndexError if index *i* is out of range.


  .. py:method:: get_by_id(id)

     Given the record identifier, return the (id, fingerprint) pair
     
     If the *id* is not present then return None.


  .. py:method:: get_index_by_id(id)

     Given the record identifier, return the record index
     
     If the *id* is not present then return None.


  .. py:method:: get_fingerprint_by_id(id)

     Given the record identifier, return its fingerprint
     
     If the *id* is not present then return None.


  .. py:method:: save(destination, format=None, level=None)

     Save the fingerprints to a given destination and format
     
     The output format is based on the *format*. If the format
     is None then the format depends on the *destination* file
     extension. If the extension isn't recognized then the
     fingerprints will be saved in "fps" format.
     
     If the output format is "fps", "fps.gz", or "fps.zst" then
     *destination* may be a filename, a file object, or None; None
     writes to stdout.
     
     If the output format is "fpb" then *destination* must be
     a filename or seekable file object. Chemfp cannot save
     to compressed FPB files.
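
     The format selection described above can be pictured with the
     following sketch. The helper ``guess_output_format`` is
     hypothetical and only mirrors the rules stated here. It is not the
     code chemfp uses, and treating a destination of None as "fps" is an
     assumption:

     .. code-block:: python

        def guess_output_format(destination):
            """Pick an output format from a filename, defaulting to "fps"."""
            if destination is None:
                return "fps"  # assumed: stdout output is uncompressed FPS
            for ext in ("fps.gz", "fps.zst", "fps", "fpb"):
                if destination.endswith("." + ext):
                    return ext
            return "fps"  # unrecognized extensions fall back to "fps"

        formats = [guess_output_format(name)
                   for name in ("targets.fpb", "targets.fps.gz", "targets.txt")]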
     
     :param destination: the output destination
     :type destination: a filename, file object, or None
     :param format: the output format
     :type format: None, "fps", "fps.gz", "fps.zst", or "fpb"
     :param level: compression level when writing .gz or .zst files
     :type level: an integer, or "min", "default", or "max" for compressor-specific values
     :returns: None


  .. py:method:: iter_arenas(arena_size = 1000)

     Iterate through *arena_size* fingerprints at a time, as subarenas
     
     Each subarena is itself a FingerprintArena. Like every
     FingerprintReader, it has a ``metadata`` attribute containing a
     Metadata and can be iterated over to get the (id, fingerprint)
     for each record.
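
     The idea of walking through the records *arena_size* at a time can
     be sketched with plain Python lists. The generator ``iter_chunks``
     is a hypothetical stand-in which yields lists rather than arenas:

     .. code-block:: python

        def iter_chunks(records, chunk_size=1000):
            """Yield successive lists of at most `chunk_size` records."""
            for start in range(0, len(records), chunk_size):
                yield records[start:start + chunk_size]

        # Five made-up (id, fingerprint) records, processed two at a time.
        records = [("id%d" % i, bytes([i])) for i in range(5)]
        sizes = [len(chunk) for chunk in iter_chunks(records, chunk_size=2)]

     The last chunk may be smaller than the requested size.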


  .. py:method:: copy(indices=None, reorder=None)

     Create a new arena using either all or some of the fingerprints in this arena
     
     By default this creates a new arena. The fingerprint data block and ids may
     be shared with the original arena, which makes this a shallow copy. If the
     original arena is a slice, or "sub-arena" of an arena, then the copy will
     allocate new space to store just the fingerprints in the slice and use its
     own list for the ids.
     
     The *indices* parameter, if not None, is an iterable which contains the
     indices of the fingerprint records to copy. Duplicates are allowed, though
     discouraged.
     
     If *indices* are specified then the default *reorder* value of None, or
     the value True, will reorder the fingerprints for the new arena by popcount.
     This improves overall search performance. If *reorder* is False then the
     new arena will preserve the order given by the indices.
     
     If *indices* are not specified, then the default is to preserve the order
     type of the original arena. Use ``reorder=True`` to always reorder the
     fingerprints in the new arena by popcount, and ``reorder=False`` to always
     leave them in the current ordering.
     
         >>> import chemfp
         >>> arena = chemfp.load_fingerprints("pubchem_queries.fps")
         >>> arena.ids[1], arena.ids[5], arena.ids[10], arena.ids[18]
         (b'9425031', b'9425015', b'9425040', b'9425033')
         >>> len(arena)
         19
         >>> new_arena = arena.copy(indices=[1, 5, 10, 18])
         >>> len(new_arena)
         4
         >>> new_arena.ids
         [b'9425031', b'9425015', b'9425040', b'9425033']
         >>> new_arena = arena.copy(indices=[18, 10, 5, 1], reorder=False)
         >>> new_arena.ids
         [b'9425033', b'9425040', b'9425015', b'9425031']
     
     :param indices: indices of the records to copy into the new arena
     :type indices: iterable containing integers, or None
     :param reorder: describes how to order the fingerprints
     :type reorder: True to reorder, False to leave in input order, None for default action


  .. py:method:: to_numpy_array()
     
     *Added in version 3.4.*

     Get the fingerprint bytes in a chemfp arena as a NumPy uint8 array.
     
     A chemfp arena stores fingerprints in a contiguous byte string.
     This function returns a 2D NumPy array which is a view of that
     string. The array has `len(arena)` rows and `arena.storage_size`
     columns.
     
     The storage size may be larger than the minimum number of bytes
     in the fingerprint because of zero padding used to improve
     performance. For example, the 166-bit MACCS keys use 24 bytes of
     storage when only 21 bytes are needed, because then chemfp can use
     the fast POPCNT instruction when computing the Tanimoto.
     
     To remove extra padding bytes, use NumPy indexing to copy the
     fingerprint bytes to a new array::
     
       arr[:,0:arena.num_bytes]
     
     The last column of this new array may contain padding bits if the
     number of bits in a fingerprint is not a multiple of 8.
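
     The arithmetic behind the MACCS example can be sketched as
     follows. The 8-byte alignment is an assumption chosen to reproduce
     the 24-byte figure above, and ``storage_layout`` is a hypothetical
     helper, not a chemfp function:

     .. code-block:: python

        def storage_layout(num_bits, alignment=8):
            """Return (num_bytes, storage_size) for one fingerprint."""
            num_bytes = (num_bits + 7) // 8  # round up to whole bytes
            # round up to the next multiple of the alignment
            storage_size = ((num_bytes + alignment - 1) // alignment) * alignment
            return num_bytes, storage_size

        num_bytes, storage_size = storage_layout(166)

        # One made-up stored row: 21 data bytes followed by zero padding.
        row = bytes(range(1, 22)) + b"\x00" * (storage_size - num_bytes)
        trimmed = row[:num_bytes]  # same idea as arr[:, 0:arena.num_bytes]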
     
     .. WARNING::
        Do not attempt to access the contents of a NumPy view of a
        FPBFingerprintArena (the arena from an FPB file) after the FPB
        file has been closed as that will likely cause a segmentation
        fault or other severe failure.
     
     :returns: a NumPy array of type uint8


  .. py:method:: to_numpy_bitarray(bitlist=None)
     
     *Added in version 3.4.*

     Get the fingerprint bits in a chemfp arena as a NumPy uint8 array.
     
     This function returns a 2D NumPy array with len(arena) rows
     and one column for each bit. The default returns `arena.num_bits`
     columns, where column 0 is the first bit, etc. Use `bitlist` to
     specify the indices of which columns to return. Negative indices
     are supported; -1 is the last bit, -2 is the second to last. Out
     of range indices raise an IndexError.
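
     The column selection can be illustrated for a single fingerprint
     in pure Python. This sketch assumes that bit 0 is the
     least-significant bit of the first byte. ``fingerprint_bits`` is a
     hypothetical helper which returns a list rather than a NumPy
     array:

     .. code-block:: python

        def fingerprint_bits(fp, num_bits, bitlist=None):
            """Return the selected bits of `fp` as a list of 0/1 integers."""
            if bitlist is None:
                bitlist = range(num_bits)  # default: every bit, in order
            bits = []
            for i in bitlist:
                if not -num_bits <= i < num_bits:
                    raise IndexError("bit index out of range: %d" % i)
                if i < 0:
                    i += num_bits  # -1 is the last bit
                bits.append((fp[i // 8] >> (i % 8)) & 1)
            return bits

        fp = bytes([0b00000101, 0b10000000])  # bits 0, 2 and 15 are set
        all_bits = fingerprint_bits(fp, 16)
        some_bits = fingerprint_bits(fp, 16, [0, 1, -1])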
     
     :param bitlist: bit column indices to use (default: all bits)
     :type bitlist: iterable of integers
     :returns: a NumPy array of type uint8


  .. py:method:: count_tanimoto_hits_fp(query_fp, threshold=0.7)

     Count the fingerprints which are sufficiently similar to the query fingerprint
     
     Return the number of fingerprints in the arena which are
     at least *threshold* similar to the query fingerprint *query_fp*.
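
     As a reference for what is being counted, here is a slow
     pure-Python sketch of the same computation over a list of byte
     strings. The functions ``tanimoto`` and ``count_hits`` are
     hypothetical. Scoring two empty fingerprints as 0.0 is a
     convention chosen for this sketch:

     .. code-block:: python

        def tanimoto(fp1, fp2):
            """Tanimoto similarity of two equal-length byte strings."""
            a = int.from_bytes(fp1, "little")
            b = int.from_bytes(fp2, "little")
            union = bin(a | b).count("1")
            if union == 0:
                return 0.0  # convention used in this sketch
            return bin(a & b).count("1") / union

        def count_hits(query_fp, target_fps, threshold=0.7):
            """Count the targets scoring at least `threshold`."""
            return sum(1 for fp in target_fps
                       if tanimoto(query_fp, fp) >= threshold)

        targets = [b"\x0f", b"\x07", b"\xf0", b"\x1f"]
        num_hits = count_hits(b"\x0f", targets, threshold=0.7)

     The scores here are 1.0, 0.75, 0.0 and 0.8, so three targets meet
     the threshold.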
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: integer count


  .. py:method:: threshold_tanimoto_search_fp(query_fp, threshold=0.7)

     Find the fingerprints which are sufficiently similar to the query fingerprint
     
     Find all of the fingerprints in this arena which are at least
     *threshold* similar to the query fingerprint *query_fp*.  The
     hits are returned as a :class:`.SearchResult`, in arbitrary
     order.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: a :class:`.SearchResult`


  .. py:method:: knearest_tanimoto_search_fp(query_fp, k=3, threshold=0.7)

     Find the k-nearest fingerprints which are sufficiently similar to the query fingerprint
     
     Find all of the fingerprints in this arena which are at least
     *threshold* similar to the query fingerprint, and of those, select
     the top *k* hits. The hits are returned as a :class:`.SearchResult`,
     sorted from highest score to lowest.
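
     The two-step selection (filter by threshold, then keep the best
     *k*) can be sketched in pure Python. The helper ``knearest`` is
     hypothetical and returns plain (index, score) pairs rather than a
     :class:`.SearchResult`. How chemfp breaks ties is not modeled
     here:

     .. code-block:: python

        import heapq

        def tanimoto(fp1, fp2):
            """Tanimoto similarity of two equal-length byte strings."""
            a = int.from_bytes(fp1, "little")
            b = int.from_bytes(fp2, "little")
            union = bin(a | b).count("1")
            return bin(a & b).count("1") / union if union else 0.0

        def knearest(query_fp, target_fps, k=3, threshold=0.7):
            """Return up to k (index, score) pairs, best score first."""
            scored = ((tanimoto(query_fp, fp), i)
                      for i, fp in enumerate(target_fps))
            hits = [(score, i) for score, i in scored if score >= threshold]
            best = heapq.nlargest(k, hits, key=lambda hit: hit[0])
            return [(i, score) for score, i in best]

        targets = [b"\x0f", b"\x07", b"\xf0", b"\x1f", b"\x3f"]
        hits = knearest(b"\x0f", targets, k=2, threshold=0.5)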
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param k: number of nearest neighbors to find (default: 3)
     :type k: positive integer
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: a :class:`.SearchResult`


  .. py:method:: count_tversky_hits_fp(query_fp, threshold=0.7, alpha=1.0, beta=1.0)

     Count the fingerprints which are sufficiently similar to the query fingerprint
     
     Return the number of fingerprints in the arena which are
     at least *threshold* similar to the query fingerprint *query_fp*.
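
     For reference, the sketch below computes a Tversky index in pure
     Python, assuming the common definition in which *alpha* weights
     the bits found only in the query and *beta* weights the bits found
     only in the target. The function ``tversky`` is hypothetical, and
     returning 0.0 for a zero denominator is a choice made for this
     sketch:

     .. code-block:: python

        def tversky(query_fp, target_fp, alpha=1.0, beta=1.0):
            """Tversky index of two equal-length byte strings."""
            q = int.from_bytes(query_fp, "little")
            t = int.from_bytes(target_fp, "little")
            common = bin(q & t).count("1")
            only_query = bin(q & ~t).count("1")
            only_target = bin(t & ~q).count("1")
            denominator = alpha * only_query + beta * only_target + common
            if denominator == 0:
                return 0.0  # convention used in this sketch
            return common / denominator

        same_as_tanimoto = tversky(b"\x0f", b"\x1f")
        ignore_target_extras = tversky(b"\x0f", b"\x1f", alpha=1.0, beta=0.0)

     With ``alpha=1.0`` and ``beta=1.0`` the formula reduces to the
     Tanimoto similarity. With ``beta=0.0`` the extra bits in the target
     are ignored, so a target containing all of the query bits scores
     1.0.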
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: integer count


  .. py:method:: threshold_tversky_search_fp(query_fp, threshold=0.7, alpha=1.0, beta=1.0)

     Find the fingerprints which are sufficiently similar to the query fingerprint
     
     Find all of the fingerprints in this arena which are at least
     *threshold* similar to the query fingerprint *query_fp*.  The
     hits are returned as a :class:`.SearchResult`, in arbitrary
     order.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: a :class:`.SearchResult`


  .. py:method:: knearest_tversky_search_fp(query_fp, k=3, threshold=0.7, alpha=1.0, beta=1.0)

     Find the k-nearest fingerprints which are sufficiently similar to the query fingerprint
     
     Find all of the fingerprints in this arena which are at least
     *threshold* similar to the query fingerprint, and of those, select
     the top *k* hits. The hits are returned as a :class:`.SearchResult`,
     sorted from highest score to lowest.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param k: number of nearest neighbors to find (default: 3)
     :type k: positive integer
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: a :class:`.SearchResult`


FingerprintList
---------------

.. py:class:: FingerprintList
   
   *Added in version 3.3.*

   A read-only list-like view of the arena fingerprints
   
   This implements the standard Python list API, including
   indexing and iteration.
   
   Note: fingerprint searches like "fp in fingerprint_list" and 
   "fingerprint_list.index(fp)" are not fast.


chemfp.search module
====================

.. _chemfp_search:
.. py:module:: chemfp.search


The following functions and classes are in the chemfp.search module.

There are three main classes of functions. The ones ending with
``*_fp`` use a query fingerprint to search a target arena. The ones
ending with ``*_arena`` use a query arena to search a target
arena. The ones ending with ``*_symmetric`` use an arena to search
itself, except that a fingerprint is not tested against itself.


These functions share the same name with very similar functions in the
top-level :mod:`chemfp` module. My apologies for any confusion. The
top-level functions are designed to work with both arenas and
iterators as the target. They give a simple search API, and
automatically process in blocks, to give a balanced trade-off between
performance and response time for the first results.

The functions in this module only work with an arena as the target. By
default they search the entire arena before returning. If you want to
process portions of the arena then you need to specify the range
yourself.


count_tanimoto_hits_fp
----------------------

.. py:function:: count_tanimoto_hits_fp(query_fp, target_arena, threshold=0.7)

   Count the number of hits in *target_arena* at least *threshold* similar to the *query_fp*
   
   Example::
   
       query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
       targets = chemfp.load_fingerprints("targets.fps")
       print(chemfp.search.count_tanimoto_hits_fp(query_fp, targets, threshold=0.1))
       
   
   :param query_fp: the query fingerprint
   :type query_fp: a byte string
   :param target_arena: the target arena
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: an integer count


count_tanimoto_hits_arena
-------------------------

.. py:function:: count_tanimoto_hits_arena(query_arena, target_arena, threshold=0.7)

   For each fingerprint in *query_arena*, count the number of hits in *target_arena* at least *threshold* similar to it
   
   Example::
   
       queries = chemfp.load_fingerprints("queries.fps")
       targets = chemfp.load_fingerprints("targets.fps")
       counts = chemfp.search.count_tanimoto_hits_arena(queries, targets, threshold=0.1)
       print(counts[:10])
   
   The result is implementation specific. You'll always be able to
   get its length and do an index lookup to get an integer
   count. Currently it's a `ctypes array of longs <https://docs.python.org/2/library/ctypes.html#arrays>`_,
   but it could be an `array.array <https://docs.python.org/2/library/array.html>`_
   or Python list in the future.
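
   Code which sticks to ``len()`` and index lookups keeps working
   whichever container is returned. The sketch below, using only the
   standard library and made-up counts, runs the same hypothetical
   helper ``indices_with_hits`` over all three container types
   mentioned above:

   .. code-block:: python

      import array
      import ctypes

      def indices_with_hits(counts):
          """Use only len() and indexing, which every result type supports."""
          return [i for i in range(len(counts)) if counts[i] > 0]

      as_ctypes = (ctypes.c_long * 4)(0, 3, 0, 1)  # ctypes array of longs
      as_array = array.array("l", [0, 3, 0, 1])
      as_list = [0, 3, 0, 1]

      results = [indices_with_hits(counts)
                 for counts in (as_ctypes, as_array, as_list)]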
   
   :param query_arena: The query fingerprints.
   :type query_arena: a :class:`chemfp.arena.FingerprintArena`
   :param target_arena: The target fingerprints.
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: an array of counts


count_tanimoto_hits_symmetric
-----------------------------

.. py:function:: count_tanimoto_hits_symmetric(arena, threshold=0.7, batch_size=100)

   For each fingerprint in the *arena*, count the number of other fingerprints at least *threshold* similar to it
   
   A fingerprint never matches itself.
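
   A slow pure-Python sketch of the symmetric count is shown below. It
   visits each pair once, above the diagonal, and credits both
   fingerprints. The functions are hypothetical and operate on a list
   of byte strings rather than an arena:

   .. code-block:: python

      def tanimoto(fp1, fp2):
          """Tanimoto similarity of two equal-length byte strings."""
          a = int.from_bytes(fp1, "little")
          b = int.from_bytes(fp2, "little")
          union = bin(a | b).count("1")
          return bin(a & b).count("1") / union if union else 0.0

      def count_hits_symmetric(fps, threshold=0.7):
          """For each fingerprint, count the other fingerprints which match."""
          counts = [0] * len(fps)
          for i in range(len(fps)):
              for j in range(i + 1, len(fps)):  # j > i skips the diagonal
                  if tanimoto(fps[i], fps[j]) >= threshold:
                      counts[i] += 1  # the score is symmetric, so one
                      counts[j] += 1  # comparison updates both rows
          return counts

      fps = [b"\x0f", b"\x07", b"\xf0", b"\x1f"]
      counts = count_hits_symmetric(fps, threshold=0.7)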
   
   The computation can take a long time. Python won't check for
   a ``^C`` until the function finishes. This can be irritating. Instead,
   process only *batch_size* rows at a time before checking for a ``^C``.
   
   Note: the *batch_size* may disappear in future versions of chemfp.
   I can't detect any performance difference between the current value
   and a larger value, so it seems rather pointless to have. Let me
   know if it's useful to keep as a user-defined parameter.
   
   Example::
   
       arena = chemfp.load_fingerprints("targets.fps")
       counts = chemfp.search.count_tanimoto_hits_symmetric(arena, threshold=0.2)
       print(counts[:10])
   
   The result object is implementation specific. You'll always be able to
   get its length and do an index lookup to get an integer
   count. Currently it's a ctypes array of longs, but it could be an
   array.array or Python list in the future.
   
   :param arena: the set of fingerprints
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param batch_size: the number of rows to process before checking for a ``^C``
   :type batch_size: integer
   :returns: an array of counts
 

partial_count_tanimoto_hits_symmetric
-------------------------------------

.. py:function:: partial_count_tanimoto_hits_symmetric(counts, arena, threshold=0.7, query_start=0, query_end=None, target_start=0, target_end=None)

   Compute a portion of the symmetric Tanimoto counts
   
   For most cases, use :func:`chemfp.search.count_tanimoto_hits_symmetric`
   instead of this function!
   
   This function is only useful for thread-pool implementations. In
   that case, set the number of OpenMP threads to 1.
   
   *counts* is a contiguous array of integers. It should be
   initialized to zeros, and reused for successive calls.
   
   The function adds counts for counts[*query_start*:*query_end*] based
   on computing the upper-triangle portion contained in the rectangle
   *query_start*:*query_end* and *target_start*:*target_end* and using
   symmetry to fill in the lower half.
   
   You know, this is pretty complicated. Here's the bare minimum
   example of how to use it correctly to process 10 rows at a time
   using up to 4 threads::
   
       import chemfp
       import chemfp.search
       from chemfp import futures
       import array
       
       chemfp.set_num_threads(1)  # Globally disable OpenMP
       
       arena = chemfp.load_fingerprints("targets.fps")  # Load the fingerprints
       n = len(arena)
       counts = array.array("i", [0]*n)
       
       with futures.ThreadPoolExecutor(max_workers=4) as executor:
           for row in range(0, n, 10):
               executor.submit(chemfp.search.partial_count_tanimoto_hits_symmetric,
                               counts, arena, threshold=0.2,
                               query_start=row, query_end=min(row+10, n))
       
       print(counts)
   
   :param counts: the accumulated Tanimoto counts
   :type counts: a contiguous block of integer
   :param arena: the fingerprints.
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param query_start: the query start row
   :type query_start: an integer
   :param query_end: the query end row
   :type query_end: an integer, or None to mean the last query row
   :param target_start: the target start row
   :type target_start: an integer
   :param target_end: the target end row
   :type target_end: an integer, or None to mean the last target row
   :returns: None
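
One way to read the rectangle and upper-triangle description is the following pure-Python sketch. It is an interpretation, not chemfp's code: ``partial_count`` and ``tanimoto`` are hypothetical helpers, and the sketch runs the row blocks sequentially rather than in a thread pool. Each call visits only the pairs (i, j) with j > i inside its rectangle and credits both ``counts[i]`` and ``counts[j]``.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0.0
    return bin(a & b).count("1") / union


def partial_count(counts, fps, threshold, query_start=0, query_end=None,
                  target_start=0, target_end=None):
    """Accumulate counts for the upper-triangle part of one rectangle."""
    n = len(fps)
    if query_end is None:
        query_end = n
    if target_end is None:
        target_end = n
    for i in range(query_start, query_end):
        # Skip the diagonal and everything below it.
        for j in range(max(i + 1, target_start), target_end):
            if tanimoto(fps[i], fps[j]) >= threshold:
                counts[i] += 1
                counts[j] += 1  # symmetry supplies the lower half


fps = [b"\x0f", b"\x07", b"\xf0", b"\x0f"]
counts = [0] * len(fps)  # zero-initialized and reused across calls
for row in range(0, len(fps), 2):  # two rows per call
    partial_count(counts, fps, 0.7,
                  query_start=row, query_end=min(row + 2, len(fps)))
print(counts)
```

Because every (i, j) pair above the diagonal is visited exactly once across the calls, the accumulated array matches what a single full symmetric count would produce.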


count_tversky_hits_fp
---------------------

.. py:function:: count_tversky_hits_fp(query_fp, target_arena, threshold=0.7, alpha=1.0, beta=1.0)

   Count the number of hits in *target_arena* at least *threshold* similar to the *query_fp* (Tversky)
   
   Example::
   
       query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
       targets = chemfp.load_fingerprints("targets.fps")
       print(chemfp.search.count_tversky_hits_fp(query_fp, targets, threshold=0.1))
       
   
   :param query_fp: the query fingerprint
   :type query_fp: a byte string
   :param target_arena: the target arena
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: an integer count
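
For readers unfamiliar with the Tversky index, here is a minimal pure-Python sketch of the usual definition, common / (common + alpha * query_only + beta * target_only). It is not chemfp's implementation. The helper names are hypothetical, and two points are assumptions: that *alpha* weights the bits found only in the query and *beta* those found only in the target, and that a zero denominator scores 0.0.

```python
def tversky(query_fp, target_fp, alpha=1.0, beta=1.0):
    """Tversky index of two equal-length byte-string fingerprints."""
    q = int.from_bytes(query_fp, "big")
    t = int.from_bytes(target_fp, "big")
    common = bin(q & t).count("1")
    only_q = bin(q & ~t).count("1")  # bits set only in the query
    only_t = bin(t & ~q).count("1")  # bits set only in the target
    denom = common + alpha * only_q + beta * only_t
    if denom == 0:
        return 0.0  # assumption: undefined cases score 0.0
    return common / denom


def count_tversky_hits(query_fp, target_fps, threshold=0.7,
                       alpha=1.0, beta=1.0):
    """Count targets whose Tversky score is at least the threshold."""
    return sum(1 for t in target_fps
               if tversky(query_fp, t, alpha, beta) >= threshold)


query_fp = b"\x0f"
target_fps = [b"\x07", b"\xf0", b"\x0f", b"\x1f"]
n_hits = count_tversky_hits(query_fp, target_fps, threshold=0.7)
print(n_hits)
```

With ``alpha=beta=1.0`` the formula reduces to the Tanimoto similarity, and with ``alpha=beta=0.5`` it reduces to the Dice coefficient.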


count_tversky_hits_arena
------------------------

.. py:function:: count_tversky_hits_arena(query_arena, target_arena, threshold=0.7, alpha=1.0, beta=1.0)

   For each fingerprint in *query_arena*, count the number of hits in *target_arena* at least *threshold* similar to it
   
   Example::
   
       queries = chemfp.load_fingerprints("queries.fps")
       targets = chemfp.load_fingerprints("targets.fps")
       counts = chemfp.search.count_tversky_hits_arena(queries, targets, threshold=0.1,
                     alpha=0.5, beta=0.5)
       print(counts[:10])
   
   The result is implementation specific. You'll always be able to
   get its length and do an index lookup to get an integer
   count. Currently it's a `ctypes array of longs <https://docs.python.org/2/library/ctypes.html#arrays>`_,
   but it could be an `array.array <https://docs.python.org/2/library/array.html>`_
   or Python list in the future.
   
   :param query_arena: The query fingerprints.
   :type query_arena: a :class:`chemfp.arena.FingerprintArena`
   :param target_arena: The target fingerprints.
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: an array of counts


count_tversky_hits_symmetric
----------------------------

.. py:function:: count_tversky_hits_symmetric(arena, threshold=0.7, alpha=1.0, beta=1.0, batch_size=100)

   For each fingerprint in the *arena*, count the number of other fingerprints at least *threshold* similar to it
   
   A fingerprint never matches itself.
   
   The computation can take a long time. Python won't check for
   a ``^C`` until the function finishes. This can be irritating. Instead,
   process only *batch_size* rows at a time before checking for a ``^C``.
   
   Note: the *batch_size* may disappear in future versions of chemfp.
   I can't detect any performance difference between the current value
   and a larger value, so it seems rather pointless to have. Let me
   know if it's useful to keep as a user-defined parameter.
   
   Example::
   
       arena = chemfp.load_fingerprints("targets.fps")
       counts = chemfp.search.count_tversky_hits_symmetric(
             arena, threshold=0.2, alpha=0.5, beta=0.5)
       print(counts[:10])
   
   The result object is implementation specific. You'll always be able to
   get its length and do an index lookup to get an integer
   count. Currently it's a ctypes array of longs, but it could be an
   array.array or Python list in the future.
   
   :param arena: the set of fingerprints
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param batch_size: the number of rows to process before checking for a ``^C``
   :type batch_size: integer
   :returns: an array of counts
 

partial_count_tversky_hits_symmetric
------------------------------------

.. py:function:: partial_count_tversky_hits_symmetric(counts, arena, threshold=0.7, alpha=1.0, beta=1.0, query_start=0, query_end=None, target_start=0, target_end=None)

   Compute a portion of the symmetric Tversky counts
   
   For most cases, use :func:`chemfp.search.count_tversky_hits_symmetric`
   instead of this function!
   
   This function is only useful for thread-pool implementations. In
   that case, set the number of OpenMP threads to 1.
   
   *counts* is a contiguous array of integers. It should be
   initialized to zeros, and reused for successive calls.
   
   The function adds counts for counts[*query_start*:*query_end*] based
   on computing the upper-triangle portion contained in the rectangle
   *query_start*:*query_end* and *target_start*:*target_end* and using
   symmetry to fill in the lower half.
   
   You know, this is pretty complicated. Here's the bare minimum
   example of how to use it correctly to process 10 rows at a time
   using up to 4 threads::
   
       import chemfp
       import chemfp.search
       from chemfp import futures
       import array
       
       chemfp.set_num_threads(1)  # Globally disable OpenMP
       
       arena = chemfp.load_fingerprints("targets.fps")  # Load the fingerprints
       n = len(arena)
       counts = array.array("i", [0]*n)
       
       with futures.ThreadPoolExecutor(max_workers=4) as executor:
           for row in range(0, n, 10):
               executor.submit(chemfp.search.partial_count_tversky_hits_symmetric,
                               counts, arena, threshold=0.2, alpha=0.5, beta=0.5,
                               query_start=row, query_end=min(row+10, n))
       
       print(counts)
   
   :param counts: the accumulated Tversky counts
   :type counts: a contiguous block of integer
   :param arena: the fingerprints.
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param query_start: the query start row
   :type query_start: an integer
   :param query_end: the query end row
   :type query_end: an integer, or None to mean the last query row
   :param target_start: the target start row
   :type target_start: an integer
   :param target_end: the target end row
   :type target_end: an integer, or None to mean the last target row
   :returns: None


threshold_tanimoto_search_fp
----------------------------

.. py:function:: threshold_tanimoto_search_fp(query_fp, target_arena, threshold=0.7)

   Search for fingerprint hits in *target_arena* which are at least *threshold* similar to *query_fp*
   
   The hits in the returned :class:`chemfp.search.SearchResult` are in arbitrary order.
   
   Example::
   
       query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
       targets = chemfp.load_fingerprints("targets.fps")
       print(list(chemfp.search.threshold_tanimoto_search_fp(query_fp, targets, threshold=0.15)))
   
   :param query_fp: the query fingerprint
   :type query_fp: a byte string
   :param target_arena: the target arena
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: a :class:`chemfp.search.SearchResult`
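
As a rough mental model of a threshold search, the following pure-Python sketch scans a list of byte-string fingerprints and keeps every target whose score reaches the threshold. The names ``tanimoto`` and ``threshold_search`` are hypothetical and the code is not chemfp's implementation. It returns hits in target order, whereas the real function makes no ordering promise.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0.0
    return bin(a & b).count("1") / union


def threshold_search(query_fp, target_fps, threshold=0.7):
    """Return (target index, score) pairs with score >= threshold."""
    hits = []
    for index, target_fp in enumerate(target_fps):
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            hits.append((index, score))
    return hits


query_fp = b"\x0f"
target_fps = [b"\x07", b"\xf0", b"\x0f", b"\x1f"]
hits = threshold_search(query_fp, target_fps, threshold=0.7)
print(hits)
```

Code that depends on a particular order should sort the hits explicitly instead of relying on the order in which they are returned.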


threshold_tanimoto_search_arena
-------------------------------

.. py:function:: threshold_tanimoto_search_arena(query_arena, target_arena, threshold=0.7)

   Search for the hits in the *target_arena* at least *threshold* similar to the fingerprints in *query_arena*
   
   The hits in the returned :class:`chemfp.search.SearchResults` are in arbitrary order.
   
   Example::
   
       queries = chemfp.load_fingerprints("queries.fps")
       targets = chemfp.load_fingerprints("targets.fps")
       results = chemfp.search.threshold_tanimoto_search_arena(queries, targets, threshold=0.5)
       for query_id, query_hits in zip(queries.ids, results):
           if len(query_hits) > 0:
               print(query_id, "->", ", ".join(query_hits.get_ids()))
   
   :param query_arena: The query fingerprints.
   :type query_arena: a :class:`chemfp.arena.FingerprintArena`
   :param target_arena: The target fingerprints.
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: a :class:`chemfp.search.SearchResults`


threshold_tanimoto_search_symmetric
-----------------------------------

.. py:function:: threshold_tanimoto_search_symmetric(arena, threshold=0.7, include_lower_triangle=True, batch_size=100)

   Search for the hits in the *arena* at least *threshold* similar to the fingerprints in the arena
   
   When *include_lower_triangle* is True, compute the upper-triangle
   similarities, then copy the results to get the full set of
   results. When *include_lower_triangle* is False, only compute the
   upper triangle.
   
   The hits in the returned :class:`chemfp.search.SearchResults` are in arbitrary order.
   
   The computation can take a long time. Python won't check for
   a ``^C`` until the function finishes. This can be irritating. Instead,
   process only *batch_size* rows at a time before checking for a ``^C``.
   
   Note: the *batch_size* may disappear in future versions of chemfp. Let
   me know if it really is useful for you to have as a user-defined parameter.
   
   Example::
   
       arena = chemfp.load_fingerprints("queries.fps")
       full_result = chemfp.search.threshold_tanimoto_search_symmetric(arena, threshold=0.2)
       upper_triangle = chemfp.search.threshold_tanimoto_search_symmetric(
                 arena, threshold=0.2, include_lower_triangle=False)
       assert sum(map(len, full_result)) == sum(map(len, upper_triangle))*2
                 
   :param arena: the set of fingerprints
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param include_lower_triangle:
       if False, compute only the upper triangle, otherwise use symmetry to compute the full matrix
   :type include_lower_triangle: boolean
   :param batch_size: the number of rows to process before checking for a ``^C``
   :type batch_size: integer
   :returns: a :class:`chemfp.search.SearchResults`
 

partial_threshold_tanimoto_search_symmetric
-------------------------------------------

.. py:function:: partial_threshold_tanimoto_search_symmetric(results, arena, threshold=0.7, query_start=0, query_end=None, target_start=0, target_end=None, results_offset=0)

   Compute a portion of the symmetric Tanimoto search results
   
   For most cases, use :func:`chemfp.search.threshold_tanimoto_search_symmetric`
   instead of this function!
   
   This function is only useful for thread-pool implementations. In
   that case, set the number of OpenMP threads to 1.
   
   *results* is a :class:`chemfp.search.SearchResults` instance which is at
   least as large as the arena. It should be reused for successive updates.
   
   The function adds hits to results[*query_start*:*query_end*], based
   on computing the upper-triangle portion contained in the rectangle
   *query_start*:*query_end* and *target_start*:*target_end*.
   
   It does not fill in the lower triangle. To get the full matrix,
   call *fill_lower_triangle*.
   
   You know, this is pretty complicated. Here's the bare minimum
   example of how to use it correctly to process 10 rows at a time
   using up to 4 threads::
   
       import chemfp
       import chemfp.search
       from chemfp import futures
       import array
   
       chemfp.set_num_threads(1)
   
       arena = chemfp.load_fingerprints("targets.fps")
       n = len(arena)
       results = chemfp.search.SearchResults(n, n, arena.ids)
   
       with futures.ThreadPoolExecutor(max_workers=4) as executor:
           for row in range(0, n, 10):
               executor.submit(chemfp.search.partial_threshold_tanimoto_search_symmetric,
                               results, arena, threshold=0.2,
                               query_start=row, query_end=min(row+10, n))
   
       chemfp.search.fill_lower_triangle(results)
   
   The hits in the :class:`chemfp.search.SearchResults` are in arbitrary order.
   
   :param results: the intermediate search results
   :type results: a :class:`chemfp.search.SearchResults` instance
   :param arena: the fingerprints.
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param query_start: the query start row
   :type query_start: an integer
   :param query_end: the query end row
   :type query_end: an integer, or None to mean the last query row
   :param target_start: the target start row
   :type target_start: an integer
   :param target_end: the target end row
   :type target_end: an integer, or None to mean the last target row
   :param results_offset: use results[results_offset] as the base for the results
   :type results_offset: an integer
   :returns: None


fill_lower_triangle
-------------------

.. py:function:: fill_lower_triangle(results)

   Duplicate each entry of *results* to its transpose
   
   This is used after the symmetric threshold search to turn the
   upper-triangle results into a full matrix.
   
   :param results: search results
   :type results: a :class:`chemfp.search.SearchResults`
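
The transpose step can be pictured with a small pure-Python sketch. It assumes a hypothetical representation in which each row is a list of ``(target_index, score)`` pairs holding only upper-triangle hits. This is not the real :class:`chemfp.search.SearchResults` data structure, and ``fill_lower`` is an illustrative name.

```python
def fill_lower(rows):
    """Copy every upper-triangle hit (i, j) to its transpose (j, i)."""
    # Snapshot the existing hits first so that appended entries
    # are not visited again during the loop.
    upper = [(i, j, score)
             for i, row in enumerate(rows)
             for (j, score) in row]
    for i, j, score in upper:
        rows[j].append((i, score))
    return rows


# Upper-triangle hits for four fingerprints.
rows = [[(1, 0.75), (3, 1.0)], [(3, 0.75)], [], []]
hits_before = sum(map(len, rows))
fill_lower(rows)
hits_after = sum(map(len, rows))
print(rows)
```

The total number of hits doubles, which is the same relationship checked by the ``assert`` in the symmetric threshold search examples.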


threshold_tversky_search_fp
---------------------------

.. py:function:: threshold_tversky_search_fp(query_fp, target_arena, threshold=0.7, alpha=1.0, beta=1.0)

   Search for fingerprint hits in *target_arena* which are at least *threshold* similar to *query_fp*
   
   The hits in the returned :class:`chemfp.search.SearchResult` are in arbitrary order.
   
   Example::
   
       query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
       targets = chemfp.load_fingerprints("targets.fps")
       print(list(chemfp.search.threshold_tversky_search_fp(
                  query_fp, targets, threshold=0.15, alpha=0.5, beta=0.5)))
   
   :param query_fp: the query fingerprint
   :type query_fp: a byte string
   :param target_arena: the target arena
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: a :class:`chemfp.search.SearchResult`


threshold_tversky_search_arena
------------------------------

.. py:function:: threshold_tversky_search_arena(query_arena, target_arena, threshold=0.7, alpha=1.0, beta=1.0)

   Search for the hits in the *target_arena* at least *threshold* similar to the fingerprints in *query_arena*
   
   The hits in the returned :class:`chemfp.search.SearchResults` are in arbitrary order.
   
   Example::
   
       queries = chemfp.load_fingerprints("queries.fps")
       targets = chemfp.load_fingerprints("targets.fps")
       results = chemfp.search.threshold_tversky_search_arena(
                     queries, targets, threshold=0.5, alpha=0.5, beta=0.5)
       for query_id, query_hits in zip(queries.ids, results):
           if len(query_hits) > 0:
               print(query_id, "->", ", ".join(query_hits.get_ids()))
   
   :param query_arena: The query fingerprints.
   :type query_arena: a :class:`chemfp.arena.FingerprintArena`
   :param target_arena: The target fingerprints.
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: a :class:`chemfp.search.SearchResults`


threshold_tversky_search_symmetric
----------------------------------

.. py:function:: threshold_tversky_search_symmetric(arena, threshold=0.7, alpha=1.0, beta=1.0, include_lower_triangle=True, batch_size=100)

   Search for the hits in the *arena* at least *threshold* similar to the fingerprints in the arena
   
   When *include_lower_triangle* is True, compute the upper-triangle
   similarities, then copy the results to get the full set of
   results. When *include_lower_triangle* is False, only compute the
   upper triangle.
   
   The hits in the returned :class:`chemfp.search.SearchResults` are in arbitrary order.
   
   The computation can take a long time. Python won't check for
   a ``^C`` until the function finishes. This can be irritating. Instead,
   process only *batch_size* rows at a time before checking for a ``^C``.
   
   Note: the *batch_size* may disappear in future versions of chemfp. Let
   me know if it really is useful for you to have as a user-defined parameter.
   
   Example::
   
       arena = chemfp.load_fingerprints("queries.fps")
       full_result = chemfp.search.threshold_tversky_search_symmetric(
             arena, threshold=0.2, alpha=0.5, beta=0.5)
       upper_triangle = chemfp.search.threshold_tversky_search_symmetric(
                 arena, threshold=0.2, alpha=0.5, beta=0.5, include_lower_triangle=False)
       assert sum(map(len, full_result)) == sum(map(len, upper_triangle))*2
                 
   :param arena: the set of fingerprints
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param include_lower_triangle:
       if False, compute only the upper triangle, otherwise use symmetry to compute the full matrix
   :type include_lower_triangle: boolean
   :param batch_size: the number of rows to process before checking for a ``^C``
   :type batch_size: integer
   :returns: a :class:`chemfp.search.SearchResults`
 

partial_threshold_tversky_search_symmetric
------------------------------------------

.. py:function:: partial_threshold_tversky_search_symmetric(results, arena, threshold=0.7, alpha=1.0, beta=1.0, query_start=0, query_end=None, target_start=0, target_end=None, results_offset=0)

   Compute a portion of the symmetric Tversky search results
   
   For most cases, use :func:`chemfp.search.threshold_tversky_search_symmetric`
   instead of this function!
   
   This function is only useful for thread-pool implementations. In
   that case, set the number of OpenMP threads to 1.
   
   *results* is a :class:`chemfp.search.SearchResults` instance which is at
   least as large as the arena. It should be reused for successive updates.
   
   The function adds hits to results[*query_start*:*query_end*], based
   on computing the upper-triangle portion contained in the rectangle
   *query_start*:*query_end* and *target_start*:*target_end*.
   
   It does not fill in the lower triangle. To get the full matrix,
   call *fill_lower_triangle*.
   
   You know, this is pretty complicated. Here's the bare minimum
   example of how to use it correctly to process 10 rows at a time
   using up to 4 threads::
   
       import chemfp
       import chemfp.search
       from chemfp import futures
       import array
   
       chemfp.set_num_threads(1)
   
       arena = chemfp.load_fingerprints("targets.fps")
       n = len(arena)
       results = chemfp.search.SearchResults(n, n, arena.ids)
   
       with futures.ThreadPoolExecutor(max_workers=4) as executor:
           for row in range(0, n, 10):
               executor.submit(chemfp.search.partial_threshold_tversky_search_symmetric,
                               results, arena, threshold=0.2, alpha=0.5, beta=0.5,
                               query_start=row, query_end=min(row+10, n))
   
       chemfp.search.fill_lower_triangle(results)
   
   The hits in the :class:`chemfp.search.SearchResults` are in arbitrary order.
   
   :param results: the intermediate search results
   :type results: a :class:`chemfp.search.SearchResults` instance
   :param arena: the fingerprints.
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param query_start: the query start row
   :type query_start: an integer
   :param query_end: the query end row
   :type query_end: an integer, or None to mean the last query row
   :param target_start: the target start row
   :type target_start: an integer
   :param target_end: the target end row
   :type target_end: an integer, or None to mean the last target row
   :param results_offset: use results[results_offset] as the base for the results
   :type results_offset: an integer
   :returns: None


knearest_tanimoto_search_fp
---------------------------

.. py:function:: knearest_tanimoto_search_fp(query_fp, target_arena, k=3, threshold=0.7)

   Search for *k*-nearest hits in *target_arena* which are at least *threshold* similar to *query_fp*
   
   The hits in the :class:`chemfp.search.SearchResult` are ordered by
   decreasing similarity score.
   
   Example::
   
       query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
       targets = chemfp.load_fingerprints("targets.fps")
       print(list(chemfp.search.knearest_tanimoto_search_fp(query_fp, targets, k=3, threshold=0.0)))
   
   :param query_fp: the query fingerprint
   :type query_fp: a byte string
   :param target_arena: the target arena
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param k: the number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: a :class:`chemfp.search.SearchResult`
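
The interaction between *k* and *threshold* can be sketched in pure Python as below. This is a hypothetical illustration and not how chemfp implements the search: ``knearest_search`` and ``tanimoto`` are made-up helpers that first filter by threshold, then keep the *k* best scores in decreasing order. How tied scores are ordered here is an artifact of the sketch.

```python
import heapq


def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0.0
    return bin(a & b).count("1") / union


def knearest_search(query_fp, target_fps, k=3, threshold=0.7):
    """Return up to k (target index, score) pairs, best score first."""
    passing = []
    for index, target_fp in enumerate(target_fps):
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            passing.append((score, index))
    best = heapq.nlargest(k, passing, key=lambda pair: pair[0])
    return [(index, score) for score, index in best]


query_fp = b"\x0f"
target_fps = [b"\x07", b"\xf0", b"\x0f", b"\x1f", b"\x03"]
top2 = knearest_search(query_fp, target_fps, k=2, threshold=0.0)
top3 = knearest_search(query_fp, target_fps, k=3, threshold=0.7)
print(top2)
print(top3)
```

A result can hold fewer than *k* hits when not enough targets reach the threshold, so callers should check its length rather than assume *k* entries.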


knearest_tanimoto_search_arena
------------------------------

.. py:function:: knearest_tanimoto_search_arena(query_arena, target_arena, k=3, threshold=0.7)

   Search for the *k* nearest hits in the *target_arena* at least *threshold* similar to the fingerprints in *query_arena*
   
   The hits in the :class:`chemfp.search.SearchResults` are ordered by
   decreasing similarity score.
   
   Example::
   
       queries = chemfp.load_fingerprints("queries.fps")
       targets = chemfp.load_fingerprints("targets.fps")
       results = chemfp.search.knearest_tanimoto_search_arena(queries, targets, k=3, threshold=0.5)
       for query_id, query_hits in zip(queries.ids, results):
           if len(query_hits) >= 2:
               print(query_id, "->", ", ".join(query_hits.get_ids()))
   
   :param query_arena: The query fingerprints.
   :type query_arena: a :class:`chemfp.arena.FingerprintArena`
   :param target_arena: The target fingerprints.
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param k: the number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: a :class:`chemfp.search.SearchResults`


knearest_tanimoto_search_symmetric
----------------------------------

.. py:function:: knearest_tanimoto_search_symmetric(arena, k=3, threshold=0.7, batch_size=100)

   Search for the *k*-nearest hits in the *arena* at least *threshold* similar to the fingerprints in the arena
   
   The hits in the :class:`chemfp.search.SearchResults` are ordered by decreasing similarity score.
   
   The computation can take a long time. Python won't check for
   a ``^C`` until the function finishes. This can be irritating. Instead,
   process only *batch_size* rows at a time before checking for a ``^C``.
   
   Note: the *batch_size* may disappear in future versions of chemfp. Let
   me know if it really is useful for you to keep as a user-defined parameter.
   
   Example::
   
       arena = chemfp.load_fingerprints("queries.fps")
       results = chemfp.search.knearest_tanimoto_search_symmetric(arena, k=3, threshold=0.8)
       for (query_id, hits) in zip(arena.ids, results):
           print(query_id, "->", ", ".join(("%s %.2f" % hit) for hit in  hits.get_ids_and_scores()))
   
   :param arena: the set of fingerprints
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param k: the number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param batch_size: the number of rows to process before checking for a ``^C``
   :type batch_size: integer
   :returns: a :class:`chemfp.search.SearchResults`


knearest_tversky_search_fp
--------------------------

.. py:function:: knearest_tversky_search_fp(query_fp, target_arena, k=3, threshold=0.7, alpha=1.0, beta=1.0)

   Search for *k*-nearest hits in *target_arena* which are at least *threshold* similar to *query_fp*
   
   The hits in the :class:`chemfp.search.SearchResult` are ordered by
   decreasing similarity score.
   
   Example::
   
       query_id, query_fp = chemfp.load_fingerprints("queries.fps")[0]
       targets = chemfp.load_fingerprints("targets.fps")
       print(list(chemfp.search.knearest_tversky_search_fp(
               query_fp, targets, k=3, threshold=0.0, alpha=0.5, beta=0.5)))
   
   :param query_fp: the query fingerprint
   :type query_fp: a byte string
   :param target_arena: the target arena
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param k: the number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: a :class:`chemfp.search.SearchResult`


knearest_tversky_search_arena
-----------------------------

.. py:function:: knearest_tversky_search_arena(query_arena, target_arena, k=3, threshold=0.7, alpha=1.0, beta=1.0)

   Search for the *k* nearest hits in the *target_arena* at least *threshold* similar to the fingerprints in *query_arena*
   
   The hits in the :class:`chemfp.search.SearchResults` are ordered by
   decreasing similarity score.
   
   Example::
   
       queries = chemfp.load_fingerprints("queries.fps")
       targets = chemfp.load_fingerprints("targets.fps")
       results = chemfp.search.knearest_tversky_search_arena(
             queries, targets, k=3, threshold=0.5, alpha=0.5, beta=0.5)
       for query_id, query_hits in zip(queries.ids, results):
           if len(query_hits) >= 2:
               print(query_id, "->", ", ".join(query_hits.get_ids()))
   
   :param query_arena: The query fingerprints.
   :type query_arena: a :class:`chemfp.arena.FingerprintArena`
   :param target_arena: The target fingerprints.
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :param k: the number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :returns: a :class:`chemfp.search.SearchResults`


knearest_tversky_search_symmetric
---------------------------------

.. py:function:: knearest_tversky_search_symmetric(arena, k=3, threshold=0.7, alpha=1.0, beta=1.0, batch_size=100)

   Search for the *k*-nearest hits in the *arena* at least *threshold* similar to the fingerprints in the arena
   
   The hits in the :class:`chemfp.search.SearchResults` are ordered by decreasing similarity score.
   
   The computation can take a long time. Python won't check for
   a ``^C`` until the function finishes. This can be irritating. Instead,
   process only *batch_size* rows at a time before checking for a ``^C``.
   
   Note: the *batch_size* may disappear in future versions of chemfp. Let
   me know if it really is useful for you to keep as a user-defined parameter.
   
   Example::
   
       arena = chemfp.load_fingerprints("queries.fps")
       results = chemfp.search.knearest_tversky_search_symmetric(
                arena, k=3, threshold=0.8, alpha=0.5, beta=0.5)
       for (query_id, hits) in zip(arena.ids, results):
           print(query_id, "->", ", ".join(("%s %.2f" % hit) for hit in  hits.get_ids_and_scores()))
   
   :param arena: the set of fingerprints
   :type arena: a :class:`chemfp.arena.FingerprintArena`
   :param k: the number of nearest neighbors to find.
   :type k: positive integer
   :param threshold: The minimum score threshold.
   :type threshold: float between 0.0 and 1.0, inclusive
   :param batch_size: the number of rows to process before checking for a ``^C``
   :type batch_size: integer
   :returns: a :class:`chemfp.search.SearchResults`


contains_fp
-----------

.. py:function:: contains_fp(query_fp, target_arena)

   Find the target fingerprints which contain the query fingerprint bits as a subset
   
   A target fingerprint contains a query fingerprint if all of the on
   bits of the query fingerprint are also on bits of the target
   fingerprint. This function returns a :class:`chemfp.search.SearchResult`
   containing all of the target fingerprints in *target_arena* that contain
   the *query_fp*.
   
   The SearchResult scores are all 0.0. 
   
   There is currently no direct way to limit the arena search range.
   Instead create a subarena by using Python's slice notation on the
   arena then search the subarena.
   
   :param query_fp: the query fingerprint
   :type query_fp: a byte string
   :param target_arena: The target fingerprints.
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :returns: a SearchResult instance
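
The subset test itself is a single bitwise comparison. The sketch below shows it in pure Python on byte-string fingerprints, together with the slicing workaround for limiting the search range. The helpers ``contains`` and ``find_containing`` are hypothetical, a plain list stands in for the arena, and none of this is chemfp's implementation.

```python
def contains(query_fp, target_fp):
    """True if every on bit of query_fp is also on in target_fp."""
    q = int.from_bytes(query_fp, "big")
    t = int.from_bytes(target_fp, "big")
    return (q & t) == q


def find_containing(query_fp, target_fps):
    """Indices of the targets that contain the query bits as a subset."""
    return [index for index, target_fp in enumerate(target_fps)
            if contains(query_fp, target_fp)]


query_fp = b"\x03"
target_fps = [b"\x07", b"\xf0", b"\x0f", b"\x01"]
all_matches = find_containing(query_fp, target_fps)

# Limit the search range by slicing first, then searching the slice.
sub_matches = find_containing(query_fp, target_fps[1:3])
print(all_matches, sub_matches)
```

In this sketch the indices from the sliced search are relative to the slice, so the match at position 1 of ``target_fps[1:3]`` is the same fingerprint as position 2 of the full list.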


contains_arena
--------------

.. py:function:: contains_arena(query_arena, target_arena)

   Find the target fingerprints which contain the query fingerprints as a subset
   
   A target fingerprint contains a query fingerprint if all of the on
   bits of the query fingerprint are also on bits of the target
   fingerprint. This function returns a :class:`chemfp.search.SearchResults`
   where SearchResults[i] contains all of the target fingerprints in
   *target_arena* that contain the fingerprint for entry
   *query_arena* [i].
   
   The SearchResult scores are all 0.0.
   
   There is currently no direct way to limit the arena search range,
   though you can create and search a subarena by using Python's
   slice notation.
   
   :param query_arena: the query fingerprints
   :type query_arena: a :class:`chemfp.arena.FingerprintArena`
   :param target_arena: the target fingerprints
   :type target_arena: a :class:`chemfp.arena.FingerprintArena`
   :returns: a :class:`chemfp.search.SearchResults` instance, of the same size as query_arena


SearchResults
-------------

.. py:class:: SearchResults

   Search results for a list of query fingerprints against a target arena
   
   This acts like a list of SearchResult elements, with the ability
   to iterate over each search result, look them up by index, and
   get the number of results.
   
   In addition, there are helper methods to iterate over each hit and
   to get the hit indices, scores, and identifiers directly as Python
   lists, sort the list contents, and more.


  .. py:method:: __len__()

     The number of rows in the SearchResults


  .. py:method:: __iter__()

     Iterate over each SearchResult hit


  .. py:method:: __getitem__(i)

     Get the *i*-th SearchResult


  .. py:attribute:: SearchResults.shape

     Read-only attribute.

     the tuple (number of rows, number of columns)
     
     The number of columns is the size of the target arena.


  .. py:method:: iter_indices()

     For each hit, yield the list of target indices


  .. py:method:: iter_ids()

     For each hit, yield the list of target identifiers


  .. py:method:: iter_scores()

     For each hit, yield the list of target scores


  .. py:method:: iter_indices_and_scores()

     For each hit, yield the list of (target index, score) tuples


  .. py:method:: iter_ids_and_scores()

     For each hit, yield the list of (target id, score) tuples


  .. py:method:: clear_all()

     Remove all hits from all of the search results


  .. py:method:: count_all(min_score=None, max_score=None, interval="[]")

     Count the number of hits with a score between *min_score* and *max_score*
     
     Using the default parameters this returns the number of
     hits in the result.
     
     The default *min_score* of None is equivalent to -infinity.
     The default *max_score* of None is equivalent to +infinity.
     
     The *interval* parameter describes the interval end
     conditions. The default of "[]" uses a closed interval,
     where min_score <= score <= max_score. The interval "()"
     uses the open interval where min_score < score < max_score.
     The half-open/half-closed intervals "(]" and "[)" are
     also supported.
     
     :param min_score: the minimum score in the range.
     :type min_score: a float, or None for -infinity
     :param max_score: the maximum score in the range.
     :type max_score: a float, or None for +infinity
     :param interval: specify if the end points are open or closed.
     :type interval: one of "[]", "()", "(]", "[)"
     :returns: an integer count


  .. py:method:: cumulative_score_all(min_score=None, max_score=None, interval="[]")

     The sum of all scores in all rows which are between *min_score* and *max_score*
     
     Using the default parameters this returns the sum of all of
     the scores in all of the results. With a specified range this
     returns the sum of all of the scores in that range. The
     cumulative score is also known as the raw score.
     
     The default *min_score* of None is equivalent to -infinity.
     The default *max_score* of None is equivalent to +infinity.
     
     The *interval* parameter describes the interval end
     conditions. The default of "[]" uses a closed interval,
     where min_score <= score <= max_score. The interval "()"
     uses the open interval where min_score < score < max_score.
     The half-open/half-closed intervals "(]" and "[)" are
     also supported.
     
     :param min_score: the minimum score in the range.
     :type min_score: a float, or None for -infinity
     :param max_score: the maximum score in the range.
     :type max_score: a float, or None for +infinity
     :param interval: specify if the end points are open or closed.
     :type interval: one of "[]", "()", "(]", "[)"
     :returns: a floating point value


  .. py:method:: reorder_all(order="decreasing-score")

     Reorder the hits for all of the rows based on the requested *order*.
     
     The available orderings are:
     
     * increasing-score - sort by increasing score
     * decreasing-score - sort by decreasing score
     * increasing-index - sort by increasing target index
     * decreasing-index - sort by decreasing target index
     * move-closest-first - move the hit with the highest score to the first position
     * reverse - reverse the current ordering
     
     :param order: the name of the ordering to use
     :type order: string


  .. py:method:: to_csr(dtype=None)

     Return the results as a SciPy compressed sparse row matrix.
     
     The returned matrix has the same shape as the SearchResults
     instance and can be passed into, for example, a scikit-learn
     clustering algorithm.
     
     By default the scores are stored with a `dtype` of "float64".
     
     This method requires that SciPy (and NumPy) be installed.
     
     :param dtype: a NumPy numeric data type
     :type dtype: string or NumPy type
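
The *interval* end conditions used by ``count_all`` and ``cumulative_score_all`` can be sketched in pure Python. This only illustrates the documented semantics, with each row modeled as a plain list of scores. The names ``in_range``, ``count_in_range`` and ``sum_in_range`` are hypothetical and are not part of chemfp.

```python
import operator

# Map each interval name to its (lower bound, upper bound) comparisons.
_INTERVAL_CHECKS = {
    "[]": (operator.le, operator.le),
    "()": (operator.lt, operator.lt),
    "(]": (operator.lt, operator.le),
    "[)": (operator.le, operator.lt),
}


def in_range(score, min_score=None, max_score=None, interval="[]"):
    """Test a score against the range, treating None as -/+ infinity."""
    if interval not in _INTERVAL_CHECKS:
        raise ValueError("unsupported interval: %r" % (interval,))
    lower_ok, upper_ok = _INTERVAL_CHECKS[interval]
    lo = float("-inf") if min_score is None else min_score
    hi = float("inf") if max_score is None else max_score
    return lower_ok(lo, score) and upper_ok(score, hi)


def count_in_range(rows, min_score=None, max_score=None, interval="[]"):
    """Count the scores in all rows which fall inside the range."""
    return sum(1 for row in rows for score in row
               if in_range(score, min_score, max_score, interval))


def sum_in_range(rows, min_score=None, max_score=None, interval="[]"):
    """Sum the scores in all rows which fall inside the range."""
    return sum(score for row in rows for score in row
               if in_range(score, min_score, max_score, interval))


rows = [[0.5, 0.7, 1.0], [0.7], []]
print(count_in_range(rows))
print(count_in_range(rows, 0.7, 1.0, interval="[)"))
```

With the default arguments every score is counted. The "[)" call keeps the two 0.7 scores and drops the 1.0 score at the open upper end.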


SearchResult
------------

.. py:class:: SearchResult

   Search results for a query fingerprint against a target arena.
   
   The result contains a list of hits. Each hit contains a target index,
   a score, and an optional target id. The hits can be reordered based on
   score or index.


  .. py:method:: __len__()

     The number of hits


  .. py:method:: __iter__()

     Iterate through the pairs of (target index, score) using the current ordering


  .. py:method:: clear()

     Remove all hits from this result


  .. py:method:: get_indices()

     The list of target indices, in the current ordering.


  .. py:method:: get_ids()

     The list of target identifiers (if available), in the current ordering


  .. py:method:: iter_ids()

     Iterate over target identifiers (if available), in the current ordering


  .. py:method:: get_scores()

     The list of target scores, in the current ordering


  .. py:method:: get_ids_and_scores()

     The list of (target identifier, target score) pairs, in the current ordering
     
     Raises a TypeError if the target IDs are not available.


  .. py:method:: get_indices_and_scores()

     The list of (target index, score) pairs, in the current ordering


  .. py:method:: reorder(ordering="decreasing-score")

     Reorder the hits based on the requested ordering.
     
     The available orderings are:
       * increasing-score - sort by increasing score
       * decreasing-score - sort by decreasing score
       * increasing-index - sort by increasing target index
       * decreasing-index - sort by decreasing target index
       * move-closest-first - move the hit with the highest score to the first position
       * reverse - reverse the current ordering
     
     :param string ordering: the name of the ordering to use


  .. py:method:: count(min_score=None, max_score=None, interval="[]")

     Count the number of hits with a score between *min_score* and *max_score*
     
     Using the default parameters this returns the number of
     hits in the result.
     
     The default *min_score* of None is equivalent to -infinity.
     The default *max_score* of None is equivalent to +infinity.
     
     The *interval* parameter describes the interval end
     conditions. The default of "[]" uses a closed interval,
     where min_score <= score <= max_score. The interval "()"
     uses the open interval where min_score < score < max_score.
     The half-open/half-closed intervals "(]" and "[)" are
     also supported.
     
     :param min_score: the minimum score in the range.
     :type min_score: a float, or None for -infinity
     :param max_score: the maximum score in the range.
     :type max_score: a float, or None for +infinity
     :param interval: specify if the end points are open or closed.
     :type interval: one of "[]", "()", "(]", "[)"
     :returns: an integer count


  .. py:method:: cumulative_score(min_score=None, max_score=None, interval="[]")

     The sum of the scores which are between *min_score* and *max_score*
     
     Using the default parameters this returns the sum of all of
     the scores in the result. With a specified range this returns
     the sum of all of the scores in that range. The cumulative
     score is also known as the raw score.
     
     The default *min_score* of None is equivalent to -infinity.
     The default *max_score* of None is equivalent to +infinity.
     
     The *interval* parameter describes the interval end
     conditions. The default of "[]" uses a closed interval,
     where min_score <= score <= max_score. The interval "()"
     uses the open interval where min_score < score < max_score.
     The half-open/half-closed intervals "(]" and "[)" are
     also supported.
     
     :param min_score: the minimum score in the range.
     :type min_score: a float, or None for -infinity
     :param max_score: the maximum score in the range.
     :type max_score: a float, or None for +infinity
     :param interval: specify if the end points are open or closed.
     :type interval: one of "[]", "()", "(]", "[)"
     :returns: a floating point value


  .. py:method:: format_ids_and_scores_as_bytes(ids=None, precision=4)
     
     *Added in version 3.3.*

     Format the ids and scores as the byte string needed for simsearch output
     
     If there are no hits then the result is the empty string b"", otherwise it
     returns a byte string containing the tab-separated ids and scores, in
     the order ids[0], scores[0], ids[1], scores[1], ...
     
     If the *ids* is not specified then the ids come from self.get_ids(). If no
     ids are available, a ValueError is raised. The ids must be a list of Unicode
     strings.
     
     The *precision* sets the number of decimal digits to use in the score output.
     It must be an integer value between 1 and 10, inclusive.
     
     This function is 3-4x faster than the Python equivalent, which is roughly::
     
        ids = ids if (ids is not None) else self.get_ids()
        formatter = ("%s\t%." + str(precision) + "f").encode("ascii")
        return b"\t".join(formatter % pair for pair in zip(ids, self.get_scores()))
     
     :param ids: the identifiers to use for each hit.
     :type ids: a list of Unicode strings, or None to use the default
     :param precision: the precision to use for each score
     :type precision: an integer from 1 to 10, inclusive
     :returns: a byte string
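
The orderings accepted by ``reorder`` can be illustrated with a small pure-Python sketch over a list of (target index, score) pairs. This is not how chemfp implements it. Two details are assumptions made for this example: ties keep their existing relative order, and "move-closest-first" is modeled as swapping the highest-scoring hit with the hit in the first position. The function returns a new list, while the real method reorders the hits in place.

```python
def reorder(hits, ordering="decreasing-score"):
    """Return a new list of (target index, score) pairs in the given order."""
    if ordering == "increasing-score":
        return sorted(hits, key=lambda hit: hit[1])
    if ordering == "decreasing-score":
        return sorted(hits, key=lambda hit: hit[1], reverse=True)
    if ordering == "increasing-index":
        return sorted(hits, key=lambda hit: hit[0])
    if ordering == "decreasing-index":
        return sorted(hits, key=lambda hit: hit[0], reverse=True)
    if ordering == "move-closest-first":
        result = list(hits)
        if result:
            # Assumption: swap the best hit into position 0, leave the rest.
            best = max(range(len(result)), key=lambda i: result[i][1])
            result[0], result[best] = result[best], result[0]
        return result
    if ordering == "reverse":
        return list(hits)[::-1]
    raise ValueError("unknown ordering: %r" % (ordering,))


hits = [(5, 0.4), (7, 0.6), (2, 0.9), (1, 0.5)]
print(reorder(hits, "decreasing-score"))
print(reorder(hits, "move-closest-first"))
```

"move-closest-first" is cheaper than a full sort when only the best hit matters, which is why it is modeled here as a single swap.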


.. _chemfp.bitops:

chemfp.bitops module
====================

.. py:module:: chemfp.bitops

The following functions from the chemfp.bitops module provide
low-level bit operations on byte and hex fingerprints.


.. py:function:: byte_contains(sub_fp, super_fp)

   Return 1 if the on bits of *sub_fp* are also on bits in *super_fp*, that is,
   if *super_fp* contains *sub_fp*.


.. py:function:: byte_contains_bit(fp, bit_index)

   Return True if the given bit position is on, otherwise False


.. py:function:: byte_difference(fp1, fp2)

   Return the absolute difference (xor) between the two byte strings, fp1 ^ fp2


.. py:function:: byte_from_bitlist(fp[, num_bits=1024])

   Convert a list of bit positions into a byte fingerprint, including modulo folding


.. py:function:: byte_hex_tanimoto(fp1, fp2)

   Compute the Tanimoto similarity between the byte fingerprint *fp1* and the hex fingerprint *fp2*.
   Return a float between 0.0 and 1.0, or raise a ValueError if *fp2* is not a hex fingerprint


.. py:function:: byte_hex_tversky(fp1, fp2, alpha=1.0, beta=1.0)

   Compute the Tversky index between the byte fingerprint *fp1* and the hex fingerprint *fp2*.
   Return a float between 0.0 and 1.0, or raise a ValueError if *fp2* is not a hex fingerprint


.. py:function:: byte_intersect(fp1, fp2)

   Return the intersection of the two byte strings, *fp1* & *fp2*


.. py:function:: byte_intersect_popcount(fp1, fp2)

   Return the number of bits set in the intersection of the two byte fingerprints
   *fp1* and *fp2*


.. py:function:: byte_popcount(fp)

   Return the number of bits set in the byte fingerprint *fp*


.. py:function:: byte_tanimoto(fp1, fp2)

   Compute the Tanimoto similarity between the two byte fingerprints *fp1* and *fp2*
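
As a reference point, the Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in either one. The sketch below is a slow pure-Python illustration, not the chemfp implementation. The function name ``tanimoto`` is hypothetical, and returning 0.0 for two all-zero fingerprints is an assumption made for this example.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity between two equal-length byte fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    # Count the on bits in the intersection and in the union.
    both = sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))
    either = sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))
    # Assumption: two empty fingerprints score 0.0 instead of raising.
    return both / either if either else 0.0


score = tanimoto(b"\x0f", b"\x3c")
print(score)
```

Here the two fingerprints share 2 on bits out of 6 distinct on bits, so the score is one third.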


.. py:function:: byte_to_bitlist(bitlist)

   Return a sorted list of the on-bit positions in the byte fingerprint


.. py:function:: byte_tversky(fp1, fp2, alpha=1.0, beta=1.0)

   Compute the Tversky index between the two byte fingerprints *fp1* and *fp2*
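
The Tversky index weights the bits unique to each fingerprint by *alpha* and *beta*. With both set to 1.0 it reduces to the Tanimoto similarity. The following pure-Python sketch uses the standard definition and is not the chemfp implementation. The function name ``tversky`` is hypothetical, and returning 0.0 for a zero denominator is an assumption made for this example.

```python
def tversky(fp1, fp2, alpha=1.0, beta=1.0):
    """Tversky index between two equal-length byte fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    common = only1 = only2 = 0
    for a, b in zip(fp1, fp2):
        common += bin(a & b).count("1")         # bits on in both
        only1 += bin(a & ~b & 0xFF).count("1")  # bits on only in fp1
        only2 += bin(b & ~a & 0xFF).count("1")  # bits on only in fp2
    denominator = common + alpha * only1 + beta * only2
    # Assumption: a zero denominator scores 0.0 instead of raising.
    return common / denominator if denominator else 0.0


print(tversky(b"\x0f", b"\x3c"))
print(tversky(b"\x0f", b"\x3c", alpha=1.0, beta=0.0))
```

Setting *beta* to 0.0 ignores the bits that are only in the second fingerprint, which turns the score into the fraction of the first fingerprint's features that the second one covers.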


.. py:function:: byte_union(fp1, fp2)

   Return the union of the two byte strings, *fp1* | *fp2*


.. py:function:: hex_contains(sub_fp, super_fp)

   Return 1 if the on bits of sub_fp are also on bits in super_fp, otherwise 0.
   Return -1 if either string is not a hex fingerprint


.. py:function:: hex_contains_bit(fp, bit_index)

   Return True if the given bit position is on, otherwise False.
   
   This function does not validate that the hex fingerprint is actually in hex.


.. py:function:: hex_difference(fp1, fp2)

   Return the absolute difference (xor) between the two hex strings, *fp1* ^ *fp2*.
   Raises a ValueError for non-hex fingerprints.


.. py:function:: hex_from_bitlist(fp[, num_bits=1024])

   Convert a list of bit positions into a hex fingerprint, including modulo folding


.. py:function:: hex_intersect(fp1, fp2)

   Return the intersection of the two hex strings, *fp1* & *fp2*.
   Raises a ValueError for non-hex fingerprints.


.. py:function:: hex_intersect_popcount(fp1, fp2)

   Return the number of bits set in the intersection of the two hex fingerprints
   *fp1* and *fp2*, or raise a ValueError if either string is a non-hex string


.. py:function:: hex_isvalid(s)

   Return 1 if the string *s* is a valid hex fingerprint, otherwise 0


.. py:function:: hex_popcount(fp)

   Return the number of bits set in a hex fingerprint *fp*, or -1 for non-hex strings


.. py:function:: hex_tanimoto(fp1, fp2)

   Compute the Tanimoto similarity between two hex fingerprints. Return a float
   between 0.0 and 1.0, or raise a ValueError if either string is not a hex fingerprint


.. py:function:: hex_tversky(fp1, fp2, alpha=1.0, beta=1.0)

   Compute the Tversky index between two hex fingerprints. Return a float
   between 0.0 and 1.0, or raise a ValueError if either string is not a hex fingerprint


.. py:function:: hex_to_bitlist(bitlist)

   Return a sorted list of the on-bit positions in the hex fingerprint


.. py:function:: hex_union(fp1, fp2)

   Return the union of the two hex strings, *fp1* | *fp2*.
   Raises a ValueError for non-hex fingerprints.


.. py:function:: hex_encode(s)

   Encode the byte string or ASCII string to hex. Returns a text string.


.. py:function:: hex_encode_as_bytes(s)

   Encode the byte string or ASCII string to hex. Returns a byte string.


.. py:function:: hex_decode(s)

   Decode the hex-encoded value to a byte string
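
For comparison, the same kind of round trip can be written with the standard library's ``binascii`` module. This sketch does not call chemfp, and the chemfp functions may differ in details such as which input types they accept.

```python
import binascii

fp = b"\x10\xf2"

# Hex-encode to a byte string, then to a text string.
hex_bytes = binascii.hexlify(fp)
hex_text = hex_bytes.decode("ascii")

# Decoding either form recovers the original fingerprint.
decoded = binascii.unhexlify(hex_text)
print(hex_bytes, hex_text, decoded == fp)
```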


chemfp.encodings
================

.. py:module:: chemfp.encodings

Decode different fingerprint representations into chemfp
form. (Currently only decoders are available. Future releases may
include encoders.)

The chemfp fingerprints are stored as byte strings, with the bytes in
least-significant bit order (bit #0 is stored in the first/left-most
byte) and with the bits in most-significant bit order (bit #0 is
stored in the first/right-most bit of the first byte).

Other systems use different encodings. These include:

- the '0' and '1' characters, as in '00111101'
- hex encoding, like '3d'
- base64 encoding, like 'SGVsbG8h'
- CACTVS's variation of base64 encoding

plus variations of different LSB and MSB orders.

This module decodes most of the fingerprint encodings I have come
across. The fingerprint decoders return a 2-tuple of the bit length and
the chemfp fingerprint. The bit length is None unless the bit length
is known exactly, which currently is only the case for the binary and
CACTVS fingerprints. (The hex and other decoders must round the
fingerprints up to a multiple of 8 bits.)


from_binary_lsb
---------------

.. py:function:: from_binary_lsb(text)

   Convert a string like '00010101' (bit 0 here is off) into '\xa8'
   
   The encoding characters '0' and '1' are in LSB order, so bit 0 is the left-most field.
   The result is a 2-tuple of the fingerprint length and the decoded chemfp fingerprint.
   
   >>> from_binary_lsb('00010101')
   (8, b'\xa8')
   >>> from_binary_lsb('11101')
   (5, b'\x17')
   >>> from_binary_lsb('00000000000000010000000000000')
   (29, b'\x00\x80\x00\x00')
   >>>


from_binary_msb
---------------

.. py:function:: from_binary_msb(text)

   Convert a string like '10101000' (bit 0 here is off) into '\xa8'
   
   The encoding characters '0' and '1' are in MSB order, so bit 0 is the right-most field.
   
   >>> from_binary_msb(b'10101000')
   (8, b'\xa8')
   >>> from_binary_msb(b'00010101')
   (8, b'\x15')
   >>> from_binary_msb(b'00111')
   (5, b'\x07')
   >>> from_binary_msb(b'00000000000001000000000000000')
   (29, b'\x00\x80\x00\x00')
   >>>
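
The two binary decoders differ only in which end of the text holds bit 0. A minimal pure-Python sketch of both, assuming the chemfp byte layout described at the top of this module (bit *i* is stored in byte ``i // 8`` under the mask ``1 << (i % 8)``), is shown here. The function names are hypothetical and this is not the chemfp implementation.

```python
def binary_lsb_to_bytes(text):
    """Decode '0'/'1' text where the left-most character is bit 0."""
    if isinstance(text, bytes):
        text = text.decode("ascii")
    if set(text) - {"0", "1"}:
        raise ValueError("text may only contain '0' and '1'")
    num_bits = len(text)
    # Round the storage up to a whole number of bytes.
    fp = bytearray((num_bits + 7) // 8)
    for position, char in enumerate(text):
        if char == "1":
            fp[position // 8] |= 1 << (position % 8)
    return num_bits, bytes(fp)


def binary_msb_to_bytes(text):
    """Decode '0'/'1' text where the right-most character is bit 0."""
    return binary_lsb_to_bytes(text[::-1])


print(binary_lsb_to_bytes("00010101"))
print(binary_msb_to_bytes(b"10101000"))
```

Both calls describe the same fingerprint, so they give the same result.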


from_base64
-----------

.. py:function:: from_base64(text)

   Decode a base64 encoded fingerprint string
   
   The encoded fingerprint must be in chemfp form, with the bytes in
   LSB order and the bits in MSB order.
   
   >>> from_base64("SGk=")
   (None, b'Hi')
   >>> from binascii import hexlify
   >>> hexlify(from_base64("SGk=")[1])
   b'4869'
   >>> 


from_hex
--------

.. py:function:: from_hex(text)

   Decode a hex encoded fingerprint string
   
   The encoded fingerprint must be in chemfp form, with the bytes in
   LSB order and the bits in MSB order.
   
   >>> from_hex(b'10f2')
   (None, b'\x10\xf2')
   >>>
   
   Raises a ValueError if the hex string is not a multiple of 2 bytes long
   or if it contains a non-hex character.


from_hex_msb
------------

.. py:function:: from_hex_msb(text)

   Decode a hex encoded fingerprint string where the bits and bytes are in MSB order
   
   >>> from_hex_msb(b'10f2')
   (None, b'\xf2\x10')
   >>>
   
   Raises a ValueError if the hex string is not a multiple of 2 bytes long
   or if it contains a non-hex character.


from_hex_lsb
------------

.. py:function:: from_hex_lsb(text)

   Decode a hex encoded fingerprint string where the bits and bytes are in LSB order
   
   >>> from_hex_lsb(b'102f')
   (None, b'\x08\xf4')
   >>> 
   
   Raises a ValueError if the hex string is not a multiple of 2 bytes long
   or if it contains a non-hex character.
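
One way to read the two hex variants, consistent with the examples shown for ``from_hex_msb`` and ``from_hex_lsb``, is that the MSB form reverses the byte order and the LSB form reverses the bits within each byte. The sketch below is based only on that reading and is not the chemfp implementation. The function names are hypothetical.

```python
def reverse_bits(byte):
    """Reverse the 8 bits of an integer in the range 0..255."""
    return int("{:08b}".format(byte)[::-1], 2)


def _to_raw_bytes(text):
    if isinstance(text, bytes):
        text = text.decode("ascii")
    # bytes.fromhex raises ValueError for odd lengths or non-hex characters.
    return bytes.fromhex(text)


def hex_msb_to_bytes(text):
    """Assumed MSB reading: the last hex pair is the first chemfp byte."""
    return None, _to_raw_bytes(text)[::-1]


def hex_lsb_to_bytes(text):
    """Assumed LSB reading: each byte's bits are stored in reverse order."""
    return None, bytes(reverse_bits(b) for b in _to_raw_bytes(text))


print(hex_msb_to_bytes(b"10f2"))
print(hex_lsb_to_bytes(b"102f"))
```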


from_cactvs
-----------

.. py:function:: from_cactvs(text)

   Decode a 881-bit CACTVS-encoded fingerprint used by PubChem
   
   >>> from_cactvs(b"AAADceB7sQAEAAAAAAAAAAAAAAAAAWAAAAAwAAAAAAAAAAABwAAAHwIYAAAADA" +
   ...             b"rBniwygJJqAACqAyVyVACSBAAhhwIa+CC4ZtgIYCLB0/CUpAhgmADIyYcAgAAO" +
   ...             b"AAAAAAABAAAAAAAAAAIAAAAAAAAAAA==")
   (881, b'\x07\xde\x8d\x00 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x06\x00\x00\x00\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x80\x03\x00\x00\xf8@\x18\x00\x00\x000P\x83y4L\x01IV\x00\x00U\xc0\xa4N*\x00I \x00\x84\xe1@X\x1f\x04\x1df\x1b\x10\x06D\x83\xcb\x0f)%\x10\x06\x19\x00\x13\x93\xe1\x00\x01\x00p\x00\x00\x00\x00\x00\x80\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00\x00')
   >>>
   
   For format details, see
     ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt


from_daylight
-------------

.. py:function:: from_daylight(text)

   Decode a Daylight ASCII fingerprint
   
   >>> from_daylight(b"I5Z2MLZgOKRcR...1")
   (None, b'PyDaylight')
   
   See the implementation for format details.


from_on_bit_positions
---------------------

.. py:function:: from_on_bit_positions(text, num_bits=1024, separator=" ")

   Decode from a list of integers describing the location of the on bits
   
   >>> from_on_bit_positions("1 4 9 63", num_bits=32)
   (32, b'\x12\x02\x00\x80')
   >>> from_on_bit_positions("1,4,9,63", num_bits=64, separator=",")
   (64, b'\x12\x02\x00\x00\x00\x00\x00\x80')
   
   The text contains a sequence of non-negative integer values
   separated by the `separator` text. Bit positions are folded modulo
   num_bits. 
   
   This is often used to convert sparse fingerprints into a dense
   fingerprint.
   
   Note: if you have a list of bit positions as integer values then
   you probably want to use :func:`chemfp.bitops.byte_from_bitlist`.
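
The folding step can be sketched in a few lines of pure Python. This is an illustration of the documented behavior, not the chemfp implementation. The function name ``on_bit_positions_to_bytes`` is hypothetical, and rejecting negative values with a ValueError is an assumption made for this example.

```python
def on_bit_positions_to_bytes(text, num_bits=1024, separator=" "):
    """Fold the listed on-bit positions into a num_bits fingerprint."""
    fp = bytearray((num_bits + 7) // 8)
    for field in text.split(separator):
        position = int(field)
        if position < 0:
            raise ValueError("bit positions must be non-negative")
        # Fold positions past the end back into the fingerprint.
        position %= num_bits
        fp[position // 8] |= 1 << (position % 8)
    return num_bits, bytes(fp)


print(on_bit_positions_to_bytes("1 4 9 63", num_bits=32))
```

With 32 bits, position 63 folds to bit 31, which is the high bit of the last byte.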


.. py:module:: chemfp.fps_io

chemfp.fps_io module
====================

This module is part of the private API. Do not import it directly.

The function :func:`chemfp.open` returns an FPSReader if the source is
an FPS file. The function :func:`chemfp.open_fingerprint_writer`
returns an FPSWriter if the destination is an FPS file.


FPSReader
---------

.. py:class:: FPSReader

   FPS file reader
   
   This class implements the :class:`chemfp.FingerprintReader` API. It
   is also its own context manager, which automatically closes the
   file when the context exits.
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.Metadata` instance with information about the fingerprint type
      
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance with parser location and state information
      
   .. py:attribute:: closed
   
      False if the file is open, else True
   
   The FPSReader.location only tracks the "lineno" variable.


  .. py:method:: __iter__()

     Iterate through the (id, fp) pairs


  .. py:method:: iter_arenas(arena_size=1000)

     Iterate through *arena_size* fingerprints at a time, as subarenas
     
     Iterate through *arena_size* fingerprints at a time, returned
     as :class:`chemfp.arena.FingerprintArena` instances. The arenas are in input
     order and not reordered by popcount.
     
     This method helps trade off between performance and memory
     use. Working with arenas is often faster than processing one
     fingerprint at a time, but if the file is very large then you
     might run out of memory, or get bored while waiting to process
     all of the fingerprints before getting the first answer.
     
     If *arena_size* is None then this makes an iterator which
     returns a single arena containing all of the fingerprints.
     
     :param arena_size: The number of fingerprints to put into each arena.
     :type arena_size: positive integer, or None
     :returns: an iterator of :class:`chemfp.arena.FingerprintArena` instances


  .. py:method:: save(destination, format=None, level=None)

     Save the fingerprints to a given destination and format
     
     The output format is based on the *format*. If the format
     is None then the format depends on the *destination* file
     extension. If the extension isn't recognized then the
     fingerprints will be saved in "fps" format.
     
     If the output format is "fps", "fps.gz", or "fps.zst" then
     *destination* may be a filename, a file object, or None; None
     writes to stdout.
     
     If the output format is "fpb" then *destination* must be
     a filename or seekable file object. Chemfp cannot save
     to compressed FPB files.
     
     :param destination: the output destination
     :type destination: a filename, file object, or None
     :param format: the output format
     :type format: None, "fps", "fps.gz", "fps.zst", or "fpb"
     :param level: compression level when writing .gz or .zst files
     :type level: an integer, or "min", "default", or "max" for compressor-specific values
     :returns: None


  .. py:method:: get_fingerprint_type()

     Get the fingerprint type object based on the metadata's type field
     
     This uses ``self.metadata.type`` to get the fingerprint type
     string then calls :func:`chemfp.get_fingerprint_type` to get and return
     a :class:`chemfp.types.FingerprintType` instance.
     
     This will raise a TypeError if there is no metadata, and
     a ValueError if the type field was invalid or the fingerprint
     type isn't available.
     
     :returns: a :class:`chemfp.types.FingerprintType`


  .. py:method:: close()

     Close the file


  .. py:method:: count_tanimoto_hits_fp(query_fp, threshold=0.7)

     Count the fingerprints which are sufficiently similar to the query fingerprint
     
     Return the number of fingerprints in the reader which are
     at least *threshold* similar to the query fingerprint *query_fp*.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: integer count


  .. py:method:: count_tanimoto_hits_arena(queries, threshold=0.7)

     Count the fingerprints which are sufficiently similar to each query fingerprint
     
     Returns a list containing a count for each query fingerprint
     in the *queries* arena. The count is the number of
     fingerprints in the reader which are at least *threshold*
     similar to the query fingerprint.
     
     The order of results is the same as the order of the queries.
     
     :param queries: query fingerprints
     :type queries: a :class:`.FingerprintArena`
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: list of integer counts, one for each query


  .. py:method:: count_tversky_hits_fp(query_fp, threshold=0.7, alpha=1.0, beta=1.0)

     Count the fingerprints which are sufficiently similar to the query fingerprint
     
     Return the number of fingerprints in the reader which are
     at least *threshold* similar to the query fingerprint *query_fp*.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :type alpha: float between 0.0 and 100.0, inclusive
     :type beta: float between 0.0 and 100.0, inclusive
     :returns: integer count


  .. py:method:: threshold_tanimoto_search_fp(query_fp, threshold=0.7)

     Find the fingerprints which are sufficiently similar to the query fingerprint
     
     Find all of the fingerprints in this reader which are at least
     *threshold* similar to the query fingerprint *query_fp*.  The
     hits are returned as a :class:`.SearchResult`, in arbitrary
     order.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: a :class:`.SearchResult`


  .. py:method:: threshold_tanimoto_search_arena(queries, threshold=0.7)

     Find the fingerprints which are sufficiently similar to each of the query fingerprints
     
     For each fingerprint in the *queries* arena, find all of the
     fingerprints in this reader which are at least *threshold*
     similar. The hits are returned as a :class:`.SearchResults`,
     where the hits in each :class:`.SearchResult` are in arbitrary
     order.
     
     :param queries: query fingerprints
     :type queries: a :class:`.FingerprintArena`
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: a :class:`.SearchResults`


  .. py:method:: threshold_tversky_search_fp(query_fp, threshold=0.7, alpha=1.0, beta=1.0)

     Find the fingerprints which are sufficiently similar to the query fingerprint
     
     Find all of the fingerprints in this reader which are at least
     *threshold* similar to the query fingerprint *query_fp*.  The
     hits are returned as a :class:`.SearchResult`, in arbitrary
     order.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :type alpha: float between 0.0 and 100.0, inclusive
     :type beta: float between 0.0 and 100.0, inclusive
     :returns: a :class:`.SearchResult`


  .. py:method:: knearest_tanimoto_search_fp(query_fp, k=3, threshold=0.7)

     Find the k-nearest fingerprints which are sufficiently similar to the query fingerprint
     
     Find all of the fingerprints in this reader which are at least
     *threshold* similar to the query fingerprint, and of those, select
     the top *k* hits. The hits are returned as a :class:`.SearchResult`,
     sorted from highest score to lowest.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param k: the maximum number of hits to return (default: 3)
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: a :class:`.SearchResult`


  .. py:method:: knearest_tanimoto_search_arena(queries, k=3, threshold=0.7)

     Find the k-nearest fingerprints which are sufficiently similar to each of the query fingerprints
     
     For each fingerprint in the *queries* arena, find the
     fingerprints in this reader which are at least *threshold*
     similar to the query fingerprint, and of those, select the top
     *k* hits. The hits are returned as a :class:`.SearchResults`,
     where the hits in each :class:`.SearchResult` are sorted by
     similarity score.
     
     :param queries: query fingerprints
     :type queries: a :class:`.FingerprintArena`
     :param k: the maximum number of hits to return for each query (default: 3)
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :returns: a :class:`.SearchResults`


  .. py:method:: knearest_tversky_search_fp(query_fp, k=3, threshold=0.7, alpha=1.0, beta=1.0)

     Find the k-nearest fingerprints which are sufficiently similar to the query fingerprint
     
     Find all of the fingerprints in this reader which are at least
     *threshold* similar to the query fingerprint, and of those, select
     the top *k* hits. The hits are returned as a :class:`.SearchResult`,
     sorted from highest score to lowest.
     
     :param query_fp: query fingerprint
     :type query_fp: byte string
     :param k: the maximum number of hits to return (default: 3)
     :param threshold: minimum similarity threshold (default: 0.7)
     :type threshold: float between 0.0 and 1.0, inclusive
     :type alpha: float between 0.0 and 100.0, inclusive
     :type beta: float between 0.0 and 100.0, inclusive
     :returns: a :class:`.SearchResult`
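
The k-nearest searches combine a threshold filter with a top-*k* selection. The pure-Python sketch below shows that logic over an iterable of (id, fingerprint) pairs. It is not the FPSReader implementation. The function names are hypothetical, the result is a plain list of (id, score) pairs rather than a SearchResult, and scoring two empty fingerprints as 0.0 is an assumption.

```python
import heapq


def tanimoto(fp1, fp2):
    """Tanimoto similarity between two equal-length byte fingerprints."""
    both = sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))
    either = sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))
    return both / either if either else 0.0


def knearest_tanimoto_search(query_fp, id_fp_pairs, k=3, threshold=0.7):
    """Return up to k (id, score) pairs, from highest score to lowest."""
    # Keep only the targets which are at least threshold similar.
    candidates = []
    for target_id, target_fp in id_fp_pairs:
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            candidates.append((target_id, score))
    # nlargest returns its results sorted from largest to smallest.
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


targets = [("a", b"\x0f"), ("b", b"\x07"), ("c", b"\xf0"),
           ("d", b"\x1f"), ("e", b"\x0c")]
nearest = knearest_tanimoto_search(b"\x0f", targets, k=3, threshold=0.7)
print(nearest)
```

Because the targets are consumed one at a time, the same approach works on a stream of records without loading them all into memory first.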


FPSWriter
---------

.. py:class:: FPSWriter

   Write fingerprints in FPS format.
   
   This is a subclass of :class:`chemfp.FingerprintWriter`.
   
   Instances have the following attributes:
   
   * metadata - a :class:`chemfp.Metadata` instance
   * format - the string 'fps'
   * closed - False when the file is open, else True
   * location - a :class:`chemfp.io.Location` instance
   
   An FPSWriter is its own context manager, and will close the
   output file on context exit.
   
   The Location instance supports the "recno", "output_recno",
   and "lineno" properties.


  .. py:method:: write_fingerprint(id, fp)

     Write a single fingerprint record with the given id and fp
     
     :param string id: the record identifier
     :param bytes fp: the fingerprint


  .. py:method:: write_fingerprints(id_fp_pairs)

     Write a sequence of fingerprint records
     
     :param id_fp_pairs: An iterable of (id, fingerprint) pairs.
 

  .. py:method:: close()

     Close the writer
     
     This will set self.closed to True.
 

chemfp.fpb_io module
====================

This module is part of the private API. Do not import it directly.

The function :func:`chemfp.open_fingerprint_writer` returns an
OrderedFPBWriter if the destination is an FPB file and *reorder* is
True, or an InputOrderFPBWriter if *reorder* is False.

.. py:module:: chemfp.fpb_io


OrderedFPBWriter
----------------

.. py:class:: OrderedFPBWriter

   Fingerprint writer for FPB files where the fingerprints are reordered by popcount
   
   This is a subclass of :class:`chemfp.FingerprintWriter`.
   
   Instances have the following public attributes:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.Metadata` instance
      
   .. py:attribute:: format
   
      the string 'fpb'
      
   .. py:attribute:: closed
   
      False when the file is open, else True
   
   Other attributes (like "alignment", "include_hash", "include_popc",
   "max_spool_size", and "tmpdir") are undocumented and subject
   to change in the future. Let me know if they are useful.
   
   An OrderedFPBWriter is also its own context manager, and will
   close the writer on context exit.


  .. py:method:: write_fingerprint(id, fp)

     Write a single fingerprint record with the given id and fp to the destination
     
     :param string id: the record identifier
     :param bytes fp: the fingerprint


  .. py:method:: write_fingerprints(id_fp_iter)

     Write a sequence of (id, fingerprint) pairs to the destination
     
     :param id_fp_iter: An iterable of (id, fingerprint) pairs.


  .. py:method:: close()

     Close the output writer


InputOrderFPBWriter
-------------------

.. py:class:: InputOrderFPBWriter

   Fingerprint writer for FPB files which preserves the input fingerprint order
   
   This is a subclass of :class:`chemfp.FingerprintWriter`.
   
   Instances have the following public attributes:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.Metadata` instance
      
   .. py:attribute:: format
   
      the string 'fpb'
      
   .. py:attribute:: closed
   
      False when the file is open, else True
   
   Other attributes (like "alignment", "include_hash", "include_popc",
   "max_spool_size", and "tmpdir") are undocumented and subject
   to change in the future. Let me know if they are useful.
   
   An InputOrderFPBWriter is also its own context manager, and will
   close the writer on context exit.


  .. py:method:: write_fingerprint(id, fp)

     Write a single fingerprint record with the given id and fp to the destination
     
     :param string id: the record identifier
     :param bytes fp: the fingerprint


  .. py:method:: write_fingerprints(id_fp_iter)

     Write a sequence of (id, fingerprint) pairs to the destination
     
     :param id_fp_iter: An iterable of (id, fingerprint) pairs.


  .. py:method:: close()

     Close the output writer
     
     This will set self.closed to True.


chemfp toolkit API
==================

.. py:module:: chemfp.toolkit

Open Babel, OEChem and RDKit have different ways to read and write
molecules. The chemfp toolkit API is a common wrapper API for
structure I/O. The chemfp functions work with native toolkit
molecules; chemfp does not have a common molecule API. (For that, use
`Cinfony <http://code.google.com/p/cinfony/>`_.)

While the API is the same across :mod:`.openbabel_toolkit`,
:mod:`.openeye_toolkit`, :mod:`.rdkit_toolkit`, and the
:mod:`.text_toolkit`, there are some differences in how they
work. For example, each of the toolkits has its own set of reader and
writer arguments. The details are available in the documentation, and
this chapter acts as a pointer to the specific toolkit documentation.


name
----

.. py:attribute:: name

The string "openbabel", "openeye", "rdkit", or "text".

[:ref:`openbabel_toolkit <openbabel_toolkit.name>`]
[:ref:`openeye_toolkit <openeye_toolkit.name>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.name>`]
[:ref:`text_toolkit <text_toolkit.name>`]

software
--------

.. py:attribute:: software

A string like "OpenBabel/2.4.1", "OEChem/20170208",
"RDKit/2016.09.3" or "chemfp/3.1".

[:ref:`openbabel_toolkit <openbabel_toolkit.software>`]
[:ref:`openeye_toolkit <openeye_toolkit.software>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.software>`]
[:ref:`text_toolkit <text_toolkit.software>`]


is_licensed
===========
.. py:function:: is_licensed ()


[:ref:`openbabel_toolkit <openbabel_toolkit.is_licensed>`]
[:ref:`openeye_toolkit <openeye_toolkit.is_licensed>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.is_licensed>`]
[:ref:`text_toolkit <text_toolkit.is_licensed>`]

Check if the toolkit is licensed.


get_formats
===========
.. py:function:: get_formats (include_unavailable=False)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_formats>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_formats>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_formats>`]
[:ref:`text_toolkit <text_toolkit.get_formats>`]

Return a list of structure formats.


get_input_formats
=================
.. py:function:: get_input_formats ()


[:ref:`openbabel_toolkit <openbabel_toolkit.get_input_formats>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_input_formats>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_input_formats>`]
[:ref:`text_toolkit <text_toolkit.get_input_formats>`]

Return a list of input structure formats.


get_output_formats
==================
.. py:function:: get_output_formats ()


[:ref:`openbabel_toolkit <openbabel_toolkit.get_output_formats>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_output_formats>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_output_formats>`]
[:ref:`text_toolkit <text_toolkit.get_output_formats>`]

Return a list of output structure formats.


get_format
==========
.. py:function:: get_format (format)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_format>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_format>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_format>`]
[:ref:`text_toolkit <text_toolkit.get_format>`]

Get a named format.


get_input_format
================
.. py:function:: get_input_format (format)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_input_format>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_input_format>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_input_format>`]
[:ref:`text_toolkit <text_toolkit.get_input_format>`]

Get a named input format.


get_output_format
=================
.. py:function:: get_output_format (format)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_output_format>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_output_format>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_output_format>`]
[:ref:`text_toolkit <text_toolkit.get_output_format>`]

Get a named output format.


get_input_format_from_source
============================
.. py:function:: get_input_format_from_source (source=None, format=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_input_format_from_source>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_input_format_from_source>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_input_format_from_source>`]
[:ref:`text_toolkit <text_toolkit.get_input_format_from_source>`]

Get a format given an input source.


get_output_format_from_destination
==================================
.. py:function:: get_output_format_from_destination (destination=None, format=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_output_format_from_destination>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_output_format_from_destination>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_output_format_from_destination>`]
[:ref:`text_toolkit <text_toolkit.get_output_format_from_destination>`]

Get a format given an output destination.


read_molecules
==============
.. py:function:: read_molecules (source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.read_molecules>`]
[:ref:`openeye_toolkit <openeye_toolkit.read_molecules>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.read_molecules>`]
[:ref:`text_toolkit <text_toolkit.read_molecules>`]

Read molecules from a structure file.


read_molecules_from_string
==========================
.. py:function:: read_molecules_from_string (content, format, id_tag=None, reader_args=None, errors="strict", location=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.read_molecules_from_string>`]
[:ref:`openeye_toolkit <openeye_toolkit.read_molecules_from_string>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.read_molecules_from_string>`]
[:ref:`text_toolkit <text_toolkit.read_molecules_from_string>`]

Read molecules from structure data stored in a string.


read_ids_and_molecules
======================
.. py:function:: read_ids_and_molecules (source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.read_ids_and_molecules>`]
[:ref:`openeye_toolkit <openeye_toolkit.read_ids_and_molecules>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.read_ids_and_molecules>`]
[:ref:`text_toolkit <text_toolkit.read_ids_and_molecules>`]

Read ids and molecules from a structure file.


read_ids_and_molecules_from_string
==================================
.. py:function:: read_ids_and_molecules_from_string (content, format, id_tag=None, reader_args=None, errors="strict", location=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.read_ids_and_molecules_from_string>`]
[:ref:`openeye_toolkit <openeye_toolkit.read_ids_and_molecules_from_string>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.read_ids_and_molecules_from_string>`]
[:ref:`text_toolkit <text_toolkit.read_ids_and_molecules_from_string>`]

Read ids and molecules from structure data stored in a string.


make_id_and_molecule_parser
===========================
.. py:function:: make_id_and_molecule_parser (format, id_tag=None, reader_args=None, errors="strict")


[:ref:`openbabel_toolkit <openbabel_toolkit.make_id_and_molecule_parser>`]
[:ref:`openeye_toolkit <openeye_toolkit.make_id_and_molecule_parser>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.make_id_and_molecule_parser>`]
[:ref:`text_toolkit <text_toolkit.make_id_and_molecule_parser>`]

Make a specialized function which returns the id and molecule given a structure record.


parse_molecule
==============
.. py:function:: parse_molecule (content, format, id_tag=None, reader_args=None, errors="strict")


[:ref:`openbabel_toolkit <openbabel_toolkit.parse_molecule>`]
[:ref:`openeye_toolkit <openeye_toolkit.parse_molecule>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.parse_molecule>`]
[:ref:`text_toolkit <text_toolkit.parse_molecule>`]

Parse a structure record into a molecule.


parse_id_and_molecule
=====================
.. py:function:: parse_id_and_molecule (content, format, id_tag=None, reader_args=None, errors="strict")


[:ref:`openbabel_toolkit <openbabel_toolkit.parse_id_and_molecule>`]
[:ref:`openeye_toolkit <openeye_toolkit.parse_id_and_molecule>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.parse_id_and_molecule>`]
[:ref:`text_toolkit <text_toolkit.parse_id_and_molecule>`]

Parse a structure record into an id and molecule.


create_string
=============
.. py:function:: create_string (mol, format, id=None, writer_args=None, errors="strict")


[:ref:`openbabel_toolkit <openbabel_toolkit.create_string>`]
[:ref:`openeye_toolkit <openeye_toolkit.create_string>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.create_string>`]
[:ref:`text_toolkit <text_toolkit.create_string>`]

Convert a molecule into a Unicode string containing a structure record.


create_bytes
============
.. py:function:: create_bytes (mol, format, id=None, writer_args=None, errors="strict")


[:ref:`openbabel_toolkit <openbabel_toolkit.create_bytes>`]
[:ref:`openeye_toolkit <openeye_toolkit.create_bytes>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.create_bytes>`]
[:ref:`text_toolkit <text_toolkit.create_bytes>`]

Convert a molecule into a byte string containing a structure record.


open_molecule_writer
====================
.. py:function:: open_molecule_writer (destination=None, format=None, writer_args=None, errors="strict", location=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.open_molecule_writer>`]
[:ref:`openeye_toolkit <openeye_toolkit.open_molecule_writer>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.open_molecule_writer>`]
[:ref:`text_toolkit <text_toolkit.open_molecule_writer>`]

Create an output molecule writer, for writing to a file.


open_molecule_writer_to_string
==============================
.. py:function:: open_molecule_writer_to_string (format, writer_args=None, errors="strict", location=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.open_molecule_writer_to_string>`]
[:ref:`openeye_toolkit <openeye_toolkit.open_molecule_writer_to_string>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.open_molecule_writer_to_string>`]
[:ref:`text_toolkit <text_toolkit.open_molecule_writer_to_string>`]

Create an output molecule writer, for writing to a Unicode string.


open_molecule_writer_to_bytes
=============================
.. py:function:: open_molecule_writer_to_bytes (format, writer_args=None, errors="strict", location=None)


[:ref:`openbabel_toolkit <openbabel_toolkit.open_molecule_writer_to_bytes>`]
[:ref:`openeye_toolkit <openeye_toolkit.open_molecule_writer_to_bytes>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.open_molecule_writer_to_bytes>`]
[:ref:`text_toolkit <text_toolkit.open_molecule_writer_to_bytes>`]

Create an output molecule writer, for writing to a byte string.


copy_molecule
=============
.. py:function:: copy_molecule (mol)


[:ref:`openbabel_toolkit <openbabel_toolkit.copy_molecule>`]
[:ref:`openeye_toolkit <openeye_toolkit.copy_molecule>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.copy_molecule>`]
[:ref:`text_toolkit <text_toolkit.copy_molecule>`]

Make a copy of a toolkit molecule.


add_tag
=======
.. py:function:: add_tag (mol, tag, value)


[:ref:`openbabel_toolkit <openbabel_toolkit.add_tag>`]
[:ref:`openeye_toolkit <openeye_toolkit.add_tag>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.add_tag>`]
[:ref:`text_toolkit <text_toolkit.add_tag>`]

Add an SD tag to the molecule.


get_tag
=======
.. py:function:: get_tag (mol, tag)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_tag>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_tag>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_tag>`]
[:ref:`text_toolkit <text_toolkit.get_tag>`]

Get an SD tag for a molecule.


get_tag_pairs
=============
.. py:function:: get_tag_pairs (mol)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_tag_pairs>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_tag_pairs>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_tag_pairs>`]
[:ref:`text_toolkit <text_toolkit.get_tag_pairs>`]

Get the list of tag name and tag value pairs.


get_id
======
.. py:function:: get_id (mol)


[:ref:`openbabel_toolkit <openbabel_toolkit.get_id>`]
[:ref:`openeye_toolkit <openeye_toolkit.get_id>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.get_id>`]
[:ref:`text_toolkit <text_toolkit.get_id>`]

Get the molecule id.


set_id
======
.. py:function:: set_id (mol, id)


[:ref:`openbabel_toolkit <openbabel_toolkit.set_id>`]
[:ref:`openeye_toolkit <openeye_toolkit.set_id>`]
[:ref:`rdkit_toolkit <rdkit_toolkit.set_id>`]
[:ref:`text_toolkit <text_toolkit.set_id>`]

Set the molecule id.


.. py:module:: chemfp.base_toolkit

chemfp.base_toolkit
===================

The chemfp.base_toolkit module contains a few objects which are shared
by the different toolkits. There should be no reason for you to import
the module yourself.

molecule I/O file metadata
--------------------------

The ``metadata`` attribute of the toolkit readers and writers is a
FormatMetadata instance. It contains information about the structure
file.

Note that this is **not** the same as the fingerprint
:class:`chemfp.Metadata` instance, which contains information about
the fingerprint file.


FormatMetadata
--------------

.. py:class:: FormatMetadata

   Information about the reader or writer
   
   The public attributes are:
   
   .. py:attribute:: filename
   
      the source or destination filename, the string "<string>" for
      string-based I/O, or None if not known
       
   .. py:attribute:: record_format
   
      the normalized record format name. All SMILES formats are "smi",
      and this does not contain compression information
   
   .. py:attribute:: args
   
      the final reader_args or writer_args, after all processing,
      and as used by the reader and writer


  .. py:method:: __repr__()

     Return a string like 'FormatMetadata(filename="cmpds.sdf.gz", record_format="sdf", args={})'


Toolkit readers
===============

The toolkit readers read from structure files. There are several
different variations, depending on the function used to read the
file. All of the readers are subclasses of
:class:`chemfp.base_toolkit.BaseMoleculeReader`.


================================================================  ================================================
Function                                                          Returned reader
================================================================  ================================================
:func:`chemfp.toolkit.read_molecules`                             :class:`chemfp.base_toolkit.MoleculeReader`
:func:`chemfp.toolkit.read_molecules_from_string`                 :class:`chemfp.base_toolkit.MoleculeReader`
:func:`chemfp.toolkit.read_ids_and_molecules`                     :class:`chemfp.base_toolkit.IdAndMoleculeReader`
:func:`chemfp.toolkit.read_ids_and_molecules_from_string`         :class:`chemfp.base_toolkit.IdAndMoleculeReader`
:func:`chemfp.text_toolkit.read_sdf_records`                      :class:`chemfp.base_toolkit.RecordReader`
:func:`chemfp.text_toolkit.read_sdf_records_from_string`          :class:`chemfp.base_toolkit.RecordReader`
:func:`chemfp.text_toolkit.read_sdf_ids_and_records`              :class:`chemfp.base_toolkit.IdAndRecordReader`
:func:`chemfp.text_toolkit.read_sdf_ids_and_records_from_string`  :class:`chemfp.base_toolkit.IdAndRecordReader`
:func:`chemfp.text_toolkit.read_sdf_ids_and_values`               :class:`chemfp.base_toolkit.IdAndRecordReader`
:func:`chemfp.text_toolkit.read_sdf_ids_and_values_from_string`   :class:`chemfp.base_toolkit.IdAndRecordReader`
================================================================  ================================================

All of the readers have the same API. The major difference is that
some readers return a single object during iteration while the others
(those with an "And" in the name) return a pair of objects.


BaseMoleculeReader
------------------

.. py:class:: BaseMoleculeReader

   Base class for the toolkit readers
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.base_toolkit.FormatMetadata` instance
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance
   
   .. py:attribute:: closed
   
      False if the reader is open, otherwise True
   
   Readers are iterators, so iter(reader) returns itself.
   next(reader) returns either a single object or a pair of
   objects, depending on the reader.
   
   Readers are also context managers, and call self.close()
   on exit.
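   
   A toy model of this protocol, for illustration only (``ToyReader``
   is hypothetical and reads from an in-memory list rather than a
   structure file):
   
   .. code-block:: python
   
      class ToyReader:
          """Hypothetical reader: an iterator that is also a context manager."""
      
          def __init__(self, items):
              self._items = iter(items)
              self.closed = False
      
          def __iter__(self):
              return self  # readers are their own iterators
      
          def __next__(self):
              if self.closed:
                  raise StopIteration
              return next(self._items)
      
          def close(self):
              self.closed = True
      
          def __enter__(self):
              return self
      
          def __exit__(self, exc_type, exc_value, traceback):
              self.close()
      
      
      # This toy yields (id, molecule) pairs, like the "And" readers.
      with ToyReader([("id1", "mol1"), ("id2", "mol2")]) as reader:
          is_own_iterator = iter(reader) is reader
          pairs = list(reader)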


  .. py:method:: close()

     Close the reader
     
     If the reader wasn't previously closed then close it. This will
     set the location properties to their final values, close any
     files that the reader may have opened, and set ``self.closed`` to True.


.. py:class:: MoleculeReader

   Read structures from a file and iterate over the toolkit molecules
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.base_toolkit.FormatMetadata` instance
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance
   
   .. py:attribute:: closed
   
      False if the reader is open, otherwise True
   
   Note: the toolkit implementation is free to reuse a molecule instead
   of returning a new one each time.


.. py:class:: IdAndMoleculeReader

   Read structures from a file and iterate over the (id, toolkit molecule) pairs
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.base_toolkit.FormatMetadata` instance
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance
   
   .. py:attribute:: closed
   
      False if the reader is open, otherwise True
   
   Note: the toolkit implementation is free to reuse a molecule
   instead of returning a new one each time.
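
The notes above about molecule reuse matter when molecules are kept
beyond one loop iteration. The sketch below uses a plain dictionary as
a stand-in for a toolkit molecule and a made-up ``reusing_reader``
generator; with a real toolkit, :func:`chemfp.toolkit.copy_molecule`
plays the role of ``dict(mol)``.

.. code-block:: python

   def reusing_reader(names):
       """Hypothetical reader that recycles one 'molecule' object."""
       mol = {}
       for name in names:
           mol["name"] = name  # overwrite the same object in place
           yield mol
   
   
   # Keeping the yielded objects stores the same object several times.
   kept = list(reusing_reader(["ethane", "propane"]))
   kept_names = [mol["name"] for mol in kept]
   
   # Copying each object before storing it keeps the records distinct.
   copied = [dict(mol) for mol in reusing_reader(["ethane", "propane"])]
   copied_names = [mol["name"] for mol in copied]

Here ``kept_names`` contains the last name twice, while
``copied_names`` preserves both names.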


.. py:class:: RecordReader

   Read and iterate over records as strings
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.base_toolkit.FormatMetadata` instance
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance
   
   .. py:attribute:: closed
   
      False if the reader is open, otherwise True


.. py:class:: IdAndRecordReader

   Read records from file and iterate over the (id, record string) pairs
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.base_toolkit.FormatMetadata` instance
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance
   
   .. py:attribute:: closed
   
      False if the reader is open, otherwise True


Toolkit writers
===============

The :func:`chemfp.toolkit.open_molecule_writer` function returns a
:class:`chemfp.base_toolkit.MoleculeWriter`, and
:func:`chemfp.toolkit.open_molecule_writer_to_string` returns a
:class:`chemfp.base_toolkit.MoleculeStringWriter`. The two classes
implement the :class:`chemfp.base_toolkit.BaseMoleculeWriter` API,
and MoleculeStringWriter also implements getvalue().


BaseMoleculeWriter
------------------

.. py:class:: BaseMoleculeWriter

   The base molecule writer API, implemented by :class:`MoleculeWriter` and :class:`MoleculeStringWriter`
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.base_toolkit.FormatMetadata` instance
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance
   
   .. py:attribute:: closed
   
      False if the writer is open, otherwise True
   
   The writer is a context manager, which calls self.close() when
   the manager exits.


  .. py:method:: write_molecule(mol)

     Write a toolkit molecule
     
     :param mol: the molecule to write
     :type mol: a toolkit molecule


  .. py:method:: write_molecules(mols)

     Write a sequence of molecules
     
     :param mols: the molecules to write
     :type mols: a toolkit molecule iterator


  .. py:method:: write_id_and_molecule(id, mol)

     Write an identifier and toolkit molecule
     
     If id is None then the output uses the molecule's own id/title.
     Specifying the id may modify the molecule's id/title, depending
     on the format and toolkit.
     
     :param id: the identifier to use for the molecule
     :type id: string, or None
     :param mol: the molecule to write
     :type mol: a toolkit molecule


  .. py:method:: write_ids_and_molecules(ids_and_mols)

     Write a sequence of (id, molecule) pairs
     
     This function works well with :func:`chemfp.toolkit.read_ids_and_molecules()`,
     for example, to convert an SD file to a SMILES file while using an
     *id_tag* to specify an alternative identifier.
     
     :param ids_and_mols: the identifiers and molecules to write
     :type ids_and_mols: a (id string, toolkit molecule) iterator


  .. py:method:: close()

     Close the writer
     
     If the writer wasn't previously closed then close it. This will
     set the location properties to their final values, close any
     files that the writer may have opened, and set ``self.closed`` to True.


.. py:class:: MoleculeWriter

   A BaseMoleculeWriter which writes molecules to a file.
   
   The public attributes are:
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.base_toolkit.FormatMetadata` instance
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance
   
   .. py:attribute:: closed
   
      False if the writer is open, otherwise True
   
   The writer is a context manager, which calls self.close() when
   the manager exits.


.. py:class:: MoleculeStringWriter

   A BaseMoleculeWriter which writes molecules to a string.
   
   This class implements the :class:`chemfp.base_toolkit.BaseMoleculeWriter` API.
   
   .. py:attribute:: metadata
   
      a :class:`chemfp.base_toolkit.FormatMetadata` instance
   
   .. py:attribute:: location
   
      a :class:`chemfp.io.Location` instance
   
   .. py:attribute:: closed
   
      False if the writer is open, otherwise True
   
   The writer is a context manager, which calls self.close() when
   the manager exits.


  .. py:method:: getvalue()

     Get the string containing all of the written records.
     
     This function can also be called after the writer is closed.
     
     :returns: a string


Format
------

.. py:class:: Format

   Information about a toolkit format.
   
   Use :func:`chemfp.toolkit.get_format` and related functions to return
   a Format instance.
   
   The public properties are:
   
   .. py:attribute:: toolkit_name
   
      the toolkit name; either "rdkit", "openeye", or "openbabel"
   
   .. py:attribute:: name
   
      the format name, without any compression information
   
   .. py:attribute:: compression
   
      the compression type: "" for uncompressed, "gz" for gzip
   
   .. py:attribute:: record_format
   
      the normalized record format name. All SMILES formats are "smi",
      and this does not contain compression information
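   
   The relationship between a format string, ``name`` and
   ``compression`` can be sketched as follows. ``split_format`` is a
   hypothetical helper, not part of chemfp, and it only knows the "gz"
   suffix listed above:
   
   .. code-block:: python
   
      def split_format(format_string):
          """Split a string like "sdf.gz" into (name, compression)."""
          if format_string.endswith(".gz"):
              return format_string[:-3], "gz"
          return format_string, ""
      
      
      sdf_gz = split_format("sdf.gz")
      smi = split_format("smi")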


  .. py:method:: __repr__()

     Return a string like 'Format("openeye/sdf.gz")'


  .. py:attribute:: Format.prefix

     Read-only attribute.

     Return the prefix to turn an unqualified parameter into a fully qualified parameter
     
     :returns: a string like "rdkit.smi" or "openbabel.sdf"


  .. py:attribute:: Format.is_input_format

     Read-only attribute.

     Return True if this toolkit can read molecules in this format


  .. py:attribute:: Format.is_output_format

     Read-only attribute.

     Return True if this toolkit can write molecules in this format


  .. py:attribute:: Format.is_available

     Read-only attribute.

     Return True if this version of the toolkit understands this format
     
     For example, if your version of RDKit does not support InChI then
     this would return False for the "inchi" and "inchikey" formats.


  .. py:attribute:: Format.supports_io

     Read-only attribute.

     Return True if this format supports reading or writing records
     
     This will return False for formats like "smistring" and "inchikeystring"
     because those are not record-based formats.
     
     Note: I don't like this name. I may change it to ``is_record_format``.
     Let me know if you have ideas, or if changing the name will be a problem.


  .. py:method:: get_reader_args_from_text_settings(reader_settings)

     Process the *reader_settings* and return the *reader_args* for this format.
     
     This function exists to help convert string settings, eg, from the
     command-line or a configuration, into usable *reader_args*.
     
     Setting names may be fully-qualified names like "rdkit.sdf.sanitize",
     partially qualified names like "rdkit.*.sanitize" or "openeye.smi.delimiter",
     or unqualified names like "delimiter". The qualifiers act as a namespace
     so the settings can be specified without needing to know the actual
     toolkit or format.
     
     The function turns the format-appropriate qualified names into unqualified
     ones and converts the string values into usable Python objects. For example:
     
       >>> from chemfp import rdkit_toolkit  as T
       >>> fmt = T.get_format("smi")
       >>> fmt.get_reader_args_from_text_settings({"rdkit.*.sanitize": "true", "delimiter": "to-eol"})
       {'delimiter': 'to-eol', 'sanitize': True}
     
     :param reader_settings: the reader settings
     :type reader_settings: a dictionary with string keys and values
     :returns: a dictionary of unqualified argument names as keys and processed Python values as values
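     
     The conversion can be pictured with a small stand-alone sketch. It
     is not chemfp's code: the ``CONVERTERS`` table, the helper names,
     and the simplified matching (no priority handling) are assumptions
     chosen to reproduce the example above.
     
     .. code-block:: python
     
        def to_bool(text):
            """Convert the strings "true"/"false" to a Python bool."""
            lowered = text.lower()
            if lowered in ("true", "false"):
                return lowered == "true"
            raise ValueError("not a boolean: %r" % (text,))
        
        
        # Hypothetical per-argument converters for one toolkit/format pair.
        CONVERTERS = {"sanitize": to_bool, "delimiter": str, "has_header": to_bool}
        
        
        def args_from_text_settings(settings, toolkit="rdkit", fmt="smi"):
            """Strip matching qualifiers and convert the string values."""
            prefixes = ("%s.%s." % (toolkit, fmt), "%s.*." % toolkit, "%s." % fmt)
            args = {}
            for name, text in settings.items():
                for prefix in prefixes:
                    if name.startswith(prefix):
                        name = name[len(prefix):]
                        break
                if name in CONVERTERS:  # skip settings for other toolkits/formats
                    args[name] = CONVERTERS[name](text)
            return args
        
        
        reader_args = args_from_text_settings({
            "rdkit.*.sanitize": "true",
            "delimiter": "to-eol",
            "openeye.smi.delimiter": "tab",  # belongs to another toolkit, ignored
        })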


  .. py:method:: get_writer_args_from_text_settings(writer_settings)

     Process *writer_settings* and return the *writer_args* for this format.
     
     This function exists to help convert string settings, eg, from the
     command-line or a configuration, into usable *writer_args*.
     
     Setting names may be fully-qualified names like "rdkit.sdf.kekulize",
     partially qualified names like "rdkit.*.delimiter" or "openeye.smi.delimiter",
     or unqualified names like "delimiter". The qualifiers act as a namespace
     so the settings can be specified without needing to know the actual
     toolkit or format.
     
     The function turns the format-appropriate qualified names into unqualified
     ones and converts the string values into usable Python objects. For example:
     
       >>> from chemfp import rdkit_toolkit  as T
       >>> fmt = T.get_format("smi")
       >>> fmt.get_writer_args_from_text_settings({"rdkit.*.kekuleSmiles": "true", "canonical": "false"})
       {'kekuleSmiles': True, 'canonical': False}
     
     :param writer_settings: the writer settings
     :type writer_settings: a dictionary with string keys and values
     :returns: a dictionary of unqualified argument names as keys and processed Python values as values


  .. py:method:: get_default_reader_args()

     Return a dictionary of the default reader arguments
     
     The keys are unqualified (ie, without dots).
     
       >>> from chemfp import openbabel_toolkit as T
       >>> fmt = T.get_format("smi")
       >>> fmt.get_default_reader_args()
       {'has_header': False, 'delimiter': None, 'options': None}
     
     :returns: a dictionary of string keys and Python objects for values


  .. py:method:: get_default_writer_args()

     Return a dictionary of the default writer arguments
     
     The keys are unqualified (ie, without dots).
     
       >>> from chemfp import openbabel_toolkit as T
       >>> fmt = T.get_format("smi")
       >>> fmt.get_default_writer_args()
       {'explicit_hydrogens': False, 'isomeric': True, 'delimiter': None,
       'options': None, 'canonicalization': 'default'}
     
     :returns: a dictionary of string keys and Python objects for values


  .. py:method:: get_unqualified_reader_args(reader_args)

     Convert possibly qualified reader args into unqualified reader args for this format
     
     The *reader_args* dictionary can be confusing because of the
     priority rules in how to resolve qualifiers, and because it
     can include irrelevant parameters, which are ignored.
     
     The get_unqualified_reader_args function applies the qualifier resolution
     algorithm and removes irrelevant parameters to return a dictionary
     containing the equivalent unqualified reader args dictionary for this format.
     
       >>> from chemfp import rdkit_toolkit as T
       >>> fmt = T.get_format("smi")
       >>> fmt.get_unqualified_reader_args({"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"})
       {'delimiter': 'tab', 'has_header': False, 'sanitize': False}
       >>> fmt = T.get_format("can")
       >>> fmt.get_unqualified_reader_args({"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"})
       {'delimiter': 'tab', 'has_header': False, 'sanitize': True}
     
     :param reader_args: reader arguments, which can contain qualified and unqualified arguments
     :type reader_args: a dictionary with string keys and Python values
     :returns: a dictionary of reader arguments, containing only unqualified arguments
       appropriate for this format.
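     
     One way to picture the resolution step is the stand-alone sketch
     below. The priority order (fully qualified, then ``toolkit.*``,
     then format-only, then unqualified) and the ``DEFAULTS`` table are
     assumptions made for the illustration, picked so that the results
     match the example above; consult the toolkit documentation for the
     actual rules.
     
     .. code-block:: python
     
        # Hypothetical defaults for a SMILES-like reader.
        DEFAULTS = {"delimiter": None, "has_header": False, "sanitize": True}
        
        
        def unqualified_args(args, toolkit, fmt, defaults=DEFAULTS):
            """Resolve qualified names for one toolkit/format pair."""
            resolved = dict(defaults)
            for name in defaults:
                # Candidate spellings, from most to least specific (assumed order).
                candidates = [
                    "%s.%s.%s" % (toolkit, fmt, name),
                    "%s.*.%s" % (toolkit, name),
                    "%s.%s" % (fmt, name),
                    name,
                ]
                for candidate in candidates:
                    if candidate in args:
                        resolved[name] = args[candidate]
                        break
            return resolved  # names not in defaults, like "X", are dropped
        
        
        user_args = {"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"}
        smi_args = unqualified_args(user_args, "rdkit", "smi")
        can_args = unqualified_args(user_args, "rdkit", "can")
     
     The "smi.sanitize" entry only applies to the "smi" format, so the
     "can" result keeps the default ``sanitize`` value.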


  .. py:method:: get_unqualified_writer_args(writer_args)

     Convert possibly qualified writer args into unqualified writer args for this format
     
     The *writer_args* dictionary can be confusing because of the
     priority rules in how to resolve qualifiers, and because it
     can include irrelevant parameters, which are ignored.
     
     The get_unqualified_writer_args function applies the qualifier resolution
     algorithm and removes irrelevant parameters to return a dictionary
     containing the equivalent unqualified writer args dictionary for this format.
     
       >>> from chemfp import rdkit_toolkit as T
       >>> fmt = T.get_format("smi")
       >>> fmt.get_unqualified_writer_args({"rdkit.*.delimiter": "tab", "smi.kekuleSmiles": True, "X": "Y"})
       {'isomericSmiles': True, 'delimiter': 'tab', 'kekuleSmiles': True, 'allBondsExplicit': False, 'canonical': True}
       >>> fmt = T.get_format("can")
       >>> fmt.get_unqualified_writer_args({"rdkit.*.delimiter": "tab", "smi.kekuleSmiles": True, "X": "Y"})
       {'isomericSmiles': False, 'delimiter': 'tab', 'kekuleSmiles': False, 'allBondsExplicit': False, 'canonical': True}
     
     :param writer_args: writer arguments, which can contain qualified and unqualified arguments
     :type writer_args: a dictionary with string keys and Python values
     :returns: a dictionary of writer arguments, containing only unqualified arguments
       appropriate for this format.


.. py:module:: chemfp.openbabel_toolkit

chemfp.openbabel_toolkit module
===============================

The chemfp toolkit layer for Open Babel.

.. _openbabel_toolkit.name:

name
----

.. py:attribute:: name

The string "openbabel".

.. _openbabel_toolkit.software:

software
--------

.. py:attribute:: software

A string like "OpenBabel/2.4.1", where the second part of the
string comes from OBReleaseVersion.


.. _openbabel_toolkit.is_licensed:

is_licensed (openbabel_toolkit)
-------------------------------

  .. py:function:: is_licensed()

     Return True - Open Babel is always licensed
     
     :returns: True


.. _openbabel_toolkit.get_formats:

get_formats (openbabel_toolkit)
-------------------------------

  .. py:function:: get_formats(include_unavailable=False)

     Get the list of structure formats that Open Babel supports
     
     If *include_unavailable* is True then also include Open Babel formats
     which aren't available to this specific version of Open Babel.
     
     :param include_unavailable: include unavailable formats?
     :type include_unavailable: True or False
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _openbabel_toolkit.get_input_formats:

get_input_formats (openbabel_toolkit)
-------------------------------------

  .. py:function:: get_input_formats()

     Get the list of supported Open Babel input formats
     
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _openbabel_toolkit.get_output_formats:

get_output_formats (openbabel_toolkit)
--------------------------------------

  .. py:function:: get_output_formats()

     Get the list of supported Open Babel output formats
     
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _openbabel_toolkit.get_format:

get_format (openbabel_toolkit)
------------------------------

  .. py:function:: get_format(format_name)

     Get the named format, or raise a ValueError
     
     This will raise a ValueError if Open Babel does not implement
     the format *format_name* or that format is not available.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openbabel_toolkit.get_input_format:

get_input_format (openbabel_toolkit)
------------------------------------

  .. py:function:: get_input_format(format_name)

     Get the named input format, or raise a ValueError
     
     This will raise a ValueError if Open Babel does not implement
     the format *format_name* or that format is not an input format.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openbabel_toolkit.get_output_format:

get_output_format (openbabel_toolkit)
-------------------------------------

  .. py:function:: get_output_format(format_name)

     Get the named output format, or raise a ValueError
     
     This will raise a ValueError if Open Babel does not implement
     the format *format_name* or that format is not an output format.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openbabel_toolkit.get_input_format_from_source:

get_input_format_from_source (openbabel_toolkit)
------------------------------------------------

  .. py:function:: get_input_format_from_source(source=None, format=None)

     Get the most appropriate format given the available source and format information
     
     If *format* is a :class:`chemfp.base_toolkit.Format` then
     return it. If it's a Format-like object with "name" and "compression"
     attributes use it to make a real Format object with the same
     attributes. If it's a string then use it to create a Format object.
     
     If *format* is None, use the *source* to auto-detect the format.
     If auto-detection is not possible, assume it's an uncompressed
     SMILES file.
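
     As a rough illustration of the fallback rule, the following stand-in
     guesses a (name, compression) pair from a filename. ``guess_format``, the
     set of recognized compression suffixes, and the use of an empty string
     for "uncompressed" are assumptions; chemfp's own detection (for example,
     how it treats file objects) may well differ.

     .. code-block:: python

        import os

        COMPRESSIONS = ("gz", "zst")  # assumed set of compression suffixes

        def guess_format(source=None):
            """Guess (format name, compression) from a filename, else SMILES."""
            if not isinstance(source, str):
                # stdin (None) or an unnamed file object: nothing to inspect.
                return ("smi", "")
            base, ext = os.path.splitext(os.path.basename(source))
            compression = ""
            if ext[1:].lower() in COMPRESSIONS:
                compression = ext[1:].lower()
                base, ext = os.path.splitext(base)
            name = ext[1:].lower()
            if not name:
                # No format extension: fall back to SMILES, keeping any
                # compression suffix that was recognized (an assumption).
                return ("smi", compression)
            return (name, compression)

        print(guess_format("ligands.sdf.gz"))
        print(guess_format("README"))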
     
     :param source: the structure data source.
     :type source: a filename (as a string), a file object, or None to read from stdin
     :param format: format information, if known.
     :type format: a Format(-like) object, string, or None
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openbabel_toolkit.get_output_format_from_destination:

get_output_format_from_destination (openbabel_toolkit)
------------------------------------------------------

  .. py:function:: get_output_format_from_destination(destination=None, format=None)

     Get the most appropriate format given the available destination and format information
     
     If *format* is a :class:`chemfp.base_toolkit.Format` then
     return it. If it's a Format-like object with "name" and "compression"
     attributes use it to make a real Format object with the same
     attributes. If it's a string then use it to create a Format object.
     
     If *format* is None, use the *destination* to auto-detect the format.
     If auto-detection is not possible, assume it's an uncompressed
     SMILES file.
     
     :param destination: the structure data destination.
     :type destination: a filename (as a string), a file object, or None to write to stdout
     :param format: format information, if known.
     :type format: a Format(-like) object, string, or None
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openbabel_toolkit.read_molecules:

read_molecules (openbabel_toolkit)
----------------------------------

  .. py:function:: read_molecules(source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict")

     Return an iterator that reads OBMol molecules from a structure file
     
     Iterate through the *format* structure records in *source*. If *format*
     is None then auto-detect the format based on the *source*. For SD files,
     use *id_tag* to get the record id from the given SD tag instead of the
     title line. (read_molecules() will ignore the *id_tag*. It exists to
     make it easier to switch between reader functions.)
     
     Note: the reader will clear and reuse the OBMol instance. Make a copy
     if you want to keep the molecule around.
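
     The reuse pitfall is easy to hit when collecting molecules into a list.
     The sketch below imitates it with a plain Python list standing in for the
     shared OBMol; ``reusing_reader`` is hypothetical and no toolkit is
     involved. With real OBMol objects,
     :func:`chemfp.openbabel_toolkit.copy_molecule` would play the role that
     ``list(mol)`` plays here.

     .. code-block:: python

        def reusing_reader(records):
            """Yield the same list object for every record, like a reused OBMol."""
            mol = []
            for record in records:
                del mol[:]            # clear the shared object
                mol.extend(record)    # refill it with the new record's "atoms"
                yield mol

        records = ["CO", "CCO"]

        # Wrong: both entries are the same object, holding only the last record.
        kept_references = list(reusing_reader(records))

        # Right: copy each item before the reader advances.
        kept_copies = [list(mol) for mol in reusing_reader(records)]

        print(kept_references)
        print(kept_copies)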
     
     The *reader_args* dictionary parameters depend on the format. Every
     Open Babel format supports an "options" entry, which is passed to
     SetOptions(). See that documentation for details. Some formats support
     additional parameters:
     
     * SMILES and InChI
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
       * has_header - True or False
     
     * SDF
     
       * implementation - if "openbabel" or None, use the Open Babel record parser;
         if "chemfp", use chemfp's own record parser, which has better location tracking
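
     The meaning of each SMILES ``delimiter`` name is not spelled out here, so
     the following sketch shows one plausible reading: ``"tab"`` and
     ``"space"`` split on that single character, ``"to-eol"`` takes everything
     after the first run of whitespace as the id, and None splits on any
     whitespace. ``split_smiles_record`` is a hypothetical helper, and these
     semantics are assumptions rather than a statement of what chemfp or
     Open Babel do.

     .. code-block:: python

        def split_smiles_record(line, delimiter=None):
            """Split one SMILES line into (smiles, id) under assumed rules."""
            line = line.rstrip("\r\n")
            if delimiter in ("tab", "\t"):
                fields = line.split("\t")
            elif delimiter in ("space", " "):
                fields = line.split(" ")
            elif delimiter == "to-eol":
                # The id is the rest of the line after the first whitespace run.
                fields = line.split(None, 1)
            elif delimiter is None:
                fields = line.split()
            else:
                raise ValueError("unsupported delimiter: %r" % (delimiter,))
            smiles = fields[0] if fields else ""
            record_id = fields[1] if len(fields) > 1 else None
            return (smiles, record_id)

        print(split_smiles_record("CCO ethyl alcohol\n", "to-eol"))
        print(split_smiles_record("CCO ethyl alcohol\n"))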
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
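
     The three modes can be pictured with a toolkit-free sketch. ``parse_all``
     is a hypothetical helper that applies any parsing callable to a list of
     records; the wording of the stderr message is made up, and only the
     control flow mirrors the description above.

     .. code-block:: python

        import sys

        def parse_all(records, parse, errors="strict"):
            """Return parse(record) for each record, handling failures per *errors*."""
            if errors not in ("strict", "report", "ignore"):
                raise ValueError("unsupported errors value: %r" % (errors,))
            parsed = []
            for recno, record in enumerate(records, 1):
                try:
                    value = parse(record)
                except ValueError as err:
                    if errors == "strict":
                        raise                      # stop at the first bad record
                    if errors == "report":
                        sys.stderr.write("ERROR: %s (record %d). Skipping.\n" % (err, recno))
                    continue                       # "report" and "ignore" move on
                parsed.append(value)
            return parsed

        print(parse_all(["1", "x", "3"], int, errors="ignore"))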
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If None
     then a default Location will be created.
     
     See :func:`chemfp.openbabel_toolkit.read_ids_and_molecules` if you want
     (id, OBMol) pairs instead of just the molecules.
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeReader` iterating OBMol molecules


.. _openbabel_toolkit.read_molecules_from_string:

read_molecules_from_string (openbabel_toolkit)
----------------------------------------------

  .. py:function:: read_molecules_from_string(content, format, id_tag=None, reader_args=None, errors="strict", location=None)

     Return an iterator that reads OBMol molecules from a string containing structure records
     
     *content* is a string containing 0 or more records in the format *format*. See
     :func:`chemfp.openbabel_toolkit.read_molecules` for details about the other
     parameters. See :func:`chemfp.openbabel_toolkit.read_ids_and_molecules_from_string`
     if you want to read (id, OBMol) pairs instead of just molecules.
     
     Note: the reader will clear and reuse the OBMol instance. Make a
     copy if you want to keep the molecule around.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeReader` iterating OBMol molecules


.. _openbabel_toolkit.read_ids_and_molecules:

read_ids_and_molecules (openbabel_toolkit)
------------------------------------------

  .. py:function:: read_ids_and_molecules(source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict")

     Return an iterator that reads (id, OBMol molecule) pairs from a structure file
     
     See :func:`chemfp.openbabel_toolkit.read_molecules` for full parameter details.
     The major difference is that this returns an iterator of (id, OBMol)
     pairs instead of just the molecules.
     
     Note: the reader will clear and reuse the OBMol instance. Make a
     copy if you want to keep the molecule around.
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndMoleculeReader` iterating (id, OBMol) pairs


.. _openbabel_toolkit.read_ids_and_molecules_from_string:

read_ids_and_molecules_from_string (openbabel_toolkit)
------------------------------------------------------

  .. py:function:: read_ids_and_molecules_from_string(content, format, id_tag=None, reader_args=None, errors="strict", location=None)

     Return an iterator that reads (id, OBMol) pairs from a string containing structure records
     
     *content* is a string containing 0 or more records in the format *format*. See
     :func:`chemfp.openbabel_toolkit.read_molecules` for details about the other
     parameters. See :func:`chemfp.openbabel_toolkit.read_molecules_from_string`
     if you just want to read the OBMol molecules instead of (id, OBMol) pairs.
     
     Note: the reader will clear and reuse the OBMol instance. Make a
     copy if you want to keep the molecule around.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndMoleculeReader` iterating (id, OBMol) pairs


.. _openbabel_toolkit.make_id_and_molecule_parser:

make_id_and_molecule_parser (openbabel_toolkit)
-----------------------------------------------

  .. py:function:: make_id_and_molecule_parser(format, id_tag=None, reader_args=None, errors="strict")

     Create a specialized function which takes a record and returns an (id, OBMol) pair
     
     The returned function is optimized for reading many records from individual
     strings because it only does parameter validation once. The function
     will reuse the OBMol for successive calls, so make a copy if you want
     to keep it around. However, I haven't really noticed much of a performance
     difference between this and :func:`chemfp.openbabel_toolkit.parse_id_and_molecule`
     so I suggest you use that function directly instead of making a specialized function.
     (Let me know if making a specialized function is useful.)
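
     The "validate once, call many times" idea is an ordinary closure factory.
     The stand-in below returns (id, SMILES string) pairs instead of
     (id, OBMol) pairs; ``make_parser`` and its checks are hypothetical and
     only illustrate the pattern.

     .. code-block:: python

        def make_parser(errors="strict"):
            """Validate *errors* once, then return a cheap per-record parser."""
            if errors not in ("strict", "report", "ignore"):
                raise ValueError("unsupported errors value: %r" % (errors,))
            strict = (errors == "strict")

            def parser(record):
                # No argument checking here; that was done in make_parser().
                fields = record.split()
                if not fields:
                    if strict:
                        raise ValueError("empty record")
                    return (None, None)
                smiles = fields[0]
                record_id = fields[1] if len(fields) > 1 else None
                return (record_id, smiles)

            return parser

        parser = make_parser()
        pairs = [parser(line) for line in ["CCO ethanol\n", "c1ccccc1 benzene\n"]]
        print(pairs)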
     
     See :func:`chemfp.openbabel_toolkit.read_molecules` for details about the
     other parameters.
     
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a function of the form ``parser(record string) -> (id, OBMol)``


.. _openbabel_toolkit.parse_molecule:

parse_molecule (openbabel_toolkit)
----------------------------------

  .. py:function:: parse_molecule(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first structure record from the *content* string and return an OBMol molecule.
     
     *content* is a string containing a single structure record in format *format*.
     (Additional records are ignored). See :func:`chemfp.openbabel_toolkit.read_molecules`
     for details about the other parameters. See :func:`chemfp.openbabel_toolkit.parse_id_and_molecule`
     if you want the (id, OBMol) pair instead of just the molecule.
     
     :param content: the string containing a structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: an OBMol molecule


.. _openbabel_toolkit.parse_id_and_molecule:

parse_id_and_molecule (openbabel_toolkit)
-----------------------------------------

  .. py:function:: parse_id_and_molecule(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first structure record from *content* and return the (id, OBMol) pair.
     
     *content* is a string containing a single structure record in format *format*.
     (Additional records are ignored). See :func:`chemfp.openbabel_toolkit.read_molecules`
     for details about the other parameters. See :func:`chemfp.openbabel_toolkit.parse_molecule`
     if you just want the OBMol molecule and not the (id, OBMol) pair.
     
     :param content: the string containing a structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: an (id, OBMol molecule) pair


.. _openbabel_toolkit.create_string:

create_string (openbabel_toolkit)
---------------------------------

  .. py:function:: create_string(mol, format, id=None, writer_args=None, errors="strict")

     Convert an OBMol into a structure record in the given format as a Unicode string
     
     If *id* is not None then use it instead of the molecule's own
     title. Warning: this may briefly modify the molecule, so may not
     be thread-safe.
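
     One way a caller might guard against that brief modification is to
     serialize access with a lock and restore the title afterwards. The sketch
     uses a hypothetical ``FakeMol`` class in place of an OBMol and a fixed
     SMILES string in place of real output; it is not how chemfp implements
     ``create_string``, and the lock only helps if every thread that touches
     the molecule uses it.

     .. code-block:: python

        import threading

        class FakeMol(object):
            """Minimal stand-in with the two OBMol title methods."""
            def __init__(self, title):
                self._title = title
            def GetTitle(self):
                return self._title
            def SetTitle(self, title):
                self._title = title

        _title_lock = threading.Lock()

        def record_with_id(mol, id=None):
            """Format a one-line record, temporarily swapping in *id*."""
            with _title_lock:
                original = mol.GetTitle()
                if id is not None:
                    mol.SetTitle(id)        # the molecule is briefly modified here
                try:
                    return "CCO %s\n" % mol.GetTitle()
                finally:
                    mol.SetTitle(original)  # always restore the caller's title

        mol = FakeMol("ethanol")
        print(repr(record_with_id(mol, id="CHEBI:16236")))
        print(mol.GetTitle())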
     
     :param mol: the molecule to use for the output
     :type mol: an Open Babel molecule
     :param format: the output structure format
     :type format: a format name string, or Format object
     :param id: an alternate record id
     :type id: a string, or None to use the molecule's own id
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a Unicode string


.. _openbabel_toolkit.create_bytes:

create_bytes (openbabel_toolkit)
--------------------------------

  .. py:function:: create_bytes(mol, format, id=None, writer_args=None, errors="strict", level=None)

     Convert an OBMol into a structure record in the given format as a byte string
     
     If *id* is not None then use it instead of the molecule's own
     title. Warning: this may briefly modify the molecule, so may not
     be thread-safe.
     
     :param mol: the molecule to use for the output
     :type mol: an Open Babel molecule
     :param format: the output structure format
     :type format: a format name string, or Format object
     :param id: an alternate record id
     :type id: a string, or None to use the molecule's own id
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param level: compression level to use for compressed formats
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a byte string


.. _openbabel_toolkit.open_molecule_writer:

open_molecule_writer (openbabel_toolkit)
----------------------------------------

  .. py:function:: open_molecule_writer(destination=None, format=None, writer_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict", level=None)

     Return a MoleculeWriter which can write Open Babel molecules to a destination.
     
     A :class:`chemfp.base_toolkit.MoleculeWriter` has the methods ``write_molecule``,
     ``write_molecules``, and ``write_ids_and_molecules``, which are ways to write
     an OBMol molecule, an OBMol molecule iterator, or an (id, OBMol molecule) pair
     iterator to a file.
     
     Molecules are written to *destination*. The output format can be a
     string like "sdf.gz" or "smi", a :class:`chemfp.base_toolkit.Format`,
     or Format-like object with "name" and "compression" attributes, or None
     to auto-detect based on the *destination*. If auto-detection is not
     possible, the output will be written as uncompressed SMILES.
     
     The *writer_args* dictionary parameters depend on the format. Every
     format supports an ``options`` entry, which is passed to Open Babel's
     ``SetOptions()``. See the Open Babel documentation for details. Some
     formats support additional parameters:
     
     * SMILES
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
       * isomeric - True to write isomeric SMILES, False or default is non-isomeric
       * canonicalization - True, "default", or None uses Open Babel's own canonicalization
         algorithm; False or "none" to use no canonicalization; "universal" generates a
         universal SMILES; "anticanonical" generates a SMILES with randomly assigned
         atom classes; "inchified" uses InChI-fied SMILES
     
     * InChI and InChIKey
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
       * include_id - True or default to include the id as the second column; False has no id column
     
     * SDF
     
       * always_v3000 - True to always write V3000 files; False or default to
         write V3000 files only if needed.
       * include_atom_class - True to include atom class; False or default does not
       * include_hcount - True to include hcount; False or default does not
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If
     None then a default Location will be created.
     
     :param destination: the structure destination
     :type destination: a filename, file object, or None to write to stdout
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param level: compression level to use for compressed formats (does not affect Open Babel)
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a :class:`chemfp.base_toolkit.MoleculeWriter` expecting Open Babel molecules


.. _openbabel_toolkit.open_molecule_writer_to_string:

open_molecule_writer_to_string (openbabel_toolkit)
--------------------------------------------------

  .. py:function:: open_molecule_writer_to_string(format, writer_args=None, errors="strict", location=None)

     Return a MoleculeStringWriter which can write Open Babel molecule records to a string.
     
     See :func:`chemfp.openbabel_toolkit.open_molecule_writer` for full
     parameter details.
     
     Use the writer's :meth:`chemfp.base_toolkit.MoleculeStringWriter.getvalue`
     to get the output as a Unicode string.
     
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeStringWriter` expecting Open Babel molecules


.. _openbabel_toolkit.open_molecule_writer_to_bytes:

open_molecule_writer_to_bytes (openbabel_toolkit)
-------------------------------------------------

  .. py:function:: open_molecule_writer_to_bytes(format, writer_args=None, errors="strict", location=None, level=None)

     Return a MoleculeStringWriter which can write Open Babel molecule records to a byte string
     
     See :func:`chemfp.openbabel_toolkit.open_molecule_writer` for full
     parameter details.
     
     Use the writer's :meth:`chemfp.base_toolkit.MoleculeStringWriter.getvalue`
     to get the output as a byte string.
     
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param level: compression level to use for compressed formats (does not affect Open Babel)
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a :class:`chemfp.base_toolkit.MoleculeStringWriter` expecting Open Babel molecules


.. _openbabel_toolkit.copy_molecule:

copy_molecule (openbabel_toolkit)
---------------------------------

  .. py:function:: copy_molecule(mol)

     Return a new OBMol molecule which is a copy of the given Open Babel molecule
     
     :param mol: the molecule to copy
     :type mol: an Open Babel molecule
     :returns: a new OBMol instance


.. _openbabel_toolkit.add_tag:

add_tag (openbabel_toolkit)
---------------------------

  .. py:function:: add_tag(mol, tag, value)

     Add an SD tag value to the Open Babel molecule
     
     Raises a KeyError if the tag is a special internal Open Babel name.
     
     :param mol: the molecule
     :type mol: an Open Babel molecule
     :param tag: the SD tag name
     :type tag: string
     :param value: the text for the tag
     :type value: string
     :returns: None


.. _openbabel_toolkit.get_tag:

get_tag (openbabel_toolkit)
---------------------------

  .. py:function:: get_tag(mol, tag)

     Get the named SD tag value, or None if it doesn't exist
     
     :param mol: the molecule
     :type mol: an Open Babel molecule
     :param tag: the SD tag name
     :type tag: string
     :returns: a string, or None


.. _openbabel_toolkit.get_tag_pairs:

get_tag_pairs (openbabel_toolkit)
---------------------------------

  .. py:function:: get_tag_pairs(mol)

     Get a list of all SD tag (name, value) pairs for the molecule
     
     :param mol: the molecule
     :type mol: an Open Babel molecule
     :returns: a list of (string name, string value) pairs


.. _openbabel_toolkit.get_id:

get_id (openbabel_toolkit)
--------------------------

  .. py:function:: get_id(mol)

     Get the molecule's id using Open Babel's GetTitle()
     
     :param mol: the molecule
     :type mol: an Open Babel molecule
     :returns: a string


.. _openbabel_toolkit.set_id:

set_id (openbabel_toolkit)
--------------------------

  .. py:function:: set_id(mol, id)

     Set the molecule's id using Open Babel's SetTitle()
     
     :param mol: the molecule
     :type mol: an Open Babel molecule
     :param id: the new id
     :type id: string
     :returns: None

.. py:module:: chemfp.openeye_toolkit

chemfp.openeye_toolkit module
=============================


The chemfp toolkit layer for OpenEye.

.. _openeye_toolkit.name:

name
----

.. py:attribute:: name

The string "openeye".

.. _openeye_toolkit.software:

software
--------

.. py:attribute:: software

A string like "OEChem/20170208", where the second part of the string
comes from OEChemGetVersion().


.. _openeye_toolkit.is_licensed:

is_licensed (openeye_toolkit)
-----------------------------

  .. py:function:: is_licensed()

     Return True if the OEChem toolkit license is valid, otherwise False.
     
     This does not check if the OEGraphSim license is valid. I haven't
     yet figured out how I want to handle that distinction. In the meantime
     you'll need to use the OEChem API yourself.
     
     :returns: True or False


.. _openeye_toolkit.get_formats:

get_formats (openeye_toolkit)
-----------------------------

  .. py:function:: get_formats(include_unavailable=False)

     Get the list of structure formats that OEChem supports
     
     If *include_unavailable* is True then also include OEChem formats
     which aren't available to this specific version of OEChem.
     
     :param include_unavailable: include unavailable formats?
     :type include_unavailable: True or False
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _openeye_toolkit.get_input_formats:

get_input_formats (openeye_toolkit)
-----------------------------------

  .. py:function:: get_input_formats()

     Get the list of supported OEChem input formats
     
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _openeye_toolkit.get_output_formats:

get_output_formats (openeye_toolkit)
------------------------------------

  .. py:function:: get_output_formats()

     Get the list of supported OEChem output formats
     
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _openeye_toolkit.get_format:

get_format (openeye_toolkit)
----------------------------

  .. py:function:: get_format(format)

     Get the named format, or raise a ValueError
     
     This will raise a ValueError if OEChem does not implement
     the format named *format* or that format is not available.
     
     :param format: the format name
     :type format: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openeye_toolkit.get_input_format:

get_input_format (openeye_toolkit)
----------------------------------

  .. py:function:: get_input_format(format)

     Get the named input format, or raise a ValueError
     
     This will raise a ValueError if OEChem does not implement
     the format named *format* or that format is not an input format.
     
     :param format: the format name
     :type format: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openeye_toolkit.get_output_format:

get_output_format (openeye_toolkit)
-----------------------------------

  .. py:function:: get_output_format(format)

     Get the named output format, or raise a ValueError
     
     This will raise a ValueError if OEChem does not implement
     the format named *format* or that format is not an output format.
     
     :param format: the format name
     :type format: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openeye_toolkit.get_input_format_from_source:

get_input_format_from_source (openeye_toolkit)
----------------------------------------------

  .. py:function:: get_input_format_from_source(source=None, format=None)

     Get the most appropriate format given the available source and format information
     
     If *format* is a :class:`chemfp.base_toolkit.Format` then
     return it. If it's a Format-like object with "name" and "compression"
     attributes use it to make a real Format object with the same
     attributes. If it's a string then use it to create a Format object.
     
     If *format* is None, use the *source* to auto-detect the format.
     If auto-detection is not possible, assume it's an uncompressed
     SMILES file.
     
     :param source: the structure data source.
     :type source: a filename (as a string), a file object, or None to read from stdin
     :param format: format information, if known.
     :type format: a Format(-like) object, string, or None
     :returns: a :class:`chemfp.base_toolkit.Format` object
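     
     The auto-detection fallback can be pictured with the following
     stand-alone sketch. It is not the chemfp implementation. The function
     ``guess_format`` and the two extension tables are hypothetical, and a
     real toolkit recognizes many more formats:

     .. code-block:: python

        import os

        # Hypothetical tables, used only for this illustration.
        KNOWN_FORMATS = {"smi", "sdf", "mol2", "pdb"}
        KNOWN_COMPRESSIONS = {"gz", "zst"}

        def guess_format(source):
            """Return a (name, compression) guess for a filename, or for None."""
            if source is not None:
                parts = os.path.basename(source).lower().split(".")[1:]
                compression = ""
                if parts and parts[-1] in KNOWN_COMPRESSIONS:
                    compression = parts.pop()
                if parts and parts[-1] in KNOWN_FORMATS:
                    return parts[-1], compression
            # Auto-detection was not possible: assume uncompressed SMILES.
            return "smi", ""

        detected = guess_format("abc.sdf.gz")
        fallback = guess_format(None)

     Here ``detected`` is the pair for a gzip-compressed SD file, while
     ``fallback`` is the uncompressed SMILES default used when there is no
     filename to inspect.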


.. _openeye_toolkit.get_output_format_from_destination:

get_output_format_from_destination (openeye_toolkit)
----------------------------------------------------

  .. py:function:: get_output_format_from_destination(destination=None, format=None)

     Get the most appropriate format given the available destination and format information
     
     If *format* is a :class:`chemfp.base_toolkit.Format` then
     return it. If it's a Format-like object with "name" and "compression"
     attributes use it to make a real Format object with the same
     attributes. If it's a string then use it to create a Format object.
     
     If *format* is None, use the *destination* to auto-detect the format.
     If auto-detection is not possible, assume it's an uncompressed
     SMILES file.
     
     :param destination: the structure data destination.
     :type destination: a filename (as a string), a file object, or None to write to stdout
     :param format: format information, if known.
     :type format: a Format(-like) object, string, or None
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _openeye_toolkit.read_molecules:

read_molecules (openeye_toolkit)
--------------------------------

  .. py:function:: read_molecules(source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict")

     Return an iterator that reads OEGraphMol molecules from a structure file
     
     Iterate through the *format* structure records in *source*. If *format*
     is None then auto-detect the format based on the *source*. For SD files,
     use *id_tag* to get the record id from the given SD tag instead of the
     title line. (read_molecules() will ignore the *id_tag*. It exists to
     make it easier to switch between reader functions.)
     
     Note: the reader will clear and reuse the OEGraphMol instance. Make a
     copy if you want to keep the molecule around.
     
     The *reader_args* dictionary parameters depend on the format. Every
     OEChem format supports:
     
     * aromaticity - one of "default", "openeye", "daylight", "tripos", "mdl", "mmff", or None
     * flavor - a number, string-encoded number, or flavor string
     
     A "flavor string" is a "|" or "," separated list of format-specific
     flavor terms. It can be as simple as "Default", or a more complex string
     like "Default|-ENDM|DELPHI" which for the PDB reader starts with
     the default settings, removes the ENDM flavor, and adds the CHARGE
     and RADIUS flavors.
     
     The supported input flavor terms for each format are:
     
     * SMILES - Canon, Strict, Default
     * sdf - Default
     * skc - Default
     * mol2, mol2h - M2H, Default
     * mmod - FormalCrg, Default
     * pdb - ALL, ALTLOC, BondOrder, CHARGE, Connect, DATA, DELPHI, END, ENDM,
       FORMALCHARGE, FormalCrg, ImplicitH, RADIUS, Rings, SecStruct, TER, TerMask, Default
     * xyz - BondOrder, Connect, FormalCrg, ImplicitH, Rings, Default
     * cdx - SuperAtoms, Default
     * oeb - Default
     
     You can also pass in a numeric value like 123 or a numeric string like "0".
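     
     As a rough illustration of how a flavor value could be reduced to a
     bitmask, here is a stand-alone sketch. The function ``parse_flavor``
     and the bit values in ``PDB_FLAVORS`` are made up for this example and
     are not the OEChem constants:

     .. code-block:: python

        import re

        # Hypothetical bit values for a few PDB input flavor terms.
        PDB_FLAVORS = {
            "Default": 0b00111,
            "ENDM":    0b00100,
            "CHARGE":  0b01000,
            "RADIUS":  0b10000,
            "DELPHI":  0b11000,   # CHARGE plus RADIUS, as described above
        }

        def parse_flavor(flavor, table):
            """Convert an int, numeric string, or flavor string to an int."""
            if isinstance(flavor, int):
                return flavor
            if flavor.isdigit():
                return int(flavor)
            value = 0
            for term in re.split(r"[|,]", flavor):
                term = term.strip()
                if term.startswith("-"):
                    value &= ~table[term[1:]]   # remove these bits
                else:
                    value |= table[term]        # add these bits
            return value

        flags = parse_flavor("Default|-ENDM|DELPHI", PDB_FLAVORS)

     With these made-up values, ``flags`` has the ENDM bit cleared and the
     CHARGE and RADIUS bits set.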
     
     In addition, the SMILES record readers have limited support for the
     "delimiter" reader_arg:
     
     * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
     
     Note: the first whitespace after the SMILES string will always be
     treated as a delimiter.
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If None
     then a default Location will be created.
     
     See :func:`chemfp.openeye_toolkit.read_ids_and_molecules` if you want
     (id, OEGraphMol) pairs instead of just the molecules.
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader parameters passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeReader` iterating OEGraphMol molecules
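     
     The note about the reused OEGraphMol is easy to overlook. The
     following stand-alone sketch shows the effect with a stand-in class
     instead of a real molecule. ``FakeMol`` and ``reusing_reader`` are
     hypothetical names used only for this illustration:

     .. code-block:: python

        import copy

        class FakeMol:
            """Stand-in for a toolkit molecule; only the title matters here."""
            def __init__(self):
                self.title = ""

        def reusing_reader(titles):
            """Yield the same molecule object each time, refilled per record."""
            mol = FakeMol()
            for title in titles:
                mol.title = title   # overwrite the previous record
                yield mol

        # Keeping the yielded objects stores one instance three times.
        kept = list(reusing_reader(["a", "b", "c"]))
        kept_titles = [mol.title for mol in kept]

        # Copying inside the loop keeps each record.
        copies = [copy.copy(mol) for mol in reusing_reader(["a", "b", "c"])]
        copied_titles = [mol.title for mol in copies]

     Every entry of ``kept_titles`` is the last title, while
     ``copied_titles`` preserves all three. With real OEGraphMol objects,
     :func:`chemfp.openeye_toolkit.copy_molecule` plays the role of
     ``copy.copy``.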


.. _openeye_toolkit.read_molecules_from_string:

read_molecules_from_string (openeye_toolkit)
--------------------------------------------

  .. py:function:: read_molecules_from_string(content, format, id_tag=None, reader_args=None, errors="strict", location=None)

     Return an iterator that reads molecules from a string containing structure records
     
     *content* is a string containing 0 or more records in the format *format*. See
     :func:`chemfp.openeye_toolkit.read_molecules` for details about the other
     parameters.  See :func:`chemfp.openeye_toolkit.read_ids_and_molecules_from_string`
     if you want to read (id, OEGraphMol) pairs instead of just molecules.
     
     Note: the reader will clear and reuse the OEGraphMol instance. Make a
     copy if you want to keep the molecule around.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeReader` iterating OEGraphMol molecules
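     
     The three *errors* modes can be sketched with a generic record loop.
     This is an illustration of the documented behavior, not the chemfp
     implementation. ``parse_records`` is a hypothetical helper, and it
     treats a ValueError from ``parse`` as the record failure:

     .. code-block:: python

        import sys

        def parse_records(records, parse, errors="strict"):
            """Yield parse(record) for each record, per the *errors* mode."""
            if errors not in ("strict", "report", "ignore"):
                raise ValueError("unsupported errors value: %r" % (errors,))
            for recno, record in enumerate(records, 1):
                try:
                    result = parse(record)
                except ValueError as err:
                    if errors == "strict":
                        raise
                    if errors == "report":
                        sys.stderr.write("record %d: %s\n" % (recno, err))
                    continue   # "report" and "ignore" go to the next record
                yield result

        values = list(parse_records(["1", "x", "3"], int, errors="ignore"))

     The bad middle record is skipped, so ``values`` contains only the two
     records which parsed.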


.. _openeye_toolkit.read_ids_and_molecules:

read_ids_and_molecules (openeye_toolkit)
----------------------------------------

  .. py:function:: read_ids_and_molecules(source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict")

     Return an iterator that reads (id, OEGraphMol molecule) pairs from a structure file
     
     See :func:`chemfp.openeye_toolkit.read_molecules` for full parameter details.
     The major difference is that this returns an iterator of (id, OEGraphMol)
     pairs instead of just the molecules.
     
     Note: the reader will clear and reuse the OEGraphMol instance. Make a
     copy if you want to keep the molecule around.
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndMoleculeReader` iterating (id, OEGraphMol) pairs
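     
     One way to avoid the reuse issue is to convert each molecule to a
     string before advancing the reader. The sketch below assumes that
     "smistring" is an available output format name, which is not stated
     in this section. ``ids_to_smiles`` is a hypothetical helper which
     takes the toolkit module as a parameter:

     .. code-block:: python

        def ids_to_smiles(toolkit, source, format=None):
            """Return a dictionary mapping each record id to a SMILES string."""
            result = {}
            for id, mol in toolkit.read_ids_and_molecules(source, format):
                # Convert right away; the molecule object may be reused
                # for the next record.
                result[id] = toolkit.create_string(mol, "smistring")
            return result

     With chemfp and OEChem installed, this might be called as
     ``ids_to_smiles(chemfp.openeye_toolkit, "input.sdf")``.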


.. _openeye_toolkit.read_ids_and_molecules_from_string:

read_ids_and_molecules_from_string (openeye_toolkit)
----------------------------------------------------

  .. py:function:: read_ids_and_molecules_from_string(content, format, id_tag=None, reader_args=None, errors="strict", location=None)

     Return an iterator that reads (id, OEGraphMol) pairs from a string containing structure records
     
     *content* is a string containing 0 or more records in the format *format*. See
     :func:`chemfp.openeye_toolkit.read_molecules` for details about the other
     parameters. See :func:`chemfp.openeye_toolkit.read_molecules_from_string`
     if you just want to read the OEGraphMol molecules instead of (id, OEGraphMol) pairs.
     
     Note: the reader will clear and reuse the OEGraphMol instance. Make a
     copy if you want to keep the molecule around.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndMoleculeReader` iterating (id, OEGraphMol) pairs


.. _openeye_toolkit.make_id_and_molecule_parser:

make_id_and_molecule_parser (openeye_toolkit)
---------------------------------------------

  .. py:function:: make_id_and_molecule_parser(format, id_tag=None, reader_args=None, errors="strict")

     Create a specialized function which takes a record and returns an (id, OEGraphMol) pair
     
     The returned function is optimized for reading many records from individual
     strings because it only does parameter validation once. The function
     will reuse the OEGraphMol for successive calls, so make a copy if you want
     to keep it around. However, I haven't really noticed much of a performance
     difference between this and :func:`chemfp.openeye_toolkit.parse_id_and_molecule`
     so I suggest you use that function directly instead of making a specialized function.
     (Let me know if making a specialized function is useful.)
     
     See :func:`chemfp.openeye_toolkit.read_molecules` for details about the
     other parameters.
     
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a function of the form ``parser(record string) -> (id, OEGraphMol)``
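     
     The "validate once, then return a specialized function" idea can be
     sketched with a closure. This stand-alone example is not the chemfp
     implementation. ``make_parser`` is hypothetical, it only handles
     space-delimited SMILES records, and it returns the SMILES string in
     place of a molecule:

     .. code-block:: python

        def make_parser(format, errors="strict"):
            """Check the arguments once, then return a per-record function."""
            if format != "smi":
                raise ValueError("this sketch only handles 'smi' records")
            if errors not in ("strict", "report", "ignore"):
                raise ValueError("unsupported errors value: %r" % (errors,))

            def parser(record):
                # No argument checking here; it was done by make_parser().
                smiles, _, id = record.rstrip("\n").partition(" ")
                return id, smiles

            return parser

        parse = make_parser("smi")
        pairs = [parse(record) for record in ["C methane\n", "O water\n"]]

     Each call to ``parse`` skips the argument checks, which is the only
     saving the specialized function offers.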


.. _openeye_toolkit.parse_molecule:

parse_molecule (openeye_toolkit)
--------------------------------

  .. py:function:: parse_molecule(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first structure record from the *content* string and return an OEGraphMol molecule.
     
     *content* is a string containing a single structure record in format *format*.
     (Additional records are ignored). See :func:`chemfp.openeye_toolkit.read_molecules`
     for details about the other parameters. See :func:`chemfp.openeye_toolkit.parse_id_and_molecule`
     if you want the (id, OEGraphMol) pair instead of just the molecule.
     
     :param content: the string containing a structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: an OEGraphMol molecule


.. _openeye_toolkit.parse_id_and_molecule:

parse_id_and_molecule (openeye_toolkit)
---------------------------------------

  .. py:function:: parse_id_and_molecule(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first structure record from *content* and return the (id, OEGraphMol) pair.
     
     *content* is a string containing a single structure record in format *format*.
     (Additional records are ignored). See :func:`chemfp.openeye_toolkit.read_molecules`
     for details about the other parameters. See :func:`chemfp.openeye_toolkit.parse_molecule`
     if you just want the OEGraphMol molecule and not the (id, OEGraphMol) pair.
     
     :param content: the string containing a structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: an (id, OEGraphMol molecule) pair


.. _openeye_toolkit.create_string:

create_string (openeye_toolkit)
-------------------------------

  .. py:function:: create_string(mol, format, id=None, writer_args=None, errors="strict")

     Convert an OEChem molecule into a structure record in the given format as a Unicode string
     
     If *id* is not None then use it instead of the molecule's own
     title. Warning: this may briefly modify the molecule, so may not
     be thread-safe.
     
     :param mol: the molecule to use for the output
     :type mol: an OEChem molecule
     :param format: the output structure format
     :type format: a format name string, or Format object
     :param id: an alternate record id
     :type id: a string, or None to use the molecule's own id
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a string


.. _openeye_toolkit.create_bytes:

create_bytes (openeye_toolkit)
------------------------------

  .. py:function:: create_bytes(mol, format, id=None, writer_args=None, errors="strict", level=None)

     Convert an OEChem molecule into a structure record in the given format as a byte string
     
     If *id* is not None then use it instead of the molecule's own
     title. Warning: this may briefly modify the molecule, so may not
     be thread-safe.
     
     :param mol: the molecule to use for the output
     :type mol: an OEChem molecule
     :param format: the output structure format
     :type format: a format name string, or Format object
     :param id: an alternate record id
     :type id: a string, or None to use the molecule's own id
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param level: compression level to use for compressed formats
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a byte string


.. _openeye_toolkit.open_molecule_writer:

open_molecule_writer (openeye_toolkit)
--------------------------------------

  .. py:function:: open_molecule_writer(destination=None, format=None, writer_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict", level=None)

     Return a MoleculeWriter which can write OEChem molecules to a destination.
     
     A :class:`chemfp.base_toolkit.MoleculeWriter` has the methods ``write_molecule``,
     ``write_molecules``, and ``write_ids_and_molecules``, which are ways to write
     an OEChem molecule, an OEChem molecule iterator, or an (id, OEChem molecule) pair
     iterator to a file.
     
     Molecules are written to *destination*. The output format can be a
     string like "sdf.gz" or "smi", a :class:`chemfp.base_toolkit.Format`,
     or Format-like object with "name" and "compression" attributes, or None
     to auto-detect based on the *destination*. If auto-detection is not
     possible, the output will be written as uncompressed SMILES.
     
     The *writer_args* dictionary parameters depend on the format. Every
     OEChem format supports:
     
       * aromaticity - one of "default", "openeye", "daylight", "tripos", "mdl", "mmff", or None
       * flavor - a number, string-encoded number, or flavor string
     
     A "flavor string" is a "|" or "," separated list of format-specific
     flavor terms. It can be as simple as "Default", or a more complex
     string like "DEFAULT|-AtomStereo|-BondStereo|Canonical" to generate a
     canonical SMILES string without stereo information.
     
     The supported output flavor terms for each format are:
     
     * SMILES - AtomMaps, AtomStereo, BondStereo, Canonical, ExtBonds, Hydrogens,
       ImpHCount, Isotopes, Kekule, RGroups, SuperAtoms
     * sdf - CurrentParity, MCHG, MDLParity, MISO, MRGP, MV30, NoParity, Default
     * mol2, mol2h - AtomNames, AtomTypeNames, BondTypeNames, Hydrogens, OrderAtoms, Substructure, Default
     * sln - Default
     * pdb - BONDS, BOTH, CHARGE, CurrentResidues, DELPHI, ELEMENT, FORMALCHARGE,
       FormalCrg, HETBONDS, NoResidues, OEResidues, ORDERS, OrderAtoms, RADIUS, TER, Default
     * xyz - Charges, Symbols, Default
     * cdx - Default
     * mopac - CHARGES, XYZ, Default
     * mf - Title, Default
     * oeb - Default
     * inchi, inchikey - Chiral, FixedHLayer, Hydrogens, ReconnectedMetals, Stereo,
       RelativeStereo, RacemicStereo, Default
     
     You can also pass in a numeric value like 123 or a numeric string like "0".
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If
     None then a default Location will be created.
     
     :param destination: the structure destination
     :type destination: a filename, file object, or None to write to stdout
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer parameters passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param level: compression level to use for compressed formats (does not affect OEChem)
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a :class:`chemfp.base_toolkit.MoleculeWriter` expecting OEChem molecules
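     
     A minimal sketch of a file conversion built from the reader and
     writer functions. ``convert`` is a hypothetical helper which takes the
     toolkit module as a parameter. It assumes the writer has a ``close()``
     method, which is not described in this section:

     .. code-block:: python

        def convert(toolkit, source, destination):
            """Copy the structures in *source* to *destination*.

            Both formats are auto-detected from the filenames.
            """
            reader = toolkit.read_molecules(source)
            writer = toolkit.open_molecule_writer(destination)
            try:
                # Molecules are written one at a time, so a reader which
                # reuses its molecule object is not a problem here.
                writer.write_molecules(reader)
            finally:
                writer.close()   # assumed method; finishes the output

     With chemfp and OEChem installed, this might be called as
     ``convert(chemfp.openeye_toolkit, "input.sdf.gz", "output.smi")``.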


.. _openeye_toolkit.open_molecule_writer_to_string:

open_molecule_writer_to_string (openeye_toolkit)
------------------------------------------------

  .. py:function:: open_molecule_writer_to_string(format, writer_args=None, errors="strict", location=None)

     Return a MoleculeStringWriter which can write OEChem molecule records to a Unicode string.
     
     See :func:`chemfp.openeye_toolkit.open_molecule_writer` for full
     parameter details.
     
     Use the writer's :meth:`chemfp.base_toolkit.MoleculeStringWriter.getvalue`
     to get the output string as a Unicode string.
     
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeStringWriter` expecting OEChem molecules
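     
     A minimal sketch of collecting several records into one string.
     ``records_to_string`` is a hypothetical helper which takes the toolkit
     module as a parameter. It assumes the writer has a ``close()`` method
     and that ``getvalue()`` may be called after closing, neither of which
     is stated in this section:

     .. code-block:: python

        def records_to_string(toolkit, id_mol_pairs, format="sdf"):
            """Return the given (id, molecule) pairs as one string of records."""
            writer = toolkit.open_molecule_writer_to_string(format)
            writer.write_ids_and_molecules(id_mol_pairs)
            writer.close()   # assumed method; finish before reading the value
            return writer.getvalue()

     The pairs can come directly from
     :func:`chemfp.openeye_toolkit.read_ids_and_molecules`.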


.. _openeye_toolkit.open_molecule_writer_to_bytes:

open_molecule_writer_to_bytes (openeye_toolkit)
-----------------------------------------------

  .. py:function:: open_molecule_writer_to_bytes(format, writer_args=None, errors="strict", location=None, level=None)

     Return a MoleculeStringWriter which can write OEChem molecule records to a byte string.
     
     See :func:`chemfp.openeye_toolkit.open_molecule_writer` for full
     parameter details.
     
     Use the writer's :meth:`chemfp.base_toolkit.MoleculeStringWriter.getvalue`
     to get the output string as a byte string.
     
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param level: compression level to use for compressed formats (does not affect OEChem)
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a :class:`chemfp.base_toolkit.MoleculeStringWriter` expecting OEChem molecules


.. _openeye_toolkit.copy_molecule:

copy_molecule (openeye_toolkit)
-------------------------------

  .. py:function:: copy_molecule(mol)

     Return a new OEGraphMol which is a copy of the given OEChem molecule
     
     :param mol: the molecule to copy
     :type mol: an OEChem molecule
     :returns: a new OEGraphMol instance


.. _openeye_toolkit.add_tag:

add_tag (openeye_toolkit)
-------------------------

  .. py:function:: add_tag(mol, tag, value)

     Add an SD tag value to the OEChem molecule
     
     :param mol: the molecule
     :type mol: an OEChem molecule
     :param tag: the SD tag name
     :type tag: string
     :param value: the text for the tag
     :type value: string
     :returns: None


.. _openeye_toolkit.get_tag:

get_tag (openeye_toolkit)
-------------------------

  .. py:function:: get_tag(mol, tag)

     Get the named SD tag value, or None if it doesn't exist
     
     :param mol: the molecule
     :type mol: an OEChem molecule
     :param tag: the SD tag name
     :type tag: string
     :returns: a string, or None


.. _openeye_toolkit.get_tag_pairs:

get_tag_pairs (openeye_toolkit)
-------------------------------

  .. py:function:: get_tag_pairs(mol)

     Get a list of all SD tag (name, value) pairs for the molecule
     
     :param mol: the molecule
     :type mol: an OEChem molecule
     :returns: a list of (string name, string value) pairs
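     
     The tag functions combine naturally. The following sketch copies SD
     tags from one molecule to another. ``copy_tags`` is a hypothetical
     helper which takes the toolkit module as a parameter:

     .. code-block:: python

        def copy_tags(toolkit, from_mol, to_mol, skip=()):
            """Copy SD tags between molecules, except tag names in *skip*."""
            for tag, value in toolkit.get_tag_pairs(from_mol):
                if tag not in skip:
                    toolkit.add_tag(to_mol, tag, value)

     With chemfp and OEChem installed, this might be called as
     ``copy_tags(chemfp.openeye_toolkit, mol1, mol2, skip=["SMILES"])``.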


.. _openeye_toolkit.get_id:

get_id (openeye_toolkit)
------------------------

  .. py:function:: get_id(mol)

     Get the molecule's id using OEChem's GetTitle()
     
     :param mol: the molecule
     :type mol: an OEChem molecule
     :returns: a string


.. _openeye_toolkit.set_id:

set_id (openeye_toolkit)
------------------------

  .. py:function:: set_id(mol, id)

     Set the molecule's id using OEChem's SetTitle()
     
     :param mol: the molecule
     :type mol: an OEChem molecule
     :param id: the new id
     :type id: string
     :returns: None


.. py:module:: chemfp.rdkit_toolkit

chemfp.rdkit_toolkit module
===========================


The chemfp toolkit layer for RDKit.

.. _rdkit_toolkit.name:

name
----

.. py:attribute:: name

The string "rdkit".

.. _rdkit_toolkit.software:

software
--------

.. py:attribute:: software

A string like "RDKit/2016.09.3", where the second part of
the string comes from rdkit.rdBase.rdkitVersion.
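
A small sketch of splitting such a string into its two parts, using the
example value from the text above. The variable names are illustrative only:

.. code-block:: python

   software = "RDKit/2016.09.3"   # example value from the text above
   toolkit_name, _, version = software.partition("/")
   # Turn "2016.09.3" into a tuple of integers for comparisons.
   version_info = tuple(int(part) for part in version.split("."))

The tuple form makes version checks such as ``version_info >= (2016, 9)``
straightforward, provided every dotted part is a plain number.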


.. _rdkit_toolkit.is_licensed:

is_licensed (rdkit_toolkit)
---------------------------

  .. py:function:: is_licensed()

     Return True - RDKit is always licensed
     
     :returns: True


.. _rdkit_toolkit.get_formats:

get_formats (rdkit_toolkit)
---------------------------

  .. py:function:: get_formats(include_unavailable=False)

     Get the list of structure formats that RDKit supports
     
     If *include_unavailable* is True then also include RDKit formats
     which aren't available to this specific version of RDKit, such
     as the InChI formats if your RDKit installation wasn't compiled
     with InChI support.
     
     :param include_unavailable: include unavailable formats?
     :type include_unavailable: True or False
     :returns: a list of Format objects


.. _rdkit_toolkit.get_input_formats:

get_input_formats (rdkit_toolkit)
---------------------------------

  .. py:function:: get_input_formats()

     Get the list of supported RDKit input formats
     
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _rdkit_toolkit.get_output_formats:

get_output_formats (rdkit_toolkit)
----------------------------------

  .. py:function:: get_output_formats()

     Get the list of supported RDKit output formats
     
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _rdkit_toolkit.get_format:

get_format (rdkit_toolkit)
--------------------------

  .. py:function:: get_format(format)

     Get the named format, or raise a ValueError
     
     This will raise a ValueError if RDKit does not implement
     the format *format_name* or that format is not available.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _rdkit_toolkit.get_input_format:

get_input_format (rdkit_toolkit)
--------------------------------

  .. py:function:: get_input_format(format)

     Get the named input format, or raise a ValueError
     
     This will raise a ValueError if RDKit does not implement
     the format *format_name* or that format is not an input format.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _rdkit_toolkit.get_output_format:

get_output_format (rdkit_toolkit)
---------------------------------

  .. py:function:: get_output_format(format)

     Get the named output format, or raise a ValueError
     
     This will raise a ValueError if RDKit does not implement
     the format *format_name* or that format is not an output format.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _rdkit_toolkit.get_input_format_from_source:

get_input_format_from_source (rdkit_toolkit)
--------------------------------------------

  .. py:function:: get_input_format_from_source(source=None, format=None)

     Get the most appropriate format given the available source and format information
     
     If *format* is a :class:`chemfp.base_toolkit.Format` then
     return it. If it's a Format-like object with "name" and "compression"
     attributes use it to make a real Format object with the same
     attributes. If it's a string then use it to create a Format object.
     
     If *format* is None, use the *source* to auto-detect the format.
     If auto-detection is not possible, assume it's an uncompressed
     SMILES file.
     
     :param source: the structure data source.
     :type source: a filename (as a string), a file object, or None to read from stdin
     :param format: format information, if known.
     :type format: a Format(-like) object, string, or None
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _rdkit_toolkit.get_output_format_from_destination:

get_output_format_from_destination (rdkit_toolkit)
--------------------------------------------------

  .. py:function:: get_output_format_from_destination(destination=None, format=None)

     Get the most appropriate format given the available destination and format information
     
     If *format* is a :class:`chemfp.base_toolkit.Format` then
     return it. If it's a Format-like object with "name" and "compression"
     attributes use it to make a real Format object with the same
     attributes. If it's a string then use it to create a Format object.
     
     If *format* is None, use the *destination* to auto-detect the format.
     If auto-detection is not possible, assume it's an uncompressed
     SMILES file.
     
     :param destination: the structure data destination.
     :type destination: a filename (as a string), a file object, or None to write to stdout
     :param format: format information, if known.
     :type format: a Format(-like) object, string, or None
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _rdkit_toolkit.read_molecules:

read_molecules (rdkit_toolkit)
------------------------------

  .. py:function:: read_molecules(source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict")

     Return an iterator that reads RDKit molecules from a structure file
     
     Iterate through the *format* structure records in *source*. If *format*
     is None then auto-detect the format based on the *source*. For SD files,
     use *id_tag* to get the record id from the given SD tag instead of the
     title line. (read_molecules() will ignore the *id_tag*. It exists to
     make it easier to switch between reader functions.)
     
     Note: the reader returns a new RDKit molecule each time.
     
     The *reader_args* dictionary parameters depend on the format. These include:
     
     * SMILES
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
       * has_header - True or False
       * sanitize - True or default sanitizes; False for unsanitized processing
     
     * InChI
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
       * sanitize - True or default sanitizes; False for unsanitized processing
       * removeHs - True or default removes explicit hydrogens; False leaves them in the structure
       * logLevel - an integer log level
       * treatWarningAsError - True raises an exception on error; False or default keeps processing
       
     * SDF
     
       * sanitize - True or default sanitizes; False for unsanitized processing
       * removeHs - True or default removes explicit hydrogens; False leaves them in the structure
       * strictParsing - True or default for strict parsing; False for lenient parsing
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If None
     then a default Location will be created.
     
     See :func:`chemfp.rdkit_toolkit.read_ids_and_molecules` if you want (id, molecule)
     pairs instead of just the molecules.
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader parameters passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeReader` iterating RDKit molecules
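
The three *errors* modes described above can be pictured with a small
standalone sketch. This is not chemfp's implementation: ``iter_parsed`` is a
hypothetical helper, and parsing integers with ``int`` stands in for parsing
structure records.

```python
import sys

def iter_parsed(records, parse, errors="strict"):
    """Yield parse(record) for each record, applying an error policy.

    "strict" re-raises the first failure, "report" writes a message to
    stderr and skips the record, and "ignore" skips the record silently.
    """
    if errors not in ("strict", "report", "ignore"):
        raise ValueError("Unsupported errors value: %r" % (errors,))
    for recno, record in enumerate(records, 1):
        try:
            value = parse(record)
        except ValueError as err:
            if errors == "strict":
                raise
            if errors == "report":
                sys.stderr.write("ERROR: %s, record #%d. Skipping.\n" % (err, recno))
            continue
        yield value

records = ["1", "two", "3"]
ignored = list(iter_parsed(records, int, errors="ignore"))
print(ignored)  # prints [1, 3]
```

With ``errors="strict"`` the same input raises a ValueError at the second
record, so nothing after it is read.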


.. _rdkit_toolkit.read_molecules_from_string:

read_molecules_from_string (rdkit_toolkit)
------------------------------------------

  .. py:function:: read_molecules_from_string(content, format, id_tag=None, reader_args=None, errors="strict", location=None)

     Return an iterator that reads RDKit molecules from a string containing structure records
     
     *content* is a string containing 0 or more records in the format *format*. See
     :func:`chemfp.rdkit_toolkit.read_molecules` for details about the other
     parameters.  See :func:`chemfp.rdkit_toolkit.read_ids_and_molecules_from_string`
     if you want to read (id, RDKit molecule) pairs instead of just molecules.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeReader` iterating RDKit molecules


.. _rdkit_toolkit.read_ids_and_molecules:

read_ids_and_molecules (rdkit_toolkit)
--------------------------------------

  .. py:function:: read_ids_and_molecules(source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict")

     Return an iterator that reads (id, RDKit molecule) pairs from a structure file
     
     See :func:`chemfp.rdkit_toolkit.read_molecules` for full parameter details.
     The major difference is that this returns an iterator of (id, RDKit molecule)
     pairs instead of just the molecules.
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndMoleculeReader` iterating (id, RDKit molecule) pairs


.. _rdkit_toolkit.read_ids_and_molecules_from_string:

read_ids_and_molecules_from_string (rdkit_toolkit)
--------------------------------------------------

  .. py:function:: read_ids_and_molecules_from_string(content, format, id_tag=None, reader_args=None, errors="strict", location=None)

     Return an iterator that reads (id, RDKit molecule) pairs from a string containing structure records
     
     *content* is a string containing 0 or more records in the format *format*. See
     :func:`chemfp.rdkit_toolkit.read_molecules` for details about the other
     parameters. See :func:`chemfp.rdkit_toolkit.read_molecules_from_string`
     if you just want to read the RDKit molecules instead of (id, molecule) pairs.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndMoleculeReader` iterating (id, RDKit molecule) pairs


.. _rdkit_toolkit.make_id_and_molecule_parser:

make_id_and_molecule_parser (rdkit_toolkit)
-------------------------------------------

  .. py:function:: make_id_and_molecule_parser(format, id_tag=None, reader_args=None, errors="strict")

     Create a specialized function which takes a record and returns an (id, RDKit molecule) pair
     
     The returned function is optimized for reading many records from individual
     strings because it only does parameter validation once. However, I haven't
     really noticed much of a performance difference between this and
     :func:`chemfp.rdkit_toolkit.parse_id_and_molecule`, so I suggest you use
     that function directly instead of making a specialized function.
     (Let me know if making a specialized function is useful.)
     
     See :func:`chemfp.rdkit_toolkit.read_molecules` for details about the
     other parameters.
     
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a function of the form ``parser(record string) -> (id, RDKit molecule)``
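
The "validate once, then reuse" idea behind a specialized parser can be
sketched without chemfp. The function below is hypothetical and only handles
a simplified SMILES line; its delimiter table is an assumption made for this
illustration, not the toolkit's real one.

```python
def make_id_and_smiles_parser(delimiter=None):
    """Validate *delimiter* once, then return a per-record parser."""
    # Hypothetical, simplified delimiter table for this sketch only.
    separators = {None: None, "tab": "\t", "space": " ", "\t": "\t", " ": " "}
    if delimiter not in separators:
        raise ValueError("Unsupported delimiter: %r" % (delimiter,))
    sep = separators[delimiter]

    def parser(record):
        # No parameter validation here; that cost was paid once, above.
        fields = record.rstrip("\r\n").split(sep, 2)
        if not fields or not fields[0]:
            raise ValueError("Missing SMILES")
        smiles = fields[0]
        record_id = fields[1] if len(fields) > 1 else None
        return record_id, smiles

    return parser

parse = make_id_and_smiles_parser("tab")
pairs = [parse(line) for line in ["CCO\tethanol\n", "c1ccccc1\tbenzene\n"]]
print(pairs)  # prints [('ethanol', 'CCO'), ('benzene', 'c1ccccc1')]
```

The closure keeps the validated separator, so each call only splits the record.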


.. _rdkit_toolkit.parse_molecule:

parse_molecule (rdkit_toolkit)
------------------------------

  .. py:function:: parse_molecule(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first structure record from the *content* string and return an RDKit molecule.
     
     *content* is a string containing a single structure record in format *format*.
     (Additional records are ignored). See :func:`chemfp.rdkit_toolkit.read_molecules`
     for details about the other parameters. See :func:`chemfp.rdkit_toolkit.parse_id_and_molecule`
     if you want the (id, RDKit molecule) pair instead of just the molecule.
     
     :param content: the string containing a structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: an RDKit molecule


.. _rdkit_toolkit.parse_id_and_molecule:

parse_id_and_molecule (rdkit_toolkit)
-------------------------------------

  .. py:function:: parse_id_and_molecule(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first structure record from *content* and return the (id, RDKit molecule) pair.
     
     *content* is a string containing a single structure record in format *format*.
     (Additional records are ignored). See :func:`chemfp.rdkit_toolkit.read_molecules`
     for details about the other parameters. See :func:`chemfp.rdkit_toolkit.parse_molecule`
     if you just want the RDKit molecule and not the (id, RDKit molecule) pair.
     
     :param content: the string containing a structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: an (id, RDKit molecule) pair


.. _rdkit_toolkit.create_string:

create_string (rdkit_toolkit)
-----------------------------

  .. py:function:: create_string(mol, format, id=None, writer_args=None, errors="strict")

     Convert an RDKit molecule into a structure record in the given format as a Unicode string
     
     If *id* is not None then use it instead of the molecule's own
     title. Warning: this may briefly modify the molecule, so may not
     be thread-safe.
     
     :param mol: the molecule to use for the output
     :type mol: an RDKit molecule
     :param format: the output structure format
     :type format: a format name string, or Format object
     :param id: an alternate record id
     :type id: a string, or None to use the molecule's own id
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a Unicode string


.. _rdkit_toolkit.create_bytes:

create_bytes (rdkit_toolkit)
----------------------------

  .. py:function:: create_bytes(mol, format, id=None, writer_args=None, errors="strict", level=None)

     Convert an RDKit molecule into a structure record in the given format as a byte string
     
     If *id* is not None then use it instead of the molecule's own
     title. Warning: this may briefly modify the molecule, so may not
     be thread-safe.
     
     :param mol: the molecule to use for the output
     :type mol: an RDKit molecule
     :param format: the output structure format
     :type format: a format name string, or Format object
     :param id: an alternate record id
     :type id: a string, or None to use the molecule's own id
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param level: compression level to use for compressed formats
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a byte string
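
Conceptually, the byte-string form of a record is the encoded text, compressed
when the format asks for it. The sketch below shows that idea with the standard
library only. ``record_to_bytes`` is a hypothetical helper, and the mapping of
'min', 'default', and 'max' to gzip levels is an assumption made for this
example, not chemfp's documented behavior.

```python
import gzip

# Assumed mapping from the symbolic level names to gzip levels.
_GZIP_LEVELS = {"min": 1, "default": 6, "max": 9}

def record_to_bytes(record, compression="", level=None, encoding="utf8"):
    """Encode a text record and optionally gzip-compress it."""
    data = record.encode(encoding)
    if compression == "":
        return data
    if compression != "gz":
        raise ValueError("This sketch only supports gzip compression")
    if level is None:
        level = "default"
    # Translate a symbolic name; pass an integer level through unchanged.
    compresslevel = _GZIP_LEVELS.get(level, level)
    if not isinstance(compresslevel, int) or not 1 <= compresslevel <= 9:
        raise ValueError("Unsupported compression level: %r" % (level,))
    return gzip.compress(data, compresslevel=compresslevel)

smiles_record = "c1ccccc1O phenol\n"
raw = record_to_bytes(smiles_record)
packed = record_to_bytes(smiles_record, compression="gz", level="max")
```

Decompressing ``packed`` and decoding it as UTF-8 gives back the original record text.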


.. _rdkit_toolkit.open_molecule_writer:

open_molecule_writer (rdkit_toolkit)
------------------------------------

  .. py:function:: open_molecule_writer(destination=None, format=None, writer_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict", level=None)

     Return a MoleculeWriter which can write RDKit molecules to a destination.
     
     A :class:`chemfp.base_toolkit.MoleculeWriter` has the methods ``write_molecule``,
     ``write_molecules``, and ``write_ids_and_molecules``, which are ways to write
     an RDKit molecule, an RDKit molecule iterator, or an (id, RDKit molecule) pair
     iterator to a file.
     
     Molecules are written to *destination*. The output format can be a
     string like "sdf.gz" or "smi", a :class:`chemfp.base_toolkit.Format`,
     or Format-like object with "name" and "compression" attributes, or None
     to auto-detect based on the *destination*. If auto-detection is not
     possible, the output will be written as uncompressed SMILES.
     
     The *writer_args* dictionary parameters depend on the format. These include:
     
     * SMILES
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
       * isomericSmiles - True to generate isomeric SMILES
       * kekuleSmiles - True to generate SMILES in Kekule form
       * canonical - True to generate a canonical SMILES
       * allBondsExplicit - True to write explicit '-' and ':' bonds, even if they can be inferred; default is False
       * allHsExplicit - True to write explicit hydrogen counts; default is False
       * cxsmiles - True to include CXSMILES annotations; default is False
     
     * InChI and InChIKey
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
       * include_id - True or default to include the id as the second column; False has no id column
       * options - an options string passed to the underlying InChI library
       * logLevel - an integer log level
       * treatWarningAsError - True raises an exception on error; False or default keeps processing
     
     * SDF
     
       * includeStereo - True includes stereo information; False or default does not
       * kekulize - True or default creates the connection table with bonds in Kekule form
       * v3k - True to always export in V3000 format
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If
     None then a default Location will be created.
     
     :param destination: the structure destination
     :type destination: a filename, file object, or None to write to stdout
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer parameters passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param level: compression level to use for compressed formats
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a :class:`chemfp.base_toolkit.MoleculeWriter` expecting RDKit molecules


.. _rdkit_toolkit.open_molecule_writer_to_string:

open_molecule_writer_to_string (rdkit_toolkit)
----------------------------------------------

  .. py:function:: open_molecule_writer_to_string(format, writer_args=None, errors="strict", location=None)

     Return a MoleculeStringWriter which can write molecule records in the given format to a string.
     
     See :func:`chemfp.rdkit_toolkit.open_molecule_writer` for full
     parameter details.
     
     Use the writer's :meth:`chemfp.base_toolkit.MoleculeStringWriter.getvalue`
     to get the output as a Unicode string.
     
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeStringWriter` expecting RDKit molecules


.. _rdkit_toolkit.open_molecule_writer_to_bytes:

open_molecule_writer_to_bytes (rdkit_toolkit)
---------------------------------------------

  .. py:function:: open_molecule_writer_to_bytes(format, writer_args=None, errors="strict", location=None, level=None)

     Return a MoleculeStringWriter which can write molecule records in the given format to a byte string.
     
     See :func:`chemfp.rdkit_toolkit.open_molecule_writer` for full
     parameter details.
     
     Use the writer's :meth:`chemfp.base_toolkit.MoleculeStringWriter.getvalue`
     to get the output as a byte string.
     
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param level: compression level to use for compressed formats
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a :class:`chemfp.base_toolkit.MoleculeStringWriter` expecting RDKit molecules


.. _rdkit_toolkit.copy_molecule:

copy_molecule (rdkit_toolkit)
-----------------------------

  .. py:function:: copy_molecule(mol)

     Return a new RDKit molecule which is a copy of the given molecule
     
     :param mol: the molecule to copy
     :type mol: an RDKit molecule
     :returns: a new RDKit Mol instance


.. _rdkit_toolkit.add_tag:

add_tag (rdkit_toolkit)
-----------------------

  .. py:function:: add_tag(mol, tag, value)

     Add an SD tag value to the RDKit molecule
     
     :param mol: the molecule
     :type mol: an RDKit molecule
     :param tag: the SD tag name
     :type tag: string
     :param value: the text for the tag
     :type value: string
     :returns: None


.. _rdkit_toolkit.get_tag:

get_tag (rdkit_toolkit)
-----------------------

  .. py:function:: get_tag(mol, tag)

     Get the named SD tag value, or None if it doesn't exist
     
     :param mol: the molecule
     :type mol: an RDKit molecule
     :param tag: the SD tag name
     :type tag: string
     :returns: a string, or None


.. _rdkit_toolkit.get_tag_pairs:

get_tag_pairs (rdkit_toolkit)
-----------------------------

  .. py:function:: get_tag_pairs(mol)

     Get a list of all SD tag (name, value) pairs for the molecule
     
     :param mol: the molecule
     :type mol: an RDKit molecule
     :returns: a list of (string name, string value) pairs


.. _rdkit_toolkit.get_id:

get_id (rdkit_toolkit)
----------------------

  .. py:function:: get_id(mol)

     Get the molecule's id from RDKit's _Name property
     
     :param mol: the molecule
     :type mol: an RDKit molecule
     :returns: a string


.. _rdkit_toolkit.set_id:

set_id (rdkit_toolkit)
----------------------

  .. py:function:: set_id(mol, id)

     Set the molecule's id as RDKit's _Name property
     
     :param mol: the molecule
     :type mol: an RDKit molecule
     :param id: the new id
     :type id: string
     :returns: None


.. py:module:: chemfp.text_toolkit

chemfp.text_toolkit module
==========================

The text_toolkit implements the chemfp toolkit API, but its
"molecules" are simple TextRecord instances which store the
records as text strings. It does not use a back-end chemistry toolkit,
and it cannot convert between different chemistry representations.

The TextRecord is a base class. The actual records depend on the
format, and will be one of:

* :class:`.SDFRecord`
* :class:`.SmiRecord`
* :class:`.CanRecord`
* :class:`.UsmRecord`
* :class:`.SmiStringRecord`
* :class:`.CanStringRecord`
* :class:`.UsmStringRecord`

The text toolkit will let you "convert" between the different SMILES
formats, but it doesn't actually change the SMILES string. The SMILES
records have the attributes ``id``, ``record`` and ``smiles``.

The toolkit also knows a bit about the SD format. The SDF records have
the attributes ``id``, ``id_bytes`` and ``record``, and there are
methods to get SD tag values and add a tag to the end of the tag data
block.
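
To make "SD tag" concrete, here is a minimal sketch that pulls the
(name, value) pairs out of one SD record held as a string. It is not how the
text toolkit is implemented. ``get_sd_tag_pairs`` is a hypothetical function,
and it assumes the usual layout: a ``> <NAME>`` header line, the value lines,
then a blank line.

```python
import re

# A data header line starts with ">" and carries the tag name in <...>.
_TAG_HEADER = re.compile(r"^>.*?<([^>]+)>")

def get_sd_tag_pairs(record):
    """Return the (tag name, value) pairs found in one SD record string."""
    pairs = []
    lines = iter(record.splitlines())
    # Skip the connection table, which ends at the "M  END" line.
    for line in lines:
        if line.startswith("M  END"):
            break
    for line in lines:
        if line.startswith("$$$$"):
            break
        m = _TAG_HEADER.match(line)
        if m is None:
            continue
        # Collect value lines up to the blank line that ends the data item.
        value_lines = []
        for value_line in lines:
            if not value_line.strip() or value_line.startswith("$$$$"):
                break
            value_lines.append(value_line)
        pairs.append((m.group(1), "\n".join(value_lines)))
    return pairs

record = (
    "methane\n"
    "  sketch\n"
    "\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
    "> <NAME>\n"
    "methane\n"
    "\n"
    "> <MW>\n"
    "16.04\n"
    "\n"
    "$$$$\n"
)
print(get_sd_tag_pairs(record))  # prints [('NAME', 'methane'), ('MW', '16.04')]
```

Adding a tag "to the end of the tag data block" then amounts to inserting a
new header, value, and blank line just before the ``$$$$`` line.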

The text_toolkit also supports a few SDF-specific I/O functions to read
SDF records directly as a string instead of wrapped in a TextRecord.

The record types also have the attributes ``encoding`` and
``encoding_errors`` which affect how the record bytes are parsed.

.. _text_toolkit.name:

name
----

.. py:attribute:: name

The string "text"

.. _text_toolkit.software:

software
--------

.. py:attribute:: software

A string like "chemfp/3.0".


.. _text_toolkit.is_licensed:

is_licensed (text_toolkit)
--------------------------

  .. py:function:: is_licensed()

     Return True - chemfp's text toolkit is always licensed
     
     :returns: True


.. _text_toolkit.get_formats:

get_formats (text_toolkit)
--------------------------

  .. py:function:: get_formats(include_unavailable=False)

     Get the list of structure formats that chemfp's text toolkit supports
     
     This version of chemfp will always support the structure formats
     available to chemfp so 'include_unavailable' does not affect anything.
     (It may affect other toolkits.)
     
     :param include_unavailable: include unavailable formats?
     :type include_unavailable: True or False
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _text_toolkit.get_input_formats:

get_input_formats (text_toolkit)
--------------------------------

  .. py:function:: get_input_formats()

     Get the list of supported chemfp text toolkit input formats
     
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _text_toolkit.get_output_formats:

get_output_formats (text_toolkit)
---------------------------------

  .. py:function:: get_output_formats()

     Get the list of supported chemfp text toolkit output formats
     
     :returns: a list of :class:`chemfp.base_toolkit.Format` objects


.. _text_toolkit.get_format:

get_format (text_toolkit)
-------------------------

  .. py:function:: get_format(format_name)

     Get the named format, or raise a ValueError
     
     This will raise a ValueError for unknown format names.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _text_toolkit.get_input_format:

get_input_format (text_toolkit)
-------------------------------

  .. py:function:: get_input_format(format_name)

     Get the named input format, or raise a ValueError
     
     This will raise a ValueError for unknown format names
     or if that format is not an input format.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _text_toolkit.get_output_format:

get_output_format (text_toolkit)
--------------------------------

  .. py:function:: get_output_format(format_name)

     Get the named output format, or raise a ValueError
     
     This will raise a ValueError for unknown format names
     or if that format is not an output format.
     
     :param format_name: the format name
     :type format_name: a string
     :returns: a :class:`chemfp.base_toolkit.Format` object


.. _text_toolkit.get_input_format_from_source:

get_input_format_from_source (text_toolkit)
-------------------------------------------

  .. py:function:: get_input_format_from_source(source=None, format=None)

     Get the most appropriate format given the available source and format information
     
     If *format* is a :class:`chemfp.base_toolkit.Format` then
     return it. If it's a Format-like object with "name" and "compression"
     attributes use it to make a real Format object with the same
     attributes. If it's a string then use it to create a Format object.
     
     If *format* is None, use the *source* to auto-detect the format.
     If auto-detection is not possible, assume it's an uncompressed
     SMILES file.
     
     :param source: The structure data source.
     :type source: A filename (as a string), a file object, or None to read from stdin
     :param format: Format information, if known.
     :type format: A Format(-like) object, string, or None
     :returns: a :class:`chemfp.base_toolkit.Format` object
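
The auto-detection rule can be sketched as a filename inspection that falls
back to uncompressed SMILES. This is an illustration under stated assumptions,
not the toolkit's code: ``guess_format`` and its extension tables are
hypothetical, and the result is a plain (name, compression) tuple rather than a
Format object.

```python
import os

# Assumed extension tables for this sketch; a real toolkit may know more.
_COMPRESSIONS = {".gz": "gz", ".zst": "zst"}
_FORMATS = {".smi": "smi", ".can": "can", ".usm": "usm", ".sdf": "sdf"}

def guess_format(source):
    """Guess (name, compression) from a filename or a file object's name."""
    if isinstance(source, str):
        filename = source
    else:
        filename = getattr(source, "name", None)
    if not isinstance(filename, str):
        return ("smi", "")  # nothing to inspect, for example source=None
    base, ext = os.path.splitext(filename.lower())
    compression = _COMPRESSIONS.get(ext, "")
    if compression:
        # Strip the compression suffix and look at the extension before it.
        base, ext = os.path.splitext(base)
    name = _FORMATS.get(ext)
    if name is None:
        return ("smi", "")  # detection failed: assume uncompressed SMILES
    return (name, compression)

print(guess_format("drugs.sdf.gz"))  # prints ('sdf', 'gz')
```

An explicit *format* argument skips this guessing entirely, which is the safer
choice when filenames do not follow the usual extensions.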


.. _text_toolkit.get_output_format_from_destination:

get_output_format_from_destination (text_toolkit)
-------------------------------------------------

  .. py:function:: get_output_format_from_destination(destination=None, format=None)

     Get the most appropriate format given the available destination and format information
     
     If *format* is a :class:`chemfp.base_toolkit.Format` then
     return it. If it's a Format-like object with "name" and "compression"
     attributes use it to make a real Format object with the same
     attributes. If it's a string then use it to create a Format object.
     
     If *format* is None, use the *destination* to auto-detect the format.
     If auto-detection is not possible, assume it's an uncompressed
     SMILES file.
     
     :param destination: The structure data destination.
     :type destination: A filename (as a string), a file object, or None to write to stdout
     :param format: format information, if known.
     :type format: A Format(-like) object, string, or None
     :returns: A :class:`chemfp.base_toolkit.Format` object


.. _text_toolkit.read_molecules:

read_molecules (text_toolkit)
-----------------------------

  .. py:function:: read_molecules(source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict")

     Return an iterator that reads TextRecord instances from a structure file
     
     Iterate through the *format* structure records in *source*. If *format*
     is None then auto-detect the format based on the *source*. For SD files,
     use *id_tag* to get the record id from the given SD tag instead of the
     title line. (read_molecules() will ignore the *id_tag*. It exists to
     make it easier to switch between reader functions.)
     
     Only the SMILES formats use the *reader_args* dictionary. The supported
     parameters are:
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
       * has_header - True or False
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If
     None then a default Location will be created.
     
     See :func:`.read_ids_and_molecules` if you want (id, :class:`.TextRecord`)
     pairs instead of just the molecules.
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader parameters passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param encoding: the byte encoding
     :type encoding: string (typically 'utf8' or 'latin1')
     :param encoding_errors: how to handle decoding failure
     :type encoding_errors: string (typically 'strict', 'ignore', or 'replace')
     :returns: a :class:`chemfp.base_toolkit.MoleculeReader` iterating :class:`.TextRecord` molecules
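
The *encoding* and *encoding_errors* values use the same names as Python's own
codec machinery. The sketch below uses only ``bytes.decode`` to show what the
typical choices do to a record whose id is Latin-1 encoded. It assumes, without
confirmation from the text above, that the toolkit's decoding behaves like
Python's.

```python
raw = b"CCO caf\xe9\n"  # a SMILES line whose id is Latin-1, not UTF-8

as_latin1 = raw.decode("latin1", "strict")    # every byte maps to a character
as_replaced = raw.decode("utf8", "replace")   # the bad byte becomes U+FFFD
as_ignored = raw.decode("utf8", "ignore")     # the bad byte is dropped

try:
    raw.decode("utf8", "strict")
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False

print(strict_failed)  # prints True
```

If ids contain non-ASCII characters, choosing the correct *encoding* is usually
better than relaxing *encoding_errors*, because 'ignore' and 'replace' silently
change the id.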


.. _text_toolkit.read_molecules_from_string:

read_molecules_from_string (text_toolkit)
-----------------------------------------

  .. py:function:: read_molecules_from_string(content, format, id_tag=None, reader_args=None, errors="strict", location=None)

     Return an iterator that reads TextRecord instances from a string containing structure records
     
     *content* is a string containing 0 or more records in the format *format*. See
     :func:`.read_molecules` for details about the other parameters. See
     :func:`.read_ids_and_molecules_from_string` if you want to read (id, :class:`.TextRecord`)
     pairs instead of just molecules.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param encoding: the byte encoding
     :type encoding: string (typically 'utf8' or 'latin1')
     :param encoding_errors: how to handle decoding failure
     :type encoding_errors: string (typically 'strict', 'ignore', or 'replace')
     :returns: a :class:`chemfp.base_toolkit.MoleculeReader` iterating :class:`.TextRecord` molecules


.. _text_toolkit.read_ids_and_molecules:

read_ids_and_molecules (text_toolkit)
-------------------------------------

  .. py:function:: read_ids_and_molecules(source=None, format=None, id_tag=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict")

     Return an iterator that reads (id, TextRecord) pairs from a structure file
     
     See :func:`chemfp.text_toolkit.read_molecules` for full parameter details.
     The major difference is that this returns an iterator of (id, :class:`.TextRecord`)
     pairs instead of just the molecules.
     
     :param source: the structure source
     :type source: a filename, file object, or None to read from stdin
     :param format: the input structure format
     :type format: a format name string, or Format object, or None to auto-detect
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param encoding: the byte encoding
     :type encoding: string (typically 'utf8' or 'latin1')
     :param encoding_errors: how to handle decoding failure
     :type encoding_errors: string (typically 'strict', 'ignore', or 'replace')
     :returns: a :class:`chemfp.base_toolkit.IdAndMoleculeReader` iterating (id, :class:`.TextRecord`) pairs


.. _text_toolkit.read_ids_and_molecules_from_string:

read_ids_and_molecules_from_string (text_toolkit)
-------------------------------------------------

  .. py:function:: read_ids_and_molecules_from_string(content, format, id_tag=None, reader_args=None, errors="strict", location=None)

     Return an iterator that reads (id, TextRecord) pairs from a string containing structure records
     
     *content* is a string containing 0 or more records in the format *format*. See
     :func:`chemfp.text_toolkit.read_molecules` for details about the other
     parameters. See :func:`chemfp.text_toolkit.read_molecules_from_string` if you
     just want to read the :class:`.TextRecord` molecules instead of (id, TextRecord) pairs.
     
     :param content: the string containing structure records
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param encoding: the byte encoding
     :type encoding: string (typically 'utf8' or 'latin1')
     :param encoding_errors: how to handle decoding failure
     :type encoding_errors: string (typically 'strict', 'ignore', or 'replace')
     :returns: a :class:`chemfp.base_toolkit.IdAndMoleculeReader` iterating (id, :class:`.TextRecord`) pairs


.. _text_toolkit.make_id_and_molecule_parser:

make_id_and_molecule_parser (text_toolkit)
------------------------------------------

  .. py:function:: make_id_and_molecule_parser(format, id_tag=None, reader_args=None, errors="strict")

     Create a specialized function which takes a record and returns an (id, TextRecord) pair
     
     The returned function is optimized for reading many records from individual
     strings because it only does parameter validation once. However, I haven't
     really noticed much of a performance difference between this and
     :func:`chemfp.text_toolkit.parse_id_and_molecule` so I suggest you use
     that function directly instead of making a specialized function.
     (Let me know if making a specialized function is useful.)
     
     See :func:`chemfp.text_toolkit.read_molecules` for details about the
     other parameters. The specific :class:`.TextRecord` subclass returned
     depends on the format.
     
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a function of the form ``parser(record string) -> (id, text_record)``
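
The "validate once, parse many times" idea behind the returned function can be
illustrated with a closure. This is only a minimal sketch in plain Python, not
chemfp's implementation: ``make_parser_sketch`` is a hypothetical name, it
assumes whitespace-delimited SMILES lines, and it returns the SMILES string
where chemfp would return a :class:`.TextRecord`.

.. code-block:: python

   def make_parser_sketch(format, errors="strict"):
       """Check the arguments once, then return a specialized parser function."""
       # Assumed subset of format names, for illustration only.
       if format not in ("smi", "can", "usm"):
           raise ValueError("unsupported format: %r" % (format,))
       if errors not in ("strict", "report", "ignore"):
           raise ValueError("unsupported errors value: %r" % (errors,))

       def parser(record):
           # No argument checks are repeated here; only the record is processed.
           fields = record.split(None, 1)
           smiles = fields[0]
           id = fields[1].rstrip("\n") if len(fields) > 1 else None
           return id, smiles

       return parser

   parse = make_parser_sketch("smi")
   pairs = [parse(line) for line in ["CCO ethanol\n", "c1ccccc1 benzene\n"]]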


.. _text_toolkit.parse_molecule:

parse_molecule (text_toolkit)
-----------------------------

  .. py:function:: parse_molecule(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first structure record from the *content* string and return a TextRecord.
     
     *content* is a string containing a single structure record in format *format*.
     (Additional records are ignored.) See :func:`chemfp.text_toolkit.read_molecules`
     for details about the other parameters. See :func:`chemfp.text_toolkit.parse_id_and_molecule`
     if you want the (id, :class:`.TextRecord`) pair instead of just the text record.
     
     :param content: the string containing a structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param encoding: the byte encoding
     :type encoding: string (typically 'utf8' or 'latin1')
     :param encoding_errors: how to handle decoding failure
     :type encoding_errors: string (typically 'strict', 'ignore', or 'replace')
     :returns: a :class:`.TextRecord`


.. _text_toolkit.parse_id_and_molecule:

parse_id_and_molecule (text_toolkit)
------------------------------------

  .. py:function:: parse_id_and_molecule(content, format, id_tag=None, reader_args=None, errors="strict")

     Parse the first structure record from *content* and return the (id, TextRecord) pair.
     
     *content* is a string containing a single structure record in format *format*.
     (Additional records are ignored.) See :func:`chemfp.text_toolkit.read_molecules`
     for details about the other parameters. See :func:`chemfp.text_toolkit.parse_molecule`
     if you just want the :class:`.TextRecord` and not the (id, TextRecord) pair.
     
     :param content: the string containing a structure record
     :type content: a string
     :param format: the input structure format
     :type format: a format name string, or Format object
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: reader arguments passed to the underlying toolkit
     :type reader_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param encoding: the byte encoding
     :type encoding: string (typically 'utf8' or 'latin1')
     :param encoding_errors: how to handle decoding failure
     :type encoding_errors: string (typically 'strict', 'ignore', or 'replace')
     :returns: an (id, :class:`.TextRecord` molecule) pair


.. _text_toolkit.create_string:

create_string (text_toolkit)
----------------------------

  .. py:function:: create_string(mol, format, id=None, writer_args=None, errors="strict")

     Convert a TextRecord into a structure record in the given format as a Unicode string
     
     If *id* is not None then use it instead of the molecule's own id.
     
     :param mol: the molecule to use for the output
     :type mol: a :class:`.TextRecord`
     :param format: the output structure format
     :type format: a format name string, or Format object
     :param id: an alternate record id
     :type id: a string, or None to use the molecule's own id
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :returns: a Unicode string


.. _text_toolkit.create_bytes:

create_bytes (text_toolkit)
---------------------------

  .. py:function:: create_bytes(mol, format, id=None, writer_args=None, errors="strict", level=None)

     Convert a TextRecord into a structure record in the given format as a byte string
     
     If *id* is not None then use it instead of the molecule's own id.
     
     :param mol: the molecule to use for the output
     :type mol: a :class:`.TextRecord`
     :param format: the output structure format
     :type format: a format name string, or Format object
     :param id: an alternate record id
     :type id: a string, or None to use the molecule's own id
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param level: compression level to use for compressed formats
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a byte string
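
The relationship between the Unicode output of ``create_string`` and the byte
output of ``create_bytes`` can be pictured as an encode step followed by
optional compression. The sketch below is an illustration in plain Python, not
chemfp's code: ``to_bytes_sketch`` is a hypothetical helper, it assumes UTF-8
and gzip, and it accepts only None or an integer *level* (the 'min',
'default', and 'max' strings are not modeled).

.. code-block:: python

   import gzip

   def to_bytes_sketch(text, compression="", level=None):
       """Encode a record string as UTF-8 bytes, optionally gzip-compressed."""
       data = text.encode("utf8")
       if compression == "gz":
           # Assumption: None falls back to gzip's own default level (9).
           compresslevel = 9 if level is None else level
           data = gzip.compress(data, compresslevel=compresslevel)
       return data

   plain = to_bytes_sketch("CCO ethanol\n")
   packed = to_bytes_sketch("CCO ethanol\n", compression="gz", level=1)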


.. _text_toolkit.open_molecule_writer:

open_molecule_writer (text_toolkit)
-----------------------------------

  .. py:function:: open_molecule_writer(destination=None, format=None, writer_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict", level=None)

     Return a MoleculeWriter which can write TextRecord instances to a destination.
     
     A :class:`chemfp.base_toolkit.MoleculeWriter` has the methods ``write_molecule``,
     ``write_molecules``, and ``write_ids_and_molecules``, which are ways to write
     a :class:`.TextRecord`, a TextRecord iterator, or an (id, TextRecord) pair
     iterator to a file.
     
     TextRecords are written to *destination*. The output format can be a
     string like "sdf.gz" or "smi", a :class:`chemfp.base_toolkit.Format`,
     or Format-like object with "name" and "compression" attributes, or None
     to auto-detect based on the *destination*. If auto-detection is not
     possible, the output will be written as uncompressed SMILES.
     
     That said, the text toolkit doesn't know how to convert between SMILES
     and SDF formats, and will raise an exception if you try.
     
     The *writer_args* is only used for the "smi", "can", and "usm" output
     formats. The only supported parameter is:
     
       * delimiter - one of "tab", "space", "to-eol", the space or tab characters, or None
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If
     None then a default Location will be created.
     
     :param destination: the structure destination
     :type destination: a filename, file object, or None to write to stdout
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param encoding: the byte encoding
     :type encoding: string (typically 'utf8' or 'latin1')
     :param encoding_errors: how to handle encoding failure
     :type encoding_errors: string (typically 'strict', 'ignore', or 'replace')
     :param level: compression level to use for compressed formats
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a :class:`chemfp.base_toolkit.MoleculeWriter` expecting :class:`.TextRecord` instances
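
The three *errors* settings described above can be sketched as a small loop.
This is a hedged illustration of the documented policy, not chemfp's
implementation; ``process_records_sketch`` and ``to_smiles_line`` are
hypothetical names, and a ``ValueError`` stands in for whatever exception the
real toolkit raises.

.. code-block:: python

   import sys

   def process_records_sketch(records, handle, errors="strict"):
       """Apply handle() to each record using the documented error policy."""
       if errors not in ("strict", "report", "ignore"):
           raise ValueError("errors must be 'strict', 'report', or 'ignore'")
       results = []
       for recno, record in enumerate(records, 1):
           try:
               results.append(handle(record))
           except ValueError as err:
               if errors == "strict":
                   raise  # stop at the first bad record
               if errors == "report":
                   sys.stderr.write("record %d: %s\n" % (recno, err))
               # "report" and "ignore" both continue with the next record
       return results

   def to_smiles_line(record):
       # Hypothetical handler which rejects empty records.
       if not record.strip():
           raise ValueError("empty record")
       return record.strip() + "\n"

   written = process_records_sketch(
       ["CCO ethanol", "", "C methane"], to_smiles_line, errors="ignore")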


.. _text_toolkit.open_molecule_writer_to_string:

open_molecule_writer_to_string (text_toolkit)
---------------------------------------------

  .. py:function:: open_molecule_writer_to_string(format, writer_args=None, errors="strict", location=None)

     Return a MoleculeStringWriter which can write TextRecord instances to a string.
     
     See :func:`chemfp.text_toolkit.open_molecule_writer` for full
     parameter details.
     
     Use the writer's :meth:`chemfp.base_toolkit.MoleculeStringWriter.getvalue`
     to get the output as a Unicode string.
     
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.MoleculeStringWriter` expecting :class:`.TextRecord` instances


.. _text_toolkit.open_molecule_writer_to_bytes:

open_molecule_writer_to_bytes (text_toolkit)
--------------------------------------------

  .. py:function:: open_molecule_writer_to_bytes(format, writer_args=None, errors="strict", location=None, level=None)

     Return a MoleculeStringWriter which can write TextRecord instances to a byte string.
     
     See :func:`chemfp.text_toolkit.open_molecule_writer` for full
     parameter details.
     
     Use the writer's :meth:`chemfp.base_toolkit.MoleculeStringWriter.getvalue`
     to get the output as a byte string.
     
     :param format: the output structure format
     :type format: a format name string, or Format(-like) object, or None to auto-detect
     :param writer_args: writer arguments passed to the underlying toolkit
     :type writer_args: a dictionary
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track writer state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :param level: compression level to use for compressed formats
     :type level: None, a positive integer, or one of the strings 'min', 'default', or 'max'
     :returns: a :class:`chemfp.base_toolkit.MoleculeStringWriter` expecting :class:`.TextRecord` instances


.. _text_toolkit.copy_molecule:

copy_molecule (text_toolkit)
----------------------------

  .. py:function:: copy_molecule(mol)

     Return a new TextRecord which is a copy of the given TextRecord
     
     :param mol: the text record
     :type mol: a :class:`.TextRecord`
     :returns: a new :class:`.TextRecord`


.. _text_toolkit.add_tag:

add_tag (text_toolkit)
----------------------

  .. py:function:: add_tag(mol, tag, value)

     Add an SD tag value to the TextRecord
     
     If the *mol* is in "sdf" format then this will modify
     ``mol.record`` to append the new *tag* and *value* to the
     end of the tag block. The other tags will not be modified,
     including tags with the same tag name.
     
     :param mol: the text record
     :type mol: a :class:`.TextRecord`
     :param string tag: the SD tag name
     :param string value: the text for the tag
     :returns: None


.. _text_toolkit.get_tag:

get_tag (text_toolkit)
----------------------

  .. py:function:: get_tag(mol, tag)

     Get the named SD tag value, or None if it doesn't exist
     
     If the *mol* is in "sdf" format then this will return
     the corresponding tag value from ``mol.record``, or None
     if the tag does not exist.
     
     If the record is in any other format then it will return None.
     
     :param mol: the molecule
     :type mol: a :class:`.TextRecord`
     :param tag: the SD tag name
     :type tag: string
     :returns: a string, or None


.. _text_toolkit.get_tag_pairs:

get_tag_pairs (text_toolkit)
----------------------------

  .. py:function:: get_tag_pairs(mol)

     Get a list of all SD tag (name, value) pairs for the TextRecord
     
     If the *mol* is in "sdf" format then this will return
     the list of (tag, value) pairs in ``mol.record``, where the
     *tag* and *value* are strings.
     
     If the record is in any other format then it will return
     an empty list.
     
     :param mol: the molecule
     :type mol: a :class:`.TextRecord`
     :returns: a list of (tag name, tag value) pairs


.. _text_toolkit.get_id:

get_id (text_toolkit)
---------------------

  .. py:function:: get_id(mol)

     Get the molecule's id from the TextRecord's id field
     
     This is the toolkit-portable way to get ``mol.id``.
     
     :param mol: the molecule
     :type mol: a TextRecord
     :returns: a string


.. _text_toolkit.set_id:

set_id (text_toolkit)
---------------------

  .. py:function:: set_id(mol, id)

     Set the TextRecord's id to the new id
     
     This is the toolkit-portable way to write ``mol.id = id``.
     
     Note: this does not modify ``mol.record``. Use :func:`chemfp.text_toolkit.create_string`
     or similar text_toolkit functions to get the record text with a new identifier.
     
     :param mol: the molecule
     :type mol: a :class:`.TextRecord`
     :param id: the new id
     :type id: string
     :returns: None
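
The note about ``mol.record`` can be illustrated with a small stand-in class.
This is a sketch under stated assumptions, not the chemfp classes:
``SmiRecordSketch`` and ``create_string_sketch`` are hypothetical, and the
output uses a space delimiter purely for illustration.

.. code-block:: python

   class SmiRecordSketch(object):
       """Hypothetical stand-in for a text record made from one SMILES-file line."""
       def __init__(self, record):
           self.record = record  # the original text; never rewritten
           fields = record.split(None, 1)
           self.smiles = fields[0]
           self.id = fields[1].rstrip("\n") if len(fields) > 1 else None

   def create_string_sketch(mol, id=None):
       """Build a new SMILES line from the current (or an alternate) id."""
       if id is None:
           id = mol.id
       if id is None:
           return mol.smiles + "\n"
       return "%s %s\n" % (mol.smiles, id)  # assumes a space delimiter

   mol = SmiRecordSketch("CCO ethanol\n")
   mol.id = "EtOH"  # the effect of set_id(mol, "EtOH") in this sketch
   new_record = create_string_sketch(mol)

After the id changes, ``mol.record`` still holds the original line; only the
newly created string carries the new identifier.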


.. _text_toolkit.read_sdf_records:

read_sdf_records (text_toolkit)
-------------------------------

  .. py:function:: read_sdf_records(source=None, reader_args=None, compression=None, errors="strict", location=None, block_size=327680)

     Return an iterator that reads each record from an SD file as a string.
     
     Iterate through the records in *source*, which must be in SD format.
     If *compression* is None or "auto" then auto-detect the compression type
     based on *source*, and default to uncompressed when it can't be determined.
     Use "gz" when the input is gzip compressed, and "none" or "" if uncompressed.
     
     The *reader_args* parameter is currently unused. It exists for future compatibility.
     
     The *errors* parameter specifies how to handle errors. "strict" raises
     an exception, "report" sends a message to stderr and goes to the next
     record, and "ignore" goes to the next record.
     
     The *location* parameter takes a :class:`chemfp.io.Location` instance. If
     None then a default Location will be created.
     
     The *block_size* parameter is the number of bytes to read from the SD file at a time.
     The current implementation reads a block, iterates through the records in the
     block, then prepends any remaining text to the start of the next block. You
     shouldn't need to change this parameter, but if you do, please let me know.
     
     Note: to prevent accidental memory consumption if the input is in the wrong
     format, a complete record must be found within the first 327680 bytes or
     5*block_size bytes, whichever is larger.
     
     The parser has only a basic understanding of the SD format. It knows how
     to handle the counts line, the SKP property, and even tag data with
     the value '$$$$'. It is not a full validator and it does not know chemistry.
     
     WARNING: the parser does not yet handle the MS Windows newline convention.
     
     See :func:`.read_sdf_ids_and_records` if you want (id, record)
     pairs, and :func:`.read_sdf_ids_and_values` if you want
     (id, tag data) pairs. See :func:`.read_sdf_records_from_string`
     to read from a string instead of a file or file-like object.
     
     :param source: the SDF source
     :type source: a filename, file object, or None to read from stdin
     :param reader_args: currently ignored
     :type reader_args: currently ignored
     :param compression: the data content compression method
     :type compression: one of "auto", "none", "", or "gz"
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.RecordReader` iterating over each record as a string
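
The block-oriented strategy described above (read a block, yield the complete
records in it, prepend the remainder to the next block) can be sketched as
follows. This is a simplified illustration, not chemfp's parser:
``iter_sdf_records_sketch`` is a hypothetical name, it works on a text-mode
file object, it treats any ``$$$$`` line as a record terminator (so it would
mishandle tag data whose value is ``$$$$``), and it does not implement the
record size limit.

.. code-block:: python

   import io

   def iter_sdf_records_sketch(infile, block_size=327680):
       """Yield each SD record from a text-mode file object as a string."""
       marker = "\n$$$$\n"  # end of the previous line, then the terminator line
       leftover = ""
       while True:
           block = infile.read(block_size)
           if not block:
               break
           text = leftover + block
           start = 0
           while True:
               pos = text.find(marker, start)
               if pos == -1:
                   break
               end = pos + len(marker)
               yield text[start:end]
               start = end
           leftover = text[start:]  # incomplete record; goes before the next block
       if leftover.strip():
           raise ValueError("incomplete record at end of input")

   demo = (
       "first\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
       "second\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
   )
   # A tiny block size forces records to span several blocks.
   records = list(iter_sdf_records_sketch(io.StringIO(demo), block_size=16))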


.. _text_toolkit.read_sdf_ids_and_records:

read_sdf_ids_and_records (text_toolkit)
---------------------------------------

  .. py:function:: read_sdf_ids_and_records(source=None, id_tag=None, reader_args=None, compression=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict", block_size=327680)

     Return an iterator that reads the (id, record string) pairs from an SD file
     
     See :func:`.read_sdf_records` for most parameter details. That function
     iterates over the records, while this one iterates over the (id, record)
     pairs. By default the id comes from the title line. Use *id_tag* to get
     the record id from the given SD tag instead.
     
     See :func:`.read_sdf_ids_and_values` if you want to read an identifier
     and tag value, or two tag values, instead of returning the full record.
     
     :param source: the SDF source
     :type source: a filename, file object, or None to read from stdin
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: currently ignored
     :type reader_args: currently ignored
     :param compression: the data content compression method
     :type compression: one of "auto", "none", "", or "gz"
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndRecordReader` iterating (id, record string) pairs


.. _text_toolkit.read_sdf_ids_and_values:

read_sdf_ids_and_values (text_toolkit)
--------------------------------------

  .. py:function:: read_sdf_ids_and_values(source=None, id_tag=None, value_tag=None, reader_args=None, compression=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict", block_size=327680)

     Return an iterator that reads the (id, tag value string) pairs from an SD file
     
     See :func:`.read_sdf_records` for most parameter details. That function
     iterates over the records, while this one iterates over the (id, tag value)
     pairs.
     
     By default this uses the title line for both the id and tag value
     strings.  Use *id_tag* and *value_tag*, respectively, to use a
     given tag value instead. If a tag doesn't exist then None will
     be used.
     
     :param source: the SDF source
     :type source: a filename, file object, or None to read from stdin
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param value_tag: SD tag containing the value
     :type value_tag: string, or None to use the record title
     :param reader_args: currently ignored
     :type reader_args: currently ignored
     :param compression: the data content compression method
     :type compression: one of "auto", "none", "", or "gz"
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndRecordReader` iterating (id, value string) pairs


.. _text_toolkit.read_sdf_records_from_string:

read_sdf_records_from_string (text_toolkit)
-------------------------------------------

  .. py:function:: read_sdf_records_from_string(content, reader_args=None, compression=None, errors="strict", location=None, block_size=327680)

     Return an iterator that reads each record from a string containing SD records
     
     See :func:`.read_sdf_records` for the parameter details. The
     main difference is that this function reads from *content*, which is a
     string containing 0 or more SDF records.
     
     If content is a (Unicode) string then it must only contain ASCII characters,
     the records will be returned as strings, and the compression option is not
     supported. If content is a byte string then the records will be returned as
     byte strings, and compression is supported.
     
     See :func:`.read_sdf_ids_and_records_from_string` to read (id, record)
     pairs and :func:`.read_sdf_ids_and_values_from_string` to read (id, tag value)
     pairs.
     
     :param content: a string containing zero or more SD records
     :type content: string or bytes
     :param reader_args: currently ignored
     :type reader_args: currently ignored
     :param compression: the data content compression method
     :type compression: one of "auto", "none", "", or "gz"
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.RecordReader` iterating over each record as a string
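
The string-type rules above (ASCII-only Unicode input gives string records,
byte input gives byte records) can be sketched in a few lines. This is an
illustration only: ``split_sdf_content_sketch`` is a hypothetical helper, it
returns a list rather than a reader object, it ignores compression, and it
treats every ``$$$$`` line ending as a record terminator.

.. code-block:: python

   def split_sdf_content_sketch(content):
       """Split SD content into records of the same string type as the input."""
       if isinstance(content, str):
           content.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
           terminator = "$$$$\n"
       else:
           terminator = b"$$$$\n"
       records = []
       start = 0
       while True:
           pos = content.find(terminator, start)
           if pos == -1:
               break
           end = pos + len(terminator)
           records.append(content[start:end])
           start = end
       return records

   text_records = split_sdf_content_sketch("a\n$$$$\nb\n$$$$\n")
   byte_records = split_sdf_content_sketch(b"a\n$$$$\nb\n$$$$\n")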


.. _text_toolkit.read_sdf_ids_and_records_from_string:

read_sdf_ids_and_records_from_string (text_toolkit)
---------------------------------------------------

  .. py:function:: read_sdf_ids_and_records_from_string(content=None, id_tag=None, reader_args=None, compression=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict", block_size=327680)

     Return an iterator that reads the (id, record) pairs from a string containing SD records
     
     This function reads the records from *content*, which is a string containing
     0 or more SDF records. It iterates over the (id, record) pairs. By default
     the id comes from the first line of the SD record. Use *id_tag* to use
     a given tag value instead. See :func:`.read_sdf_records` for details about
     the other parameters.
     
     If content is a (Unicode) string then it must only contain ASCII characters,
     the records will be returned as strings, the compression option is not
     supported, and the encoding and encoding_errors parameters are ignored.
     
     If content is a byte string then the records will be returned as
     byte strings, compression is supported, and the encoding and encoding_errors
     parameters are used to parse the id.
     
     :param content: a string containing zero or more SD records
     :type content: string or bytes
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param reader_args: currently ignored
     :type reader_args: currently ignored
     :param compression: the data content compression method
     :type compression: one of "auto", "none", "", or "gz"
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndRecordReader` iterating over the (id, record string) pairs


.. _text_toolkit.read_sdf_ids_and_values_from_string:

read_sdf_ids_and_values_from_string (text_toolkit)
--------------------------------------------------

  .. py:function:: read_sdf_ids_and_values_from_string(content=None, id_tag=None, value_tag=None, compression=None, reader_args=None, errors="strict", location=None, encoding="utf8", encoding_errors="strict", block_size=327680)

     Return an iterator that reads the (id, value) pairs from a string containing SD records
     
     This function reads the records from *content*, which is a string
     containing 0 or more SDF records. It iterates over the (id, value)
     pairs, which by default both contain the title line. Use *id_tag*
     and *value_tag*, respectively, to use a given tag value instead.
     If a tag doesn't exist then None will be used.
     
     If content is a (Unicode) string then it must only contain ASCII characters, the
     compression option is not supported, and the encoding and encoding_errors
     parameters are ignored.
     
     If content is a byte string then the records will be returned as
     byte strings, compression is supported, and the encoding and encoding_errors
     parameters are used to parse the id and value.
     
     See :func:`.read_sdf_records` for details about the other parameters.
     
     :param content: a string containing zero or more SD records
     :type content: string or bytes
     :param id_tag: SD tag containing the record id
     :type id_tag: string, or None to use the record title
     :param value_tag: SD tag containing the value
     :type value_tag: string, or None to use the record title
     :param reader_args: currently ignored
     :type reader_args: currently ignored
     :param compression: the data content compression method
     :type compression: one of "auto", "none", "", or "gz"
     :param errors: specify how to handle errors
     :type errors: one of "strict", "report", or "ignore"
     :param location: object used to track parser state information
     :type location: a :class:`chemfp.io.Location` object, or None
     :returns: a :class:`chemfp.base_toolkit.IdAndRecordReader` iterating over the (id, value) pairs


.. _text_toolkit.get_sdf_tag:

get_sdf_tag (text_toolkit)
--------------------------

  .. py:function:: get_sdf_tag(sdf_record, tag)

     Return the value for a named tag in an SDF record string
     
     Get the value for the tag named *tag* from the string *sdf_record*
     containing an SD record.
     
     :param string sdf_record: an SD record
     :param string tag: a tag name
     :returns: the corresponding tag value as a string, or None


.. _text_toolkit.add_sdf_tag:

add_sdf_tag (text_toolkit)
--------------------------

  .. py:function:: add_sdf_tag(sdf_record, tag, value)

     Add an SD tag value to an SD record string
     
     This will append the new *tag* and *value* to the
     end of the tag data block in the *sdf_record* string.
     
     :param string sdf_record: an SD record
     :param string tag: a tag name
     :param string value: the new tag value
     :returns: a new SD record string with the new tag and value
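
Appending to the tag data block amounts to inserting a data item just before
the ``$$$$`` terminator line. The following is a minimal sketch of that idea,
assuming a record that ends with ``$$$$`` and a newline; ``add_sdf_tag_sketch``
is a hypothetical name and not the chemfp function.

.. code-block:: python

   def add_sdf_tag_sketch(sdf_record, tag, value):
       """Return a new SD record string with the tag appended to the data block."""
       terminator = "$$$$\n"
       if not sdf_record.endswith(terminator):
           raise ValueError("record must end with a '$$$$' line")
       body = sdf_record[:-len(terminator)]
       # A data item is a header line, the value, and a blank line.
       return body + ">  <%s>\n%s\n\n" % (tag, value) + terminator

   record = "ethanol\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
   new_record = add_sdf_tag_sketch(record, "MW", "46.07")

As with the documented function, the input string is left unchanged and a new
string is returned.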


.. _text_toolkit.get_sdf_tag_pairs:

get_sdf_tag_pairs (text_toolkit)
--------------------------------

  .. py:function:: get_sdf_tag_pairs(sdf_record)

     Return the (tag, value) entries in the SDF record string
     
     Parse the *sdf_record* and return the tag data as a list of (tag,
     value) pairs. The type of the returned strings will be the same
     as the type of the input sdf_record string.
     
     :param string sdf_record: an SDF record
     :returns: a list of (tag, value) pairs
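
To make the (tag, value) structure concrete, here is a simplified sketch of
reading the data block of an SD record. It is not chemfp's parser:
``sdf_tag_pairs_sketch`` is a hypothetical name, it handles only Unicode
strings with "\\n" newlines, and it assumes each data item ends with a blank
line.

.. code-block:: python

   def sdf_tag_pairs_sketch(sdf_record):
       """Return the (tag, value) pairs from the data block of an SD record string."""
       lines = sdf_record.split("\n")
       try:
           i = lines.index("M  END") + 1  # the data block follows the molfile
       except ValueError:
           return []
       pairs = []
       while i < len(lines):
           line = lines[i]
           if line.startswith("$$$$"):
               break  # record terminator
           if line.startswith(">"):
               # Data header such as ">  <NAME>"; the tag is between "<" and ">".
               left = line.find("<")
               right = line.rfind(">")
               tag = line[left + 1:right] if 0 <= left < right else None
               i += 1
               value_lines = []
               while i < len(lines) and lines[i] != "":
                   value_lines.append(lines[i])  # may legitimately be "$$$$"
                   i += 1
               if tag is not None:
                   pairs.append((tag, "\n".join(value_lines)))
           i += 1
       return pairs

   record = (
       "ethanol\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
       ">  <NAME>\nethanol\n\n"
       ">  <NOTE>\n$$$$\n\n"
       "$$$$\n"
   )
   tag_pairs = sdf_tag_pairs_sketch(record)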


.. _text_toolkit.get_sdf_id:

get_sdf_id (text_toolkit)
-------------------------

  .. py:function:: get_sdf_id(sdf_record)

     Return the id for the SDF record string
     
     The id is the first line of the *sdf_record*. A future version
     of this function may support an *id_tag* parameter. Let me
     know if that would be useful.
     
     The returned id string will have the same type as the input
     sdf_record.
     
     :param string sdf_record: an SD record
     :returns: the first line of the SD record


.. _text_toolkit.set_sdf_id:

set_sdf_id (text_toolkit)
-------------------------

  .. py:function:: set_sdf_id(sdf_record, id)

     Set the id of the SDF record string to a new value
     
     Set the first line of *sdf_record* to the new *id*, which must
     not contain a newline.
     
     The sdf_record and the id must have the same string type.
     
     :param string sdf_record: an SDF record
     :param string id: the new id
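
Because the id is simply the first line of the record, both id operations can
be sketched with basic string handling. The helpers below are hypothetical
illustrations for Unicode strings with "\\n" newlines, not the chemfp
functions.

.. code-block:: python

   def get_sdf_id_sketch(sdf_record):
       """Return the first line of the SD record, without the newline."""
       return sdf_record.partition("\n")[0]

   def set_sdf_id_sketch(sdf_record, id):
       """Return a new SD record whose first line is replaced by id."""
       if "\n" in id:
           raise ValueError("the id must not contain a newline")
       first, newline, rest = sdf_record.partition("\n")
       return id + newline + rest

   record = "ethanol\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
   old_id = get_sdf_id_sketch(record)
   renamed = set_sdf_id_sketch(record, "EtOH")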


chemfp._text_toolkit module (private)
=====================================

.. py:module:: chemfp._text_toolkit

As you might have inferred from the leading "_" in "_text_toolkit",
this is not a public module. There is no reason for you to import it
directly, the module name is subject to change, and even the location
of the classes is subject to change. I bring it up only because
:mod:`chemfp.text_toolkit` returns class instances from this module,
so you might well wonder about them.


TextRecord
----------

.. py:class:: TextRecord

   Base class for the text_toolkit 'molecules', which work with the records as text.
   
   The :mod:`chemfp.text_toolkit` implements the toolkit API, but
   it doesn't know chemistry. Instead of returning real molecule objects, with
   atoms and bonds, it returns TextRecord subclass instances that hold the
   record as a text string.
   
   As an implementation detail (which means it's subject to change) there is
   a subclass for each of the supported formats.
   
   * :class:`SDFRecord` - holds "sdf" records
   * :class:`SmiRecord` - holds "smi" records (the full line from a "smi" SMILES file)
   * :class:`CanRecord` - holds "can" records (the full line from a "can" SMILES file)
   * :class:`UsmRecord` - holds "usm" records (the full line from a "usm" SMILES file)
   * :class:`SmiStringRecord` - holds "smistring" records (only the "smistring" SMILES string; no id)
   * :class:`CanStringRecord` - holds "canstring" records (only the "canstring" SMILES string; no id)
   * :class:`UsmStringRecord` - holds "usmstring" records (only the "usmstring" SMILES string; no id)
   
   All of the classes have the following attributes:
   
   .. py:attribute:: id
   
      The record identifier as a Unicode string, or None if there is no identifier
   
   .. py:attribute:: id_bytes
   
      The record identifier as a byte string, or None if there is no identifier
   
   .. py:attribute:: record
   
      The record, as a string. For the smistring, canstring, and usmstring
      formats, this is only the SMILES string.
   
   .. py:attribute:: record_format
   
      One of "sdf", "smi", "can", "usm", "smistring", "canstring", or "usmstring".
   
   The SMILES classes have an attribute:
   
   .. py:attribute:: smiles
   
      The SMILES string component of the record.
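
   As a rough illustration of how ``record``, ``smiles``, and ``id`` relate
   for the SMILES formats, the sketch below splits one line of a SMILES file
   into its parts. It is a plain-Python stand-in rather than chemfp code:
   ``split_smi_line`` is a hypothetical helper, and it assumes the SMILES is
   the first whitespace-delimited field and the rest of the line is the
   identifier. The actual parsing rules may differ.

   .. code-block:: python

      def split_smi_line(record):
          """Split one line of a SMILES file into (smiles, id)."""
          fields = record.rstrip("\r\n").split(None, 1)
          if not fields:
              raise ValueError("missing SMILES")
          smiles = fields[0]
          # No second field means there is no identifier.
          record_id = fields[1] if len(fields) == 2 else None
          return smiles, record_id

      # A "smi"-style line carries both a SMILES and an id.
      smiles, record_id = split_smi_line("c1ccccc1O phenol\n")
      # A "smistring"-style record is only the SMILES, so the id is None.
      string_only = split_smi_line("c1ccccc1O")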


  .. py:method:: add_tag(tag, value)

     Add an SD tag value to the TextRecord
     
     This method does nothing if the record is not an "sdf" record.
     
     :param tag: the SD tag name
     :type tag: string
     :param value: the text for the tag
     :type value: string
     :returns: None


  .. py:method:: get_tag(tag)

     Get the named SD tag value, or None if the tag doesn't exist or the record is not an "sdf" record.
     
     :param tag: the SD tag name
     :type tag: byte or Unicode string
     :returns: a Unicode string, or None


  .. py:method:: get_tag_as_bytes(tag)

     Get the named SD tag value, or None if the tag doesn't exist or the record is not an "sdf" record.
     
     :param tag: the SD tag name
     :type tag: byte string
     :returns: a byte string, or None


  .. py:method:: get_tag_pairs()

     Get a list of all SD tag (name, value) pairs for the TextRecord using Unicode strings
     
     This function returns an empty list if the record is not an "sdf" record.
     
     :returns: a list of (Unicode string name, Unicode string value) pairs
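
     To make the idea of (name, value) pairs concrete, here is a minimal
     sketch that pulls the data items out of an SD record held as text. It
     is a simplified illustration, not chemfp's parser: ``sketch_tag_pairs``
     is a hypothetical function, and it assumes each data header looks like
     ``> <NAME>`` and each value ends at a blank line.

     .. code-block:: python

        import re

        # Matches a data header line such as "> <ChEBI ID>" and captures the name.
        _TAG_HEADER = re.compile(r"^>.*<([^>]+)>")

        def sketch_tag_pairs(sdf_record):
            """Return (name, value) pairs from the data items of an SD record."""
            pairs = []
            name = None
            value_lines = []
            for line in sdf_record.splitlines():
                if name is None:
                    # Outside a data item: look for the next header line.
                    match = _TAG_HEADER.match(line)
                    if match:
                        name = match.group(1)
                        value_lines = []
                elif line.strip() == "" or line.startswith("$$$$"):
                    # A blank line (or the record terminator) ends the value.
                    pairs.append((name, "\n".join(value_lines)))
                    name = None
                else:
                    value_lines.append(line)
            if name is not None:
                pairs.append((name, "\n".join(value_lines)))
            return pairs

        record = (
            "phenol\n"
            "  sketch\n"
            "\n"
            "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
            "M  END\n"
            "> <ChEBI ID>\n"
            "CHEBI:15882\n"
            "\n"
            "> <Formula>\n"
            "C6H6O\n"
            "\n"
            "$$$$\n"
        )
        pairs = sketch_tag_pairs(record)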


  .. py:method:: get_tag_pairs_as_bytes()

     Get a list of all SD tag (name, value) pairs for the TextRecord using byte strings
     
     This function returns an empty list if the record is not an "sdf" record.
     
     :returns: a list of (byte string name, byte string value) pairs


  .. py:method:: copy()

     Return a new record which is a copy of the given record


SDFRecord
---------

.. py:class:: SDFRecord

   Holds an SDF record. See :class:`chemfp._text_toolkit.TextRecord` for API details


SmiRecord
---------

.. py:class:: SmiRecord

   Holds an "smi" record. See :class:`chemfp._text_toolkit.TextRecord` for API details


CanRecord
---------

.. py:class:: CanRecord

   Holds a "can" record. See :class:`chemfp._text_toolkit.TextRecord` for API details


UsmRecord
---------

.. py:class:: UsmRecord

   Holds a "usm" record. See :class:`chemfp._text_toolkit.TextRecord` for API details


SmiStringRecord
---------------

.. py:class:: SmiStringRecord

   Holds an "smistring" record. See :class:`chemfp._text_toolkit.TextRecord` for API details


CanStringRecord
---------------

.. py:class:: CanStringRecord

   Holds a "canstring" record. See :class:`chemfp._text_toolkit.TextRecord` for API details


UsmStringRecord
---------------

.. py:class:: UsmStringRecord

   Holds a "usmstring" record. See :class:`chemfp._text_toolkit.TextRecord` for API details


chemfp.io module
================

.. py:module:: chemfp.io

This module implements a single public class, :class:`Location`, which
tracks parser state information, including the location of the current
record in the file. The other functions and classes are undocumented,
should not be used, and may change in future releases.


Location
--------

.. py:class:: Location

   Get location and other internal reader and writer state information
   
   A Location instance gives a way to access information like
   the current record number, line number, and molecule object::
   
     >>> import chemfp
     >>> with chemfp.read_molecule_fingerprints("RDKit-MACCS166",
     ...                        "ChEBI_lite.sdf.gz", id_tag="ChEBI ID") as reader:
     ...   for id, fp in reader:
     ...     if id == "CHEBI:3499":
     ...         print("Record starts at line", reader.location.lineno)
     ...         print("Record byte range:", reader.location.offsets)
     ...         print("Number of atoms:", reader.location.mol.GetNumAtoms())
     ...         break
     ... 
     [08:18:12]  S group MUL ignored on line 103
     Record starts at line 3599
     Record byte range: (138171, 141791)
     Number of atoms: 36
   
   The supported properties are:
   
     * filename - a string describing the source or destination
     * lineno - the line number for the start of the current record
     * mol - the toolkit molecule for the current record
     * offsets - the (start, end) byte positions for the current record
     * output_recno - the number of records written successfully
     * recno - the current record number
     * record - the record as a text string
     * record_format - the record format, like "sdf" or "can"
      
   
   Most of the readers and writers do not support all of the properties.
   Unsupported properties return None. The *filename* is a read/write
   attribute and the other attributes are read-only.
   
   If you don't pass a location to the readers and writers then they will
   create a new one based on the source or destination, respectively.
   You can also pass in your own Location, created as ``Location(filename)``
   if you have an actual filename, or ``Location.from_source(source)`` or
   ``Location.from_destination(destination)`` if you have a more generic
   source or destination.


  .. py:method:: __init__(filename=None)

     Use *filename* as the location's filename


  .. py:method:: from_source(cls, source)

     Create a Location instance based on the source
     
     If *source* is a string then it's used as the filename.
     If *source* is None then the location filename is "<stdin>".
     If *source* is a file object then its ``name`` attribute
     is used as the filename, or None if there is no attribute.
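
     The three cases above can be written out as a small function. This is
     a sketch of the documented rule only, not the chemfp source;
     ``filename_from_source`` and ``NamedSource`` are hypothetical names
     used for the illustration.

     .. code-block:: python

        import io

        def filename_from_source(source):
            """Pick a location filename for a reader source."""
            if source is None:
                return "<stdin>"        # no source means standard input
            if isinstance(source, str):
                return source           # a string is treated as the filename
            # Otherwise assume a file object; it may or may not have a name.
            return getattr(source, "name", None)

        class NamedSource(io.StringIO):
            """In-memory file object that carries a ``name`` attribute."""
            name = "example.fps"

        from_string = filename_from_source("example.fps")
        from_stdin = filename_from_source(None)
        from_unnamed = filename_from_source(io.StringIO(""))
        from_named = filename_from_source(NamedSource(""))

     The rule for ``from_destination`` has the same shape, with "<stdout>"
     in place of "<stdin>".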


  .. py:method:: from_destination(cls, destination)

     Create a Location instance based on the destination
     
     If *destination* is a string then it's used as the filename.
     If *destination* is None then the location filename is "<stdout>".
     If *destination* is a file object then its ``name`` attribute
     is used as the filename, or None if there is no attribute.


  .. py:method:: __repr__()

     Return a string like 'Location("<stdout>")'


  .. py:attribute:: Location.first_line

     Read-only attribute.

     The first line of the current record
 

  .. py:attribute:: Location.filename

     Read/write attribute.

     A string which describes the source or destination. This is usually
     the source or destination filename but can be a string like "<stdin>"
     or "<stdout>".


  .. py:attribute:: Location.mol

     Read-only attribute.

     The molecule object for the current record
 

  .. py:attribute:: Location.offsets

     Read-only attribute.

     The (start, end) byte offsets, starting from 0
     
     *start* is the record start byte position and *end* is
     one byte past the last byte of the record.
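
     Because *end* is one byte past the last byte, the pair can be used
     directly as a Python slice. The sketch below shows that convention on
     an in-memory byte string; ``record_offsets`` is a hypothetical helper
     written for this example and is not part of chemfp.

     .. code-block:: python

        def record_offsets(data, terminator):
            """Return half-open (start, end) byte offsets for each record."""
            offsets = []
            start = 0
            while start < len(data):
                pos = data.find(terminator, start)
                # Include the terminator in the record; a missing terminator
                # means the final record runs to the end of the data.
                end = len(data) if pos == -1 else pos + len(terminator)
                offsets.append((start, end))
                start = end
            return offsets

        data = b"first record\n$$$$\nsecond record\n$$$$\n"
        offsets = record_offsets(data, b"$$$$\n")
        # Slicing with (start, end) recovers each record exactly.
        records = [data[start:end] for start, end in offsets]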
 

  .. py:attribute:: Location.output_recno

     Read-only attribute.

     The number of records actually written to the file or string.
     
     The value ``recno - output_recno`` is the number of records
     sent to the writer but which had an error and could not be
     written to the output.
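
     The relationship between the two counters can be shown with a toy
     writer. ``CountingWriter`` is a hypothetical class for illustration
     only; it assumes, purely for the example, that an empty record counts
     as an error.

     .. code-block:: python

        class CountingWriter:
            """Toy writer that tracks records received and records written."""

            def __init__(self):
                self.recno = 0          # records sent to the writer
                self.output_recno = 0   # records actually written
                self.lines = []

            def write(self, record):
                self.recno += 1
                if not record:
                    # Treat an empty record as an error and skip it.
                    return
                self.lines.append(record)
                self.output_recno += 1

        writer = CountingWriter()
        for record in ["CCO ethanol", "", "c1ccccc1 benzene", ""]:
            writer.write(record)

        # Records that were sent to the writer but could not be written.
        num_failed = writer.recno - writer.output_recno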
 

  .. py:attribute:: Location.recno

     Read-only attribute.

     The current record number
     
     For writers this is the number of records sent to
     the writer, and output_recno is the number of records
     successfully written to the file or string.
 

  .. py:attribute:: Location.record

     Read-only attribute.

     The current record as an uncompressed text string
 

  .. py:attribute:: Location.record_format

     Read-only attribute.

     The record format name
 

  .. py:method:: where()

     Return a human-readable description of the current reader or writer state.
     
     The description will contain the filename, line number, record
     number, and up to the first 40 characters of the first line of
     the record, if those properties are available.
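
     A description of this kind could be assembled along the following
     lines. This is a sketch under stated assumptions, not chemfp's code:
     ``describe`` is a hypothetical function, and the wording of its output
     is made up for the example. Only the rules that unavailable properties
     are skipped and that at most 40 characters of the first line are shown
     come from the text above.

     .. code-block:: python

        def describe(filename=None, lineno=None, recno=None, first_line=None):
            """Build a description from whichever properties are available."""
            parts = []
            if filename is not None:
                parts.append("file %r" % (filename,))
            if lineno is not None:
                parts.append("line %d" % lineno)
            if recno is not None:
                parts.append("record #%d" % recno)
            if first_line is not None:
                # Show no more than the first 40 characters of the first line.
                parts.append("first line starts: %r" % (first_line[:40],))
            return ", ".join(parts)

        short = describe(filename="ChEBI_lite.sdf.gz", lineno=3599)
        full = describe(filename="x.sdf", lineno=1, recno=1, first_line="A" * 100)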