.. _installing:

Installing chemfp
==============================

Chemfp 4.1 is available as a pre-compiled package or a source
distribution.

.. _install_precompiled:

Installing a pre-compiled package
---------------------------------

Pre-compiled chemfp distributions are available for Python 3.8,
Python 3.9, and Python 3.10. They were compiled under the
"manylinux2014" Docker build environment, which means they should work
for most Linux-based operating systems. While chemfp supports macOS,
pre-compiled macOS distributions are not available.

These binary packages are NOT open source. By default they are
distributed under the Chemfp Base License Agreement v1.1, which lets
you use some of the chemfp functionality for internal purposes,
including the ability to create FPS files and use the "toolkit" APIs.

However, the following features require a time-limited license key:

- generate FPB files
- create or search in-memory fingerprint arenas with more than 50,000
  fingerprints
- perform Tversky searches
- perform Tanimoto searches of FPS files with more than 20 queries at
  a time.

These features can be enabled with a valid license key, set via the
environment variable ``CHEMFP_LICENSE``. Email
sales@dalkescientific.com to request an evaluation license or to
purchase a license.

Use the following command to install a pre-compiled version of
chemfp::

    python -m pip install chemfp -i https://chemfp.com/packages/

If you get the message:

.. code-block:: none

    ERROR: Could not find a version that satisfies the requirement chemfp (from versions: none)
    ERROR: No matching distribution found for chemfp

then you are likely installing from a non-Linux-based operating system
like macOS or Microsoft Windows. Pre-compiled installers are not yet
available for those OSes. Currently macOS is supported in the source
distribution and Windows is not yet supported.

.. _install_source:

Installing from source
---------------------------------

The chemfp source distribution requires that Python and a C compiler
be installed on your machine. Since chemfp doesn't yet run on Microsoft
Windows (for tedious technical reasons), your machine likely already
has both Python and a C compiler installed. In case you don't have
Python, or you want to install a newer version, you can download a
copy of Python from http://www.python.org/download/ . If you don't
have a C compiler, ... well, do I really need to give you a pointer
for that?

chemfp 4.1 supports Python 3.8 or newer.

The core chemfp functionality does not depend on a third-party library
but you will need a chemistry toolkit in order to generate new
fingerprints from structure files. chemfp supports the free Open
Babel, RDKit, and CDK toolkits and the proprietary OEChem/OEGraphSim
toolkits. Make sure you install the Python libraries for the
toolkit(s) you select.

The easiest way to install chemfp is with the pip installer. This
comes with all supported versions of Python. To install the source
distribution ``tar.gz`` file with pip::

    python -m pip install chemfp-4.1.tar.gz

This command may need the ``--user`` option for a user installation
instead of installing to Python's own site-packages directory. A
likely better option is to use a virtual environment.

If you are making in-house modifications, you likely want to use the
``--editable`` option.

Configuration options
---------------------------------

Chemfp uses the legacy ``setup.py`` file, which implements several
configuration options. These can be used either from the ``python
setup.py build`` command-line or through environment variables. The
environment variable solution is the easiest way to change the
settings under pip.

.. option:: --with-openmp, --without-openmp

Chemfp uses OpenMP to parallelize multi-query searches. The default is
:option:`--with-openmp`.
If your C compiler does not support OpenMP, then you will need to use
:option:`--without-openmp` to tell setup.py to compile without
OpenMP::

    python setup.py build --without-openmp

The system "gcc" on macOS is not GNU gcc but is instead a front-end to
Apple's clang-based compiler, which does not support OpenMP. You will
need to configure chemfp to build without OpenMP support if you want
to use that compiler. Otherwise, install GNU gcc with OpenMP support
(eg, using Homebrew or "conda") and use that alternative compiler.

You can also set the environment variable ``CHEMFP_OPENMP`` to "1" to
compile with OpenMP support, or to "0" to compile without OpenMP
support::

    CHEMFP_OPENMP=0 python -m pip install chemfp-4.1.tar.gz

Use the environment variable ``CC`` to change the C compiler. On macOS
I use the following to compile chemfp using GNU gcc 11::

    CC=gcc-11 python -m pip install chemfp-4.1.tar.gz

.. NOTE:: If you are benchmarking then use the ``CC`` or ``CFLAGS``
   environment variables to ensure you are passing the appropriate
   architecture flags to the compiler, eg, ``CC -march=native``.

There are additional legacy configuration options which are no longer
documented and will likely be removed in the future.

.. _install_cdk:

Installing CDK and JPype
---------------------------------

CDK is a Java package. Chemfp is written for Python. Chemfp supports
CDK. How can chemfp call into CDK?

There are several ways for Python programs to call into Java. I tried
two of them and ended up using JPype, following Noel O'Boyle's
suggestion.

There are a few ways to install JPype. The easiest is likely to use
conda (see the JPype documentation for details) or, if you have the
Java run-time, you can pip install it with::

    python -m pip install JPype1

This installs the ``jpype`` module for Python. You'll also need to put
the CDK JAR on the CLASSPATH.
For example, in the following I download the JAR file then set the
CLASSPATH using bash syntax::

    cd ~/ftps
    curl -LO https://github.com/cdk/cdk/releases/download/cdk-2.3/cdk-2.3.jar
    export CLASSPATH=/Users/dalke/ftps/cdk-2.3.jar

(I put my manually downloaded packages in ~/ftps/ for historic
reasons.)

Use ``cdk2fps --version`` to diagnose if things are working. If it's a
success it should look like:

.. code-block:: console

    % cdk2fps --version
    cdk2fps 4.1

The following message occurs if jpype isn't installed:

.. code-block:: none

    Cannot run cdk2fps: Cannot import jpype, which is required for chemfp to access the CDK jar: No module named 'jpype'

The following message occurs if jpype is installed (eg, via pip) but
either Java isn't installed on your machine or jpype couldn't find
your installation::

    Cannot run cdk2fps: No JVM shared library file (libjvm.so) found. Try setting up the JAVA_HOME environment variable properly.

The following message occurs if the CDK JAR file is not on the
CLASSPATH::

    Cannot run cdk2fps: It appears that CDK is not installed: Unable to access the CDK jar via JPype. Is the jar on your CLASSPATH?: Failed to import 'org.openscience'

The following message occurs if you are using Python 2 (jpype, and
therefore chemfp, does not support Python 2)::

    Cannot run cdk2fps: Unable to use cdk2fps on Python 2
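
If you would rather check the setup from Python before running
``cdk2fps``, the following is a minimal sketch and not part of chemfp.
The function name ``check_cdk_setup`` is hypothetical, and it assumes
the JAR file name starts with "cdk", as in ``cdk-2.3.jar``. It only
checks that the ``jpype`` module can be found and that a CDK-like JAR
is listed on the CLASSPATH. It does not start a JVM, so it cannot
detect a missing Java installation.

```python
import importlib.util
import os


def check_cdk_setup(environ=None):
    """Return a list of likely problems; an empty list means none were found."""
    if environ is None:
        environ = os.environ
    problems = []

    # The JPype1 package installs a module named "jpype".
    if importlib.util.find_spec("jpype") is None:
        problems.append("jpype is not importable; try: python -m pip install JPype1")

    # Split CLASSPATH on the platform's path separator, skipping empty entries.
    entries = [e for e in environ.get("CLASSPATH", "").split(os.pathsep) if e]

    # Assumption: a CDK JAR has a file name like "cdk-2.3.jar".
    cdk_jars = [
        e for e in entries
        if os.path.basename(e).lower().startswith("cdk") and e.lower().endswith(".jar")
    ]
    if not cdk_jars:
        problems.append("no CDK JAR found on CLASSPATH")
    for jar in cdk_jars:
        if not os.path.isfile(jar):
            problems.append(f"CLASSPATH entry does not exist: {jar}")

    return problems


if __name__ == "__main__":
    for problem in check_cdk_setup():
        print("Problem:", problem)
```

An empty result only means these two checks passed; ``cdk2fps
--version`` remains the real test.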