Installing
Chemfp 3.5 is available as a pre-compiled package or a source distribution.
Installing a pre-compiled package
Pre-compiled packages for chemfp are available for Python 2.7, Python 3.6, Python 3.7, Python 3.8 and Python 3.9. They were compiled under the “manylinux1” and “manylinux2014” Docker build environments, which means they should work on most Linux-based operating systems.
These binary packages are NOT open source. By default they are distributed under the Chemfp Base License Agreement v1.1, which lets you use some of the chemfp functionality for internal purposes, including the ability to create FPS files and use the “toolkit” APIs.
However, the following features require a time-limited license key:
- generate FPB files
- create or search in-memory fingerprint arenas with more than 50,000 fingerprints
- perform Tversky searches
- perform Tanimoto searches of FPS files with more than 20 queries at a time.
These features can be enabled with a valid license key, set via the environment variable CHEMFP_LICENSE. Email sales@dalkescientific.com to request an evaluation license or to purchase a license.
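The restrictions listed above can be summarized as a small check. This is a minimal sketch, not part of chemfp: the names needs_license_key() and license_key_is_set() are hypothetical, the thresholds are copied from the list above, and the check only looks for the presence of CHEMFP_LICENSE without validating the key.

```python
import os

# Limits taken from the license summary above (hypothetical helper, not a chemfp API).
MAX_ARENA_FINGERPRINTS = 50000  # larger in-memory arenas need a license key
MAX_FPS_QUERIES = 20            # larger Tanimoto query batches against an FPS file need a key


def needs_license_key(write_fpb=False, arena_size=0, tversky=False,
                      fps_tanimoto_queries=0):
    """Return a list of reasons why the requested work needs a license key."""
    reasons = []
    if write_fpb:
        reasons.append("generate FPB files")
    if arena_size > MAX_ARENA_FINGERPRINTS:
        reasons.append("arena with more than %d fingerprints" % MAX_ARENA_FINGERPRINTS)
    if tversky:
        reasons.append("Tversky search")
    if fps_tanimoto_queries > MAX_FPS_QUERIES:
        reasons.append("more than %d Tanimoto queries of an FPS file" % MAX_FPS_QUERIES)
    return reasons


def license_key_is_set(environ=os.environ):
    """Only checks that CHEMFP_LICENSE is present and non-empty; the key is not validated."""
    return bool(environ.get("CHEMFP_LICENSE"))
```

For example, needs_license_key(arena_size=100000) reports the arena-size limit, while needs_license_key(fps_tanimoto_queries=10) returns an empty list.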
Use the following command to install a pre-compiled version of chemfp:
python -m pip install chemfp -i https://chemfp.com/packages/
If you get the message:
ERROR: Could not find a version that satisfies the requirement chemfp (from versions: none)
ERROR: No matching distribution found for chemfp
then you are likely installing from a non-Linux-based operating system like macOS or Microsoft Windows. Pre-compiled installers are not yet available for those OSes. Currently macOS is supported in the source distribution and Windows is not yet supported.
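Whether pip can find a pre-compiled package depends on the operating system and the Python version. The sketch below encodes only the rules stated above; precompiled_package_expected() is a hypothetical helper, and it does not check the CPU architecture or C library version that manylinux packages also depend on.

```python
import sys

# Python versions with pre-compiled chemfp 3.5 packages, per the list above.
WHEEL_PYTHON_VERSIONS = {(2, 7), (3, 6), (3, 7), (3, 8), (3, 9)}


def precompiled_package_expected(platform=sys.platform, version=sys.version_info):
    """Guess whether 'pip install chemfp -i https://chemfp.com/packages/' can find a package."""
    # sys.platform is "linux" on Python 3 and "linux2" on Python 2.
    if not platform.startswith("linux"):
        # No pre-compiled packages for macOS or Windows.
        return False
    return tuple(version[:2]) in WHEEL_PYTHON_VERSIONS
```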
Installing from source
The chemfp source distribution requires that Python and a C compiler be installed on your machine. Since chemfp doesn’t yet run on Microsoft Windows (for tedious technical reasons), your machine likely already has both Python and a C compiler installed. In case you don’t have Python, or you want to install a newer version, you can download a copy of Python from http://www.python.org/download/ . If you don’t have a C compiler, well, do I really need to give you a pointer for that?
You may use chemfp 3.5 with either Python 2.7, or Python 3.6 or newer.
The core chemfp functionality does not depend on a third-party library, but you will need a chemistry toolkit in order to generate new fingerprints from structure files. Chemfp supports the free Open Babel, RDKit, and CDK toolkits and the proprietary OEChem/OEGraphSim toolkits. Make sure you install the Python libraries for the toolkit(s) you select.
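One way to see which toolkits are importable from your Python installation is to try importing each toolkit's top-level module. This is a sketch under assumptions: the module names (openbabel, rdkit, openeye, jpype) are the usual top-level packages for these bindings but may differ in your installation, and a successful jpype import does not mean the CDK jar is available.

```python
import importlib

# Assumed top-level module names for each toolkit's Python bindings.
TOOLKIT_MODULES = {
    "Open Babel": "openbabel",
    "RDKit": "rdkit",
    "OEChem/OEGraphSim": "openeye",
    "CDK (via JPype)": "jpype",
}


def available_toolkits(modules=TOOLKIT_MODULES):
    """Return the sorted names of toolkits whose module can be imported."""
    found = []
    for name, module in sorted(modules.items()):
        try:
            importlib.import_module(module)
        except ImportError:
            # Not installed, or installed for a different Python.
            continue
        found.append(name)
    return found
```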
The easiest way to install chemfp is with the pip installer. This comes with Python 2.7.9 or later, and with Python 3.4 and later, so it is almost certainly installed if you have Python. To install the source distribution tar.gz file with pip:
python -m pip install chemfp-3.5.tar.gz
Otherwise you can use Python’s standard “setup.py”. Read http://docs.python.org/install/index.html for details of how to use it. The short version is to do the following:
tar xf chemfp-3.5.tar.gz
cd chemfp-3.5
python setup.py build
python setup.py install
The last step may need sudo if you otherwise cannot write to your Python site-packages directory. A far better option is to use a virtual environment.
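A minimal sketch of the virtual-environment route uses Python's standard venv module, which exists in Python 3.3 and later (on Python 2.7 you would use the third-party virtualenv package instead). The helper names make_venv() and pip_install() are hypothetical. Installing into the environment needs no sudo because the files go under the environment directory.

```python
import os
import subprocess
import venv


def make_venv(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its Python (POSIX layout)."""
    venv.create(env_dir, with_pip=with_pip)
    return os.path.join(env_dir, "bin", "python")


def pip_install(python, target):
    """Equivalent to running 'python -m pip install TARGET' with the environment's Python."""
    subprocess.check_call([python, "-m", "pip", "install", target])
```

For example, python = make_venv("chemfp-env") followed by pip_install(python, "chemfp-3.5.tar.gz") builds and installs chemfp into chemfp-env without touching the system site-packages.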
Configuration options
The setup.py file has several compile-time options which can be set either from the python setup.py build command line or through environment variables. The environment variable solution is the easiest way to change the settings under pip.
--with-openmp, --without-openmp
Chemfp uses OpenMP to parallelize multi-query searches. The default is --with-openmp. If you have a very old version of gcc, or an older version of clang, or are on a Mac where the default clang-based compiler doesn’t support OpenMP, then you will need to use --without-openmp to tell setup.py to compile without OpenMP:
python setup.py build --without-openmp
You can also set the environment variable CHEMFP_OPENMP to “1” to compile with OpenMP support, or to “0” to compile without OpenMP support:
CHEMFP_OPENMP=0 python -m pip install chemfp-3.5.tar.gz
Note: you can use the environment variable CC to change the C compiler. For example, the default clang compiler on macOS doesn’t support OpenMP, so I installed gcc-10 using Homebrew (https://brew.sh/) and compiled chemfp using:
CC=gcc-10 python -m pip install chemfp-3.5.tar.gz
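The same environment variables can be set from a Python script that drives pip. In this sketch pip_install_command() is a hypothetical helper; CHEMFP_OPENMP and CC are the variables described above, and the returned command and environment are meant to be passed to subprocess.check_call().

```python
import os
import sys


def pip_install_command(sdist, openmp=True, cc=None, environ=None):
    """Return (command, environment) for pip-installing a chemfp source distribution."""
    # Start from a copy of the current environment so PATH and friends are preserved.
    env = dict(os.environ if environ is None else environ)
    # "1" compiles with OpenMP support, "0" compiles without it.
    env["CHEMFP_OPENMP"] = "1" if openmp else "0"
    if cc is not None:
        # Use a different C compiler, e.g. Homebrew's gcc-10 on macOS.
        env["CC"] = cc
    # Use the running interpreter, matching "python -m pip install ...".
    cmd = [sys.executable, "-m", "pip", "install", sdist]
    return cmd, env
```

For example, cmd, env = pip_install_command("chemfp-3.5.tar.gz", cc="gcc-10") followed by subprocess.check_call(cmd, env=env) mirrors the gcc-10 command above.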
--with-ssse3, --without-ssse3
Chemfp by default compiles with SSSE3 support. SSSE3 was first available in 2006, so it is almost certainly available on your Intel-compatible processor. In case I’m wrong (are you compiling for ARM? If so, send me any compiler patches), you can disable SSSE3 support using the --without-ssse3 option, or set the environment variable CHEMFP_SSSE3 to “0”.
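On Linux, one way to check whether the build machine reports SSSE3 is to read the flags line of /proc/cpuinfo. This sketch assumes a Linux x86 system; cpu_flags(), read_cpu_flags(), and ssse3_setting() are hypothetical helpers, and the result only describes the machine the code runs on, not the machine where chemfp will eventually run.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from Linux /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    # ARM systems label the line "Features", so no x86 flags are found.
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    try:
        with open(path) as f:
            return cpu_flags(f.read())
    except IOError:
        # Not Linux, or the file is unreadable.
        return set()


def ssse3_setting(flags):
    """Value for CHEMFP_SSSE3: "1" if the CPU reports SSSE3, else "0"."""
    return "1" if "ssse3" in flags else "0"
```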
--with-avx2, --without-avx2
Chemfp 3.0 added support for the AVX2 instruction set. AVX2 can be 30% faster than the POPCNT instruction for 1024- or 2048-bit fingerprints. It is enabled by default, and chemfp checks that the chip implements AVX2 before calling the functions which are explicitly written with AVX2.
Use --without-avx2 or set the environment variable CHEMFP_AVX2 to “0” to disable it.
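The run-time check described above is a form of CPU feature dispatch: pick the fast code path only when the processor reports support for it. The following Python sketch only illustrates the pattern and is not chemfp's code. The function names are hypothetical, and popcount_wide() is plain Python standing in for a compiled AVX2 kernel, so it is not actually faster.

```python
def popcount_portable(fp):
    """Count set bits in a fingerprint given as bytes (generic fallback path)."""
    return sum(bin(byte).count("1") for byte in bytearray(fp))


def popcount_wide(fp):
    """Stand-in for an AVX2 kernel: count bits over the whole fingerprint at once."""
    return bin(int.from_bytes(fp, "big")).count("1")


def choose_popcount(cpu_flags):
    """Use the 'AVX2' path only if the CPU reports avx2, otherwise fall back."""
    if "avx2" in cpu_flags:
        return popcount_wide
    return popcount_portable
```

Building with --without-avx2 corresponds to removing the fast path entirely, so the fallback is always used.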
--arch=NAME
By default the compiler generates code that works on a variety of processors. This may mean the compiler avoids using some processor-specific features which aren’t available on all of the processors in the base feature set. For example, it may avoid AVX2 instructions if the compiler has been configured to also support processors without AVX2 instructions.
The --arch option configures the compiler to compile chemfp for the named architecture. For example, if you are compiling chemfp on the same machine where you will run it, you might specify --arch=native so the generated code is optimized for that machine.
For gcc and clang, this is converted into -march=native.
If you are benchmarking chemfp then you should configure the compiler so it optimizes for the specific hardware architecture you are testing.
Note
This option is experimental and may be removed in the next version. It is probably better to set this compiler option yourself via CC or CFLAGS.
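If you set the option yourself, the gcc and clang equivalent of --arch=native is -march=native in CFLAGS. This sketch builds such an environment for a pip or setup.py build; march_cflags() and build_env() are hypothetical helpers, and it assumes your build honors the CFLAGS environment variable, as distutils-based builds normally do.

```python
import os


def march_cflags(arch, existing=""):
    """Return CFLAGS with any previous -march=... replaced by -march=ARCH."""
    parts = [p for p in existing.split() if not p.startswith("-march=")]
    parts.append("-march=" + arch)
    return " ".join(parts)


def build_env(arch="native", environ=None):
    """Copy the environment and set CFLAGS for the requested architecture."""
    env = dict(os.environ if environ is None else environ)
    env["CFLAGS"] = march_cflags(arch, env.get("CFLAGS", ""))
    return env
```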
Installing CDK and JPype
CDK is a Java package. Chemfp is written for Python. How can chemfp call into CDK?
There are several ways for Python programs to call into Java. I tried two of them and ended up using JPype, following Noel O’Boyle’s suggestion.
There are a few ways to install JPype. The easiest is likely to use conda (see the documentation for details) or, if you have the Java runtime, you can pip install it with:
python -m pip install JPype1
This installs the jpype module for Python.
You’ll also need to put the CDK JAR on the CLASSPATH. For example, in the following I download the JAR file then set the CLASSPATH using bash syntax:
cd ~/ftps
curl -LO https://github.com/cdk/cdk/releases/download/cdk-2.3/cdk-2.3.jar
export CLASSPATH=/Users/dalke/ftps/cdk-2.3.jar
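Before running cdk2fps, you can check that CLASSPATH points at a CDK jar that actually exists. This is a sketch: cdk_jars_on_classpath() is a hypothetical helper, and it assumes the jar file name starts with “cdk” and ends with “.jar”, as cdk-2.3.jar does.

```python
import os


def cdk_jars_on_classpath(classpath=None):
    """Return CLASSPATH entries that look like an existing CDK jar file."""
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    jars = []
    # Entries are separated by ":" on Linux and macOS.
    for entry in classpath.split(os.pathsep):
        name = os.path.basename(entry)
        if name.startswith("cdk") and name.endswith(".jar") and os.path.isfile(entry):
            jars.append(entry)
    return jars
```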
(I put my manually downloaded packages in ~/ftps/ for historic reasons.)
Use cdk2fps --version to check whether things are working. If it succeeds, the output should look like:
% cdk2fps --version
cdk2fps 3.5
The following message occurs if jpype isn’t installed:
Cannot run cdk2fps: Cannot import jpype, which is required for
chemfp to access the CDK jar: No module named 'jpype'
The following message occurs if jpype is installed (e.g., via pip) but either Java isn’t installed on your machine or jpype couldn’t find your installation:
Cannot run cdk2fps: No JVM shared library file (libjvm.so)
found. Try setting up the JAVA_HOME environment variable properly.
The following message occurs if the CDK JAR file is not on the CLASSPATH:
Cannot run cdk2fps: It appears that CDK is not installed: Unable to
access the CDK jar via JPype. Is the jar on your CLASSPATH?: Failed
to import 'org.openscience'
The following message occurs if you are using Python 2 (jpype, and therefore chemfp’s CDK support, does not support Python 2):
Cannot run cdk2fps: Unable to use cdk2fps on Python 2
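The checks above can be automated by running cdk2fps --version and matching its output against the messages listed in this section. In this sketch check_cdk2fps() and diagnose() are hypothetical helpers; the matched substrings are copied from the messages above and may change between chemfp versions, and subprocess.run() with capture_output requires Python 3.7 or later.

```python
import subprocess

# Substrings of the cdk2fps error messages listed above, mapped to likely fixes.
DIAGNOSES = [
    ("Cannot import jpype", "Install JPype, e.g. 'python -m pip install JPype1'."),
    ("No JVM shared library file", "Install Java, or set JAVA_HOME so JPype can find it."),
    ("Is the jar on your CLASSPATH?", "Add the CDK jar to CLASSPATH."),
    ("Unable to use cdk2fps on Python 2", "Use Python 3."),
]


def diagnose(output):
    """Return advice for a known error message, or None if nothing matches."""
    for needle, advice in DIAGNOSES:
        if needle in output:
            return advice
    return None


def check_cdk2fps(command="cdk2fps"):
    """Run 'cdk2fps --version' and return (ok, message)."""
    try:
        proc = subprocess.run([command, "--version"], capture_output=True, text=True)
    except FileNotFoundError:
        return False, "cdk2fps is not on your PATH; is chemfp installed?"
    # Look at both streams since the message may be written to either one.
    output = proc.stdout + proc.stderr
    if proc.returncode == 0 and output.startswith("cdk2fps"):
        return True, output.strip()
    return False, diagnose(output) or output.strip()
```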