Installing
==========
Chemfp requires that Python and a C compiler be installed on your
machine. Since chemfp doesn't run on Microsoft Windows (for tedious
technical reasons), your machine likely already has both Python
and a C compiler installed. In case you don't have Python, or you want
to install a newer version, you can download a copy of Python from
http://www.python.org/download/ . If you don't have a C
compiler ... well, do I really need to give you a pointer for that?
You may use chemfp 3.1 with Python 2.7, or with Python 3.5 or
newer. If you want to run on Python 2.6 then you'll need to use chemfp
2.1.
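As a quick illustration of the version rule above, here is a minimal shell sketch. The function name ``chemfp_release_for`` is hypothetical (it is not part of chemfp), and it only encodes the versions named in this section; it expects a plain "major.minor" string.

```shell
# Hypothetical helper, not part of chemfp: map a Python "major.minor"
# version string to the chemfp release this section says to use.
chemfp_release_for() {
    major=${1%%.*}   # text before the first "."
    minor=${1#*.}    # text after the first "."
    if [ "$major" -eq 2 ] && [ "$minor" -eq 7 ]; then
        echo "3.1"
    elif [ "$major" -eq 3 ] && [ "$minor" -ge 5 ]; then
        echo "3.1"
    elif [ "$major" -eq 2 ] && [ "$minor" -eq 6 ]; then
        echo "2.1"
    else
        echo "unsupported"
    fi
}
```

One way to feed it the running interpreter's version is ``chemfp_release_for "$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"``.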
The core chemfp functionality does not depend on a third-party library
but you will need a chemistry toolkit in order to generate new
fingerprints from structure files. Chemfp supports the free Open Babel
and RDKit toolkits and the proprietary OEChem toolkit. Make sure you
install the Python libraries for the toolkit(s) you select.
The easiest way to install chemfp is with the pip installer. It comes
with Python 2.7.9 or later, and with Python 3.4 and later, so it may
already be installed. To install the ``tar.gz`` file with pip::

    pip install chemfp-3.1.tar.gz

Otherwise you can use Python's standard "setup.py". Read
http://docs.python.org/install/index.html for details of how to use
it. The short version is to do the following::

    tar xf chemfp-3.1.tar.gz
    cd chemfp-3.1
    python setup.py build
    python setup.py install

The last step may need a ``sudo`` if you otherwise cannot write to your
Python site-packages directory. Another option is to use a virtual
environment.
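For the virtual environment route, a minimal sketch might look like the following. It assumes Python 3 with the standard ``venv`` module (on Python 2.7 you would need the separate ``virtualenv`` tool instead). The names ``run``, ``install_chemfp_in_venv``, ``DRY_RUN`` and the default directory ``chemfp-env`` are hypothetical and only used for illustration.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create a virtual environment and install the
# chemfp tarball into it, so no sudo is needed.
# $1 = environment directory (default: chemfp-env)
# $2 = path to the tarball   (default: chemfp-3.1.tar.gz)
install_chemfp_in_venv() {
    venv_dir=${1:-chemfp-env}
    tarball=${2:-chemfp-3.1.tar.gz}
    run python3 -m venv "$venv_dir" &&
    run "$venv_dir/bin/pip" install "$tarball"
}
```

Calling ``DRY_RUN=1 install_chemfp_in_venv`` prints the two commands without running them, which is a cheap way to see what would happen first.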
Configuration options
---------------------
The setup.py file has several compile-time options which can be set
either from the ``python setup.py build`` command-line or through
environment variables. The environment variable solution is the
easiest way to change the settings under pip.
.. option:: --with-openmp, --without-openmp
Chemfp uses OpenMP to parallelize multi-query searches. The default is
:option:`--with-openmp`. If you have a very old version of gcc, or an
older version of clang, or are on a Mac where the clang version
doesn't support OpenMP, then you will need to use
:option:`--without-openmp` to tell setup.py to compile without OpenMP::

    python setup.py build --without-openmp

You can also set the environment variable ``CHEMFP_OPENMP`` to "1" to
compile with OpenMP support, or to "0" to compile without OpenMP
support::

    CHEMFP_OPENMP=0 pip install chemfp-3.1.tar.gz

Note: you can use the environment variable ``CC`` to change the C
compiler. For example, the clang compiler on Mac doesn't support
OpenMP, so I installed gcc-7 and compile using::

    CC=gcc-7 pip install chemfp-3.1.tar.gz

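If you are not sure whether your compiler supports OpenMP, you can probe it before building. This is only a sketch: ``detect_openmp`` is a hypothetical name, and it assumes the compiler accepts the gcc-style ``-fopenmp`` flag. It honors ``CC`` and falls back to ``cc``.

```shell
# Hypothetical helper: print 1 if the compiler named by $CC (default: cc)
# can build a trivial OpenMP program with -fopenmp, otherwise print 0.
detect_openmp() {
    tmpdir=$(mktemp -d) || { echo 0; return; }
    # A tiny C program that needs both omp.h and the OpenMP runtime.
    printf '%s\n' '#include <omp.h>' \
        'int main(void) { return omp_get_max_threads() > 0 ? 0 : 1; }' \
        > "$tmpdir/probe.c"
    # $CC is left unquoted on purpose so it may carry extra arguments.
    if ${CC:-cc} -fopenmp "$tmpdir/probe.c" -o "$tmpdir/probe" >/dev/null 2>&1; then
        echo 1
    else
        echo 0
    fi
    rm -rf "$tmpdir"
}
```

The result can be passed straight to the build, for example ``CHEMFP_OPENMP=$(detect_openmp) pip install chemfp-3.1.tar.gz``.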
.. option:: --with-ssse3, --without-ssse3
Chemfp by default compiles with SSSE3 support, which was first
available in 2006 so it is almost certainly available on your Intel-like
processor. In case I'm wrong (are you compiling for ARM? If so, send
me any compiler patches), you can disable SSSE3 support using
:option:`--without-ssse3`, or set the environment variable
``CHEMFP_SSSE3`` to "0".
Compiling with SSSE3 support has a very odd failure case. If you
compile with the SSSE3 flag enabled, then take the binary to a machine
without SSSE3 support, then it will crash because all of the code will
be compiled to expect the SSSE3 instruction set even though only one
file, popcount_SSSE3.c, should be compiled that way.
The solution is to compile popcount_SSSE3.c with the SSSE3 flag
enabled and all of the other files without that flag. Unfortunately,
Python's setup.py doesn't make that easy to do. If this is a problem
for you, take a look at ``filter_gcc`` in the chemfp
distribution. It's used like this::

    CC=$PWD/filter_gcc python setup.py build

It's a bit of a hack so contact me if you have problems.
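To make the idea concrete, here is a rough sketch of the kind of argument filtering such a compiler wrapper performs. This is not the actual ``filter_gcc`` from the distribution. The function name ``filter_cc_args`` is hypothetical, and it assumes the SSSE3 flag is gcc's ``-mssse3``.

```shell
# Hypothetical sketch, NOT chemfp's actual filter_gcc.
# Print the compiler arguments one per line, dropping -mssse3 unless
# the command line compiles popcount_SSSE3.c.
filter_cc_args() {
    keep_ssse3=0
    # First pass: is popcount_SSSE3.c one of the arguments?
    for arg in "$@"; do
        case "$arg" in
            *popcount_SSSE3.c) keep_ssse3=1 ;;
        esac
    done
    # Second pass: emit every argument except an unwanted -mssse3.
    for arg in "$@"; do
        if [ "$arg" = "-mssse3" ] && [ "$keep_ssse3" -eq 0 ]; then
            continue
        fi
        printf '%s\n' "$arg"
    done
}
```

A real wrapper would then invoke the underlying compiler with the filtered arguments. The sketch only shows the filtering step, and it would mishandle arguments that contain newlines.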
.. option:: --with-avx2, --without-avx2
Chemfp 3.0 added support for the AVX2 instruction set. This can be 15%
faster than the POPCNT instruction for large (i.e., 2048 bits or more)
fingerprints. By default it is disabled. Use :option:`--with-avx2` or
set the environment variable ``CHEMFP_AVX2`` to "1" to enable it.
While 15% faster sounds great, I have only tested the AVX2 support in
one machine environment. I expect that it will have portability
problems similar to those of the SSSE3 code. That is, if the code is
compiled with the AVX2 compilation flag then it's free to assume that
some other instruction sets, like SSE4.2, are also available. Because
of the way Python's setup.py works, all of the code will be compiled
to use these more advanced instructions. If chemfp is then run on a
machine without those instructions, it will cause the program to crash
with an illegal instruction.
Chemfp does check that the chip implements AVX2 before calling the
functions which are explicitly written with AVX2. The problem is that
other parts of the code may be affected, at the compiler's
discretion. I have no way of knowing.
A solution is to modify the ``filter_gcc`` script I mentioned
earlier. Let me know if this is something you want me to work on with
you.
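Before enabling AVX2 it can help to check that the build machine itself reports the instruction set. The sketch below is not part of chemfp: ``cpu_flags`` and ``has_flag`` are hypothetical names, and it assumes the flags can be read from ``/proc/cpuinfo`` on Linux or from the ``machdep.cpu`` sysctl keys on an Intel Mac. It says nothing about other machines the compiled binary may later be copied to.

```shell
# Hypothetical helper: print this machine's CPU feature flags,
# lower-cased and space-separated (empty if they cannot be read).
cpu_flags() {
    if [ -r /proc/cpuinfo ]; then
        # Linux on x86: the first "flags" line lists the features.
        grep -m 1 '^flags' /proc/cpuinfo | cut -d: -f2
    else
        # Assumption: an Intel Mac, where these sysctl keys exist.
        sysctl -n machdep.cpu.features machdep.cpu.leaf7_features 2>/dev/null
    fi | tr '[:upper:]' '[:lower:]' | tr '\n' ' '
}

# Hypothetical helper: print 1 if flag $1 appears as a whole word in
# the space-separated flag list $2, otherwise print 0.
has_flag() {
    case " $2 " in
        *" $1 "*) echo 1 ;;
        *) echo 0 ;;
    esac
}

# Enable AVX2 only if the build machine reports it.
CHEMFP_AVX2=$(has_flag avx2 "$(cpu_flags)")
echo "CHEMFP_AVX2=$CHEMFP_AVX2"
```

The same helper works for the SSSE3 setting by asking for ``ssse3`` instead of ``avx2``.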