{ "cells": [ { "cell_type": "markdown", "id": "0da2fdf3", "metadata": {}, "source": [ "# What's new in chemfp?" ] }, { "cell_type": "markdown", "id": "4f3bd7eb", "metadata": {}, "source": [ "## What's new in chemfp 4.0? (12 June 2022)\n", "\n", "The main themes for chemfp 4.0 are \"notebook usability\" and \"diversity selection.\"" ] }, { "cell_type": "markdown", "id": "4c57177d", "metadata": {}, "source": [ "### High-level API\n", "\n", "The chemfp API was primarily designed for application developers who want precise control over what happens. The API was used for command-line tools, like [the ones that come with chemfp](tools.rst), and for web services.\n", "\n", "The command-line tools were written for a wider audience. As a result, I saw that people who used chemfp in a Jupyter notebook, where they had access to the full chemfp API, still preferred \"!shell\"-ing out to the command-line tools. And rightly so, as it was much simpler.\n", "\n", "Chemfp 4.0 includes new \"high-level\" functionality to remedy this problem. 
The table below shows which API functions correspond to which command-line tools:\n", "\n", "| Command-line | High-level API | Description |\n", "|------------------------|---------------------|-------------------------------------------------------|\n", "| simsearch | chemfp.simsearch | Similarity search |\n", "| rdkit2fps | chemfp.rdkit2fps | Use RDKit to generate fingerprints |\n", "| ob2fps | chemfp.ob2fps | Use Open Babel to generate fingerprints |\n", "| oe2fps | chemfp.oe2fps | Use OEChem/OEGraphSim to generate fingerprints |\n", "| cdk2fps | chemfp.cdk2fps | Use CDK to generate fingerprints |\n", "| (no command-line tool) | chemfp.convert2fps | Figure out which tool to use to generate fingerprints |\n", "| fpcat | (no high-level API) | Convert between fingerprint file formats |\n", "| chemfp maxmin | chemfp.maxmin | MaxMin diversity picking |\n", "| chemfp spherex | chemfp.spherex | Sphere exclusion diversity picking |\n", "| chemfp heapsweep | chemfp.heapsweep | HeapSweep diversity picking |\n" ] }, { "cell_type": "markdown", "id": "877c7ec4", "metadata": {}, "source": [ "Here's an example showing the distribution of fingerprint similarities to caffeine (CHEMBL113) for all fingerprints in ChEMBL 30 with a similarity of at least 0.4:" ] }, { "cell_type": "code", "execution_count": 1, "id": "d5394a75", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[]], dtype=object)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAP3klEQVR4nO3df5BddXnH8fdjIpRJcHVA0xoigQbQDGFGsy3qdHSjHQ3GYItWk9LaOEhGp+q0k/4IrdMydRzxD+0MiuOkStMyyA6lM/IrHcSWrTMOdCBDkR8WCzSUhGoEdGsQa2Of/nEv4zbdTc7uPbvn7pP3a+YOe84993ueh7v7yd3vOXtOZCaSpFpe0HUBkqT2Ge6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLg0gevw50tDxm1LHhYj4w4g4EBE/iIiHI+LNEbEkIv4oIh7tr98bEav6278+Iu6OiMn+f18/ZayJiPh4RHwd+CFwZkS8MiJuj4hn+uO/u6teJYDw8gOqLiLOAb4KnJ+ZT0bEamAJcBHwXuBdwLeA84D9QAKPAh8BrgN+DfgcsCYzn46ICeBM4ALgYWAZ8ADwJ8A1wDrgduANmfnQwnQp/V9+ctfx4CfAicDaiHhhZu7LzEeB9wMfzcyHs+e+zHwa2AT8a2Zek5mHM/M64F+AzVPG3J2ZD2bmYWAjsC8z/7K//b3A39L7R0HqhOGu8jLzEeB3gMuBgxExHhEvB1bR+4R+pJcDjx+x7nFg5ZTlJ6Z8fTpwfkR8//kHcDHws600IM2B4a7jQmZ+KTN/iV4QJ/BJegH989Ns/mR/u6leARyYOuSUr58A/jEzXzzlsTwzP9heB9LsGO4qLyLOiYg3RcSJwI+A54D/Ab4AfCwizuqf9XJeRJwC7AHOjohfj4ilEfEeYC1wywy7uKW//W9GxAv7j1+IiFctQHvStAx3HQ9OBK4AngK+DbwMuAz4NHA98BXgP4EvAif1593fDuwAngb+AHh7Zj413eCZ+QPgLcAWep/6v03vN4MT568l6eg8W0aSCvKTuyQVZLhLUkGGuyQVZLhLUkFLu9x5RGwGNp988smXnn322XMa49lnn2XZsmXtFtYRexk+VfoAexlWg/Syd+/epzLzpdM+mZmdP9avX59zdccdd8z5tcPGXoZPlT4y7WVYDdILcE/OkKtOy0hSQYa7JBVkuEtSQYa7JBVkuEtSQYa7JBVkuEtSQYa7JBXU6V+otuH+A5Ns23lrJ/ved8WmTvYrScfiJ3dJKshwl6SCDHdJKshwl6SCOg33iNgcEbsmJye7LEOSyuk03DPz5szcPjIy0mUZklSO0zKSVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFLZ2PQSPiV4BNwIuAL2bmV+ZjP5Kk6TX+5B4RV0fEwYh44Ij1GyPi4Yh4JCJ2AmTmlzPzUuADwHvaLVmSdCyzmZbZDWycuiIilgBXARcAa4GtEbF2yiYf7T8vSVpAkZnNN45YDdySmef2l18HXJ6Zb+0vX9bf9Ir+4/bM/OoMY20HtgOsWLFi/fj4+JwaOPjMJN95bk4vHdi6lSOtjnfo0CGWL1/e6phdqdJLlT7AXobVIL1s2LBhb2aOTvfcoHPuK4EnpizvB84HPgz8MjASEWsy8/NHvjAzdwG7AEZHR3NsbGxOBXzm2hv51P3zcujgmPZdPNbqeBMTE8z1/8OwqdJLlT7AXobVfPUyL6mYmVcCV87H2JKkYxv0VMgDwKopy6f110mSOjRouN8NnBURZ0TECcAW4KamL46IzRGxa3JycsAyJElTzeZUyOuAO4FzImJ/RFySmYeBDwG3Ad8Ers/MB5uOmZk3Z+b2kZF2D0xK0vGu8Zx7Zm6dYf0eYE9rFUmSBtbNaSZFrN55a6vj7Vh
3mG0Nxtx3xaZW9yupHq8tI0kFdRruHlCVpPnRabh7QFWS5ofTMpJUkOEuSQU55y5JBTnnLkkFOS0jSQUZ7pJUkOEuSQUZ7pJUkGfLSFJBni0jSQV5VchFqO2rUc6GV6SUFgfn3CWpIMNdkgoy3CWpIMNdkgryVEhJKshTISWpIKdlJKkgw12SCjLcJakgw12SCjLcJakgw12SCvI8d0kqyPPcJakgp2UkqSDDXZIKMtwlqSDDXZIKMtwlqSDDXZIKMtwlqSDDXZIK8i9UJakg/0JVkgpyWkaSCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCvKSv5JUkJf8laSCnJaRpIIMd0kqyHCXpIIMd0kqyHCXpIIMd0kqyHCXpIIMd0kqaGnXBWhxWb3z1kbb7Vh3mG0Nt21i3xWbWhtLOh74yV2SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCmo93CPizIj4YkTc0PbYkqRmGoV7RFwdEQcj4oEj1m+MiIcj4pGI2AmQmY9l5iXzUawkqZmmn9x3AxunroiIJcBVwAXAWmBrRKxttTpJ0pxEZjbbMGI1cEtmnttffh1weWa+tb98GUBmfqK/fENmvuso420HtgOsWLFi/fj4+JwaOPjMJN95bk4vHTorTsJeZrBu5Uh7g83CoUOHWL58eSf7bpu9DKdBetmwYcPezByd7rlBrue+EnhiyvJ+4PyIOAX4OPDqiLjs+bA/UmbuAnYBjI6O5tjY2JyK+My1N/Kp+2tcln7HusP2MoN9F4+1NtZsTExMMNfvzWFjL8NpvnppPUky82ngA22PK0lqbpCzZQ4Aq6Ysn9ZfJ0nq2CDhfjdwVkScEREnAFuAm2YzQERsjohdk5OTA5QhSTpS01MhrwPuBM6JiP0RcUlmHgY+BNwGfBO4PjMfnM3OM/PmzNw+MtLNwTJJqqrRnHtmbp1h/R5gT6sVSZIG5uUHJKmgTsPdOXdJmh+dhrtz7pI0P5yWkaSCDHdJKshwl6SCOr2QSURsBjavWbOmyzK0CKzeeWsn+929cVkn+5UG5QFVSSrIaRlJKshwl6SCDHdJKshwl6SCvPyAJBXk2TKSVJDTMpJUkOEuSQUZ7pJUkOEuSQV5towkFeTZMpJUkNMyklSQ4S5JBRnuklSQ4S5JBRnuklSQ4S5JBXmeuyQV5HnuklSQ0zKSVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJCXH5Ckgrz8gCQV5LSMJBVkuEtSQYa7JBVkuEtSQYa7JBVkuEtSQYa7JBVkuEtSQYa7JBVkuEtSQYa7JBVkuEtSQYa7JBW0tMudR8RmYPOaNWu6LEOa0f0HJtm289ZO9r3vik2d7Fc1eMlfSSrIaRlJKshwl6SCDHdJKshwl6SCDHdJKshwl6SCDHdJKshwl6SCDHdJKshwl6SCDHdJKshwl6SCDHdJKshwl6SCDHdJKshwl6SCDHdJKshwl6SCDHdJKshwl6SClrY9YEQsAz4H/BiYyMxr296HJOnoGn1yj4irI+JgRDxwxPqNEfFwRDwSETv7qy8CbsjMS4ELW65XktRA02mZ3cDGqSsiYglwFXABsBbYGhFrgdOAJ/qb/aSdMiVJsxGZ2WzDiNXALZl5bn/5dcDlmfnW/vJl/U33A9/LzFsiYjwzt8ww3nZgO8CKFSvWj4+Pz6mBg89M8p3n5vTSobPiJOxlyHTZx7qVI62Od+jQIZYvX97qmF1pu5f7D0y2NtZsnTGyZM69bNiwYW9mjk733CBz7iv56Sd06IX6+cCVwGcjYhNw80wvzsxdwC6A0dHRHBsbm1MRn7n2Rj51f+uHDjqxY91hexkyXfax7+KxVsebmJhgrj9nw6btXrbtvLW1sWZr98Zl8/K+tP5dm5nPAu9re1xJUnODnAp5AFg
1Zfm0/jpJUscGCfe7gbMi4oyIOAHYAtw0mwEiYnNE7Jqc7G6+S5Iqanoq5HXAncA5EbE/Ii7JzMPAh4DbgG8C12fmg7PZeWbenJnbR0baPXAkSce7RnPumbl1hvV7gD2tViRJGpiXH5CkgjoNd+fcJWl+dBruzrlL0vxo/Beq81pExHeBx+f48lOBp1osp0v2Mnyq9AH2MqwG6eX0zHzpdE8MRbgPIiLumenPbxcbexk+VfoAexlW89WLB1QlqSDDXZIKqhDuu7ouoEX2Mnyq9AH2MqzmpZdFP+cuSfr/KnxylyQdwXCXpIIWTbjPcL/W6bZ7Z0RkRAztaVLH6iUitkXEdyPin/uP93dR57E0eU8i4t0R8VBEPBgRX1roGptq8J78+ZT341sR8f0OymykQS+viIg7IuLeiPhGRLytizqbaNDL6RHx9/0+JiLitC7qPJaZ7kM95fmIiCv7fX4jIl4z8E4zc+gfwBLgUeBM4ATgPmDtNNudDHwNuAsY7bruufYCbAM+23WtLfRxFnAv8JL+8su6rnuQ768p238YuLrrugd4X3YBH+x/vRbY13XdA/TyN8Bv9b9+E3BN13XP0MsbgNcAD8zw/NuAvwMCeC3wT4Puc7F8cv9F4JHMfCwzfwyMA++YZruPAZ8EfrSQxc1S016GXZM+LgWuyszvAWTmwQWusanZvidbgesWpLLZa9JLAi/qfz0CPLmA9c1Gk17WAv/Q//qOaZ4fCpn5NeCZo2zyDuCvs+cu4MUR8XOD7HOxhPt092tdOXWD/q8xqzKzu5shNnPMXvre2f/17IaIWDXN811r0sfZwNkR8fWIuCsiNi5YdbPT9D0hIk4HzuCngTJsmvRyOfAbEbGf3iW7P7wwpc1ak17uAy7qf/2rwMkRccoC1Na2xt+DTS2WcD+qiHgB8GlgR9e1tORmYHVmngfcDvxVx/XM1VJ6UzNj9D7t/kVEvLjLglqwBbghM3/SdSED2ArszszT6E0HXNP/GVqMfg94Y0TcC7yR3q0+F/N705rF8oYe636tJwPnAhMRsY/enNVNQ3pQ9Zj3ns3MpzPzv/qLXwDWL1Bts9HkHrr7gZsy878z89+Ab9EL+2Ezm/sBb2F4p2SgWS+XANcDZOadwM/Qu3jVsGnys/JkZl6Uma8G/ri/7vsLVmF7Wr8n9WIJ96PerzUzJzPz1MxcnZmr6R1QvTAz7+mm3KM65r1nj5hru5DebQyHTZN76H6Z3qd2IuJUetM0jy1gjU01uh9wRLwSeAm9W04Oqya9/DvwZoCIeBW9cP/uglbZTJOflVOn/NZxGXD1AtfYlpuA9/bPmnktMJmZ/zHIgI1us9e1zDwcEc/fr3UJvTMVHoyIPwPuycxZ3Zi7Sw17+UhEXAgcpncQZltnBc+gYR+3AW+JiIfo/ar8+5n5dHdVT28W319bgPHsn94wjBr2soPeFNnv0ju4um0Ye2rYyxjwiYhIemfK/XZnBR9F9O5DPQac2j/W8afACwEy8/P0jn28DXgE+CHwvoH3OYTvqSRpQItlWkaSNAuGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkH/C0mddVN8AXmjAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import chemfp\n", "\n", "chemfp.simsearch(query_id=\"CHEMBL113\", targets=\"chembl_30.fpb\", threshold=0.4\n", " ).to_pandas().hist(\"score\", log=True)" ] }, { "cell_type": "markdown", "id": "1abbddd6", "metadata": {}, "source": [ "### Pandas integration\n", "\n", "As you saw in the previous example, chemfp 4.0 add new methods which export chemfp data to a Pandas dataframe. The following shows the nearst 5 neighbors to CHEMBL1113 (using k=6 as caffine is already present):\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "038250aa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
target_idscore
0CHEMBL1131.000000
1CHEMBL45913690.733333
2CHEMBL12320480.709677
3CHEMBL4467840.677419
4CHEMBL17387910.666667
5CHEMBL20581730.666667
\n", "
" ], "text/plain": [ " target_id score\n", "0 CHEMBL113 1.000000\n", "1 CHEMBL4591369 0.733333\n", "2 CHEMBL1232048 0.709677\n", "3 CHEMBL446784 0.677419\n", "4 CHEMBL1738791 0.666667\n", "5 CHEMBL2058173 0.666667" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import chemfp\n", "\n", "chemfp.simsearch(query_id=\"CHEMBL113\", targets=\"chembl_30.fpb\", k=6).to_pandas()" ] }, { "cell_type": "markdown", "id": "1cbbb589", "metadata": {}, "source": [ "Multi-query searches have three columns of output, with the query id in the first:" ] }, { "cell_type": "code", "execution_count": 3, "id": "6944c399", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "60ab2ce6eb2f4a0a8d9629c5c744c531", "version_major": 2, "version_minor": 0 }, "text/plain": [ "queries: 0%| | 0/10…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
query_idtarget_idscore
0CHEMBL1316726CHEMBL22613390.224490
1CHEMBL4588846CHEMBL4954210.250000
2CHEMBL3696763CHEMBL1853540.214286
3CHEMBL4091061CHEMBL16419560.278481
4CHEMBL2349135*NaN
5CHEMBL1604960CHEMBL3431410.315789
6CHEMBL2012902*NaN
7CHEMBL20406CHEMBL23228230.230769
8CHEMBL1796344CHEMBL611060.207547
9CHEMBL3400341*NaN
\n", "
" ], "text/plain": [ " query_id target_id score\n", "0 CHEMBL1316726 CHEMBL2261339 0.224490\n", "1 CHEMBL4588846 CHEMBL495421 0.250000\n", "2 CHEMBL3696763 CHEMBL185354 0.214286\n", "3 CHEMBL4091061 CHEMBL1641956 0.278481\n", "4 CHEMBL2349135 * NaN\n", "5 CHEMBL1604960 CHEMBL343141 0.315789\n", "6 CHEMBL2012902 * NaN\n", "7 CHEMBL20406 CHEMBL2322823 0.230769\n", "8 CHEMBL1796344 CHEMBL61106 0.207547\n", "9 CHEMBL3400341 * NaN" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import chemfp\n", "\n", "with chemfp.load_fingerprints(\"chembl_30.fpb\") as arena:\n", " queries, targets = arena.train_test_split(train_size=10, test_size=100, rng=12345)\n", "\n", "result = chemfp.simsearch(queries=queries, targets=targets, k=1, threshold=0.2)\n", "result.to_pandas()" ] }, { "cell_type": "markdown", "id": "3f364e3b", "metadata": {}, "source": [ "The placeholder target id \"\\*\" and the score of None (which pandas converts to NaN) appear because three of the query compounds had no nearest neighbor with a similarity of at least 0.2.\n", "\n", "The column names and placeholders can be changed. The following uses `empty=None` to omit those queries from the output:" ] }, { "cell_type": "code", "execution_count": 4, "id": "ce77f0ca", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FROMTOTanimoto
0CHEMBL1316726CHEMBL22613390.224490
1CHEMBL4588846CHEMBL4954210.250000
2CHEMBL3696763CHEMBL1853540.214286
3CHEMBL4091061CHEMBL16419560.278481
4CHEMBL1604960CHEMBL3431410.315789
5CHEMBL20406CHEMBL23228230.230769
6CHEMBL1796344CHEMBL611060.207547
\n", "
" ], "text/plain": [ " FROM TO Tanimoto\n", "0 CHEMBL1316726 CHEMBL2261339 0.224490\n", "1 CHEMBL4588846 CHEMBL495421 0.250000\n", "2 CHEMBL3696763 CHEMBL185354 0.214286\n", "3 CHEMBL4091061 CHEMBL1641956 0.278481\n", "4 CHEMBL1604960 CHEMBL343141 0.315789\n", "5 CHEMBL20406 CHEMBL2322823 0.230769\n", "6 CHEMBL1796344 CHEMBL61106 0.207547" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.to_pandas(columns=[\"FROM\", \"TO\", \"Tanimoto\"], empty=None)" ] }, { "cell_type": "markdown", "id": "b9e3f02d", "metadata": {}, "source": [ "### Progress bars\n", "\n", "As you saw in the previous example, progress bars have been implemented, both on the command-line and through the high-level API.\n", "\n", "Here's an example using the command-line to generate RDKit fingerprints from a PubChem file:" ] }, { "cell_type": "code", "execution_count": 5, "id": "7383c05e", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Compound_099000001_099500000.sdf.gz: 100%|█| 6.77M/6.77M [00:23<00:00, 282kbytes\n" ] } ], "source": [ "!rdkit2fps Compound_099000001_099500000.sdf.gz -o Compound_099000001_099500000.fps" ] }, { "cell_type": "markdown", "id": "7c4b5af4", "metadata": {}, "source": [ "And here is the equivalent using the high-level API:" ] }, { "cell_type": "code", "execution_count": 6, "id": "bb2e48ac", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8230ef4af4f14d79bd362fd903557320", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Compound_099000001_099500000.sdf.gz: 0%| | 0.00/6.77M …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "ConversionInfo(\"Converted structures from 'Compound_099000001_099500000.sdf.gz'. 
#input_records=10740, #output_records=10740 (total: 8.19 s)\")" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import chemfp\n", "chemfp.rdkit2fps(\"Compound_099000001_099500000.sdf.gz\", \"Compound_099000001_099500000.fps\")" ] }, { "cell_type": "markdown", "id": "538730eb", "metadata": {}, "source": [ "The progress bars use [tqdm](https://tqdm.github.io/). If tqdm is not installed then chemfp will use a version bundled with chemfp itself.\n", "\n", "The default progress bar can be disabled and re-enabled with `chemfp.set_default_progressbar`:" ] }, { "cell_type": "code", "execution_count": 7, "id": "0cb8eaa0", "metadata": {}, "outputs": [], "source": [ "chemfp.set_default_progressbar(False) # Disable\n", "chemfp.set_default_progressbar(True) # Enable" ] }, { "cell_type": "markdown", "id": "07c3953d", "metadata": {}, "source": [ "If the environment variable CHEMFP_PROGRESSBAR is \"0\" then the default progress bar starts in the disabled state." ] }, { "cell_type": "markdown", "id": "96e5a745", "metadata": {}, "source": [ "### Improved `repr`s\n", "\n", "Many of the chemfp objects now implement a custom `__repr__` which is more useful than the default. 
For example:" ] }, { "cell_type": "code", "execution_count": 8, "id": "2118e6b0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "FPBFingerprintArena(#fingerprints=2136187)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import chemfp\n", "arena = chemfp.load_fingerprints(\"chembl_30.fpb\")\n", "arena" ] }, { "cell_type": "code", "execution_count": 9, "id": "19a596c2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "SearchResult(#hits=5)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arena.knearest_tanimoto_search_fp(arena.fingerprints[12345], k=5)" ] }, { "cell_type": "code", "execution_count": 10, "id": "76c59f1d", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5fcb39e1df6a4edda1d630d64c8202c5", "version_major": 2, "version_minor": 0 }, "text/plain": [ "picks: 0%| |…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "HeapSweepScoreSearch('picked 5 fps. similarity <= 1.0, #candidates=50, seed=-1 (pick: 38.37 ms, total: 38.43 ms)', picker=HeapSweepPicker(#candidates=45, #picks=5), result=PicksAndScores(#picks=5))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subarena = arena.sample(50, rng=54321)\n", "picks = chemfp.heapsweep(subarena, num_picks=5)\n", "picks" ] }, { "cell_type": "code", "execution_count": 11, "id": "8e641927", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a88c0118ad7b4e3188a7b2aa45d65aaf", "version_major": 2, "version_minor": 0 }, "text/plain": [ "queries: 0%| | 0/50…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "MultiQuerySimsearch('3-nearest Tanimoto search. 
#queries=50, #targets=2136187 (search: 848.42 ms total: 848.55 ms)', result=SearchResults(#queries=50, #targets=2136187))" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = chemfp.simsearch(queries=subarena, targets=arena, k=3)\n", "result" ] }, { "cell_type": "markdown", "id": "43d4809b", "metadata": {}, "source": [ "### \"Shortcut\" toolkit imports\n", "\n", "Chemfp supports four different cheminformatics toolkits, which it uses for molecule I/O and fingerprint generation. Chemfp also implements a \"toolkit\" wrapper API, so chemfp-based programs can work with multiple toolkits in a consistent way.\n", "\n", "The standard way to access these toolkits is by importing the toolkit wrapper subpackage, like:" ] }, { "cell_type": "code", "execution_count": 12, "id": "df6d9246", "metadata": {}, "outputs": [], "source": [ "from chemfp import openbabel_toolkit\n", "#from chemfp import cdk_toolkit\n", "#from chemfp import openeye_toolkit\n", "#from chemfp import rdkit_toolkit" ] }, { "cell_type": "markdown", "id": "14cfbca7", "metadata": {}, "source": [ "This adds a bit of overhead which makes interactive use a bit less enjoyable.\n", "\n", "Chemfp 4.0 adds a \"shortcut\" importer object, where the import occurs the first time it is accessed. 
For example, here's a way to get the list of RDKit formats that chemfp supports:" ] }, { "cell_type": "code", "execution_count": 13, "id": "8491e121", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Format('rdkit/smi'),\n", " Format('rdkit/can'),\n", " Format('rdkit/usm'),\n", " Format('rdkit/sdf'),\n", " Format('rdkit/sdf3k'),\n", " Format('rdkit/smistring'),\n", " Format('rdkit/canstring'),\n", " Format('rdkit/usmstring'),\n", " Format('rdkit/molfile'),\n", " Format('rdkit/rdbinmol'),\n", " Format('rdkit/fasta'),\n", " Format('rdkit/sequence'),\n", " Format('rdkit/helm'),\n", " Format('rdkit/mol2'),\n", " Format('rdkit/pdb'),\n", " Format('rdkit/xyz'),\n", " Format('rdkit/mae'),\n", " Format('rdkit/inchi'),\n", " Format('rdkit/inchikey'),\n", " Format('rdkit/inchistring'),\n", " Format('rdkit/inchikeystring')]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import chemfp\n", "chemfp.rdkit.get_formats()" ] }, { "cell_type": "markdown", "id": "b2046873", "metadata": {}, "source": [ "The special importer objects are `chemfp.openbabel`, `chemfp.openeye`, `chemfp.cdk` and `chemfp.rdkit`.\n", "\n", "You should not import these objects directly (like `from chemfp import openeye`) because you will likely confuse them with the real `import openeye` -- I certainly do! Instead, always use `chemfp.openbabel` and, if you want to import a specific toolkit, use `from chemfp import openbabel_toolkit as ob_toolkit` or `as T`.\n", "\n", "NOTE: Jupyter does not seem to understand how to get the properties of these importer objects: tab completion does not work in the notebook, though it works fine in the Python shell." ] }, { "cell_type": "markdown", "id": "b3103c59", "metadata": {}, "source": [ "### FingerprintType improvements" ] }, { "cell_type": "markdown", "id": "cec5b9ea", "metadata": {}, "source": [ "A `FingerprintType` object is the interface to a given toolkit fingerprint type, including its parameters. 
Before chemfp 4.0 the only way to get a FingerprintType was to use a fingerprint type string, like:" ] }, { "cell_type": "code", "execution_count": 14, "id": "9a33147c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "OpenBabelFP2FingerprintType_v1()" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import chemfp\n", "fptype = chemfp.get_fingerprint_type(\"OpenBabel-FP2\")\n", "fptype" ] }, { "cell_type": "markdown", "id": "acbfdb75", "metadata": {}, "source": [ "This was too generic, as it proved difficult to remember the correct string name and its parameters.\n", "\n", "With chemfp 4.0, each of the toolkit wrapper modules includes a FingerprintType with the default values for each of the toolkit's fingerprint families. For the Open Babel example:" ] }, { "cell_type": "code", "execution_count": 15, "id": "08558dce", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "OpenBabelFP2FingerprintType_v1()" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from chemfp import openbabel_toolkit\n", "openbabel_toolkit.fp2" ] }, { "cell_type": "code", "execution_count": 16, "id": "6aa3d223", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RDKitMorganFingerprintType_v1()" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from chemfp import rdkit_toolkit\n", "rdkit_toolkit.morgan" ] }, { "cell_type": "markdown", "id": "6cf7ebb3", "metadata": {}, "source": [ "The newly available FingerprintType objects are:\n", "\n", "* chemfp.rdkit_toolkit.avalon - \"RDKit-Avalon\" fingerprints\n", "* chemfp.rdkit_toolkit.maccs166 - \"RDKit-MACCS166\" fingerprints\n", "* chemfp.rdkit_toolkit.morgan - \"RDKit-Morgan\" fingerprints\n", "* chemfp.rdkit_toolkit.atom_pair - \"RDKit-AtomPair\" fingerprints\n", "* chemfp.rdkit_toolkit.pattern - \"RDKit-Pattern\" fingerprints\n", "* chemfp.rdkit_toolkit.rdk - \"RDKit-Fingerprint\" 
fingerprints\n", "* chemfp.rdkit_toolkit.secfp - \"RDKit-SECFP\" fingerprints\n", "* chemfp.rdkit_toolkit.torsion - \"RDKit-Torsion\" fingerprints\n", "* chemfp.openeye_toolkit.circular - \"OpenEye-Circular\" fingerprints\n", "* chemfp.openeye_toolkit.maccs166 - \"OpenEye-MACCS166\" fingerprints\n", "* chemfp.openeye_toolkit.mdl_screen - \"OpenEye-MDLScreen\" fingerprints\n", "* chemfp.openeye_toolkit.molecule_screen - \"OpenEye-MoleculeScreen\" fingerprints\n", "* chemfp.openeye_toolkit.path - \"OpenEye-Path\" fingerprints\n", "* chemfp.openeye_toolkit.smarts_screen - \"OpenEye-SMARTSScreen\" fingerprints\n", "* chemfp.openeye_toolkit.tree - \"OpenEye-Tree\" fingerprints\n", "* chemfp.openbabel_toolkit.ecfp0 - \"OpenBabel-ECFP0\" fingerprints\n", "* chemfp.openbabel_toolkit.ecfp2 - \"OpenBabel-ECFP2\" fingerprints\n", "* chemfp.openbabel_toolkit.ecfp4 - \"OpenBabel-ECFP4\" fingerprints\n", "* chemfp.openbabel_toolkit.ecfp6 - \"OpenBabel-ECFP6\" fingerprints\n", "* chemfp.openbabel_toolkit.ecfp8 - \"OpenBabel-ECFP8\" fingerprints\n", "* chemfp.openbabel_toolkit.ecfp10 - \"OpenBabel-ECFP10\" fingerprints\n", "* chemfp.openbabel_toolkit.fp2 - \"OpenBabel-FP2\" fingerprints\n", "* chemfp.openbabel_toolkit.fp3 - \"OpenBabel-FP3\" fingerprints\n", "* chemfp.openbabel_toolkit.fp4 - \"OpenBabel-FP4\" fingerprints\n", "* chemfp.openbabel_toolkit.maccs - \"OpenBabel-MACCS\" fingerprints\n", "* chemfp.cdk_toolkit.atom_pairs2d - \"CDK-AtomPairs2D\" fingerprints\n", "* chemfp.cdk_toolkit.daylight - \"CDK-Daylight\" fingerprints\n", "* chemfp.cdk_toolkit.ecfp0 - \"CDK-ECFP0\" fingerprints\n", "* chemfp.cdk_toolkit.ecfp2 - \"CDK-ECFP2\" fingerprints\n", "* chemfp.cdk_toolkit.ecfp4 - \"CDK-ECFP4\" fingerprints\n", "* chemfp.cdk_toolkit.ecfp6 - \"CDK-ECFP6\" fingerprints\n", "* chemfp.cdk_toolkit.estate - \"CDK-EState\" fingerprints\n", "* chemfp.cdk_toolkit.extended - \"CDK-Extended\" fingerprints\n", "* chemfp.cdk_toolkit.fcfp0 - \"CDK-FCFP0\" fingerprints\n", "* 
chemfp.cdk_toolkit.fcfp2 - \"CDK-FCFP2\" fingerprints\n", "* chemfp.cdk_toolkit.fcfp4 - \"CDK-FCFP4\" fingerprints\n", "* chemfp.cdk_toolkit.fcfp6 - \"CDK-FCFP6\" fingerprints\n", "* chemfp.cdk_toolkit.graph_only - \"CDK-GraphOnly\" fingerprints\n", "* chemfp.cdk_toolkit.hybridization - \"CDK-Hybridization\" fingerprints\n", "* chemfp.cdk_toolkit.klekota_roth - \"CDK-KlekotaRoth\" fingerprints\n", "* chemfp.cdk_toolkit.maccs - \"CDK-MACCS\" fingerprints\n", "* chemfp.cdk_toolkit.pubchem - \"CDK-Pubchem\" fingerprints\n", "* chemfp.cdk_toolkit.shortest_path - \"CDK-ShortestPath\" fingerprints\n", "* chemfp.cdk_toolkit.substructure - \"CDK-Substructure\" fingerprints\n", " " ] }, { "cell_type": "markdown", "id": "37cc9d9a", "metadata": {}, "source": [ "A FingerprintType is now callable, which lets you make a copy with some arguments changed." ] }, { "cell_type": "code", "execution_count": 17, "id": "5f2cb27b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RDKitMorganFingerprintType_v1()" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rdkit_toolkit.morgan(fpSize=128, radius=3)" ] }, { "cell_type": "markdown", "id": "68d107d2", "metadata": {}, "source": [ "There are new helper methods on the FingerprintType to simplify common cases of working with structure data. For example, the `from_smiles` method takes a SMILES string as input and generates the corresponding fingerprint."
 ] }, { "cell_type": "code", "execution_count": 18, "id": "e343f221", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b\"\\x0b\\x06\\x01\\x08\\x93\\x10\\x19\\x04\\x00\\x84\\n@\\x10'\\x08\\x07\"" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rdkit_toolkit.morgan(fpSize=128, radius=3).from_smiles(\"CN1C=NC2=C1C(=O)N(C(=O)N2C)C\")" ] }, { "cell_type": "markdown", "id": "ec1f5dcf", "metadata": {}, "source": [ "Or, using the shortcut importer and an InChI string as input:" ] }, { "cell_type": "code", "execution_count": 19, "id": "8650fe3c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b\"\\x0b\\x06\\x01\\x08\\x93\\x10\\x19\\x04\\x00\\x84\\n@\\x10'\\x08\\x07\"" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import chemfp\n", "chemfp.rdkit.morgan(fpSize=128, radius=3).from_inchi(\n", " \"InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3\")" ] }, { "cell_type": "markdown", "id": "ec740ed2", "metadata": {}, "source": [ "### String and file I/O helper functions\n", "\n", "There are several helper functions to make it easier to read from and write to strings and files.\n", "\n", "If you have an FPS file as a string, you can load it with `load_fingerprints_from_string`:" ] }, { "cell_type": "code", "execution_count": 20, "id": "40cabea5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "FingerprintArena(#fingerprints=2)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import chemfp\n", "arena = chemfp.load_fingerprints_from_string(\"\"\"\\\n", "aabbcc\\tfirst\n", "10d9b4\\tsecond\n", "\"\"\")\n", "arena" ] }, { "cell_type": "markdown", "id": "06654101", "metadata": {}, "source": [ "There are helpers in the chemistry toolkit wrapper modules to read and write molecules:" ] }, { "cell_type": "code", "execution_count": 21, "id": "ecba57ac", "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ " >" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from chemfp import openbabel_toolkit as ob_toolkit\n", "ob_toolkit.from_smiles(\"c1ccccc1O\")" ] }, { "cell_type": "code", "execution_count": 22, "id": "bf310a7a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'InChI=1S/C6H6O/c7-6-4-2-1-3-5-6/h1-5,7H \\n'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ob_toolkit.to_inchi(ob_toolkit.from_smiles(\"c1ccccc1O\"))" ] }, { "cell_type": "code", "execution_count": 23, "id": "5f2d0498", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "==============================\n", "*** Open Babel Warning in WriteMolecule\n", " No 2D or 3D coordinates exist. Stereochemical information will be stored using an Open Babel extension. To generate 2D or 3D coordinates instead use --gen2D or --gen3D.\n" ] } ], "source": [ "ob_toolkit.to_sdf3k_file(\"example.sdf\").write_molecule(ob_toolkit.from_smiles(\"c1ccccc1O\"))" ] }, { "cell_type": "code", "execution_count": 24, "id": "df43a94a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r\n", " OpenBabel06122218002D\r\n", "\r\n", " 0 0 0 0 0 999 V3000\r\n", "M V30 BEGIN CTAB\r\n", "M V30 COUNTS 7 7 0 0 0\r\n", "M V30 BEGIN ATOM\r\n", "M V30 1 C 0 0 0 0\r\n", "M V30 2 C 0 0 0 0\r\n", "M V30 3 C 0 0 0 0\r\n" ] } ], "source": [ "!head example.sdf" ] }, { "cell_type": "markdown", "id": "071ac32d", "metadata": {}, "source": [ "### \"chemfp\" command" ] }, { "cell_type": "markdown", "id": "722b8483", "metadata": {}, "source": [ "Chemfp 4.0 added several user-level commands. Rather than create program names which might collide with other names, I decided to add a new \"chemfp\" command, which implements subcommands. Many of the subcommands are the same as the top-level commands. 
There are also new commands for diversity search, and a couple related to licensing." ] }, { "cell_type": "code", "execution_count": 25, "id": "c0d264b9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: chemfp [--help] [--license-file FILENAME] [--traceback] [--version]\r\n", " [command] ...\r\n", "\r\n", "chemfp tools for cheminformatics fingerprints and structure I/O\r\n", "\r\n", "positional arguments:\r\n", " command\r\n", " rest\r\n", "\r\n", "options:\r\n", " --help, -h\r\n", " --license-file FILENAME\r\n", " --traceback print the traceback on KeyboardInterrupt\r\n", " --version show program's version number and exit\r\n", "\r\n", "Summary of the available commands.\r\n", "\r\n", "Fingerprint generation\r\n", "\r\n", " cdk2fps Generate RDKit fingerprints\r\n", " ob2fps Generate Open Babel fingerprints\r\n", " oe2fps Generate OpenEye fingerprints\r\n", " rdkit2fps Generate RDKit fingerprints\r\n", " sdf2fps Extract a fingerprint tag from an SD file and generate\r\n", " FPS or FPB fingerprints\r\n", "\r\n", "Fingerprint search\r\n", "\r\n", " simsearch Search an FPS or FPB file for similar fingerprints\r\n", "\r\n", "Fingerprint format conversion\r\n", "\r\n", " fpcat Combine multiple fingerprint files into a single file\r\n", "\r\n", "Fingerprint file tools\r\n", "\r\n", " fpb_text Show the TEXT sections of an FPB file\r\n", "\r\n", "Diversity selection\r\n", "\r\n", " heapsweep Select diverse fingerprints using the heapsweep\r\n", " algorithm\r\n", " maxmin Select diverse fingerprints using the MaxMin algorithm\r\n", " spherex Select diverse fingerprints using the sphere exclusion\r\n", " algorithm\r\n", "\r\n", "Configuration\r\n", "\r\n", " license Show the chemfp license status\r\n", " report Report chemfp similarity search implementation details\r\n", " toolkits Show underlying cheminformatics toolkit availability\r\n", "\r\n" ] } ], "source": [ "!chemfp" ] }, { "cell_type": "markdown", "id": "0dd56543", 
"metadata": {}, "source": [ "### Diversity selection\n", "\n", "Chemfp 4.0 added several forms of diversity picking.\n", "\n", "MaxMin (see Ashton et al., https://doi.org/10.1002/qsar.200290002) is an approximate but fast method to select maximally diverse fingerprints from a candidate data set. Chemfp's version of MaxMin also supports reference-based MaxMin, which selects diverse fingerprints from the candidates which are also diverse from a set of references. (For example, select 1,000 fingerprints from a vendor catalog which most enrich the diversity of a corporate collection.)\n", "\n", "The following example uses MaxMin to select 5 diverse compounds from ChEMBL 30 which are also diverse from ChEMBL 29 (this takes over a minute):" ] }, { "cell_type": "code", "execution_count": 26, "id": "719bb781", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#Diversity/1\n", "#num_bits=2048\n", "#type=maxmin threshold=1.0 num-picks=5 all-equal=0 randomize=1 seed=16034062461884639177\n", "#software=chemfp/4.0\n", "#candidates=chembl_30.fpb\n", "#references=chembl_29.fpb\n", "#date=2022-06-12T16:00:36\n", "i\tpick_id\tscore\n", "1\tCHEMBL4778247\t0.2500000\n", "2\tCHEMBL4797319\t0.2553191\n", "3\tCHEMBL4764617\t0.2577320\n", "4\tCHEMBL4761572\t0.2592593\n", "5\tCHEMBL4754099\t0.2638889\n", "T_init: 0.02 T_pick: 76.62 #picks: 5 picks/s: 0.07 T_total: 76.63\n" ] } ], "source": [ "!chemfp maxmin -n 5 --references chembl_29.fpb chembl_30.fpb --times" ] }, { "cell_type": "markdown", "id": "12730d76", "metadata": {}, "source": [ "Heapsweep is an exact but slow method to select maximally diverse fingerprints. It is used to seed the MaxMin algorithm when no initial fingerprint is specified. 
The 5 globally most diverse fingerprints in ChEMBL 30 are:" ] }, { "cell_type": "code", "execution_count": 27, "id": "badef7a1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#Diversity/1\n", "#num_bits=2048\n", "#type=heapsweep threshold=1.0 num-picks=5 all-equal=0 randomize=1 seed=9508979697301357721\n", "#software=chemfp/4.0\n", "#candidates=chembl_30.fpb\n", "#date=2022-06-12T16:01:54\n", "i\tpick_id\tscore\n", "1\tCHEMBL4297424\t0.0769231\n", "2\tCHEMBL2105487\t0.0769231\n", "3\tCHEMBL1796997\t0.0769231\n", "4\tCHEMBL1201290\t0.0833333\n", "5\tCHEMBL4300465\t0.0833333\n", "T_init: 0.02 T_pick: 8.75 #picks: 5 picks/s: 0.57 T_total: 8.77\n" ] } ], "source": [ "!chemfp heapsweep -n 5 chembl_30.fpb --times" ] }, { "cell_type": "markdown", "id": "15b7c6ae", "metadata": {}, "source": [ "Sphere exclusion (see Hudson et al., https://doi.org/10.1002/qsar.19960150402) reduces some of the clumpiness of random sampling by avoiding picking fingerprints which are close to previous picks. Chemfp's sphere exclusion also has a reference-based version. Chemfp also implements directed sphere exclusion (DISE) (see Gobbi et al., https://doi.org/10.1021/ci025554v), where the candidate fingerprint with the lowest rank is chosen, rather than picking from all remaining candidates. The ranks can be assigned by the method of Gobbi et al. 
or with user-defined ranks.\n", "\n", "Here are 10 fingerprints picked from ChEMBL such that no two picks are within 0.2 similarity of each other, with the output in csv format, and with a fixed seed:" ] }, { "cell_type": "code", "execution_count": 28, "id": "c05067de", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pick_id,count\n", "CHEMBL1199645,157334\n", "CHEMBL1544208,22982\n", "CHEMBL72725,64693\n", "CHEMBL600500,16792\n", "CHEMBL533667,12671\n", "CHEMBL4073454,28297\n", "CHEMBL3827378,43489\n", "CHEMBL3114290,29501\n", "CHEMBL1184175,25290\n", "CHEMBL566963,45206\n" ] } ], "source": [ "!chemfp spherex -n 10 --threshold 0.2 --out csv chembl_30.fpb --seed 12345" ] }, { "cell_type": "markdown", "id": "8b0d5dfb", "metadata": {}, "source": [ "### CSV and TSV output\n", "\n", "The simsearch, maxmin, heapsweep, and spherex commands support alternative output formats, using `--out`. The two alternatives are \"csv\" and \"tsv\", for comma-separated and tab-separated, and are appropriately quoted for use by Excel. These do not contain a metadata header, and the diversity outputs do not include the pick index. 
The default output format is \"chemfp\".\n", "\n", "Here are two examples using simsearch:" ] }, { "cell_type": "code", "execution_count": 29, "id": "7db90477", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "query_id,target_id,score\r", "\r\n", "Query1,CHEMBL113,1.0000000\r", "\r\n", "Query1,CHEMBL4591369,0.7333333\r", "\r\n", "Query1,CHEMBL1232048,0.7096774\r", "\r\n" ] } ], "source": [ "!simsearch --query \"CN1C=NC2=C1C(=O)N(C(=O)N2C)C\" chembl_30.fpb --out csv" ] }, { "cell_type": "code", "execution_count": 30, "id": "17470f0b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "query_id\ttarget_id\tscore\r", "\r\n", "Query1\tCHEMBL113\t1.0000000\r", "\r\n", "Query1\tCHEMBL4591369\t0.7333333\r", "\r\n", "Query1\tCHEMBL1232048\t0.7096774\r", "\r\n" ] } ], "source": [ "!simsearch --query \"CN1C=NC2=C1C(=O)N(C(=O)N2C)C\" chembl_30.fpb --out tsv" ] }, { "cell_type": "markdown", "id": "4526778d", "metadata": {}, "source": [ "The csv and tsv formats use a fixed number of columns, even when the default chemfp format uses a variable number of columns. Instead of having N extra columns for each line, there are N lines, one for each output column.\n", "\n", "This can cause problems if N = 0, as when a simsearch query has no matches. In this case there is a synthetic output line with a target id of `*` and a score of `NaN`. The simsearch option `--empty-target-id` and the spherex option `--empty-hit-id` change the default id. The option `--empty-score` changes the default score.\n", "\n", "Use `--no-include-empty` to omit the synthetic line when N = 0." ] }, { "cell_type": "markdown", "id": "df13bd19", "metadata": {}, "source": [ "### Miscellaneous\n", "\n", "- Use the output format \"sdf3k\" to always write SD files in V3000 format.\n", "\n", "- The low-level k-nearest arena query search API accepts an initial array of minimum thresholds, one per query. 
This may be useful when searching multiple target arenas, as the scores from earlier results may help reduce the search space.\n", "\n", "- Fingerprint arenas have a new `get_bit_counts()` method which for each bit counts the number of fingerprints where that bit is on.\n", "\n", "- Added `open_from_string()` and `load_fingerprints_from_string()` to make it easier to work with FPS or FPB content as a string.\n", "\n", "- Added support for Python 3.10 and CDK 2.7.\n", "\n", "- The SearchResult has a `query_id`, if known.\n", "\n", "- The NxN and NxM arena similarity search functions in `chemfp.search` now support optional `batch_size` and `batch_callback` parameters. These are used to implement progress bars.\n", "\n", "- The Location object now also supports `position` (the approximate location in the input file), `end_position` (the expected end position, or None), and `position_units` (currently only \"bytes\"). These are used for the progress bar.\n", "\n", "- The FPB writers have a `location`, which the high-level conversion functions use to get the number of output records.\n", "\n", "- Added `byte_xor_popcount`, `hex_xor_popcount`, `byte_union_popcount`, and `hex_union_popcount` to the bitops module.\n", "\n", "- Changed CDK-ShortestPath version to \"2.7\" because the PRNG change in 2.7 results in new bit patterns.\n", "\n", "- Added CDK ExtendedFingerprinter support for CDK 2.5 and later.\n", "\n", "\n", "I also split the C extension into several extensions, including some based on Cython, and I dropped most of the slowest popcount implementations on the assumption that `__builtin_popcountll` is always good enough." ] }, { "cell_type": "markdown", "id": "c1ff7b2b", "metadata": {}, "source": [ "### Breaking changes\n", "\n", "Chemfp 4.0 drops support for Python 2.7, for Python 3.7 or earlier, and for toolkit versions before 2020.\n", "\n", "Nearly everything else is backwards compatible with older versions of chemfp. 
There are three changes which I hope break no existing code; if they do, they should be easy to fix.\n", "\n", "1. At the API level, the default similarity search had k=3 nearest neighbors with threshold=0.7 as Python default parameters. This kept tripping me up because when I wanted (say) the k=10 nearest neighbors I would forget to change the threshold to 0.0. The default now is that if neither k nor threshold is specified then do a k=3 and threshold=0.7 search. If k is specified and threshold is not, use threshold=0.0. This matches the simsearch command-line behavior.\n", "\n", "2. The `SearchResults.reorder_all()` `order` parameter has been renamed `ordering` to be consistent with `SearchResult.reorder()` and with internal use.\n", "\n", "3. The \"difference\" functions in `chemfp.bitops` have been renamed to \"xor\" because even I couldn't remember what \"difference\" meant. The xor function names are `byte_xor` and `hex_xor`.\n", "\n", "### Deprecation warning\n", "\n", "The FingerprintType methods `compute_fingerprint` and `compute_fingerprints` have been deprecated in favor of `from_mol` and `from_mols`. The old methods will be removed in a future version of chemfp." ] }, { "cell_type": "markdown", "id": "1e9e7a4c", "metadata": {}, "source": [ "## Older news (pre 4.0)" ] }, { "cell_type": "markdown", "id": "0874a43d", "metadata": {}, "source": [ "To learn about what was new in older chemfp releases, see the [What's New](https://chemfp.readthedocs.io/en/chemfp-3.5/whatsnew.html) documentation from chemfp 3.5." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "00485fa0a07341ed872c808f43063018": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_62f7df3fdb7c44f894dc58d18822d65a", "style": "IPY_MODEL_397a75338ac74c31babd85bbf421f871", "value": " 6.77M/6.77M [00:08<00:00, 1.04Mbytes/s]" } }, "01f909d538fd48f5918d61b71681863c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "0258f21671dd43a9b13ba508ff876a79": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_ce2b6f94a0e046a1be41d4ec772a56a1", "max": 10, "style": "IPY_MODEL_6f5a1adbdac246e88ebd35fbdec77d0e", "value": 10 } }, "046871ae23c04d328aa58bc2157081e5": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "0518a4519d2e49bf88b3286c1cbbad53": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "06d6705f0b3f489f9065983c75fbf8aa": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_3ac903a8910f4b3b95d23ded45ea0169", "style": "IPY_MODEL_f35b1d3f465b438699bc827fde633e3c", "value": "queries: 100%" } }, "07aa5baebe9642c48f0adada5aab1fa0": { "model_module": "@jupyter-widgets/controls", 
"model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "098603ccaa5b470ba528b7ffb666ebf4": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "09c4a521a96e4c71be80f7db62ffa940": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "0a7b7c3fd0b347ca951a83615e2567af": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_4cf25e965b7d434a8f65fef0d779b5a8", "max": 5, "style": "IPY_MODEL_dbd11ae498674bb8adbef0bf1c1048fe", "value": 5 } }, "0fb1f690bb894e9ba31165254ef59ecd": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "14f09f68b78a4265bf6a7a60f5451010": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_0fb1f690bb894e9ba31165254ef59ecd", "max": 10, "style": "IPY_MODEL_6f82bce2abb948cabf542e0f1890c0d6", "value": 10 } }, "169c14f906f64c0e82a2647c9d837793": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_32061ebe210848f1b74298fd864f94f7", "IPY_MODEL_517b9f88846f4fafa347d5d327d8d831", "IPY_MODEL_d6fc39a3b6964c478f7ca487abe971c7" ], "layout": "IPY_MODEL_87400d905f5b4e8ead14dcd940b317d1" } }, "17b1143ba31547a884237aae82e31487": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "1b3330ccfe1b4cfa864da8619e90633b": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", 
"model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "1fc4b14c9bc1415c937beb50f12deffc": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "20511db09166498f81baff581f38c93e": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "25012c8590e04c52a297f750eff2099f": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "2c3b0f5b641e4822900dd1d8176186dd": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "2e32118eba9f4b55b759d8d1e2944b44": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "2fe49a4473bf4a52998abf31b61d19ae": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_09c4a521a96e4c71be80f7db62ffa940", "max": 10, "style": "IPY_MODEL_65a122af242e41f4917f4ad04471ac57", "value": 10 } }, "30b281a1dfdd4adba797a662d17072e5": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "32061ebe210848f1b74298fd864f94f7": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_eb5baa2bd745436893a17825866e6583", "style": "IPY_MODEL_7e698fe54f2a4ee5aa5c594d3b720d23", "value": "picks: 100%" } }, "360fdb124520471195c2d6ac57b1deef": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "37204b5820fa4bdd81f53b9504e971f6": { "model_module": "@jupyter-widgets/controls", 
"model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "397a75338ac74c31babd85bbf421f871": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "3ac903a8910f4b3b95d23ded45ea0169": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "3b52dacd79c64373a920f1b0006ca5bb": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_848ec42ea74c420eaf6ffe868f82fbb8", "style": "IPY_MODEL_01f909d538fd48f5918d61b71681863c", "value": "queries: 100%" } }, "3cef9e71b43e4551bee54fd8391d9f44": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "3e34a7a755d4462e87b72401c852e2d5": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_30b281a1dfdd4adba797a662d17072e5", "style": "IPY_MODEL_e53c6bb2fca647a69e283f2ccc3e2ae8", "value": "queries: 100%" } }, "411fbcb3f82640ad9b498507c18be66d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_f6ecb72ef7de4ef4ae2117b11743731c", "max": 6773817, "style": "IPY_MODEL_07aa5baebe9642c48f0adada5aab1fa0", "value": 6773817 } }, "413b9545dd7d4e839a80e9ae460fe35d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "41f7e908409044a484e9adedf5ec8800": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "438cda6ca25346e1b26a9ad0141af194": 
{ "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_9ea8ba0361654a469257bc84aea9d359", "style": "IPY_MODEL_92d38f7be53042ac99a8e069defc697a", "value": " 10/10 [00:00<00:00, 265.41 fps/s]" } }, "44ac73c0aa9f4ccd955def4d43bcf59d": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "44e1870ec12c49fc9763000a863432ef": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_57953d50cd5f43edae2de222e8715e1f", "style": "IPY_MODEL_d3b81dd3363a4e978c8efdf55d20df59", "value": "queries: 100%" } }, "4bbb5bff9af9481486e619a9ba8753d7": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_44ac73c0aa9f4ccd955def4d43bcf59d", "style": "IPY_MODEL_b94abe121826441696f352abc1e8151e", "value": "queries: 100%" } }, "4cf25e965b7d434a8f65fef0d779b5a8": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "5166fc409754432a807cfc71ff51cdf2": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "517b9f88846f4fafa347d5d327d8d831": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_c232ec2dd9fa4a109247a5c2553879a9", "max": 5, "style": "IPY_MODEL_9d2eadb5d98747a995f8e29eefde335e", "value": 5 } }, "5629cf0894cd4c0c97346762cb9cafe8": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "57953d50cd5f43edae2de222e8715e1f": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", 
"model_name": "LayoutModel", "state": {} }, "5cba14d5209c4f2794912fe898838ab7": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "5db78742250b4cdfae9ae160049c432e": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "5ea348afb2dc4b64bc668f181232c4af": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_7c1ffda38828444aad09dc81eb05c0bb", "IPY_MODEL_bd0901e54ff94b69b112f8c65ae09354", "IPY_MODEL_b45472d176ae46fc8903a2c3cf70c030" ], "layout": "IPY_MODEL_83a31b9aba6d4a189d452578abadd971" } }, "5fcb39e1df6a4edda1d630d64c8202c5": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_94b0d57b680540c897800931dc3cfdf1", "IPY_MODEL_0a7b7c3fd0b347ca951a83615e2567af", "IPY_MODEL_dae56475c32b4effa3be4431b8d879de" ], "layout": "IPY_MODEL_41f7e908409044a484e9adedf5ec8800" } }, "60ab2ce6eb2f4a0a8d9629c5c744c531": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_44e1870ec12c49fc9763000a863432ef", "IPY_MODEL_d6884ca0b4b94f02a51993cdd2803737", "IPY_MODEL_bf74e729218141928b5dfa1429eaa816" ], "layout": "IPY_MODEL_d38eb82196674449a21a7e37acafbfaa" } }, "61be03df532246c2ac940ebf20b3dd9f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_d0b8121f7103484398461c4d575bbc65", "style": "IPY_MODEL_d09710b73bfe4fce9c598c12d08801cb", "value": "queries: 100%" } }, "62f7df3fdb7c44f894dc58d18822d65a": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "63ea0b53295840d38b09acabc3981cb8": { "model_module": "@jupyter-widgets/base", 
"model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "65a122af242e41f4917f4ad04471ac57": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "67814dbcd10d4a2b80b95a762ab23bfe": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_360fdb124520471195c2d6ac57b1deef", "style": "IPY_MODEL_fcbbe620e4bd4b8fb5fdf9197dc06a02", "value": "queries: 100%" } }, "6826078dbe8a4adea8564c565f6862f2": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "69d2eb824af4463881eb0842a1893bc1": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "6b7ac211cb9a49d9bcbd1cd42d988c2a": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "6f5a1adbdac246e88ebd35fbdec77d0e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "6f82bce2abb948cabf542e0f1890c0d6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "71639beaf29b45498721293957f6135a": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "72735de1163647d3927b3416737ccf77": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "7309634ff68f40869f39b4b7d22321db": { "model_module": "@jupyter-widgets/controls", 
"model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_cd1f127fe8544e39bbaa9b1e9816ca15", "max": 50, "style": "IPY_MODEL_815c65b6ce30443da6b7a86debe83910", "value": 50 } }, "75d0d153991642f9825833105308211d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_0518a4519d2e49bf88b3286c1cbbad53", "style": "IPY_MODEL_37204b5820fa4bdd81f53b9504e971f6", "value": " 10/10 [00:00<00:00, 63.06 fps/s]" } }, "768e36702af643e3a5a36885d1f19630": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_3cef9e71b43e4551bee54fd8391d9f44", "style": "IPY_MODEL_5166fc409754432a807cfc71ff51cdf2", "value": " 10/10 [00:00<00:00, 273.75 fps/s]" } }, "776a67f32f2b4b799eb24d58ecfa4c87": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "7b22a308c44c4961a43cd35d93b83185": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "7b46d6d74c714fdea143fc035bc45db5": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "7c1ffda38828444aad09dc81eb05c0bb": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_1fc4b14c9bc1415c937beb50f12deffc", "style": "IPY_MODEL_7b46d6d74c714fdea143fc035bc45db5", "value": "Compound_099000001_099500000.sdf.gz: 100%" } }, "7e698fe54f2a4ee5aa5c594d3b720d23": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, 
"815c65b6ce30443da6b7a86debe83910": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "81b43f65e1f54827bed35963cb202560": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "8230ef4af4f14d79bd362fd903557320": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_ced7bc355d5f481480e7c4de72d14a83", "IPY_MODEL_411fbcb3f82640ad9b498507c18be66d", "IPY_MODEL_00485fa0a07341ed872c808f43063018" ], "layout": "IPY_MODEL_72735de1163647d3927b3416737ccf77" } }, "82478a442d354fc9a9e963ddfd76cf10": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_4bbb5bff9af9481486e619a9ba8753d7", "IPY_MODEL_d46336b502434b53a29e078fad7cf040", "IPY_MODEL_768e36702af643e3a5a36885d1f19630" ], "layout": "IPY_MODEL_d5bd4ec897d24aed985abafb43ca26bc" } }, "83a31b9aba6d4a189d452578abadd971": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "84268998851242a99e24b30be7877d54": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_5cba14d5209c4f2794912fe898838ab7", "style": "IPY_MODEL_cd8ba3e8a42a410493541a4cb330b970", "value": "queries: 100%" } }, "848ec42ea74c420eaf6ffe868f82fbb8": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "87400d905f5b4e8ead14dcd940b317d1": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, 
"8782a5bceb6044908fce02d44375099e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "8d8a6ba08c164d4ca2304d4b1b3cd8b1": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "8e9248962316459cacca782357152da7": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_2e32118eba9f4b55b759d8d1e2944b44", "style": "IPY_MODEL_413b9545dd7d4e839a80e9ae460fe35d", "value": " 10/10 [00:00<00:00, 245.58 fps/s]" } }, "8ef64540a0ab465c970c6f51074f4cd1": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "921c917b0d2a4b9ca6291a19622e6bc4": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_3b52dacd79c64373a920f1b0006ca5bb", "IPY_MODEL_2fe49a4473bf4a52998abf31b61d19ae", "IPY_MODEL_f3df021402024adab3c9fce80f689911" ], "layout": "IPY_MODEL_1b3330ccfe1b4cfa864da8619e90633b" } }, "92d03ea52b014ce0ac15b6f8f78acd76": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "92d38f7be53042ac99a8e069defc697a": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "94b0d57b680540c897800931dc3cfdf1": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_aeebcdbfa30e40f39d702f4bd08fa3e7", "style": "IPY_MODEL_9921a7fef0b44374b54222cec855f223", "value": "picks: 100%" } }, "98e3ceaf26e141e3a67f748ef0ff6113": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", 
"model_name": "LayoutModel", "state": { "flex": "2" } }, "9921a7fef0b44374b54222cec855f223": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "99a7d394793c40a68a475d9875606b23": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_8ef64540a0ab465c970c6f51074f4cd1", "max": 10, "style": "IPY_MODEL_046871ae23c04d328aa58bc2157081e5", "value": 10 } }, "9d2eadb5d98747a995f8e29eefde335e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "9ea8ba0361654a469257bc84aea9d359": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "a06d032cb93c42a79d5bb0b35efd0575": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "a2962c2ef7714d7d9c60a604bb0d3984": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "a55d93fd865340a29f1d08b0c70cc8b3": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_98e3ceaf26e141e3a67f748ef0ff6113", "max": 10, "style": "IPY_MODEL_d5dd46702d244b0791f91ac7401f7dc5", "value": 10 } }, "a75d34a4796941e9b7775c94702b5b78": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "a88c0118ad7b4e3188a7b2aa45d65aaf": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ 
"IPY_MODEL_3e34a7a755d4462e87b72401c852e2d5", "IPY_MODEL_c48b22aa0f034312bcad58efe60efb84", "IPY_MODEL_e42a931453c542ae9fb91384e8cb149f" ], "layout": "IPY_MODEL_f5e84c44f33a44f8922f315712fe9dea" } }, "ab2adf7e47f642878a9abd4425a62060": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_67814dbcd10d4a2b80b95a762ab23bfe", "IPY_MODEL_0258f21671dd43a9b13ba508ff876a79", "IPY_MODEL_438cda6ca25346e1b26a9ad0141af194" ], "layout": "IPY_MODEL_f00633405b8f4103952c3b33009588e6" } }, "aeebcdbfa30e40f39d702f4bd08fa3e7": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "afe7a2a659204398972ee9a53214876d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_d9bf40539d3d408e9e8712bb325d873d", "IPY_MODEL_7309634ff68f40869f39b4b7d22321db", "IPY_MODEL_d8d236205cc743078e1a2c91b67ca0ad" ], "layout": "IPY_MODEL_098603ccaa5b470ba528b7ffb666ebf4" } }, "b45472d176ae46fc8903a2c3cf70c030": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_6826078dbe8a4adea8564c565f6862f2", "style": "IPY_MODEL_e0e05aed7fa8438d932ce8e02e719827", "value": " 6.77M/6.77M [00:12<00:00, 1.04Mbytes/s]" } }, "b94abe121826441696f352abc1e8151e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "b98bf06eb6934b3e979b0e55a2a4cc1d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "bd0901e54ff94b69b112f8c65ae09354": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", 
"layout": "IPY_MODEL_71639beaf29b45498721293957f6135a", "max": 6773817, "style": "IPY_MODEL_8782a5bceb6044908fce02d44375099e", "value": 6773817 } }, "be815bee5b984c38a145c7895f91e8d7": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "bf74e729218141928b5dfa1429eaa816": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_a06d032cb93c42a79d5bb0b35efd0575", "style": "IPY_MODEL_a75d34a4796941e9b7775c94702b5b78", "value": " 10/10 [00:00<00:00, 198.92 fps/s]" } }, "c232ec2dd9fa4a109247a5c2553879a9": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "c48b22aa0f034312bcad58efe60efb84": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_5629cf0894cd4c0c97346762cb9cafe8", "max": 50, "style": "IPY_MODEL_cfd56849489749b081fcddd49f101000", "value": 50 } }, "cb4aef0c6b3342f0af0a7f8036e8f3f6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_06d6705f0b3f489f9065983c75fbf8aa", "IPY_MODEL_a55d93fd865340a29f1d08b0c70cc8b3", "IPY_MODEL_8e9248962316459cacca782357152da7" ], "layout": "IPY_MODEL_20511db09166498f81baff581f38c93e" } }, "cd1f127fe8544e39bbaa9b1e9816ca15": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "cd8ba3e8a42a410493541a4cb330b970": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "ce2b6f94a0e046a1be41d4ec772a56a1": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { 
"flex": "2" } }, "ced7bc355d5f481480e7c4de72d14a83": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_81b43f65e1f54827bed35963cb202560", "style": "IPY_MODEL_e31019a618d3483ab0cf993192922b69", "value": "Compound_099000001_099500000.sdf.gz: 100%" } }, "cfd56849489749b081fcddd49f101000": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "d09710b73bfe4fce9c598c12d08801cb": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "d0b8121f7103484398461c4d575bbc65": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "d38eb82196674449a21a7e37acafbfaa": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "d3b81dd3363a4e978c8efdf55d20df59": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "d46336b502434b53a29e078fad7cf040": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_25012c8590e04c52a297f750eff2099f", "max": 10, "style": "IPY_MODEL_fb77f45c1094499188f500c93021e606", "value": 10 } }, "d548e9e1a9984b6cbdce510c42c6ba9a": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "d5bd4ec897d24aed985abafb43ca26bc": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", 
"state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "d5dce86fbd994dccad824e21cd38903a": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "d5dd46702d244b0791f91ac7401f7dc5": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "d6884ca0b4b94f02a51993cdd2803737": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_d5dce86fbd994dccad824e21cd38903a", "max": 10, "style": "IPY_MODEL_eacb85d9e3974fb9b00094a06611636c", "value": 10 } }, "d6fc39a3b6964c478f7ca487abe971c7": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_69d2eb824af4463881eb0842a1893bc1", "style": "IPY_MODEL_b98bf06eb6934b3e979b0e55a2a4cc1d", "value": " 5/5 [00:00<00:00, 160.61/s]" } }, "d8d236205cc743078e1a2c91b67ca0ad": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_8d8a6ba08c164d4ca2304d4b1b3cd8b1", "style": "IPY_MODEL_776a67f32f2b4b799eb24d58ecfa4c87", "value": " 50/50 [00:01<00:00, 45.86 fps/s]" } }, "d9bf40539d3d408e9e8712bb325d873d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_5db78742250b4cdfae9ae160049c432e", "style": "IPY_MODEL_17b1143ba31547a884237aae82e31487", "value": "queries: 100%" } }, "dae56475c32b4effa3be4431b8d879de": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_be815bee5b984c38a145c7895f91e8d7", "style": "IPY_MODEL_7b22a308c44c4961a43cd35d93b83185", "value": " 5/5 [00:00<00:00, 137.44/s]" } }, 
"dbd11ae498674bb8adbef0bf1c1048fe": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "e0e05aed7fa8438d932ce8e02e719827": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "e1d885c62784427aba1a52cd930c974b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_61be03df532246c2ac940ebf20b3dd9f", "IPY_MODEL_99a7d394793c40a68a475d9875606b23", "IPY_MODEL_f537cda65b924731800bbcd34086acd6" ], "layout": "IPY_MODEL_d548e9e1a9984b6cbdce510c42c6ba9a" } }, "e31019a618d3483ab0cf993192922b69": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "e42a931453c542ae9fb91384e8cb149f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_fec7d8926ee04947b8b0213564a7eee4", "style": "IPY_MODEL_a2962c2ef7714d7d9c60a604bb0d3984", "value": " 50/50 [00:00<00:00, 56.97 fps/s]" } }, "e53c6bb2fca647a69e283f2ccc3e2ae8": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "e6dbbf6d837042c7836f1c74fd8ba93c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_84268998851242a99e24b30be7877d54", "IPY_MODEL_14f09f68b78a4265bf6a7a60f5451010", "IPY_MODEL_75d0d153991642f9825833105308211d" ], "layout": "IPY_MODEL_63ea0b53295840d38b09acabc3981cb8" } }, "eacb85d9e3974fb9b00094a06611636c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { 
"description_width": "" } }, "eb5baa2bd745436893a17825866e6583": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "f00633405b8f4103952c3b33009588e6": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "f35b1d3f465b438699bc827fde633e3c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "f3df021402024adab3c9fce80f689911": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_2c3b0f5b641e4822900dd1d8176186dd", "style": "IPY_MODEL_92d03ea52b014ce0ac15b6f8f78acd76", "value": " 10/10 [00:00<00:00, 138.24 fps/s]" } }, "f537cda65b924731800bbcd34086acd6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_f7a83c8cecc444d38aff915f122c861e", "style": "IPY_MODEL_6b7ac211cb9a49d9bcbd1cd42d988c2a", "value": " 10/10 [00:00<00:00, 207.25 fps/s]" } }, "f5e84c44f33a44f8922f315712fe9dea": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "display": "inline-flex", "flex_flow": "row wrap", "width": "100%" } }, "f6ecb72ef7de4ef4ae2117b11743731c": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "flex": "2" } }, "f7a83c8cecc444d38aff915f122c861e": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} }, "fb77f45c1094499188f500c93021e606": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, 
"fcbbe620e4bd4b8fb5fdf9197dc06a02": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "fec7d8926ee04947b8b0213564a7eee4": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }