Fast cheminformatics fingerprint search, anywhere you use Python
Chemfp is a set of command-line tools and a Python library for fingerprint generation and high-performance similarity search. Its market-leading performance and comprehensive API make it easy for you to add fast similarity search anywhere you use Python.
NEW! Chemfp 3.5.1 was released on 4 February 2021. See the documentation for the full list of notable changes or go to the download page.
- Do you want single-threaded search of 1M 1024-bit fingerprints in under 10 milliseconds?
- Do you want to make a sparse similarity matrix from 1M 2048-bit fingerprints in less than 30 minutes on a four-core machine?
- Do you want to include fingerprint similarity results in your Python web application?
- ... with fast reload times during development, and without the complexity of using a dedicated search server?
- Do you work with fingerprints from multiple chemistry toolkits, or have custom fingerprint types?
- Do you want command-line tools with sub-second similarity search times?
- Do you program in Python and want to write new fingerprint analysis programs?
- Do you want the option to have the source code with no time-based licensing?
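For readers new to the terminology, here is a minimal pure-Python sketch of what a fingerprint similarity search computes. It is not chemfp's API or implementation: the function names `tanimoto` and `knearest`, the use of Python integers as bit fingerprints, and the choice to score two all-zero fingerprints as 0.0 are all assumptions made for illustration. A brute-force loop like this is the kind of baseline that an optimized search library is meant to outperform.

```python
import heapq


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit fingerprints stored as Python ints.

    Defined as popcount(a & b) / popcount(a | b). Two empty fingerprints
    are scored as 0.0 here (an assumption made for this sketch).
    """
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def knearest(query, targets, k=3, threshold=0.0):
    """Return up to k (score, name) pairs with score >= threshold, best first.

    `targets` is an iterable of (name, fingerprint) pairs. Every target is
    scored, so the cost grows linearly with the number of fingerprints.
    """
    scored = ((tanimoto(query, fp), name) for name, fp in targets)
    hits = [(score, name) for score, name in scored if score >= threshold]
    return heapq.nlargest(k, hits)


if __name__ == "__main__":
    # Hypothetical 4-bit fingerprints, for illustration only.
    targets = [("a", 0b1100), ("b", 0b1000), ("c", 0b0011)]
    for score, name in knearest(0b1100, targets, k=2):
        print(name, round(score, 3))
```

Real fingerprints are typically 1024 or 2048 bits wide and are stored as packed bytes rather than Python integers; the integer form is used here only to keep the example short.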
If that sounds interesting
You can get started by downloading the most recent version, chemfp 3.5.1, using the following:
python -m pip install chemfp -i https://chemfp.com/packages/
A few features are either limited or disabled. Visit the licensing page to see the licensing terms, to request an evaluation key to unlock those features, and to learn about some of the available licensing options.
You do not need to request a license key for Tanimoto searches of the licensed FPB files available from the datasets page, so long as you follow the terms of the Chemfp Base License Agreement.
Chemfp includes extensive documentation. For a more scholarly description, see: Dalke, A. The chemfp project. J. Cheminformatics 11, 76 (2019). doi: 10.1186/s13321-019-0398-8
Open source reference baseline for benchmarking
Chemfp 1.6.1 is the latest version of the no-cost/open source chemfp development track. It only supports Python 2.7. It is being maintained in order to provide a good reference baseline to evaluate similarity search performance, and to support the dwindling number of legacy users who haven't moved to Python 3. See the download page for download details.
Some of the many improvements in chemfp 3.x are: higher performance, support for the FPB binary format for fast loading times, support for more than 4GB of fingerprint data, sublinear Tversky search in addition to sublinear Tanimoto search, API improvements for web development, and support for both Python 2.7 and Python 3.6+.
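To make "sublinear" search and the Tversky measure more concrete, the sketch below shows one well-known pruning idea: because Tanimoto similarity can never exceed min(a, b) / max(a, b), where a and b are the popcounts of the two fingerprints, a threshold search only needs to examine targets whose popcount lies between a*t and a/t. This is an illustration under stated assumptions, not chemfp's actual implementation: the names `tversky`, `build_index`, and `threshold_search` are hypothetical, fingerprints are stored as Python integers, and an empty denominator is scored as 0.0.

```python
import math
from collections import defaultdict


def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def tversky(a: int, b: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky index: c / (alpha*(|a|-c) + beta*(|b|-c) + c), c = |a & b|.

    With alpha = beta = 1 this is the Tanimoto similarity.
    """
    c = popcount(a & b)
    denom = alpha * (popcount(a) - c) + beta * (popcount(b) - c) + c
    return c / denom if denom else 0.0


def build_index(fingerprints):
    """Group (name, fingerprint) pairs into buckets keyed by popcount."""
    index = defaultdict(list)
    for name, fp in fingerprints:
        index[popcount(fp)].append((name, fp))
    return index


def threshold_search(query: int, index, threshold: float):
    """Tanimoto threshold search that skips buckets ruled out by popcount.

    Returns (hits, n_compared), where hits is a list of (name, score)
    sorted best first and n_compared counts the targets actually scored.
    """
    a = popcount(query)
    if threshold <= 0.0:
        wanted = list(index)  # no pruning is possible
    else:
        # The small epsilon widens the range slightly so that floating-point
        # rounding never excludes a bucket; every candidate is still verified.
        lo = math.ceil(a * threshold - 1e-9)
        hi = math.floor(a / threshold + 1e-9)
        wanted = [n for n in index if lo <= n <= hi]
    hits, n_compared = [], 0
    for n in wanted:
        for name, fp in index[n]:
            n_compared += 1
            score = tversky(query, fp)  # alpha = beta = 1, i.e. Tanimoto
            if score >= threshold:
                hits.append((name, score))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits, n_compared
```

The pruning pays off most at high thresholds, where the allowed popcount range is narrow and most buckets are skipped without computing any intersections.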