Chemfp is a fingerprint toolkit. It depends on a third-party chemistry toolkit to generate fingerprints from a chemical structure. The currently supported toolkits are OEChem/OEGraphSim, RDKit and Open Babel.
The latest version of each toolkit is supported, as well as the previous several releases. OEChem 2017.Oct, which is the last version of OEChem/OEGraphSim to support Python 2.7, will be supported until 2020.
Command-line support for different toolkits
The toolkit integration occurs at multiple levels.
At the command-line level you can use oe2fps, rdkit2fps, and ob2fps to generate toolkit-specific fingerprints from a SMILES file, an SDF, or another chemistry structure format, and save the result in chemfp's FPS or FPB format. (The FPB format is only supported in the commercial version of chemfp.)
If you are a Python programmer, you can also use chemfp's fingerprint and toolkit APIs. These work with toolkit-native molecules, so you are free to create the molecule any way you like.
In addition, chemfp provides a common API for fingerprint generation and file I/O across all of the supported toolkits. This might not be that important if you only deal with one toolkit, but it's very handy if you want to handle multiple toolkits.
Example fingerprint search web service
For example, here's a small program named "fpsearch.py" which uses the flask microframework to implement a web service that finds the nearest 10 ChEMBL matches to a query SMILES. It uses only the chemfp APIs, which means it will work with any of the supported fingerprint types and toolkits.
# Save this as "fpsearch.py"
from flask import Flask, request, abort, Response
import chemfp

# Load the database, and use the 'type' metadata to figure out which
# toolkit and which fingerprint parameters to use.
db = chemfp.load_fingerprints("chembl_23.fps")
fptype = db.get_fingerprint_type()

app = Flask(__name__)

@app.route("/search")
def search():
    # Get the 'q' query parameter and try to process it as a SMILES string.
    smiles = request.args.get("q", None)
    if smiles is None:
        abort(Response("Missing 'q' parameter"))
    fp = fptype.parse_molecule_fingerprint(smiles, "smistring", errors="ignore")
    if fp is None:
        abort(Response("Cannot parse 'q' parameter as a SMILES"))

    # Search the database and report the 10 nearest hits.
    result = db.knearest_tanimoto_search_fp(fp, k=10, threshold=0.0)
    ids_and_scores = result.get_ids_and_scores()
    response = "".join("%.3f,%s\n" % (score, id) for (id, score) in ids_and_scores)
    return Response(response, content_type="text/plain")
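To make the search step less of a black box, here is a minimal pure-Python sketch of what a k-nearest Tanimoto search computes. It is not chemfp's implementation, which is far faster. The function names, the use of byte strings as fingerprints, and the choice to score two empty fingerprints as 0.0 are all assumptions made for illustration.

```python
import heapq

def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity: bits in common divided by bits in either."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        # Assumption: two all-zero fingerprints score 0.0.
        return 0.0
    return bin(a & b).count("1") / union

def knearest_tanimoto(query, targets, k=10, threshold=0.0):
    """Return up to k (id, score) pairs, highest score first.

    `targets` is an iterable of (id, fingerprint) pairs. This is a
    hypothetical helper, not part of the chemfp API.
    """
    hits = []
    for target_id, fp in targets:
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((score, target_id))
    best = heapq.nlargest(k, hits, key=lambda hit: hit[0])
    return [(target_id, score) for score, target_id in best]
```

With the query bits 00001111, a target with bits 11111111 shares 4 of 8 set bits and scores 0.5, while an identical target scores 1.0.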
To make it work:
Install the flask framework with
pip install flask
Download the ChEMBL 23 SDF and use one of the following to generate fingerprints:
ob2fps chembl_23.sdf.gz -o chembl_23.fps
oe2fps chembl_23.sdf.gz -o chembl_23.fps
rdkit2fps chembl_23.sdf.gz -o chembl_23.fps
Save the above program as "fpsearch.py".
Set the environment variable FLASK_APP to "fpsearch.py".
In the directory containing fpsearch.py, run the command
flask run
to start the server.
With your web browser, go to: http://127.0.0.1:5000/search?q=c1ccccc1N
You should see output like:
1.000,CHEMBL538
0.955,CHEMBL3182415
0.656,CHEMBL3392014
0.600,CHEMBL572203
0.588,CHEMBL44201
0.583,CHEMBL403741
0.583,CHEMBL3186715
0.571,CHEMBL3185160
0.567,CHEMBL1595914
0.560,CHEMBL3561416
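If you would rather call the service from a script than from a browser, a small client could look something like the sketch below. It only builds the query URL and parses the "score,id" lines of the response. The helper names are hypothetical, and the default address assumes the server is running locally on Flask's default port.

```python
from urllib.parse import urlencode

def build_search_url(smiles, base="http://127.0.0.1:5000/search"):
    """Build the search URL, percent-encoding SMILES characters like '#'."""
    return base + "?" + urlencode({"q": smiles})

def parse_hits(text):
    """Parse 'score,id' lines into a list of (id, score) pairs."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        score, _, hit_id = line.partition(",")
        hits.append((hit_id, float(score)))
    return hits
```

The text of the response could then be fetched with, for example, urllib.request.urlopen(build_search_url("c1ccccc1N")) and passed to parse_hits once decoded.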
In case you are curious, I generated the "chembl_23.fps" file using the fingerprint type "OpenEye-Tree/2 numbits=4096 minbonds=0 maxbonds=4 atype=Arom|AtmNum|Chiral|FCharge|HvyDeg|Hyb btype=Order".
What makes the chemfp API useful is that I could replace the FPS file with, say, the RDKit MACCS fingerprints, restart the server, and the search service will switch from using OEChem and OEGraphSim to using RDKit - with no other changes to the code.
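The switch works because the fingerprint file records its own type. As a rough illustration, the sketch below reads the type string from the header of an FPS file and guesses the toolkit from the type name's prefix. It assumes the header stores the type on a "#type=" line and that the prefixes "OpenEye", "RDKit", and "OpenBabel" identify the toolkit. In real code chemfp does this for you through db.get_fingerprint_type(); both helper functions here are hypothetical.

```python
def read_fps_type(lines):
    """Return the value of the '#type=' header line, or None if absent."""
    for line in lines:
        if not line.startswith("#"):
            break  # header lines come first; stop at the first record
        if line.startswith("#type="):
            return line[len("#type="):].rstrip("\n")
    return None

def guess_toolkit(type_string):
    """Guess the toolkit from the type name prefix (an assumption)."""
    family = type_string.split()[0]      # e.g. "OpenEye-Tree/2"
    prefix = family.split("-", 1)[0]     # e.g. "OpenEye"
    toolkits = {
        "OpenEye": "OEChem/OEGraphSim",
        "RDKit": "RDKit",
        "OpenBabel": "Open Babel",
    }
    return toolkits.get(prefix)
```

Passing an open file object for "chembl_23.fps" to read_fps_type would work the same way as passing a list of lines, since both yield one line at a time.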