chemfp.com
chemfp documentation
4.2

chemfp.fps_search module

FPS file similarity search and search result implementations.

Chemfp implements similarity search methods which work directly on FPS files. This might be useful in a streaming environment (where the FPS data is generated on-the-fly and not saved) and where you have at most a handful of queries. In that case, an FPS search is faster than an arena-based search: the FPS parsing overhead is about the same, but the FPS search does not have the arena creation or memory overhead that an in-memory search would have.
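As an illustration of why no arena is needed, here is a minimal pure-Python sketch of a streaming Tanimoto threshold search. It is not chemfp's implementation, which is written in C and also handles FPS header metadata and error checking; the helper names iter_fps_records, tanimoto and stream_threshold_search are hypothetical. It assumes the usual FPS record layout (hex-encoded fingerprint, a tab, the id, then optional extra fields) and skips "#" header lines. Each record is scored as soon as it is read, so memory use depends on the number of hits rather than on the size of the file.

```python
import io
from typing import Iterable, Iterator, List, Tuple


def iter_fps_records(lines: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (id, fingerprint bytes) for each FPS data line, skipping '#' headers."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")  # hex fingerprint, id, optional extra fields
        yield fields[1], bytes.fromhex(fields[0])


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """|A & B| / |A | B|; returns 0.0 for two empty fingerprints (an assumed convention)."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def stream_threshold_search(
    query_fp: bytes, lines: Iterable[str], threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Score each record as it is read; only the hits are kept in memory."""
    hits = []
    for target_id, target_fp in iter_fps_records(lines):
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            hits.append((target_id, score))
    return hits


fps_text = "#FPS1\n#num_bits=8\nf0\tA\n0f\tB\nff\tC\n"
print(stream_threshold_search(bytes.fromhex("f0"), io.StringIO(fps_text), 0.5))
# → [('A', 1.0), ('C', 0.5)]
```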

class chemfp.fps_search.FPSSearchResult(ids, scores, query_id=None)

Bases: object

Search results for a query fingerprint against a target FPS reader.

The result contains a list of hits. Each hit contains a target id and a score. The hits can be reordered based on id or score.

__getitem__(item)

Return the (id, score) pair for the given index, or the pairs if item is a slice.

__iter__()

Iterate through the (target id, score) pairs using the current ordering.

__len__()

Return the number of hits.
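The pair-oriented access described above can be sketched with a small stand-in class built from two parallel lists, mirroring the FPSSearchResult(ids, scores, query_id=None) constructor arguments. The class name HitPairs is hypothetical, and returning a list for a slice is an assumption of this sketch; the real class also supports reordering and the other methods documented here.

```python
from typing import Iterator, Optional, Sequence, Tuple, Union


class HitPairs:
    """Hypothetical stand-in exposing parallel ids/scores lists as (id, score) pairs."""

    def __init__(self, ids: Sequence[str], scores: Sequence[float],
                 query_id: Optional[str] = None) -> None:
        if len(ids) != len(scores):
            raise ValueError("ids and scores must have the same length")
        self.ids = list(ids)
        self.scores = list(scores)
        self.query_id = query_id

    def __len__(self) -> int:
        return len(self.ids)

    def __getitem__(self, item: Union[int, slice]):
        if isinstance(item, slice):
            # A slice returns a list of (id, score) pairs (assumed container type)
            return list(zip(self.ids[item], self.scores[item]))
        return (self.ids[item], self.scores[item])

    def __iter__(self) -> Iterator[Tuple[str, float]]:
        # Pairs come out in the current list order
        return zip(self.ids, self.scores)


hits = HitPairs(["A", "B", "C"], [0.9, 0.8, 0.75], query_id="q1")
print(len(hits), hits[0], hits[1:])
```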

get_ids()

The list of target identifiers, in the current ordering.

This returns the same list each time.

get_ids_and_scores()

The list of (target identifier, target score) pairs, in the current ordering.

get_scores()

The list of target scores, in the current ordering.

This returns the same list each time.

query_id

The id of the query fingerprint, if available, otherwise None.

reorder(order='decreasing-score')

Reorder the hits based on the requested ordering.

The available orderings are:
  • increasing-score - sort by increasing score

  • decreasing-score - sort by decreasing score

  • increasing-score-plus - sort by increasing score, break ties by increasing index

  • decreasing-score-plus - sort by decreasing score, break ties by increasing index

  • increasing-id - sort by increasing target id

  • decreasing-id - sort by decreasing target id

  • move-closest-first - move the hit with the highest score to the first position

  • reverse - reverse the current ordering
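The orderings above can be sketched in pure Python over a list of (index, id, score) hits, where index is the hit's original position. This is an illustration, not chemfp's code; the function name reorder_hits is hypothetical, and reading "index" as the original hit position is an assumption. For simplicity the sketch breaks score ties by index for both the plain and the -plus variants, while the documentation only promises that tie-breaking for the -plus variants.

```python
from typing import List, Tuple

Hit = Tuple[int, str, float]  # (original index, target id, score)


def reorder_hits(hits: List[Hit], order: str = "decreasing-score") -> List[Hit]:
    """Return a reordered copy of hits using one of the documented ordering names."""
    if order in ("increasing-score", "increasing-score-plus"):
        return sorted(hits, key=lambda h: (h[2], h[0]))
    if order in ("decreasing-score", "decreasing-score-plus"):
        return sorted(hits, key=lambda h: (-h[2], h[0]))
    if order == "increasing-id":
        return sorted(hits, key=lambda h: h[1])
    if order == "decreasing-id":
        return sorted(hits, key=lambda h: h[1], reverse=True)
    if order == "move-closest-first":
        if not hits:
            return []
        # max() returns the first of several equal maxima
        best = max(range(len(hits)), key=lambda i: hits[i][2])
        return [hits[best]] + hits[:best] + hits[best + 1:]
    if order == "reverse":
        return hits[::-1]
    raise ValueError(f"unsupported order: {order!r}")


hits = [(0, "B", 0.5), (1, "C", 0.7), (2, "A", 0.9), (3, "D", 0.5)]
for name in ("decreasing-score-plus", "increasing-id", "move-closest-first", "reverse"):
    print(name, [h[1] for h in reorder_hits(hits, name)])
```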

scores

The similarity scores for the hits.

to_pandas(*, columns=['target_id', 'score'])

Return a pandas DataFrame with the target ids and scores.

The first column contains the ids and the second column contains the scores. The default column headers are “target_id” and “score”. Use columns to specify different headers.

Parameters:

columns (a list of two strings) – column names for the returned DataFrame

Returns:

a pandas DataFrame

class chemfp.fps_search.FPSSearchResults(query_ids, results)

Bases: object

Search results for a query arena against a target FPS reader.

__getitem__(i)

Return an FPSSearchResult by index.

__iter__()

Iterate through the search results.

__len__()

The number of search results in this collection.

iter_ids()

For each search result, yield the list of target identifiers.

iter_ids_and_scores()

For each search result, yield the list of target (id, score) tuples.

iter_scores()

For each search result, yield the list of target scores.

query_ids

A list of query ids, one for each result. This comes from the query arena’s ids.

reorder_all(order='decreasing-score')

Reorder the hits for all of the rows based on the requested order.

The available orderings are:

  • increasing-score - sort by increasing score

  • decreasing-score - sort by decreasing score

  • increasing-id - sort by increasing target id

  • decreasing-id - sort by decreasing target id

  • move-closest-first - move the hit with the highest score to the first position

  • reverse - reverse the current ordering

to_pandas(*, columns=['query_id', 'target_id', 'score'], empty=('*', None))

Return a pandas DataFrame with query_id, target_id and score columns.

Each query has zero or more hits. Each hit becomes a row in the output table, with the query id in the first column, the hit target id in the second, and the hit score in the third.

If a query has no hits then by default a row is added with the query id, ‘*’ as the target id, and None as the score (which pandas will treat as an NA value).

Use empty to specify different behavior for queries with no hits. If empty is None then no row is added to the table. If empty is a 2-element tuple, the first element is used as the target id and the second is used as the score.

Parameters:
  • columns (a list of three strings) – column names for the returned DataFrame

  • empty (a 2-element tuple, or None) – the target id and score used for queries with no hits, or None to not include a row for that case

Returns:

a pandas DataFrame
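The row layout, including the handling of queries with no hits, can be sketched in pure Python. The function name result_rows is hypothetical, and the input is simplified to a list of query ids plus a parallel list of (target id, score) hit lists; this only illustrates the documented behavior of empty and is not how chemfp builds the DataFrame.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, str, Optional[float]]


def result_rows(
    query_ids: Sequence[str],
    results: Sequence[Sequence[Tuple[str, float]]],
    empty: Optional[Tuple[str, Optional[float]]] = ("*", None),
) -> List[Row]:
    """One row per hit; a query without hits gets one 'empty' row unless empty is None."""
    rows: List[Row] = []
    for query_id, hits in zip(query_ids, results):
        if hits:
            rows.extend((query_id, target_id, score) for target_id, score in hits)
        elif empty is not None:
            target_id, score = empty
            rows.append((query_id, target_id, score))
    return rows


query_ids = ["q1", "q2"]
results = [[("t1", 0.8), ("t2", 0.75)], []]
print(result_rows(query_ids, results))
print(result_rows(query_ids, results, empty=None))
```

With pandas installed, pandas.DataFrame(rows, columns=["query_id", "target_id", "score"]) turns these rows into a table with the layout described above.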

chemfp.fps_search.count_tanimoto_hits_fp(query_fp, target_reader, threshold=0.7)

Count the number of hits in target_reader which are at least threshold similar to query_fp.

This uses Tanimoto similarity.

chemfp.fps_search.count_tanimoto_hits_arena(query_arena, target_reader, threshold=0.7)

For each fingerprint in query_arena, count the number of hits in target_reader which are at least threshold similar to it.

This uses Tanimoto similarity.
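Because an FPS reader is read sequentially, a multi-query count can be computed in a single pass: each target fingerprint is compared against every query before the next target is read. The sketch below illustrates that idea in pure Python with hypothetical names (tanimoto, count_hits_single_pass) and plain byte strings in place of a query arena and reader. It is not chemfp's implementation, and the single-pass strategy is the sketch's own design choice rather than a documented detail.

```python
from typing import Iterable, List, Sequence


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity of two equal-length byte fingerprints (0.0 if both empty)."""
    a, b = int.from_bytes(fp1, "big"), int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def count_hits_single_pass(
    query_fps: Sequence[bytes], target_fps: Iterable[bytes], threshold: float = 0.7
) -> List[int]:
    """For each query, count targets with Tanimoto >= threshold, reading targets once."""
    counts = [0] * len(query_fps)
    for target_fp in target_fps:
        for i, query_fp in enumerate(query_fps):
            if tanimoto(query_fp, target_fp) >= threshold:
                counts[i] += 1
    return counts


queries = [bytes.fromhex("f0"), bytes.fromhex("0f")]
targets = (bytes.fromhex(h) for h in ("f0", "0f", "ff", "f1"))  # a one-shot stream
print(count_hits_single_pass(queries, targets))  # → [2, 1]
```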

chemfp.fps_search.threshold_tanimoto_search_fp(query_fp, target_reader, threshold=0.7)

Find matches in the target reader which are at least threshold similar to the query fingerprint.

This uses Tanimoto similarity.

Returns:

an FPSSearchResult instance containing the result.

chemfp.fps_search.threshold_tanimoto_search_arena(query_arena, target_reader, threshold)

Find matches in the target reader which are at least threshold similar to each of the query arena fingerprints.

This uses Tanimoto similarity.

Returns:

an FPSSearchResults instance containing a list of query results.
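A multi-query threshold search can also make a single pass over the targets, comparing each target against every query and collecting a list of (target id, score) hits per query. This is a hypothetical pure-Python sketch (threshold_search_many is not a chemfp function) that uses (id, fingerprint bytes) pairs in place of an FPS reader; the per-query hit lists play the role of the rows in an FPSSearchResults.

```python
from typing import Iterable, List, Sequence, Tuple


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity of two equal-length byte fingerprints (0.0 if both empty)."""
    a, b = int.from_bytes(fp1, "big"), int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def threshold_search_many(
    query_fps: Sequence[bytes],
    targets: Iterable[Tuple[str, bytes]],
    threshold: float = 0.7,
) -> List[List[Tuple[str, float]]]:
    """Return one list of (target id, score) hits per query, reading targets once."""
    results: List[List[Tuple[str, float]]] = [[] for _ in query_fps]
    for target_id, target_fp in targets:
        for hits, query_fp in zip(results, query_fps):
            score = tanimoto(query_fp, target_fp)
            if score >= threshold:
                hits.append((target_id, score))
    return results


targets = [("A", b"\xf0"), ("B", b"\x0f"), ("C", b"\xff"), ("D", b"\xf1")]
print(threshold_search_many([b"\xf0", b"\x0f"], targets, threshold=0.5))
```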

chemfp.fps_search.knearest_tanimoto_search_fp(query_fp, target_reader, k=3, threshold=0.0)

Find the nearest k matches in the target reader which are at least threshold similar to the query fingerprint.

This uses Tanimoto similarity.

Returns:

an FPSSearchResult instance containing the result.

chemfp.fps_search.knearest_tanimoto_search_arena(query_arena, target_reader, k=3, threshold=0.0)

Find the nearest k matches in the target reader which are at least threshold similar to each of the query arena fingerprints.

This uses Tanimoto similarity.

Returns:

an FPSSearchResults instance containing a list of query results.
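For a k-nearest search over a stream, only the best k hits seen so far need to be remembered. The sketch below shows one way to do that with heapq.nlargest, which consumes an iterable while keeping at most k candidates; the function name knearest_search and the (id, fingerprint bytes) target pairs are hypothetical stand-ins, not chemfp's API or algorithm.

```python
import heapq
from typing import Iterable, List, Tuple


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity of two equal-length byte fingerprints (0.0 if both empty)."""
    a, b = int.from_bytes(fp1, "big"), int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def knearest_search(
    query_fp: bytes,
    targets: Iterable[Tuple[str, bytes]],
    k: int = 3,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (id, score) hits with score >= threshold, best first."""
    scored = ((target_id, tanimoto(query_fp, fp)) for target_id, fp in targets)
    candidates = (hit for hit in scored if hit[1] >= threshold)
    # nlargest keeps only k items at a time; tied scores keep their input order
    return heapq.nlargest(k, candidates, key=lambda hit: hit[1])


targets = [("A", b"\xf0"), ("B", b"\x0f"), ("C", b"\xff"), ("D", b"\xf1")]
print(knearest_search(b"\xf0", targets, k=2))  # → [('A', 1.0), ('D', 0.8)]
```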

chemfp.fps_search.count_tversky_hits_fp(query_fp, target_reader, threshold, alpha=1.0, beta=1.0)

Count the number of hits in target_reader which are at least threshold similar to query_fp.

This uses Tversky similarity with the specified values of alpha and beta.

chemfp.fps_search.count_tversky_hits_arena(query_arena, target_reader, threshold, alpha=1.0, beta=1.0)

For each fingerprint in query_arena, count the number of hits in target_reader which are at least threshold similar to it.

This uses Tversky similarity with the specified values of alpha and beta.
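Tversky similarity generalizes Tanimoto by weighting the bits that appear in only one of the two fingerprints. The sketch below computes c / (alpha*(q - c) + beta*(t - c) + c), where q and t are the query and target popcounts and c is the popcount of their intersection, so alpha=beta=1 gives Tanimoto and alpha=beta=0.5 gives Dice. It assumes that alpha weights the query-only bits and beta the target-only bits, and that a zero denominator gives 0.0; the function names tversky and count_tversky_hits are hypothetical and this is not chemfp's code.

```python
from typing import Iterable


def tversky(query_fp: bytes, target_fp: bytes, alpha: float = 1.0, beta: float = 1.0) -> float:
    """c / (alpha*(q - c) + beta*(t - c) + c); 0.0 if the denominator is zero (assumed)."""
    q = int.from_bytes(query_fp, "big")
    t = int.from_bytes(target_fp, "big")
    c = bin(q & t).count("1")
    query_only = bin(q & ~t).count("1")   # q - c: bits only in the query
    target_only = bin(t & ~q).count("1")  # t - c: bits only in the target
    denominator = alpha * query_only + beta * target_only + c
    return c / denominator if denominator else 0.0


def count_tversky_hits(query_fp: bytes, target_fps: Iterable[bytes], threshold: float,
                       alpha: float = 1.0, beta: float = 1.0) -> int:
    """Count targets whose Tversky score against query_fp is at least threshold."""
    return sum(1 for fp in target_fps if tversky(query_fp, fp, alpha, beta) >= threshold)


query, target = b"\xf0", b"\xf1"  # 4 bits set, and the same 4 bits plus one more
print(tversky(query, target))            # Tanimoto: 4 / 5
print(tversky(query, target, 0.5, 0.5))  # Dice: 2*4 / (4 + 5)
print(tversky(query, target, 1.0, 0.0))  # every query bit is in the target
```

Setting beta=0 ignores bits that only the target has, which is why a target containing every query bit scores 1.0 in the last line.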

chemfp.fps_search.threshold_tversky_search_fp(query_fp, target_reader, threshold, alpha=1.0, beta=1.0)

Find matches in the target reader which are at least threshold similar to the query fingerprint.

This uses Tversky similarity with the specified values of alpha and beta.

Returns:

an FPSSearchResult instance containing the result.

chemfp.fps_search.threshold_tversky_search_arena(query_arena, target_reader, threshold, alpha=1.0, beta=1.0)

Find matches in the target reader which are at least threshold similar to each of the query arena fingerprints.

This uses Tversky similarity with the specified values of alpha and beta.

Returns:

an FPSSearchResults instance containing a list of query results.

chemfp.fps_search.knearest_tversky_search_fp(query_fp, target_reader, k=3, threshold=0.0, alpha=1.0, beta=1.0)

Find the nearest k matches in the target reader which are at least threshold similar to the query fingerprint.

This uses Tversky similarity with the specified values of alpha and beta.

Returns:

an FPSSearchResult instance containing the result.

chemfp.fps_search.knearest_tversky_search_arena(query_arena, target_reader, k=3, threshold=0.0, alpha=1.0, beta=1.0)

Find the nearest k matches in the target reader which are at least threshold similar to each of the query arena fingerprints.

This uses Tversky similarity with the specified values of alpha and beta.

Returns:

an FPSSearchResults instance containing a list of query results.


© Copyright 2010-2024, Andrew Dalke.

Built with Sphinx using a theme provided by Read the Docs.