Python API
PoseBusters class
The PoseBusters class collects the molecules to test, runs the modules, and reports the test results.
- class posebusters.PoseBusters(config: str | dict[str, Any] = 'redock', top_n: int | None = None, max_workers: int | None = 0, chunk_size: int | None = 100)
Class to run all tests on a set of molecules.
- bust(mol_pred: Iterable[Mol | Path | str] | Mol | Path | str, mol_true: Mol | Path | str | None = None, mol_cond: Mol | Path | str | None = None, full_report: bool = False) DataFrame
Run tests on one or more molecules.
- Parameters:
mol_pred – Generated molecule(s), e.g. de-novo generated molecule or docked ligand, with one or more poses.
mol_true – True molecule, e.g. crystal ligand, with one or more poses.
mol_cond – Conditioning molecule, e.g. protein.
full_report – Whether to include all columns in the output or only the boolean ones specified in the config.
Notes
Molecules can be provided as RDKit molecule objects or file paths.
- Returns:
Pandas dataframe with results.
- bust_table(mol_table: DataFrame, full_report: bool = False) DataFrame
Run tests on molecules provided in a pandas dataframe as paths or RDKit molecule objects.
- Parameters:
mol_table – Pandas dataframe with columns “mol_pred”, “mol_true”, “mol_cond” containing paths to molecules or RDKit molecule objects.
full_report – Whether to include all columns in the output or only the boolean ones specified in the config.
- Returns:
Pandas dataframe with results.
- file_paths: DataFrame
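As a rough usage sketch of the class above: the helper names (`make_rows`, `run_bust`, `run_bust_table`) and all file names are hypothetical, and the two `run_*` functions assume that `posebusters` and `pandas` are installed. Only the constructor argument, the method names, and the column names come from the signatures documented here.

```python
from pathlib import Path


def make_rows(pred_files, true_file, cond_file):
    """Build one row per predicted file, using the column names bust_table expects."""
    return [
        {"mol_pred": Path(p), "mol_true": Path(true_file), "mol_cond": Path(cond_file)}
        for p in pred_files
    ]


def run_bust(pred_file, true_file, cond_file, full_report=False):
    """Test a single prediction; imports lazily so this file loads without posebusters."""
    from posebusters import PoseBusters

    buster = PoseBusters(config="redock")
    return buster.bust(pred_file, true_file, cond_file, full_report=full_report)


def run_bust_table(rows, full_report=False):
    """Test many predictions at once by passing a dataframe of paths."""
    import pandas as pd
    from posebusters import PoseBusters

    buster = PoseBusters(config="redock")
    return buster.bust_table(pd.DataFrame(rows), full_report=full_report)


# Placeholder file names; replace with real SDF/PDB paths before calling run_*.
rows = make_rows(["pose_1.sdf", "pose_2.sdf"], "crystal_ligand.sdf", "protein.pdb")
```

Both `run_*` functions return the pandas dataframe described above. Passing `full_report=True` should add the non-boolean columns.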
Modules
A PoseBusters module is a function that takes one or more of mol_pred, mol_true, and mol_cond as input and returns one or more test results as a dictionary.
- Inputs
Must take at least one of mol_pred, mol_true, and mol_cond, provided as RDKit molecules.
Other inputs are parameters for which default values must be specified.
- Outputs
The output must be a dictionary with at least the results entry.
The results entry contains a dictionary whose keys are the test names and whose values are the test outcomes.
Other output entries may contain further results, e.g. lengths and bounds for all bonds in the ligand.
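The input and output contract described above could be met by a function like the following. This is a minimal sketch, not a module shipped with PoseBusters: the function name `check_atom_count`, the parameter `max_atoms`, the test name, and the extra `details` entry are all hypothetical. The only RDKit call used is `Mol.GetNumAtoms()`.

```python
from typing import Any


def check_atom_count(mol_pred: Any, max_atoms: int = 100) -> dict[str, Any]:
    """Hypothetical module: pass if the molecule has at most max_atoms atoms.

    mol_pred is expected to be an RDKit molecule. max_atoms is an extra
    parameter, so it carries a default value as the contract requires.
    """
    num_atoms = mol_pred.GetNumAtoms()
    return {
        # Required entry: test name -> test outcome.
        "results": {"atom_count_within_limit": num_atoms <= max_atoms},
        # Optional extra entry with further information (name is an assumption).
        "details": {"num_atoms": num_atoms},
    }
```

With RDKit installed, `check_atom_count(Chem.MolFromSmiles("CCO"))` would return a dictionary whose `results` entry holds a single boolean.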
Distance Geometry
- posebusters.modules.distance_geometry.check_geometry(mol_pred: Mol, threshold_bad_bond_length: float = 0.2, threshold_clash: float = 0.2, threshold_bad_angle: float = 0.2, bound_matrix_params: dict[str, Any] = {'doTriangleSmoothing': True, 'scaleVDW': True, 'set15bounds': True, 'useMacrocycle14config': False}, ignore_hydrogens: bool = True, sanitize: bool = True, symmetrize_conjugated_terminal_groups: bool = True) dict[str, Any]
Use RDKit distance geometry bounds to check the geometry of a molecule.
- Parameters:
mol_pred – Predicted molecule (docked ligand). Only the first conformer will be checked.
threshold_bad_bond_length – Bond length threshold in relative percentage. 0.2 means that bonds may be up to 20% longer than DG bounds. Defaults to 0.2.
threshold_clash – Threshold for how much overlap constitutes a clash. 0.2 means that two atoms may be as close as 80% of the lower DG bound before a clash is reported. Defaults to 0.2.
threshold_bad_angle – Bond angle threshold in relative percentage. 0.2 means that bond angles may deviate by up to 20% from the DG bounds. Defaults to 0.2.
bound_matrix_params – Parameters passed to RDKit’s GetMoleculeBoundsMatrix function.
ignore_hydrogens – Whether to ignore hydrogens. Defaults to True.
sanitize – Sanitize molecule before running DG module (recommended). Defaults to True.
symmetrize_conjugated_terminal_groups – Will symmetrize the lower and upper bounds of the terminal conjugated bonds. Defaults to True.
- Returns:
PoseBusters results dictionary.
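To make the relative thresholds concrete, here is a small arithmetic sketch. It assumes the tolerance is applied symmetrically to the lower and upper bounds, which the parameter descriptions do not state explicitly. The function names are hypothetical and the library's actual comparison may differ.

```python
def bond_length_ok(distance: float, lower: float, upper: float,
                   threshold: float = 0.2) -> bool:
    """Accept a distance that lies within the bounds widened by `threshold`.

    Assumption: the lower bound is relaxed by the same relative amount
    as the upper bound.
    """
    return lower * (1 - threshold) <= distance <= upper * (1 + threshold)


def no_clash(distance: float, lower: float, threshold: float = 0.2) -> bool:
    """Accept two atoms that are at least (1 - threshold) of the lower bound apart."""
    return distance >= lower * (1 - threshold)


# Bounds of 1.4 to 1.5 with a 20% tolerance accept distances of roughly 1.12 to 1.8.
print(bond_length_ok(1.7, lower=1.4, upper=1.5))
print(bond_length_ok(1.9, lower=1.4, upper=1.5))
# A lower bound of 3.0 with a 20% tolerance allows atoms as close as about 2.4.
print(no_clash(2.5, lower=3.0))
```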
Energy Ratio
- posebusters.modules.energy_ratio.check_energy_ratio(mol_pred: Mol, threshold_energy_ratio: float = 7.0, ensemble_number_conformations: int = 100, inchi_strict: bool = False, epsilon=1e-10, num_threads=0) dict[str, dict[str, float | bool]]
Check whether the internal energy of a molecular conformation is too far from its ground state.
Notes
If there are missing explicit hydrogens, they are added and their positions are optimized.
Hydrogens are missing when there are radicals or there are implicit hydrogens.
The energy ratio ‘with hydrogens’ uses the energy of the molecule after filling in hydrogens but before optimizing their positions.
The energy ratio ‘without hydrogens’ uses the energy of the molecule after filling in hydrogens AND after optimizing their positions. So their contribution (if any) to the energy ratio is reduced.
- Parameters:
mol_pred – Predicted molecule (docked ligand) with exactly one conformer.
threshold_energy_ratio – Limit above which the energy ratio is deemed too high. Defaults to 7.0.
ensemble_number_conformations – Number of conformations to generate for the ensemble over which to average. Defaults to 100.
inchi_strict – Whether to treat warnings in the InChI generation as errors. Defaults to False.
num_threads – The number of threads to use for energy minimization. By default, the number of available cores is used.
- Returns:
PoseBusters results dictionary.
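The pass/fail decision can be illustrated with plain numbers. This sketch assumes the ratio is the energy of the pose divided by the mean energy of the generated ensemble, as the parameter descriptions suggest. It does not model force-field evaluation, hydrogen handling, or the `epsilon` parameter, and the function names are hypothetical.

```python
from statistics import fmean


def energy_ratio(pose_energy: float, ensemble_energies: list[float]) -> float:
    """Ratio of the pose energy to the average energy of the conformer ensemble."""
    mean_energy = fmean(ensemble_energies)
    if mean_energy == 0:
        raise ValueError("mean ensemble energy is zero; ratio is undefined")
    return pose_energy / mean_energy


def energy_ratio_passes(pose_energy: float, ensemble_energies: list[float],
                        threshold_energy_ratio: float = 7.0) -> bool:
    """Pass unless the ratio exceeds the threshold."""
    return energy_ratio(pose_energy, ensemble_energies) <= threshold_energy_ratio


# Energies are illustrative numbers in arbitrary but consistent units.
print(energy_ratio_passes(50.0, [10.0, 10.0]))
print(energy_ratio_passes(80.0, [10.0, 10.0]))
```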
Flatness
- posebusters.modules.flatness.check_flatness(mol_pred: Mol, threshold_flatness: float = 0.1, flat_systems: dict[str, str] = {'aromatic_5_membered_rings_sp2': '[ar5^2]1[ar5^2][ar5^2][ar5^2][ar5^2]1', 'aromatic_6_membered_rings_sp2': '[ar6^2]1[ar6^2][ar6^2][ar6^2][ar6^2][ar6^2]1', 'trigonal_planar_double_bonds': '[C;X3;^2](*)(*)=[C;X3;^2](*)(*)'}, check_nonflat: bool = False) dict[str, Any]
Check whether substructures of molecule are flat.
- Parameters:
mol_pred – Molecule with exactly one conformer.
threshold_flatness – Maximum distance from shared plane used as cutoff. Defaults to 0.1.
flat_systems – Patterns of flat (or non-flat) systems provided as SMARTS. Defaults to 5- and 6-membered aromatic rings and trigonal planar carbon-carbon double bonds.
check_nonflat – Whether to check the ring non-flatness instead of flatness. Turns (flatness <= threshold_flatness) to (flatness >= threshold_flatness).
- Returns:
PoseBusters results dictionary.
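The effect of `check_nonflat` on the comparison can be sketched as follows. The sketch assumes the per-atom distances from the fitted plane are already known (the plane fit itself is omitted), and the function name is hypothetical.

```python
def flatness_passes(distances_from_plane: list[float],
                    threshold_flatness: float = 0.1,
                    check_nonflat: bool = False) -> bool:
    """Compare the largest out-of-plane distance against the cutoff.

    With check_nonflat=True the comparison is inverted, so the test passes
    only when the system is at least threshold_flatness away from planar.
    """
    flatness = max(abs(d) for d in distances_from_plane)
    if check_nonflat:
        return flatness >= threshold_flatness
    return flatness <= threshold_flatness


# Illustrative distances for the atoms of one matched ring.
planar_ring = [0.01, -0.03, 0.02, 0.00, -0.01, 0.02]
puckered_ring = [0.01, 0.25, -0.20, 0.22, -0.24, 0.02]

print(flatness_passes(planar_ring))
print(flatness_passes(puckered_ring))
print(flatness_passes(puckered_ring, check_nonflat=True))
```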
Identity
- posebusters.modules.identity.check_identity(mol_pred: Mol, mol_true: Mol, inchi_options: str = '') dict[str, Any]
Check that two molecules are identical (docking-relevant identity).
- Parameters:
mol_pred – Predicted molecule (docked ligand).
mol_true – Ground truth molecule (crystal ligand) with a conformer.
inchi_options – String of options to pass to the InChI module. Defaults to “”.
- Returns:
PoseBusters results dictionary.
Intermolecular Distance
- posebusters.modules.intermolecular_distance.check_intermolecular_distance(mol_pred: Mol, mol_cond: Mol, radius_type: str = 'vdw', radius_scale: float = 1.0, clash_cutoff: float = 0.75, ignore_types: set[str] = {'hydrogens'}, max_distance: float = 5.0, search_distance: float = 6.0) dict[str, Any]
Check that predicted molecule is not too close and not too far away from conditioning molecule.
- Parameters:
mol_pred – Predicted molecule (docked ligand) with one conformer.
mol_cond – Conditioning molecule (protein) with one conformer.
radius_type – Type of atomic radius to use. Possible values are “vdw” (van der Waals) and “covalent”. Defaults to “vdw”.
radius_scale – Scaling factor for the atomic radii. Defaults to 1.0.
clash_cutoff – Threshold for how much the atoms may overlap before a clash is reported. Defaults to 0.75.
ignore_types – Which types of atoms to ignore in mol_cond. Possible values to include are “hydrogens”, “protein”, “organic_cofactors”, “inorganic_cofactors”, “waters”. Defaults to {“hydrogens”}.
max_distance – Maximum distance (in Angstrom) predicted and conditioning molecule may be apart to be considered as valid. Defaults to 5.0.
- Returns:
PoseBusters results dictionary.
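One way to read `radius_scale`, `clash_cutoff`, and `max_distance` is shown below. This is a sketch under an assumption: a clash is reported when the distance between two atoms, divided by the sum of their scaled radii, falls below `clash_cutoff`. The library may compute this differently. The radii are Bondi van der Waals values used for illustration only, and all function names are hypothetical.

```python
# Bondi van der Waals radii in Angstrom, for illustration only.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52}


def relative_distance(distance: float, elem_a: str, elem_b: str,
                      radius_scale: float = 1.0) -> float:
    """Distance between two atoms expressed as a fraction of their scaled radius sum."""
    radius_sum = (VDW_RADII[elem_a] + VDW_RADII[elem_b]) * radius_scale
    return distance / radius_sum


def is_clash(distance: float, elem_a: str, elem_b: str,
             radius_scale: float = 1.0, clash_cutoff: float = 0.75) -> bool:
    """Assumed rule: clash when the relative distance drops below clash_cutoff."""
    return relative_distance(distance, elem_a, elem_b, radius_scale) < clash_cutoff


def within_reach(min_distance: float, max_distance: float = 5.0) -> bool:
    """The closest ligand-protein atom pair must be no further apart than max_distance."""
    return min_distance <= max_distance


print(is_clash(2.0, "C", "O"))   # 2.0 / 3.22 is about 0.62
print(is_clash(3.0, "C", "O"))   # 3.0 / 3.22 is about 0.93
print(within_reach(6.5))
```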
Loading
- posebusters.modules.loading.check_loading(mol_pred: Any = None, mol_true: Any = None, mol_cond: Any = None) dict[str, dict[str, bool]]
Check that molecule files were loaded successfully.
- Parameters:
mol_pred – Predicted molecule. Defaults to None.
mol_true – Ground truth molecule. Defaults to None.
mol_cond – Conditioning molecule. Defaults to None.
- Returns:
PoseBusters results dictionary.
RMSD
- posebusters.modules.rmsd.check_rmsd(mol_pred: Mol, mol_true: Mol, rmsd_threshold: float = 2.0, heavy_only: bool = True, choose_by: str = 'rmsd') dict[str, dict[str, bool | float]]
Calculate RMSD and related metrics between predicted molecule and closest ground truth molecule.
- Parameters:
mol_pred – Predicted molecule (docked ligand) with exactly one conformer.
mol_true – Ground truth molecule (crystal ligand) with at least one conformer. If multiple conformers are present, the lowest RMSD will be reported.
rmsd_threshold – Threshold in angstrom for reporting whether RMSD is within threshold. Defaults to 2.0.
heavy_only – Whether to only consider heavy atoms for RMSD calculation. Defaults to True.
choose_by – Metric to choose which mol_true conformation to compare to. Defaults to “rmsd”.
- Returns:
PoseBusters results dictionary.
Volume Overlap
- posebusters.modules.volume_overlap.check_volume_overlap(mol_pred: Mol, mol_cond: Mol, clash_cutoff: float = 0.05, vdw_scale: float = 0.8, ignore_types: set[str] = {'hydrogens'}, search_distance: float = 6.0) dict[str, dict[str, float | bool]]
Check volume overlap between ligand and protein.
- Parameters:
mol_pred – Predicted molecule (docked ligand) with one conformer.
mol_cond – Conditioning molecule (protein) with one conformer.
clash_cutoff – Threshold for how much volume overlap is allowed. This is the maximum share of volume of mol_pred allowed to overlap with mol_cond. Defaults to 0.05.
vdw_scale – Scaling factor for the van der Waals radii which define the volume around each atom. Defaults to 0.8.
ignore_types – Which types of atoms in mol_cond to ignore. Possible values to include are “hydrogens”, “protein”, “organic_cofactors”, “inorganic_cofactors”, “waters”. Defaults to {“hydrogens”}.
- Returns:
PoseBusters results dictionary.
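The "share of volume" in `clash_cutoff` can be approximated on a grid. The following is a sketch that treats each atom as a sphere with a scaled van der Waals radius and counts grid points. It is not the library's implementation, and the function names and the `spacing` parameter are hypothetical.

```python
import itertools
import math

Sphere = tuple[tuple[float, float, float], float]  # (center, radius)


def inside_any(point: tuple[float, ...], spheres: list[Sphere]) -> bool:
    """True if the point lies inside at least one sphere."""
    return any(math.dist(point, center) <= radius for center, radius in spheres)


def overlap_share(ligand: list[Sphere], protein: list[Sphere],
                  vdw_scale: float = 0.8, spacing: float = 0.2) -> float:
    """Fraction of the ligand volume that also lies inside the protein volume."""
    lig = [(c, r * vdw_scale) for c, r in ligand]
    prot = [(c, r * vdw_scale) for c, r in protein]
    # Grid over the bounding box of the scaled ligand spheres.
    lo = [min(c[i] - r for c, r in lig) for i in range(3)]
    hi = [max(c[i] + r for c, r in lig) for i in range(3)]
    axes = [
        [lo[i] + k * spacing for k in range(int((hi[i] - lo[i]) / spacing) + 1)]
        for i in range(3)
    ]
    n_ligand = n_both = 0
    for point in itertools.product(*axes):
        if inside_any(point, lig):
            n_ligand += 1
            if inside_any(point, prot):
                n_both += 1
    return n_both / n_ligand if n_ligand else 0.0


def volume_overlap_passes(share: float, clash_cutoff: float = 0.05) -> bool:
    """Pass if at most clash_cutoff of the ligand volume overlaps the protein."""
    return share <= clash_cutoff


ligand = [((0.0, 0.0, 0.0), 1.7)]
far_protein = [((10.0, 0.0, 0.0), 1.7)]
share_far = overlap_share(ligand, far_protein)
share_same = overlap_share(ligand, ligand)
```

A finer `spacing` gives a better volume estimate at the cost of more grid points.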