PoseBusters: Plausibility checks for generated molecule poses.
PoseBusters is a command line tool and Python library to check the output of molecular conformation generators, docking programs, or other programs that generate molecules in 3D.
Quick start
PoseBusters can be installed from PyPI.
>>> pip install posebusters
Use the bust
command to check molecules, docked ligands or generated molecules conditioned on a protein.
>>> bust redocked_ligand.sdf -l crystal_ligand.sdf -p protein.pdb --outfmt short
redocked_ligand.sdf redocked_ligand passes (25 / 25)
...
>>> bust redocked_ligand.sdf -l crystal_ligand.sdf -p protein.pdb --outfmt long
Long summary for redocked_ligand.sdf redocked_ligand
MOL_PRED loaded .
MOL_TRUE loaded .
MOL_COND loaded .
Sanitization .
All atoms connected .
Molecular formula .
Molecular bonds .
Double bond stereochemistry .
Tetrahedral chirality .
Bond lengths .
Bond angles .
...
>>> bust redocked_ligand.sdf -l crystal_ligand.sdf -p protein.pdb --outfmt csv
file,molecule,mol_pred_loaded,mol_true_loaded,mol_cond_loaded,sanitization,all_atoms_connected,molecular_formula,molecular_bonds,double_bond_stereochemistry,tetrahedral_chirality,bond_lengths,bond_angles,internal_steric_clash,aromatic_ring_flatness,double_bond_flatness,internal_energy,protein-ligand_maximum_distance,minimum_distance_to_protein,minimum_distance_to_organic_cofactors,minimum_distance_to_inorganic_cofactors,minimum_distance_to_waters,volume_overlap_with_protein,volume_overlap_with_organic_cofactors,volume_overlap_with_inorganic_cofactors,volume_overlap_with_waters,rmsd_≤_2å
redocked_ligand.sdf,redocked_ligand,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True
...
For more usage examples, bulk processing, and the Python API see the documentation for the command line tool and the Python library.
Docking methods comparison
For more detailed information about the tests and for a study using PoseBusters to compare docking methods, refer to our preprint:
@online{buttenschoen2023posebusters,
title = {{{PoseBusters}}: {{AI-based}} Docking Methods Fail to Generate Physically Valid Poses or Generalise to Novel Sequences},
shorttitle = {{{PoseBusters}}},
author = {Buttenschoen, Martin and Morris, Garrett M. and Deane, Charlotte M.},
date = {2023-08-10},
eprint = {2308.05777},
eprinttype = {arxiv}
}
Sample checks
For more information on the checks, see Checks.
Steric clash |
|
Bad: Molecule intertwined and atoms clashing |
Good: |
Aromatic rings flatness |
|
Bad: Conjugated pi bond systems should be flat |
Good: |
Volume overlap |
|
Bad: Ligand and receptor clash |
Good: |
For more information on the checks, see Checks.
Contents
Checks
Example failure modes
In molecular conformation generation, docking, and de-novo molecular generation the generated molecules conformation should have a reasonable geometry including standard bond lengths and angles and no steric clash.
Bond lengths |
|
Bottom left carbon-oxygen bond too short |
Bond angles |
|
Bond angles off and atoms clashing |
Steric clash |
|
Molecule intertwined and atoms clashing |
High energy conformation |
|
Twisted rings energetically unfavorable |
Aromatic rings not flat |
|
Conjugated pi bond systems should be flat |
In docking the molecular identity should be preserved including stereochemistry.
Tetrahedral stereochemistry changed |
|
Top right oxygen facing the wrong way |
Double bond stereochemistry changed |
|
Right most double bond should be cis |
In de-novo molecular generator or docking the generated molecule should be placed with in a receptor’s binding pocket without any steric clash.
Volume overlap |
|
Ligand and receptor clash |
More details on tests and docking method comparison
For more detailed information about the tests and for a study using PoseBusters to compare docking methods, refer to our preprint:
@online{buttenschoen2023posebusters,
title = {{{PoseBusters}}: {{AI-based}} Docking Methods Fail to Generate Physically Valid Poses or Generalise to Novel Sequences},
shorttitle = {{{PoseBusters}}},
author = {Buttenschoen, Martin and Morris, Garrett M. and Deane, Charlotte M.},
date = {2023-08-10},
eprint = {2308.05777},
eprinttype = {arxiv}
}
Command line interface
PoseBusters provides the command bust
for checking generated molecules
and optionally taking a conditioning protein or ligands into account.
Use bust
to check a series of molecules within one .sdf
file.
>>> bust generated_molecules.sdf --outfmt long
Long summary for generated_molecules.sdf molecule_1
MOL_PRED loaded .
Sanitization .
All atoms connected .
Bond lengths Fail
Bond angles Fail
Internal steric clash .
Aromatic ring flatness .
Double bond flatness .
Internal energy .
Long summary for generated_molecules.sdf molecule_2
MOL_PRED loaded .
Sanitization .
All atoms connected .
Bond lengths .
Bond angles .
Internal steric clash .
Aromatic ring flatness .
Double bond flatness .
Internal energy .
Long summary for generated_molecules.sdf molecule_3
...
Check a docked ligand or generated molecule conditioned on a protein.
>>> bust generated_ligands.sdf -p protein.pdb --outfmt long
Long summary for generated_ligands.sdf conformation_1
MOL_PRED loaded .
MOL_COND loaded .
Sanitization .
All atoms connected .
Bond lengths .
Bond angles .
Internal steric clash .
Aromatic ring flatness .
Double bond flatness .
Internal energy .
Protein-ligand maximum distance .
Minimum distance to protein .
Minimum distance to organic cofactors .
Minimum distance to inorganic cofactors .
Minimum distance to waters .
Volume overlap with protein .
Volume overlap with organic cofactors .
Volume overlap with inorganic cofactors .
Volume overlap with waters .
Long summary for generated_ligands.sdf conformation_2
...
Check a series of re-docked ligands against the crystal ligand and protein.
>>> bust redocked_ligand.sdf -l crystal_ligand.sdf -p protein.pdb --outfmt long
Long summary for redocked_ligand.sdf redocked_ligand
MOL_PRED loaded .
MOL_TRUE loaded .
MOL_COND loaded .
Sanitization .
All atoms connected .
Molecular formula .
Molecular bonds .
Double bond stereochemistry .
Tetrahedral chirality .
Bond lengths .
Bond angles .
Internal steric clash .
Aromatic ring flatness .
Double bond flatness .
Internal energy .
Protein-ligand maximum distance .
Minimum distance to protein .
Minimum distance to organic cofactors .
Minimum distance to inorganic cofactors .
Minimum distance to waters .
...
Use the -t
option to bulk check multiple sets of files.
>>> bust -t molecule_table.csv --outfmt long
Long summary for 1ia1/1ia1_ligand.sdf TQ3
MOL_PRED loaded .
MOL_TRUE loaded .
MOL_COND loaded .
Sanitization .
All atoms connected .
Molecular formula .
Molecular bonds .
Double bond stereochemistry .
Tetrahedral chirality .
Bond lengths .
Bond angles .
Internal steric clash .
Aromatic ring flatness .
Double bond flatness .
Internal energy .
Protein-ligand maximum distance .
Minimum distance to protein .
Minimum distance to organic cofactors .
Minimum distance to inorganic cofactors .
Minimum distance to waters .
...
Output format options
The short format is the default output format.
>>> bust generated_molecules.sdf --outfmt short
generated_molecules.sdf molecule_1 passes (7 / 9)
generated_molecules.sdf molecule_2 passes (9 / 9)
generated_molecules.sdf molecule_3 passes (9 / 9)
The long format lists each test result for each molecule/conformation.
>>> bust generated_molecules.sdf --outfmt long
Long summary for generated_molecules.sdf molecule_1
MOL_PRED loaded .
Sanitization .
All atoms connected .
Bond lengths Fail
Bond angles Fail
Internal steric clash .
Aromatic ring flatness .
Double bond flatness .
Internal energy .
Long summary for generated_molecules.sdf molecule_2
MOL_PRED loaded .
Sanitization .
All atoms connected .
Bond lengths .
Bond angles .
Internal steric clash .
Aromatic ring flatness .
Double bond flatness .
Internal energy .
Long summary for generated_molecules.sdf molecule_3
...
For copying and saving the output use the csv
option.
>>> bust generated_molecules.sdf --outfmt csv
file,molecule,mol_pred_loaded,sanitization,all_atoms_connected,bond_lengths,bond_angles,internal_steric_clash,aromatic_ring_flatness,double_bond_flatness,internal_energy
generated_molecules.sdf,molecule_1,True,True,True,False,False,True,True,True,True
generated_molecules.sdf,molecule_2,True,True,True,True,True,True,True,True,True
generated_molecules.sdf,molecule_3,True,True,True,True,True,True,True,True,True
For the csv
and long
option, the --full-report
option can
be used to show extra information beyond the pass/fail status of each test.
>>> bust generated_molecules.sdf --outfmt csv --full-report
file,molecule,mol_pred_loaded,sanitization,all_atoms_connected,bond_lengths,bond_angles,internal_steric_clash,aromatic_ring_flatness,double_bond_flatness,internal_energy,mol_true_loaded,mol_cond_loaded,passes_valence_checks,passes_kekulization,number_bonds,shortest_bond_relative_length,longest_bond_relative_length,number_short_outlier_bonds,number_long_outlier_bonds,number_angles,most_extreme_relative_angle,number_outlier_angles,number_noncov_pairs,shortest_noncovalent_relative_distance,number_clashes,number_valid_bonds,number_valid_angles,number_valid_noncov_pairs,number_aromatic_rings_checked,number_aromatic_rings_pass,aromatic_ring_maximum_distance_from_plane,number_double_bonds_checked,number_double_bonds_pass,double_bond_maximum_distance_from_plane,ensemble_avg_energy,mol_pred_energy,energy_ratio
generated_molecules.sdf,molecule_1,True,True,True,False,False,True,True,True,True,False,False,True,True,27,0.8567948530292413,1.2797521607839486,0,1,38,1.3092992455795367,2,235,0.7738421740224927,0,26,36,235,0,0,,0,0,,608.5563186553628,1377.8697297363487,2.264161405440378
generated_molecules.sdf,molecule_2,True,True,True,True,True,True,True,True,True,False,False,True,True,27,0.8951618398686666,1.0410778462688173,0,0,38,1.1090396233249729,0,260,0.9498773674061802,0,27,38,260,1,1,0.10944595440461957,0,0,,259.7528747776052,253.50849461438324,0.9759603039290007
generated_molecules.sdf,molecule_3,True,True,True,True,True,True,True,True,True,False,False,True,True,28,0.8337212356590878,1.0705087667583508,0,0,40,1.1485967900513152,0,257,0.9115215657793475,0,28,40,257,1,1,0.09186620947420678,0,0,,301.49094062530395,341.2052918845499,1.131726516149629
...
Saving the output
The --out
option can be used to save the output to a file:
bust generated_molecules.sdf --outfmt csv --out results.csv
Help
Running with the --help
option prints information about the command line options.
>>> bust --help
usage: bust [-l MOL_TRUE] [-p MOL_COND] [-t TABLE] [--outfmt {short,long,csv}]
[--output OUTPUT] [--full-report] [--no-header] [--config CONFIG]
[--top-n TOP_N] [-v] [-h]
[mol_pred [mol_pred ...]]
PoseBusters: Plausibility checks for generated molecule poses.
Input:
mol_pred molecule(s) to check
-l MOL_TRUE true molecule, e.g. crystal ligand
-p MOL_COND conditioning molecule, e.g. protein
-t TABLE run multiple inputs listed in a .csv file
Output:
--outfmt {short,long,csv}
output format
--output OUTPUT output file (default: stdout)
--full-report print details for each test
--no-header print output without header
Configuration:
--config CONFIG configuration file
--top-n TOP_N run on TOP_N results in MOL_PRED only (default: all)
Information:
-v, --version show program's version number and exit
-h, --help show this help message and exit
>>> bust --version
bust 0.2.5
Python API
PoseBusters class
The PoseBusters class collects the molecules to test, runs the modules, and reports the test results.
- class posebusters.PoseBusters(config: str | dict[str, Any] = 'redock', top_n: int | None = None)
Class to run all tests on a set of molecules.
- bust(mol_pred: Iterable[Mol | Path] | Mol | Path, mol_true: Mol | Path | None = None, mol_cond: Mol | Path | None = None, full_report: bool = False) pd.DataFrame
Run tests on one or more molecules.
- Parameters:
mol_pred – Generated molecule(s), e.g. de-novo generated molecule or docked ligand, with one or more poses.
mol_true – True molecule, e.g. crystal ligand, with one or more poses.
mol_cond – Conditioning molecule, e.g. protein.
full_report – Whether to include all columns in the output or only the boolean ones specified in the config.
Notes
Molecules can be provided as rdkit molecule objects or file paths.
- Returns:
Pandas dataframe with results.
- bust_table(mol_table: DataFrame, full_report: bool = False) DataFrame
Run tests on molecules provided in pandas dataframe as paths or rdkit molecule objects.
- Parameters:
mol_table – Pandas dataframe with columns “mol_pred”, “mol_true”, “mol_cond” containing paths to molecules.
full_report – Whether to include all columns in the output or only the boolean ones specified in the config.
- Returns:
Pandas dataframe with results.
Modules
A PoseBusters module is a function that takes one or more of mol_pred
, mol_true
, and mol_cond
as input and returns one or more test results as a dictionary.
- Inputs
Must take one of
mol_pred
,mol_true
, andmol_cond
provided as RDKit molecules.Other inputs are parameters for which default values must be specified.
- Outputs
The output must be a dictionary with at least the results entry.
The
results
entry contains a dictionary with keys corresponding to the test names and the test outcomes.Other output entries to contain further results e.g. lengths and bound for all bonds in ligand.
Distance Geometry
- posebusters.modules.distance_geometry.check_geometry(mol_pred: Mol, threshold_bad_bond_length: float = 0.2, threshold_clash: float = 0.2, threshold_bad_angle: float = 0.2, bound_matrix_params: dict[str, Any] = {'doTriangleSmoothing': True, 'scaleVDW': True, 'set15bounds': True, 'useMacrocycle14config': False}, ignore_hydrogens: bool = True, sanitize: bool = True) dict[str, Any]
Use RDKit distance geometry bounds to check the geometry of a molecule.
- Parameters:
mol_pred – Predicted molecule (docked ligand). Only the first conformer will be checked.
threshold_bad_bond_length – Bond length threshold in relative percentage. 0.2 means that bonds may be up to 20% longer than DG bounds. Defaults to 0.2.
threshold_clash – Threshold for how much overlap constitutes a clash. 0.2 means that the two atoms may be up to 80% of the lower bound apart. Defaults to 0.2.
threshold_bad_angle – Bond angle threshold in relative percentage. 0.2 means that bonds may be up to 20% longer than DG bounds. Defaults to 0.2.
bound_matrix_params – Parameters passe to RDKit’s GetMoleculeBoundsMatrix function.
ignore_hydrogens – Whether to ignore hydrogens. Defaults to True.
sanitize – Sanitize molecule before running DG module (recommended). Defaults to True.
- Returns:
PoseBusters results dictionary.
Energy Ratio
- posebusters.modules.energy_ratio.check_energy_ratio(mol_pred: Mol, threshold_energy_ratio: float = 7.0, ensemble_number_conformations: int = 100)
Check whether the energy of the docked ligand is within user defined range.
- Parameters:
mol_pred – Predicted molecule (docked ligand) with exactly one conformer.
threshold_energy_ratio – Limit above which the energy ratio is deemed to high. Defaults to 7.0.
ensemble_number_conformations – Number of conformations to generate for the ensemble over which to average. Defaults to 100.
- Returns:
PoseBusters results dictionary.
Flatness
- posebusters.modules.flatness.check_flatness(mol_pred: Mol, threshold_flatness: float = 0.1, flat_systems: dict[str, str] = {'aromatic_5_membered_rings_sp2': '[ar5^2]1[ar5^2][ar5^2][ar5^2][ar5^2]1', 'aromatic_6_membered_rings_sp2': '[ar6^2]1[ar6^2][ar6^2][ar6^2][ar6^2][ar6^2]1', 'trigonal_planar_double_bonds': '[C;X3;^2](*)(*)=[C;X3;^2](*)(*)'}) dict[str, Any]
Check whether substructures of molecule are flat.
- Parameters:
mol_pred – Molecule with exactly one conformer.
threshold_flatness – Maximum distance from shared plane used as cutoff. Defaults to 0.1.
flat_systems – Patterns of flat systems provided as SMARTS. Defaults to 5 and 6 membered aromatic rings and carbon sigma bonds.
- Returns:
PoseBusters results dictionary.
Identity
- posebusters.modules.identity.check_identity(mol_pred: Mol, mol_true: Mol, inchi_options: str = '') dict[str, Any]
Check two molecules are identical (docking relevant identity).
- Parameters:
mol_pred – Predicted molecule (docked ligand).
mol_true – Ground truth molecule (crystal ligand) with a conformer.
inchi_options – String of options to pass to the InChI module. Defaults to “”.
- Returns:
PoseBusters results dictionary.
Intermolecular Distance
- posebusters.modules.intermolecular_distance.check_intermolecular_distance(mol_pred: Mol, mol_cond: Mol, radius_type: str = 'vdw', radius_scale: float = 1.0, clash_cutoff: float = 0.75, ignore_types: set[str] = {'hydrogens'}, max_distance: float = 5.0, search_distance: float = 6.0) dict[str, Any]
Check that predicted molecule is not too close and not too far away from conditioning molecule.
- Parameters:
mol_pred – Predicted molecule (docked ligand) with one conformer.
mol_cond – Conditioning molecule (protein) with one conformer.
radius_type – Type of atomic radius to use. Possible values are “vdw” (van der Waals) and “covalent”. Defaults to “vdw”.
radius_scale – Scaling factor for the atomic radii. Defaults to 0.8.
clash_cutoff – Threshold for how much the atoms may overlap before a clash is reported. Defaults to 0.05.
ignore_types – Which types of atoms to ignore in mol_cond. Possible values to include are “hydrogens”, “protein”, “organic_cofactors”, “inorganic_cofactors”, “waters”. Defaults to {“hydrogens”}.
max_distance – Maximum distance (in Angstrom) predicted and conditioning molecule may be apart to be considered as valid. Defaults to 5.0.
- Returns:
PoseBusters results dictionary.
Loading
- posebusters.modules.loading.check_loading(mol_pred: Any = None, mol_true: Any = None, mol_cond: Any = None) dict[str, dict[str, bool]]
Check that molecule files were loaded successfully.
- Parameters:
mol_pred – Predicted molecule. Defaults to None.
mol_true – Ground truth molecule. Defaults to None.
mol_cond – Conditioning molecule. Defaults to None.
- Returns:
PoseBusters results dictionary.
RMSD
- posebusters.modules.rmsd.check_rmsd(mol_pred: Mol, mol_true: Mol, rmsd_threshold: float = 2.0) dict[str, dict[str, bool | float]]
Calculate RMSD and related metrics between predicted molecule and closest ground truth molecule.
- Parameters:
mol_pred – Predicted molecule (docked ligand) with exactly one conformer.
mol_true – Ground truth molecule (crystal ligand) with at least one conformer. If multiple conformers are present, the lowest RMSD will be reported.
rmsd_threshold – Threshold in angstrom for reporting whether RMSD is within threshold. Defaults to 2.0.
- Returns:
PoseBusters results dictionary.
Volume Overlap
- posebusters.modules.volume_overlap.check_volume_overlap(mol_pred: Mol, mol_cond: Mol, clash_cutoff: float = 0.05, vdw_scale: float = 0.8, ignore_types: set[str] = {'hydrogens'}, search_distance: float = 6.0) dict[str, dict]
Check volume overlap between ligand and protein.
- Parameters:
mol_pred – Predicted molecule (docked ligand) with one conformer.
mol_cond – Conditioning molecule (protein) with one conformer.
clash_cutoff – Threshold for how much volume overlap is allowed. This is the maximum share of volume of mol_pred allowed to overlap with mol_cond. Defaults to 0.05.
vdw_scale – Scaling factor for the van der Waals radii which define the volume around each atom. Defaults to 0.8.
ignore_types – Which types of atoms in mol_cond to ignore. Possible values to include are “hydrogens”, “protein”, “organic_cofactors”, “inorganic_cofactors”, “waters”. Defaults to {“hydrogens”}.
- Returns:
PoseBusters results dictionary.
User API examples
Setup python environment and install posebusters to run this notebook.
conda create -n posebusters python=3.10 jupyter notebook
conda activate posebusters
pip install posebusters --upgrade
[1]:
from posebusters import PoseBusters
from pathlib import Path
[2]:
pred_file = Path("inputs/generated_molecules.sdf") # predicted or generated molecules
true_file = Path("inputs/crystal_ligand.sdf") # "ground truth" molecules
cond_file = Path("inputs/protein.pdb") # conditioning molecule
PoseBusters default configs
redock
The `redock’ mode is for ligands docked into their cognate receptor crystal structures.
[3]:
# by default only the binary test report columns are returned
buster = PoseBusters(config="redock")
df = buster.bust([pred_file], true_file, cond_file)
print(df.shape)
df
[20:53:44] WARNING: Problems/mismatches: Mobile-H( Hydrogens: Number; Mobile-H groups: Falsely present, Attachment points; Charge(s): Do not match)
(3, 25)
[3]:
mol_pred_loaded | mol_true_loaded | mol_cond_loaded | sanitization | all_atoms_connected | molecular_formula | molecular_bonds | double_bond_stereochemistry | tetrahedral_chirality | bond_lengths | ... | protein-ligand_maximum_distance | minimum_distance_to_protein | minimum_distance_to_organic_cofactors | minimum_distance_to_inorganic_cofactors | minimum_distance_to_waters | volume_overlap_with_protein | volume_overlap_with_organic_cofactors | volume_overlap_with_inorganic_cofactors | volume_overlap_with_waters | rmsd_≤_2å | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
file | molecule | |||||||||||||||||||||
inputs/generated_molecules.sdf | molecule_1 | True | True | True | True | True | False | False | True | False | False | ... | False | True | True | True | True | True | True | True | True | False |
molecule_2 | True | True | True | True | True | False | False | True | True | True | ... | False | True | True | True | True | True | True | True | True | False | |
molecule_3 | True | True | True | True | True | False | False | True | True | True | ... | False | True | True | True | True | True | True | True | True | False |
3 rows × 25 columns
dock
The dock
mode is for de-novo generated molecules for a given receptor or for ligands docked into a non-cognate receptor.
[4]:
buster = PoseBusters(config="dock")
df = buster.bust([pred_file], true_file, cond_file)
print(df.shape)
df
(3, 19)
[4]:
mol_pred_loaded | mol_cond_loaded | sanitization | all_atoms_connected | bond_lengths | bond_angles | internal_steric_clash | aromatic_ring_flatness | double_bond_flatness | internal_energy | protein-ligand_maximum_distance | minimum_distance_to_protein | minimum_distance_to_organic_cofactors | minimum_distance_to_inorganic_cofactors | minimum_distance_to_waters | volume_overlap_with_protein | volume_overlap_with_organic_cofactors | volume_overlap_with_inorganic_cofactors | volume_overlap_with_waters | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
file | molecule | |||||||||||||||||||
inputs/generated_molecules.sdf | molecule_1 | True | True | True | True | False | False | True | True | True | True | False | True | True | True | True | True | True | True | True |
molecule_2 | True | True | True | True | True | True | True | True | True | True | False | True | True | True | True | True | True | True | True | |
molecule_3 | True | True | True | True | True | True | True | True | True | True | False | True | True | True | True | True | True | True | True |
mol
The mol
mode is for de-novo generated molecules or for generated molecular conformations.
[5]:
buster = PoseBusters(config="mol")
df = buster.bust([pred_file], None, None)
print(df.shape)
df
(3, 9)
[5]:
mol_pred_loaded | sanitization | all_atoms_connected | bond_lengths | bond_angles | internal_steric_clash | aromatic_ring_flatness | double_bond_flatness | internal_energy | ||
---|---|---|---|---|---|---|---|---|---|---|
file | molecule | |||||||||
inputs/generated_molecules.sdf | molecule_1 | True | True | True | False | False | True | True | True | True |
molecule_2 | True | True | True | True | True | True | True | True | True | |
molecule_3 | True | True | True | True | True | True | True | True | True |
Output formatting
full report
The full_report
option of bust
will return all columns of the test reports, not only the binary columns. This is useful for debugging and for further analysis of the results.
[6]:
buster = PoseBusters(config="mol")
df = buster.bust([pred_file], None, None, full_report=True)
print(df.shape)
df
(3, 36)
[6]:
mol_pred_loaded | sanitization | all_atoms_connected | bond_lengths | bond_angles | internal_steric_clash | aromatic_ring_flatness | double_bond_flatness | internal_energy | mol_true_loaded | ... | number_valid_noncov_pairs | number_aromatic_rings_checked | number_aromatic_rings_pass | aromatic_ring_maximum_distance_from_plane | number_double_bonds_checked | number_double_bonds_pass | double_bond_maximum_distance_from_plane | ensemble_avg_energy | mol_pred_energy | energy_ratio | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
file | molecule | |||||||||||||||||||||
inputs/generated_molecules.sdf | molecule_1 | True | True | True | False | False | True | True | True | True | False | ... | 235 | 0 | 0 | NaN | 0 | 0 | NaN | 608.632249 | 1377.869730 | 2.263879 |
molecule_2 | True | True | True | True | True | True | True | True | True | False | ... | 260 | 1 | 1 | 0.078919 | 0 | 0 | NaN | 259.822517 | 253.508495 | 0.975699 | |
molecule_3 | True | True | True | True | True | True | True | True | True | False | ... | 257 | 1 | 1 | 0.091866 | 0 | 0 | NaN | 301.579457 | 341.205292 | 1.131394 |
3 rows × 36 columns