A collection of curated and standardized Molecular DataSets (MolDS) for benchmarking machine learning methods. For all datasets, we provide standardized dataset splitting.

Key Features

  • All datasets are curated and standardized using the same procedure.
  • We provide standardized data splits (see the Data Set Summary for details).
  • Tools for curating and standardizing new datasets.
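
To illustrate why a fixed, standardized split matters for benchmarking, the following is a minimal sketch of one way to assign molecules to splits reproducibly. It is not MolDS's actual procedure, which is not described here. The function name `assign_split`, the fraction parameters, and the hash-based scheme are all assumptions made for illustration.

```python
import hashlib


def assign_split(smiles: str, valid_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically map a SMILES string to 'train', 'valid', or 'test'.

    Hypothetical helper: hashing the string means the same molecule always
    lands in the same split, regardless of dataset order or random seed.
    """
    digest = hashlib.sha256(smiles.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an integer and scale it into [0, 1).
    u = int.from_bytes(digest[:4], "big") / 2**32
    if u < test_frac:
        return "test"
    if u < test_frac + valid_frac:
        return "valid"
    return "train"


# Example molecules (illustrative SMILES strings, not taken from any MolDS dataset).
molecules = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
splits = {smi: assign_split(smi) for smi in molecules}
print(splits)
```

Because the assignment depends only on the string itself, two users running this on the same data would obtain identical splits, which is the property a standardized split is meant to guarantee.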

Installation

Prerequisites:

conda install -c conda-forge rdkit
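
After running the command above, a quick check such as the following sketch can confirm that RDKit is importable from the active environment. The helper name `rdkit_available` is hypothetical; it only uses the standard-library `importlib` module and does not import RDKit itself.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the 'rdkit' package can be found in this environment."""
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("rdkit") is not None


if rdkit_available():
    print("RDKit found")
else:
    print("RDKit not found; install it with: conda install -c conda-forge rdkit")
```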

Data Set Summary

Example