Input/Output Utilities
This module provides functions for loading and saving datasets, as well as converting between different data formats. It is useful for preparing data for training and testing DirectMultiStep models.
Example Use
The most useful functions are `load_dataset_sm`, `load_dataset_nosm`, `save_dataset_sm`, and `load_pharma_compounds`. The first three read and write pickled datasets; the last reads pharmaceutical compounds from a JSON file.
```python
from pathlib import Path

from directmultistep.utils.io import load_pharma_compounds

data_path = Path.cwd() / "data"
# load_pharma_compounds is documented as returning a DatasetDict, so index it by key.
dataset = load_pharma_compounds(data_path / "pharma_compounds.json")
name_to_idx = dataset["nameToIdx"]
```
Source Code
directmultistep.utils.io
DatasetDict
Bases: TypedDict
A dictionary type for storing dataset information.
Attributes:
Name | Type | Description |
---|---|---|
`products` | `list[str]` | List of product SMILES strings. |
`starting_materials` | `list[str]` | List of starting material SMILES strings. |
`path_strings` | `list[str]` | List of string representations of reaction paths. |
`n_steps_list` | `list[int]` | List of integers representing the number of steps in each path. |
`ds_name` | `str` | Name of the dataset. |
`nameToIdx` | `dict[str, list[int]] \| None` | A dictionary mapping names to lists of indices. |
Source code in src/directmultistep/utils/io.py
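To make the attribute table concrete, here is a minimal sketch of a `TypedDict` with the same fields. The class name `DatasetDictSketch` and the toy values are hypothetical; this is a stand-in built from the table above, not the library's actual definition.

```python
from typing import Optional, TypedDict


class DatasetDictSketch(TypedDict):
    """Hypothetical stand-in mirroring the documented DatasetDict attributes."""

    products: list[str]
    starting_materials: list[str]
    path_strings: list[str]
    n_steps_list: list[int]
    ds_name: str
    nameToIdx: Optional[dict[str, list[int]]]


# Toy values for illustration only; the path string is a placeholder,
# not the library's real path-string format.
example = DatasetDictSketch(
    products=["CCO"],
    starting_materials=["CC"],
    path_strings=["<placeholder path string>"],
    n_steps_list=[1],
    ds_name="toy",
    nameToIdx=None,
)

# At runtime a TypedDict instance is an ordinary dict, so fields are read by key.
first_product = example["products"][0]
```

Because a `TypedDict` is a plain `dict` at runtime, the type annotations only help static checkers; nothing validates the field types when the dictionary is built.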
load_dataset_sm(path)
Loads a dataset from a pickle file containing starting materials.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`path` | `Path` | The path to the pickle file. | required |
Returns:
Type | Description |
---|---|
`DatasetDict` | A dictionary containing the loaded dataset. |
load_dataset_nosm(path)
Loads a dataset from a pickle file without starting materials.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`path` | `Path` | The path to the pickle file. | required |
Returns:
Type | Description |
---|---|
`DatasetDict` | A dictionary containing the loaded dataset. |
save_dataset_sm(data, path)
Saves a dataset to a pickle file, including starting materials.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`data` | `dict[str, Any]` | The dataset dictionary to save. | required |
`path` | `Path` | The path to save the pickle file. | required |
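The save and load functions above are described as writing to and reading from pickle files. The following is a minimal sketch of that round trip using only the standard library. The helper names `save_pickle_sketch` and `load_pickle_sketch`, the toy data, and the file name are assumptions made for illustration; the library's functions may store the data under different keys or apply extra processing.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_pickle_sketch(data: dict[str, Any], path: Path) -> None:
    """Hypothetical helper: write a dataset dictionary to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pickle_sketch(path: Path) -> dict[str, Any]:
    """Hypothetical helper: read a dataset dictionary back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy dataset for illustration; the path string is a placeholder value.
toy = {
    "products": ["CCO"],
    "starting_materials": ["CC"],
    "path_strings": ["<placeholder path string>"],
    "n_steps_list": [1],
}

# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "toy_dataset.pkl"
    save_pickle_sketch(toy, pkl_path)
    restored = load_pickle_sketch(pkl_path)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from sources you trust.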
convert_dict_of_lists_to_list_of_dicts(dict_of_lists)
Converts a dictionary of lists to a list of dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`dict_of_lists` | `DatasetDict` | The dictionary of lists to convert. | required |
Returns:
Type | Description |
---|---|
`list[dict[str, str]]` | A list of dictionaries. |
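As an illustration of this conversion, the sketch below turns column-oriented data into one dictionary per row. The function name `dict_of_lists_to_rows` and the toy values are hypothetical, and the library's implementation may handle edge cases differently.

```python
def dict_of_lists_to_rows(dict_of_lists: dict[str, list[str]]) -> list[dict[str, str]]:
    """Hypothetical sketch: one output dict per position across all lists."""
    keys = list(dict_of_lists)
    # zip(*values) walks the lists in parallel, yielding one tuple per row.
    # zip stops at the shortest list, so unequal lengths are silently truncated.
    return [dict(zip(keys, row)) for row in zip(*dict_of_lists.values())]


# Toy column-oriented data; "p1" and "p2" are placeholder path strings.
columns = {
    "products": ["CCO", "CCN"],
    "path_strings": ["p1", "p2"],
}
rows = dict_of_lists_to_rows(columns)
```

Here `rows[0]` holds the first entry of every column and `rows[1]` the second, which is the shape most per-example processing loops expect.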
convert_list_of_dicts_to_dict_of_lists(list_of_dicts)
Converts a list of dictionaries to a dictionary of lists.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`list_of_dicts` | `list[dict[str, str]]` | The list of dictionaries to convert. | required |
Returns:
Type | Description |
---|---|
`dict[str, list[str]]` | A dictionary of lists. |
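A minimal sketch of this direction of the conversion follows. The function name `rows_to_dict_of_lists` and the toy values are hypothetical, and the sketch assumes every row has the same keys as the first row, which the library may or may not require.

```python
def rows_to_dict_of_lists(list_of_dicts: list[dict[str, str]]) -> dict[str, list[str]]:
    """Hypothetical sketch: collect each key's values across all rows."""
    if not list_of_dicts:
        return {}
    # Assumption: all rows share the keys of the first row.
    return {key: [row[key] for row in list_of_dicts] for key in list_of_dicts[0]}


# Toy row-oriented data; "p1" and "p2" are placeholder path strings.
rows = [
    {"products": "CCO", "path_strings": "p1"},
    {"products": "CCN", "path_strings": "p2"},
]
columns = rows_to_dict_of_lists(rows)
```

A row that lacks one of the first row's keys raises `KeyError` in this sketch, which surfaces inconsistent data early instead of producing lists of unequal length.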
load_pharma_compounds(path_to_json, load_sm=True)
Loads pharmaceutical compounds from a JSON file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`path_to_json` | `Path` | The path to the JSON file. | required |
`load_sm` | `bool` | Whether to load starting materials. | `True` |
Returns:
Type | Description |
---|---|
`DatasetDict` | A dictionary containing the loaded dataset. |
load_commercial_stock(path)
Loads molecules from a file, canonicalizes them, and returns them as a set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`path` | `Path` | The path to the file containing molecules. | required |
Returns:
Type | Description |
---|---|
`set[str]` | A set of canonicalized SMILES strings. |
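The load-canonicalize-deduplicate pattern described above can be sketched as follows. This page does not state the file format or the canonicalization routine, so the sketch assumes one SMILES string per line and takes the canonicalizer as a parameter. The function name `load_stock_sketch`, the file name, and the identity canonicalizer used in the demo are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_stock_sketch(path: Path, canonicalize: Callable[[str], str]) -> set[str]:
    """Hypothetical sketch: read one SMILES per line and return canonical forms."""
    stock: set[str] = set()
    with open(path) as f:
        for line in f:
            smiles = line.strip()
            if smiles:  # skip blank lines
                stock.add(canonicalize(smiles))
    return stock


# Demo with an identity canonicalizer as a stand-in for a real one.
with tempfile.TemporaryDirectory() as tmp:
    stock_path = Path(tmp) / "toy_stock.txt"
    stock_path.write_text("CCO\n\nCCO \nc1ccccc1\n")
    stock = load_stock_sketch(stock_path, canonicalize=lambda s: s)
```

With RDKit installed, a real canonicalizer could be `lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s))`, which would also merge different spellings of the same molecule, such as `OCC` and `CCO`. Returning a set makes membership checks against the stock fast and removes duplicates automatically.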