API documentation
Checkpoints
Module for managing PyTorch model checkpoints.
Provides the CheckpointManager class to save and load model and optimizer states during training, track the best metric values, and optionally report checkpoint events.
- class congrads.checkpoints.CheckpointManager(criteria_function: Callable[[dict[str, Tensor], dict[str, Tensor]], bool], network: Module, optimizer: Optimizer, metric_manager: MetricManager, save_dir: str = 'checkpoints', create_dir: bool = False, report_save: bool = False)
Bases: object
Manage saving and loading checkpoints for PyTorch models and optimizers.
Handles checkpointing based on a criteria function, restores metric states, and optionally reports when a checkpoint is saved.
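The criteria-function contract in the signature above can be illustrated with a small sketch. It uses plain floats in place of tensors so that it runs without PyTorch. The argument order (current values first, best values second), the metric name `loss`, and the helper `maybe_update_best` are assumptions made for this example and are not part of the documented API.

```python
# Illustrative stand-in: plain floats replace torch.Tensor values.
# The (current, best) argument order is an assumption, not confirmed by the docs.

def lower_loss_is_better(current: dict, best: dict) -> bool:
    """Return True when the current 'loss' improves on the best one seen."""
    if "loss" not in best:  # nothing recorded yet, so any value counts
        return True
    return current["loss"] < best["loss"]


def maybe_update_best(criteria, current: dict, best: dict) -> bool:
    """Hypothetical helper: apply the criteria and remember new best values."""
    if criteria(current, best):
        best.clear()
        best.update(current)
        return True  # a real manager would save a checkpoint at this point
    return False


best = {}
history = [{"loss": 0.9}, {"loss": 0.7}, {"loss": 0.8}]
saved = [maybe_update_best(lower_loss_is_better, m, best) for m in history]
```

In this sketch only the first two epochs would trigger a save, because the third does not improve on the stored best value.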
- evaluate_criteria(epoch: int, metric_group: str = 'during_training')
Evaluate the criteria function to determine if a better model is found.
Aggregates the current metric values during training and applies the criteria function. If the criteria function indicates improvement, the best metric values are updated, a checkpoint is saved, and a message is optionally printed.
- load(filename: str)
Load a checkpoint and restore the training state.
Loads the checkpoint from the specified file and restores the network weights, optimizer state, and best metric values.
- resume(filename: str = 'checkpoint.pth', ignore_missing: bool = False) → int
Resumes training from a saved checkpoint file.
- Parameters:
- Returns:
- The epoch number from the loaded checkpoint, or 0 if ignore_missing is True and no checkpoint was found.
- Return type:
- Raises:
TypeError – If a provided attribute has an incompatible type.
FileNotFoundError – If the specified checkpoint file does not exist.
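The documented return and error behaviour of resume can be sketched in isolation as follows. This is not the library's implementation: it uses `pickle` instead of PyTorch serialization so that it runs with the standard library alone, and the function name `resume_epoch` and the `"epoch"` key in the stored dictionary are assumptions made for the example.

```python
import os
import pickle
import tempfile


def resume_epoch(path: str, ignore_missing: bool = False) -> int:
    """Return the stored epoch, or 0 if the file is absent and ignore_missing is set."""
    if not os.path.exists(path):
        if ignore_missing:
            return 0  # no checkpoint found: start from the first epoch
        raise FileNotFoundError(path)
    with open(path, "rb") as handle:
        state = pickle.load(handle)
    return state["epoch"]  # "epoch" key is an assumed checkpoint layout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")

    # No file yet: ignore_missing=True yields epoch 0 instead of an error.
    fresh_start = resume_epoch(path, ignore_missing=True)

    # Write a stand-in checkpoint, then resume from it.
    with open(path, "wb") as handle:
        pickle.dump({"epoch": 12}, handle)
    resumed = resume_epoch(path)
```

A training loop could then iterate over `range(resumed, total_epochs)`, where `total_epochs` is whatever epoch budget the caller has chosen.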
Constraints
Core
Datasets
Descriptor
This module defines the Descriptor class, which allows assigning tags to parts of the network.
It manages the mapping between tags, their corresponding data dictionary keys and indices, and additional properties such as constant or variable status. Constraints can then be placed on parts of your network by referencing tags instead of indices.
The Descriptor class supports registering tags with associated data dictionary keys, indices, and optional attributes, such as whether the data is constant or variable.
- class congrads.descriptor.Descriptor
Bases: object
A class to manage the mapping between tags and their locations in the data dictionary.
It represents data locations in the data dictionary and holds the dictionary keys, indices, and additional properties (such as min/max values, output, and constant variables).
This class is designed to manage the relationships between the assigned tags and the data dictionary keys in a neural network model. It allows for the assignment of properties (like minimum and maximum values, and whether data is an output, constant, or variable) to each tag. The data is stored in dictionaries and sets for efficient lookups.
- add(key: str, tag: str, index: int = None, constant: bool = False, affects_loss: bool = True)
Adds a tag to the descriptor with its associated key, index, and properties.
This method registers a tag name and associates it with a data dictionary key, its index, and optional properties such as whether the key holds output or constant data.
- Parameters:
key (str) – The key on which the tagged data is located in the data dictionary.
tag (str) – The identifier of the tag.
index (int, optional) – The index where the data is present. Defaults to None.
constant (bool, optional) – Whether the data is constant and is not learned. Defaults to False.
affects_loss (bool, optional) – Whether the data affects the loss computation. Defaults to True.
- Raises:
TypeError – If a provided attribute has an incompatible type.
ValueError – If a key or index is already assigned for a tag or a duplicate index is used within a key.
- location(tag: str) → tuple[str, int | None]
Get the key and index for a given tag.
Looks up the mapping for a registered tag and returns the associated dictionary key and the index.
- Parameters:
tag (str) – The tag identifier. Must be registered.
- Returns:
- A tuple containing:
The key in the data dictionary which holds the data (str).
The tensor index where the data is present or None (int | None).
- Return type:
- Raises:
ValueError – If the tag is not registered in the descriptor.
- select(tag: str, data: dict[str, Tensor]) → Tensor
Extract prediction values for a specific tag.
Retrieves the key and index associated with a tag and selects the corresponding slice from the tensor stored under that key in the given data dictionary. Returns the full tensor if no index was specified when registering the tag.
- Parameters:
- Returns:
A tensor slice of shape (batch_size, 1) containing the predictions for the specified tag, or the full tensor if no index was specified when registering the tag.
- Return type:
Tensor
- Raises:
ValueError – If the tag is not registered in the descriptor.
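The add, location, and select behaviour described above can be illustrated with a simplified stand-in. The class name `TagRegistry`, the tag names, and the use of nested lists in place of tensors are all assumptions made for this sketch. The duplicate check is deliberately simplified, and properties such as `constant` and `affects_loss` are omitted.

```python
from typing import Optional


class TagRegistry:
    """Hypothetical stand-in: maps each tag to a (key, index) location."""

    def __init__(self) -> None:
        self._locations = {}  # tag -> (key, index)

    def add(self, key: str, tag: str, index: Optional[int] = None) -> None:
        if tag in self._locations:
            raise ValueError(f"Tag '{tag}' is already registered.")
        # Simplified duplicate check: one tag per (key, index) pair.
        if (key, index) in self._locations.values():
            raise ValueError(f"Index {index} is already used within key '{key}'.")
        self._locations[tag] = (key, index)

    def location(self, tag: str):
        if tag not in self._locations:
            raise ValueError(f"Tag '{tag}' is not registered.")
        return self._locations[tag]

    def select(self, tag: str, data: dict):
        key, index = self.location(tag)
        rows = data[key]
        if index is None:
            return rows  # no index registered: return the full batch
        return [[row[index]] for row in rows]  # one column, shape (batch_size, 1)


registry = TagRegistry()
registry.add("output", "temperature", index=0)
registry.add("output", "pressure", index=1)

# Two samples, two columns, stored under the "output" key.
batch = {"output": [[20.5, 1.01], [22.0, 0.99]]}
pressure = registry.select("pressure", batch)
```

Here `pressure` holds the second column of each row, so later code can refer to the tag name rather than to column index 1.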
Metrics
Module for managing metrics during training.
Provides the Metric and MetricManager classes for accumulating, aggregating, and resetting metrics over training batches. Supports grouping metrics and using custom accumulation functions.
- class congrads.metrics.Metric(name: str, accumulator: Callable[[...], Tensor] = torch.nanmean)
Bases: object
Represents a single metric to be accumulated and aggregated.
Stores metric values over multiple batches and computes an aggregated result using a specified accumulation function.
- accumulate(value: Tensor) → None
Accumulate a new value for the metric.
- Parameters:
value (Tensor) – Metric values for the current batch.
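The accumulate-then-aggregate pattern with a NaN-ignoring mean can be sketched without PyTorch as shown below. The class name `RunningMetric`, the method names `aggregate` and `reset`, and the pure-Python `nanmean` helper are assumptions made for illustration. The library itself defaults to a tensor-based nanmean accumulator.

```python
import math


def nanmean(values):
    """Mean of the values that are not NaN (NaN if none remain)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


class RunningMetric:
    """Hypothetical stand-in: collects per-batch values and reduces on demand."""

    def __init__(self, name, accumulator=nanmean):
        self.name = name
        self.accumulator = accumulator
        self.values = []

    def accumulate(self, batch_values):
        self.values.extend(batch_values)  # keep raw values until aggregation

    def aggregate(self):
        return self.accumulator(self.values)

    def reset(self):
        self.values.clear()


metric = RunningMetric("loss")
metric.accumulate([0.5, math.nan])  # the NaN entry is skipped by nanmean
metric.accumulate([1.5])
result = metric.aggregate()
```

Passing a different callable as `accumulator` (for example `max`) would change how the stored values are reduced without touching the accumulation logic.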
- class congrads.metrics.MetricManager
Bases: object
Manages multiple metrics and groups for training or evaluation.
Supports registering metrics, accumulating values by name, aggregating metrics by group, and resetting metrics by group.
- accumulate(name: str, value: Tensor) → None
Accumulate a value for a specific metric by name.
- Parameters:
name (str) – Name of the metric.
value (Tensor) – Metric values for the current batch.
- register(name: str, group: str = 'default', accumulator: Callable[[...], Tensor] = torch.nanmean) → None
Register a new metric under a specified group.
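Group-based registration and aggregation can be pictured with the following minimal sketch. The class name `GroupedMetrics`, the `aggregate` and `reset` method names, and the use of a plain mean over Python floats are assumptions made for this example, not the library's actual implementation.

```python
from collections import defaultdict
from statistics import fmean


class GroupedMetrics:
    """Hypothetical stand-in: metrics registered by name, reduced by group."""

    def __init__(self):
        self._values = {}                 # metric name -> accumulated values
        self._groups = defaultdict(list)  # group name -> metric names

    def register(self, name, group="default"):
        self._values[name] = []
        self._groups[group].append(name)

    def accumulate(self, name, batch_values):
        self._values[name].extend(batch_values)

    def aggregate(self, group="default"):
        # Reduce only the metrics that belong to the requested group.
        return {n: fmean(self._values[n]) for n in self._groups[group]}

    def reset(self, group="default"):
        for n in self._groups[group]:
            self._values[n].clear()


metrics = GroupedMetrics()
metrics.register("loss", group="during_training")
metrics.register("accuracy", group="validation")

metrics.accumulate("loss", [0.25, 0.75])
metrics.accumulate("accuracy", [1.0, 0.0, 1.0, 1.0])

training = metrics.aggregate("during_training")
metrics.reset("during_training")  # validation values are left untouched
```

Keeping groups separate lets training metrics be aggregated and cleared every epoch while validation metrics follow their own schedule.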