Monotonic Health Score
Prerequisites
Add the Congrads package to the current Colab notebook environment and install it.
Import the necessary functions and classes.
Define utility functions for plotting and other helper tasks.
Before starting the general training procedure, we fix the random seeds and select the device on which the model is trained:
The built-in Seeder class pseudo-randomly fixes the seeds of the random number generators, including those of NumPy and PyTorch, so that results are reproducible.
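To illustrate the idea of drawing a reproducible stream of seeds from one base seed, here is a minimal sketch in plain Python. The class `ToySeeder` and the helper `first_seeds` are hypothetical names written for this illustration only; they are not the Congrads Seeder, whose internals are not shown here and may differ.

```python
import random


class ToySeeder:
    """Hypothetical stand-in: derives child seeds from a single base seed."""

    def __init__(self, base_seed: int) -> None:
        # A private generator keeps the seed stream independent of global state.
        self._rng = random.Random(base_seed)

    def roll_seed(self) -> int:
        # Each call returns the next seed in a deterministic sequence.
        return self._rng.randrange(2**31)


def first_seeds(base_seed: int, count: int) -> list[int]:
    """Return the first `count` seeds produced for a given base seed."""
    seeder = ToySeeder(base_seed)
    return [seeder.roll_seed() for _ in range(count)]
```

With this pattern, every component that needs randomness (dataset generation, splitting, weight initialisation) can receive its own seed while the whole experiment stays reproducible from one number.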
If there is a GPU available, use it. Otherwise fall back to CPU.
Problem description
In this last example, the goal is to predict a “health score”. This is a monotonically decreasing function that starts near 1 and ends near 0.
This problem is a simplification of Remaining Useful Life (RUL) prediction, using synthetically generated data as shown in the figure below. An assumption we make here is that a difference in RUL must be a consequence of a change in input.
Mathematically:
If \(x \in [0, 1]\), then \(f(x) \ge 0\): RUL is non-negative.
If \(x \in [0, 1]\), then \(f(x) \le 1\): RUL is relative, as it should work for runs of different lengths.
If \(x \in [0, 0.1]\), then \(f(x) \ge 0.9\): RUL is large near the beginning.
If \(x \in [0.9, 1]\), then \(f(x) \le 0.1\): RUL is small near the end.
If \(x_1 \le x_2\) with \(x_1, x_2 \in [0, 1]\), then \(f(x_1) \ge f(x_2)\): RUL decays over time.
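The five properties can be checked numerically for any candidate function. The sketch below evaluates a function on an evenly spaced grid and reports which properties hold; `check_health_score` is a hypothetical helper for illustration and is not part of Congrads.

```python
from typing import Callable, Dict


def check_health_score(f: Callable[[float], float], n: int = 101) -> Dict[str, bool]:
    """Evaluate f on an evenly spaced grid in [0, 1] and test each property."""
    xs = [i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    return {
        "non_negative": all(y >= 0.0 for y in ys),
        "at_most_one": all(y <= 1.0 for y in ys),
        "large_at_start": all(y >= 0.9 for x, y in zip(xs, ys) if x <= 0.1),
        "small_at_end": all(y <= 0.1 for x, y in zip(xs, ys) if x >= 0.9),
        "non_increasing": all(a >= b for a, b in zip(ys, ys[1:])),
    }


# A straight line from 1 to 0 satisfies every property.
linear = check_health_score(lambda x: 1.0 - x)
# A quadratic decay drops too quickly and violates the "large at start" property.
quadratic = check_health_score(lambda x: (1.0 - x) ** 2)
```

A grid check like this only samples the function at a finite number of points, so it can reveal violations but cannot prove that a property holds everywhere.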
Dataset
For this example we use the built-in SectionedGaussians dataset with preconfigured settings. We split the dataset into training, validation and test sets such that one full, unseen waveform is reserved for testing the model.
# Load and preprocess data
sections = [
    {
        "range": (0.0, 0.40),
        "add_mean": 0.2,
        "std": 0.01,
        "max_splits": 1,
        "split_prob": 0.7,
        "mean_var": 0.6,
        "std_var": 0.0,
        "range_var": 0.9,
    },
    {
        "range": (0.40, 0.75),
        "add_mean": 1.8,
        "std": 0.01,
        "max_splits": 1,
        "split_prob": 0.4,
        "mean_var": 0.3,
        "std_var": 0.0,
        "range_var": 0.9,
    },
    {
        "range": (0.75, 1.0),
        "add_mean": 1.8,
        "std": 0.01,
        "max_splits": 0,
        "split_prob": 0.0,
        "mean_var": 0.4,
        "std_var": 0.0,
        "range_var": 0.9,
    },
]
dataset = SectionedGaussians(
    sections,
    n_samples=600,
    n_runs=10,
    seed=seeder.roll_seed(),
    blend_k=50,
)
loaders = split_dataset(
    dataset,
    loader_args={"batch_size": 100, "shuffle": True},
    valid_loader_args={"shuffle": False},
    test_loader_args={"shuffle": False},
    seed=seeder.roll_seed(),
    train_valid_split=0.8,
)
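The idea of reserving one complete waveform for testing can be sketched in plain Python. The helper `split_by_run` below is hypothetical and only illustrates a run-wise hold-out; how `split_dataset` performs its split internally is not shown in this example, and the interpretation of `n_samples` as samples per run is an assumption.

```python
import random
from typing import Dict, List, Sequence


def split_by_run(
    run_ids: Sequence[int], train_fraction: float = 0.8, seed: int = 0
) -> Dict[str, List[int]]:
    """Return sample indices for train/valid/test, holding out one whole run."""
    rng = random.Random(seed)
    runs = sorted(set(run_ids))
    test_run = rng.choice(runs)  # one complete waveform is never seen in training

    # All remaining samples are shuffled and divided into train and validation.
    remaining = [i for i, r in enumerate(run_ids) if r != test_run]
    rng.shuffle(remaining)
    cut = int(train_fraction * len(remaining))

    return {
        "train": remaining[:cut],
        "valid": remaining[cut:],
        "test": [i for i, r in enumerate(run_ids) if r == test_run],
    }


# Ten runs, assuming 600 samples per run (an assumption about n_samples).
run_ids = [run for run in range(10) for _ in range(600)]
parts = split_by_run(run_ids, train_fraction=0.8, seed=1)
```

Holding out a full run, rather than random samples from every run, tests whether the model generalises to a waveform it has never observed at any time step.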
Network
For this example we can again use the MLPNetwork. Configure it and push it to the current device.
# Instantiate an MLPNetwork and configure its parameters
network = MLPNetwork(n_inputs=1, n_outputs=1, n_hidden_layers=4, hidden_dim=500)
# Push the network to the current device
network = network.to(device)
Descriptor
The current descriptor setup is a little more complicated as the dataset and the problem are also more complex.
Our SectionedGaussians dataset returns a dictionary containing the required input and target keys, as well as an additional context key.
Each key contains data:
input holds the actual signal data to train the model on
context holds the time information, a precomputed energy derived from the signal, and the run_ids, which identify the waveform
target holds a linearly decreasing target score from 1 to 0
When training with constraints we do not use a loss, and as such the target data is not used. The constraints take over the responsibility of guiding the network to a compliant solution.
When training without constraints we use an MSE loss between the predictions and this target output. The context data is not used in this case.
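For the unconstrained case, the loss computation can be illustrated with a small worked example. The helpers `linear_target` and `mse` are hypothetical and written in plain Python for clarity; in practice a framework loss such as PyTorch's mean squared error would be used.

```python
def linear_target(n: int) -> list[float]:
    """Linearly decreasing score from 1 to 0 over n time steps (n >= 2)."""
    return [1.0 - i / (n - 1) for i in range(n)]


def mse(predictions: list[float], targets: list[float]) -> float:
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


targets = linear_target(5)           # [1.0, 0.75, 0.5, 0.25, 0.0]
constant_guess = [0.5] * 5           # a model that always predicts 0.5
loss = mse(constant_guess, targets)  # (0.25 + 0.0625 + 0 + 0.0625 + 0.25) / 5 = 0.125
```

A constant prediction already reaches a fairly low error against this target, which shows that a low MSE alone does not guarantee the properties listed in the problem description.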
Please set up the descriptor and assign tags to the data:
input holds the signal data in column 0
context holds the time, energy and run_id data in columns 0, 1 and 2 respectively
target holds the target data in column 0
# Instantiate descriptor
descriptor = Descriptor()
descriptor.add("input", "signal", 0, constant=True)
descriptor.add("context", "time", 0, constant=True)
descriptor.add("context", "energy", 1, constant=True)
descriptor.add("context", "run_id", 2, constant=True)
descriptor.add("output", "score", 0)
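Conceptually, each tag is a name for one column of one data key. The sketch below shows that lookup with plain lists; `ToyDescriptor` and its `select` method are hypothetical and only illustrate the mapping, not the actual Congrads Descriptor implementation.

```python
from typing import Dict, List, Tuple


class ToyDescriptor:
    """Hypothetical stand-in mapping a tag to (data key, column index)."""

    def __init__(self) -> None:
        self._tags: Dict[str, Tuple[str, int]] = {}

    def add(self, key: str, tag: str, column: int) -> None:
        self._tags[tag] = (key, column)

    def select(self, batch: Dict[str, List[List[float]]], tag: str) -> List[float]:
        # Look up where the tag lives, then pull that column from every row.
        key, column = self._tags[tag]
        return [row[column] for row in batch[key]]


toy = ToyDescriptor()
toy.add("context", "time", 0)
toy.add("context", "energy", 1)
toy.add("output", "score", 0)

# Two samples; context rows are [time, energy, run_id].
batch = {"context": [[0.0, 0.9, 3.0], [0.5, 0.4, 3.0]], "output": [[0.97], [0.52]]}
times = toy.select(batch, "time")
scores = toy.select(batch, "score")
```

Referring to data by tag rather than by column index keeps constraint definitions readable and independent of the tensor layout.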
Constraints
With the help of the descriptor, we can easily reference certain parts of the neural network, and so we can now define our constraints.
We have numerous pre-defined constraints available that allow a variety of options.
In this example, we want to build a network that can predict a health score between 1 and 0, indicating the remaining useful life (RUL) on a synthetic dataset.
The objectives:
If \(x \in [0, 1]\), then \(f(x) \ge 0\): RUL is non-negative.
If \(x \in [0, 1]\), then \(f(x) \le 1\): RUL is relative, as it should work for runs of different lengths.
If \(x \in [0, 0.1]\), then \(f(x) \ge 0.9\): RUL is large near the beginning.
If \(x \in [0.9, 1]\), then \(f(x) \le 0.1\): RUL is small near the end.
If \(x_1 \le x_2\) with \(x_1, x_2 \in [0, 1]\), then \(f(x_1) \ge f(x_2)\): RUL decays over time.
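Several of these objectives are implications: the condition on the score only applies when the condition on time holds. The sketch below evaluates such an implication per sample in plain Python; `implication_satisfied` is a hypothetical helper that illustrates the logic and is not the Congrads ImplicationConstraint.

```python
from typing import List


def implication_satisfied(head: List[bool], body: List[bool]) -> List[bool]:
    """Per-sample truth of `head -> body`; samples with a false head pass vacuously."""
    return [(not h) or b for h, b in zip(head, body)]


times = [0.02, 0.08, 0.50, 0.95]
scores = [0.99, 0.80, 0.55, 0.03]

# "time <= 0.1 implies score >= 0.9", evaluated per sample.
early = implication_satisfied([t <= 0.1 for t in times], [s >= 0.9 for s in scores])
satisfaction_rate = sum(early) / len(early)
```

Only the second sample violates the objective: it lies in the early region but its score is below 0.9. The last two samples lie outside the early region and therefore satisfy the implication regardless of their score.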
# Constraints definition
Constraint.descriptor = descriptor
Constraint.device = device
constraints = [
    ScalarConstraint("score", "<=", 1.05, rescale_factor=2.5),
    ScalarConstraint("score", ">=", -0.05, rescale_factor=2.5),
    ImplicationConstraint(
        head=ScalarConstraint("time", "<=", 0.1),
        body=ScalarConstraint("score", ">=", 0.95, rescale_factor=2.0),
    ),
    ImplicationConstraint(
        head=ScalarConstraint("time", ">=", 0.9),
        body=ScalarConstraint("score", "<=", 0.05, rescale_factor=2.0),
    ),
    PerGroupMonotonicityConstraint(
        base=RankedMonotonicityConstraint(
            "score",
            "time",
            direction="descending",
            rescale_factor_lower=1.50,
            rescale_factor_upper=1.75,
        ),
        tag_group="run_id",
    ),
    ImplicationConstraint(
        head=ScalarConstraint("time", "<=", 0.9),
        body=BinaryConstraint("score", ">=", "energy", rescale_factor=1.25),
    ),
]
/usr/local/lib/python3.12/dist-packages/congrads/constraints/base.py:216: UserWarning: Rescale factor for constraint score monotonically (ranked) descending by time is <= 1. The network will favor general loss over the constraint-adjusted loss. Is this intended behavior? Normally, the rescale factor should always be larger than 1.
super().__init__({tag_prediction}, name, enforce, 1.0)
/usr/local/lib/python3.12/dist-packages/congrads/constraints/registry.py:895: UserWarning: Rescale factor for constraint score for each run_id monotonically (ranked) descending by time is <= 1. The network will favor general loss over the constraint-adjusted loss. Is this intended behavior? Normally, the rescale factor should always be larger than 1.
super().__init__(base.tags, name, base.enforce, base.rescale_factor)
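The per-group monotonicity requirement means that scores must not increase with time within a single run, while samples from different runs are never compared. The sketch below counts violations in plain Python; `monotonic_violations` is a hypothetical helper, and it uses a simple adjacent-pair comparison that may differ from how RankedMonotonicityConstraint evaluates monotonicity.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def monotonic_violations(
    times: Sequence[float], scores: Sequence[float], run_ids: Sequence[int]
) -> Dict[int, int]:
    """Count, per run, adjacent time-ordered pairs where the score increases."""
    groups: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for t, s, r in zip(times, scores, run_ids):
        groups[r].append((t, s))

    violations: Dict[int, int] = {}
    for run, samples in groups.items():
        samples.sort(key=lambda pair: pair[0])  # order by time within the run only
        ordered = [s for _, s in samples]
        violations[run] = sum(1 for a, b in zip(ordered, ordered[1:]) if b > a)
    return violations


times = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
scores = [1.0, 0.6, 0.1, 0.9, 0.95, 0.2]
run_ids = [0, 0, 0, 1, 1, 1]
result = monotonic_violations(times, scores, run_ids)
```

Run 0 decreases throughout, while run 1 rises from 0.9 to 0.95 between its first two time steps and therefore contains one violation. Grouping matters: without it, the step from the last sample of run 0 (0.1) to the first sample of run 1 (0.9) would be counted as a spurious violation.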
Loss and optimizer
For this example, we do not need a loss function, since the constraints guide the network to a compliant solution. We therefore use the built-in ZeroLoss as criterion, which always returns a loss of 0, effectively disabling it.
We will stick to the Adam optimizer for this example.
# Instantiate loss criterion
criterion = ZeroLoss()
# Instantiate optimizer
optimizer = Adam(network.parameters(), lr=0.001)
Metric manager
To keep track of the satisfaction rate of each individual constraint, as well as the losses and possibly other metrics, we instantiate a metric manager.
# Initialize metric manager
metric_manager = MetricManager()
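As an illustration of what tracking metrics involves, the sketch below collects per-batch values and averages them at the end of an epoch. `ToyMetricTracker` and its method names are hypothetical and chosen for this example; they are not the MetricManager interface.

```python
from collections import defaultdict
from typing import Dict, List


class ToyMetricTracker:
    """Hypothetical stand-in: collects per-batch values and averages them per epoch."""

    def __init__(self) -> None:
        self._values: Dict[str, List[float]] = defaultdict(list)

    def accumulate(self, name: str, value: float) -> None:
        self._values[name].append(value)

    def aggregate(self) -> Dict[str, float]:
        # Average each metric over the batches seen so far, then reset.
        means = {name: sum(v) / len(v) for name, v in self._values.items()}
        self._values.clear()
        return means


tracker = ToyMetricTracker()
for batch_rate in (0.5, 0.75, 1.0):
    tracker.accumulate("score >= 0 satisfaction", batch_rate)
epoch_metrics = tracker.aggregate()
```

Averaging batch values with equal weight assumes that all batches have the same size; with a smaller final batch, a sample-weighted average would be more accurate.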
Core
The CongradsCore is the brain of the toolbox. It orchestrates the functionality of all previously created objects, integrating descriptors, constraints, and optimization strategies to perform constraint-guided gradient descent. Essentially, it manages the full training or evaluation pipeline: preparing input and output tensors, applying constraints, computing gradients, updating model parameters, and generating predictions in a coordinated manner.
First, we define a callback that plots the current predictions every 10 epochs.
Refer to the Congrads documentation for more info.
callback_manager = CallbackManager()


class PlottingCallback(Callback):
    """Plot the current predictions every 10 epochs."""

    def on_epoch_end(self, data, ctx):
        epoch = data["epoch"]
        if epoch % 10 == 0:
            # Replace the previous figure instead of stacking outputs
            clear_output(wait=True)
            print(f"Epoch: {epoch}")
            plot_regression_epoch(descriptor, network, loaders, device)
            plt.show()
            plt.close()


callback_manager.add(PlottingCallback())
<CallbackManager callbacks=['PlottingCallback'] ctx_keys=[]>
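The callback mechanism follows a common pattern: the training loop calls a hook on every registered callback at fixed points, and each callback decides whether to act. The sketch below reproduces that pattern in plain Python; `ToyCallback`, `EveryTenEpochs` and `run_epochs` are hypothetical names, and zero-based epoch counting is an assumption.

```python
from typing import Any, Dict, List


class ToyCallback:
    """Base class: subclasses override the hooks they care about."""

    def on_epoch_end(self, data: Dict[str, Any]) -> None:
        pass


class EveryTenEpochs(ToyCallback):
    """Records the epochs at which it would have produced a plot."""

    def __init__(self) -> None:
        self.fired_at: List[int] = []

    def on_epoch_end(self, data: Dict[str, Any]) -> None:
        if data["epoch"] % 10 == 0:
            self.fired_at.append(data["epoch"])


def run_epochs(callbacks: List[ToyCallback], max_epochs: int) -> None:
    for epoch in range(max_epochs):
        # ... one epoch of training would happen here ...
        for callback in callbacks:
            callback.on_epoch_end({"epoch": epoch})


recorder = EveryTenEpochs()
run_epochs([recorder], max_epochs=100)
```

Under the zero-based assumption, a run of 100 epochs triggers the callback at epochs 0, 10, and so on up to 90.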
# Instantiate core
core = CongradsCore(
    descriptor=descriptor,
    constraints=constraints,
    dataloader_train=loaders[0],
    dataloader_valid=loaders[1],
    dataloader_test=loaders[2],
    network=network,
    criterion=criterion,
    optimizer=optimizer,
    metric_manager=metric_manager,
    callback_manager=callback_manager,
    device=device,
    enforce_all=True,
    disable_progress_bar_batch=True,
    disable_progress_bar_epoch=True,
)
Finally, we start training by calling the core.fit(...) method. This method allows setting the maximum number of epochs and callback functions, and it starts the training process.
# Start training
core.fit(max_epochs=100)
Epoch: 90