Introducing expert rule constraints

Introducing expert constraints provides professional domain knowledge to model training and inference: it prevents the environment model learned from data from violating known constraints, reduces task difficulty, and improves the prediction accuracy of the environment model.

In the refrigerator temperature control task, there is an expert prior that increasing the refrigerator's power lowers its temperature; that is, power and temperature are negatively correlated. Encoding this negative correlation as an expert constraint keeps the environment model learned from data consistent with the prior, which is of high practical value.

REVIVE SDK supports introducing expert constraints in the form of functions. A constraint function is constructed in the same way as a reward function: it returns a reward (penalty) depending on whether the environment transition violates the constraint.

Below, we introduce an expert constraint for the refrigerator example. First, we define the constraint function, which takes the same form as a reward function:

import torch
from copy import deepcopy
from typing import Dict

# Whether to normalize the calculated reward value; defaults to False
normalize = False
# Weight applied to the calculated reward value; adjust it here as needed
weight = 1.0

# matching_nodes restricts the rule to the listed nodes; it may also be left unconfigured
matching_nodes = ["temperature", "action", "next_temperature"]

# The function must be named get_reward
def get_reward(data: Dict[str, torch.Tensor], graph) -> torch.Tensor:
    # Copy the original data
    noise_data = deepcopy(data)
    # Add random noise to the action node of the original data
    noise_data["action"] += torch.randn_like(noise_data["action"]) * 0.1

    # Compute the output of the next_temperature node from the noisy data
    node_name = "next_temperature"
    if graph.get_node(node_name).node_type == 'network':
        # Network node: take the mode of the predicted distribution
        node_output = graph.compute_node(node_name, noise_data).mode
    else:
        # Function node: call the function directly on the noisy data
        node_output = graph.compute_node(node_name, noise_data)

    # Correlate the perturbation applied to the action node with the resulting
    # change in the next_temperature node, summed over the feature dimension
    # so that the result has a last dimension of 1
    correlation = ((noise_data["action"] - data["action"]) * (node_output - data[node_name])).sum(dim=-1, keepdim=True)

    # A positive correlation violates the expert prior and is penalized with -0.2; otherwise the reward is 0
    reward = torch.where(correlation > 0, -0.2 * torch.ones_like(correlation), torch.zeros_like(correlation))

    return reward
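To see the constraint function in action outside of training, it can be exercised on a random batch. The `_StubGraph` and `_StubNode` classes below are illustrative stand-ins for the graph object that REVIVE passes to `get_reward` at training time; they are not part of the SDK. The stub transition lowers temperature as power rises, so the expert prior holds by construction and the constraint gives no penalty:

```python
import torch
from copy import deepcopy
from typing import Dict

# Hypothetical stand-ins for the graph interface provided by REVIVE at training time
class _StubNode:
    node_type = "function"

class _StubGraph:
    def get_node(self, name):
        return _StubNode()

    def compute_node(self, name, data):
        # Toy transition: raising power (action) lowers the temperature,
        # so the negative-correlation prior holds by construction
        return data["temperature"] - 0.5 * data["action"]

def get_reward(data: Dict[str, torch.Tensor], graph) -> torch.Tensor:
    noise_data = deepcopy(data)
    noise_data["action"] += torch.randn_like(noise_data["action"]) * 0.1

    node_name = "next_temperature"
    if graph.get_node(node_name).node_type == 'network':
        node_output = graph.compute_node(node_name, noise_data).mode
    else:
        node_output = graph.compute_node(node_name, noise_data)

    correlation = ((noise_data["action"] - data["action"])
                   * (node_output - data[node_name])).sum(dim=-1, keepdim=True)
    return torch.where(correlation > 0,
                       -0.2 * torch.ones_like(correlation),
                       torch.zeros_like(correlation))

batch = 8
temperature = torch.randn(batch, 1)
action = torch.randn(batch, 1)
data = {
    "temperature": temperature,
    "action": action,
    "next_temperature": temperature - 0.5 * action,  # consistent with the stub
}
reward = get_reward(data, _StubGraph())
print(reward.shape)  # torch.Size([8, 1])
```

Because the stub dynamics always respect the prior, every element of `reward` is 0 here; a model whose response to a power increase moved the temperature the same way would instead receive -0.2.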

Note that when the expert constraint function processes data, multiple samples are usually organized into batches and handled in a single operation for efficiency. The function must therefore work with tensors that carry arbitrary leading (batch) dimensions. By convention, the last dimension is the feature dimension, so individual features should be extracted with slicing (`[..., n:m]`), which preserves the tensor's dimensionality. The returned reward must be a PyTorch tensor whose batch dimensions match the input data and whose last (feature) dimension is 1.
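These shape conventions can be illustrated with a minimal sketch; the three-feature observation below is hypothetical, with feature 0 assumed to be the temperature:

```python
import torch

# A hypothetical batch: 32 samples, each with 3 features (feature 0 = temperature)
batch = torch.randn(32, 3)

# Slicing with [..., n:m] keeps the feature dimension: shape (32, 1), not (32,)
temperature = batch[..., 0:1]

# The returned reward keeps the batch dimension and has a feature dimension of 1
reward = torch.where(temperature > 0.0,
                     torch.full_like(temperature, -0.2),
                     torch.zeros_like(temperature))
print(reward.shape)  # torch.Size([32, 1])
```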

The expert constraint function is passed to environment-model training through the -mrf parameter:

python train.py -df test_data.npz -cf test.yaml -mrf data/test_rule.py -rf test_reward.py -vm once -pm once --run_id once

See the README file in the refrigerator example for runnable examples.