Introduction to Multi-Discriminator

Sometimes, in a decision flow, we want different nodes to be optimized independently, each based on the scores given by a different discriminator. The multi-discriminator function supports this.

For example, in the Controlling Mujoco-HalfCheetah using REVIVE SDK task, we want the action, delta_x and next_obs nodes to be scored by different discriminators, so that each node optimizes its own objective independently. The rest of this section uses that example to show how to enable the function during training.

To enable this function, add the following configuration to config.json:

{
   ...
   "venv_algo_config": {
       "revive_p": [
           {
               "name": "matching_nodes",
               "type": "list",
               "default": [["obs", "action"], ["obs", "action", "delta_x"], ["obs", "action", "next_obs"]]
           },
           {
               "name": "matching_fit_nodes",
               "type": "list",
               "default": [["action"], ["delta_x"], ["next_obs"]]
           },
           ...
       ],
       ...
   },
   ...
}

“matching_nodes” lists the input nodes of each discriminator. “matching_fit_nodes” lists the nodes that each discriminator scores, which are also the nodes updated by the generator corresponding to that discriminator.

Note that “matching_nodes” and “matching_fit_nodes” must have the same length and their entries must be in the same order, since the entry at a given position in one list is paired with the entry at the same position in the other. In the example above, [“obs”, “action”] is the input to the discriminator of the action node, and this discriminator is responsible only for the optimization objective of the action node. [“obs”, “action”, “delta_x”] is the input to the discriminator of the delta_x node, and this discriminator is responsible only for the optimization objective of the delta_x node. [“obs”, “action”, “next_obs”] is the input to the discriminator of the next_obs node, and this discriminator is responsible only for the optimization objective of the next_obs node.
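
The pairing rule can be checked before training with a small script. The following is a minimal sketch, not part of the REVIVE SDK: the helper name check_discriminator_config is hypothetical, and the second check (that every scored node also appears in its discriminator's input) is an assumption drawn from the example above rather than a documented requirement.

```python
def check_discriminator_config(matching_nodes, matching_fit_nodes):
    """Return a list of problems found; an empty list means the two lists are consistent.

    Hypothetical helper for illustration only; it is not a REVIVE SDK function.
    """
    problems = []

    # The two lists must have the same length so entries can be paired by position.
    if len(matching_nodes) != len(matching_fit_nodes):
        problems.append(
            f"length mismatch: {len(matching_nodes)} discriminator inputs, "
            f"{len(matching_fit_nodes)} fit-node entries"
        )
        return problems

    # Assumption taken from the example: each scored node is part of the
    # input of the discriminator it is paired with.
    for index, (inputs, fit_nodes) in enumerate(zip(matching_nodes, matching_fit_nodes)):
        missing = [node for node in fit_nodes if node not in inputs]
        if missing:
            problems.append(f"entry {index}: {missing} not in discriminator input {inputs}")

    return problems


# Values copied from the config.json example above.
matching_nodes = [["obs", "action"], ["obs", "action", "delta_x"], ["obs", "action", "next_obs"]]
matching_fit_nodes = [["action"], ["delta_x"], ["next_obs"]]

problems = check_discriminator_config(matching_nodes, matching_fit_nodes)
if problems:
    for problem in problems:
        print("config problem:", problem)
else:
    # Show which discriminator input is paired with which scored nodes.
    for inputs, fit_nodes in zip(matching_nodes, matching_fit_nodes):
        print(f"discriminator input {inputs} scores {fit_nodes}")
```

Running the script on the example values prints one pairing per discriminator. Swapping two entries in only one of the lists makes the positional check report the mismatched entries.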

Note

By default, a decision flow graph has only one discriminator.