Training Control Policies with Multiple Nodes

Many control tasks require several nodes to be controlled jointly to reach a goal. The Revive SDK supports training a control policy that spans multiple nodes, and enabling this takes only a small amount of configuration.

For example, in autonomous driving tasks, the steering direction and the power output of the vehicle must be controlled at the same time. Both affect how the vehicle moves, so they need to be decided together rather than independently.

The following example demonstrates how to enable this feature during training:

metadata:

   graph:
     action_1:
     - observation
     action_2:
     - action_1
     - observation
     next_observation:
     - action_1
     - action_2
     - observation

   columns:
   ...

The decision graph above describes a multi-node control task in which the action_1 and action_2 nodes work together: action_2 takes action_1 as an input, and both feed into next_observation. To train both policy nodes, pass their names to the -tpn parameter as a comma-separated list.
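To make the dependency structure concrete, the following minimal sketch derives an evaluation order from the graph above, using only the Python standard library. It does not call the Revive SDK. The dictionary is a hand-copied version of the graph section, and the evaluation_order helper is a hypothetical name introduced for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hand-copied from the `graph` section above: each node maps to its inputs.
graph = {
    "action_1": ["observation"],
    "action_2": ["action_1", "observation"],
    "next_observation": ["action_1", "action_2", "observation"],
}


def evaluation_order(graph):
    """Hypothetical helper: list nodes so that each one follows its inputs."""
    return list(TopologicalSorter(graph).static_order())


order = evaluation_order(graph)
print(order)
# ['observation', 'action_1', 'action_2', 'next_observation']
```

Because action_2 depends on action_1, the two policy nodes cannot be evaluated in arbitrary order: action_1 must be computed first, and both are needed before next_observation.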

Training Command:

python train.py -df test.npz -cf test.yaml -rf test_reward.py -vm once -pm once -tpn action_1,action_2 --run_id test
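When the training command is launched from a script, the -tpn value can be assembled from a list of node names. The sketch below is one way to do that, under stated assumptions: build_train_command is a hypothetical helper, the file names and flag values are copied unchanged from the command above, and checking each name against the graph is this sketch's own precaution rather than documented SDK behavior.

```python
import shlex

# Hand-copied from the `graph` section of the YAML configuration.
graph = {
    "action_1": ["observation"],
    "action_2": ["action_1", "observation"],
    "next_observation": ["action_1", "action_2", "observation"],
}


def build_train_command(policy_nodes, graph):
    """Hypothetical helper: build the argument list for the training command."""
    # Precaution added by this sketch: reject names that are not graph nodes.
    unknown = [name for name in policy_nodes if name not in graph]
    if unknown:
        raise ValueError(f"nodes not defined in graph: {unknown}")
    return [
        "python", "train.py",
        "-df", "test.npz",
        "-cf", "test.yaml",
        "-rf", "test_reward.py",
        "-vm", "once",
        "-pm", "once",
        "-tpn", ",".join(policy_nodes),  # comma-separated, no spaces
        "--run_id", "test",
    ]


cmd = build_train_command(["action_1", "action_2"], graph)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the directory that contains train.py.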