Neural Network Disturber

The Neural Network Disturber is an optional feature for policy training. Its core idea is to use the outputs of multiple neural networks to enrich the environment samples used in policy training. Mixing these networks' outputs into the transitions produced by the environment model increases the diversity of the states the policy observes, which improves the robustness and generalization of the learned policy.

As shown in the figure below, before policy training begins, REVIVE randomly initializes n neural networks (n is set by disturbing_net_num) and combines their outputs with those of the virtual environment model (venv) to assist policy learning.

Figure: Neural Network Disturber workflow
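The mechanism can be sketched in plain Python. This is a minimal illustration under stated assumptions, not REVIVE's implementation: the class names RandomMLP and NeuralNetworkDisturber are hypothetical, each network is assumed to be a one-hidden-layer tanh MLP with Gaussian initialization, one network is assumed to be picked at random on every call, and the combination is assumed to be the convex blend (1 - w) * venv_output + w * network_output, with w playing the role of disturbing_weight. The network weights are created once and never updated.

```python
import math
import random
from typing import List


class RandomMLP:
    """Hypothetical one-hidden-layer MLP with random, frozen weights (never trained)."""

    def __init__(self, in_dim: int, out_dim: int, hidden: int, rng: random.Random):
        # Assumption: Gaussian init scaled by fan-in; weights stay fixed afterwards.
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(out_dim)]

    def __call__(self, x: List[float]) -> List[float]:
        # tanh hidden layer followed by a linear output layer.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]


class NeuralNetworkDisturber:
    """Hypothetical pool of frozen random networks blended into venv outputs."""

    def __init__(self, in_dim: int, out_dim: int, net_num: int = 100,
                 weight: float = 0.05, hidden: int = 32, seed: int = 0):
        self.rng = random.Random(seed)
        # net_num mirrors disturbing_net_num; the pool is built once, before training.
        self.nets = [RandomMLP(in_dim, out_dim, hidden, self.rng) for _ in range(net_num)]
        self.weight = weight  # mirrors disturbing_weight (assumed blend coefficient)

    def blend(self, x: List[float], venv_out: List[float]) -> List[float]:
        # Assumption: a fresh random network is chosen for each call.
        disturbance = self.rng.choice(self.nets)(x)
        w = self.weight
        return [(1.0 - w) * v + w * d for v, d in zip(venv_out, disturbance)]
```

For example, NeuralNetworkDisturber(in_dim=3, out_dim=2).blend([0.1, -0.2, 0.3], [1.0, -1.0]) returns the venv prediction [1.0, -1.0] moved 5% of the way toward one random network's output. Picking a new network per call rather than per rollout is a design guess; either way, diversity is injected without training any extra parameters.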

The advantage of this method is that the neural networks are initialized once and their weights are never updated, so the extra computational cost and training time are small. Furthermore, because the disturbances come from many different networks, the method mitigates the biases and local optima that can arise when the policy is trained against a single model, leading to more comprehensive policy learning.

Consider the Hopper environment with random friction coefficients: offline data is collected with the friction coefficient fixed at 2, and the policy is validated in environments whose friction coefficient is randomly drawn from [1.5, 2.5]. As illustrated in the figure below, enabling the Neural Network Disturber improves the final policy performance in this setting.

Applying Neural Network Disturber

To enable this feature, set the following parameters in config.json, under the "ppo" or "sac" entry of "policy_algo_config" depending on which policy algorithm you use:

{
 ...
 "policy_algo_config": {
     "ppo"/"sac": [
         ...
         {
             "name": "disturbing_transition_function",
             "description": "Disturbing the network node in policy learning",
             "type": "bool",
             "default": false
         },
         {
             "name": "disturbing_nodes",
             "description": "Disturbing the network node in policy learning",
             "type": "list",
             "default": []
         },
         {
             "name": "disturbing_net_num",
             "description": "Disturbing the network node in policy learning",
             "type": "int",
             "default": 100
         },
         {
             "name": "disturbing_weight",
             "description": "Disturbing the network node in policy learning",
             "type": "float",
             "default": 0.05
         }
         ...
     ],
     ...
 }
}

"disturbing_transition_function": whether to enable the Neural Network Disturber (default false).

"disturbing_nodes": the nodes whose outputs the disturber perturbs during policy learning; the default, [], disturbs all nodes.

"disturbing_net_num": the number of randomly initialized neural networks (default 100).

"disturbing_weight": how strongly the disturbing networks' outputs are mixed into the virtual environment (venv) outputs; larger values give the disturbing networks more influence (default 0.05).
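If you prefer to toggle the feature from a script, the sketch below sets the four parameters in an already-parsed config.json. It assumes, as in the snippet above, that "policy_algo_config" maps an algorithm name ("ppo" or "sac") to a list of parameter dicts whose "default" field holds the value. The helper names enable_disturber and update_config_file are hypothetical, missing entries are appended using the schema shown above, and the file on disk must be valid JSON (without the ... placeholders used in the snippet).

```python
import json
from typing import Any, Dict, List, Optional


def enable_disturber(config: Dict[str, Any], algo: str = "ppo",
                     nodes: Optional[List[str]] = None, net_num: int = 100,
                     weight: float = 0.05) -> Dict[str, Any]:
    """Hypothetical helper: set the disturber defaults in config["policy_algo_config"][algo]."""
    values = {
        "disturbing_transition_function": True,
        "disturbing_nodes": list(nodes or []),  # [] means disturb all nodes
        "disturbing_net_num": net_num,
        "disturbing_weight": weight,
    }
    types = {"disturbing_transition_function": "bool", "disturbing_nodes": "list",
             "disturbing_net_num": "int", "disturbing_weight": "float"}
    params = config.setdefault("policy_algo_config", {}).setdefault(algo, [])
    for name, value in values.items():
        entry = next((p for p in params if p.get("name") == name), None)
        if entry is None:
            # Assumption: append a new entry with the same fields as the snippet above.
            entry = {"name": name,
                     "description": "Disturbing the network node in policy learning",
                     "type": types[name]}
            params.append(entry)
        entry["default"] = value
    return config


def update_config_file(path: str, **kwargs: Any) -> None:
    """Hypothetical helper: load config.json, enable the disturber, write it back."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    enable_disturber(config, **kwargs)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
```

For example, update_config_file("config.json", algo="sac", weight=0.1) would enable the disturber for SAC with 100 networks and a mixing weight of 0.1, leaving every other parameter untouched.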