revive.conf¶

base_config¶

global_seed¶

Set the random number seed for the experiment.

type: int
abbreviation: gs
default: 42
name: global_seed

val_split_ratio¶

Fraction of the data to split off as the validation dataset when one is not explicitly given.

type: float
abbreviation: vsr
default: 0.5
name: val_split_ratio

val_split_mode¶

Mode for automatically splitting the data into training and validation datasets. Choose from outside_traj and inside_traj. With outside_traj, the split happens between trajectories, so each trajectory belongs to exactly one dataset. With inside_traj, the split happens within each trajectory: the earlier part goes to the training set and the later part to the validation set.

type: str
abbreviation: vsm
default: outside_traj
name: val_split_mode
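The two split modes can be illustrated with a minimal sketch. This is not the library's implementation: the function name `split_dataset`, the list-of-lists data layout, the absence of shuffling, and the reading of val_split_ratio as the validation fraction are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Trajectory = List[int]  # stand-in for the ordered steps of one trajectory


def split_dataset(
    trajectories: Sequence[Trajectory],
    val_split_ratio: float = 0.5,
    val_split_mode: str = "outside_traj",
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Return (train, validation) under the two documented split modes."""
    if val_split_mode == "outside_traj":
        # Whole trajectories are assigned to exactly one dataset.
        n_val = int(round(len(trajectories) * val_split_ratio))
        n_train = len(trajectories) - n_val
        return list(trajectories[:n_train]), list(trajectories[n_train:])
    if val_split_mode == "inside_traj":
        # Each trajectory is cut once: earlier steps train, later steps validate.
        train, val = [], []
        for traj in trajectories:
            cut = len(traj) - int(round(len(traj) * val_split_ratio))
            train.append(list(traj[:cut]))
            val.append(list(traj[cut:]))
        return train, val
    raise ValueError(f"unknown val_split_mode: {val_split_mode!r}")


trajs = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
train_out, val_out = split_dataset(trajs, 0.5, "outside_traj")
train_in, val_in = split_dataset(trajs, 0.5, "inside_traj")
```

With outside_traj the validation set contains trajectories never seen in training; with inside_traj every trajectory contributes to both sets, and validation covers only the later time steps.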

ignore_check¶

Flag to skip data-related checks and force training.

type: bool
abbreviation: igc
default: False
name: ignore_check

venv_rollout_horizon¶

Length of the sampled trajectory. Valid only if the algorithm works on sequential data.

type: int
abbreviation: vrh
default: 100
name: venv_rollout_horizon

venv_gpus_per_worker¶

Number of GPUs per worker in venv training. A value smaller than 1 launches multiple workers on the same GPU.

type: float
abbreviation: vgpw
default: 1.0
name: venv_gpus_per_worker
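As a rough illustration of the fractional setting, the sketch below computes how many workers could share one GPU for a given value. The helper `workers_per_gpu` is hypothetical, and the actual scheduling is left to whatever resource manager the library uses.

```python
import math


def workers_per_gpu(gpus_per_worker: float) -> int:
    """How many workers fit on a single GPU for a given per-worker request."""
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    if gpus_per_worker >= 1:
        return 1  # each worker takes at least one whole GPU
    # The small epsilon guards against floating-point error, e.g. 1 / 0.1.
    return math.floor(1.0 / gpus_per_worker + 1e-9)


shared = {g: workers_per_gpu(g) for g in (1.0, 0.5, 0.25, 0.1)}
```

For example, a value of 0.25 would allow up to four workers on each GPU, provided the models fit in memory together.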

venv_metric¶

Metric used to evaluate the trained venv. Choose from nll, mae, mse, and wdist.

type: str
default: mae
name: venv_metric

venv_algo¶

Algorithm used in venv training. Choose from bc and revive_p.

type: str
default: revive_p
name: venv_algo

rollout_plt_frequency¶

Number of steps between two successive plots of rollout data. 0 disables plotting.

type: int
abbreviation: rpf
default: 50
name: rollout_plt_frequency
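The frequency semantics can be sketched as a simple check. The helper `should_plot` and the convention that steps are counted from 1 are assumptions for illustration only.

```python
def should_plot(step: int, rollout_plt_frequency: int = 50) -> bool:
    """True when rollout data would be plotted at this step."""
    if rollout_plt_frequency <= 0:
        return False  # 0 disables plotting
    return step % rollout_plt_frequency == 0


# Steps (counted from 1) at which a plot would be produced in 200 steps.
plotted = [s for s in range(1, 201) if should_plot(s, 50)]
disabled = [s for s in range(1, 201) if should_plot(s, 0)]
```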

rollout_dataset_mode¶

Select the dataset used for rollouts. Supported values are train and validate.

type: str
default: validate
name: rollout_dataset_mode

policy_gpus_per_worker¶

Number of GPUs per worker in policy training. A value smaller than 1 launches multiple workers on the same GPU.

type: float
abbreviation: pgpw
default: 1.0
name: policy_gpus_per_worker

behavioral_policy_init¶

Whether to use the learned behavioral policy as the initialization for policy training.

type: bool
abbreviation: bpi
default: True
name: behavioral_policy_init

policy_algo¶

Algorithm used in policy training. There are currently two algorithms to choose from, ppo and sac.

type: str
default: ppo
name: policy_algo

test_horizon¶

Rollout length of the venv test.

type: int
abbreviation: th
default: 100
name: test_horizon

train_venv_trials¶

Total number of trials searched by the search algorithm in venv training.

type: int
abbreviation: tvt
default: 25
name: train_venv_trials

train_policy_trials¶

Total number of trials searched by the search algorithm in policy training.

type: int
abbreviation: tpt
default: 10
name: train_policy_trials
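Each entry above carries a name, a type, a default, and often an abbreviation. The sketch below shows one way such fields could map onto command-line overrides with Python's standard `argparse`. The record layout in `ENTRIES`, the helper names, and the flag spelling (`--name` and `-abbreviation`) are assumptions for illustration, not the library's documented interface.

```python
import argparse

# Hypothetical records mirroring a few base_config entries listed above.
ENTRIES = [
    {"name": "global_seed", "abbreviation": "gs", "type": int, "default": 42},
    {"name": "val_split_ratio", "abbreviation": "vsr", "type": float, "default": 0.5},
    {"name": "ignore_check", "abbreviation": "igc", "type": bool, "default": False},
    {"name": "venv_metric", "type": str, "default": "mae"},  # no abbreviation
]


def parse_bool(text: str) -> bool:
    """Parse a boolean from text; argparse's type=bool treats any non-empty string as True."""
    lowered = text.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"not a boolean: {text!r}")


def build_parser(entries) -> argparse.ArgumentParser:
    """Create one option per entry, with the abbreviation as a short alias."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for entry in entries:
        flags = [f"--{entry['name']}"]
        if "abbreviation" in entry:
            flags.append(f"-{entry['abbreviation']}")
        caster = parse_bool if entry["type"] is bool else entry["type"]
        parser.add_argument(
            *flags, dest=entry["name"], type=caster, default=entry["default"]
        )
    return parser


# Override two values; the rest keep their documented defaults.
args = build_parser(ENTRIES).parse_args(["-gs", "7", "--ignore_check", "true"])
```

Entries that list no abbreviation, such as venv_metric, are reachable only by their full name in this sketch.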

venv_algo_config¶

revive_p¶

revive_batch_size¶

Batch size of training process.

type: int
abbreviation: mbs
default: 1024
name: revive_batch_size

revive_epoch¶

Number of epochs for the training process.

type: int
abbreviation: mep
default: 5000
name: revive_epoch

fintune¶

type: int
abbreviation: bet
default: 1
name: fintune

finetune_fre¶

type: int
abbreviation: betfre
default: 1
name: finetune_fre

policy_hidden_features¶

Number of neurons per layer of the policy network.

type: int
abbreviation: phf
default: 256
name: policy_hidden_features

policy_hidden_layers¶

Depth of policy network.

type: int
abbreviation: phl
default: 4
name: policy_hidden_layers

policy_backbone¶

Backbone of policy network.

type: str
abbreviation: pb
default: res
name: policy_backbone

transition_hidden_features¶

Number of neurons per layer of the transition network.

type: int
abbreviation: thf
default: 256
name: transition_hidden_features

transition_hidden_layers¶

Depth of the transition network.

type: int
abbreviation: thl
default: 4
name: transition_hidden_layers

transition_backbone¶

Backbone of the transition network.

type: str
abbreviation: tb
default: res
name: transition_backbone

matcher_hidden_features¶

Number of neurons per layer of the matcher network.

type: int
abbreviation: dhf
default: 256
name: matcher_hidden_features

matcher_hidden_layers¶

Depth of the matcher network.

type: int
abbreviation: dhl
default: 4
name: matcher_hidden_layers

g_steps¶

The number of update rounds of the generator in each epoch.

type: int
default: 1
name: g_steps
search_mode: grid
search_values: 1, 3, 5

d_steps¶

Number of update rounds of matcher in each epoch.

type: int
default: 1
name: d_steps
search_mode: grid
search_values: 1, 3, 5

g_lr¶

Initial learning rate of the generator.

type: float
default: 4e-05
name: g_lr
search_mode: continuous
search_values: 1e-06, 0.0001

d_lr¶

Initial learning rate of the matcher.

type: float
default: 0.0006
name: d_lr
search_mode: continuous
search_values: 1e-06, 0.001
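The search_mode and search_values fields above describe a hyperparameter search space. The sketch below shows one plausible reading: grid enumerates the listed values, and continuous treats the two values as lower and upper bounds. The record layout in `SPACE`, the helper names, and the log-uniform sampling are assumptions, since the search algorithm and sampling distribution are not specified here.

```python
import math
import random
from itertools import product

# Hypothetical records mirroring the revive_p search entries above.
SPACE = {
    "g_steps": {"search_mode": "grid", "search_values": [1, 3, 5]},
    "d_steps": {"search_mode": "grid", "search_values": [1, 3, 5]},
    "g_lr": {"search_mode": "continuous", "search_values": [1e-06, 0.0001]},
    "d_lr": {"search_mode": "continuous", "search_values": [1e-06, 0.001]},
}


def grid_points(space):
    """Yield the Cartesian product over every grid-searched parameter."""
    names = [n for n, s in space.items() if s["search_mode"] == "grid"]
    for combo in product(*(space[n]["search_values"] for n in names)):
        yield dict(zip(names, combo))


def sample_continuous(space, rng):
    """Draw each continuous parameter log-uniformly between its bounds (assumed)."""
    out = {}
    for name, spec in space.items():
        if spec["search_mode"] == "continuous":
            low, high = spec["search_values"]
            out[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return out


rng = random.Random(42)
trials = [{**point, **sample_continuous(SPACE, rng)} for point in grid_points(SPACE)]
```

Under this reading, the two grid parameters alone produce nine combinations, and each combination is paired with freshly sampled learning rates.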

bc¶

bc_batch_size¶

Batch size of training process.

type: int
abbreviation: bbs
default: 256
name: bc_batch_size

bc_epoch¶

Number of epochs for the training process.

type: int
abbreviation: bep
default: 500
name: bc_epoch

policy_hidden_features¶

Number of neurons per layer of the policy network.

type: int
abbreviation: phf
default: 256
name: policy_hidden_features

policy_hidden_layers¶

Depth of policy network.

type: int
abbreviation: phl
default: 4
name: policy_hidden_layers
search_mode: grid
search_values: 3, 4, 5

policy_backbone¶

Backbone of policy network.

type: str
abbreviation: pb
default: res
name: policy_backbone
search_mode: grid
search_values: mlp, res

g_lr¶

Initial learning rate of the training process.

type: float
default: 0.0001
name: g_lr
search_mode: continuous
search_values: 1e-06, 0.001

loss_type¶

Loss function used by bc. Choose from log_prob, mae, and mse.

name: loss_type
default: log_prob
type: str

policy_algo_config¶

ppo¶

ppo_batch_size¶

Batch size of training process.

type: int
abbreviation: pbs
default: 256
name: ppo_batch_size

ppo_epoch¶

Number of epochs for the training process.

type: int
abbreviation: bep
default: 200
name: ppo_epoch

ppo_rollout_horizon¶

Rollout length used in policy training.

type: int
abbreviation: prh
default: 100
name: ppo_rollout_horizon

policy_hidden_features¶

Number of neurons per layer of the policy network.

type: int
abbreviation: phf
default: 256
name: policy_hidden_features

policy_hidden_layers¶

Depth of policy network.

type: int
abbreviation: phl
default: 4
name: policy_hidden_layers

policy_backbone¶

Backbone of policy network.

type: str
abbreviation: pb
default: mlp
name: policy_backbone

g_lr¶

Initial learning rate of the training process.

type: float
default: 4e-05
name: g_lr
search_mode: continuous
search_values: 1e-06, 0.001