revive.conf

base_config

global_seed

Set the random number seed for the experiment.

type:: int
abbreviation:: gs
default:: 42
name:: global_seed

val_split_ratio

Ratio used to split off the validation dataset when it is not explicitly given.

type:: float
abbreviation:: vsr
default:: 0.5
name:: val_split_ratio

val_split_mode

Mode for automatically splitting the training and validation datasets; choose from outside_traj and inside_traj. outside_traj splits between trajectories, so each trajectory belongs to exactly one dataset. inside_traj splits within each trajectory: the former part of a trajectory goes into the training set and the latter part into the validation set.

type:: str
abbreviation:: vsm
default:: outside_traj
name:: val_split_mode
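
The two split modes can be illustrated with a minimal Python sketch. It assumes that val_split_ratio is the fraction of data assigned to the validation set (with the default of 0.5 both readings coincide). It also represents each trajectory as a plain list of steps and does not shuffle. The function split_dataset is hypothetical and is not part of the REVIVE API.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one trajectory is a list of per-step records.
Trajectory = List[object]


def split_dataset(
    trajectories: Sequence[Trajectory],
    val_split_ratio: float = 0.5,
    val_split_mode: str = "outside_traj",
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Return (train, val); val_split_ratio is assumed to be the validation fraction."""
    if val_split_mode == "outside_traj":
        # Whole trajectories go to exactly one split.
        n_val = int(len(trajectories) * val_split_ratio)
        n_train = len(trajectories) - n_val
        return list(trajectories[:n_train]), list(trajectories[n_train:])
    if val_split_mode == "inside_traj":
        # Every trajectory is cut in two: earlier steps for training, later steps for validation.
        train, val = [], []
        for traj in trajectories:
            cut = len(traj) - int(len(traj) * val_split_ratio)
            train.append(list(traj[:cut]))
            val.append(list(traj[cut:]))
        return train, val
    raise ValueError(f"unknown val_split_mode: {val_split_mode!r}")
```

With outside_traj, a trajectory never appears in both sets. With inside_traj, every trajectory contributes to both, so validation measures extrapolation along time rather than across trajectories.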

ignore_check

Flag to ignore data-related checks and force training.

type:: bool
abbreviation:: igc
default:: False
name:: ignore_check

data_workers

Number of data loader workers. A larger value can speed up data loading, at the cost of higher resource consumption.

type:: int
abbreviation:: dw
default:: 2
name:: data_workers

use_time_step_embed

Flag to use a positional embedding for the time step.

type:: bool
abbreviation:: utse
default:: True
name:: use_time_step_embed

time_step_embed_size

Embedding size of the positional embedding for the time step.

type:: int
abbreviation:: tses
default:: 64
name:: time_step_embed_size
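
This reference does not say which positional embedding is used for the time step. As an illustration only, the sketch below assumes the sinusoidal embedding from the Transformer architecture. It produces a vector of length time_step_embed_size for an integer step t. The function name is hypothetical.

```python
import math
from typing import List


def time_step_embedding(t: int, size: int = 64) -> List[float]:
    """Sinusoidal embedding of time step t (assumed form): size//2 sines, then size//2 cosines."""
    if size % 2 != 0:
        raise ValueError("size must be even")
    half = size // 2
    # Geometric sequence of frequencies from 1 toward 1/10000.
    freqs = [1.0 / (10000 ** (i / half)) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]
```

Each time step maps to a bounded vector whose low-frequency components change slowly over a long horizon. This is the usual reason for preferring such an embedding over feeding the raw step index.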

use_traj_id_embed

Flag to use a binary embedding for the trajectory ID.

type:: bool
abbreviation:: utie
default:: True
name:: use_traj_id_embed
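
A binary embedding writes the trajectory ID in base 2, so N trajectories need only about log2(N) input features. The sketch below assumes a fixed width derived from the number of trajectories and most-significant-bit-first ordering. Both choices and the function name are hypothetical rather than taken from REVIVE.

```python
from typing import List


def traj_id_binary_embedding(traj_id: int, num_trajs: int) -> List[int]:
    """Encode traj_id as a fixed-width list of bits, most significant bit first (assumed)."""
    if not 0 <= traj_id < num_trajs:
        raise ValueError("traj_id out of range")
    # Enough bits to represent the largest ID, num_trajs - 1 (at least one bit).
    width = max(1, (num_trajs - 1).bit_length())
    return [(traj_id >> shift) & 1 for shift in range(width - 1, -1, -1)]
```

For example, with 10 trajectories every ID is encoded in 4 bits, and ID 5 becomes [0, 1, 0, 1].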

pre_horzion

Number of steps of data in the configured trajectory that are used for preprocessing operations.

type:: int
abbreviation:: ph
default:: 0
name:: pre_horzion

venv_rollout_horizon

Length of the sampled trajectories; takes effect only if the algorithm works on sequential data.

type:: int
abbreviation:: vrh
default:: 100
name:: venv_rollout_horizon

venv_gpus_per_worker

Number of GPUs per worker in venv training; a value smaller than 1 means multiple workers are launched on the same GPU.

type:: float
abbreviation:: vgpw
default:: 1.0
name:: venv_gpus_per_worker

venv_train_dataset_mode

Dataset mode for venv training; can be set to trajectory mode or transition mode.

type:: str
abbreviation:: vtdm
default:: transition
name:: venv_train_dataset_mode
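
The difference between the two dataset modes can be sketched as follows. The sketch assumes that transition mode treats every step as an independent sample, while trajectory mode keeps consecutive steps together in segments of at most venv_rollout_horizon steps. This is an illustrative assumption about the semantics, and the function name is hypothetical.

```python
from typing import List, Sequence


def make_samples(
    trajectories: Sequence[Sequence[object]],
    mode: str = "transition",
    horizon: int = 100,
) -> List[object]:
    """Build training samples from trajectories (illustrative semantics only)."""
    if mode == "transition":
        # Each step becomes a standalone sample; temporal order is no longer kept.
        return [step for traj in trajectories for step in traj]
    if mode == "trajectory":
        # Consecutive steps stay together in segments of at most `horizon` steps.
        return [
            list(traj[start:start + horizon])
            for traj in trajectories
            for start in range(0, len(traj), horizon)
        ]
    raise ValueError(f"unknown mode: {mode!r}")
```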

venv_metric

Metric used to evaluate the trained venv, choose from nll, mae, mse, wdist.

type:: str
default:: mae
name:: venv_metric
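
The four metric names correspond to standard quantities. The sketch below computes them on flat lists of numbers. It assumes a Gaussian model for nll and, for wdist, the one-dimensional empirical Wasserstein-1 distance between equally sized samples. How REVIVE aggregates these metrics across nodes and dimensions is not stated here, so treat this sketch as illustrative.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def gaussian_nll(target: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Average negative log-likelihood of target under independent Gaussians (assumed model)."""
    total = 0.0
    for x, mu, sigma in zip(target, mean, std):
        total += 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)
    return total / len(target)


def wdist_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equally sized 1-D samples: compare sorted values."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

Note that wdist compares distributions, not paired predictions. In the test, two samples containing the same values in a different order have distance 0 even though their mae is not 0.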

venv_algo

Algorithm used in venv training. There are currently three algorithms to choose from: bc, revive_p, and revive_f.

type:: str
default:: revive_p
name:: venv_algo

rollout_plt_frequency

Number of steps between two plots of rollout data; 0 disables plotting.

type:: int
abbreviation:: rpf
default:: 50
name:: rollout_plt_frequency

venv_save_frequency

Interval, in epochs, at which the model is saved periodically; 0 disables periodic saving.

type:: int
abbreviation:: vsp
default:: 0
name:: venv_save_frequency
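
rollout_plt_frequency and venv_save_frequency both describe periodic actions in which 0 switches the action off. Below is a minimal sketch of such a check. It assumes 1-based epoch or step counters, and the helper is_due is hypothetical.

```python
def is_due(counter: int, frequency: int) -> bool:
    """Return True when a periodic action should run at this 1-based counter value.

    A frequency of 0 disables the action entirely.
    """
    return frequency > 0 and counter % frequency == 0


# With venv_save_frequency = 3, saving would happen at epochs 3, 6 and 9 out of 10.
save_epochs = [epoch for epoch in range(1, 11) if is_due(epoch, 3)]
```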

plt_response_curve

Whether to plot the response curve at the end of venv training.

type:: bool
abbreviation:: prc
default:: False
name:: plt_response_curve

rollout_dataset_mode

Dataset used for rollouts; supports train and validate.

type:: str
default:: validate
name:: rollout_dataset_mode

venv_val_freq

Interval, in epochs, at which the model is evaluated on the validation dataset.

type:: int
abbreviation:: vvf
default:: 1
name:: venv_val_freq

policy_gpus_per_worker

Number of GPUs per worker in policy training; a value smaller than 1 means multiple workers are launched on the same GPU.

type:: float
abbreviation:: pgpw
default:: 1.0
name:: policy_gpus_per_worker

behavioral_policy_init

Whether to use the learned behavioral policy as the initial policy for policy training.

type:: bool
abbreviation:: bpi
default:: True
name:: behavioral_policy_init

policy_algo

Algorithm used in policy training. There are currently two algorithms to choose from, ppo and sac.

type:: str
default:: ppo
name:: policy_algo

test_horizon

Rollout length of the venv test.

type:: int
abbreviation:: th
default:: 100
name:: test_horizon

workers_per_trial

Number of workers per trial; set it greater than 1 only if the GPUs-per-worker settings are all 1.0.

type:: int
abbreviation:: wpt
default:: 1
name:: workers_per_trial
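
Fractional values of venv_gpus_per_worker and policy_gpus_per_worker let several workers share one GPU. The sketch below estimates how many workers fit on a machine. It assumes Ray-style scheduling, in which a fractional request must fit on a single GPU. It also checks the workers_per_trial rule stated above. Both helper functions are hypothetical.

```python
import math


def max_concurrent_workers(num_gpus: int, gpus_per_worker: float) -> int:
    """Estimate how many workers can run at once (assumes fractions are packed per GPU)."""
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    if gpus_per_worker < 1:
        # A fractional request cannot span GPUs, so count how many fit on one GPU.
        per_gpu = math.floor(1 / gpus_per_worker + 1e-9)
        return num_gpus * per_gpu
    return int(num_gpus // gpus_per_worker)


def workers_per_trial_ok(workers_per_trial: int, venv_gpus: float, policy_gpus: float) -> bool:
    """workers_per_trial > 1 is only recommended when both GPUs-per-worker values are 1.0."""
    return workers_per_trial <= 1 or (venv_gpus == 1.0 and policy_gpus == 1.0)
```

Under this packing assumption, 0.4 GPU per worker on 2 GPUs allows 4 workers, not 5, because the leftover 0.2 on each GPU cannot be combined.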

train_venv_trials

Total number of trials searched by the search algorithm in venv training.

type:: int
abbreviation:: tvt
default:: 25
name:: train_venv_trials

train_policy_trials

Total number of trials searched by the search algorithm in policy training.

type:: int
abbreviation:: tpt
default:: 10
name:: train_policy_trials
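
Each entry above lists a name, an optional abbreviation, a type, and a default. As a hypothetical sketch (not REVIVE's own loader), the entries can be held as dictionaries. User overrides can then be resolved by either the full name or the abbreviation, with values cast according to the declared type. The helper names and the bool parsing rule are assumptions.

```python
from typing import Any, Callable, Dict, List

# A few base_config entries from this reference, written as plain dictionaries.
BASE_CONFIG: List[Dict[str, Any]] = [
    {"name": "global_seed", "abbreviation": "gs", "type": "int", "default": 42},
    {"name": "val_split_ratio", "abbreviation": "vsr", "type": "float", "default": 0.5},
    {"name": "ignore_check", "abbreviation": "igc", "type": "bool", "default": False},
    {"name": "venv_metric", "type": "str", "default": "mae"},  # no abbreviation listed
]

# Casts from raw (for example command-line) strings to the declared type.
CASTS: Dict[str, Callable[[Any], Any]] = {
    "int": int,
    "float": float,
    "str": str,
    "bool": lambda v: str(v).strip().lower() in ("1", "true", "yes"),
}


def resolve_config(entries: List[Dict[str, Any]], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the defaults and apply overrides keyed by name or abbreviation."""
    values = {e["name"]: e["default"] for e in entries}
    lookup: Dict[str, Dict[str, Any]] = {}
    for e in entries:
        lookup[e["name"]] = e
        if "abbreviation" in e:
            lookup[e["abbreviation"]] = e
    for key, raw in overrides.items():
        if key not in lookup:
            raise KeyError(f"unknown parameter: {key}")
        entry = lookup[key]
        values[entry["name"]] = CASTS[entry["type"]](raw)
    return values
```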

venv_algo_config

revive_p

bc_batch_size

type:: int
abbreviation:: bbs
default:: 256
name:: bc_batch_size

bc_epoch

type:: int
abbreviation:: bep
default:: 0
name:: bc_epoch

revive_batch_size

Batch size of training process.

type:: int
abbreviation:: mbs
default:: 1024
name:: revive_batch_size

revive_epoch

Number of epochs for the training process.

type:: int
abbreviation:: mep
default:: 1000
name:: revive_epoch

fintune

type:: int
abbreviation:: bet
default:: 1
name:: fintune

finetune_fre

type:: int
abbreviation:: betfre
default:: 1
name:: finetune_fre

policy_hidden_features

Number of neurons per layer of the policy network.

type:: int
abbreviation:: phf
default:: 256
name:: policy_hidden_features

policy_hidden_layers

Depth of the policy network.

type:: int
abbreviation:: phl
default:: 4
name:: policy_hidden_layers
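
The sketch below assumes that policy_hidden_features is the width of every hidden layer and that policy_hidden_layers is the number of hidden layers of a plain mlp backbone. The res and other backbones are organized differently. Under these assumptions, the layer shapes and parameter count follow directly. The helpers are hypothetical, and the input and output sizes depend on the task.

```python
from typing import List, Tuple


def mlp_layer_shapes(
    in_features: int, out_features: int, hidden_features: int = 256, hidden_layers: int = 4
) -> List[Tuple[int, int]]:
    """(input, output) sizes of each linear layer in a plain MLP (assumed layout)."""
    dims = [in_features] + [hidden_features] * hidden_layers + [out_features]
    return list(zip(dims[:-1], dims[1:]))


def mlp_param_count(shapes: List[Tuple[int, int]]) -> int:
    """Weights plus biases of all linear layers."""
    return sum(i * o + o for i, o in shapes)
```

With the defaults of 256 features and 4 layers, a network mapping 10 inputs to 2 outputs has about 0.2 million parameters. Most of these sit in the square 256 x 256 layers, so the width has a larger effect than the depth.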

policy_backbone

Backbone of the policy network; supports selecting from [mlp, res, ft_transformer, lstm, gru].

type:: str
abbreviation:: pb
default:: res
name:: policy_backbone

transition_hidden_features

Number of neurons per layer of the transition network.

type:: int
abbreviation:: thf
default:: 256
name:: transition_hidden_features

transition_hidden_layers

type:: int
abbreviation:: thl
default:: 4
name:: transition_hidden_layers

transition_backbone

Backbone of the transition network; supports selecting from [mlp, res, ft_transformer, lstm, gru].

type:: str
abbreviation:: tb
default:: res
name:: transition_backbone

matcher_hidden_features

Number of neurons per layer of the matcher network.

type:: int
abbreviation:: dhf
default:: 256
name:: matcher_hidden_features

matcher_hidden_layers

Depth of the matcher network.

type:: int
abbreviation:: dhl
default:: 4
name:: matcher_hidden_layers

g_steps

The number of update rounds of the generator in each epoch.

type:: int
default:: 1
name:: g_steps
search_mode:: grid
search_values:: 1, 3, 5

d_steps

Number of update rounds of the matcher in each epoch.

type:: int
default:: 1
name:: d_steps
search_mode:: grid
search_values:: 1, 3, 5
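
g_steps and d_steps set how many generator and matcher updates happen per epoch in this adversarial setup. The loop below sketches that schedule only. The update callables are hypothetical placeholders. The assumption that the matcher is updated before the generator within an epoch is illustrative and not taken from REVIVE.

```python
from typing import Callable


def run_adversarial_epochs(
    epochs: int,
    g_steps: int,
    d_steps: int,
    update_generator: Callable[[], None],
    update_matcher: Callable[[], None],
) -> None:
    """Alternate matcher and generator updates for a number of epochs."""
    for _ in range(epochs):
        for _ in range(d_steps):
            update_matcher()    # discriminator-style update of the matcher
        for _ in range(g_steps):
            update_generator()  # update of the generator node networks


calls = []
run_adversarial_epochs(2, g_steps=3, d_steps=1,
                       update_generator=lambda: calls.append("g"),
                       update_matcher=lambda: calls.append("d"))
```

Raising g_steps relative to d_steps lets the generator catch up with a strong matcher. The grid values 1, 3, and 5 explore that balance.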

g_lr

Initial learning rate of the generator node networks.

type:: float
default:: 4e-05
name:: g_lr
search_mode:: continuous
search_values:: 1e-06, 0.0001

d_lr

Initial learning rate of the matcher.

type:: float
default:: 0.0006
name:: d_lr
search_mode:: continuous
search_values:: 1e-06, 0.001
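
Parameters marked search_mode:: grid take one of the listed search_values. Parameters marked search_mode:: continuous give a lower and an upper bound. The sketch below shows one way trials could be generated for the revive_p values above. It cycles through the grid and samples continuous values uniformly, with defaults matching train_venv_trials (25) and global_seed (42). The actual search algorithm, and whether it samples learning rates on a log scale, are not specified here. All names and choices are therefore assumptions.

```python
import itertools
import random
from typing import Any, Dict, List, Tuple

# Search spaces copied from the revive_p entries above.
GRID: Dict[str, List[int]] = {"g_steps": [1, 3, 5], "d_steps": [1, 3, 5]}
CONTINUOUS: Dict[str, Tuple[float, float]] = {"g_lr": (1e-06, 0.0001), "d_lr": (1e-06, 0.001)}


def sample_trials(num_trials: int = 25, seed: int = 42) -> List[Dict[str, Any]]:
    """Generate trial configs: cycle through the grid, draw continuous values uniformly."""
    rng = random.Random(seed)
    grid_points = list(itertools.product(*GRID.values()))  # 3 x 3 = 9 combinations
    trials = []
    for i in range(num_trials):
        trial: Dict[str, Any] = dict(zip(GRID.keys(), grid_points[i % len(grid_points)]))
        for name, (low, high) in CONTINUOUS.items():
            trial[name] = rng.uniform(low, high)
        trials.append(trial)
    return trials
```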

bc_weight_decay

Weight decay used in BC fine-tuning.

type:: float
default:: 0.0001
name:: bc_weight_decay

revive_f

revive_batch_size

Batch size of training process.

type:: int
abbreviation:: mbs
default:: 1024
name:: revive_batch_size

revive_epoch

Number of epochs for the MAIL training process.

type:: int
abbreviation:: mep
default:: 1500
name:: revive_epoch

policy_hidden_features

Number of neurons per layer of the policy network.

type:: int
abbreviation:: phf
default:: 256
name:: policy_hidden_features

policy_hidden_layers

Depth of the policy network.

type:: int
abbreviation:: phl
default:: 4
name:: policy_hidden_layers

policy_backbone

Backbone of the policy network.

type:: str
abbreviation:: pb
default:: res
name:: policy_backbone

transition_hidden_features

Number of neurons per layer of the transition network.

type:: int
abbreviation:: thf
default:: 256
name:: transition_hidden_features

transition_hidden_layers

type:: int
abbreviation:: thl
default:: 4
name:: transition_hidden_layers

transition_backbone

Backbone of the transition network.

type:: str
abbreviation:: tb
default:: res
name:: transition_backbone

matcher_hidden_features

Number of neurons per layer of the matcher network.

type:: int
abbreviation:: dhf
default:: 256
name:: matcher_hidden_features

matcher_hidden_layers

Depth of the matcher network.

type:: int
abbreviation:: dhl
default:: 4
name:: matcher_hidden_layers

g_steps

The number of update rounds of the generator in each epoch.

type:: int
default:: 1
name:: g_steps
search_mode:: grid
search_values:: 1, 3, 5

d_steps

Number of update rounds of the matcher in each epoch.

type:: int
default:: 1
name:: d_steps
search_mode:: grid
search_values:: 1, 3, 5

g_lr

Initial learning rate of the generator node networks.

type:: float
default:: 4e-05
name:: g_lr
search_mode:: continuous
search_values:: 1e-06, 0.0001

d_lr

Initial learning rate of the matcher.

type:: float
default:: 0.0006
name:: d_lr
search_mode:: continuous
search_values:: 1e-06, 0.001

bc

bc_batch_size

Batch size of training process.

type:: int
abbreviation:: bbs
default:: 256
name:: bc_batch_size

bc_epoch

Number of epochs for the training process.

type:: int
abbreviation:: bep
default:: 500
name:: bc_epoch

policy_hidden_features

Number of neurons per layer of the policy network.

type:: int
abbreviation:: phf
default:: 256
name:: policy_hidden_features

policy_hidden_layers

Depth of the policy network.

type:: int
abbreviation:: phl
default:: 4
name:: policy_hidden_layers
search_mode:: grid
search_values:: 3, 4, 5

policy_backbone

Backbone of the policy network; supports selecting from [mlp, res, ft_transformer, lstm, gru].

type:: str
abbreviation:: pb
default:: res
name:: policy_backbone

transition_hidden_features

type:: int
abbreviation:: thf
default:: 256
name:: transition_hidden_features

transition_hidden_layers

type:: int
abbreviation:: thl
default:: 3
name:: transition_hidden_layers

transition_backbone

Backbone of the transition network; supports selecting from [mlp, res, ft_transformer, lstm, gru].

type:: str
abbreviation:: tb
default:: res
name:: transition_backbone

g_lr

Initial learning rate of the training process.

type:: float
default:: 0.0001
name:: g_lr
search_mode:: continuous
search_values:: 1e-06, 0.001

loss_type

Loss function used by BC; supports "nll", "mae", and "mse".

name:: loss_type
default:: nll
type:: str

policy_algo_config

ppo

ppo_batch_size

Batch size of training process.

type:: int
abbreviation:: pbs
default:: 256
name:: ppo_batch_size

policy_bc_epoch

Number of epochs for pre-training the policy.

type:: int
default:: 0
name:: policy_bc_epoch

ppo_epoch

Number of epochs for the training process.

type:: int
abbreviation:: bep
default:: 1000
name:: ppo_epoch

ppo_rollout_horizon

Rollout length used in policy training.

type:: int
abbreviation:: prh
default:: 100
name:: ppo_rollout_horizon
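
ppo_rollout_horizon, like test_horizon for the venv test, is the number of steps a policy is rolled out in the learned environment. Below is a minimal rollout loop. The policy and venv_step callables are hypothetical stand-ins for the trained policy and the virtual environment.

```python
from typing import Any, Callable, List, Tuple


def rollout(
    initial_obs: Any,
    policy: Callable[[Any], Any],
    venv_step: Callable[[Any, Any], Any],
    horizon: int = 100,
) -> List[Tuple[Any, Any, Any]]:
    """Roll a policy out for `horizon` steps, returning (obs, action, next_obs) tuples."""
    obs = initial_obs
    trajectory = []
    for _ in range(horizon):
        action = policy(obs)
        next_obs = venv_step(obs, action)
        trajectory.append((obs, action, next_obs))
        obs = next_obs
    return trajectory


# Toy example: the action is always 1 and the environment adds it to the state.
traj = rollout(0, policy=lambda o: 1, venv_step=lambda o, a: o + a, horizon=5)
```

Longer horizons expose the policy to more compounding model error in the learned environment. This is one reason to keep the rollout length close to the length of the real trajectories.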

policy_hidden_features

Number of neurons per layer of the policy network.

type:: int
abbreviation:: phf
default:: 256
name:: policy_hidden_features

policy_hidden_layers

Depth of the policy network.

type:: int
abbreviation:: phl
default:: 4
name:: policy_hidden_layers

policy_backbone

Backbone of the policy network; supports selecting from [mlp, res, ft_transformer].

type:: str
abbreviation:: pb
default:: res
name:: policy_backbone

g_lr

Initial learning rate of the training process.

type:: float
default:: 4e-05
name:: g_lr
search_mode:: continuous
search_values:: 1e-06, 0.001

sac

sac_batch_size

Batch size of training process.

type:: int
abbreviation:: pbs
default:: 1024
name:: sac_batch_size

policy_bc_epoch

Number of epochs for pre-training the policy.

type:: int
default:: 0
name:: policy_bc_epoch

sac_epoch

Number of epochs for the training process.

type:: int
abbreviation:: bep
default:: 1000
name:: sac_epoch

sac_steps_per_epoch

Number of SAC update rounds in each epoch.

type:: int
abbreviation:: sspe
default:: 200
name:: sac_steps_per_epoch

sac_rollout_horizon

type:: int
abbreviation:: srh
default:: 20
name:: sac_rollout_horizon

policy_hidden_features

Number of neurons per layer of the policy network.

type:: int
abbreviation:: phf
default:: 256
name:: policy_hidden_features

policy_hidden_layers

Depth of the policy network.

type:: int
abbreviation:: phl
default:: 4
name:: policy_hidden_layers

policy_backbone

Backbone of the policy network; supports selecting from [mlp, res, ft_transformer].

type:: str
abbreviation:: pb
default:: res
name:: policy_backbone

policy_hidden_activation

Activation function of the hidden layers of the policy network.

type:: str
abbreviation:: pha
default:: leakyrelu
name:: policy_hidden_activation

buffer_size

Size of the replay buffer used to store data.

type:: int
abbreviation:: bfs
default:: 1000000.0
name:: buffer_size
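
buffer_size bounds how many transitions SAC keeps for off-policy updates. Below is a minimal fixed-capacity buffer. It assumes first-in-first-out replacement and uniform sampling with replacement, which are common choices but are not confirmed for REVIVE. Passing the size through int() also accepts the listed default of 1000000.0. The class is hypothetical.

```python
import random
from typing import Any, List


class ReplayBuffer:
    """Fixed-capacity ring buffer: once full, the oldest item is overwritten."""

    def __init__(self, buffer_size: float = 1e6) -> None:
        self.capacity = int(buffer_size)  # the documented default is written as 1000000.0
        self.storage: List[Any] = []
        self.next_idx = 0

    def add(self, item: Any) -> None:
        if len(self.storage) < self.capacity:
            self.storage.append(item)
        else:
            self.storage[self.next_idx] = item
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size: int, rng: random.Random) -> List[Any]:
        """Uniformly sample batch_size items with replacement."""
        return [self.storage[rng.randrange(len(self.storage))] for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self.storage)
```

In this sketch, each of the sac_steps_per_epoch updates would draw one batch of sac_batch_size items from the buffer.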

g_lr

Initial learning rate of the training process.

type:: float
default:: 4e-05
name:: g_lr
search_mode:: continuous
search_values:: 1e-06, 0.001