revive.conf

base_config

global_seed

Set the random number seed for the experiment.

type:

int

abbreviation:

gs

default:

42

name:

global_seed

val_split_ratio

Ratio used to split off the validation dataset when one is not explicitly given.

type:

float

abbreviation:

vsr

default:

0.5

name:

val_split_ratio

val_split_mode

Mode used to automatically split the training and validation datasets; choose from outside_traj and inside_traj. outside_traj means the split happens between trajectories, so each trajectory belongs to exactly one dataset. inside_traj means the split happens within each trajectory: the earlier part of a trajectory goes to the training set and the later part goes to the validation set.

type:

str

abbreviation:

vsm

default:

outside_traj

name:

val_split_mode
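
The two split modes can be illustrated with a minimal sketch. The function names and the toy data below are hypothetical and are not part of the SDK; the sketch also assumes no shuffling, which the real implementation may well do.

```python
def split_outside_traj(trajectories, val_ratio):
    """Assign whole trajectories to either the training or the validation set."""
    n_val = int(len(trajectories) * val_ratio)
    n_train = len(trajectories) - n_val
    return trajectories[:n_train], trajectories[n_train:]


def split_inside_traj(trajectories, val_ratio):
    """Cut every trajectory: earlier steps go to training, later steps to validation."""
    train, val = [], []
    for traj in trajectories:
        cut = len(traj) - int(len(traj) * val_ratio)
        train.append(traj[:cut])
        val.append(traj[cut:])
    return train, val


# Four toy trajectories of six steps each (hypothetical data).
trajs = [[f"t{i}s{j}" for j in range(6)] for i in range(4)]

train_out, val_out = split_outside_traj(trajs, val_ratio=0.5)
train_in, val_in = split_inside_traj(trajs, val_ratio=0.5)
```

With outside_traj each trajectory stays intact in one set; with inside_traj every trajectory contributes its earlier steps to training and its later steps to validation.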

ignore_check

Flag to skip data-related checks and force training.

type:

bool

abbreviation:

igc

default:

False

name:

ignore_check

data_workers

Number of workers for the data loader. A larger value can accelerate data loading, but it also consumes more resources.

type:

int

abbreviation:

dw

default:

2

name:

data_workers

use_time_step_embed

Flag to use a positional embedding for the time step.

type:

bool

abbreviation:

utse

default:

True

name:

use_time_step_embed

time_step_embed_size

Size of the positional embedding for the time step.

type:

int

abbreviation:

tses

default:

64

name:

time_step_embed_size
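
As an illustration of what a positional embedding of a time step with a given size can look like, here is a minimal sketch using the common sinusoidal scheme. This scheme is an assumption made for illustration; the document does not state which embedding the SDK uses, and `time_step_embedding` is a hypothetical name.

```python
import math


def time_step_embedding(step, embed_size=64):
    """Sinusoidal embedding (assumed scheme): half sine terms, half cosine terms."""
    half = embed_size // 2
    # Geometrically spaced frequencies, as in the standard Transformer encoding.
    freqs = [1.0 / (10000 ** (i / half)) for i in range(half)]
    return [math.sin(step * f) for f in freqs] + [math.cos(step * f) for f in freqs]


emb = time_step_embedding(step=3, embed_size=64)
```

The resulting vector has time_step_embed_size entries and would be appended to the network input for that step.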

use_traj_id_embed

Flag to use a binary embedding for the trajectory id.

type:

bool

abbreviation:

utie

default:

True

name:

use_traj_id_embed
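
One plausible reading of a binary embedding for a trajectory id is the id written out as a fixed-width vector of bits. The sketch below shows that reading only; the function name and the bit width are hypothetical, and the SDK's actual encoding may differ.

```python
def binary_traj_id_embedding(traj_id, num_bits):
    """Encode a non-negative trajectory id as a list of bits, most significant first."""
    if traj_id < 0 or traj_id >= 2 ** num_bits:
        raise ValueError("traj_id does not fit in num_bits")
    return [float((traj_id >> bit) & 1) for bit in reversed(range(num_bits))]


embedding = binary_traj_id_embedding(5, num_bits=4)
```

Here trajectory 5 becomes the bit pattern 0101, so ids remain distinguishable while the input grows only logarithmically with the number of trajectories.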

pre_horzion

Number of steps of data in the configuration trajectory that are used for preprocessing operations.

type:

int

abbreviation:

ph

default:

0

name:

pre_horzion

venv_rollout_horizon

Length of the sampled trajectory; valid only if the algorithm works on sequential data.

type:

int

abbreviation:

vrh

default:

100

name:

venv_rollout_horizon

venv_gpus_per_worker

Number of GPUs per worker in venv training. A value smaller than 1 means multiple workers are launched on the same GPU.

type:

float

abbreviation:

vgpw

default:

1.0

name:

venv_gpus_per_worker
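
The effect of a fractional value can be worked out with simple resource arithmetic. The helper below is a hypothetical sketch that assumes workers are packed purely by their GPU share; it is not the SDK's scheduler.

```python
def max_concurrent_workers(num_gpus, gpus_per_worker):
    """Number of workers that fit when each one reserves gpus_per_worker GPUs."""
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    # The small epsilon guards against floating-point error in the division.
    return int(num_gpus / gpus_per_worker + 1e-9)


on_one_gpu = max_concurrent_workers(num_gpus=1, gpus_per_worker=0.5)
```

With the default of 1.0 each worker occupies a whole GPU; with 0.5 two workers share one GPU, and with 0.25 four do.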

venv_train_dataset_mode

Can be set to trajectory mode or transition mode.

type:

str

abbreviation:

vtdm

default:

transition

name:

venv_train_dataset_mode

venv_metric

Metric used to evaluate the trained venv, choose from nll, mae, mse, wdist.

type:

str

default:

mae

name:

venv_metric

venv_algo

Algorithm used in venv training. There are currently three algorithms to choose from: bc, revive_p, and revive_f.

type:

str

default:

revive_p

name:

venv_algo

rollout_plt_frequency

Number of steps between two consecutive rollout plots. 0 disables plotting.

type:

int

abbreviation:

rpf

default:

50

name:

rollout_plt_frequency

venv_save_frequency

Number of epochs between periodic model saves. 0 disables periodic saving.

type:

int

abbreviation:

vsp

default:

0

name:

venv_save_frequency
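
The frequency-style options (rollout_plt_frequency, venv_save_frequency) share the convention that 0 disables the action. A minimal sketch of that convention follows; it assumes counting starts at 1 and that the action fires on exact multiples, which the document does not confirm, and `should_run` is a hypothetical helper.

```python
def should_run(epoch, frequency):
    """Return True when a periodic action is due; a frequency of 0 disables it."""
    return frequency > 0 and epoch % frequency == 0


save_epochs = [epoch for epoch in range(1, 11) if should_run(epoch, 5)]
disabled = [epoch for epoch in range(1, 11) if should_run(epoch, 0)]
```

With a frequency of 5 the action runs at epochs 5 and 10 of the first ten; with the default of 0 it never runs.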

plt_response_curve

Whether to plot response curve at the end of venv training.

type:

bool

abbreviation:

prc

default:

False

name:

plt_response_curve

rollout_dataset_mode

Select the rollout dataset. Supports train and validate.

type:

str

default:

validate

name:

rollout_dataset_mode

venv_val_freq

Number of epochs between periodic evaluations of the model on the validation dataset.

type:

int

abbreviation:

vvf

default:

1

name:

venv_val_freq

policy_gpus_per_worker

Number of GPUs per worker in policy training. A value smaller than 1 means multiple workers are launched on the same GPU.

type:

float

abbreviation:

pgpw

default:

1.0

name:

policy_gpus_per_worker

behavioral_policy_init

Whether to use the learned behavioral policy as the initialization for policy training.

type:

bool

abbreviation:

bpi

default:

True

name:

behavioral_policy_init

policy_algo

Algorithm used in policy training. There are currently two algorithms to choose from, ppo and sac.

type:

str

default:

ppo

name:

policy_algo

test_horizon

Rollout length of the venv test.

type:

int

abbreviation:

th

default:

100

name:

test_horizon

workers_per_trial

Number of workers per trial. Set this greater than 1 only if all GPUs-per-worker settings are 1.0.

type:

int

abbreviation:

wpt

default:

1

name:

workers_per_trial

train_venv_trials

Total number of trials searched by the search algorithm in venv training.

type:

int

abbreviation:

tvt

default:

25

name:

train_venv_trials

train_policy_trials

Total number of trials searched by the search algorithm in policy training.

type:

int

abbreviation:

tpt

default:

10

name:

train_policy_trials
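
Every entry above carries the same fields: a name, an optional abbreviation, a type, and a default. The sketch below shows one way such entries could be represented and overridden by either name or abbreviation. The dictionary layout, `TYPE_CASTS`, and `resolve` are assumptions made for illustration and are not the SDK's loader.

```python
# Casts keyed by the type strings used in this reference.
TYPE_CASTS = {
    "int": int,
    "float": float,
    "str": str,
    "bool": lambda value: str(value).lower() in ("true", "1", "yes"),
}

# A few entries copied from the base_config section above.
base_config = [
    {"name": "global_seed", "abbreviation": "gs", "type": "int", "default": 42},
    {"name": "val_split_ratio", "abbreviation": "vsr", "type": "float", "default": 0.5},
    {"name": "ignore_check", "abbreviation": "igc", "type": "bool", "default": False},
    {"name": "venv_metric", "type": "str", "default": "mae"},  # no abbreviation listed
]


def resolve(entries, overrides):
    """Apply overrides given by full name or abbreviation, else keep the default."""
    resolved = {}
    for entry in entries:
        by_abbreviation = overrides.get(entry.get("abbreviation"), entry["default"])
        raw = overrides.get(entry["name"], by_abbreviation)
        resolved[entry["name"]] = TYPE_CASTS[entry["type"]](raw)
    return resolved


settings = resolve(base_config, {"gs": "7", "val_split_ratio": "0.2"})
```

Overrides arrive as strings here to mimic command-line input; each one is cast to the entry's declared type, and untouched entries keep their defaults.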

venv_algo_config

revive_p

bc_batch_size

type:

int

abbreviation:

bbs

default:

256

name:

bc_batch_size

bc_epoch

type:

int

abbreviation:

bep

default:

0

name:

bc_epoch

revive_batch_size

Batch size of training process.

type:

int

abbreviation:

mbs

default:

1024

name:

revive_batch_size

revive_epoch

Number of epochs for the training process.

type:

int

abbreviation:

mep

default:

1000

name:

revive_epoch

fintune

type:

int

abbreviation:

bet

default:

1

name:

fintune

finetune_fre

type:

int

abbreviation:

betfre

default:

1

name:

finetune_fre

policy_hidden_features

Number of neurons per layer of the policy network.

type:

int

abbreviation:

phf

default:

256

name:

policy_hidden_features

policy_hidden_layers

Depth of policy network.

type:

int

abbreviation:

phl

default:

4

name:

policy_hidden_layers

policy_backbone

Backbone of policy network. Support selecting from [mlp, res, ft_transformer, lstm, gru].

type:

str

abbreviation:

pb

default:

res

name:

policy_backbone

transition_hidden_features

Number of neurons per layer of the transition network.

type:

int

abbreviation:

thf

default:

256

name:

transition_hidden_features

transition_hidden_layers

type:

int

abbreviation:

thl

default:

4

name:

transition_hidden_layers

transition_backbone

Backbone of Transition network. Support selecting from [mlp, res, ft_transformer, lstm, gru].

type:

str

abbreviation:

tb

default:

res

name:

transition_backbone

matcher_hidden_features

Number of neurons per layer of the matcher network.

type:

int

abbreviation:

dhf

default:

256

name:

matcher_hidden_features

matcher_hidden_layers

Depth of the matcher network.

type:

int

abbreviation:

dhl

default:

4

name:

matcher_hidden_layers

g_steps

The number of update rounds of the generator in each epoch.

type:

int

default:

1

name:

g_steps

search_mode:

grid

search_values:

1, 3, 5

d_steps

Number of update rounds of matcher in each epoch.

type:

int

default:

1

name:

d_steps

search_mode:

grid

search_values:

1, 3, 5
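
g_steps and d_steps together set how many generator and matcher updates happen per epoch. A minimal sketch of such a loop is shown below; the order of the two phases and the callback names are assumptions, not the SDK's training code.

```python
def run_epoch(g_steps, d_steps, update_generator, update_matcher):
    """Run g_steps generator updates followed by d_steps matcher updates."""
    for _ in range(g_steps):
        update_generator()
    for _ in range(d_steps):
        update_matcher()


# Record the order of updates with stand-in callbacks.
calls = []
run_epoch(3, 1, lambda: calls.append("g"), lambda: calls.append("d"))
```

Raising g_steps relative to d_steps gives the generator more updates per epoch, and the reverse favors the matcher.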

g_lr

Initial learning rate of the generator node networks.

type:

float

default:

4e-05

name:

g_lr

search_mode:

continuous

search_values:

1e-06, 0.0001

d_lr

Initial learning rate of the matcher.

type:

float

default:

0.0006

name:

d_lr

search_mode:

continuous

search_values:

1e-06, 0.001
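
The search_mode and search_values fields describe the hyperparameter search space: grid lists discrete candidates, and continuous gives a lower and upper bound. The sketch below enumerates a grid and draws one continuous sample. Uniform sampling is an assumption (a search algorithm may sample on a log scale or adaptively), and both function names are hypothetical.

```python
import itertools
import random


def grid_candidates(space):
    """Enumerate every combination of the listed grid values."""
    names = list(space)
    combos = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def sample_continuous(bounds, rng):
    """Draw one value per parameter uniformly between its bounds (assumed)."""
    return {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}


grid = grid_candidates({"g_steps": [1, 3, 5], "d_steps": [1, 3, 5]})
rng = random.Random(42)
sample = sample_continuous({"g_lr": (1e-06, 0.0001), "d_lr": (1e-06, 0.001)}, rng)
```

The two grid parameters alone yield nine combinations, which is worth keeping in mind when choosing train_venv_trials.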

bc_weight_decay

Weight decay used in bc finetuning.

type:

float

default:

0.0001

name:

bc_weight_decay

revive_f

revive_batch_size

Batch size of training process.

type:

int

abbreviation:

mbs

default:

1024

name:

revive_batch_size

revive_epoch

Number of epochs for the MAIL training process.

type:

int

abbreviation:

mep

default:

1500

name:

revive_epoch

policy_hidden_features

Number of neurons per layer of the policy network.

type:

int

abbreviation:

phf

default:

256

name:

policy_hidden_features

policy_hidden_layers

Depth of policy network.

type:

int

abbreviation:

phl

default:

4

name:

policy_hidden_layers

policy_backbone

Backbone of policy network.

type:

str

abbreviation:

pb

default:

res

name:

policy_backbone

transition_hidden_features

Number of neurons per layer of the transition network.

type:

int

abbreviation:

thf

default:

256

name:

transition_hidden_features

transition_hidden_layers

type:

int

abbreviation:

thl

default:

4

name:

transition_hidden_layers

transition_backbone

Backbone of Transition network.

type:

str

abbreviation:

tb

default:

res

name:

transition_backbone

matcher_hidden_features

Number of neurons per layer of the matcher network.

type:

int

abbreviation:

dhf

default:

256

name:

matcher_hidden_features

matcher_hidden_layers

Depth of the matcher network.

type:

int

abbreviation:

dhl

default:

4

name:

matcher_hidden_layers

g_steps

The number of update rounds of the generator in each epoch.

type:

int

default:

1

name:

g_steps

search_mode:

grid

search_values:

1, 3, 5

d_steps

Number of update rounds of matcher in each epoch.

type:

int

default:

1

name:

d_steps

search_mode:

grid

search_values:

1, 3, 5

g_lr

Initial learning rate of the generator node networks.

type:

float

default:

4e-05

name:

g_lr

search_mode:

continuous

search_values:

1e-06, 0.0001

d_lr

Initial learning rate of the matcher.

type:

float

default:

0.0006

name:

d_lr

search_mode:

continuous

search_values:

1e-06, 0.001

bc

bc_batch_size

Batch size of training process.

type:

int

abbreviation:

bbs

default:

256

name:

bc_batch_size

bc_epoch

Number of epochs for the training process.

type:

int

abbreviation:

bep

default:

500

name:

bc_epoch

policy_hidden_features

Number of neurons per layer of the policy network.

type:

int

abbreviation:

phf

default:

256

name:

policy_hidden_features

policy_hidden_layers

Depth of policy network.

type:

int

abbreviation:

phl

default:

4

name:

policy_hidden_layers

search_mode:

grid

search_values:

3, 4, 5

policy_backbone

Backbone of policy network. Support selecting from [mlp, res, ft_transformer, lstm, gru].

type:

str

abbreviation:

pb

default:

res

name:

policy_backbone

transition_hidden_features

type:

int

abbreviation:

thf

default:

256

name:

transition_hidden_features

transition_hidden_layers

type:

int

abbreviation:

thl

default:

3

name:

transition_hidden_layers

transition_backbone

Backbone of Transition network. Support selecting from [mlp, res, ft_transformer, lstm, gru].

type:

str

abbreviation:

tb

default:

res

name:

transition_backbone

g_lr

Initial learning rate of the training process.

type:

float

default:

0.0001

name:

g_lr

search_mode:

continuous

search_values:

1e-06, 0.001

loss_type

bc supports several loss functions: nll, mae, and mse.

name:

loss_type

default:

nll

type:

str
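
The three loss types can be compared on a single prediction. The sketch below assumes the network outputs a Gaussian mean and standard deviation, which the document does not state; the function names are hypothetical.

```python
import math


def nll_loss(mean, std, target):
    """Negative log-likelihood of target under a Gaussian N(mean, std^2)."""
    variance = std ** 2
    return 0.5 * math.log(2.0 * math.pi * variance) + (target - mean) ** 2 / (2.0 * variance)


def mae_loss(mean, std, target):
    """Absolute error; the predicted std is ignored."""
    return abs(target - mean)


def mse_loss(mean, std, target):
    """Squared error; the predicted std is ignored."""
    return (target - mean) ** 2


LOSSES = {"nll": nll_loss, "mae": mae_loss, "mse": mse_loss}


def compute_loss(loss_type, mean, std, target):
    """Dispatch on the loss_type string."""
    if loss_type not in LOSSES:
        raise ValueError(f"unknown loss_type: {loss_type}")
    return LOSSES[loss_type](mean, std, target)


value = compute_loss("mae", mean=1.0, std=1.0, target=3.0)
```

Unlike mae and mse, nll also penalizes a poorly calibrated standard deviation, so it uses the full predicted distribution rather than only its mean.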

policy_algo_config

ppo

ppo_batch_size

Batch size of training process.

type:

int

abbreviation:

pbs

default:

256

name:

ppo_batch_size

policy_bc_epoch

Number of epochs used to pre-train the policy.

type:

int

default:

0

name:

policy_bc_epoch

ppo_epoch

Number of epochs for the training process.

type:

int

abbreviation:

bep

default:

1000

name:

ppo_epoch

ppo_rollout_horizon

Rollout length of the policy train.

type:

int

abbreviation:

prh

default:

100

name:

ppo_rollout_horizon

policy_hidden_features

Number of neurons per layer of the policy network.

type:

int

abbreviation:

phf

default:

256

name:

policy_hidden_features

policy_hidden_layers

Depth of policy network.

type:

int

abbreviation:

phl

default:

4

name:

policy_hidden_layers

policy_backbone

Backbone of policy network. Support selecting from [mlp, res, ft_transformer].

type:

str

abbreviation:

pb

default:

res

name:

policy_backbone

g_lr

Initial learning rate of the training process.

type:

float

default:

4e-05

name:

g_lr

search_mode:

continuous

search_values:

1e-06, 0.001

sac

sac_batch_size

Batch size of training process.

type:

int

abbreviation:

pbs

default:

1024

name:

sac_batch_size

policy_bc_epoch

Number of epochs used to pre-train the policy.

type:

int

default:

0

name:

policy_bc_epoch

sac_epoch

Number of epochs for the training process.

type:

int

abbreviation:

bep

default:

1000

name:

sac_epoch

sac_steps_per_epoch

The number of update rounds of sac in each epoch.

type:

int

abbreviation:

sspe

default:

200

name:

sac_steps_per_epoch

sac_rollout_horizon

type:

int

abbreviation:

srh

default:

20

name:

sac_rollout_horizon

policy_hidden_features

Number of neurons per layer of the policy network.

type:

int

abbreviation:

phf

default:

256

name:

policy_hidden_features

policy_hidden_layers

Depth of policy network.

type:

int

abbreviation:

phl

default:

4

name:

policy_hidden_layers

policy_backbone

Backbone of policy network. Support selecting from [mlp, res, ft_transformer].

type:

str

abbreviation:

pb

default:

res

name:

policy_backbone

policy_hidden_activation

Hidden activation function of the policy network.

type:

str

abbreviation:

pha

default:

leakyrelu

name:

policy_hidden_activation

buffer_size

Size of the buffer to store data.

type:

int

abbreviation:

bfs

default:

1000000.0

name:

buffer_size
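
The listed default is written as a float (1000000.0) although the type is int, so code that consumes it would need a cast. The sketch below shows a bounded buffer built that way. First-in-first-out eviction and the name `make_buffer` are assumptions for illustration, not the SDK's buffer.

```python
from collections import deque


def make_buffer(buffer_size):
    """Create a bounded buffer; the oldest items are dropped once it is full."""
    return deque(maxlen=int(buffer_size))


default_buffer = make_buffer(1000000.0)

# A tiny buffer makes the eviction behavior visible.
small = make_buffer(3)
for transition in range(5):
    small.append(transition)
```

After five appends the three-slot buffer holds only the three most recent items.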

g_lr

Initial learning rate of the training process.

type:

float

default:

4e-05

name:

g_lr

search_mode:

continuous

search_values:

1e-06, 0.001