revive.conf

base_config

global_seed

Set the random number seed for the experiment.

type

int

abbreviation

gs

default

42

name

global_seed

val_split_ratio

Ratio used to split off a validation dataset when one is not explicitly given.

type

float

abbreviation

vsr

default

0.5

name

val_split_ratio

val_split_mode

Mode for automatically splitting the training and validation datasets; choose from outside_traj and inside_traj. outside_traj splits between trajectories, so each trajectory belongs to only one of the two datasets. inside_traj splits within each trajectory: the earlier part of a trajectory goes to the training set and the later part to the validation set.

type

str

abbreviation

vsm

default

outside_traj

name

val_split_mode
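The two split modes can be illustrated with a small sketch. It is not REVIVE's implementation: `split_dataset` is a hypothetical helper, a trajectory is assumed to be a plain list of steps, and val_split_ratio is assumed to be the fraction of data assigned to validation.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    trajectories: Sequence[list],
    val_split_ratio: float = 0.5,
    val_split_mode: str = "outside_traj",
    seed: int = 42,
) -> Tuple[List[list], List[list]]:
    """Hypothetical helper: split trajectories into (train, validate).

    Assumption: val_split_ratio is the fraction of data used for validation.
    """
    if not 0.0 < val_split_ratio < 1.0:
        raise ValueError("val_split_ratio must lie strictly between 0 and 1")
    if val_split_mode == "outside_traj":
        # Shuffle whole trajectories; each one lands in exactly one split.
        order = list(range(len(trajectories)))
        random.Random(seed).shuffle(order)
        val_idx = set(order[: int(round(len(order) * val_split_ratio))])
        train = [t for i, t in enumerate(trajectories) if i not in val_idx]
        val = [t for i, t in enumerate(trajectories) if i in val_idx]
        return train, val
    if val_split_mode == "inside_traj":
        # Cut every trajectory: earlier steps train, later steps validate.
        train, val = [], []
        for traj in trajectories:
            cut = len(traj) - int(round(len(traj) * val_split_ratio))
            train.append(list(traj[:cut]))
            val.append(list(traj[cut:]))
        return train, val
    raise ValueError(f"unknown val_split_mode: {val_split_mode!r}")
```

With outside_traj, no trajectory contributes to both sets; with inside_traj, every trajectory contributes to both, but validation only sees time steps that come after the training ones.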

ignore_check

Flag to ignore data-related checks and force training.

type

bool

abbreviation

igc

default

False

name

ignore_check
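Each entry pairs a full name with a short abbreviation, a type and a default. The sketch below shows one way such entries could be turned into command-line options with Python's standard argparse module. The in-memory `BASE_CONFIG` list, the `--name`/`-abbreviation` flag spelling and the boolean parser are assumptions for illustration, not REVIVE's actual command-line interface.

```python
import argparse

# Hypothetical in-memory copy of the first base_config entries above.
BASE_CONFIG = [
    {"name": "global_seed", "abbreviation": "gs", "type": "int", "default": 42},
    {"name": "val_split_ratio", "abbreviation": "vsr", "type": "float", "default": 0.5},
    {"name": "val_split_mode", "abbreviation": "vsm", "type": "str", "default": "outside_traj"},
    {"name": "ignore_check", "abbreviation": "igc", "type": "bool", "default": False},
]


def str2bool(text: str) -> bool:
    """Parse common spellings of a boolean value (argparse's bool() would accept any non-empty string)."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


TYPE_PARSERS = {"int": int, "float": float, "str": str, "bool": str2bool}


def build_parser(entries) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for entry in entries:
        # Register the full name and the abbreviation as two spellings of one option.
        parser.add_argument(
            f"--{entry['name']}",
            f"-{entry['abbreviation']}",
            type=TYPE_PARSERS[entry["type"]],
            default=entry["default"],
        )
    return parser


args = build_parser(BASE_CONFIG).parse_args(["-gs", "7", "--ignore_check", "true"])
```

Options left unset keep the defaults listed in this file, so here only global_seed and ignore_check change.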

venv_rollout_horizon

Length of the sampled trajectories; only applies if the algorithm works on sequential data.

type

int

abbreviation

vrh

default

100

name

venv_rollout_horizon
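For sequential data, a horizon simply fixes how many consecutive steps each sampled segment contains. The hypothetical `sample_windows` helper below draws random contiguous windows of that length from one trajectory. It only illustrates the meaning of the parameter and is not how REVIVE itself draws samples.

```python
import random
from typing import List, Sequence


def sample_windows(
    trajectory: Sequence, horizon: int = 100, num_windows: int = 4, seed: int = 42
) -> List[list]:
    """Hypothetical helper: sample contiguous segments of `horizon` steps."""
    if horizon > len(trajectory):
        raise ValueError("horizon is longer than the trajectory")
    rng = random.Random(seed)
    # randint is inclusive, so the last valid start still leaves `horizon` steps.
    starts = [rng.randint(0, len(trajectory) - horizon) for _ in range(num_windows)]
    return [list(trajectory[s:s + horizon]) for s in starts]


windows = sample_windows(list(range(250)), horizon=100)
```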

venv_gpus_per_worker

Number of GPUs per worker in venv training; a value smaller than 1 means that multiple workers are launched on the same GPU.

type

float

abbreviation

vgpw

default

1.0

name

venv_gpus_per_worker
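A fractional value is easiest to read as arithmetic: with 0.25 GPUs per worker, four workers share one GPU. The hypothetical helper below computes the resulting upper bound on concurrent workers for a given number of GPUs. It ignores memory limits and any other resources a scheduler may also reserve, so treat it as a rough estimate.

```python
import math


def max_concurrent_workers(num_gpus: int, gpus_per_worker: float) -> int:
    """Hypothetical helper: upper bound on workers that fit on `num_gpus` GPUs."""
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    # The small epsilon guards against float error such as 2.9999999999999996.
    return math.floor(num_gpus / gpus_per_worker + 1e-9)
```

The same reasoning applies to policy_gpus_per_worker during policy training.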

venv_metric

Metric used to evaluate the trained venv; choose from nll, mae, mse, and wdist.

type

str

default

mae

name

venv_metric
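Three of these metrics reduce to short formulas on one-dimensional values. The sketch below uses textbook definitions and is not REVIVE's evaluation code. wdist is assumed to be a Wasserstein-1 distance; for two equal-size 1-D samples, it equals the mean absolute difference of the sorted values. nll is omitted because it needs the probabilistic output of the model.

```python
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between paired values."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between paired values."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def wdist_1d(samples_a: Sequence[float], samples_b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    With equal sample counts, the optimal coupling pairs the sorted values.
    """
    return mae(sorted(samples_a), sorted(samples_b))
```

Unlike mae and mse, wdist ignores how values are paired: [0, 1] and [1, 0] have mae 1 but wdist 0, because they contain the same values in a different order.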

venv_algo

Algorithm used in venv training. Currently supported choices: bc and revive_p.

type

str

default

revive_p

name

venv_algo

rollout_plt_frequency

Number of steps between two plots of rollout data; 0 disables plotting.

type

int

abbreviation

rpf

default

50

name

rollout_plt_frequency

rollout_dataset_mode

Selects the dataset used for rollouts; supports train and validate.

type

str

default

validate

name

rollout_dataset_mode

policy_gpus_per_worker

Number of GPUs per worker in policy training; a value smaller than 1 means that multiple workers are launched on the same GPU.

type

float

abbreviation

pgpw

default

1.0

name

policy_gpus_per_worker

behavioral_policy_init

Whether to use the learned behavioral policy as the initial policy for policy training.

type

bool

abbreviation

bpi

default

True

name

behavioral_policy_init

policy_algo

Algorithm used in policy training. There are currently two algorithms to choose from, ppo and sac.

type

str

default

ppo

name

policy_algo

test_horizon

Rollout length of the venv test.

type

int

abbreviation

th

default

100

name

test_horizon

train_venv_trials

Total number of trials explored by the search algorithm in venv training.

type

int

abbreviation

tvt

default

25

name

train_venv_trials

train_policy_trials

Total number of trials explored by the search algorithm in policy training.

type

int

abbreviation

tpt

default

10

name

train_policy_trials

venv_algo_config

revive_p

revive_batch_size

Batch size of the training process.

type

int

abbreviation

mbs

default

1024

name

revive_batch_size

revive_epoch

Number of epochs for the training process.

type

int

abbreviation

mep

default

5000

name

revive_epoch

fintune

type

int

abbreviation

bet

default

1

name

fintune

finetune_fre

type

int

abbreviation

betfre

default

1

name

finetune_fre

policy_hidden_features

Number of neurons per layer of the policy network.

type

int

abbreviation

phf

default

256

name

policy_hidden_features

policy_hidden_layers

Depth of the policy network.

type

int

abbreviation

phl

default

4

name

policy_hidden_layers
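To see how the width and depth settings affect network size, the worked count below assumes a plain fully connected network (the mlp backbone) with policy_hidden_layers hidden layers, each of width policy_hidden_features. The input and output sizes are hypothetical, and the res backbone may add or share parameters differently.

```python
def mlp_param_count(
    in_dim: int, out_dim: int, hidden_features: int = 256, hidden_layers: int = 4
) -> int:
    """Weights plus biases of a plain MLP with equal-width hidden layers."""
    if hidden_layers < 1:
        raise ValueError("need at least one hidden layer")
    widths = [in_dim] + [hidden_features] * hidden_layers + [out_dim]
    # Each linear layer mapping a -> b units has a*b weights and b biases.
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))


# Hypothetical 10-D observation and 2-D action with the defaults above:
# (10*256+256) + 3*(256*256+256) + (256*2+2) = 2816 + 197376 + 514
total = mlp_param_count(10, 2)
```

Most parameters sit in the hidden-to-hidden layers, so the count grows roughly with hidden_features squared times hidden_layers.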

policy_backbone

Backbone of the policy network.

type

str

abbreviation

pb

default

res

name

policy_backbone

transition_hidden_features

Number of neurons per layer of the transition network.

type

int

abbreviation

thf

default

256

name

transition_hidden_features

transition_hidden_layers

Depth of the transition network.

type

int

abbreviation

thl

default

4

name

transition_hidden_layers

transition_backbone

Backbone of the transition network.

type

str

abbreviation

tb

default

res

name

transition_backbone

matcher_hidden_features

Number of neurons per layer of the matcher network.

type

int

abbreviation

dhf

default

256

name

matcher_hidden_features

matcher_hidden_layers

Depth of the matcher network.

type

int

abbreviation

dhl

default

4

name

matcher_hidden_layers

g_steps

The number of update rounds of the generator in each epoch.

type

int

default

1

name

g_steps

search_mode

grid

search_values

1, 3, 5

d_steps

Number of update rounds of the matcher in each epoch.

type

int

default

1

name

d_steps

search_mode

grid

search_values

1, 3, 5
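With search_mode set to grid, the listed values are candidates to try. Assuming the search enumerates the full Cartesian product, g_steps and d_steps together yield 3 × 3 = 9 combinations. The sketch below uses the standard itertools module. How REVIVE schedules these combinations against train_venv_trials is not stated here.

```python
import itertools

# Grid values copied from the g_steps and d_steps entries above.
GRID = {"g_steps": [1, 3, 5], "d_steps": [1, 3, 5]}


def grid_trials(grid):
    """Yield every combination of the listed grid values as a dict."""
    names = list(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


trials = list(grid_trials(GRID))
```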

g_lr

Initial learning rate of the generator.

type

float

default

4e-05

name

g_lr

search_mode

continuous

search_values

1e-06, 0.0001

d_lr

Initial learning rate of the matcher.

type

float

default

0.0006

name

d_lr

search_mode

continuous

search_values

1e-06, 0.001
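With search_mode set to continuous, the two search_values read as lower and upper bounds. Learning-rate ranges that span several orders of magnitude, such as 1e-06 to 0.0001, are often sampled log-uniformly. That choice is an assumption in the sketch below; the search may equally draw values uniformly between the bounds.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Clamp to guard against floating-point rounding at the interval ends.
    return min(max(value, low), high)


rng = random.Random(42)
g_lr_samples = [sample_log_uniform(1e-06, 0.0001, rng) for _ in range(5)]
d_lr_samples = [sample_log_uniform(1e-06, 0.001, rng) for _ in range(5)]
```

Sampling in log space gives each decade, for example 1e-06 to 1e-05 and 1e-05 to 1e-04, the same probability. A uniform draw would concentrate almost all samples near the upper bound.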

bc

bc_batch_size

Batch size of the training process.

type

int

abbreviation

bbs

default

256

name

bc_batch_size

bc_epoch

Number of epochs for the training process.

type

int

abbreviation

bep

default

500

name

bc_epoch

policy_hidden_features

Number of neurons per layer of the policy network.

type

int

abbreviation

phf

default

256

name

policy_hidden_features

policy_hidden_layers

Depth of the policy network.

type

int

abbreviation

phl

default

4

name

policy_hidden_layers

search_mode

grid

search_values

3, 4, 5

policy_backbone

Backbone of the policy network.

type

str

abbreviation

pb

default

res

name

policy_backbone

search_mode

grid

search_values

mlp, res

g_lr

Initial learning rate of the training process.

type

float

default

0.0001

name

g_lr

search_mode

continuous

search_values

1e-06, 0.001

loss_type

BC supports different loss functions: "log_prob", "mae", and "mse".

name

loss_type

default

log_prob

type

str
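The three loss types differ in what they compare. mae and mse compare a predicted action with the dataset action. log_prob scores the dataset action under the policy's action distribution. The sketch below assumes a diagonal Gaussian policy that outputs a mean and a standard deviation per action dimension. `bc_loss` is a hypothetical per-sample function, not REVIVE's training code.

```python
import math
from typing import Sequence


def bc_loss(
    loss_type: str,
    action: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
) -> float:
    """Per-sample behavioral-cloning loss, assuming a diagonal Gaussian policy."""
    if loss_type == "mae":
        return sum(abs(m - a) for m, a in zip(mean, action)) / len(action)
    if loss_type == "mse":
        return sum((m - a) ** 2 for m, a in zip(mean, action)) / len(action)
    if loss_type == "log_prob":
        # Negative log-likelihood of the dataset action under N(mean, std^2),
        # summed over independent action dimensions.
        return sum(
            0.5 * ((a - m) / s) ** 2 + math.log(s) + 0.5 * math.log(2 * math.pi)
            for a, m, s in zip(action, mean, std)
        )
    raise ValueError(f"unknown loss_type: {loss_type!r}")
```

Only log_prob uses std, so it is the only option here that also trains the policy's spread. mae and mse train only the mean.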

policy_algo_config

ppo

ppo_batch_size

Batch size of the training process.

type

int

abbreviation

pbs

default

256

name

ppo_batch_size

ppo_epoch

Number of epochs for the training process.

type

int

abbreviation

bep

default

200

name

ppo_epoch

ppo_rollout_horizon

Rollout length used in policy training.

type

int

abbreviation

prh

default

100

name

ppo_rollout_horizon

policy_hidden_features

Number of neurons per layer of the policy network.

type

int

abbreviation

phf

default

256

name

policy_hidden_features

policy_hidden_layers

Depth of the policy network.

type

int

abbreviation

phl

default

4

name

policy_hidden_layers

policy_backbone

Backbone of the policy network.

type

str

abbreviation

pb

default

mlp

name

policy_backbone

g_lr

Initial learning rate of the training process.

type

float

default

4e-05

name

g_lr

search_mode

continuous

search_values

1e-06, 0.001