revive.conf¶
base_config¶
global_seed¶
Set the random number seed for the experiment.
- type
int
- abbreviation
gs
- default
42
- name
global_seed
val_split_ratio¶
Fraction of the data held out as the validation dataset when a validation dataset is not explicitly given.
- type
float
- abbreviation
vsr
- default
0.5
- name
val_split_ratio
val_split_mode¶
Mode used to automatically split the data into training and validation datasets; choose from outside_traj and inside_traj. outside_traj splits between trajectories, so each trajectory belongs to exactly one dataset. inside_traj splits within each trajectory: the earlier part goes to the training set and the later part goes to the validation set.
- type
str
- abbreviation
vsm
- default
outside_traj
- name
val_split_mode
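The two split modes can be illustrated with a minimal sketch. The helper names below are hypothetical, and details such as rounding and whether trajectories are shuffled before splitting are assumptions, not the SDK's documented behavior.

```python
def split_outside_traj(trajectories, val_ratio):
    """Assign whole trajectories to either the training or the validation set."""
    n_val = int(round(len(trajectories) * val_ratio))
    n_train = len(trajectories) - n_val
    # Assumption: no shuffling; the last n_val trajectories become validation data.
    return trajectories[:n_train], trajectories[n_train:]


def split_inside_traj(trajectories, val_ratio):
    """Cut every trajectory: earlier steps train, later steps validate."""
    train, val = [], []
    for traj in trajectories:
        cut = len(traj) - int(round(len(traj) * val_ratio))
        train.append(traj[:cut])
        val.append(traj[cut:])
    return train, val


# Four toy trajectories of ten steps each, split with the default ratio 0.5.
trajs = [list(range(10)) for _ in range(4)]
train_out, val_out = split_outside_traj(trajs, 0.5)
train_in, val_in = split_inside_traj(trajs, 0.5)
```

With outside_traj the validation set contains trajectories never seen in training; with inside_traj every trajectory contributes to both sets, and validation covers only the later time steps.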
ignore_check¶
Flag to skip data-related checks and force training.
- type
bool
- abbreviation
igc
- default
False
- name
ignore_check
venv_rollout_horizon¶
Length of the sampled trajectory; valid only if the algorithm works on sequential data.
- type
int
- abbreviation
vrh
- default
100
- name
venv_rollout_horizon
venv_gpus_per_worker¶
Number of GPUs per worker in venv training; a value smaller than 1 launches multiple workers on the same GPU.
- type
float
- abbreviation
vgpw
- default
1.0
- name
venv_gpus_per_worker
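As a rough illustration of the fractional setting, the sketch below estimates how many workers can run concurrently for a given GPU budget. The function name is hypothetical, and it assumes workers are simply packed until the GPU budget is used up; the actual scheduling is not described here.

```python
import math


def concurrent_workers(total_gpus, gpus_per_worker):
    """Estimate how many workers fit into the available GPU budget."""
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    # Small epsilon guards against floating-point error in the division.
    return math.floor(total_gpus / gpus_per_worker + 1e-9)


one_gpu_default = concurrent_workers(1, 1.0)   # one worker owns the GPU
one_gpu_shared = concurrent_workers(1, 0.5)    # two workers share the GPU
two_gpus_shared = concurrent_workers(2, 0.25)  # four workers per GPU
```

The same arithmetic applies to policy_gpus_per_worker.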
venv_metric¶
Metric used to evaluate the trained venv; choose from nll, mae, mse, and wdist.
- type
str
- default
mae
- name
venv_metric
venv_algo¶
Algorithm used in venv training. The algorithms listed here are bc and revive_p.
- type
str
- default
revive_p
- name
venv_algo
rollout_plt_frequency¶
Number of steps between two consecutive plots of rollout data. 0 disables plotting.
- type
int
- abbreviation
rpf
- default
50
- name
rollout_plt_frequency
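A minimal sketch of the frequency semantics, assuming a plot is produced whenever the step counter is a multiple of the configured value. The function name and the exact trigger condition are assumptions.

```python
def should_plot(step, frequency):
    """Return True when rollout data should be plotted at this step."""
    if frequency <= 0:
        # 0 disables plotting entirely.
        return False
    return step % frequency == 0


# With the default of 50, plots would be produced at steps 50, 100, 150, ...
plot_steps = [s for s in range(1, 201) if should_plot(s, 50)]
```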
rollout_dataset_mode¶
Dataset used for rollouts. Supported values are train and validate.
- type
str
- default
validate
- name
rollout_dataset_mode
policy_gpus_per_worker¶
Number of GPUs per worker in policy training; a value smaller than 1 launches multiple workers on the same GPU.
- type
float
- abbreviation
pgpw
- default
1.0
- name
policy_gpus_per_worker
behavioral_policy_init¶
Whether to use the learned behavioral policy as the initialization for policy training.
- type
bool
- abbreviation
bpi
- default
True
- name
behavioral_policy_init
policy_algo¶
Algorithm used in policy training. There are currently two algorithms to choose from, ppo and sac.
- type
str
- default
ppo
- name
policy_algo
test_horizon¶
Rollout length of the venv test.
- type
int
- abbreviation
th
- default
100
- name
test_horizon
train_venv_trials¶
Total number of trials searched by the search algorithm in venv training.
- type
int
- abbreviation
tvt
- default
25
- name
train_venv_trials
train_policy_trials¶
Total number of trials searched by the search algorithm in policy training.
- type
int
- abbreviation
tpt
- default
10
- name
train_policy_trials
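Each entry above carries a name, an optional abbreviation, a type, and a default. The sketch below shows one way such entries could be represented and overridden by either full name or abbreviation. The `resolve` helper and the data layout are hypothetical illustrations, not the SDK's loader.

```python
# A few base_config entries from this page, in a hypothetical in-memory form.
BASE_CONFIG = [
    {"name": "global_seed", "abbreviation": "gs", "type": int, "default": 42},
    {"name": "val_split_ratio", "abbreviation": "vsr", "type": float, "default": 0.5},
    {"name": "venv_algo", "type": str, "default": "revive_p"},
    {"name": "train_venv_trials", "abbreviation": "tvt", "type": int, "default": 25},
]


def resolve(entries, overrides):
    """Start from the defaults, then apply overrides keyed by name or abbreviation."""
    values = {e["name"]: e["default"] for e in entries}
    types = {e["name"]: e["type"] for e in entries}
    alias = {e["abbreviation"]: e["name"] for e in entries if "abbreviation" in e}
    for key, raw in overrides.items():
        name = alias.get(key, key)
        if name not in values:
            raise KeyError(f"unknown option: {key}")
        # Cast the raw value to the declared type (int, float, or str here).
        values[name] = types[name](raw)
    return values


settings = resolve(BASE_CONFIG, {"gs": "7", "venv_algo": "bc"})
```

Options without an abbreviation, such as venv_algo, can only be addressed by their full name in this sketch.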
venv_algo_config¶
revive_p¶
revive_batch_size¶
Batch size of training process.
- type
int
- abbreviation
mbs
- default
1024
- name
revive_batch_size
revive_epoch¶
Number of epochs for the training process.
- type
int
- abbreviation
mep
- default
5000
- name
revive_epoch
fintune¶
- type
int
- abbreviation
bet
- default
1
- name
fintune
finetune_fre¶
- type
int
- abbreviation
betfre
- default
1
- name
finetune_fre
policy_backbone¶
Backbone of policy network.
- type
str
- abbreviation
pb
- default
res
- name
policy_backbone
transition_backbone¶
Backbone of the transition network.
- type
str
- abbreviation
tb
- default
res
- name
transition_backbone
g_steps¶
The number of update rounds of the generator in each epoch.
- type
int
- default
1
- name
g_steps
- search_mode
grid
- search_values
1, 3, 5
d_steps¶
Number of update rounds of matcher in each epoch.
- type
int
- default
1
- name
d_steps
- search_mode
grid
- search_values
1, 3, 5
g_lr¶
Initial learning rate of the generator.
- type
float
- default
4e-05
- name
g_lr
- search_mode
continuous
- search_values
1e-06, 0.0001
d_lr¶
Initial learning rate of the matcher.
- type
float
- default
0.0006
- name
d_lr
- search_mode
continuous
- search_values
1e-06, 0.001
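The search_mode and search_values fields of the revive_p options describe a hyperparameter search space: grid lists discrete candidates, and continuous gives a lower and upper bound. A minimal sketch of how such a space could be enumerated and sampled follows. The helper names are hypothetical, and log-uniform sampling of the continuous ranges is an assumption; the sampling distribution and the way trials are budgeted by train_venv_trials are not stated here.

```python
import itertools
import math
import random

# Search spaces copied from the revive_p entries above.
GRID = {"g_steps": [1, 3, 5], "d_steps": [1, 3, 5]}
CONTINUOUS = {"g_lr": (1e-6, 1e-4), "d_lr": (1e-6, 1e-3)}


def grid_points(grid):
    """Yield every combination of the grid-searched values."""
    names = list(grid)
    for combo in itertools.product(*(grid[n] for n in names)):
        yield dict(zip(names, combo))


def sample_continuous(bounds, rng):
    """Draw one value per option, log-uniformly between its bounds (assumed)."""
    return {
        name: math.exp(rng.uniform(math.log(low), math.log(high)))
        for name, (low, high) in bounds.items()
    }


rng = random.Random(42)
trials = [
    {**point, **sample_continuous(CONTINUOUS, rng)} for point in grid_points(GRID)
]
```

Here each of the nine grid combinations is paired with one random draw of the two learning rates.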
bc¶
bc_batch_size¶
Batch size of training process.
- type
int
- abbreviation
bbs
- default
256
- name
bc_batch_size
bc_epoch¶
Number of epochs for the training process.
- type
int
- abbreviation
bep
- default
500
- name
bc_epoch
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type
int
- abbreviation
phf
- default
256
- name
policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type
int
- abbreviation
phl
- default
4
- name
policy_hidden_layers
- search_mode
grid
- search_values
3, 4, 5
policy_backbone¶
Backbone of policy network.
- type
str
- abbreviation
pb
- default
res
- name
policy_backbone
- search_mode
grid
- search_values
mlp, res
g_lr¶
Initial learning rate of the training process.
- type
float
- default
0.0001
- name
g_lr
- search_mode
continuous
- search_values
1e-06, 0.001
loss_type¶
Loss function used by bc. Supported values are “log_prob”, “mae”, and “mse”.
- name
loss_type
- default
log_prob
- type
str
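The three loss names can be illustrated with plain-Python reference implementations. This is a sketch, not the SDK's code: the function names are hypothetical, and the log_prob variant assumes the policy outputs an independent Gaussian (mean and standard deviation) per action dimension.

```python
import math


def mae_loss(pred, target):
    """Mean absolute error between predicted and recorded actions."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mse_loss(pred, target):
    """Mean squared error between predicted and recorded actions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def log_prob_loss(mean, std, target):
    """Negative mean log-likelihood of the targets under assumed Gaussians."""
    total = 0.0
    for m, s, t in zip(mean, std, target):
        total += -0.5 * math.log(2 * math.pi * s * s) - (t - m) ** 2 / (2 * s * s)
    return -total / len(target)


mae = mae_loss([1.0, 3.0], [2.0, 5.0])
mse = mse_loss([1.0, 3.0], [2.0, 5.0])
nll = log_prob_loss([0.0], [1.0], [0.0])
```

mae and mse compare point predictions, while log_prob also uses the predicted spread, so it needs a policy that outputs a distribution.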
policy_algo_config¶
ppo¶
ppo_batch_size¶
Batch size of training process.
- type
int
- abbreviation
pbs
- default
256
- name
ppo_batch_size
ppo_epoch¶
Number of epochs for the training process.
- type
int
- abbreviation
bep
- default
200
- name
ppo_epoch
ppo_rollout_horizon¶
Rollout length used in policy training.
- type
int
- abbreviation
prh
- default
100
- name
ppo_rollout_horizon
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type
int
- abbreviation
phf
- default
256
- name
policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type
int
- abbreviation
phl
- default
4
- name
policy_hidden_layers
policy_backbone¶
Backbone of policy network.
- type
str
- abbreviation
pb
- default
mlp
- name
policy_backbone
g_lr¶
Initial learning rate of the training process.
- type
float
- default
4e-05
- name
g_lr
- search_mode
continuous
- search_values
1e-06, 0.001