revive.conf
base_config
global_seed
Set the random number seed for the experiment.
- type: int
- abbreviation: gs
- default: 42
- name: global_seed
val_split_ratio
Ratio used to split off the validation dataset when it is not explicitly given.
- type: float
- abbreviation: vsr
- default: 0.5
- name: val_split_ratio
val_split_mode
Mode for automatically splitting the training and validation datasets; choose from outside_traj and inside_traj. outside_traj splits between trajectories, so each trajectory belongs to exactly one dataset. inside_traj splits within each trajectory: the former part goes to the training set and the later part to the validation set.
- type: str
- abbreviation: vsm
- default: outside_traj
- name: val_split_mode
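The two split modes can be sketched as follows. This is only an illustration of the descriptions above, not REVIVE's implementation: the function names are hypothetical, the ratio is assumed to be the fraction that goes to the validation set, and no shuffling is done.

```python
from typing import List, Sequence, Tuple

Trajectory = List[int]  # stand-in for the ordered steps of one trajectory


def split_outside_traj(
    trajs: Sequence[Trajectory], val_ratio: float
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Assign whole trajectories to either the training or the validation set."""
    n_val = int(len(trajs) * val_ratio)  # assumption: ratio = validation fraction
    n_train = len(trajs) - n_val
    return list(trajs[:n_train]), list(trajs[n_train:])


def split_inside_traj(
    trajs: Sequence[Trajectory], val_ratio: float
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Cut every trajectory: former part for training, later part for validation."""
    train, val = [], []
    for traj in trajs:
        cut = len(traj) - int(len(traj) * val_ratio)
        train.append(traj[:cut])
        val.append(traj[cut:])
    return train, val


# Four toy trajectories of four steps each.
trajs = [list(range(4)), list(range(10, 14)), list(range(20, 24)), list(range(30, 34))]
train_out, val_out = split_outside_traj(trajs, 0.5)
train_in, val_in = split_inside_traj(trajs, 0.5)
```

With outside_traj every trajectory stays intact in one set; with inside_traj both sets contain a piece of every trajectory.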
ignore_check
Flag to ignore data-related checks and force training.
- type: bool
- abbreviation: igc
- default: False
- name: ignore_check
venv_rollout_horizon
Length of the sampled trajectory; valid only if the algorithm works on sequential data.
- type: int
- abbreviation: vrh
- default: 100
- name: venv_rollout_horizon
venv_gpus_per_worker
Number of GPUs per worker in venv training. A value smaller than 1 launches multiple workers on the same GPU.
- type: float
- abbreviation: vgpw
- default: 1.0
- name: venv_gpus_per_worker
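As a rough illustration of the fractional setting, the sketch below computes how many workers would share one GPU for a given value. The helper name is hypothetical and the arithmetic is the usual reading of fractional GPU requests; how REVIVE actually schedules workers is not stated here.

```python
import math


def workers_sharing_one_gpu(gpus_per_worker: float) -> int:
    """Number of workers that fit on a single GPU for a fractional request."""
    if not 0.0 < gpus_per_worker <= 1.0:
        raise ValueError("this sketch only covers fractions in (0, 1]")
    # The small epsilon guards against floating point round-off in the division.
    return math.floor(1.0 / gpus_per_worker + 1e-9)


full = workers_sharing_one_gpu(1.0)      # one worker owns the whole GPU
half = workers_sharing_one_gpu(0.5)      # two workers share the GPU
quarter = workers_sharing_one_gpu(0.25)  # four workers share the GPU
```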
venv_metric
Metric used to evaluate the trained venv, choose from nll, mae, mse, wdist.
- type: str
- default: mae
- name: venv_metric
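For reference, the two simplest metrics in this list can be written out as below. This is a generic sketch of mean absolute error and mean squared error over plain Python lists, not the code REVIVE uses; nll and wdist are left out because they depend on the model's predicted distribution.

```python
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


pred = [1.0, 2.0, 4.0]
target = [1.0, 3.0, 2.0]
mae_value = mae(pred, target)  # errors are 0, 1, 2
mse_value = mse(pred, target)  # squared errors are 0, 1, 4
```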
venv_algo
Algorithm used in venv training. Choose from bc and revive_p.
- type: str
- default: revive_p
- name: venv_algo
rollout_plt_frequency
Number of steps between two consecutive plots of rollout data. 0 disables plotting.
- type: int
- abbreviation: rpf
- default: 50
- name: rollout_plt_frequency
rollout_dataset_mode
Select the rollout dataset. Supports train and validate.
- type: str
- default: validate
- name: rollout_dataset_mode
policy_gpus_per_worker
Number of GPUs per worker in policy training. A value smaller than 1 launches multiple workers on the same GPU.
- type: float
- abbreviation: pgpw
- default: 1.0
- name: policy_gpus_per_worker
behavioral_policy_init
Whether to use the learned behavioral policy as the initialization for policy training.
- type: bool
- abbreviation: bpi
- default: True
- name: behavioral_policy_init
policy_algo
Algorithm used in policy training. There are currently two algorithms to choose from, ppo and sac.
- type: str
- default: ppo
- name: policy_algo
test_horizon
Rollout length of the venv test.
- type: int
- abbreviation: th
- default: 100
- name: test_horizon
train_venv_trials
Total number of trials searched by the search algorithm in venv training.
- type: int
- abbreviation: tvt
- default: 25
- name: train_venv_trials
train_policy_trials
Total number of trials searched by the search algorithm in policy training.
- type: int
- abbreviation: tpt
- default: 10
- name: train_policy_trials
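Each entry in this section carries a name, a type, a default and, for most entries, an abbreviation. The sketch below shows one way such entries could be resolved into a flat dictionary of settings with overrides applied. The `resolve` helper, the in-memory layout, and the treatment of the abbreviation as an alias for the name are all assumptions made for illustration; they are not REVIVE's loader.

```python
from typing import Any, Dict, List

# Three entries copied from the listing above, in a hypothetical in-memory form.
BASE_CONFIG: List[Dict[str, Any]] = [
    {"name": "global_seed", "abbreviation": "gs", "type": int, "default": 42},
    {"name": "val_split_ratio", "abbreviation": "vsr", "type": float, "default": 0.5},
    {"name": "venv_metric", "type": str, "default": "mae"},  # no abbreviation listed
]


def resolve(entries: List[Dict[str, Any]], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the defaults, then apply overrides given by name or abbreviation."""
    values = {e["name"]: e["default"] for e in entries}
    casts = {e["name"]: e["type"] for e in entries}
    alias = {e["name"]: e["name"] for e in entries}
    for e in entries:
        if "abbreviation" in e:
            alias[e["abbreviation"]] = e["name"]  # assumption: abbreviation is an alias
    for key, raw in overrides.items():
        if key not in alias:
            raise KeyError(f"unknown parameter: {key}")
        name = alias[key]
        values[name] = casts[name](raw)  # cast to the declared type
    return values


resolved = resolve(BASE_CONFIG, {"gs": "7", "venv_metric": "mse"})
```

Here `global_seed` is overridden through its abbreviation and cast to int, `venv_metric` is overridden by name, and `val_split_ratio` keeps its default.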
venv_algo_config
revive_p
revive_batch_size
Batch size of training process.
- type: int
- abbreviation: mbs
- default: 1024
- name: revive_batch_size
revive_epoch
Number of epochs for the training process.
- type: int
- abbreviation: mep
- default: 5000
- name: revive_epoch
fintune
- type: int
- abbreviation: bet
- default: 1
- name: fintune
finetune_fre
- type: int
- abbreviation: betfre
- default: 1
- name: finetune_fre
policy_backbone
Backbone of policy network.
- type: str
- abbreviation: pb
- default: res
- name: policy_backbone
transition_backbone
Backbone of the transition network.
- type: str
- abbreviation: tb
- default: res
- name: transition_backbone
g_steps
The number of update rounds of the generator in each epoch.
- type: int
- default: 1
- name: g_steps
- search_mode: grid
- search_values: 1,3,5
d_steps
Number of update rounds of matcher in each epoch.
- type: int
- default: 1
- name: d_steps
- search_mode: grid
- search_values: 1,3,5
g_lr
Initial learning rate of the generator.
- type: float
- default: 4e-05
- name: g_lr
- search_mode: continuous
- search_values: 1e-06,0.0001
d_lr
Initial learning rate of the matcher.
- type: float
- default: 0.0006
- name: d_lr
- search_mode: continuous
- search_values: 1e-06,0.001
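The search_mode and search_values fields above describe a hyperparameter search space. The sketch below shows one plausible reading: grid lists discrete candidates, and continuous gives a lower and an upper bound. The helper names are hypothetical, and drawing uniformly between the bounds is an assumption; the sampling distribution that REVIVE's search algorithm actually uses is not stated here.

```python
import itertools
import random
from typing import Callable, List


def grid_candidates(search_values: str, cast: Callable[[str], object]) -> List[object]:
    """Parse a comma-separated grid such as "1,3,5" into typed candidates."""
    return [cast(v) for v in search_values.split(",")]


def sample_continuous(search_values: str, rng: random.Random) -> float:
    """Draw one value between the two bounds (uniform draw is an assumption)."""
    low, high = (float(v) for v in search_values.split(","))
    return rng.uniform(low, high)


g_steps_grid = grid_candidates("1,3,5", int)
d_steps_grid = grid_candidates("1,3,5", int)
# Every (g_steps, d_steps) pair a full grid search would visit.
step_pairs = list(itertools.product(g_steps_grid, d_steps_grid))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
g_lr_sample = sample_continuous("1e-06,0.0001", rng)
```

Under this reading, the two grid parameters alone yield nine combinations, while each continuous parameter contributes one sampled value per trial.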
bc
bc_batch_size
Batch size of training process.
- type: int
- abbreviation: bbs
- default: 256
- name: bc_batch_size
bc_epoch
Number of epochs for the training process.
- type: int
- abbreviation: bep
- default: 500
- name: bc_epoch
policy_hidden_features
Number of neurons per layer of the policy network.
- type: int
- abbreviation: phf
- default: 256
- name: policy_hidden_features
policy_hidden_layers
Depth of policy network.
- type: int
- abbreviation: phl
- default: 4
- name: policy_hidden_layers
- search_mode: grid
- search_values: 3,4,5
policy_backbone
Backbone of policy network.
- type: str
- abbreviation: pb
- default: res
- name: policy_backbone
- search_mode: grid
- search_values: mlp,res
g_lr
Initial learning rate of the training process.
- type: float
- default: 0.0001
- name: g_lr
- search_mode: continuous
- search_values: 1e-06,0.001
loss_type
BC supports different loss functions (“log_prob”, “mae”, “mse”).
- name: loss_type
- default: log_prob
- type: str
policy_algo_config
ppo
ppo_batch_size
Batch size of training process.
- type: int
- abbreviation: pbs
- default: 256
- name: ppo_batch_size
ppo_epoch
Number of epochs for the training process.
- type: int
- abbreviation: bep
- default: 200
- name: ppo_epoch
ppo_rollout_horizon
Rollout length during policy training.
- type: int
- abbreviation: prh
- default: 100
- name: ppo_rollout_horizon
policy_hidden_features
Number of neurons per layer of the policy network.
- type: int
- abbreviation: phf
- default: 256
- name: policy_hidden_features
policy_hidden_layers
Depth of policy network.
- type: int
- abbreviation: phl
- default: 4
- name: policy_hidden_layers
policy_backbone
Backbone of policy network.
- type: str
- abbreviation: pb
- default: mlp
- name: policy_backbone
g_lr
Initial learning rate of the training process.
- type: float
- default: 4e-05
- name: g_lr
- search_mode: continuous
- search_values: 1e-06,0.001