revive.conf¶
base_config¶
global_seed¶
Set the random number seed for the experiment.
- type: int
- abbreviation: gs
- default: 42
- name: global_seed
val_split_ratio¶
Ratio used to split off the validation dataset if one is not explicitly given.
- type: float
- abbreviation: vsr
- default: 0.5
- name: val_split_ratio
val_split_mode¶
Mode for automatically splitting the training and validation datasets; choose from outside_traj and inside_traj. outside_traj splits between trajectories, so each trajectory belongs to exactly one dataset. inside_traj splits within each trajectory: the former part goes to the training set and the latter part to the validation set.
- type: str
- abbreviation: vsm
- default: outside_traj
- name: val_split_mode
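The two split modes can be illustrated with a minimal sketch. It assumes that val_split_ratio is the fraction assigned to the validation set and that no shuffling takes place; the function names are hypothetical and not part of the SDK.

```python
from typing import List, Sequence, Tuple

Traj = List[int]  # a trajectory as a list of steps (ints stand in for real data)


def split_outside_traj(trajs: Sequence[Traj], val_ratio: float) -> Tuple[List[Traj], List[Traj]]:
    """Assign each whole trajectory to exactly one of the two datasets."""
    n_val = int(len(trajs) * val_ratio)
    n_train = len(trajs) - n_val
    return list(trajs[:n_train]), list(trajs[n_train:])


def split_inside_traj(trajs: Sequence[Traj], val_ratio: float) -> Tuple[List[Traj], List[Traj]]:
    """Cut every trajectory: former part to training, latter part to validation."""
    train, val = [], []
    for traj in trajs:
        cut = len(traj) - int(len(traj) * val_ratio)
        train.append(traj[:cut])
        val.append(traj[cut:])
    return train, val


trajs = [[0, 1, 2, 3], [10, 11, 12, 13]]
print(split_outside_traj(trajs, 0.5))
print(split_inside_traj(trajs, 0.5))
```

With outside_traj, the two trajectories end up in different datasets; with inside_traj, both datasets contain a piece of every trajectory.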
ignore_check¶
Flag to skip data-related checks and force training.
- type: bool
- abbreviation: igc
- default: False
- name: ignore_check
data_workers¶
Number of workers for the data loader. A larger value can speed up data loading but consumes more resources.
- type: int
- abbreviation: dw
- default: 2
- name: data_workers
use_time_step_embed¶
Flag to use a positional embedding for the time step.
- type: bool
- abbreviation: utse
- default: True
- name: use_time_step_embed
time_step_embed_size¶
Embedding size of the positional embedding for the time step.
- type: int
- abbreviation: tses
- default: 64
- name: time_step_embed_size
use_traj_id_embed¶
Flag to use a binary embedding for the trajectory id.
- type: bool
- abbreviation: utie
- default: True
- name: use_traj_id_embed
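One plausible reading of a "binary embedding" is the bit pattern of the integer trajectory id. The sketch below shows that reading only; the function name, the bit order, and the fixed width are assumptions, and the SDK may encode ids differently.

```python
from typing import List


def binary_embed(traj_id: int, n_bits: int) -> List[float]:
    """Encode a non-negative id as n_bits features, least significant bit first."""
    if traj_id < 0 or traj_id >= 2 ** n_bits:
        raise ValueError("traj_id does not fit into n_bits")
    return [float((traj_id >> i) & 1) for i in range(n_bits)]


print(binary_embed(5, 4))
```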
pre_horzion¶
How many steps of data in the configuration trajectory are used for preprocessing operations.
- type: int
- abbreviation: ph
- default: 0
- name: pre_horzion
venv_rollout_horizon¶
Length of the sampled trajectory; valid only if the algorithm works on sequential data.
- type: int
- abbreviation: vrh
- default: 100
- name: venv_rollout_horizon
venv_gpus_per_worker¶
Number of GPUs per worker in venv training. A value smaller than 1 launches multiple workers on the same GPU.
- type: float
- abbreviation: vgpw
- default: 1.0
- name: venv_gpus_per_worker
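As a rough illustration of the fractional setting, the sketch below estimates how many workers could share one GPU for a given value. It assumes the fraction is treated as a per-worker share of a single device; the helper name is hypothetical.

```python
import math


def workers_per_gpu(gpus_per_worker: float) -> int:
    """Estimate how many workers fit on one GPU for a given per-worker share."""
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    if gpus_per_worker >= 1:
        return 1  # each worker takes at least one whole GPU
    # Small epsilon guards against floating-point error, e.g. 1 / 0.1.
    return math.floor(1.0 / gpus_per_worker + 1e-9)


for share in (1.0, 0.5, 0.25):
    print(share, workers_per_gpu(share))
```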
venv_train_dataset_mode¶
Can be set to trajectory mode or transition mode.
- type: str
- abbreviation: vtdm
- default: transition
- name: venv_train_dataset_mode
venv_metric¶
Metric used to evaluate the trained venv, choose from nll, mae, mse, wdist.
- type: str
- default: mae
- name: venv_metric
venv_algo¶
Algorithm used in venv training. Options include bc and revive_p.
- type: str
- default: revive_p
- name: venv_algo
rollout_plt_frequency¶
Number of steps between two plots of rollout data. 0 disables plotting.
- type: int
- abbreviation: rpf
- default: 50
- name: rollout_plt_frequency
venv_save_frequency¶
Number of epochs between periodic model saves. 0 disables periodic saving.
- type: int
- abbreviation: vsp
- default: 0
- name: venv_save_frequency
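The "0 disables" convention used by rollout_plt_frequency and venv_save_frequency can be expressed as a small predicate. This is a sketch only: the helper name is hypothetical, and whether the SDK counts steps or epochs from 0 or 1 is an assumption.

```python
def is_due(count: int, frequency: int) -> bool:
    """Return True when a periodic action should run at this step or epoch count."""
    # A frequency of 0 disables the action entirely.
    return frequency > 0 and count % frequency == 0


print([epoch for epoch in range(1, 201) if is_due(epoch, 50)])
```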
plt_response_curve¶
Whether to plot response curve at the end of venv training.
- type: bool
- abbreviation: prc
- default: False
- name: plt_response_curve
rollout_dataset_mode¶
Select the rollout dataset. Supports train and validate.
- type: str
- default: validate
- name: rollout_dataset_mode
venv_val_freq¶
Number of epochs between periodic evaluations of the model on the validation dataset.
- type: int
- abbreviation: vvf
- default: 1
- name: venv_val_freq
policy_gpus_per_worker¶
Number of GPUs per worker in policy training. A value smaller than 1 launches multiple workers on the same GPU.
- type: float
- abbreviation: pgpw
- default: 1.0
- name: policy_gpus_per_worker
behavioral_policy_init¶
Whether to use the learned behavioral policy as the initialization for policy training.
- type: bool
- abbreviation: bpi
- default: True
- name: behavioral_policy_init
policy_algo¶
Algorithm used in policy training. There are currently two algorithms to choose from, ppo and sac.
- type: str
- default: ppo
- name: policy_algo
test_horizon¶
Rollout length of the venv test.
- type: int
- abbreviation: th
- default: 100
- name: test_horizon
workers_per_trial¶
Number of workers per trial. Set this greater than 1 only if every GPUs-per-worker setting is 1.0.
- type: int
- abbreviation: wpt
- default: 1
- name: workers_per_trial
train_venv_trials¶
Total number of trials searched by the search algorithm in venv training.
- type: int
- abbreviation: tvt
- default: 25
- name: train_venv_trials
train_policy_trials¶
Total number of trials searched by the search algorithm in policy training.
- type: int
- abbreviation: tpt
- default: 10
- name: train_policy_trials
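Each base_config entry above carries a name, an optional abbreviation, a type, and a default. The sketch below shows one way such entries could be resolved into concrete values, with overrides keyed by either the name or the abbreviation. The resolve helper and the dictionary layout are assumptions made for illustration, not the SDK's loader or file format.

```python
from typing import Any, Dict, List

# A few entries copied from the reference above. "type" holds a Python callable
# used for casting (an illustrative choice, not the SDK's storage format).
BASE_CONFIG: List[Dict[str, Any]] = [
    {"name": "global_seed", "abbreviation": "gs", "type": int, "default": 42},
    {"name": "val_split_ratio", "abbreviation": "vsr", "type": float, "default": 0.5},
    {"name": "val_split_mode", "abbreviation": "vsm", "type": str, "default": "outside_traj"},
    {"name": "train_venv_trials", "abbreviation": "tvt", "type": int, "default": 25},
    {"name": "venv_metric", "type": str, "default": "mae"},  # no abbreviation documented
]


def resolve(entries: List[Dict[str, Any]], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: start from defaults, then apply overrides by name or abbreviation."""
    values = {entry["name"]: entry["default"] for entry in entries}
    lookup: Dict[str, Dict[str, Any]] = {}
    for entry in entries:
        lookup[entry["name"]] = entry
        if "abbreviation" in entry:
            lookup[entry["abbreviation"]] = entry
    for key, raw in overrides.items():
        if key not in lookup:
            raise KeyError(f"unknown parameter: {key}")
        entry = lookup[key]
        values[entry["name"]] = entry["type"](raw)  # cast to the declared type
    return values


config = resolve(BASE_CONFIG, {"tvt": "5", "val_split_ratio": 0.2})
print(config["train_venv_trials"], config["val_split_ratio"], config["global_seed"])
```

Parameters that are not overridden keep their documented defaults.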
venv_algo_config¶
revive_p¶
bc_batch_size¶
- type: int
- abbreviation: bbs
- default: 256
- name: bc_batch_size
bc_epoch¶
- type: int
- abbreviation: bep
- default: 0
- name: bc_epoch
revive_batch_size¶
Batch size of training process.
- type: int
- abbreviation: mbs
- default: 1024
- name: revive_batch_size
revive_epoch¶
Number of epochs for the training process.
- type: int
- abbreviation: mep
- default: 1000
- name: revive_epoch
fintune¶
- type: int
- abbreviation: bet
- default: 1
- name: fintune
finetune_fre¶
- type: int
- abbreviation: betfre
- default: 1
- name: finetune_fre
policy_backbone¶
Backbone of policy network. Supported options: [mlp, res, ft_transformer, lstm, gru].
- type: str
- abbreviation: pb
- default: res
- name: policy_backbone
transition_backbone¶
Backbone of Transition network. Supported options: [mlp, res, ft_transformer, lstm, gru].
- type: str
- abbreviation: tb
- default: res
- name: transition_backbone
g_steps¶
The number of update rounds of the generator in each epoch.
- type: int
- default: 1
- name: g_steps
- search_mode: grid
- search_values: 1,3,5
d_steps¶
Number of update rounds of matcher in each epoch.
- type: int
- default: 1
- name: d_steps
- search_mode: grid
- search_values: 1,3,5
g_lr¶
Initial learning rate of the generator node networks.
- type: float
- default: 4e-05
- name: g_lr
- search_mode: continuous
- search_values: 1e-06,0.0001
d_lr¶
Initial learning rate of the matcher.
- type: float
- default: 0.0006
- name: d_lr
- search_mode: continuous
- search_values: 1e-06,0.001
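Entries such as g_steps, d_steps, g_lr, and d_lr carry a search_mode and search_values. The sketch below shows one way to read them: grid enumerates the listed values, while continuous treats the two values as lower and upper bounds. The log-uniform sampling, the candidates helper, and the dictionary layout are assumptions; the SDK's search algorithm may draw values differently.

```python
import math
import random
from typing import Any, Dict, List


def candidates(entry: Dict[str, Any], rng: random.Random, n_samples: int = 3) -> List[Any]:
    """Hypothetical helper: list the values a search might try for one entry."""
    mode = entry.get("search_mode")
    if mode is None:
        return [entry["default"]]  # not searched: only the default is used
    if mode == "grid":
        return list(entry["search_values"])  # every listed value is a candidate
    if mode == "continuous":
        low, high = entry["search_values"]  # interpreted as lower and upper bounds
        # Assumption: sample log-uniformly, a common choice for learning rates.
        return [math.exp(rng.uniform(math.log(low), math.log(high))) for _ in range(n_samples)]
    raise ValueError(f"unknown search_mode: {mode}")


rng = random.Random(42)
g_steps = {"name": "g_steps", "default": 1, "search_mode": "grid", "search_values": [1, 3, 5]}
d_lr = {"name": "d_lr", "default": 0.0006, "search_mode": "continuous", "search_values": [1e-06, 0.001]}
print(candidates(g_steps, rng))
print(candidates(d_lr, rng))
```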
bc_weight_decay¶
Weight decay used in BC fine-tuning.
- type: float
- default: 0.0001
- name: bc_weight_decay
revive_f¶
revive_batch_size¶
Batch size of training process.
- type: int
- abbreviation: mbs
- default: 1024
- name: revive_batch_size
revive_epoch¶
Number of epochs for the MAIL training process.
- type: int
- abbreviation: mep
- default: 1500
- name: revive_epoch
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type: int
- abbreviation: phf
- default: 256
- name: policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type: int
- abbreviation: phl
- default: 4
- name: policy_hidden_layers
policy_backbone¶
Backbone of policy network.
- type: str
- abbreviation: pb
- default: res
- name: policy_backbone
transition_hidden_features¶
Number of neurons per layer of the transition network.
- type: int
- abbreviation: thf
- default: 256
- name: transition_hidden_features
transition_hidden_layers¶
- type: int
- abbreviation: thl
- default: 4
- name: transition_hidden_layers
transition_backbone¶
Backbone of Transition network.
- type: str
- abbreviation: tb
- default: res
- name: transition_backbone
matcher_hidden_features¶
Number of neurons per layer of the matcher network.
- type: int
- abbreviation: dhf
- default: 256
- name: matcher_hidden_features
matcher_hidden_layers¶
Depth of the matcher network.
- type: int
- abbreviation: dhl
- default: 4
- name: matcher_hidden_layers
g_steps¶
The number of update rounds of the generator in each epoch.
- type: int
- default: 1
- name: g_steps
- search_mode: grid
- search_values: 1,3,5
d_steps¶
Number of update rounds of matcher in each epoch.
- type: int
- default: 1
- name: d_steps
- search_mode: grid
- search_values: 1,3,5
g_lr¶
Initial learning rate of the generator node networks.
- type: float
- default: 4e-05
- name: g_lr
- search_mode: continuous
- search_values: 1e-06,0.0001
d_lr¶
Initial learning rate of the matcher.
- type: float
- default: 0.0006
- name: d_lr
- search_mode: continuous
- search_values: 1e-06,0.001
bc¶
bc_batch_size¶
Batch size of training process.
- type: int
- abbreviation: bbs
- default: 256
- name: bc_batch_size
bc_epoch¶
Number of epochs for the training process.
- type: int
- abbreviation: bep
- default: 500
- name: bc_epoch
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type: int
- abbreviation: phf
- default: 256
- name: policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type: int
- abbreviation: phl
- default: 4
- name: policy_hidden_layers
- search_mode: grid
- search_values: 3,4,5
policy_backbone¶
Backbone of policy network. Supported options: [mlp, res, ft_transformer, lstm, gru].
- type: str
- abbreviation: pb
- default: res
- name: policy_backbone
transition_hidden_features¶
- type: int
- abbreviation: thf
- default: 256
- name: transition_hidden_features
transition_hidden_layers¶
- type: int
- abbreviation: thl
- default: 3
- name: transition_hidden_layers
transition_backbone¶
Backbone of Transition network. Supported options: [mlp, res, ft_transformer, lstm, gru].
- type: str
- abbreviation: tb
- default: res
- name: transition_backbone
g_lr¶
Initial learning rate of the training process.
- type: float
- default: 0.0001
- name: g_lr
- search_mode: continuous
- search_values: 1e-06,0.001
loss_type¶
BC supports different loss functions (“nll”, “mae”, “mse”).
- name: loss_type
- default: nll
- type: str
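For orientation, the three loss names can be written out in plain Python. The sketch assumes that "nll" refers to the negative log-likelihood of a Gaussian prediction with a mean and standard deviation; the function names are illustrative and the SDK's own implementations may differ.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def gaussian_nll(mean: Sequence[float], std: Sequence[float], target: Sequence[float]) -> float:
    """Mean negative log-likelihood under independent Gaussians (an assumed form of nll)."""
    total = 0.0
    for m, s, t in zip(mean, std, target):
        total += 0.5 * math.log(2.0 * math.pi * s * s) + (t - m) ** 2 / (2.0 * s * s)
    return total / len(target)


print(mae([1.0, 2.0], [1.0, 4.0]), mse([1.0, 2.0], [1.0, 4.0]))
print(gaussian_nll([0.0], [1.0], [0.0]))
```

Unlike mae and mse, the nll form needs a predicted spread as well as a point prediction, which is why it takes an extra argument here.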
policy_algo_config¶
ppo¶
ppo_batch_size¶
Batch size of training process.
- type: int
- abbreviation: pbs
- default: 256
- name: ppo_batch_size
policy_bc_epoch¶
Number of epochs for pre-training the policy.
- type: int
- default: 0
- name: policy_bc_epoch
ppo_epoch¶
Number of epochs for the training process.
- type: int
- abbreviation: bep
- default: 1000
- name: ppo_epoch
ppo_rollout_horizon¶
Rollout length used in policy training.
- type: int
- abbreviation: prh
- default: 100
- name: ppo_rollout_horizon
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type: int
- abbreviation: phf
- default: 256
- name: policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type: int
- abbreviation: phl
- default: 4
- name: policy_hidden_layers
policy_backbone¶
Backbone of policy network. Supported options: [mlp, res, ft_transformer].
- type: str
- abbreviation: pb
- default: res
- name: policy_backbone
g_lr¶
Initial learning rate of the training process.
- type: float
- default: 4e-05
- name: g_lr
- search_mode: continuous
- search_values: 1e-06,0.001
sac¶
sac_batch_size¶
Batch size of training process.
- type: int
- abbreviation: pbs
- default: 1024
- name: sac_batch_size
policy_bc_epoch¶
Number of epochs for pre-training the policy.
- type: int
- default: 0
- name: policy_bc_epoch
sac_epoch¶
Number of epochs for the training process.
- type: int
- abbreviation: bep
- default: 1000
- name: sac_epoch
sac_steps_per_epoch¶
The number of update rounds of sac in each epoch.
- type: int
- abbreviation: sspe
- default: 200
- name: sac_steps_per_epoch
sac_rollout_horizon¶
- type: int
- abbreviation: srh
- default: 20
- name: sac_rollout_horizon
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type: int
- abbreviation: phf
- default: 256
- name: policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type: int
- abbreviation: phl
- default: 4
- name: policy_hidden_layers
policy_backbone¶
Backbone of policy network. Supported options: [mlp, res, ft_transformer].
- type: str
- abbreviation: pb
- default: res
- name: policy_backbone
buffer_size¶
Size of the buffer to store data.
- type: int
- abbreviation: bfs
- default: 1000000.0
- name: buffer_size
g_lr¶
Initial learning rate of the training process.
- type: float
- default: 4e-05
- name: g_lr
- search_mode: continuous
- search_values: 1e-06,0.001