revive.conf¶
base_config¶
global_seed¶
Set the random number seed for the experiment.
- type:
int
- abbreviation:
gs
- default:
42
- name:
global_seed
val_split_ratio¶
Ratio used to split off a validation dataset when one is not explicitly given.
- type:
float
- abbreviation:
vsr
- default:
0.5
- name:
val_split_ratio
val_split_mode¶
Mode for automatically splitting the data into training and validation datasets; choose from outside_traj and inside_traj. outside_traj splits between trajectories, so each trajectory belongs to exactly one dataset. inside_traj splits within each trajectory: the earlier part goes to the training set and the later part to the validation set.
- type:
str
- abbreviation:
vsm
- default:
outside_traj
- name:
val_split_mode
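The two split modes can be illustrated with a minimal sketch. The function names and the list-of-lists trajectory representation are hypothetical and not part of REVIVE. The sketch also assumes that the ratio is the fraction of data assigned to validation and that no shuffling takes place; the SDK may behave differently on both points.

```python
def split_outside_traj(trajectories, val_ratio):
    """Assign whole trajectories to either the training or the validation set."""
    n_val = int(len(trajectories) * val_ratio)
    n_train = len(trajectories) - n_val
    return trajectories[:n_train], trajectories[n_train:]


def split_inside_traj(trajectories, val_ratio):
    """Cut every trajectory: earlier steps go to training, later steps to validation."""
    train, val = [], []
    for traj in trajectories:
        cut = len(traj) - int(len(traj) * val_ratio)
        train.append(traj[:cut])
        val.append(traj[cut:])
    return train, val


# Four toy trajectories of four steps each (labels are illustrative only).
trajs = [[f"t{i}s{j}" for j in range(4)] for i in range(4)]
out_train, out_val = split_outside_traj(trajs, 0.5)
in_train, in_val = split_inside_traj(trajs, 0.5)
```

With outside_traj, no trajectory appears in both sets; with inside_traj, every trajectory contributes to both.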
ignore_check¶
Flag to skip data-related checks and force training.
- type:
bool
- abbreviation:
igc
- default:
False
- name:
ignore_check
data_workers¶
Number of workers for the data loader. A larger value can speed up data loading but consumes more resources.
- type:
int
- abbreviation:
dw
- default:
2
- name:
data_workers
use_time_step_embed¶
Flag to use positional embedding for the time step.
- type:
bool
- abbreviation:
utse
- default:
True
- name:
use_time_step_embed
time_step_embed_size¶
Embedding size of the positional embedding for the time step.
- type:
int
- abbreviation:
tses
- default:
64
- name:
time_step_embed_size
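The page does not say which positional embedding is used. As one common possibility, the sketch below shows a sinusoidal embedding of a time step. The function name, the sinusoidal form, and the base of 10000 are all assumptions made for illustration and are not taken from REVIVE.

```python
import math


def time_step_embed(t, size):
    """Sinusoidal embedding of time step t into a vector of even length `size`."""
    half = size // 2
    # Frequencies decay geometrically (assumed base of 10000).
    freqs = [1.0 / (10000 ** (i / half)) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


vec = time_step_embed(3, 64)  # 64 matches the default time_step_embed_size
```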
use_traj_id_embed¶
Flag to use binary embedding for the trajectory id.
- type:
bool
- abbreviation:
utie
- default:
True
- name:
use_traj_id_embed
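One plausible reading of a binary embedding is the bit pattern of the integer trajectory id. The sketch below follows that reading; the function name, the fixed bit width, and the most-significant-bit-first order are assumptions, not REVIVE's documented behavior.

```python
def binary_embed(traj_id, num_bits):
    """Encode a non-negative trajectory id as a list of bits, most significant first."""
    if traj_id < 0 or traj_id >= 2 ** num_bits:
        raise ValueError("traj_id does not fit in num_bits")
    return [(traj_id >> i) & 1 for i in reversed(range(num_bits))]


code = binary_embed(5, 4)
```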
pre_horzion¶
How many steps of data in the configuration trajectory are used for preprocessing operations.
- type:
int
- abbreviation:
ph
- default:
0
- name:
pre_horzion
venv_rollout_horizon¶
Length of the sampled trajectory; valid only if the algorithm works on sequential data.
- type:
int
- abbreviation:
vrh
- default:
100
- name:
venv_rollout_horizon
venv_gpus_per_worker¶
Number of GPUs per worker in venv training; a value smaller than 1 launches multiple workers on the same GPU.
- type:
float
- abbreviation:
vgpw
- default:
1.0
- name:
venv_gpus_per_worker
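As a rough illustration of the fractional setting, the sketch below computes how many workers would fit on one GPU for a given value. The helper name is hypothetical, and the actual placement of workers is decided by the SDK, not by this arithmetic.

```python
import math


def workers_per_gpu(gpus_per_worker):
    """Number of workers that can share a single GPU for a given fraction."""
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    if gpus_per_worker >= 1:
        # Each worker occupies at least one whole GPU.
        return 1
    # A small epsilon guards against floating-point error in the division.
    return math.floor(1 / gpus_per_worker + 1e-9)
```

For example, a value of 0.5 lets two workers share one GPU, and 0.25 lets four share it.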
venv_train_dataset_mode¶
Can be set to trajectory mode or transition mode.
- type:
str
- abbreviation:
vtdm
- default:
transition
- name:
venv_train_dataset_mode
venv_metric¶
Metric used to evaluate the trained venv, choose from nll, mae, mse, wdist.
- type:
str
- default:
mae
- name:
venv_metric
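For reference, the two simplest metrics in this list have the standard definitions sketched below. The function names are illustrative, and how the SDK aggregates over dimensions and time steps is not specified here. nll and wdist are omitted because they depend on the model's output distribution.

```python
def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


pred = [1.0, 2.0, 4.0]
target = [1.0, 3.0, 2.0]
mae_value = mae(pred, target)
mse_value = mse(pred, target)
```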
venv_algo¶
Algorithm used in venv training, such as bc or revive_p.
- type:
str
- default:
revive_p
- name:
venv_algo
rollout_plt_frequency¶
Number of steps between two rollout plots. 0 disables plotting.
- type:
int
- abbreviation:
rpf
- default:
50
- name:
rollout_plt_frequency
venv_save_frequency¶
Number of epochs between periodic model saves. 0 disables periodic saving.
- type:
int
- abbreviation:
vsp
- default:
0
- name:
venv_save_frequency
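The "every N epochs, 0 disables" convention used by the frequency parameters can be sketched as follows. The helper name and the 1-based epoch counting are assumptions made for illustration.

```python
def should_save(epoch, save_frequency):
    """Return True when a periodic save is due at this (1-based) epoch."""
    if save_frequency <= 0:
        # A frequency of 0 disables periodic saving.
        return False
    return epoch % save_frequency == 0


saved_epochs = [e for e in range(1, 11) if should_save(e, 5)]
disabled = [e for e in range(1, 11) if should_save(e, 0)]
```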
plt_response_curve¶
Whether to plot response curve at the end of venv training.
- type:
bool
- abbreviation:
prc
- default:
False
- name:
plt_response_curve
rollout_dataset_mode¶
Select the dataset used for rollout. Supported values are train and validate.
- type:
str
- default:
validate
- name:
rollout_dataset_mode
venv_val_freq¶
Number of epochs between periodic evaluations of the model on the validation dataset.
- type:
int
- abbreviation:
vvf
- default:
1
- name:
venv_val_freq
policy_gpus_per_worker¶
Number of GPUs per worker in policy training; a value smaller than 1 launches multiple workers on the same GPU.
- type:
float
- abbreviation:
pgpw
- default:
1.0
- name:
policy_gpus_per_worker
behavioral_policy_init¶
Whether to use the learned behavioral policy as the initialization for policy training.
- type:
bool
- abbreviation:
bpi
- default:
True
- name:
behavioral_policy_init
policy_algo¶
Algorithm used in policy training. There are currently two algorithms to choose from, ppo and sac.
- type:
str
- default:
ppo
- name:
policy_algo
test_horizon¶
Rollout length of the venv test.
- type:
int
- abbreviation:
th
- default:
100
- name:
test_horizon
workers_per_trial¶
Number of workers per trial; should be set greater than 1 only if the GPUs per worker are all 1.0.
- type:
int
- abbreviation:
wpt
- default:
1
- name:
workers_per_trial
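The constraint above can be expressed as a small validation step. This is a sketch only: the function is hypothetical, and it reads "all" as referring to both venv_gpus_per_worker and policy_gpus_per_worker, which is an interpretation of the description rather than documented behavior.

```python
def check_workers_per_trial(workers_per_trial, venv_gpus_per_worker, policy_gpus_per_worker):
    """Raise if several workers per trial are combined with fractional GPUs."""
    all_full_gpu = venv_gpus_per_worker == 1.0 and policy_gpus_per_worker == 1.0
    if workers_per_trial > 1 and not all_full_gpu:
        raise ValueError(
            "workers_per_trial > 1 requires every gpus-per-worker setting to be 1.0"
        )
    return True
```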
train_venv_trials¶
Total number of trials searched by the search algorithm in venv training.
- type:
int
- abbreviation:
tvt
- default:
25
- name:
train_venv_trials
train_policy_trials¶
Total number of trials searched by the search algorithm in policy training.
- type:
int
- abbreviation:
tpt
- default:
10
- name:
train_policy_trials
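Each parameter on this page carries the same fields (name, abbreviation, type, default). The sketch below shows how such entries might be stored and read back as a name-to-default mapping. The JSON serialization, the list-of-records layout, and the helper name are assumptions for illustration; consult the actual configuration file shipped with the SDK for its real structure.

```python
import json

# Hypothetical serialization of two base_config entries from this page.
config_text = json.dumps({
    "base_config": [
        {"name": "global_seed", "abbreviation": "gs", "type": "int", "default": 42},
        {"name": "val_split_ratio", "abbreviation": "vsr", "type": "float", "default": 0.5},
    ]
})


def load_defaults(text, section):
    """Map each parameter name in a section to its default value."""
    entries = json.loads(text)[section]
    return {entry["name"]: entry["default"] for entry in entries}


defaults = load_defaults(config_text, "base_config")
```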
venv_algo_config¶
revive_p¶
bc_batch_size¶
- type:
int
- abbreviation:
bbs
- default:
256
- name:
bc_batch_size
bc_epoch¶
- type:
int
- abbreviation:
bep
- default:
0
- name:
bc_epoch
revive_batch_size¶
Batch size of training process.
- type:
int
- abbreviation:
mbs
- default:
1024
- name:
revive_batch_size
revive_epoch¶
Number of epochs for the training process.
- type:
int
- abbreviation:
mep
- default:
1000
- name:
revive_epoch
fintune¶
- type:
int
- abbreviation:
bet
- default:
1
- name:
fintune
finetune_fre¶
- type:
int
- abbreviation:
betfre
- default:
1
- name:
finetune_fre
policy_backbone¶
Backbone of policy network. Support selecting from [mlp, res, ft_transformer, lstm, gru].
- type:
str
- abbreviation:
pb
- default:
res
- name:
policy_backbone
transition_backbone¶
Backbone of Transition network. Support selecting from [mlp, res, ft_transformer, lstm, gru].
- type:
str
- abbreviation:
tb
- default:
res
- name:
transition_backbone
g_steps¶
The number of update rounds of the generator in each epoch.
- type:
int
- default:
1
- name:
g_steps
- search_mode:
grid
- search_values:
1, 3, 5
d_steps¶
Number of update rounds of matcher in each epoch.
- type:
int
- default:
1
- name:
d_steps
- search_mode:
grid
- search_values:
1, 3, 5
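The roles of g_steps and d_steps can be pictured with a minimal epoch loop. The function name, the callback arguments, and the generator-before-matcher order are assumptions; the sketch shows only how the two counts control the number of update rounds.

```python
def run_epoch(g_steps, d_steps, update_generator, update_matcher):
    """Run one epoch with g_steps generator updates and d_steps matcher updates."""
    for _ in range(g_steps):
        update_generator()
    for _ in range(d_steps):
        update_matcher()


# Record the order of calls instead of training real networks.
calls = []
run_epoch(3, 1, lambda: calls.append("g"), lambda: calls.append("d"))
```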
g_lr¶
Initial learning rate of the generator node networks.
- type:
float
- default:
4e-05
- name:
g_lr
- search_mode:
continuous
- search_values:
1e-06, 0.0001
d_lr¶
Initial learning rate of the matcher.
- type:
float
- default:
0.0006
- name:
d_lr
- search_mode:
continuous
- search_values:
1e-06, 0.001
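The search_mode and search_values fields describe a search space for the hyperparameter. The sketch below shows one way to turn an entry into candidate values: grid enumerates the listed values, and continuous draws from the interval between the two bounds. The function name is hypothetical, and uniform sampling is an assumption; the SDK's search algorithm may sample differently (for example on a log scale).

```python
import random


def candidates(entry, rng, n=5):
    """Return candidate values for one parameter entry."""
    mode = entry.get("search_mode")
    values = entry.get("search_values")
    if mode == "grid":
        # Grid search tries every listed value.
        return list(values)
    if mode == "continuous":
        low, high = values
        # Assumed uniform sampling between the two bounds.
        return [rng.uniform(low, high) for _ in range(n)]
    # Entries without a search space keep their default.
    return [entry["default"]]


rng = random.Random(0)
g_steps = {"name": "g_steps", "default": 1, "search_mode": "grid", "search_values": [1, 3, 5]}
g_lr = {"name": "g_lr", "default": 4e-05, "search_mode": "continuous", "search_values": [1e-06, 0.0001]}
grid_values = candidates(g_steps, rng)
lr_values = candidates(g_lr, rng)
```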
bc_weight_decay¶
Weight decay used in bc finetuning.
- type:
float
- default:
0.0001
- name:
bc_weight_decay
revive_f¶
revive_batch_size¶
Batch size of training process.
- type:
int
- abbreviation:
mbs
- default:
1024
- name:
revive_batch_size
revive_epoch¶
Number of epochs for the MAIL training process.
- type:
int
- abbreviation:
mep
- default:
1500
- name:
revive_epoch
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type:
int
- abbreviation:
phf
- default:
256
- name:
policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type:
int
- abbreviation:
phl
- default:
4
- name:
policy_hidden_layers
policy_backbone¶
Backbone of policy network.
- type:
str
- abbreviation:
pb
- default:
res
- name:
policy_backbone
transition_hidden_features¶
Number of neurons per layer of the transition network.
- type:
int
- abbreviation:
thf
- default:
256
- name:
transition_hidden_features
transition_hidden_layers¶
- type:
int
- abbreviation:
thl
- default:
4
- name:
transition_hidden_layers
transition_backbone¶
Backbone of Transition network.
- type:
str
- abbreviation:
tb
- default:
res
- name:
transition_backbone
matcher_hidden_features¶
Number of neurons per layer of the matcher network.
- type:
int
- abbreviation:
dhf
- default:
256
- name:
matcher_hidden_features
matcher_hidden_layers¶
Depth of the matcher network.
- type:
int
- abbreviation:
dhl
- default:
4
- name:
matcher_hidden_layers
g_steps¶
The number of update rounds of the generator in each epoch.
- type:
int
- default:
1
- name:
g_steps
- search_mode:
grid
- search_values:
1, 3, 5
d_steps¶
Number of update rounds of matcher in each epoch.
- type:
int
- default:
1
- name:
d_steps
- search_mode:
grid
- search_values:
1, 3, 5
g_lr¶
Initial learning rate of the generator node networks.
- type:
float
- default:
4e-05
- name:
g_lr
- search_mode:
continuous
- search_values:
1e-06, 0.0001
d_lr¶
Initial learning rate of the matcher.
- type:
float
- default:
0.0006
- name:
d_lr
- search_mode:
continuous
- search_values:
1e-06, 0.001
bc¶
bc_batch_size¶
Batch size of training process.
- type:
int
- abbreviation:
bbs
- default:
256
- name:
bc_batch_size
bc_epoch¶
Number of epochs for the training process.
- type:
int
- abbreviation:
bep
- default:
500
- name:
bc_epoch
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type:
int
- abbreviation:
phf
- default:
256
- name:
policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type:
int
- abbreviation:
phl
- default:
4
- name:
policy_hidden_layers
- search_mode:
grid
- search_values:
3, 4, 5
policy_backbone¶
Backbone of policy network. Support selecting from [mlp, res, ft_transformer, lstm, gru].
- type:
str
- abbreviation:
pb
- default:
res
- name:
policy_backbone
transition_hidden_features¶
- type:
int
- abbreviation:
thf
- default:
256
- name:
transition_hidden_features
transition_hidden_layers¶
- type:
int
- abbreviation:
thl
- default:
3
- name:
transition_hidden_layers
transition_backbone¶
Backbone of Transition network. Support selecting from [mlp, res, ft_transformer, lstm, gru].
- type:
str
- abbreviation:
tb
- default:
res
- name:
transition_backbone
g_lr¶
Initial learning rate of the training process.
- type:
float
- default:
0.0001
- name:
g_lr
- search_mode:
continuous
- search_values:
1e-06, 0.001
loss_type¶
bc supports different loss functions ("nll", "mae", "mse").
- name:
loss_type
- default:
nll
- type:
str
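To make the three options concrete, the sketch below gives per-sample versions of each loss and a lookup by name. The function names and the dispatch table are hypothetical, and the nll variant assumes a Gaussian output with a predicted mean and standard deviation, which this page does not confirm.

```python
import math


def gaussian_nll(mean, std, target):
    """Negative log-likelihood of target under a normal distribution (assumed form)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (target - mean) ** 2 / (2 * std ** 2)


def absolute_error(mean, std, target):
    """Per-sample absolute error; std is unused and kept for a uniform signature."""
    return abs(target - mean)


def squared_error(mean, std, target):
    """Per-sample squared error; std is unused and kept for a uniform signature."""
    return (target - mean) ** 2


# Hypothetical lookup keyed by the loss_type value.
LOSSES = {"nll": gaussian_nll, "mae": absolute_error, "mse": squared_error}

loss_fn = LOSSES["nll"]  # "nll" is the documented default
value = loss_fn(0.0, 1.0, 0.0)
```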
policy_algo_config¶
ppo¶
ppo_batch_size¶
Batch size of training process.
- type:
int
- abbreviation:
pbs
- default:
256
- name:
ppo_batch_size
policy_bc_epoch¶
Number of epochs used to pre-train the policy.
- type:
int
- default:
0
- name:
policy_bc_epoch
ppo_epoch¶
Number of epochs for the training process.
- type:
int
- abbreviation:
bep
- default:
1000
- name:
ppo_epoch
ppo_rollout_horizon¶
Rollout length used in policy training.
- type:
int
- abbreviation:
prh
- default:
100
- name:
ppo_rollout_horizon
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type:
int
- abbreviation:
phf
- default:
256
- name:
policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type:
int
- abbreviation:
phl
- default:
4
- name:
policy_hidden_layers
policy_backbone¶
Backbone of policy network. Support selecting from [mlp, res, ft_transformer].
- type:
str
- abbreviation:
pb
- default:
res
- name:
policy_backbone
g_lr¶
Initial learning rate of the training process.
- type:
float
- default:
4e-05
- name:
g_lr
- search_mode:
continuous
- search_values:
1e-06, 0.001
sac¶
sac_batch_size¶
Batch size of training process.
- type:
int
- abbreviation:
pbs
- default:
1024
- name:
sac_batch_size
policy_bc_epoch¶
Number of epochs used to pre-train the policy.
- type:
int
- default:
0
- name:
policy_bc_epoch
sac_epoch¶
Number of epochs for the training process.
- type:
int
- abbreviation:
bep
- default:
1000
- name:
sac_epoch
sac_steps_per_epoch¶
The number of update rounds of sac in each epoch.
- type:
int
- abbreviation:
sspe
- default:
200
- name:
sac_steps_per_epoch
sac_rollout_horizon¶
- type:
int
- abbreviation:
srh
- default:
20
- name:
sac_rollout_horizon
policy_hidden_features¶
Number of neurons per layer of the policy network.
- type:
int
- abbreviation:
phf
- default:
256
- name:
policy_hidden_features
policy_hidden_layers¶
Depth of policy network.
- type:
int
- abbreviation:
phl
- default:
4
- name:
policy_hidden_layers
policy_backbone¶
Backbone of policy network. Support selecting from [mlp, res, ft_transformer].
- type:
str
- abbreviation:
pb
- default:
res
- name:
policy_backbone
buffer_size¶
Size of the buffer to store data.
- type:
int
- abbreviation:
bfs
- default:
1000000.0
- name:
buffer_size
g_lr¶
Initial learning rate of the training process.
- type:
float
- default:
4e-05
- name:
g_lr
- search_mode:
continuous
- search_values:
1e-06, 0.001