revive.algo.venv package¶
Submodules¶
revive.algo.venv.base module¶
- class revive.algo.venv.base.VenvOperator(*args, **kwargs)[source]¶
Bases:
object
The base venv class.
- NAME = None¶
Name of the used algorithm.
- property metric_name¶
This defines the metric that the hyperparameter search tries to minimize.
- property nodes_models_train¶
- property other_models_train¶
- property nodes_models_val¶
- property other_models_val¶
- PARAMETER_DESCRIPTION = []¶
- classmethod get_tune_parameters(config: dict, **kargs)[source]¶
Use ray.tune to wrap the parameters to be searched.
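As a rough illustration of what this wrapping could involve, the sketch below scans entries shaped like those in PARAMETER_DESCRIPTION and collects the ones that declare a search_mode. The helper name build_search_space and its tagged-tuple output are assumptions made for this example, not REVIVE's implementation. With ray installed, one would typically map 'grid' to tune.grid_search(values) and 'continuous' to a range sampler such as tune.uniform(low, high).

```python
def build_search_space(parameter_description):
    """Collect searchable parameters from PARAMETER_DESCRIPTION-style dicts.

    Hypothetical helper: returns {name: (mode, values)} without touching ray.
    """
    space = {}
    for entry in parameter_description:
        mode = entry.get("search_mode")
        if mode is None:
            continue  # no search declared, the parameter keeps its default
        values = entry["search_values"]
        if mode == "grid":
            # Discrete candidates, e.g. [3, 4, 5].
            space[entry["name"]] = ("grid", list(values))
        elif mode == "continuous":
            # A [low, high] range to sample from.
            low, high = values
            space[entry["name"]] = ("continuous", (low, high))
        else:
            raise ValueError(f"unknown search_mode: {mode!r}")
    return space


# Entries shaped like those in BCOperator.PARAMETER_DESCRIPTION.
example = [
    {"name": "bc_batch_size", "default": 256, "type": int},
    {"name": "policy_hidden_layers", "default": 4, "type": int,
     "search_mode": "grid", "search_values": [3, 4, 5]},
    {"name": "g_lr", "default": 1e-4, "type": float,
     "search_mode": "continuous", "search_values": [1e-6, 1e-3]},
]
space = build_search_space(example)
print(space)
```

Only parameters that declare search_mode end up in the search space; everything else would be passed through with its default value.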
- model_creator(config: dict, graph: DesicionGraph)[source]¶
Create all the models. The algorithm needs to define models for the nodes to be learned.
- Args:
- config:
configuration parameters
- Return:
a list of models
- optimizer_creator(models: List[Module], config: dict)[source]¶
Define optimizers for the created models.
- Args:
- models:
list of all the models
- config:
configuration parameters
- Return:
a list of optimizers
- data_creator()[source]¶
Create DataLoaders.
- Args:
- config:
configuration parameters
- Return:
(train_loader, val_loader)
- train_batch(expert_data, batch_info, scope='train')[source]¶
Define the training process for a batch of data.
- validate_batch(expert_data, batch_info, scope='valEnv_on_trainData', loss_mask=None)[source]¶
Define the validation process for a batch of data.
- Args:
expert_data: The batch of offline data.
batch_info: A batch info dict.
scope: if scope='valEnv_on_trainData', the training data is tested on the model trained on the validation dataset.
revive.algo.venv.bc module¶
- class revive.algo.venv.bc.BCOperator(*args, **kwargs)[source]¶
Bases:
VenvOperator
- NAME = 'REVIVE_VENV'¶
Name of the used algorithm.
- PARAMETER_DESCRIPTION = [{'abbreviation': 'bbs', 'default': 256, 'description': 'Batch size of training process.', 'doc': True, 'name': 'bc_batch_size', 'type': <class 'int'>}, {'abbreviation': 'bep', 'default': 500, 'description': 'Number of epcoh for the training process', 'doc': True, 'name': 'bc_epoch', 'type': <class 'int'>}, {'abbreviation': 'bh', 'default': 10, 'name': 'bc_horizon', 'type': <class 'int'>}, {'abbreviation': 'phf', 'default': 256, 'description': 'Number of neurons per layer of the policy network.', 'doc': True, 'name': 'policy_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'phl', 'default': 4, 'description': 'Depth of policy network.', 'doc': True, 'name': 'policy_hidden_layers', 'search_mode': 'grid', 'search_values': [3, 4, 5], 'type': <class 'int'>}, {'abbreviation': 'pa', 'default': 'leakyrelu', 'name': 'policy_activation', 'type': <class 'str'>}, {'abbreviation': 'pn', 'default': None, 'name': 'policy_normalization', 'type': <class 'str'>}, {'abbreviation': 'pb', 'default': 'res', 'description': 'Backbone of policy network. Support selecting from [mlp, res, ft_transformer, lstm, gru].', 'doc': True, 'name': 'policy_backbone', 'type': <class 'str'>}, {'abbreviation': 'thf', 'default': 256, 'doc': True, 'name': 'transition_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'thl', 'default': 3, 'doc': True, 'name': 'transition_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'ta', 'default': 'leakyrelu', 'name': 'transition_activation', 'type': <class 'str'>}, {'abbreviation': 'tn', 'default': None, 'name': 'transition_normalization', 'type': <class 'str'>}, {'abbreviation': 'tb', 'default': 'res', 'description': 'Backbone of Transition network. 
Support selecting from [mlp, res, ft_transformer, lstm, gru].', 'doc': True, 'name': 'transition_backbone', 'type': <class 'str'>}, {'default': 0.0001, 'description': 'Initial learning rate of the training process.', 'doc': True, 'name': 'g_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'type': <class 'float'>}, {'abbreviation': 'wd', 'default': 0.0001, 'name': 'weight_decay', 'type': <class 'float'>}, {'abbreviation': 'ld', 'default': 0.99, 'name': 'lr_decay', 'type': <class 'float'>}, {'default': 'nll', 'description': 'Bc support different loss function("nll", "mae", "mse").', 'doc': True, 'name': 'loss_type', 'type': <class 'str'>}, {'default': 5e-05, 'name': 'bc_l2_coef', 'type': <class 'float'>}, {'default': 0.01, 'name': 'logstd_loss_coef', 'type': <class 'float'>}]¶
- model_creator(config: dict, graph: DesicionGraph)[source]¶
Create policies and transition, if needed.
- Parameters:
config – configuration parameters
- Returns:
list of all models
- optimizer_creator(models, config)[source]¶
Define optimizers for the created models.
- Args:
- models:
list of all the models
- config:
configuration parameters
- Return:
a list of optimizers
- data_creator()[source]¶
Create DataLoaders.
- Args:
- config:
configuration parameters
- Return:
(train_loader, val_loader)
- train_epoch(*args, **kwargs)¶
revive.algo.venv.revive module¶
- class revive.algo.venv.revive.ReviveOperator(*args, **kwargs)[source]¶
Bases:
VenvOperator
- NAME = 'REVIVE'¶
Name of the used algorithm.
- matcher_model_creator(config, graph)[source]¶
Create matcher models.
- Parameters:
config – configuration parameters
- Returns:
all the models.
- model_creator(config, graph)[source]¶
Create all the models. The algorithm needs to define models for the nodes to be learned.
- Args:
- config:
configuration parameters
- Return:
a list of models
- data_creator()[source]¶
Create DataLoaders.
- Args:
- config:
configuration parameters
- Return:
(train_loader, val_loader)
- bc_train_batch(expert_data, batch_info, scope='train', loss_type=None, dataset_mode=None, loss_mask=None)[source]¶
- train_epoch(*args, **kwargs)¶
revive.algo.venv.revive_f module¶
- class revive.algo.venv.revive_f.FILTEROperator(*args, **kwargs)[source]¶
Bases:
PPOOperator
- NAME = 'REVIVE_VENV'¶
Name of the used algorithm.
- PARAMETER_DESCRIPTION = [{'abbreviation': 'bep', 'default': 1500, 'name': 'bc_epoch', 'type': <class 'int'>}, {'default': 0.001, 'name': 'bc_lr', 'type': <class 'float'>}, {'default': 1, 'name': 'bc_steps', 'type': <class 'int'>}, {'abbreviation': 'mbs', 'default': 1024, 'description': 'Batch size of training process.', 'doc': True, 'name': 'revive_batch_size', 'type': <class 'int'>}, {'abbreviation': 'mep', 'default': 1500, 'description': 'Number of epcoh for the MAIL training process', 'doc': True, 'name': 'revive_epoch', 'type': <class 'int'>}, {'abbreviation': 'dpe', 'default': 0, 'name': 'matcher_pretrain_epoch', 'type': <class 'int'>}, {'abbreviation': 'phf', 'default': 256, 'description': 'Number of neurons per layer of the policy network.', 'doc': True, 'name': 'policy_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'phl', 'default': 4, 'description': 'Depth of policy network.', 'doc': True, 'name': 'policy_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'pa', 'default': 'leakyrelu', 'name': 'policy_activation', 'type': <class 'str'>}, {'abbreviation': 'pn', 'default': None, 'name': 'policy_normalization', 'type': <class 'str'>}, {'abbreviation': 'pb', 'default': 'res', 'description': 'Backbone of policy network.', 'doc': True, 'name': 'policy_backbone', 'type': <class 'str'>}, {'abbreviation': 'thf', 'default': 256, 'description': 'Number of neurons per layer of the transition network.', 'doc': True, 'name': 'transition_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'thl', 'default': 4, 'doc': True, 'name': 'transition_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'ta', 'default': 'leakyrelu', 'name': 'transition_activation', 'type': <class 'str'>}, {'abbreviation': 'tn', 'default': None, 'name': 'transition_normalization', 'type': <class 'str'>}, {'abbreviation': 'tb', 'default': 'res', 'description': 'Backbone of Transition network.', 'doc': True, 'name': 'transition_backbone', 'type': <class 'str'>}, 
{'default': 'auto', 'name': 'matching_nodes', 'type': <class 'list'>}, {'default': 'auto', 'name': 'matching_fit_nodes', 'type': <class 'list'>}, {'abbreviation': 'dhf', 'default': 256, 'description': 'Number of neurons per layer of the matcher network.', 'doc': True, 'name': 'matcher_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'dhl', 'default': 4, 'description': 'Depth of the matcher network.', 'doc': True, 'name': 'matcher_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'da', 'default': 'leakyrelu', 'name': 'matcher_activation', 'type': <class 'str'>}, {'abbreviation': 'dn', 'default': None, 'name': 'matcher_normalization', 'type': <class 'str'>}, {'default': 'auto', 'name': 'state_nodes', 'type': <class 'list'>}, {'abbreviation': 'vhf', 'default': 256, 'name': 'value_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'vhl', 'default': 4, 'name': 'value_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'va', 'default': 'leakyrelu', 'name': 'value_activation', 'type': <class 'str'>}, {'abbreviation': 'vn', 'default': None, 'name': 'value_normalization', 'type': <class 'str'>}, {'abbreviation': 'gt', 'default': 'res', 'name': 'generator_type', 'type': <class 'str'>}, {'abbreviation': 'dt', 'default': 'res', 'name': 'matcher_type', 'type': <class 'str'>}, {'default': False, 'name': 'birnn', 'type': <class 'bool'>}, {'abbreviation': 'sas', 'default': None, 'name': 'std_adapt_strategy', 'type': <class 'str'>}, {'abbreviation': 'ga', 'default': 'ppo', 'name': 'generator_algo', 'type': <class 'str'>}, {'default': 2, 'name': 'ppo_runs', 'type': <class 'int'>}, {'default': 0.2, 'name': 'ppo_epsilon', 'type': <class 'float'>}, {'default': 0, 'name': 'ppo_l2norm_cof', 'type': <class 'float'>}, {'default': 0, 'name': 'ppo_entropy_cof', 'type': <class 'float'>}, {'default': 0, 'name': 'generator_sup_cof', 'type': <class 'float'>}, {'default': 0.99, 'name': 'gae_gamma', 'type': <class 'float'>}, {'default': 0.95, 'name': 'gae_lambda', 
'type': <class 'float'>}, {'default': 1, 'description': 'The number of update rounds of the generator in each epoch.', 'doc': True, 'name': 'g_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 1, 'description': 'Number of update rounds of matcher in each epoch.', 'doc': True, 'name': 'd_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 4e-05, 'description': 'Initial learning rate of the generator nodes nets.', 'doc': True, 'name': 'g_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'type': <class 'float'>}, {'default': 0.0006, 'description': 'Initial learning rate of the matcher.', 'doc': True, 'name': 'd_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'type': <class 'float'>}, {'default': 0.001, 'name': 'value_lr', 'type': <class 'float'>}, {'default': 0, 'description': 'Matcher loss length.', 'name': 'matcher_loss_length', 'type': <class 'int'>}, {'default': 1.2, 'description': 'Matcher loss high value. When the matcher_loss beyond the value, the generator would stop train', 'name': 'matcher_loss_high', 'type': <class 'float'>}, {'default': 0.3, 'description': 'Matcher loss high value. 
When the matcher_loss low the value, the matcher would stop train', 'name': 'matcher_loss_low', 'type': <class 'float'>}, {'default': False, 'description': 'Sample the data for tring the matcher.', 'name': 'matcher_sample', 'type': <class 'bool'>}, {'default': 0.25, 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'name': 'mae_reward_weight', 'type': <class 'float'>}, {'default': 1, 'description': 'Repeat rollout more data to train generator.', 'name': 'generator_data_repeat', 'type': <class 'int'>}, {'default': 64, 'description': 'RNN hidden dims', 'name': 'rnn_hidden_features', 'type': <class 'int'>}, {'default': 0, 'description': 'length of the sliding_window in RNN', 'name': 'window_size', 'type': <class 'int'>}, {'default': False, 'name': 'mix_data', 'type': <class 'bool'>}, {'default': 0.95, 'name': 'quantile', 'type': <class 'float'>}, {'default': 0.5, 'name': 'matcher_grad_norm_clip', 'type': <class 'float'>}, {'default': True, 'name': 'mix_sample', 'type': <class 'bool'>}, {'default': 0.5, 'name': 'mix_sample_ratio', 'type': <class 'float'>}, {'default': True, 'name': 'replace_with_expert', 'type': <class 'bool'>}, {'default': 0.1, 'name': 'replace_ratio', 'type': <class 'float'>}, {'default': 0.5, 'name': 'gp_coef', 'type': <class 'float'>}, {'default': 0.01, 'name': 'discr_ent_coef', 'type': <class 'float'>}, {'default': 0.0005, 'name': 'matcher_l2_norm_coeff', 'type': <class 'float'>}, {'default': 1e-06, 'name': 'value_l2_norm_coef', 'type': <class 'float'>}, {'default': 1e-06, 'name': 'generator_l2_norm_coef', 'type': <class 'float'>}, {'default': 50, 'name': 'matcher_record_len', 'type': <class 'int'>}, {'default': 1, 'name': 'matcher_record_interval', 'type': <class 'int'>}, {'default': 0.125, 'name': 'fix_std', 'type': <class 'float'>}, {'default': 5e-05, 'name': 'bc_l2_coef', 'type': <class 'float'>}, {'default': 0.01, 'name': 'logstd_loss_coef', 'type': <class 'float'>}, {'default': 0.0, 'name': 
'entropy_coef', 'type': <class 'float'>}, {'default': 'nll', 'name': 'bc_loss', 'type': <class 'str'>}, {'default': 10, 'name': 'controller_weight', 'type': <class 'float'>}]¶
- property nodes_models_mail¶
- property other_models_mail¶
- data_creator(config: dict)[source]¶
Create DataLoaders.
- Args:
- config:
configuration parameters
- Return:
(train_loader, val_loader)
- bc_model_creator(config, graph)[source]¶
Create generator models.
- Parameters:
config – configuration parameters
- Returns:
all the models.
- matcher_model_creator(config, graph)[source]¶
Create matcher models.
- Parameters:
config – configuration parameters
- Returns:
all the models.
- mail_generator_model_creator(config, graph)[source]¶
Create generator models.
- Parameters:
config – configuration parameters
- Returns:
all the models.
- mail_optimizer_creator(models, config)[source]¶
Optimizer creator including generator optimizers and matcher optimizers.
- Parameters:
models – node models, matcher, value_net
config – configuration parameters
- Returns:
generator_optimizer, matcher_optimizer
- train_epoch(*args, **kwargs)¶
- PPO_step(generated_data, graph, value_net, generator_optimizers, matcher, other_generator_optimizers, epsilon=0.1, lam=0, w_ent=0, matcher_index=None, scope=None)[source]¶
Train the policies, the transition, and value_net with the PPO algorithm.
- Parameters:
generated_data – generated trajectory
graph – decision graph
value_net – value net
generator_optimizers – the optimizers used to optimize node models and value net
epsilon – hyperparameter for clipping in the policy objective
lam – regularization parameter
w_ent – the weight of entropy loss
- Returns:
v_loss, p_loss, sup_loss, total_loss, generator_grad_norm
revive.algo.venv.revive_p module¶
- class revive.algo.venv.revive_p.PPOOperator(*args, **kwargs)[source]¶
Bases:
ReviveOperator
- NAME = 'REVIVE_VENV'¶
Name of the used algorithm.
- PARAMETER_DESCRIPTION = [{'abbreviation': 'bbs', 'default': 256, 'doc': True, 'name': 'bc_batch_size', 'type': <class 'int'>}, {'abbreviation': 'bep', 'default': 0, 'doc': True, 'name': 'bc_epoch', 'type': <class 'int'>}, {'default': 0.001, 'name': 'bc_lr', 'type': <class 'float'>}, {'default': 'nll', 'description': 'Bc support different loss function("nll", "mae", "mse").', 'name': 'bc_loss_type', 'type': <class 'str'>}, {'abbreviation': 'mbs', 'default': 1024, 'description': 'Batch size of training process.', 'doc': True, 'name': 'revive_batch_size', 'type': <class 'int'>}, {'abbreviation': 'mep', 'default': 1000, 'description': 'Number of epcoh for the training process', 'doc': True, 'name': 'revive_epoch', 'type': <class 'int'>}, {'abbreviation': 'bet', 'default': 1, 'doc': True, 'name': 'fintune', 'type': <class 'int'>}, {'abbreviation': 'betfre', 'default': 1, 'doc': True, 'name': 'finetune_fre', 'type': <class 'int'>}, {'abbreviation': 'dpe', 'default': 0, 'name': 'matcher_pretrain_epoch', 'type': <class 'int'>}, {'abbreviation': 'phf', 'default': 256, 'description': 'Number of neurons per layer of the policy network.', 'doc': True, 'name': 'policy_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'phl', 'default': 4, 'description': 'Depth of policy network.', 'doc': True, 'name': 'policy_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'pa', 'default': 'leakyrelu', 'name': 'policy_activation', 'type': <class 'str'>}, {'abbreviation': 'pn', 'default': None, 'name': 'policy_normalization', 'type': <class 'str'>}, {'abbreviation': 'pb', 'default': 'res', 'description': 'Backbone of policy network. 
Support selecting from [mlp, res, ft_transformer, lstm, gru].', 'doc': True, 'name': 'policy_backbone', 'type': <class 'str'>}, {'abbreviation': 'thf', 'default': 256, 'description': 'Number of neurons per layer of the transition network.', 'doc': True, 'name': 'transition_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'thl', 'default': 4, 'doc': True, 'name': 'transition_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'ta', 'default': 'leakyrelu', 'name': 'transition_activation', 'type': <class 'str'>}, {'abbreviation': 'tn', 'default': None, 'name': 'transition_normalization', 'type': <class 'str'>}, {'abbreviation': 'tb', 'default': 'res', 'description': 'Backbone of Transition network. Support selecting from [mlp, res, ft_transformer, lstm, gru].', 'doc': True, 'name': 'transition_backbone', 'type': <class 'str'>}, {'default': 'auto', 'name': 'matching_nodes', 'type': <class 'list'>}, {'default': 'auto', 'name': 'matching_fit_nodes', 'type': <class 'list'>}, {'abbreviation': 'dhf', 'default': 256, 'description': 'Number of neurons per layer of the matcher network.', 'doc': True, 'name': 'matcher_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'dhl', 'default': 4, 'description': 'Depth of the matcher network.', 'doc': True, 'name': 'matcher_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'da', 'default': 'leakyrelu', 'name': 'matcher_activation', 'type': <class 'str'>}, {'abbreviation': 'dn', 'default': None, 'name': 'matcher_normalization', 'type': <class 'str'>}, {'default': 'auto', 'name': 'state_nodes', 'type': <class 'list'>}, {'abbreviation': 'vhf', 'default': 256, 'name': 'value_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'vhl', 'default': 4, 'name': 'value_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'va', 'default': 'leakyrelu', 'name': 'value_activation', 'type': <class 'str'>}, {'abbreviation': 'vn', 'default': None, 'name': 'value_normalization', 'type': <class 'str'>}, 
{'abbreviation': 'gt', 'default': 'res', 'name': 'generator_type', 'type': <class 'str'>}, {'abbreviation': 'dt', 'default': 'res', 'name': 'matcher_type', 'type': <class 'str'>}, {'default': False, 'name': 'birnn', 'type': <class 'bool'>}, {'abbreviation': 'sas', 'default': None, 'name': 'std_adapt_strategy', 'type': <class 'str'>}, {'abbreviation': 'ga', 'default': 'ppo', 'name': 'generator_algo', 'type': <class 'str'>}, {'default': 2, 'name': 'ppo_runs', 'type': <class 'int'>}, {'default': 0.2, 'name': 'ppo_epsilon', 'type': <class 'float'>}, {'default': 0, 'name': 'ppo_l2norm_cof', 'type': <class 'float'>}, {'default': 0, 'name': 'ppo_entropy_cof', 'type': <class 'float'>}, {'default': 0, 'name': 'generator_sup_cof', 'type': <class 'float'>}, {'default': 0.99, 'name': 'gae_gamma', 'type': <class 'float'>}, {'default': 0.95, 'name': 'gae_lambda', 'type': <class 'float'>}, {'default': 1, 'description': 'The number of update rounds of the generator in each epoch.', 'doc': True, 'name': 'g_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 1, 'description': 'Number of update rounds of matcher in each epoch.', 'doc': True, 'name': 'd_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 4e-05, 'description': 'Initial learning rate of the generator nodes nets.', 'doc': True, 'name': 'g_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'type': <class 'float'>}, {'default': 0.0006, 'description': 'Initial learning rate of the matcher.', 'doc': True, 'name': 'd_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'type': <class 'float'>}, {'default': 0, 'description': 'Matcher loss length.', 'name': 'matcher_loss_length', 'type': <class 'int'>}, {'default': 1.2, 'description': 'Matcher loss high value. 
When the matcher_loss beyond the value, the generator would stop train', 'name': 'matcher_loss_high', 'type': <class 'float'>}, {'default': 0.3, 'description': 'Matcher loss high value. When the matcher_loss low the value, the matcher would stop train', 'name': 'matcher_loss_low', 'type': <class 'float'>}, {'default': False, 'description': 'Sample the data for tring the matcher.', 'name': 'matcher_sample', 'type': <class 'bool'>}, {'default': 0.25, 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'name': 'mae_reward_weight', 'type': <class 'float'>}, {'default': 1, 'description': 'Repeat rollout more data to train generator.', 'name': 'generator_data_repeat', 'type': <class 'int'>}, {'default': 64, 'description': 'RNN hidden dims', 'name': 'rnn_hidden_features', 'type': <class 'int'>}, {'default': 0, 'description': 'length of the sliding_window in RNN', 'name': 'window_size', 'type': <class 'int'>}, {'default': 0.0001, 'description': 'weight_decay in bc finetune', 'doc': True, 'name': 'bc_weight_decay', 'type': <class 'float'>}, {'default': False, 'name': 'mix_data', 'type': <class 'bool'>}, {'default': 0.95, 'name': 'quantile', 'type': <class 'float'>}, {'default': 0.5, 'name': 'matcher_grad_norm_clip', 'type': <class 'float'>}, {'default': False, 'name': 'mix_sample', 'type': <class 'bool'>}, {'default': 0.5, 'name': 'mix_sample_ratio', 'type': <class 'float'>}, {'default': 0.0, 'name': 'gp_coef', 'type': <class 'float'>}, {'default': 0.01, 'name': 'discr_ent_coef', 'type': <class 'float'>}, {'default': 0.0005, 'name': 'matcher_l2_norm_coeff', 'type': <class 'float'>}, {'default': 1e-06, 'name': 'value_l2_norm_coef', 'type': <class 'float'>}, {'default': 1e-06, 'name': 'generator_l2_norm_coef', 'type': <class 'float'>}, {'default': 5e-05, 'name': 'bc_l2_coef', 'type': <class 'float'>}, {'default': 0.0, 'name': 'logstd_loss_coef', 'type': <class 'float'>}, {'default': 0.0, 'name': 'entropy_coef', 'type': <class 
'float'>}, {'default': 'nll', 'name': 'bc_loss', 'type': <class 'str'>}, {'default': 'auto', 'name': 'ts_conv_nodes', 'type': <class 'list'>}, {'default': 10, 'name': 'controller_weight', 'type': <class 'float'>}, {'default': -1, 'name': 'bc_repeat', 'type': <class 'int'>}]¶
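The description of mae_reward_weight above states a reward-mixing formula. A minimal sketch of that arithmetic follows; the function name mixed_reward is hypothetical, and how REVIVE computes matcher_reward and mae_reward internally is not shown here.

```python
def mixed_reward(matcher_reward, mae_reward, mae_reward_weight=0.25):
    """Blend the two reward terms as described for `mae_reward_weight`.

    reward = (1 - mae_reward_weight) * matcher_reward
             + mae_reward_weight * mae_reward
    """
    return (1 - mae_reward_weight) * matcher_reward + mae_reward_weight * mae_reward


# With the PPOOperator default weight of 0.25, the matcher term dominates.
reward = mixed_reward(matcher_reward=1.0, mae_reward=0.0)
print(reward)
```

Setting mae_reward_weight to 0 (the TD3Operator default) leaves only the matcher reward.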
- generator_model_creator(config, graph)[source]¶
Create generator models.
- Parameters:
config – configuration parameters
- Returns:
all the models.
- optimizer_creator(models, config)[source]¶
Optimizer creator including generator optimizers and matcher optimizers.
- Parameters:
models – node models, matcher, value_net
config – configuration parameters
- Returns:
generator_optimizer, matcher_optimizer
- ADV(reward, mask, value, gamma, lam, use_gae=True)[source]¶
Compute advantage function for PPO.
- Parameters:
reward – rewards of each step
mask – 1 if the trajectory is done, else 0
value – value for each state
gamma – discount factor
lam – GAE lambda
use_gae – whether to use GAE (True or False)
- Returns:
advantages and new value
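For readers unfamiliar with generalized advantage estimation (GAE), the sketch below shows the standard backward recursion in plain Python. It is not the body of ADV. The function name, the use of lists, and the convention that values carries one extra bootstrap entry are assumptions made for this example.

```python
def gae_advantages(rewards, dones, values, gamma=0.99, lam=0.95):
    """Standard GAE recursion (illustrative, not REVIVE's ADV).

    rewards, dones: length T; dones[t] is 1 if the trajectory ends at step t.
    values: length T + 1; the last entry bootstraps the final step.
    Returns (advantages, returns), both of length T.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # One-step TD error; the bootstrap term is dropped at episode end.
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    # Value targets: advantage plus the baseline it was measured against.
    returns = [a + v for a, v in zip(advantages, values[:-1])]
    return advantages, returns


adv, ret = gae_advantages(
    rewards=[1.0, 1.0], dones=[0, 1], values=[0.0, 0.0, 0.0],
    gamma=1.0, lam=1.0,
)
print(adv, ret)
```

With gamma = lam = 1 and zero value estimates, the advantages reduce to the undiscounted reward-to-go, which makes the recursion easy to check by hand.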
- PPO_step(generated_data, graph, value_net, generator_optimizers, matcher, other_generator_optimizers, epsilon=0.1, lam=0, w_ent=0, matcher_index=None, scope=None)[source]¶
Train the policies, the transition, and value_net with the PPO algorithm.
- Parameters:
generated_data – generated trajectory
graph – decision graph
value_net – value net
generator_optimizers – the optimizers used to optimize node models and value net
epsilon – hyperparameter for clipping in the policy objective
lam – regularization parameter
w_ent – the weight of entropy loss
- Returns:
v_loss, p_loss, sup_loss, total_loss, generator_grad_norm
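The role of epsilon is easiest to see in the clipped surrogate objective from the standard PPO formulation. The sketch below computes only that policy term on plain Python lists. The function name and inputs are assumptions for illustration; the actual PPO_step also produces the value and supervised losses listed above and is not reproduced here.

```python
import math


def ppo_clipped_loss(logp_new, logp_old, advantages, epsilon=0.1):
    """Clipped surrogate policy loss (illustrative, not REVIVE's PPO_step).

    logp_new / logp_old: per-sample log-probabilities under the current and
    the data-collecting policy; advantages: per-sample advantage estimates.
    """
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # importance ratio pi_new / pi_old
        clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize while the objective is maximized.
    return -total / len(advantages)


# A ratio of 2 with positive advantage is clipped to 1 + epsilon.
loss_pos = ppo_clipped_loss([math.log(2.0)], [0.0], [1.0], epsilon=0.1)
# With negative advantage the unclipped (more pessimistic) term is kept.
loss_neg = ppo_clipped_loss([math.log(2.0)], [0.0], [-1.0], epsilon=0.1)
print(loss_pos, loss_neg)
```

A smaller epsilon keeps each update closer to the policy that generated the data.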
revive.algo.venv.revive_t module¶
- class revive.algo.venv.revive_t.ReplayBuffer(buffer_size)[source]¶
Bases:
object
A simple FIFO experience replay buffer for SAC agents.
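A FIFO replay buffer of this kind can be sketched with collections.deque, as below. The class name FIFOReplayBuffer and the put and sample methods are hypothetical and chosen for this example; they are not the documented interface of ReplayBuffer, of which only the buffer_size argument is shown above.

```python
import random
from collections import deque


class FIFOReplayBuffer:
    """Fixed-capacity buffer that evicts the oldest item first (illustrative)."""

    def __init__(self, buffer_size):
        # int() tolerates float sizes such as 5000.0.
        self._storage = deque(maxlen=int(buffer_size))

    def put(self, transition):
        # Appending to a full deque silently drops the oldest entry.
        self._storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current size.
        k = min(batch_size, len(self._storage))
        return random.sample(list(self._storage), k)

    def __len__(self):
        return len(self._storage)


buf = FIFOReplayBuffer(buffer_size=3)
for step in range(5):
    buf.put({"step": step})
print(len(buf), buf.sample(2))
```

After five insertions into a buffer of size three, only the three most recent transitions remain.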
- class revive.algo.venv.revive_t.TD3Operator(*args, **kwargs)[source]¶
Bases:
ReviveOperator
- NAME = 'REVIVE_TD3'¶
Name of the used algorithm.
- PARAMETER_DESCRIPTION = [{'abbreviation': 'bbs', 'default': 256, 'name': 'bc_batch_size', 'type': <class 'int'>}, {'abbreviation': 'bep', 'default': 0, 'name': 'bc_epoch', 'type': <class 'int'>}, {'default': 0.001, 'name': 'bc_lr', 'type': <class 'float'>}, {'abbreviation': 'mbs', 'default': 256, 'description': 'Batch size of training process.', 'doc': True, 'name': 'revive_batch_size', 'type': <class 'int'>}, {'abbreviation': 'mep', 'default': 1000, 'description': 'Number of epcoh for the training process', 'doc': True, 'name': 'revive_epoch', 'type': <class 'int'>}, {'abbreviation': 'bet', 'default': 1, 'doc': True, 'name': 'fintune', 'type': <class 'int'>}, {'abbreviation': 'betfre', 'default': 1, 'doc': True, 'name': 'finetune_fre', 'type': <class 'int'>}, {'abbreviation': 'dpe', 'default': 0, 'name': 'matcher_pretrain_epoch', 'type': <class 'int'>}, {'abbreviation': 'phf', 'default': 256, 'description': 'Number of neurons per layer of the policy network.', 'doc': True, 'name': 'policy_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'phl', 'default': 4, 'description': 'Depth of policy network.', 'doc': True, 'name': 'policy_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'pa', 'default': 'leakyrelu', 'name': 'policy_activation', 'type': <class 'str'>}, {'abbreviation': 'pn', 'default': None, 'name': 'policy_normalization', 'type': <class 'str'>}, {'abbreviation': 'pb', 'default': 'res', 'description': 'Backbone of policy network.', 'doc': True, 'name': 'policy_backbone', 'type': <class 'str'>}, {'abbreviation': 'thf', 'default': 256, 'description': 'Number of neurons per layer of the transition network.', 'doc': True, 'name': 'transition_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'thl', 'default': 4, 'doc': True, 'name': 'transition_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'ta', 'default': 'leakyrelu', 'name': 'transition_activation', 'type': <class 'str'>}, {'abbreviation': 'tn', 'default': None, 'name': 
'transition_normalization', 'type': <class 'str'>}, {'abbreviation': 'tb', 'default': 'res', 'description': 'Backbone of Transition network.', 'doc': True, 'name': 'transition_backbone', 'type': <class 'str'>}, {'default': 'auto', 'name': 'matching_nodes', 'type': <class 'list'>}, {'abbreviation': 'dhf', 'default': 256, 'description': 'Number of neurons per layer of the matcher network.', 'doc': True, 'name': 'matcher_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'dhl', 'default': 4, 'description': 'Depth of the matcher network.', 'doc': True, 'name': 'matcher_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'da', 'default': 'leakyrelu', 'name': 'matcher_activation', 'type': <class 'str'>}, {'abbreviation': 'dn', 'default': None, 'name': 'matcher_normalization', 'type': <class 'str'>}, {'default': 'auto', 'name': 'state_nodes', 'type': <class 'list'>}, {'abbreviation': 'vhf', 'default': 256, 'name': 'value_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'vhl', 'default': 4, 'name': 'value_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'va', 'default': 'leakyrelu', 'name': 'value_activation', 'type': <class 'str'>}, {'abbreviation': 'vn', 'default': None, 'name': 'value_normalization', 'type': <class 'str'>}, {'abbreviation': 'gt', 'default': 'res', 'name': 'generator_type', 'type': <class 'str'>}, {'abbreviation': 'dt', 'default': 'res', 'name': 'matcher_type', 'type': <class 'str'>}, {'default': False, 'name': 'birnn', 'type': <class 'bool'>}, {'abbreviation': 'sas', 'default': None, 'name': 'std_adapt_strategy', 'type': <class 'str'>}, {'default': 1, 'description': 'The number of update rounds of the generator in each epoch.', 'doc': True, 'name': 'g_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 1, 'description': 'Number of update rounds of matcher in each epoch.', 'doc': True, 'name': 'd_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, 
{'default': 4e-05, 'description': 'Initial learning rate of the generator.', 'doc': True, 'name': 'g_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'type': <class 'float'>}, {'default': 0.0006, 'description': 'Initial learning rate of the matcher.', 'doc': True, 'name': 'd_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'type': <class 'float'>}, {'default': 1, 'description': 'Matcher loss length.', 'name': 'matcher_loss_length', 'type': <class 'int'>}, {'default': 1.2, 'description': 'Matcher loss high value. When the matcher_loss beyond the value, the generator would stop train', 'name': 'matcher_loss_high', 'type': <class 'float'>}, {'default': 0.6, 'description': 'Matcher loss high value. When the matcher_loss low the value, the matcher would stop train', 'name': 'matcher_loss_low', 'type': <class 'float'>}, {'default': False, 'description': 'Sample the data for tring the matcher.', 'name': 'matcher_sample', 'type': <class 'bool'>}, {'default': 0, 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'name': 'mae_reward_weight', 'type': <class 'float'>}, {'default': 0, 'description': 'Number of historical discriminators saved.', 'name': 'history_matcher_num', 'type': <class 'int'>}, {'abbreviation': 'bfs', 'default': 5000.0, 'description': 'Size of the buffer to store data.', 'doc': True, 'name': 'buffer_size', 'type': <class 'int'>}, {'abbreviation': 'tsph', 'default': 10, 'description': 'td3_steps_per_epoch.', 'doc': True, 'name': 'td3_steps_per_epoch', 'type': <class 'int'>}]¶
- setup(*args, **kwargs)¶
- generator_model_creator(config, graph)[source]¶
Create generator models.
- Parameters:
config – configuration parameters
- Returns:
all the models.
revive.algo.venv.template module¶
- class revive.algo.venv.template.AlgorithmOperator(*args, **kwargs)[source]¶
Bases:
VenvOperator
TODO 1: Define the name of this algorithm.
- NAME = ''¶
TODO 2: Define the hyper-parameters of this algorithm.
- PARAMETER_DESCRIPTION = []¶
- classmethod get_tune_parameters(config, **kargs)[source]¶
Use ray.tune to wrap the parameters to be searched.
- model_creator(config)[source]¶
Create all the models.
- Parameters:
config – configuration parameters
- Returns:
list of models
- optimizer_creator(models, config)[source]¶
Create Optimizers.
- Parameters:
models – list of all the models
config – configuration parameters
- Returns:
list of optimizers
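To make the two TODOs in the template concrete, the sketch below fills in NAME and PARAMETER_DESCRIPTION on a stand-in class and derives a default config from the entries. ToyOperator, its parameter names, and the default_config helper are all hypothetical; the class deliberately does not inherit from VenvOperator so that the example runs without REVIVE installed. The entry keys follow the ones visible in the PARAMETER_DESCRIPTION listings above.

```python
class ToyOperator:
    """Stand-in for an AlgorithmOperator subclass (hypothetical)."""

    # TODO 1 in the template: the algorithm name.
    NAME = "TOY"

    # TODO 2 in the template: the hyper-parameters.
    PARAMETER_DESCRIPTION = [
        {
            "name": "toy_batch_size",
            "abbreviation": "tbs",
            "description": "Batch size of training process.",
            "type": int,
            "default": 256,
            "doc": True,
        },
        {
            "name": "toy_lr",
            "description": "Initial learning rate.",
            "type": float,
            "default": 1e-4,
            "search_mode": "continuous",
            "search_values": [1e-6, 1e-3],
        },
    ]


def default_config(operator_cls):
    """Map each declared parameter name to its default value."""
    return {p["name"]: p["default"] for p in operator_cls.PARAMETER_DESCRIPTION}


config = default_config(ToyOperator)
print(config)
```

A real operator would then implement model_creator and optimizer_creator on top of such a config.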