revive.algo.venv package¶
Submodules¶
revive.algo.venv.base module¶
- class revive.algo.venv.base.VenvOperator(*args, **kwargs)[source]¶
Bases: object
The base venv class.
- NAME = None¶
Name of the algorithm.
- property metric_name¶
This defines the metric to be minimized during hyperparameter search.
- property nodes_models_train¶
- property other_models_train¶
- property nodes_models_val¶
- property other_models_val¶
- PARAMETER_DESCRIPTION = []¶
- classmethod get_tune_parameters(config: dict, **kargs)[source]¶
Use ray.tune to wrap the parameters to be searched.
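The mapping below is a hedged sketch of what this wrapping might look like: entries in the PARAMETER_DESCRIPTION lists (documented further down) that carry a 'search_mode' are turned into a ray.tune search space. The helper name build_search_space, and the choice of tune.choice for 'grid' values and tune.loguniform for 'continuous' ranges, are assumptions for illustration, not the library's exact mapping.

```python
from ray import tune

def build_search_space(parameter_description):
    """Hypothetical helper: map PARAMETER_DESCRIPTION records to ray.tune."""
    space = {}
    for param in parameter_description:
        mode = param.get('search_mode')
        if mode == 'grid':
            # e.g. 'search_values': [3, 4, 5] -> sample one of the listed values
            space[param['name']] = tune.choice(param['search_values'])
        elif mode == 'continuous':
            # e.g. 'search_values': [1e-06, 0.001] -> log-uniform over the range
            low, high = param['search_values']
            space[param['name']] = tune.loguniform(low, high)
        else:
            # parameters without a search_mode stay fixed at their default
            space[param['name']] = param['default']
    return space
```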
- model_creator(config: dict, graph: DesicionGraph)[source]¶
Create all the models. The algorithm needs to define models for the nodes to be learned.
- Args:
- config
configuration parameters
- Return:
a list of models
- optimizer_creator(models: List[Module], config: dict)[source]¶
Define optimizers for the created models.
- Args:
- models
list of all the models
- config
configuration parameters
- Return:
a list of optimizers
- data_creator(config: dict)[source]¶
Create DataLoaders.
- Args:
- config
configuration parameters
- Return:
(train_loader, val_loader)
- train_batch(expert_data, batch_info, scope='train')[source]¶
Define the training process for a batch of data.
- validate_batch(expert_data, batch_info, scope='valEnv_on_trainData')[source]¶
Define the validation process for a batch of data.
- Args:
- expert_data
the batch of offline data
- batch_info
a batch info dict
- scope
scope='valEnv_on_trainData' means the training data is evaluated with the model trained on the validation dataset
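Taken together, these methods define the contract a concrete operator fulfills. The driver loop below is an illustrative sketch of how a trainer might call them in sequence; in practice the loop is orchestrated by ray.tune workers, and the subclass name and the contents of the batch_info dict are assumptions.

```python
# Illustrative lifecycle, not the actual trainer code.
operator = MyVenvOperator(config)                    # hypothetical subclass
models = operator.model_creator(config, graph)       # one model per learnable node
optimizers = operator.optimizer_creator(models, config)
train_loader, val_loader = operator.data_creator(config)

for epoch in range(num_epochs):
    for i, expert_data in enumerate(train_loader):
        operator.train_batch(expert_data, batch_info={'epoch': epoch, 'batch': i},
                             scope='train')
    for i, expert_data in enumerate(val_loader):
        operator.validate_batch(expert_data, batch_info={'epoch': epoch, 'batch': i})
```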
revive.algo.venv.bc module¶
- class revive.algo.venv.bc.BCOperator(*args, **kwargs)[source]¶
Bases: VenvOperator
- NAME = 'BC'¶
Name of the algorithm.
- PARAMETER_DESCRIPTION = [{'name': 'bc_batch_size', 'description': 'Batch size of training process.', 'abbreviation': 'bbs', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'bc_epoch', 'description': 'Number of epochs for the training process.', 'abbreviation': 'bep', 'type': <class 'int'>, 'default': 500, 'doc': True}, {'name': 'bc_horizon', 'abbreviation': 'bh', 'type': <class 'int'>, 'default': 10}, {'name': 'policy_hidden_features', 'description': 'Number of neurons per layer of the policy network.', 'abbreviation': 'phf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'policy_hidden_layers', 'description': 'Depth of policy network.', 'abbreviation': 'phl', 'type': <class 'int'>, 'default': 4, 'search_mode': 'grid', 'search_values': [3, 4, 5], 'doc': True}, {'name': 'policy_activation', 'abbreviation': 'pa', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'policy_normalization', 'abbreviation': 'pn', 'type': <class 'str'>, 'default': None}, {'name': 'policy_backbone', 'description': 'Backbone of policy network.', 'abbreviation': 'pb', 'type': <class 'str'>, 'default': 'res', 'search_mode': 'grid', 'search_values': ['mlp', 'res'], 'doc': True}, {'name': 'transition_hidden_features', 'abbreviation': 'thf', 'type': <class 'int'>, 'default': 256}, {'name': 'transition_hidden_layers', 'abbreviation': 'thl', 'type': <class 'int'>, 'default': 3}, {'name': 'transition_activation', 'abbreviation': 'ta', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'transition_normalization', 'abbreviation': 'tn', 'type': <class 'str'>, 'default': 'ln'}, {'name': 'transition_backbone', 'description': 'Backbone of Transition network.', 'abbreviation': 'tb', 'type': <class 'str'>, 'default': 'res'}, {'name': 'g_lr', 'description': 'Initial learning rate of the training process.', 'type': <class 'float'>, 'default': 0.0001, 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'doc': True}, {'name': 'weight_decay', 'abbreviation': 'wd', 'type': <class 'float'>, 'default': 0.0001}, {'name': 'lr_decay', 'abbreviation': 'ld', 'type': <class 'float'>, 'default': 0.99}, {'name': 'loss_type', 'description': 'BC supports different loss functions ("log_prob", "mae", "mse").', 'type': <class 'str'>, 'default': 'log_prob', 'doc': True}]¶
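As a usage sketch, a config overriding some of the documented BC hyperparameters might look like the following. The keys and defaults are taken from PARAMETER_DESCRIPTION above; how the config reaches the operator depends on how training is launched.

```python
# Sketch of a BC config; values shown are the documented defaults.
bc_config = {
    'bc_batch_size': 256,       # abbreviation 'bbs'
    'bc_epoch': 500,            # abbreviation 'bep'
    'policy_hidden_features': 256,
    'policy_hidden_layers': 4,  # grid-searched over [3, 4, 5]
    'policy_backbone': 'res',   # grid-searched over ['mlp', 'res']
    'g_lr': 1e-4,               # searched continuously in [1e-06, 0.001]
    'loss_type': 'log_prob',    # one of "log_prob", "mae", "mse"
}
```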
- model_creator(config: dict, graph: DesicionGraph)[source]¶
Create policies and transition, if needed.
- Parameters
config – configuration parameters
- Returns
list of all models
- optimizer_creator(models, config)[source]¶
Define optimizers for the created models.
- Args:
- models
list of all the models
- config
configuration parameters
- Return:
a list of optimizers
- data_creator(config: dict)[source]¶
Create DataLoaders.
- Args:
- config
configuration parameters
- Return:
(train_loader, val_loader)
- train_epoch(*args, **kwargs)¶
revive.algo.venv.revive module¶
- class revive.algo.venv.revive.ReviveOperator(*args, **kwargs)[source]¶
Bases: VenvOperator
- NAME = 'REVIVE'¶
Name of the algorithm.
- matcher_model_creator(config, graph)[source]¶
Create matcher models.
- Parameters
config – configuration parameters
- Returns
the created matcher models
- model_creator(config, graph)[source]¶
Create all the models. The algorithm needs to define models for the nodes to be learned.
- Args:
- config
configuration parameters
- Return:
a list of models
- data_creator(config: dict)[source]¶
Create DataLoaders.
- Args:
- config
configuration parameters
- Return:
(train_loader, val_loader)
- train_epoch(*args, **kwargs)¶
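The matcher acts as the discriminator in this adversarial setup: it is trained to tell expert transitions apart from generated ones, and its score feeds the generator's reward. The loss below is a standard GAN-style binary-classification objective, offered as an assumption about the general shape of matcher training rather than the library's exact loss; matcher, expert_batch, and generated_batch are placeholders.

```python
import torch
import torch.nn.functional as F

def matcher_loss(matcher, expert_batch, generated_batch):
    """Generic discriminator loss: push expert scores toward 1, generated toward 0."""
    expert_score = matcher(expert_batch)
    generated_score = matcher(generated_batch)
    return (F.binary_cross_entropy(expert_score, torch.ones_like(expert_score))
            + F.binary_cross_entropy(generated_score, torch.zeros_like(generated_score)))
```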
revive.algo.venv.revive_p module¶
- class revive.algo.venv.revive_p.PPOOperator(*args, **kwargs)[source]¶
Bases: ReviveOperator
- NAME = 'REVIVE_PPO'¶
Name of the algorithm.
- PARAMETER_DESCRIPTION = [{'name': 'bc_batch_size', 'abbreviation': 'bbs', 'type': <class 'int'>, 'default': 256}, {'name': 'bc_epoch', 'abbreviation': 'bep', 'type': <class 'int'>, 'default': 0}, {'name': 'bc_lr', 'type': <class 'float'>, 'default': 0.001}, {'name': 'revive_batch_size', 'description': 'Batch size of training process.', 'abbreviation': 'mbs', 'type': <class 'int'>, 'default': 1024, 'doc': True}, {'name': 'revive_epoch', 'description': 'Number of epochs for the training process.', 'abbreviation': 'mep', 'type': <class 'int'>, 'default': 5000, 'doc': True}, {'name': 'fintune', 'abbreviation': 'bet', 'type': <class 'int'>, 'default': 1, 'doc': True}, {'name': 'finetune_fre', 'abbreviation': 'betfre', 'type': <class 'int'>, 'default': 1, 'doc': True}, {'name': 'matcher_pretrain_epoch', 'abbreviation': 'dpe', 'type': <class 'int'>, 'default': 0}, {'name': 'policy_hidden_features', 'description': 'Number of neurons per layer of the policy network.', 'abbreviation': 'phf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'policy_hidden_layers', 'description': 'Depth of policy network.', 'abbreviation': 'phl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'policy_activation', 'abbreviation': 'pa', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'policy_normalization', 'abbreviation': 'pn', 'type': <class 'str'>, 'default': None}, {'name': 'policy_backbone', 'description': 'Backbone of policy network.', 'abbreviation': 'pb', 'type': <class 'str'>, 'default': 'res', 'doc': True}, {'name': 'transition_hidden_features', 'description': 'Number of neurons per layer of the transition network.', 'abbreviation': 'thf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'transition_hidden_layers', 'abbreviation': 'thl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'transition_activation', 'abbreviation': 'ta', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'transition_normalization', 'abbreviation': 'tn', 'type': <class 'str'>, 'default': None}, {'name': 'transition_backbone', 'description': 'Backbone of Transition network.', 'abbreviation': 'tb', 'type': <class 'str'>, 'default': 'res', 'doc': True}, {'name': 'matching_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'matching_fit_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'matcher_hidden_features', 'description': 'Number of neurons per layer of the matcher network.', 'abbreviation': 'dhf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'matcher_hidden_layers', 'description': 'Depth of the matcher network.', 'abbreviation': 'dhl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'matcher_activation', 'abbreviation': 'da', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'matcher_normalization', 'abbreviation': 'dn', 'type': <class 'str'>, 'default': None}, {'name': 'state_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'value_hidden_features', 'abbreviation': 'vhf', 'type': <class 'int'>, 'default': 256}, {'name': 'value_hidden_layers', 'abbreviation': 'vhl', 'type': <class 'int'>, 'default': 4}, {'name': 'value_activation', 'abbreviation': 'va', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'value_normalization', 'abbreviation': 'vn', 'type': <class 'str'>, 'default': None}, {'name': 'generator_type', 'abbreviation': 'gt', 'type': <class 'str'>, 'default': 'res'}, {'name': 'matcher_type', 'abbreviation': 'dt', 'type': <class 'str'>, 'default': 'res'}, {'name': 'birnn', 'type': <class 'bool'>, 'default': False}, {'name': 'std_adapt_strategy', 'abbreviation': 'sas', 'type': <class 'str'>, 'default': None}, {'name': 'generator_algo', 'abbreviation': 'ga', 'type': <class 'str'>, 'default': 'ppo'}, {'name': 'ppo_runs', 'type': <class 'int'>, 'default': 2}, {'name': 'ppo_epsilon', 'type': <class 'float'>, 'default': 0.2}, {'name': 'ppo_l2norm_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'ppo_entropy_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'generator_sup_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'gae_gamma', 'type': <class 'float'>, 'default': 0.99}, {'name': 'gae_lambda', 'type': <class 'float'>, 'default': 0.95}, {'name': 'g_steps', 'description': 'The number of update rounds of the generator in each epoch.', 'type': <class 'int'>, 'default': 1, 'search_mode': 'grid', 'search_values': [1, 3, 5], 'doc': True}, {'name': 'd_steps', 'description': 'Number of update rounds of the matcher in each epoch.', 'type': <class 'int'>, 'default': 1, 'search_mode': 'grid', 'search_values': [1, 3, 5], 'doc': True}, {'name': 'g_lr', 'description': 'Initial learning rate of the generator node networks.', 'type': <class 'float'>, 'default': 4e-05, 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'doc': True}, {'name': 'd_lr', 'description': 'Initial learning rate of the matcher.', 'type': <class 'float'>, 'default': 0.0006, 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'doc': True}, {'name': 'matcher_loss_length', 'description': 'Matcher loss length.', 'type': <class 'int'>, 'default': 0}, {'name': 'matcher_loss_high', 'description': 'Matcher loss upper bound. When matcher_loss rises above this value, the generator stops training.', 'type': <class 'float'>, 'default': 1.2}, {'name': 'matcher_loss_low', 'description': 'Matcher loss lower bound. When matcher_loss falls below this value, the matcher stops training.', 'type': <class 'float'>, 'default': 0.3}, {'name': 'matcher_sample', 'description': 'Sample the data for training the matcher.', 'type': <class 'bool'>, 'default': False}, {'name': 'mae_reward_weight', 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'type': <class 'float'>, 'default': 0.25}, {'name': 'history_matcher_num', 'description': 'Number of historical matchers saved.', 'type': <class 'int'>, 'default': 0}, {'name': 'history_matcher_save_epochs', 'description': 'History matcher save epochs.', 'type': <class 'int'>, 'default': 100}, {'name': 'generator_data_repeat', 'description': 'Repeat rollouts to generate more data for training the generator.', 'type': <class 'int'>, 'default': 1}, {'name': 'rnn_hidden_features', 'description': 'RNN hidden dims.', 'type': <class 'int'>, 'default': 64}, {'name': 'window_size', 'description': 'Length of the sliding window in RNN.', 'type': <class 'int'>, 'default': 0}, {'name': 'bc_weight_decay', 'description': 'weight_decay in BC finetune.', 'type': <class 'float'>, 'default': 0.0001, 'search_mode': 'continuous', 'search_values': [1e-05, 0.001], 'doc': True}]¶
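The mae_reward_weight entry above documents how the matcher reward and the MAE reward are blended; transcribed directly as a helper:

```python
def blend_reward(matcher_reward, mae_reward, mae_reward_weight=0.25):
    """reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward"""
    return (1 - mae_reward_weight) * matcher_reward + mae_reward_weight * mae_reward
```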
- generator_model_creator(config, graph)[source]¶
Create generator models.
- Parameters
config – configuration parameters
- Returns
the created generator models
- optimizer_creator(models, config)[source]¶
Optimizer creator including generator optimizers and matcher optimizers.
- Parameters
models – node models, matcher, value_net
config – configuration parameters
- Returns
generator_optimizer, matcher_optimizer
- ADV(reward, mask, value, gamma, lam, use_gae=True)[source]¶
Compute advantage function for PPO.
- Parameters
reward – rewards of each step
mask – 1 if the trajectory is done at this step, else 0
value – value for each state
gamma – discount factor
lam – GAE lambda
use_gae – whether to compute advantages with GAE
- Returns
advantages and new value
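A minimal sketch of GAE consistent with the documented arguments, where mask is 1 at steps that end a trajectory. The 1-D tensor shapes (one entry per time step) and the absence of batching are simplifying assumptions, so this is illustrative rather than the actual implementation.

```python
import torch

def compute_gae(reward, mask, value, gamma=0.99, lam=0.95):
    """GAE: A_t = delta_t + gamma*lam*(1-done_t)*A_{t+1}."""
    T = reward.shape[0]
    advantages = torch.zeros_like(reward)
    last_adv = torch.tensor(0.0)
    for t in reversed(range(T)):
        not_done = 1.0 - mask[t]
        next_value = value[t + 1] if t + 1 < T else torch.tensor(0.0)
        # one-step TD residual
        delta = reward[t] + gamma * next_value * not_done - value[t]
        last_adv = delta + gamma * lam * not_done * last_adv
        advantages[t] = last_adv
    returns = advantages + value  # the "new value" regression targets
    return advantages, returns
```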
- PPO_step(generated_data, graph, value_net, generator_optimizers, matcher, other_generator_optimizers, epsilon=0.1, lam=0, w_ent=0, matcher_index=None)[source]¶
Train the policies, transition models, and value_net with the PPO algorithm.
- Parameters
generated_data – generated trajectory
graph – decision graph
value_net – value net
generator_optimizers – the optimizers used to optimize node models and value net
epsilon – hyperparameter for clipping in the policy objective
lam – regularization parameter
w_ent – the weight of entropy loss
- Returns
v_loss, p_loss, sup_loss, total_loss, generator_grad_norm
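The heart of this update is the PPO clipped surrogate objective. The sketch below shows that core computation under the documented parameters epsilon and w_ent; the lam regularization term, the value_net update, and the matcher coordination that PPO_step also performs are omitted, and the function name and signature are illustrative.

```python
import torch

def ppo_policy_loss(log_prob_new, log_prob_old, advantage, entropy,
                    epsilon=0.1, w_ent=0.0):
    """Clipped surrogate objective with an entropy bonus (sketch)."""
    ratio = torch.exp(log_prob_new - log_prob_old)
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1 - epsilon, 1 + epsilon) * advantage
    p_loss = -torch.min(surr1, surr2).mean()  # clipping keeps updates conservative
    return p_loss - w_ent * entropy.mean()    # entropy bonus encourages exploration
```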
revive.algo.venv.revive_t module¶
revive.algo.venv.template module¶
- class revive.algo.venv.template.AlgorithmOperator(*args, **kwargs)[source]¶
Bases: VenvOperator
TODO 1: Define the name of this algorithm.
- NAME = ''¶
TODO 2: Define the hyper-parameters of this algorithm.
- PARAMETER_DESCRIPTION = []¶
- classmethod get_tune_parameters(config, **kargs)[source]¶
Use ray.tune to wrap the parameters to be searched.
- model_creator(config)[source]¶
Create all the models.
- Parameters
config – configuration parameters
- Returns
list of models
- optimizer_creator(models, config)[source]¶
Create Optimizers.
- Parameters
models – list of all the models
config – configuration parameters
- Returns
list of optimizers
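Following the two TODOs, a hedged sketch of a completed template: MyNodeModel, the config keys, and the way node input/output sizes are obtained are all hypothetical, and real node models must conform to the shapes defined by the DesicionGraph.

```python
import torch
from torch import nn
from revive.algo.venv.base import VenvOperator

class MyNodeModel(nn.Module):
    """Placeholder per-node network; real models must match node shapes."""
    def __init__(self, in_features, out_features, hidden=256, layers=4):
        super().__init__()
        blocks, dim = [], in_features
        for _ in range(layers):
            blocks += [nn.Linear(dim, hidden), nn.LeakyReLU()]
            dim = hidden
        blocks.append(nn.Linear(dim, out_features))
        self.net = nn.Sequential(*blocks)

    def forward(self, x):
        return self.net(x)

class MyAlgorithmOperator(VenvOperator):
    NAME = 'MY_ALGO'  # TODO 1: the algorithm name

    # TODO 2: hyper-parameters, in the same record format as the
    # PARAMETER_DESCRIPTION lists documented above
    PARAMETER_DESCRIPTION = [
        {'name': 'my_algo_lr', 'type': float, 'default': 1e-4,
         'search_mode': 'continuous', 'search_values': [1e-6, 1e-3]},
        {'name': 'my_algo_hidden_layers', 'type': int, 'default': 4,
         'search_mode': 'grid', 'search_values': [3, 4, 5]},
    ]

    def model_creator(self, config, graph):
        # One model per node to be learned; reading the node sizes off the
        # DesicionGraph is elided here, so these config keys are placeholders.
        return [MyNodeModel(config['in_features'], config['out_features'],
                            layers=config['my_algo_hidden_layers'])]

    def optimizer_creator(self, models, config):
        return [torch.optim.Adam(m.parameters(), lr=config['my_algo_lr'])
                for m in models]
```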