revive.algo.venv package

Submodules

revive.algo.venv.base module

revive.algo.venv.base.catch_error(func)[source]

Push the training error message to the data buffer.

class revive.algo.venv.base.VenvOperator(*args, **kwargs)[source]

Bases: object

The base venv class.

NAME = None

Name of the used algorithm.

property metric_name

This defines the metric to minimize during hyperparameter search.

property nodes_models_train
property other_models_train
property nodes_models_val
property other_models_val
PARAMETER_DESCRIPTION = []
classmethod get_parameters(command=None, **kargs)[source]
classmethod get_tune_parameters(config: dict, **kargs)[source]

Use ray.tune to wrap the parameters to be searched.
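The documented 'search_mode' and 'search_values' keys in PARAMETER_DESCRIPTION drive this wrapping. A minimal sketch of the mapping, assuming a hypothetical helper build_search_space (the entry names referenced in the comments come from the documented parameter tables):

    from ray import tune

    def build_search_space(parameter_description):
        """Hypothetical helper: map 'search_mode'/'search_values' entries
        to ray.tune sampling primitives."""
        space = {}
        for param in parameter_description:
            mode = param.get('search_mode')
            if mode == 'grid':
                # e.g. 'policy_hidden_layers' with search_values [3, 4, 5]
                space[param['name']] = tune.grid_search(param['search_values'])
            elif mode == 'continuous':
                # e.g. 'g_lr' with search_values [1e-06, 0.001]
                low, high = param['search_values']
                space[param['name']] = tune.uniform(low, high)
            else:
                space[param['name']] = param['default']
        return space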

model_creator(config: dict, graph: DesicionGraph)[source]

Create all the models. The algorithm needs to define models for the nodes to be learned.

Parameters

config – configuration parameters

Returns

a list of models
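A minimal sketch of an override, assuming the decision graph exposes the nodes to be learned (learnable_node_names, obs_dim and action_dim are hypothetical stand-ins; policy_hidden_features and policy_hidden_layers are documented parameters):

    from torch import nn

    def model_creator(self, config: dict, graph) -> list:
        """Sketch only: build one MLP per learnable node."""
        models = []
        for node_name in getattr(graph, 'learnable_node_names', []):  # hypothetical attribute
            layers, in_dim = [], config.get('obs_dim', 64)            # placeholder input size
            for _ in range(config['policy_hidden_layers']):
                layers += [nn.Linear(in_dim, config['policy_hidden_features']), nn.LeakyReLU()]
                in_dim = config['policy_hidden_features']
            layers.append(nn.Linear(in_dim, config.get('action_dim', 8)))  # placeholder output size
            models.append(nn.Sequential(*layers))
        return models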

optimizer_creator(models: List[Module], config: dict)[source]

Define optimizers for the created models.

Parameters

models – list of all the models

config – configuration parameters

Returns

a list of optimizers
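A minimal sketch, assuming one Adam optimizer per model and the documented 'g_lr' and 'weight_decay' parameters:

    import torch

    def optimizer_creator(self, models, config: dict) -> list:
        """Sketch: one Adam optimizer per created model."""
        return [
            torch.optim.Adam(model.parameters(),
                             lr=config['g_lr'],
                             weight_decay=config.get('weight_decay', 0.0))
            for model in models
        ]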

data_creator(config: dict)[source]

Create DataLoaders.

Parameters

config – configuration parameters

Returns

(train_loader, val_loader)
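A minimal sketch, assuming the offline data has already been split and is reachable through hypothetical 'train_data'/'val_data' config entries ('bc_batch_size' is a documented parameter):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def data_creator(self, config: dict):
        """Sketch: wrap pre-split offline tensors in DataLoaders."""
        train_set = TensorDataset(torch.as_tensor(config['train_data']))  # hypothetical config entry
        val_set = TensorDataset(torch.as_tensor(config['val_data']))      # hypothetical config entry
        train_loader = DataLoader(train_set, batch_size=config['bc_batch_size'], shuffle=True)
        val_loader = DataLoader(val_set, batch_size=config['bc_batch_size'], shuffle=False)
        return train_loader, val_loader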

nan_in_grad()[source]
before_train_epoch(*args, **kwargs)[source]
train_epoch(*args, **kwargs)[source]
validate(*args, **kwargs)[source]
train_batch(expert_data, batch_info, scope='train')[source]

Define the training process for a batch of data.

validate_batch(expert_data, batch_info, scope='valEnv_on_trainData')[source]

Define the validation process for a batch of data.

Parameters

expert_data – a batch of offline data

batch_info – a batch info dict

scope – if scope='valEnv_on_trainData', the environment model trained on the validation dataset is evaluated on the training data

class revive.algo.venv.base.VenvAlgorithm(algo: str, workspace: Optional[str] = None)[source]

Bases: object

Class used to manage venv algorithms.

get_train_func(config)[source]
get_trainer(config)[source]
get_trainable(config)[source]
get_parameters(command=None)[source]
get_tune_parameters(config)[source]

revive.algo.venv.bc module

class revive.algo.venv.bc.BCOperator(*args, **kwargs)[source]

Bases: VenvOperator

NAME = 'BC'

Name of the used algorithm.

PARAMETER_DESCRIPTION = [{'name': 'bc_batch_size', 'description': 'Batch size of training process.', 'abbreviation': 'bbs', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'bc_epoch', 'description': 'Number of epcoh for the training process', 'abbreviation': 'bep', 'type': <class 'int'>, 'default': 500, 'doc': True}, {'name': 'bc_horizon', 'abbreviation': 'bh', 'type': <class 'int'>, 'default': 10}, {'name': 'policy_hidden_features', 'description': 'Number of neurons per layer of the policy network.', 'abbreviation': 'phf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'policy_hidden_layers', 'description': 'Depth of policy network.', 'abbreviation': 'phl', 'type': <class 'int'>, 'default': 4, 'search_mode': 'grid', 'search_values': [3, 4, 5], 'doc': True}, {'name': 'policy_activation', 'abbreviation': 'pa', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'policy_normalization', 'abbreviation': 'pn', 'type': <class 'str'>, 'default': None}, {'name': 'policy_backbone', 'description': 'Backbone of policy network.', 'abbreviation': 'pb', 'type': <class 'str'>, 'default': 'res', 'search_mode': 'grid', 'search_values': ['mlp', 'res'], 'doc': True}, {'name': 'transition_hidden_features', 'abbreviation': 'thf', 'type': <class 'int'>, 'default': 256}, {'name': 'transition_hidden_layers', 'abbreviation': 'thl', 'type': <class 'int'>, 'default': 3}, {'name': 'transition_activation', 'abbreviation': 'ta', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'transition_normalization', 'abbreviation': 'tn', 'type': <class 'str'>, 'default': 'ln'}, {'name': 'transition_backbone', 'description': 'Backbone of Transition network.', 'abbreviation': 'tb', 'type': <class 'str'>, 'default': 'res'}, {'name': 'g_lr', 'description': 'Initial learning rate of the training process.', 'type': <class 'float'>, 'default': 0.0001, 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'doc': True}, {'name': 'weight_decay', 'abbreviation': 'wd', 'type': <class 'float'>, 'default': 0.0001}, {'name': 'lr_decay', 'abbreviation': 'ld', 'type': <class 'float'>, 'default': 0.99}, {'name': 'loss_type', 'description': 'Bc support different loss function("log_prob", "mae", "mse").', 'type': <class 'str'>, 'default': 'log_prob', 'doc': True}]
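The documented loss_type options ('log_prob', 'mae', 'mse') correspond to the usual behavior-cloning objectives. A hedged sketch, assuming the policy outputs a torch.distributions object alongside a point prediction (the helper itself is hypothetical):

    import torch.nn.functional as F

    def bc_loss(action_dist, predicted_action, expert_action, loss_type='log_prob'):
        """Sketch of the three documented loss_type options."""
        if loss_type == 'log_prob':
            return -action_dist.log_prob(expert_action).mean()   # negative log-likelihood
        if loss_type == 'mae':
            return F.l1_loss(predicted_action, expert_action)    # mean absolute error
        if loss_type == 'mse':
            return F.mse_loss(predicted_action, expert_action)   # mean squared error
        raise ValueError(f'unknown loss_type: {loss_type}')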
model_creator(config: dict, graph: DesicionGraph)[source]

Create policies and transition, if needed.

Parameters

config – configuration parameters

Returns

list of all models

optimizer_creator(models, config)[source]

Define optimizers for the created models.

Parameters

models – list of all the models

config – configuration parameters

Returns

a list of optimizers

data_creator(config: dict)[source]

Create DataLoaders.

Parameters

config – configuration parameters

Returns

(train_loader, val_loader)

train_epoch(*args, **kwargs)
train_batch(expert_data, batch_info, scope='train')[source]

Define the training process for a batch of data.

revive.algo.venv.revive module

class revive.algo.venv.revive.ReviveOperator(*args, **kwargs)[source]

Bases: VenvOperator

NAME = 'REVIVE'

Name of the used algorithm.

matcher_model_creator(config, graph)[source]

Create matcher models.

Parameters

config – configuration parameters

Returns

all the models.

model_creator(config, graph)[source]

Create all the models. The algorithm needs to define models for the nodes to be learned.

Parameters

config – configuration parameters

Returns

a list of models

data_creator(config: dict)[source]

Create DataLoaders.

Parameters

config – configuration parameters

Returns

(train_loader, val_loader)

switch_data_loader()[source]
bc_train_batch(expert_data, batch_info, scope='train', loss_type='nll')[source]
train_epoch(*args, **kwargs)

revive.algo.venv.revive_p module

class revive.algo.venv.revive_p.PPOOperator(*args, **kwargs)[source]

Bases: ReviveOperator

NAME = 'REVIVE_PPO'

Name of the used algorithm.

PARAMETER_DESCRIPTION = [{'name': 'bc_batch_size', 'abbreviation': 'bbs', 'type': <class 'int'>, 'default': 256}, {'name': 'bc_epoch', 'abbreviation': 'bep', 'type': <class 'int'>, 'default': 0}, {'name': 'bc_lr', 'type': <class 'float'>, 'default': 0.001}, {'name': 'revive_batch_size', 'description': 'Batch size of training process.', 'abbreviation': 'mbs', 'type': <class 'int'>, 'default': 1024, 'doc': True}, {'name': 'revive_epoch', 'description': 'Number of epcoh for the training process', 'abbreviation': 'mep', 'type': <class 'int'>, 'default': 5000, 'doc': True}, {'name': 'fintune', 'abbreviation': 'bet', 'type': <class 'int'>, 'default': 1, 'doc': True}, {'name': 'finetune_fre', 'abbreviation': 'betfre', 'type': <class 'int'>, 'default': 1, 'doc': True}, {'name': 'matcher_pretrain_epoch', 'abbreviation': 'dpe', 'type': <class 'int'>, 'default': 0}, {'name': 'policy_hidden_features', 'description': 'Number of neurons per layer of the policy network.', 'abbreviation': 'phf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'policy_hidden_layers', 'description': 'Depth of policy network.', 'abbreviation': 'phl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'policy_activation', 'abbreviation': 'pa', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'policy_normalization', 'abbreviation': 'pn', 'type': <class 'str'>, 'default': None}, {'name': 'policy_backbone', 'description': 'Backbone of policy network.', 'abbreviation': 'pb', 'type': <class 'str'>, 'default': 'res', 'doc': True}, {'name': 'transition_hidden_features', 'description': 'Number of neurons per layer of the transition network.', 'abbreviation': 'thf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'transition_hidden_layers', 'abbreviation': 'thl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'transition_activation', 'abbreviation': 'ta', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'transition_normalization', 'abbreviation': 'tn', 'type': <class 'str'>, 'default': None}, {'name': 'transition_backbone', 'description': 'Backbone of Transition network.', 'abbreviation': 'tb', 'type': <class 'str'>, 'default': 'res', 'doc': True}, {'name': 'matching_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'matching_fit_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'matcher_hidden_features', 'description': 'Number of neurons per layer of the matcher network.', 'abbreviation': 'dhf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'matcher_hidden_layers', 'description': 'Depth of the matcher network.', 'abbreviation': 'dhl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'matcher_activation', 'abbreviation': 'da', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'matcher_normalization', 'abbreviation': 'dn', 'type': <class 'str'>, 'default': None}, {'name': 'state_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'value_hidden_features', 'abbreviation': 'vhf', 'type': <class 'int'>, 'default': 256}, {'name': 'value_hidden_layers', 'abbreviation': 'vhl', 'type': <class 'int'>, 'default': 4}, {'name': 'value_activation', 'abbreviation': 'va', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'value_normalization', 'abbreviation': 'vn', 'type': <class 'str'>, 'default': None}, {'name': 'generator_type', 'abbreviation': 'gt', 'type': <class 'str'>, 'default': 'res'}, {'name': 'matcher_type', 'abbreviation': 'dt', 'type': <class 'str'>, 'default': 'res'}, {'name': 'birnn', 'type': <class 
'bool'>, 'default': False}, {'name': 'std_adapt_strategy', 'abbreviation': 'sas', 'type': <class 'str'>, 'default': None}, {'name': 'generator_algo', 'abbreviation': 'ga', 'type': <class 'str'>, 'default': 'ppo'}, {'name': 'ppo_runs', 'type': <class 'int'>, 'default': 2}, {'name': 'ppo_epsilon', 'type': <class 'float'>, 'default': 0.2}, {'name': 'ppo_l2norm_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'ppo_entropy_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'generator_sup_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'gae_gamma', 'type': <class 'float'>, 'default': 0.99}, {'name': 'gae_lambda', 'type': <class 'float'>, 'default': 0.95}, {'name': 'g_steps', 'description': 'The number of update rounds of the generator in each epoch.', 'type': <class 'int'>, 'default': 1, 'search_mode': 'grid', 'search_values': [1, 3, 5], 'doc': True}, {'name': 'd_steps', 'description': 'Number of update rounds of matcher in each epoch.', 'type': <class 'int'>, 'default': 1, 'search_mode': 'grid', 'search_values': [1, 3, 5], 'doc': True}, {'name': 'g_lr', 'description': 'Initial learning rate of the generator nodes nets.', 'type': <class 'float'>, 'default': 4e-05, 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'doc': True}, {'name': 'd_lr', 'description': 'Initial learning rate of the matcher.', 'type': <class 'float'>, 'default': 0.0006, 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'doc': True}, {'name': 'matcher_loss_length', 'description': 'Matcher loss length.', 'type': <class 'int'>, 'default': 0}, {'name': 'matcher_loss_high', 'description': 'Matcher loss high value. When the matcher_loss beyond the value, the generator would stop train', 'type': <class 'float'>, 'default': 1.2}, {'name': 'matcher_loss_low', 'description': 'Matcher loss high value. When the matcher_loss low the value, the matcher would stop train', 'type': <class 'float'>, 'default': 0.3}, {'name': 'matcher_sample', 'description': 'Sample the data for tring the matcher.', 'type': <class 'bool'>, 'default': False}, {'name': 'mae_reward_weight', 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'type': <class 'float'>, 'default': 0.25}, {'name': 'history_matcher_num', 'description': 'Number of historical discriminators saved.', 'type': <class 'int'>, 'default': 0}, {'name': 'history_matcher_save_epochs', 'description': 'History matcher save epochs.', 'type': <class 'int'>, 'default': 100}, {'name': 'generator_data_repeat', 'description': 'Repeat rollout more data to train generator.', 'type': <class 'int'>, 'default': 1}, {'name': 'rnn_hidden_features', 'description': 'RNN hidden dims', 'type': <class 'int'>, 'default': 64}, {'name': 'window_size', 'description': 'length of the sliding_window in RNN', 'type': <class 'int'>, 'default': 0}, {'name': 'bc_weight_decay', 'description': 'weight_decay in bc finetune', 'type': <class 'float'>, 'default': 0.0001, 'search_mode': 'continuous', 'search_values': [1e-05, 0.001], 'doc': True}]
generator_model_creator(config, graph)[source]

Create generator models.

Parameters

config – configuration parameters

Returns

all the models.

optimizer_creator(models, config)[source]

Optimizer creator including generator optimizers and matcher optimizers.

Parameters
  • models – node models, matcher, value_net

  • config – configuration parameters

Returns

generator_optimizer, matcher_optimizer

ADV(reward, mask, value, gamma, lam, use_gae=True)[source]

Compute advantage function for PPO.

Parameters
  • reward – rewards of each step

  • mask – 1 if the trajectory is done at this step, else 0

  • value – value for each state

  • gamma – discount factor

  • lam – GAE lambda

  • use_gae – True or False

Returns

advantages and new value
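A minimal sketch of generalized advantage estimation under the documented mask convention (mask == 1 where the trajectory ends); the actual implementation may organize the tensors differently:

    import torch

    def gae(reward, mask, value, gamma=0.99, lam=0.95):
        """Sketch: GAE advantages and value targets for 1-D tensors of length T."""
        T = reward.shape[0]
        advantages = torch.zeros_like(reward)
        running = 0.0
        for t in reversed(range(T)):
            not_done = 1.0 - mask[t]
            next_value = value[t + 1] if t + 1 < T else 0.0
            delta = reward[t] + gamma * next_value * not_done - value[t]
            running = delta + gamma * lam * not_done * running
            advantages[t] = running
        returns = advantages + value   # targets for the value net ("new value")
        return advantages, returns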

PPO_step(generated_data, graph, value_net, generator_optimizers, matcher, other_generator_optimizers, epsilon=0.1, lam=0, w_ent=0, matcher_index=None)[source]

Train the policies, transition, and value_net with the PPO algorithm.

Parameters
  • generated_data – generated trajectory

  • graph – decision graph

  • value_net – value net

  • generator_optimizers – the optimizers used to optimize node models and value net

  • epsilon – hyperparameter for clipping in the policy objective

  • lam – regularization parameter

  • w_ent – the weight of entropy loss

Returns

v_loss, p_loss, sup_loss, total_loss, generator_grad_norm
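A hedged sketch of the clipped surrogate behind this step; the exact weighting of the supervised and regularization terms in REVIVE may differ:

    import torch

    def ppo_losses(new_log_prob, old_log_prob, advantage,
                   value_pred, value_target, entropy,
                   epsilon=0.1, w_ent=0.0):
        """Sketch: clipped PPO policy loss, value loss and entropy bonus."""
        ratio = torch.exp(new_log_prob - old_log_prob)
        surrogate1 = ratio * advantage
        surrogate2 = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
        p_loss = -torch.min(surrogate1, surrogate2).mean()     # clipped policy objective
        v_loss = (value_pred - value_target).pow(2).mean()     # value regression loss
        total_loss = p_loss + v_loss - w_ent * entropy.mean()  # entropy bonus weighted by w_ent
        return v_loss, p_loss, total_loss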

revive.algo.venv.revive_t module

revive.algo.venv.template module

class revive.algo.venv.template.AlgorithmOperator(*args, **kwargs)[source]

Bases: VenvOperator

TODO 1: Define the name of this algorithm.

NAME = ''

TODO 2: Define the hyper-parameters of this algorithm.

PARAMETER_DESCRIPTION = []
classmethod get_parameters(command=None, **kargs)[source]
classmethod get_tune_parameters(config, **kargs)[source]

Use ray.tune to wrap the parameters to be searched.

model_creator(config)[source]

Create all the models.

Parameters

config – configuration parameters

Returns

list of models

optimizer_creator(models, config)[source]

Create Optimizers.

Parameters
  • models – list of all the models

  • config – configuration parameters

Returns

list of optimizers

data_creator(config)[source]

Create DataLoaders.

Parameters

config – configuration parameters

Returns

train_loader and val_loader

train_epoch(iterator, info)[source]
train_batch(expert_data, batch_info, scope)[source]

Define the training process for a batch of data.
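Filling in the two TODOs amounts to choosing a NAME string and listing hyper-parameters with the documented dict keys. A hypothetical skeleton (the algorithm name and parameter entries below are illustrative, not part of the library):

    from revive.algo.venv.base import VenvOperator

    class MyAlgorithmOperator(VenvOperator):
        # TODO 1: name of the algorithm (illustrative value)
        NAME = 'MY_ALGO'

        # TODO 2: hyper-parameters, using the documented keys
        # (name, description, abbreviation, type, default, doc, search_mode, search_values)
        PARAMETER_DESCRIPTION = [
            {'name': 'my_batch_size', 'description': 'Batch size of training process.',
             'abbreviation': 'mbs', 'type': int, 'default': 256, 'doc': True},
            {'name': 'my_lr', 'description': 'Initial learning rate.',
             'type': float, 'default': 1e-4,
             'search_mode': 'continuous', 'search_values': [1e-6, 1e-3], 'doc': True},
        ]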

Module contents