revive.algo.venv package¶
Submodules¶
revive.algo.venv.base module¶
- class revive.algo.venv.base.VenvOperator(*args, **kwargs)[source]¶
Bases: object
The base venv class.
- NAME = None¶
Name of the algorithm.
- property metric_name¶
This defines the metric to be minimized during hyperparameter search.
- property nodes_models_train¶
- property other_models_train¶
- property nodes_models_val¶
- property other_models_val¶
- PARAMETER_DESCRIPTION = []¶
- classmethod get_tune_parameters(config: dict, **kargs)[source]¶
Use ray.tune to wrap the parameters to be searched.
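The mapping below is a hedged sketch of what this wrapping might look like: entries in the PARAMETER_DESCRIPTION lists (documented further down) that carry a 'search_mode' are turned into a ray.tune search space. The helper name build_search_space, and the choice of tune.choice for 'grid' values and tune.loguniform for 'continuous' ranges, are assumptions for illustration, not the library's exact mapping.

```python
from ray import tune

def build_search_space(parameter_description):
    """Hypothetical helper: map PARAMETER_DESCRIPTION records to ray.tune."""
    space = {}
    for param in parameter_description:
        mode = param.get('search_mode')
        if mode == 'grid':
            # e.g. 'search_values': [3, 4, 5] -> sample one of the listed values
            space[param['name']] = tune.choice(param['search_values'])
        elif mode == 'continuous':
            # e.g. 'search_values': [1e-06, 0.001] -> log-uniform over the range
            low, high = param['search_values']
            space[param['name']] = tune.loguniform(low, high)
        else:
            # parameters without a search_mode stay fixed at their default
            space[param['name']] = param['default']
    return space
```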
- model_creator(config: dict, graph: DesicionGraph)[source]¶
Create all the models. The algorithm needs to define models for the nodes to be learned.
- Args:
- config
configuration parameters
- Return:
a list of models
- optimizer_creator(models: List[Module], config: dict)[source]¶
Define optimizers for the created models.
- Args:
- models
list of all the models
- config
configuration parameters
- Return:
a list of optimizers
- data_creator(config: dict)[source]¶
Create DataLoaders.
- Args:
- config
configuration parameters
- Return:
(train_loader, val_loader)
- train_batch(expert_data, batch_info, scope='train')[source]¶
Define the training process for a batch of data.
- validate_batch(expert_data, batch_info, scope='valEnv_on_trainData')[source]¶
Define the validation process for a batch of data.
- Args:
- expert_data
the batch of offline data
- batch_info
a batch info dict
- scope
scope='valEnv_on_trainData' means the training data is evaluated with the model trained on the validation dataset
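Taken together, these methods define the contract a concrete operator fulfills. The driver loop below is an illustrative sketch of how a trainer might call them in sequence; in practice the loop is orchestrated by ray.tune workers, and the subclass name and the contents of the batch_info dict are assumptions.

```python
# Illustrative lifecycle, not the actual trainer code.
operator = MyVenvOperator(config)                    # hypothetical subclass
models = operator.model_creator(config, graph)       # one model per learnable node
optimizers = operator.optimizer_creator(models, config)
train_loader, val_loader = operator.data_creator(config)

for epoch in range(num_epochs):
    for i, expert_data in enumerate(train_loader):
        operator.train_batch(expert_data, batch_info={'epoch': epoch, 'batch': i},
                             scope='train')
    for i, expert_data in enumerate(val_loader):
        operator.validate_batch(expert_data, batch_info={'epoch': epoch, 'batch': i})
```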
revive.algo.venv.bc module¶
- class revive.algo.venv.bc.BCOperator(*args, **kwargs)[source]¶
Bases: VenvOperator
- NAME = 'BC'¶
Name of the algorithm.
- PARAMETER_DESCRIPTION = [{'name': 'bc_batch_size', 'description': 'Batch size of training process.', 'abbreviation': 'bbs', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'bc_epoch', 'description': 'Number of epochs for the training process.', 'abbreviation': 'bep', 'type': <class 'int'>, 'default': 500, 'doc': True}, {'name': 'bc_horizon', 'abbreviation': 'bh', 'type': <class 'int'>, 'default': 10}, {'name': 'policy_hidden_features', 'description': 'Number of neurons per layer of the policy network.', 'abbreviation': 'phf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'policy_hidden_layers', 'description': 'Depth of policy network.', 'abbreviation': 'phl', 'type': <class 'int'>, 'default': 4, 'search_mode': 'grid', 'search_values': [3, 4, 5], 'doc': True}, {'name': 'policy_activation', 'abbreviation': 'pa', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'policy_normalization', 'abbreviation': 'pn', 'type': <class 'str'>, 'default': None}, {'name': 'policy_backbone', 'description': 'Backbone of policy network.', 'abbreviation': 'pb', 'type': <class 'str'>, 'default': 'res', 'search_mode': 'grid', 'search_values': ['mlp', 'res'], 'doc': True}, {'name': 'transition_hidden_features', 'abbreviation': 'thf', 'type': <class 'int'>, 'default': 256}, {'name': 'transition_hidden_layers', 'abbreviation': 'thl', 'type': <class 'int'>, 'default': 3}, {'name': 'transition_activation', 'abbreviation': 'ta', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'transition_normalization', 'abbreviation': 'tn', 'type': <class 'str'>, 'default': 'ln'}, {'name': 'transition_backbone', 'description': 'Backbone of Transition network.', 'abbreviation': 'tb', 'type': <class 'str'>, 'default': 'res'}, {'name': 'g_lr', 'description': 'Initial learning rate of the training process.', 'type': <class 'float'>, 'default': 0.0001, 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'doc': True}, {'name': 'weight_decay', 'abbreviation': 'wd', 'type': <class 'float'>, 'default': 0.0001}, {'name': 'lr_decay', 'abbreviation': 'ld', 'type': <class 'float'>, 'default': 0.99}, {'name': 'loss_type', 'description': 'BC supports different loss functions ("log_prob", "mae", "mse").', 'type': <class 'str'>, 'default': 'log_prob', 'doc': True}]¶
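As a usage sketch, a config overriding some of the documented BC hyperparameters might look like the following. The keys and defaults are taken from PARAMETER_DESCRIPTION above; how the config reaches the operator depends on how training is launched.

```python
# Sketch of a BC config; values shown are the documented defaults.
bc_config = {
    'bc_batch_size': 256,       # abbreviation 'bbs'
    'bc_epoch': 500,            # abbreviation 'bep'
    'policy_hidden_features': 256,
    'policy_hidden_layers': 4,  # grid-searched over [3, 4, 5]
    'policy_backbone': 'res',   # grid-searched over ['mlp', 'res']
    'g_lr': 1e-4,               # searched continuously in [1e-06, 0.001]
    'loss_type': 'log_prob',    # one of "log_prob", "mae", "mse"
}
```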
- model_creator(config: dict, graph: DesicionGraph)[source]¶
Create policies and transition, if needed.
- Parameters
config – configuration parameters
- Returns
list of all models
- optimizer_creator(models, config)[source]¶
Define optimizers for the created models.
- Args:
- models
list of all the models
- config
configuration parameters
- Return:
a list of optimizers
- data_creator(config: dict)[source]¶
Create DataLoaders.
- Args:
- config
configuration parameters
- Return:
(train_loader, val_loader)
- train_epoch(*args, **kwargs)¶
revive.algo.venv.revive module¶
- class revive.algo.venv.revive.ReviveOperator(*args, **kwargs)[source]¶
Bases: VenvOperator
- NAME = 'REVIVE'¶
Name of the algorithm.
- matcher_model_creator(config, graph)[source]¶
Create matcher models.
- Parameters
config – configuration parameters
- Returns
the created matcher models
- model_creator(config, graph)[source]¶
Create all the models. The algorithm needs to define models for the nodes to be learned.
- Args:
- config
configuration parameters
- Return:
a list of models
- data_creator(config: dict)[source]¶
Create DataLoaders.
- Args:
- config
configuration parameters
- Return:
(train_loader, val_loader)
- train_epoch(*args, **kwargs)¶
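The matcher acts as the discriminator in this adversarial setup: it is trained to tell expert transitions apart from generated ones, and its score feeds the generator's reward. The loss below is a standard GAN-style binary-classification objective, offered as an assumption about the general shape of matcher training rather than the library's exact loss; matcher, expert_batch, and generated_batch are placeholders.

```python
import torch
import torch.nn.functional as F

def matcher_loss(matcher, expert_batch, generated_batch):
    """Generic discriminator loss: push expert scores toward 1, generated toward 0."""
    expert_score = matcher(expert_batch)
    generated_score = matcher(generated_batch)
    return (F.binary_cross_entropy(expert_score, torch.ones_like(expert_score))
            + F.binary_cross_entropy(generated_score, torch.zeros_like(generated_score)))
```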
revive.algo.venv.revive_p module¶
- class revive.algo.venv.revive_p.PPOOperator(*args, **kwargs)[source]¶
Bases: ReviveOperator
- NAME = 'REVIVE_PPO'¶
Name of the algorithm.
- PARAMETER_DESCRIPTION = [{'name': 'bc_batch_size', 'abbreviation': 'bbs', 'type': <class 'int'>, 'default': 256}, {'name': 'bc_epoch', 'abbreviation': 'bep', 'type': <class 'int'>, 'default': 0}, {'name': 'bc_lr', 'type': <class 'float'>, 'default': 0.001}, {'name': 'revive_batch_size', 'description': 'Batch size of training process.', 'abbreviation': 'mbs', 'type': <class 'int'>, 'default': 1024, 'doc': True}, {'name': 'revive_epoch', 'description': 'Number of epochs for the training process.', 'abbreviation': 'mep', 'type': <class 'int'>, 'default': 5000, 'doc': True}, {'name': 'fintune', 'abbreviation': 'bet', 'type': <class 'int'>, 'default': 1, 'doc': True}, {'name': 'finetune_fre', 'abbreviation': 'betfre', 'type': <class 'int'>, 'default': 1, 'doc': True}, {'name': 'matcher_pretrain_epoch', 'abbreviation': 'dpe', 'type': <class 'int'>, 'default': 0}, {'name': 'policy_hidden_features', 'description': 'Number of neurons per layer of the policy network.', 'abbreviation': 'phf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'policy_hidden_layers', 'description': 'Depth of policy network.', 'abbreviation': 'phl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'policy_activation', 'abbreviation': 'pa', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'policy_normalization', 'abbreviation': 'pn', 'type': <class 'str'>, 'default': None}, {'name': 'policy_backbone', 'description': 'Backbone of policy network.', 'abbreviation': 'pb', 'type': <class 'str'>, 'default': 'res', 'doc': True}, {'name': 'transition_hidden_features', 'description': 'Number of neurons per layer of the transition network.', 'abbreviation': 'thf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'transition_hidden_layers', 'abbreviation': 'thl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'transition_activation', 'abbreviation': 'ta', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'transition_normalization', 'abbreviation': 'tn', 'type': <class 'str'>, 'default': None}, {'name': 'transition_backbone', 'description': 'Backbone of Transition network.', 'abbreviation': 'tb', 'type': <class 'str'>, 'default': 'res', 'doc': True}, {'name': 'matching_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'matching_fit_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'matcher_hidden_features', 'description': 'Number of neurons per layer of the matcher network.', 'abbreviation': 'dhf', 'type': <class 'int'>, 'default': 256, 'doc': True}, {'name': 'matcher_hidden_layers', 'description': 'Depth of the matcher network.', 'abbreviation': 'dhl', 'type': <class 'int'>, 'default': 4, 'doc': True}, {'name': 'matcher_activation', 'abbreviation': 'da', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'matcher_normalization', 'abbreviation': 'dn', 'type': <class 'str'>, 'default': None}, {'name': 'state_nodes', 'type': <class 'list'>, 'default': 'auto'}, {'name': 'value_hidden_features', 'abbreviation': 'vhf', 'type': <class 'int'>, 'default': 256}, {'name': 'value_hidden_layers', 'abbreviation': 'vhl', 'type': <class 'int'>, 'default': 4}, {'name': 'value_activation', 'abbreviation': 'va', 'type': <class 'str'>, 'default': 'leakyrelu'}, {'name': 'value_normalization', 'abbreviation': 'vn', 'type': <class 'str'>, 'default': None}, {'name': 'generator_type', 'abbreviation': 'gt', 'type': <class 'str'>, 'default': 'res'}, {'name': 'matcher_type', 'abbreviation': 'dt', 'type': <class 'str'>, 'default': 'res'}, {'name': 'birnn', 'type': <class 'bool'>, 'default': False}, {'name': 'std_adapt_strategy', 'abbreviation': 'sas', 'type': <class 'str'>, 'default': None}, {'name': 'generator_algo', 'abbreviation': 'ga', 'type': <class 'str'>, 'default': 'ppo'}, {'name': 'ppo_runs', 'type': <class 'int'>, 'default': 2}, {'name': 'ppo_epsilon', 'type': <class 'float'>, 'default': 0.2}, {'name': 'ppo_l2norm_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'ppo_entropy_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'generator_sup_cof', 'type': <class 'float'>, 'default': 0}, {'name': 'gae_gamma', 'type': <class 'float'>, 'default': 0.99}, {'name': 'gae_lambda', 'type': <class 'float'>, 'default': 0.95}, {'name': 'g_steps', 'description': 'The number of update rounds of the generator in each epoch.', 'type': <class 'int'>, 'default': 1, 'search_mode': 'grid', 'search_values': [1, 3, 5], 'doc': True}, {'name': 'd_steps', 'description': 'Number of update rounds of the matcher in each epoch.', 'type': <class 'int'>, 'default': 1, 'search_mode': 'grid', 'search_values': [1, 3, 5], 'doc': True}, {'name': 'g_lr', 'description': 'Initial learning rate of the generator node networks.', 'type': <class 'float'>, 'default': 4e-05, 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'doc': True}, {'name': 'd_lr', 'description': 'Initial learning rate of the matcher.', 'type': <class 'float'>, 'default': 0.0006, 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'doc': True}, {'name': 'matcher_loss_length', 'description': 'Matcher loss length.', 'type': <class 'int'>, 'default': 0}, {'name': 'matcher_loss_high', 'description': 'Matcher loss upper bound. When matcher_loss rises above this value, the generator stops training.', 'type': <class 'float'>, 'default': 1.2}, {'name': 'matcher_loss_low', 'description': 'Matcher loss lower bound. When matcher_loss falls below this value, the matcher stops training.', 'type': <class 'float'>, 'default': 0.3}, {'name': 'matcher_sample', 'description': 'Sample the data for training the matcher.', 'type': <class 'bool'>, 'default': False}, {'name': 'mae_reward_weight', 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'type': <class 'float'>, 'default': 0.25}, {'name': 'history_matcher_num', 'description': 'Number of historical matchers saved.', 'type': <class 'int'>, 'default': 0}, {'name': 'history_matcher_save_epochs', 'description': 'History matcher save epochs.', 'type': <class 'int'>, 'default': 100}, {'name': 'generator_data_repeat', 'description': 'Repeat rollouts to generate more data for training the generator.', 'type': <class 'int'>, 'default': 1}, {'name': 'rnn_hidden_features', 'description': 'RNN hidden dims.', 'type': <class 'int'>, 'default': 64}, {'name': 'window_size', 'description': 'Length of the sliding window in RNN.', 'type': <class 'int'>, 'default': 0}, {'name': 'bc_weight_decay', 'description': 'weight_decay in BC finetune.', 'type': <class 'float'>, 'default': 0.0001, 'search_mode': 'continuous', 'search_values': [1e-05, 0.001], 'doc': True}]¶
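The mae_reward_weight entry above documents how the matcher reward and the MAE reward are blended; transcribed directly as a helper:

```python
def blend_reward(matcher_reward, mae_reward, mae_reward_weight=0.25):
    """reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward"""
    return (1 - mae_reward_weight) * matcher_reward + mae_reward_weight * mae_reward
```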
- generator_model_creator(config, graph)[source]¶
Create generator models.
- Parameters
config – configuration parameters
- Returns
the created generator models
- optimizer_creator(models, config)[source]¶
Optimizer creator including generator optimizers and matcher optimizers.
- Parameters
models – node models, matcher, value_net
config – configuration parameters
- Returns
generator_optimizer, matcher_optimizer
- ADV(reward, mask, value, gamma, lam, use_gae=True)[source]¶
Compute advantage function for PPO.
- Parameters
reward – rewards of each step
mask – 1 if the trajectory is done at this step, else 0
value – value for each state
gamma – discount factor
lam – GAE lambda
use_gae – whether to compute advantages with GAE
- Returns
advantages and new value
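A minimal sketch of GAE consistent with the documented arguments, where mask is 1 at steps that end a trajectory. The 1-D tensor shapes (one entry per time step) and the absence of batching are simplifying assumptions, so this is illustrative rather than the actual implementation.

```python
import torch

def compute_gae(reward, mask, value, gamma=0.99, lam=0.95):
    """GAE: A_t = delta_t + gamma*lam*(1-done_t)*A_{t+1}."""
    T = reward.shape[0]
    advantages = torch.zeros_like(reward)
    last_adv = torch.tensor(0.0)
    for t in reversed(range(T)):
        not_done = 1.0 - mask[t]
        next_value = value[t + 1] if t + 1 < T else torch.tensor(0.0)
        # one-step TD residual
        delta = reward[t] + gamma * next_value * not_done - value[t]
        last_adv = delta + gamma * lam * not_done * last_adv
        advantages[t] = last_adv
    returns = advantages + value  # the "new value" regression targets
    return advantages, returns
```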
- PPO_step(generated_data, graph, value_net, generator_optimizers, matcher, other_generator_optimizers, epsilon=0.1, lam=0, w_ent=0, matcher_index=None)[source]¶
Train the policies, transition models, and value_net with the PPO algorithm.
- Parameters
generated_data – generated trajectory
graph – decision graph
value_net – value net
generator_optimizers – the optimizers used to optimize node models and value net
epsilon – hyperparameter for clipping in the policy objective
lam – regularization parameter
w_ent – the weight of entropy loss
- Returns
v_loss, p_loss, sup_loss, total_loss, generator_grad_norm
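The heart of this update is the PPO clipped surrogate objective. The sketch below shows that core computation under the documented parameters epsilon and w_ent; the lam regularization term, the value_net update, and the matcher coordination that PPO_step also performs are omitted, and the function name and signature are illustrative.

```python
import torch

def ppo_policy_loss(log_prob_new, log_prob_old, advantage, entropy,
                    epsilon=0.1, w_ent=0.0):
    """Clipped surrogate objective with an entropy bonus (sketch)."""
    ratio = torch.exp(log_prob_new - log_prob_old)
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1 - epsilon, 1 + epsilon) * advantage
    p_loss = -torch.min(surr1, surr2).mean()  # clipping keeps updates conservative
    return p_loss - w_ent * entropy.mean()    # entropy bonus encourages exploration
```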
revive.algo.venv.revive_t module¶
revive.algo.venv.template module¶
- class revive.algo.venv.template.AlgorithmOperator(*args, **kwargs)[source]¶
Bases: VenvOperator
TODO 1: Define the name of this algorithm.
- NAME = ''¶
TODO 2: Define the hyper-parameters of this algorithm.
- PARAMETER_DESCRIPTION = []¶
- classmethod get_tune_parameters(config, **kargs)[source]¶
Use ray.tune to wrap the parameters to be searched.
- model_creator(config)[source]¶
Create all the models.
- Parameters
config – configuration parameters
- Returns
list of models
- optimizer_creator(models, config)[source]¶
Create Optimizers.
- Parameters
models – list of all the models
config – configuration parameters
- Returns
list of optimizers
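Following the two TODOs, a hedged sketch of a completed template: MyNodeModel, the config keys, and the way node input/output sizes are obtained are all hypothetical, and real node models must conform to the shapes defined by the DesicionGraph.

```python
import torch
from torch import nn
from revive.algo.venv.base import VenvOperator

class MyNodeModel(nn.Module):
    """Placeholder per-node network; real models must match node shapes."""
    def __init__(self, in_features, out_features, hidden=256, layers=4):
        super().__init__()
        blocks, dim = [], in_features
        for _ in range(layers):
            blocks += [nn.Linear(dim, hidden), nn.LeakyReLU()]
            dim = hidden
        blocks.append(nn.Linear(dim, out_features))
        self.net = nn.Sequential(*blocks)

    def forward(self, x):
        return self.net(x)

class MyAlgorithmOperator(VenvOperator):
    NAME = 'MY_ALGO'  # TODO 1: the algorithm name

    # TODO 2: hyper-parameters, in the same record format as the
    # PARAMETER_DESCRIPTION lists documented above
    PARAMETER_DESCRIPTION = [
        {'name': 'my_algo_lr', 'type': float, 'default': 1e-4,
         'search_mode': 'continuous', 'search_values': [1e-6, 1e-3]},
        {'name': 'my_algo_hidden_layers', 'type': int, 'default': 4,
         'search_mode': 'grid', 'search_values': [3, 4, 5]},
    ]

    def model_creator(self, config, graph):
        # One model per node to be learned; reading the node sizes off the
        # DesicionGraph is elided here, so these config keys are placeholders.
        return [MyNodeModel(config['in_features'], config['out_features'],
                            layers=config['my_algo_hidden_layers'])]

    def optimizer_creator(self, models, config):
        return [torch.optim.Adam(m.parameters(), lr=config['my_algo_lr'])
                for m in models]
```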