revive.algo.venv package¶

Submodules¶

revive.algo.venv.base module¶

revive.algo.venv.base.catch_error(func)[source]¶: Push the training error message to the data buffer.
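A decorator of this kind can be sketched as follows. This is a minimal illustration, not the package's implementation: the list `error_buffer` is a hypothetical stand-in for the data buffer, and re-raising after recording is an assumption.

```python
import functools
import traceback

# Hypothetical stand-in for the data buffer; the real buffer type is not shown here.
error_buffer = []


def catch_error(func):
    """Run func; on failure, record the traceback text and re-raise (assumed behaviour)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Push the formatted error message so another process could read it.
            error_buffer.append(traceback.format_exc())
            raise
    return wrapper


@catch_error
def train_step(x):
    # Toy training step that fails when x is zero.
    return 1 / x
```

Calling `train_step(0)` in this sketch raises `ZeroDivisionError` as usual, and the traceback text is appended to `error_buffer` first.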

class revive.algo.venv.base.VenvOperator(*args, **kwargs)[source]¶

Bases: object

The base venv class.

NAME = None¶: Name of the used algorithm.

property metric_name¶: Defines the metric that the hyperparameter search tries to minimize.

property nodes_models_train¶

property other_models_train¶

property nodes_models_val¶

property other_models_val¶

PARAMETER_DESCRIPTION = []¶

classmethod get_parameters(command=None, **kargs)[source]¶

classmethod get_tune_parameters(config: dict, **kargs)[source]¶: Use ray.tune to wrap the parameters to be searched.

model_creator(config: dict, graph: DesicionGraph)[source]¶

Create all the models. The algorithm needs to define models for the nodes to be learned.

Args:

config:: configuration parameters

Return:

a list of models

optimizer_creator(models: List[Module], config: dict)[source]¶

Define optimizers for the created models.

Args:

models:: list of all the models
config:: configuration parameters

Return:

a list of optimizers

data_creator()[source]¶

Create DataLoaders.

Args:

config:: configuration parameters

Return:

(train_loader, val_loader)

nan_in_grad()[source]¶

before_train_epoch(*args, **kwargs)[source]¶

after_train_epoch(*args, **kwargs)[source]¶

before_validate_epoch(*args, **kwargs)[source]¶

after_validate_epoch(*args, **kwargs)[source]¶

train_epoch(*args, **kwargs)[source]¶

validate_epoch(*args, **kwargs)[source]¶

train_batch(expert_data, batch_info, scope='train')[source]¶: Define the training process for a batch of data.

validate_batch(expert_data, batch_info, scope='valEnv_on_trainData', loss_mask=None)[source]¶

Define the validation process for a batch of data.

Args:

expert_data: A batch of offline data.

batch_info: A batch info dict.

scope: If scope is valEnv_on_trainData, the model trained on the validation dataset is evaluated on the training data.

class revive.algo.venv.base.VenvAlgorithm(algo: str, workspace: str | None = None)[source]¶

Bases: object

Class used to manage venv algorithms.

get_train_func(config={})[source]¶

get_trainer(config)[source]¶

get_trainable(config)[source]¶

get_parameters(command=None)[source]¶

get_tune_parameters(config)[source]¶

revive.algo.venv.bc module¶

class revive.algo.venv.bc.BCOperator(*args, **kwargs)[source]¶

Bases: VenvOperator

NAME = 'REVIVE_VENV'¶: Name of the used algorithm.

PARAMETER_DESCRIPTION = [{'abbreviation': 'bbs', 'default': 256, 'description': 'Batch size of training process.', 'doc': True, 'name': 'bc_batch_size', 'type': <class 'int'>}, {'abbreviation': 'bep', 'default': 500, 'description': 'Number of epcoh for the training process', 'doc': True, 'name': 'bc_epoch', 'type': <class 'int'>}, {'abbreviation': 'bh', 'default': 10, 'name': 'bc_horizon', 'type': <class 'int'>}, {'abbreviation': 'phf', 'default': 256, 'description': 'Number of neurons per layer of the policy network.', 'doc': True, 'name': 'policy_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'phl', 'default': 4, 'description': 'Depth of policy network.', 'doc': True, 'name': 'policy_hidden_layers', 'search_mode': 'grid', 'search_values': [3, 4, 5], 'type': <class 'int'>}, {'abbreviation': 'pa', 'default': 'leakyrelu', 'name': 'policy_activation', 'type': <class 'str'>}, {'abbreviation': 'pn', 'default': None, 'name': 'policy_normalization', 'type': <class 'str'>}, {'abbreviation': 'pb', 'default': 'res', 'description': 'Backbone of policy network. Support selecting from [mlp, res, ft_transformer, lstm, gru].', 'doc': True, 'name': 'policy_backbone', 'type': <class 'str'>}, {'abbreviation': 'thf', 'default': 256, 'doc': True, 'name': 'transition_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'thl', 'default': 3, 'doc': True, 'name': 'transition_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'ta', 'default': 'leakyrelu', 'name': 'transition_activation', 'type': <class 'str'>}, {'abbreviation': 'tn', 'default': None, 'name': 'transition_normalization', 'type': <class 'str'>}, {'abbreviation': 'tb', 'default': 'res', 'description': 'Backbone of Transition network. 
Support selecting from [mlp, res, ft_transformer, lstm, gru].', 'doc': True, 'name': 'transition_backbone', 'type': <class 'str'>}, {'default': 0.0001, 'description': 'Initial learning rate of the training process.', 'doc': True, 'name': 'g_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'type': <class 'float'>}, {'abbreviation': 'wd', 'default': 0.0001, 'name': 'weight_decay', 'type': <class 'float'>}, {'abbreviation': 'ld', 'default': 0.99, 'name': 'lr_decay', 'type': <class 'float'>}, {'default': 'nll', 'description': 'Bc support different loss function("nll", "mae", "mse").', 'doc': True, 'name': 'loss_type', 'type': <class 'str'>}, {'default': 5e-05, 'name': 'bc_l2_coef', 'type': <class 'float'>}, {'default': 0.01, 'name': 'logstd_loss_coef', 'type': <class 'float'>}]¶
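Each entry in the list above carries a `name` and a `default`, and some also carry `search_mode` and `search_values`. The sketch below shows one way such a list could be split into default values and a search specification. It is an illustration only: `split_parameters` and the `("grid", ...)` / `("continuous", ...)` tuples are hypothetical, and the sketch does not call ray.tune, which `get_tune_parameters` is documented to use.

```python
def split_parameters(description):
    """Return (defaults, search_space) from a PARAMETER_DESCRIPTION-style list."""
    defaults = {}
    search_space = {}
    for entry in description:
        name = entry["name"]
        defaults[name] = entry["default"]
        mode = entry.get("search_mode")
        if mode == "grid":
            # Grid search: every listed value is a candidate.
            search_space[name] = ("grid", list(entry["search_values"]))
        elif mode == "continuous":
            # Continuous search: the two values are read as (low, high) bounds.
            low, high = entry["search_values"]
            search_space[name] = ("continuous", (low, high))
    return defaults, search_space


# Three entries copied from the BCOperator list above (description fields omitted).
BC_SUBSET = [
    {"name": "bc_batch_size", "abbreviation": "bbs", "default": 256, "type": int},
    {"name": "policy_hidden_layers", "abbreviation": "phl", "default": 4,
     "search_mode": "grid", "search_values": [3, 4, 5], "type": int},
    {"name": "g_lr", "default": 0.0001,
     "search_mode": "continuous", "search_values": [1e-06, 0.001], "type": float},
]

defaults, search_space = split_parameters(BC_SUBSET)
```

Entries without a `search_mode`, such as `bc_batch_size`, appear only in `defaults`.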

model_creator(config: dict, graph: DesicionGraph)[source]¶

Create policies and transition, if needed.

Parameters:: config – configuration parameters
Returns:: list of all models

optimizer_creator(models, config)[source]¶

Define optimizers for the created models.

Args:

models:: list of all the models
config:: configuration parameters

Return:

a list of optimizers

data_creator()[source]¶

Create DataLoaders.

Args:

config:: configuration parameters

Return:

(train_loader, val_loader)

train_epoch(*args, **kwargs)¶

train_batch(expert_data, batch_info, scope='train')[source]¶: Define the training process for a batch of data.

revive.algo.venv.revive module¶

class revive.algo.venv.revive.ReviveOperator(*args, **kwargs)[source]¶

Bases: VenvOperator

NAME = 'REVIVE'¶: Name of the used algorithm.

matcher_model_creator(config, graph)[source]¶

Create matcher models.

Parameters:: config – configuration parameters
Returns:: all the models.

model_creator(config, graph)[source]¶

Create all the models. The algorithm needs to define models for the nodes to be learned.

Args:

config:: configuration parameters

Return:

a list of models

data_creator()[source]¶

Create DataLoaders.

Args:

config:: configuration parameters

Return:

(train_loader, val_loader)

switch_mail_data_loader(mode='revive')[source]¶

bc_train_batch(expert_data, batch_info, scope='train', loss_type=None, dataset_mode=None, loss_mask=None)[source]¶

train_epoch(*args, **kwargs)¶

mix_data_process(expert_data, generated_data, matchers, matcher_optimizer)[source]¶

net_l2_norm(network, mean=False)[source]¶

revive.algo.venv.revive_f module¶

class revive.algo.venv.revive_f.FILTEROperator(*args, **kwargs)[source]¶

Bases: PPOOperator

NAME = 'REVIVE_VENV'¶: Name of the used algorithm.

PARAMETER_DESCRIPTION = [{'abbreviation': 'bep', 'default': 1500, 'name': 'bc_epoch', 'type': <class 'int'>}, {'default': 0.001, 'name': 'bc_lr', 'type': <class 'float'>}, {'default': 1, 'name': 'bc_steps', 'type': <class 'int'>}, {'abbreviation': 'mbs', 'default': 1024, 'description': 'Batch size of training process.', 'doc': True, 'name': 'revive_batch_size', 'type': <class 'int'>}, {'abbreviation': 'mep', 'default': 1500, 'description': 'Number of epcoh for the MAIL training process', 'doc': True, 'name': 'revive_epoch', 'type': <class 'int'>}, {'abbreviation': 'dpe', 'default': 0, 'name': 'matcher_pretrain_epoch', 'type': <class 'int'>}, {'abbreviation': 'phf', 'default': 256, 'description': 'Number of neurons per layer of the policy network.', 'doc': True, 'name': 'policy_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'phl', 'default': 4, 'description': 'Depth of policy network.', 'doc': True, 'name': 'policy_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'pa', 'default': 'leakyrelu', 'name': 'policy_activation', 'type': <class 'str'>}, {'abbreviation': 'pn', 'default': None, 'name': 'policy_normalization', 'type': <class 'str'>}, {'abbreviation': 'pb', 'default': 'res', 'description': 'Backbone of policy network.', 'doc': True, 'name': 'policy_backbone', 'type': <class 'str'>}, {'abbreviation': 'thf', 'default': 256, 'description': 'Number of neurons per layer of the transition network.', 'doc': True, 'name': 'transition_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'thl', 'default': 4, 'doc': True, 'name': 'transition_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'ta', 'default': 'leakyrelu', 'name': 'transition_activation', 'type': <class 'str'>}, {'abbreviation': 'tn', 'default': None, 'name': 'transition_normalization', 'type': <class 'str'>}, {'abbreviation': 'tb', 'default': 'res', 'description': 'Backbone of Transition network.', 'doc': True, 'name': 'transition_backbone', 'type': <class 'str'>}, 
{'default': 'auto', 'name': 'matching_nodes', 'type': <class 'list'>}, {'default': 'auto', 'name': 'matching_fit_nodes', 'type': <class 'list'>}, {'abbreviation': 'dhf', 'default': 256, 'description': 'Number of neurons per layer of the matcher network.', 'doc': True, 'name': 'matcher_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'dhl', 'default': 4, 'description': 'Depth of the matcher network.', 'doc': True, 'name': 'matcher_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'da', 'default': 'leakyrelu', 'name': 'matcher_activation', 'type': <class 'str'>}, {'abbreviation': 'dn', 'default': None, 'name': 'matcher_normalization', 'type': <class 'str'>}, {'default': 'auto', 'name': 'state_nodes', 'type': <class 'list'>}, {'abbreviation': 'vhf', 'default': 256, 'name': 'value_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'vhl', 'default': 4, 'name': 'value_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'va', 'default': 'leakyrelu', 'name': 'value_activation', 'type': <class 'str'>}, {'abbreviation': 'vn', 'default': None, 'name': 'value_normalization', 'type': <class 'str'>}, {'abbreviation': 'gt', 'default': 'res', 'name': 'generator_type', 'type': <class 'str'>}, {'abbreviation': 'dt', 'default': 'res', 'name': 'matcher_type', 'type': <class 'str'>}, {'default': False, 'name': 'birnn', 'type': <class 'bool'>}, {'abbreviation': 'sas', 'default': None, 'name': 'std_adapt_strategy', 'type': <class 'str'>}, {'abbreviation': 'ga', 'default': 'ppo', 'name': 'generator_algo', 'type': <class 'str'>}, {'default': 2, 'name': 'ppo_runs', 'type': <class 'int'>}, {'default': 0.2, 'name': 'ppo_epsilon', 'type': <class 'float'>}, {'default': 0, 'name': 'ppo_l2norm_cof', 'type': <class 'float'>}, {'default': 0, 'name': 'ppo_entropy_cof', 'type': <class 'float'>}, {'default': 0, 'name': 'generator_sup_cof', 'type': <class 'float'>}, {'default': 0.99, 'name': 'gae_gamma', 'type': <class 'float'>}, {'default': 0.95, 'name': 'gae_lambda', 
'type': <class 'float'>}, {'default': 1, 'description': 'The number of update rounds of the generator in each epoch.', 'doc': True, 'name': 'g_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 1, 'description': 'Number of update rounds of matcher in each epoch.', 'doc': True, 'name': 'd_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 4e-05, 'description': 'Initial learning rate of the generator nodes nets.', 'doc': True, 'name': 'g_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'type': <class 'float'>}, {'default': 0.0006, 'description': 'Initial learning rate of the matcher.', 'doc': True, 'name': 'd_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'type': <class 'float'>}, {'default': 0.001, 'name': 'value_lr', 'type': <class 'float'>}, {'default': 0, 'description': 'Matcher loss length.', 'name': 'matcher_loss_length', 'type': <class 'int'>}, {'default': 1.2, 'description': 'Matcher loss high value. When the matcher_loss beyond the value, the generator would stop train', 'name': 'matcher_loss_high', 'type': <class 'float'>}, {'default': 0.3, 'description': 'Matcher loss high value. 
When the matcher_loss low the value, the matcher would stop train', 'name': 'matcher_loss_low', 'type': <class 'float'>}, {'default': False, 'description': 'Sample the data for tring the matcher.', 'name': 'matcher_sample', 'type': <class 'bool'>}, {'default': 0.25, 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'name': 'mae_reward_weight', 'type': <class 'float'>}, {'default': 1, 'description': 'Repeat rollout more data to train generator.', 'name': 'generator_data_repeat', 'type': <class 'int'>}, {'default': 64, 'description': 'RNN hidden dims', 'name': 'rnn_hidden_features', 'type': <class 'int'>}, {'default': 0, 'description': 'length of the sliding_window in RNN', 'name': 'window_size', 'type': <class 'int'>}, {'default': False, 'name': 'mix_data', 'type': <class 'bool'>}, {'default': 0.95, 'name': 'quantile', 'type': <class 'float'>}, {'default': 0.5, 'name': 'matcher_grad_norm_clip', 'type': <class 'float'>}, {'default': True, 'name': 'mix_sample', 'type': <class 'bool'>}, {'default': 0.5, 'name': 'mix_sample_ratio', 'type': <class 'float'>}, {'default': True, 'name': 'replace_with_expert', 'type': <class 'bool'>}, {'default': 0.1, 'name': 'replace_ratio', 'type': <class 'float'>}, {'default': 0.5, 'name': 'gp_coef', 'type': <class 'float'>}, {'default': 0.01, 'name': 'discr_ent_coef', 'type': <class 'float'>}, {'default': 0.0005, 'name': 'matcher_l2_norm_coeff', 'type': <class 'float'>}, {'default': 1e-06, 'name': 'value_l2_norm_coef', 'type': <class 'float'>}, {'default': 1e-06, 'name': 'generator_l2_norm_coef', 'type': <class 'float'>}, {'default': 50, 'name': 'matcher_record_len', 'type': <class 'int'>}, {'default': 1, 'name': 'matcher_record_interval', 'type': <class 'int'>}, {'default': 0.125, 'name': 'fix_std', 'type': <class 'float'>}, {'default': 5e-05, 'name': 'bc_l2_coef', 'type': <class 'float'>}, {'default': 0.01, 'name': 'logstd_loss_coef', 'type': <class 'float'>}, {'default': 0.0, 'name': 
'entropy_coef', 'type': <class 'float'>}, {'default': 'nll', 'name': 'bc_loss', 'type': <class 'str'>}, {'default': 10, 'name': 'controller_weight', 'type': <class 'float'>}]¶
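The `mae_reward_weight` description above gives the reward as a weighted mix of the matcher reward and the MAE reward. A minimal sketch of that formula follows; the function name `mix_reward` is hypothetical, and the default of 0.25 is taken from the list above.

```python
def mix_reward(matcher_reward, mae_reward, mae_reward_weight=0.25):
    """reward = (1 - mae_reward_weight) * matcher_reward + mae_reward_weight * mae_reward."""
    return (1 - mae_reward_weight) * matcher_reward + mae_reward_weight * mae_reward


# With a weight of 0 the matcher reward is used unchanged.
only_matcher = mix_reward(2.0, 4.0, mae_reward_weight=0)
# With a weight of 0.5 both rewards contribute equally.
balanced = mix_reward(2.0, 4.0, mae_reward_weight=0.5)
```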

property nodes_models_mail¶

property other_models_mail¶

data_creator(config: dict)[source]¶

Create DataLoaders.

Args:

config:: configuration parameters

Return:

(train_loader, val_loader)

mail_model_creator(config, graph)[source]¶

bc_model_creator(config, graph)[source]¶

Create generator models.

Parameters:: config – configuration parameters
Returns:: all the models.

matcher_model_creator(config, graph)[source]¶

Create matcher models.

Parameters:: config – configuration parameters
Returns:: all the models.

mail_generator_model_creator(config, graph)[source]¶

Create generator models.

Parameters:: config – configuration parameters
Returns:: all the models.

bc_optimizer_creator(models, config)[source]¶

mail_optimizer_creator(models, config)[source]¶

Optimizer creator including generator optimizers and matcher optimizers.

Parameters:

models – node models, matcher, value_net
config – configuration parameters

Returns:

generator_optimizer, matcher_optimizer

train_epoch(*args, **kwargs)¶

PPO_step(generated_data, graph, value_net, generator_optimizers, matcher, other_generator_optimizers, epsilon=0.1, lam=0, w_ent=0, matcher_index=None, scope=None)[source]¶

Train the policies, the transition, and value_net with the PPO algorithm.

Parameters:

generated_data – generated trajectory
graph – decision graph
value_net – value net
generator_optimizers – the optimizers used to optimize node models and value net
epsilon – hyperparameter for clipping in the policy objective
lam – regularization parameter
w_ent – the weight of entropy loss

Returns:

v_loss, p_loss, sup_loss, total_loss, generator_grad_norm

revive.algo.venv.revive_p module¶

class revive.algo.venv.revive_p.PPOOperator(*args, **kwargs)[source]¶

Bases: ReviveOperator

NAME = 'REVIVE_VENV'¶: Name of the used algorithm.

PARAMETER_DESCRIPTION = [{'abbreviation': 'bbs', 'default': 256, 'doc': True, 'name': 'bc_batch_size', 'type': <class 'int'>}, {'abbreviation': 'bep', 'default': 0, 'doc': True, 'name': 'bc_epoch', 'type': <class 'int'>}, {'default': 0.001, 'name': 'bc_lr', 'type': <class 'float'>}, {'default': 'nll', 'description': 'Bc support different loss function("nll", "mae", "mse").', 'name': 'bc_loss_type', 'type': <class 'str'>}, {'abbreviation': 'mbs', 'default': 1024, 'description': 'Batch size of training process.', 'doc': True, 'name': 'revive_batch_size', 'type': <class 'int'>}, {'abbreviation': 'mep', 'default': 1000, 'description': 'Number of epcoh for the training process', 'doc': True, 'name': 'revive_epoch', 'type': <class 'int'>}, {'abbreviation': 'bet', 'default': 1, 'doc': True, 'name': 'fintune', 'type': <class 'int'>}, {'abbreviation': 'betfre', 'default': 1, 'doc': True, 'name': 'finetune_fre', 'type': <class 'int'>}, {'abbreviation': 'dpe', 'default': 0, 'name': 'matcher_pretrain_epoch', 'type': <class 'int'>}, {'abbreviation': 'phf', 'default': 256, 'description': 'Number of neurons per layer of the policy network.', 'doc': True, 'name': 'policy_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'phl', 'default': 4, 'description': 'Depth of policy network.', 'doc': True, 'name': 'policy_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'pa', 'default': 'leakyrelu', 'name': 'policy_activation', 'type': <class 'str'>}, {'abbreviation': 'pn', 'default': None, 'name': 'policy_normalization', 'type': <class 'str'>}, {'abbreviation': 'pb', 'default': 'res', 'description': 'Backbone of policy network. 
Support selecting from [mlp, res, ft_transformer, lstm, gru].', 'doc': True, 'name': 'policy_backbone', 'type': <class 'str'>}, {'abbreviation': 'thf', 'default': 256, 'description': 'Number of neurons per layer of the transition network.', 'doc': True, 'name': 'transition_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'thl', 'default': 4, 'doc': True, 'name': 'transition_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'ta', 'default': 'leakyrelu', 'name': 'transition_activation', 'type': <class 'str'>}, {'abbreviation': 'tn', 'default': None, 'name': 'transition_normalization', 'type': <class 'str'>}, {'abbreviation': 'tb', 'default': 'res', 'description': 'Backbone of Transition network. Support selecting from [mlp, res, ft_transformer, lstm, gru].', 'doc': True, 'name': 'transition_backbone', 'type': <class 'str'>}, {'default': 'auto', 'name': 'matching_nodes', 'type': <class 'list'>}, {'default': 'auto', 'name': 'matching_fit_nodes', 'type': <class 'list'>}, {'abbreviation': 'dhf', 'default': 256, 'description': 'Number of neurons per layer of the matcher network.', 'doc': True, 'name': 'matcher_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'dhl', 'default': 4, 'description': 'Depth of the matcher network.', 'doc': True, 'name': 'matcher_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'da', 'default': 'leakyrelu', 'name': 'matcher_activation', 'type': <class 'str'>}, {'abbreviation': 'dn', 'default': None, 'name': 'matcher_normalization', 'type': <class 'str'>}, {'default': 'auto', 'name': 'state_nodes', 'type': <class 'list'>}, {'abbreviation': 'vhf', 'default': 256, 'name': 'value_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'vhl', 'default': 4, 'name': 'value_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'va', 'default': 'leakyrelu', 'name': 'value_activation', 'type': <class 'str'>}, {'abbreviation': 'vn', 'default': None, 'name': 'value_normalization', 'type': <class 'str'>}, 
{'abbreviation': 'gt', 'default': 'res', 'name': 'generator_type', 'type': <class 'str'>}, {'abbreviation': 'dt', 'default': 'res', 'name': 'matcher_type', 'type': <class 'str'>}, {'default': False, 'name': 'birnn', 'type': <class 'bool'>}, {'abbreviation': 'sas', 'default': None, 'name': 'std_adapt_strategy', 'type': <class 'str'>}, {'abbreviation': 'ga', 'default': 'ppo', 'name': 'generator_algo', 'type': <class 'str'>}, {'default': 2, 'name': 'ppo_runs', 'type': <class 'int'>}, {'default': 0.2, 'name': 'ppo_epsilon', 'type': <class 'float'>}, {'default': 0, 'name': 'ppo_l2norm_cof', 'type': <class 'float'>}, {'default': 0, 'name': 'ppo_entropy_cof', 'type': <class 'float'>}, {'default': 0, 'name': 'generator_sup_cof', 'type': <class 'float'>}, {'default': 0.99, 'name': 'gae_gamma', 'type': <class 'float'>}, {'default': 0.95, 'name': 'gae_lambda', 'type': <class 'float'>}, {'default': 1, 'description': 'The number of update rounds of the generator in each epoch.', 'doc': True, 'name': 'g_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 1, 'description': 'Number of update rounds of matcher in each epoch.', 'doc': True, 'name': 'd_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 4e-05, 'description': 'Initial learning rate of the generator nodes nets.', 'doc': True, 'name': 'g_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'type': <class 'float'>}, {'default': 0.0006, 'description': 'Initial learning rate of the matcher.', 'doc': True, 'name': 'd_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'type': <class 'float'>}, {'default': 0, 'description': 'Matcher loss length.', 'name': 'matcher_loss_length', 'type': <class 'int'>}, {'default': 1.2, 'description': 'Matcher loss high value. 
When the matcher_loss beyond the value, the generator would stop train', 'name': 'matcher_loss_high', 'type': <class 'float'>}, {'default': 0.3, 'description': 'Matcher loss high value. When the matcher_loss low the value, the matcher would stop train', 'name': 'matcher_loss_low', 'type': <class 'float'>}, {'default': False, 'description': 'Sample the data for tring the matcher.', 'name': 'matcher_sample', 'type': <class 'bool'>}, {'default': 0.25, 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'name': 'mae_reward_weight', 'type': <class 'float'>}, {'default': 1, 'description': 'Repeat rollout more data to train generator.', 'name': 'generator_data_repeat', 'type': <class 'int'>}, {'default': 64, 'description': 'RNN hidden dims', 'name': 'rnn_hidden_features', 'type': <class 'int'>}, {'default': 0, 'description': 'length of the sliding_window in RNN', 'name': 'window_size', 'type': <class 'int'>}, {'default': 0.0001, 'description': 'weight_decay in bc finetune', 'doc': True, 'name': 'bc_weight_decay', 'type': <class 'float'>}, {'default': False, 'name': 'mix_data', 'type': <class 'bool'>}, {'default': 0.95, 'name': 'quantile', 'type': <class 'float'>}, {'default': 0.5, 'name': 'matcher_grad_norm_clip', 'type': <class 'float'>}, {'default': False, 'name': 'mix_sample', 'type': <class 'bool'>}, {'default': 0.5, 'name': 'mix_sample_ratio', 'type': <class 'float'>}, {'default': 0.0, 'name': 'gp_coef', 'type': <class 'float'>}, {'default': 0.01, 'name': 'discr_ent_coef', 'type': <class 'float'>}, {'default': 0.0005, 'name': 'matcher_l2_norm_coeff', 'type': <class 'float'>}, {'default': 1e-06, 'name': 'value_l2_norm_coef', 'type': <class 'float'>}, {'default': 1e-06, 'name': 'generator_l2_norm_coef', 'type': <class 'float'>}, {'default': 5e-05, 'name': 'bc_l2_coef', 'type': <class 'float'>}, {'default': 0.0, 'name': 'logstd_loss_coef', 'type': <class 'float'>}, {'default': 0.0, 'name': 'entropy_coef', 'type': <class 
'float'>}, {'default': 'nll', 'name': 'bc_loss', 'type': <class 'str'>}, {'default': 'auto', 'name': 'ts_conv_nodes', 'type': <class 'list'>}, {'default': 10, 'name': 'controller_weight', 'type': <class 'float'>}, {'default': -1, 'name': 'bc_repeat', 'type': <class 'int'>}]¶
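The `matcher_loss_high` and `matcher_loss_low` descriptions above say that the generator stops training when the matcher loss rises above the high value, and the matcher stops training when it falls below the low value. One possible reading is sketched below. The function `should_train` is hypothetical, the comparison against a single loss value is an assumption (the actual code may aggregate over `matcher_loss_length` epochs), and the handling of the exact boundary values is also assumed.

```python
def should_train(matcher_loss, matcher_loss_high=1.2, matcher_loss_low=0.3):
    """Return (train_generator, train_matcher) for one epoch."""
    # Loss above the high threshold: pause the generator (assumed strict comparison).
    train_generator = matcher_loss <= matcher_loss_high
    # Loss below the low threshold: pause the matcher (assumed strict comparison).
    train_matcher = matcher_loss >= matcher_loss_low
    return train_generator, train_matcher
```

Under this reading, a loss between the two thresholds leaves both the generator and the matcher training.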

generator_model_creator(config, graph)[source]¶

Create generator models.

Parameters:: config – configuration parameters
Returns:: all the models.

optimizer_creator(models, config)[source]¶

Optimizer creator including generator optimizers and matcher optimizers.

Parameters:

models – node models, matcher, value_net
config – configuration parameters

Returns:

generator_optimizer, matcher_optimizer

ADV(reward, mask, value, gamma, lam, use_gae=True)[source]¶

Compute advantage function for PPO.

Parameters:

reward – rewards of each step
mask – 1 if the trajectory is done, else 0
value – value for each state
gamma – discount factor
lam – GAE lambda
use_gae – True or False

Returns:

advantages and new value
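The parameters above match the usual generalized advantage estimation (GAE) recursion. The sketch below works on one trajectory given as plain Python lists. It is not the package's implementation: the name `compute_advantages`, the zero bootstrap value after the last step, and the reading of "new value" as advantage plus value are all assumptions.

```python
def compute_advantages(reward, mask, value, gamma, lam, use_gae=True):
    """Return (advantages, returns) for one trajectory given as plain lists.

    mask[t] is 1 when step t ends the trajectory, else 0.
    """
    n = len(reward)
    advantages = [0.0] * n
    returns = [0.0] * n
    next_value = 0.0       # value after the final step (assumed zero)
    next_advantage = 0.0
    next_return = 0.0
    for t in reversed(range(n)):
        not_done = 1.0 - mask[t]
        if use_gae:
            # One-step TD error, then the exponentially weighted sum of TD errors.
            delta = reward[t] + gamma * next_value * not_done - value[t]
            advantages[t] = delta + gamma * lam * not_done * next_advantage
            returns[t] = advantages[t] + value[t]
        else:
            # Plain discounted return minus the value baseline.
            returns[t] = reward[t] + gamma * next_return * not_done
            advantages[t] = returns[t] - value[t]
        next_value = value[t]
        next_advantage = advantages[t]
        next_return = returns[t]
    return advantages, returns
```

With `lam=1` and zero values, the GAE branch reduces to the discounted return, which makes the two branches easy to compare on small inputs.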

PPO_step(generated_data, graph, value_net, generator_optimizers, matcher, other_generator_optimizers, epsilon=0.1, lam=0, w_ent=0, matcher_index=None, scope=None)[source]¶

Train the policies, the transition, and value_net with the PPO algorithm.

Parameters:

generated_data – generated trajectory
graph – decision graph
value_net – value net
generator_optimizers – the optimizers used to optimize node models and value net
epsilon – hyperparameter for clipping in the policy objective
lam – regularization parameter
w_ent – the weight of entropy loss

Returns:

v_loss, p_loss, sup_loss, total_loss, generator_grad_norm
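The `epsilon` parameter above is the clipping range of the PPO policy objective. The standard clipped surrogate loss can be sketched in plain Python as follows. This covers only the clipping term: the value loss, the `lam` regularization, and the `w_ent` entropy term are left out, and the function name `ppo_clip_loss` is hypothetical.

```python
import math


def ppo_clip_loss(new_log_prob, old_log_prob, advantage, epsilon=0.1):
    """Mean negative clipped surrogate over a batch given as plain lists."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_prob, old_log_prob, advantage):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic choice between the unclipped and clipped objective.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantage)
```

For a positive advantage the objective stops growing once the ratio exceeds `1 + epsilon`; for a negative advantage the unclipped term is kept when it is the smaller of the two.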

revive.algo.venv.revive_t module¶

class revive.algo.venv.revive_t.ReplayBuffer(buffer_size)[source]¶

Bases: object

A simple FIFO experience replay buffer for SAC agents.

put(batch_data: Batch)[source]¶

__len__()[source]¶

sample(batch_size)[source]¶
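A FIFO buffer with the same three methods can be sketched with the standard library. This is an illustration under stated assumptions, not the package's class: `SimpleReplayBuffer` is a hypothetical name, transitions are stored as plain Python objects rather than a `Batch`, and sampling is uniform without replacement.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """FIFO buffer: once full, the oldest items are dropped first."""

    def __init__(self, buffer_size):
        # int() because a size may arrive as a float such as 5000.0.
        self.data = deque(maxlen=int(buffer_size))

    def put(self, batch_data):
        # batch_data is any iterable of transitions (stand-in for Batch).
        self.data.extend(batch_data)

    def __len__(self):
        return len(self.data)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer length.
        return random.sample(list(self.data), min(batch_size, len(self.data)))
```

Putting four items into a buffer of size three keeps only the last three, which is the FIFO behaviour described above.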

class revive.algo.venv.revive_t.TD3Operator(*args, **kwargs)[source]¶

Bases: ReviveOperator

NAME = 'REVIVE_TD3'¶: Name of the used algorithm.

PARAMETER_DESCRIPTION = [{'abbreviation': 'bbs', 'default': 256, 'name': 'bc_batch_size', 'type': <class 'int'>}, {'abbreviation': 'bep', 'default': 0, 'name': 'bc_epoch', 'type': <class 'int'>}, {'default': 0.001, 'name': 'bc_lr', 'type': <class 'float'>}, {'abbreviation': 'mbs', 'default': 256, 'description': 'Batch size of training process.', 'doc': True, 'name': 'revive_batch_size', 'type': <class 'int'>}, {'abbreviation': 'mep', 'default': 1000, 'description': 'Number of epcoh for the training process', 'doc': True, 'name': 'revive_epoch', 'type': <class 'int'>}, {'abbreviation': 'bet', 'default': 1, 'doc': True, 'name': 'fintune', 'type': <class 'int'>}, {'abbreviation': 'betfre', 'default': 1, 'doc': True, 'name': 'finetune_fre', 'type': <class 'int'>}, {'abbreviation': 'dpe', 'default': 0, 'name': 'matcher_pretrain_epoch', 'type': <class 'int'>}, {'abbreviation': 'phf', 'default': 256, 'description': 'Number of neurons per layer of the policy network.', 'doc': True, 'name': 'policy_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'phl', 'default': 4, 'description': 'Depth of policy network.', 'doc': True, 'name': 'policy_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'pa', 'default': 'leakyrelu', 'name': 'policy_activation', 'type': <class 'str'>}, {'abbreviation': 'pn', 'default': None, 'name': 'policy_normalization', 'type': <class 'str'>}, {'abbreviation': 'pb', 'default': 'res', 'description': 'Backbone of policy network.', 'doc': True, 'name': 'policy_backbone', 'type': <class 'str'>}, {'abbreviation': 'thf', 'default': 256, 'description': 'Number of neurons per layer of the transition network.', 'doc': True, 'name': 'transition_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'thl', 'default': 4, 'doc': True, 'name': 'transition_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'ta', 'default': 'leakyrelu', 'name': 'transition_activation', 'type': <class 'str'>}, {'abbreviation': 'tn', 'default': None, 'name': 
'transition_normalization', 'type': <class 'str'>}, {'abbreviation': 'tb', 'default': 'res', 'description': 'Backbone of Transition network.', 'doc': True, 'name': 'transition_backbone', 'type': <class 'str'>}, {'default': 'auto', 'name': 'matching_nodes', 'type': <class 'list'>}, {'abbreviation': 'dhf', 'default': 256, 'description': 'Number of neurons per layer of the matcher network.', 'doc': True, 'name': 'matcher_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'dhl', 'default': 4, 'description': 'Depth of the matcher network.', 'doc': True, 'name': 'matcher_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'da', 'default': 'leakyrelu', 'name': 'matcher_activation', 'type': <class 'str'>}, {'abbreviation': 'dn', 'default': None, 'name': 'matcher_normalization', 'type': <class 'str'>}, {'default': 'auto', 'name': 'state_nodes', 'type': <class 'list'>}, {'abbreviation': 'vhf', 'default': 256, 'name': 'value_hidden_features', 'type': <class 'int'>}, {'abbreviation': 'vhl', 'default': 4, 'name': 'value_hidden_layers', 'type': <class 'int'>}, {'abbreviation': 'va', 'default': 'leakyrelu', 'name': 'value_activation', 'type': <class 'str'>}, {'abbreviation': 'vn', 'default': None, 'name': 'value_normalization', 'type': <class 'str'>}, {'abbreviation': 'gt', 'default': 'res', 'name': 'generator_type', 'type': <class 'str'>}, {'abbreviation': 'dt', 'default': 'res', 'name': 'matcher_type', 'type': <class 'str'>}, {'default': False, 'name': 'birnn', 'type': <class 'bool'>}, {'abbreviation': 'sas', 'default': None, 'name': 'std_adapt_strategy', 'type': <class 'str'>}, {'default': 1, 'description': 'The number of update rounds of the generator in each epoch.', 'doc': True, 'name': 'g_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, {'default': 1, 'description': 'Number of update rounds of matcher in each epoch.', 'doc': True, 'name': 'd_steps', 'search_mode': 'grid', 'search_values': [1, 3, 5], 'type': <class 'int'>}, 
{'default': 4e-05, 'description': 'Initial learning rate of the generator.', 'doc': True, 'name': 'g_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.0001], 'type': <class 'float'>}, {'default': 0.0006, 'description': 'Initial learning rate of the matcher.', 'doc': True, 'name': 'd_lr', 'search_mode': 'continuous', 'search_values': [1e-06, 0.001], 'type': <class 'float'>}, {'default': 1, 'description': 'Matcher loss length.', 'name': 'matcher_loss_length', 'type': <class 'int'>}, {'default': 1.2, 'description': 'Matcher loss high value. When the matcher_loss beyond the value, the generator would stop train', 'name': 'matcher_loss_high', 'type': <class 'float'>}, {'default': 0.6, 'description': 'Matcher loss high value. When the matcher_loss low the value, the matcher would stop train', 'name': 'matcher_loss_low', 'type': <class 'float'>}, {'default': False, 'description': 'Sample the data for tring the matcher.', 'name': 'matcher_sample', 'type': <class 'bool'>}, {'default': 0, 'description': 'reward = (1-mae_reward_weight)*matcher_reward + mae_reward_weight*mae_reward.', 'name': 'mae_reward_weight', 'type': <class 'float'>}, {'default': 0, 'description': 'Number of historical discriminators saved.', 'name': 'history_matcher_num', 'type': <class 'int'>}, {'abbreviation': 'bfs', 'default': 5000.0, 'description': 'Size of the buffer to store data.', 'doc': True, 'name': 'buffer_size', 'type': <class 'int'>}, {'abbreviation': 'tsph', 'default': 10, 'description': 'td3_steps_per_epoch.', 'doc': True, 'name': 'td3_steps_per_epoch', 'type': <class 'int'>}]¶

setup(*args, **kwargs)¶

generator_model_creator(config, graph)[source]¶

Create generator models.

Parameters:: config – configuration parameters
Returns:: all the models.

optimizer_creator(models, config)[source]¶

Optimizer creator including generator optimizers and matcher optimizers.

Parameters:

models – node models, matcher, value_net
config – configuration parameters

Returns:

generator_optimizer, matcher_optimizer

TD3_step(buffer, graph, value_net_1, value_net_2, generator_optimizer=None, value_net_1_optimizer=None, value_net_2_optimizer=None)[source]¶

revive.algo.venv.template module¶

class revive.algo.venv.template.AlgorithmOperator(*args, **kwargs)[source]¶

Bases: VenvOperator

TODO 1: Define the name of this algorithm.

NAME = ''¶: TODO 2: Define the hyper-parameters of this algorithm.

PARAMETER_DESCRIPTION = []¶

classmethod get_parameters(command=None, **kargs)[source]¶

classmethod get_tune_parameters(config, **kargs)[source]¶: Use ray.tune to wrap the parameters to be searched.

model_creator(config)[source]¶

Create all the models.

Parameters:: config – configuration parameters
Returns:: list of models

optimizer_creator(models, config)[source]¶

Create Optimizers.

Parameters:

models – list of all the models
config – configuration parameters

Returns:

list of optimizers

data_creator(config)[source]¶

Create DataLoaders.

Parameters:: config – configuration parameters
Returns:: train_loader and val_loader

train_epoch(iterator, info)[source]¶

train_batch(expert_data, batch_info, scope)[source]¶: Define the training process for a batch of data.
