REVIVE is easy-to-use software for building intelligent decision-making systems from offline data. This type of task is commonly referred to as offline reinforcement learning, which aims to learn a policy from a batch of historical data without extra environment interaction. It holds great promise for turning historical data into powerful decision-making engines. Effective offline reinforcement learning methods can derive policies with maximum utility from existing data, automating a wide range of decision-making areas, from mechanical system control and energy efficiency improvements to facilitating scientific discoveries.


REVIVE is general-purpose software that aims to bring automatic decision-making to real-world scenarios. The software operates as a two-stage pipeline:

Venv Training: A virtual-environment model is trained from the offline data to mimic each agent’s policy along with the transitions between states (also known as nature’s policy).

Policy Training: One of the agents is treated as the active agent, while the others are frozen as its environment. REVIVE then trains the active agent with reinforcement learning to derive a better target policy for that agent, one that maximizes the predefined reward.
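The two stages can be sketched on a toy system. This is only an illustration of the pipeline's idea, not the REVIVE API: stage 1 fits a transition model (the venv) from offline `(state, action, next_state)` triples, and stage 2 searches for a policy that maximizes reward inside the frozen venv. The dynamics, policy class, and reward here are all assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline data from a toy system whose true (unknown) dynamics are s' = s + a.
states = rng.uniform(-1, 1, size=200)
actions = rng.uniform(-1, 1, size=200)
next_states = states + actions

# --- Stage 1: Venv training -- fit a transition model from the offline data ---
X = np.stack([states, actions], axis=1)
w, *_ = np.linalg.lstsq(X, next_states, rcond=None)  # learned dynamics s' ~ w0*s + w1*a

def venv_step(s, a):
    """Virtual environment: predict the next state with the learned model."""
    return w[0] * s + w[1] * a

# --- Stage 2: Policy training -- optimize a policy against the frozen venv ---
def rollout_return(k, s0=1.0, horizon=10):
    """Return of the linear policy a = -k*s rolled out in the venv; reward = -s'^2."""
    s, total = s0, 0.0
    for _ in range(horizon):
        s = venv_step(s, -k * s)
        total += -s ** 2
    return total

gains = np.linspace(0.0, 2.0, 201)
best_k = gains[np.argmax([rollout_return(k) for k in gains])]
print(f"learned dynamics weights: {w.round(3)}, best policy gain: {best_k:.2f}")
```

Since the true dynamics are `s' = s + a`, the fitted weights come out close to `[1, 1]`, and the grid search recovers a gain near `k = 1`, which drives the state to zero in one step.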

There are three concepts to note in REVIVE:

  • Venv: Venv refers to a virtual environment model, a virtual representation that serves as the digital counterpart of a physical object or process. REVIVE constructs this digital form of the system with neural networks driven by historical data, combined with expert knowledge supplied during modeling.

  • Policy: Policy defines how an agent acts given a specific state. The agent is the decision-making subject: it makes different decisions in different situations so as to maximize the predefined reward.

  • Reward: Reward is a numerical quantity that measures the performance of a policy; a good policy obtains higher rewards. For example, a good policy in a mechanical system can reduce energy consumption while still completing the task. A reasonable reward in this context is the amount of energy the policy saves when the task is completed, and a very low value when the task fails.
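The energy-saving reward described above can be written as a small function. This is a minimal sketch: the signature, baseline comparison, and failure penalty of -1000 are assumptions for illustration, not part of REVIVE.

```python
def reward(energy_baseline: float, energy_used: float, task_completed: bool) -> float:
    """Energy saved by the policy when the task succeeds; a very low value on failure."""
    if not task_completed:
        return -1000.0  # assumed penalty magnitude for a failed task
    return energy_baseline - energy_used  # energy the policy saved vs. the baseline

# A policy that completes the task with 7 units against a 10-unit baseline saves 3 units;
# a failed task is penalized regardless of energy use.
print(reward(10.0, 7.0, True), reward(10.0, 7.0, False))
```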