Generate Decision Flow through Causal Inference

When we are unsure how to construct a decision flow from existing data, we can use the causal inference tool provided by the REVIVE SDK to automatically build a decision flow *.yaml file. The tool is based on causal inference. With the help of causal inference methods, we can also easily check which dimensions of the offline data are useful for learning environment or policy models, and how the states and behaviors in those dimensions relate to one another. The tool is purely data-driven. Users can easily obtain the generated decision flow chart, which may help REVIVE learn virtual environments more accurately and infer more effective policies.

This tool can automatically generate a decision flow chart based on the provided .yaml and .npz files.

Note

Users do not need to define the graph section in the .yaml file, as the REVIVE SDK will automatically generate it using causal inference algorithms.

In some cases, the generated decision flow may differ from what users expect. Users should therefore check that the generated decision flow is reasonable before using REVIVE to learn environment models and policies.

Here, we show how to use the causal inference tool with the example introduced in Example of applying REVIVE to Lander Hover.

Generally, we construct the columns part of the .yaml as follows:

metadata:

   columns:
      - obs_0:
          dim: obs
          type: continuous
      - obs_1:
          dim: obs
          type: continuous
      - obs_2:
          dim: obs
          type: continuous
      - obs_3:
          dim: obs
          type: continuous
      - action:
          dim: action
          type: category
          values: [0,1,2,3]

This file does not need a graph part. After we provide this .yaml file together with the corresponding .npz file to the causal inference tool, the tool outputs new .yaml and .npz files with names defined by the user.

Specifically, the .yaml file is rebuilt as follows:

metadata:

   columns:
   - obs_0:
       dim: action_realated_obs
       type: continuous
   - obs_1:
       dim: action_realated_obs
       type: continuous
   - obs_2:
       dim: translation_related_obs
       type: continuous
   - obs_3:
       dim: translation_related_obs
       type: continuous
   - action:
       dim: action
       type: category
       values:
       - 0
       - 1
       - 2
       - 3

   graph:
     action:
     - action_realated_obs
     next_action_realated_obs:
     - action_realated_obs
     - translation_related_obs
     - action
     next_translation_related_obs:
     - action_realated_obs
     - translation_related_obs
     - action

In the generated .yaml file, the causal inference tool has built the graph part. In the columns part, the values of action are written as a block-style list rather than the inline form [0,1,2,3] used in the original .yaml file. The two forms are equivalent.

The generated .yaml file shows that the causal inference tool has split the four dimensions of the obs data into two groups. The first two dimensions are renamed action_realated_obs, which is the only input of action in the graph. The other two dimensions are renamed translation_related_obs. Each group now has its own transition node in the graph (next_action_realated_obs and next_translation_related_obs), which can be understood as two environments for REVIVE to learn.
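To make the structure of the generated graph easier to inspect, here is a minimal Python sketch. It copies the graph part from the generated .yaml above into a plain dictionary and lists the inputs of each node. The helper names (`transition_nodes`, `describe`) are hypothetical and not part of the REVIVE SDK. Treating every node whose name starts with `next_` as a transition node is an assumption based on this example.

```python
# Graph part copied by hand from the generated .yaml above:
# each key is a node, each value is the list of nodes it takes as input.
graph = {
    "action": ["action_realated_obs"],
    "next_action_realated_obs": [
        "action_realated_obs",
        "translation_related_obs",
        "action",
    ],
    "next_translation_related_obs": [
        "action_realated_obs",
        "translation_related_obs",
        "action",
    ],
}


def transition_nodes(graph):
    """Return the nodes named 'next_*' (assumed here to be transition nodes)."""
    return [name for name in graph if name.startswith("next_")]


def describe(graph):
    """Build one human-readable line per node: 'node <- input, input, ...'."""
    return [f"{node} <- {', '.join(inputs)}" for node, inputs in graph.items()]


for line in describe(graph):
    print(line)

print("transition nodes:", transition_nodes(graph))
```

In this graph, `action` takes only `action_realated_obs` as input, while both transition nodes take all three of the other nodes.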

The newly built graph reflects the real relationship between the first two dimensions of obs and action. In this case, we used a rule model based on coordinate information to simulate the historical decision process and collect the historical decision data. In other words, the data were collected by the “rule model”, whose actions depend only on the coordinates of the lander, which are the first two dimensions of the obs data. The causal inference tool does uncover this relationship: it splits obs into two parts and connects only the first two dimensions of obs to action.
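The idea that a purely data-driven check can recover which obs dimensions drive the action can be illustrated with a small, self-contained sketch. This is not the algorithm used by the REVIVE causal inference tool. It swaps in a simple correlation ratio (eta squared) as a stand-in dependence measure. The rule policy, its thresholds, and the uniformly sampled data are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def rule_action(obs):
    """Hypothetical rule policy: looks only at obs[0] and obs[1]."""
    x, y = obs[0], obs[1]
    if y < -0.3:
        return 2
    if x < -0.2:
        return 3
    if x > 0.2:
        return 1
    return 0


def correlation_ratio(values, labels):
    """Eta squared: share of the variance of `values` explained by `labels`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    total_mean = fmean(values)
    between = sum(len(g) * (fmean(g) - total_mean) ** 2 for g in groups.values())
    total = pvariance(values) * len(values)
    return between / total


# Synthetic "historical" data: four obs dimensions, actions from the rule policy.
rng = random.Random(0)
data = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(2000)]
actions = [rule_action(obs) for obs in data]

# One dependence score per obs dimension.
scores = [correlation_ratio([obs[i] for obs in data], actions) for i in range(4)]
for i, score in enumerate(scores):
    print(f"obs_{i}: eta^2 = {score:.3f}")
```

Under these assumptions, the scores for obs_0 and obs_1 should be clearly larger than those for obs_2 and obs_3, which mirrors the split into an action-related group and a remaining group.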

For users who are unsure how to construct the decision flow for their experiment, the causal inference tool of REVIVE can be a good way to quickly generate a rough decision flow, which still needs to be checked carefully.

The causal inference tool can also identify useless dimensions in the data. The .yaml file output by the tool for the Pendulum example is shown below. Before running the tool, we appended two dimensions of unrelated random values to the obs data, so obs contains two additional useless dimensions. In the result, the appended dimensions obs_3 and obs_4 are flagged as useless_obs, which means these two dimensions are removed from obs. This result meets our expectations.

metadata:

  columns:
  - obs_0:
      dim: obs
      type: continuous
  - obs_1:
      dim: obs
      type: continuous
  - obs_2:
      dim: obs
      type: continuous
  - obs_3:
      dim: useless_obs
      type: continuous
  - obs_4:
      dim: useless_obs
      type: continuous
  - action:
      dim: action
      type: continuous

  graph:
    action:
    - obs
    next_obs:
    - obs
    - action
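When reviewing a generated file like the one above, it can help to list which columns were kept and which were flagged. The following Python sketch copies the columns part into a list by hand and separates the two groups. The helper `split_columns` is hypothetical and not part of the REVIVE SDK. Matching on the `useless_` prefix is an assumption based on the `useless_obs` name in this example.

```python
# Columns part copied by hand from the generated Pendulum .yaml above.
columns = [
    {"obs_0": {"dim": "obs", "type": "continuous"}},
    {"obs_1": {"dim": "obs", "type": "continuous"}},
    {"obs_2": {"dim": "obs", "type": "continuous"}},
    {"obs_3": {"dim": "useless_obs", "type": "continuous"}},
    {"obs_4": {"dim": "useless_obs", "type": "continuous"}},
    {"action": {"dim": "action", "type": "continuous"}},
]


def split_columns(columns, useless_prefix="useless_"):
    """Split column names into (kept, flagged) based on the prefix of their dim."""
    kept, flagged = [], []
    for column in columns:
        # Each list entry is a single-key mapping: {column_name: spec}.
        ((name, spec),) = column.items()
        if spec["dim"].startswith(useless_prefix):
            flagged.append(name)
        else:
            kept.append(name)
    return kept, flagged


kept, flagged = split_columns(columns)
print("kept:   ", kept)
print("flagged:", flagged)
```

Note that the graph part of the generated file refers only to obs and action, so the flagged columns do not feed into any node.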