Controlling Industrial Machines with the REVIVE SDK
===================================================

.. image:: images/ib.png
 :alt: example-of-industrial-benchmark
 :align: center

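Industrial Machine Control Task Description
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Industrial Benchmark (IB) is a reinforcement learning benchmark environment designed to simulate the characteristics found in various industrial control tasks, such as wind or gas turbines and chemical reactors. It covers many problems common to real-world industrial settings,
including high-dimensional continuous state and action spaces, delayed rewards, complex noise patterns, and high stochasticity of multiple reactive objectives. We also augmented the original Industrial Benchmark environment by adding two dimensions of the system state to the observation space,
which makes it possible to compute the immediate reward at each step. Because IB is itself a high-dimensional and highly stochastic environment, no noise is added to the action data when sampling from it.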


================= ====================
Action Space      Continuous(3,)
Observation       Shape (180,)
================= ====================


Action Space
--------------------------

The action space is a continuous 3-dimensional vector. For details, see `http://polixir.ai/research/neorl <http://polixir.ai/research/neorl>`__.


Observation Space
--------------------------

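The state is a 180-dimensional vector. In fact, the observation at each time step is a 6-dimensional vector; the dataset automatically concatenates the previous 29 frames, so the current observation has dimension :math:`180 = 6 \times 30`. For details, see `http://polixir.ai/research/neorl <http://polixir.ai/research/neorl>`__.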
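
The frame stacking described above can be illustrated with a short NumPy sketch. It assumes, purely for illustration, that the 30 frames are concatenated oldest first; the ordering actually used by the dataset is not stated here, and the helper name ``stack_frames`` is hypothetical.

.. code:: python

    import numpy as np

    FRAME_DIM = 6     # per-step observation size, as stated above
    NUM_FRAMES = 30   # the current frame plus the previous 29

    def stack_frames(frames: np.ndarray) -> np.ndarray:
        """Flatten a (..., 30, 6) history into a (..., 180) observation."""
        assert frames.shape[-2:] == (NUM_FRAMES, FRAME_DIM)
        return frames.reshape(*frames.shape[:-2], NUM_FRAMES * FRAME_DIM)

    # Synthetic history: frame i holds the values 6*i .. 6*i+5.
    history = np.arange(NUM_FRAMES * FRAME_DIM, dtype=np.float32)
    history = history.reshape(NUM_FRAMES, FRAME_DIM)

    obs = stack_frames(history)
    print(obs.shape)  # (180,)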


Industrial Machine Control Task Objective
-----------------------------------------

The goal is to keep the various indicators of the industrial machine close to their target values. For details, see `http://polixir.ai/research/neorl <http://polixir.ai/research/neorl>`__.


Training a Control Policy with the REVIVE SDK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

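The REVIVE SDK is a tool driven by historical data. Following the tutorial section of the documentation, using the REVIVE SDK on the industrial machine control task can be divided into the following steps: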

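 1. Process the historical decision data.
 2. Combine the business scenario with the collected historical data to build the :doc:`decision flow graph and array data <../tutorial/data_preparation_cn>`. The decision flow graph mainly describes the interaction logic of the business data
    and is stored in a ``.yaml`` file. The array data holds the data for the nodes defined in the decision flow graph and is stored in a ``.npz`` or ``.h5`` file.
 3. With the decision flow graph and array data in place, the REVIVE SDK can already train the virtual environment model. To obtain a better control policy, however, a :doc:`reward function <../tutorial/reward_function_cn>` must be defined according to the task objective. The reward function defines
    the optimization target of the policy and guides the control policy toward keeping the industrial machine more stable.
 4. Once the :doc:`decision flow graph <../tutorial/data_preparation_cn>`, the :doc:`training data <../tutorial/data_preparation_cn>`, and the :doc:`reward function <../tutorial/reward_function_cn>` are defined, we can
    use the REVIVE SDK to start training the virtual environment model and the policy model.
 5. Finally, deploy the policy model trained by the REVIVE SDK for online testing.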


Preparing the Data
-----------------------

We use the IB dataset and reward function from NeoRL to build the training task. For details, see `http://polixir.ai/research/neorl <http://polixir.ai/research/neorl>`__.
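
Before training, it can help to list the arrays stored in a ``.npz`` file together with their shapes. The sketch below builds a small synthetic archive in memory so that it runs without the real dataset; the key names ``obs`` and ``action`` and the sample count are assumptions chosen to match the node names used in this example, so inspect the real file for its actual keys.

.. code:: python

    import io

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the dataset (hypothetical keys and sizes).
    buffer = io.BytesIO()
    np.savez(
        buffer,
        obs=rng.normal(size=(5, 180)),
        action=rng.uniform(-1.0, 1.0, size=(5, 3)),
    )
    buffer.seek(0)

    # List every array in the archive with its shape.
    with np.load(buffer) as data:
        shapes = {key: data[key].shape for key in data.files}

    for key, shape in shapes.items():
        print(key, shape)

To inspect a file on disk, pass its path to ``np.load`` instead of the in-memory buffer.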


Defining the Decision Flow Graph
--------------------------------------

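The complete training process for the IB task involves loading heterogeneous decision flow graphs. For details, see :doc:`Loading Heterogeneous Decision Flow Graphs <../tutorial/heterogeneous_decision_graphs_cn>`.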

The following ``.yaml`` file is used when **training the virtual environment**:

.. code:: yaml

    metadata:
        columns:
        - obs_0:
            dim: obs
            type: continuous
        - obs_1:
            dim: obs
            type: continuous
        ...
        - obs_179:
            dim: obs
            type: continuous

        - obs_0:
            dim: current_next_obs
            type: continuous
        - obs_1:
            dim: current_next_obs
            type: continuous
        ...
        - obs_5:
            dim: current_next_obs
            type: continuous

        - action_0:
            dim: action
            type: continuous
        - action_1:
            dim: action
            type: continuous
        - action_2:
            dim: action
            type: continuous

        graph:
            #action:
            #- obs
            current_next_obs:
            - obs
            - action
            next_obs:
            - obs
            - current_next_obs
    
        expert_functions:
            next_obs:
                'node_function' : 'expert_function.next_obs'
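
The ``expert_functions`` entry assigns a hand-written function to the ``next_obs`` node. The actual ``expert_function.next_obs`` is not reproduced in this document; the following is only a sketch of what such a function could look like. It assumes that the 180-dimensional observation stores frames oldest first and that the new 6-dimensional frame is appended at the end. It uses NumPy for brevity, whereas a real expert function would likely operate on ``torch`` tensors, as the reward function in this example does.

.. code:: python

    from typing import Dict

    import numpy as np

    FRAME_DIM = 6  # size of a single-step observation

    def next_obs(data: Dict[str, np.ndarray]) -> np.ndarray:
        """Slide the stacked window: drop the oldest frame, append the new one."""
        obs = data["obs"]                     # shape (..., 180)
        new_frame = data["current_next_obs"]  # shape (..., 6)
        return np.concatenate([obs[..., FRAME_DIM:], new_frame], axis=-1)

    batch = {
        "obs": np.zeros((2, 180)),
        "current_next_obs": np.ones((2, 6)),
    }
    result = next_obs(batch)
    print(result.shape)  # (2, 180)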

The following ``.yaml`` file is used when **training the policy**:

.. code:: yaml

    metadata:
        columns:
        - obs_0:
            dim: obs
            type: continuous
        - obs_1:
            dim: obs
            type: continuous
        ...
        - obs_179:
            dim: obs
            type: continuous

        - obs_0:
            dim: current_next_obs
            type: continuous
        - obs_1:
            dim: current_next_obs
            type: continuous
        ...
        - obs_5:
            dim: current_next_obs
            type: continuous

        - action_0:
            dim: action
            type: continuous
        - action_1:
            dim: action
            type: continuous
        - action_2:
            dim: action
            type: continuous

        graph:
            action:
            - obs
            current_next_obs:
            - obs
            - action
            next_obs:
            - obs
            - current_next_obs
  
        expert_functions:
            next_obs:
                'node_function' : 'expert_function.next_obs'

        #nodes:
        #  action:
        #      step_input: True
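
The ``graph`` section lists, for each node, the nodes it takes as input. As a sanity check, the computation order implied by the policy graph can be derived with Python's standard ``graphlib`` module (Python 3.9 or later). This only illustrates the dependency structure and is not a description of how the SDK processes the file.

.. code:: python

    from graphlib import TopologicalSorter

    # Each key maps to the list of nodes it depends on (copied from the YAML).
    policy_graph = {
        "action": ["obs"],
        "current_next_obs": ["obs", "action"],
        "next_obs": ["obs", "current_next_obs"],
    }

    # static_order() yields every node after all of its inputs.
    order = list(TopologicalSorter(policy_graph).static_order())
    print(order)  # ['obs', 'action', 'current_next_obs', 'next_obs']

Because the dependencies form a single chain, the order is unique: the policy produces ``action`` from ``obs``, the environment model then produces ``current_next_obs``, and the expert function finally assembles ``next_obs``.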

Building the Reward Function
----------------------------------------------

Here we define the reward function for the policy node in the IB task:

.. code:: python

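    import torch
    from typing import Dict


    def get_reward(data: Dict[str, torch.Tensor]) -> torch.Tensor:
        """Reward for the IB task: the negative weighted sum of fatigue and consumption."""
        obs = data["obs"]
        next_obs = data["next_obs"]

        # Promote a single (unbatched) sample to a batch of size 1.
        single_reward = False
        if len(obs.shape) == 1:
            single_reward = True
            obs = obs.reshape(1, -1)
        if len(next_obs.shape) == 1:
            next_obs = next_obs.reshape(1, -1)

        # Cost weights for fatigue and consumption.
        CRF = 3.0
        CRC = 1.0

        # Fatigue and consumption are read from columns 4 and 5 of next_obs.
        fatigue = next_obs[:, 4]
        consumption = next_obs[:, 5]

        cost = CRF * fatigue + CRC * consumption

        # Lower cost means higher reward.
        reward = -cost

        if single_reward:
            # Unbatched input: return a plain Python float.
            reward = reward[0].item()
        else:
            # Batched input: return a column vector of shape (batch, 1).
            reward = reward.reshape(-1, 1)

        return reward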
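
Written out, the function above computes the following, where :math:`f_{t+1}` and :math:`c_{t+1}` denote the fatigue and consumption entries read from ``next_obs`` (the symbols are introduced here only for illustration):

.. math::

    r_t = -\left(3.0 \, f_{t+1} + 1.0 \, c_{t+1}\right)

For example, a fatigue of 2 and a consumption of 4 give a cost of :math:`3 \times 2 + 4 = 10` and therefore a reward of :math:`-10`.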


Training a Control Policy Using the REVIVE SDK
----------------------------------------------

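The REVIVE SDK already provides the data and code needed for training. For details, see the `REVIVE SDK source repository <https://agit.ai/Polixir/revive/src/branch/master/examples/task/IB>`__.
After installing the REVIVE SDK, switch to the ``examples/task/IB`` directory and run the Bash commands below to start virtual environment model training and policy model training. During training, you can open the log directory with TensorBoard at any time to monitor progress.
Once the REVIVE SDK has finished training the virtual environment model and the policy model, the saved models (``.pkl`` or ``.onnx``) can be found in the log folder (``logs/<run_id>``).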

.. code:: bash

 python train.py -df data/ib.npz -cf data/ib_env.yaml -rf data/ib_reward.py -rcf data/config.json -vm tune -pm None --run_id revive

 python train.py -df data/ib.npz -cf data/ib_policy.yaml -rf data/ib_reward.py -rcf data/config.json -vm None -pm tune --run_id revive


Testing the Trained Policy Model in the IB Environment
------------------------------------------------------

After training completes, the performance of the trained policy can be tested with the provided Jupyter notebook. For details, see the `Jupyter notebook <https://agit.ai/Polixir/revive/src/branch/master/examples/task/IB/TestPolicy.ipynb>`__.