Using the REVIVE SDK for Industrial Machine Control
====================================================

.. image:: images/ib.png
   :alt: example-of-ib
   :align: center

Industrial machine control task description
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Industrial Benchmark (IB) is a reinforcement learning benchmark environment
designed to simulate the characteristics of a variety of industrial control
tasks, such as wind or gas turbines and chemical reactors. It captures many
problems common to real-world industrial applications: high-dimensional
continuous state and action spaces, delayed rewards, complex noise patterns,
and high stochasticity across multiple reward objectives. We also augmented the
data of the original Industrial Benchmark environment by adding two dimensions
of the system state to the observation space, so that the immediate reward of
each step can be computed. Because IB is itself a high-dimensional and highly
stochastic environment, no noise is added to the action data when sampling
from it.

================= ====================
Action Space      Continuous(3,)
Observation Shape (180,)
================= ====================

Action space
--------------------------

The action space consists of a continuous 3-dimensional vector. For details,
see `http://polixir.ai/research/neorl <http://polixir.ai/research/neorl>`__.

Observation space
--------------------------

The state is a 180-dimensional vector. In fact, the observation at each time
step is a 6-dimensional vector; the dataset automatically concatenates the
previous 29 frames, so the dimension of the current observation is
:math:`180 = 6 \times 30`. For details, see
`http://polixir.ai/research/neorl <http://polixir.ai/research/neorl>`__.

Industrial machine control task goal
------------------------------------

The various indicators of the industrial machine must be kept close to their
target values. For details, see
`http://polixir.ai/research/neorl <http://polixir.ai/research/neorl>`__.

Training a control policy with the REVIVE SDK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The REVIVE SDK is a tool driven by historical data. Following the tutorial
section of the documentation, using the REVIVE SDK on the industrial machine
control task consists of the following steps:

1. Process the historical decision-making data.
2. Combine the business scenario with the collected historical data to build
   the :doc:`decision flow graph and array data <../tutorial/data_preparation_cn>`.
   The decision flow graph describes the interaction logic of the business
   data and is stored as a ``.yaml`` file; the array data stores the data of
   the nodes defined in the decision flow graph and is stored as an ``.npz``
   or ``.h5`` file.
3. With the decision flow graph and array data above, the REVIVE SDK can
   already train a virtual-environment model. To obtain a better control
   policy, however, a :doc:`reward function <../tutorial/reward_function_cn>`
   needs to be defined according to the task goal. The reward function defines
   the optimization objective of the policy and can guide the control policy
   toward keeping the industrial machine more stable.
4. Once the :doc:`decision flow graph <../tutorial/data_preparation_cn>`,
   :doc:`training data <../tutorial/data_preparation_cn>` and
   :doc:`reward function <../tutorial/reward_function_cn>` are defined, we can
   use the REVIVE SDK to train the virtual-environment model and the policy
   model.
5. Finally, test the policy model trained by the REVIVE SDK online.

Prepare the data
-----------------------

We use the IB dataset and reward function from NeoRL to build the training
task. For details, see
`http://polixir.ai/research/neorl <http://polixir.ai/research/neorl>`__.

Define the decision flow graph
--------------------------------------

The complete training process of the IB task involves loading heterogeneous
decision flow graphs. For details, see
:doc:`heterogeneous decision flow graph loading <../tutorial/heterogeneous_decision_graphs_cn>`.

The following is the ``.yaml`` file used for **training the virtual environment**:

.. code:: yaml

   metadata:
     columns:
     - obs_0:
         dim: obs
         type: continuous
     - obs_1:
         dim: obs
         type: continuous
     ...
     - obs_179:
         dim: obs
         type: continuous
     - obs_0:
         dim: current_next_obs
         type: continuous
     - obs_1:
         dim: current_next_obs
         type: continuous
     ...
     - obs_5:
         dim: current_next_obs
         type: continuous
     - action_0:
         dim: action
         type: continuous
     - action_1:
         dim: action
         type: continuous
     - action_2:
         dim: action
         type: continuous

     graph:
       #action:
       #- obs
       current_next_obs:
       - obs
       - action
       next_obs:
       - obs
       - current_next_obs

     expert_functions:
       next_obs:
         'node_function' : 'expert_function.next_obs'
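Both graphs delegate the ``next_obs`` node to an expert function,
``expert_function.next_obs``, which deterministically splices the newly
predicted 6-dimensional frame (``current_next_obs``) into the 30-frame
observation window. The following is only a minimal sketch of such a function;
the newest-frame-first layout is an assumption inferred from the reward
function below (which reads fatigue and consumption at indices 4 and 5 of
``next_obs``). See ``expert_function.py`` in the example directory for the
authoritative implementation.

.. code:: python

   import torch
   from typing import Dict


   def next_obs(data: Dict[str, torch.Tensor]) -> torch.Tensor:
       """Sketch of the expert function for the ``next_obs`` node.

       Assumes the 180-dim observation stacks 30 frames of 6 dims each,
       newest frame first; this frame ordering is an assumption.
       """
       obs = data["obs"]                            # [..., 180] stacked frames
       current_next_obs = data["current_next_obs"]  # [..., 6] newest frame
       # Prepend the new frame and drop the oldest one to keep 180 dims.
       return torch.cat([current_next_obs, obs[..., :-6]], dim=-1)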
The following is the ``.yaml`` file used for **training the policy**:

.. code:: yaml

   metadata:
     columns:
     - obs_0:
         dim: obs
         type: continuous
     - obs_1:
         dim: obs
         type: continuous
     ...
     - obs_179:
         dim: obs
         type: continuous
     - obs_0:
         dim: current_next_obs
         type: continuous
     - obs_1:
         dim: current_next_obs
         type: continuous
     ...
     - obs_5:
         dim: current_next_obs
         type: continuous
     - action_0:
         dim: action
         type: continuous
     - action_1:
         dim: action
         type: continuous
     - action_2:
         dim: action
         type: continuous

     graph:
       action:
       - obs
       current_next_obs:
       - obs
       - action
       next_obs:
       - obs
       - current_next_obs

     expert_functions:
       next_obs:
         'node_function' : 'expert_function.next_obs'

     #nodes:
     #  action:
     #    step_input: True

Build the reward function
----------------------------------------------

Here we define the reward function for the policy node of the IB task. In IB,
the immediate reward is the negative weighted sum of the fatigue and
consumption components of the state:

.. code:: python

   import torch
   from typing import Dict


   def get_reward(data: Dict[str, torch.Tensor]) -> torch.Tensor:
       obs = data["obs"]
       next_obs = data["next_obs"]

       # Support both single transitions (1-D) and batches (2-D).
       single_reward = False
       if len(obs.shape) == 1:
           single_reward = True
           obs = obs.reshape(1, -1)
       if len(next_obs.shape) == 1:
           next_obs = next_obs.reshape(1, -1)

       # Cost coefficients for fatigue (CRF) and consumption (CRC).
       CRF = 3.0
       CRC = 1.0

       # Fatigue and consumption occupy indices 4 and 5 of the newest frame.
       fatigue = next_obs[:, 4]
       consumption = next_obs[:, 5]
       cost = CRF * fatigue + CRC * consumption

       # The reward is the negative cost.
       reward = -cost

       if single_reward:
           reward = reward[0].item()
       else:
           reward = reward.reshape(-1, 1)

       return reward

Train a control policy using the REVIVE SDK
-------------------------------------------

The REVIVE SDK already provides the data and code required for training. For
details, see the `REVIVE SDK source repository <https://agit.ai/Polixir/revive/src/branch/master/examples/task/IB>`__.
After installing the REVIVE SDK, switch to the ``examples/task/IB`` directory
and run the Bash commands below to start training the virtual-environment
model and the policy model. During training, TensorBoard can be pointed at the
log directory at any time to monitor the training process. Once the REVIVE SDK
has finished training both models, the saved models (``.pkl`` or ``.onnx``)
can be found in the log folder (``logs/<run_id>``).

.. code:: bash

   python train.py -df data/ib.npz -cf data/ib_env.yaml -rf data/ib_reward.py -rcf data/config.json -vm tune -pm None --run_id revive
   python train.py -df data/ib.npz -cf data/ib_policy.yaml -rf data/ib_reward.py -rcf data/config.json -vm None -pm tune --run_id revive

Test the trained policy model in the IB environment
---------------------------------------------------

After training is complete, the provided Jupyter notebook can be used to test
the performance of the trained policy. For details, see the
`jupyter notebook <https://agit.ai/Polixir/revive/src/branch/master/examples/task/IB/TestPolicy.ipynb>`__.
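For a quick check outside the notebook, the sketch below rolls the trained
policy out in the NeoRL IB environment. It is only a sketch under several
assumptions: that the policy was saved as ``logs/revive/policy.pkl``, that the
loaded policy object exposes the ``infer`` method shown in the REVIVE
tutorials, and that NeoRL provides the environment via ``neorl.make("ib")``.
Adjust the names and paths to your installation.

.. code:: python

   # Minimal rollout sketch; the file name, the ``infer`` call, and
   # ``neorl.make("ib")`` are assumptions based on the REVIVE tutorials
   # and the NeoRL API.
   import pickle

   import neorl

   # Unpickling the policy requires the revive package to be importable.
   with open("logs/revive/policy.pkl", "rb") as f:
       policy = pickle.load(f)

   env = neorl.make("ib")
   obs = env.reset()

   done = False
   episode_return = 0.0
   while not done:
       # REVIVE policies are queried with a dict mapping node names to arrays.
       action = policy.infer({"obs": obs})
       obs, reward, done, _ = env.step(action)
       episode_return += float(reward)

   print("episode return:", episode_return)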