
Unexpected observation space for CartPole-v0

慕斯709654 2022-01-05 10:47:48
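I was surprised by the observation space I got by introspecting CartPole-v0. According to the official documentation, this is what I should get:

However, this is what I get:

    import gym

    env = gym.make("CartPole-v0")
    print(env.observation_space.low)
    print(env.observation_space.high)
    # [-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
    # [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]

I am using the latest version of gym:

    !pip list | grep gym
    gym                 0.12.1

Any idea what is going on?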

2 Answers

莫回無


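As the code's docstring states, you appear to be getting the expected behavior, although it is admittedly confusing. On the one hand, the observation space for the cart position is [-4.8, 4.8]. On the other hand, the episode is supposed to end as soon as the cart moves beyond [-2.4, 2.4]. The pole angle works the same way: the observation space spans ±24°, while the episode ends once the angle exceeds ±12°.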


import gym


class CartPoleEnv(gym.Env):
    """
    Description:
        A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart's velocity.

    Source:
        This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson

    Observation:
        Type: Box(4)
        Num   Observation               Min         Max
        0     Cart Position             -4.8        4.8
        1     Cart Velocity             -Inf        Inf
        2     Pole Angle                -24 deg     24 deg
        3     Pole Velocity At Tip      -Inf        Inf

    Actions:
        Type: Discrete(2)
        Num   Action
        0     Push cart to the left
        1     Push cart to the right

        Note: The amount the velocity that is reduced or increased is not fixed; it depends on the angle the pole is pointing. This is because the center of gravity of the pole increases the amount of energy needed to move the cart underneath it

    Reward:
        Reward is 1 for every step taken, including the termination step

    Starting State:
        All observations are assigned a uniform random value in [-0.05..0.05]

    Episode Termination:
        Pole Angle is more than 12 degrees
        Cart Position is more than 2.4 (center of the cart reaches the edge of the display)
        Episode length is greater than 200
        Solved Requirements
        Considered solved when the average reward is greater than or equal to 195.0 over 100 consecutive trials.
    """

You can read the related GitHub issue at this link.


*Note that 24 degrees equals 4.1887903e-01 (about 0.4189) radians.
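To see where the printed numbers come from, here is a minimal sketch in plain Python (no gym or NumPy needed). It assumes, as in the gym CartPole source, that each finite bound is twice the corresponding termination threshold (2 × 2.4 = 4.8 and 2 × 12° = 24°). It also assumes that the "infinite" bounds are really the largest finite float32 value. Rounding everything to float32 also explains why 4.8 is printed as 4.8000002e+00. The names `to_float32` and `cartpole_bounds` are hypothetical.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Largest finite float32 (the value np.finfo(np.float32).max reports).
FLOAT32_MAX = (2 - 2 ** -23) * 2 ** 127

# Termination thresholds from the docstring.
X_THRESHOLD = 2.4                          # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # 12 degrees, in radians


def cartpole_bounds():
    """Return (low, high) lists, float32-rounded, derived from the thresholds."""
    high = [
        X_THRESHOLD * 2,       # position: twice the termination limit
        FLOAT32_MAX,           # cart velocity: effectively unbounded
        THETA_THRESHOLD * 2,   # angle: twice the termination limit (24 deg)
        FLOAT32_MAX,           # pole tip velocity: effectively unbounded
    ]
    high = [to_float32(v) for v in high]
    low = [-v for v in high]
    return low, high


low, high = cartpole_bounds()
print(["%.7e" % v for v in low])
print(["%.7e" % v for v in high])
```

Formatted to eight significant digits, these values match the `low` and `high` arrays reported in the question. So the observation space in gym 0.12.1 is consistent with the source code, even though it differs from the documentation the question refers to.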


Answered 2022-01-05
搖曳的薔薇


This looks like outdated documentation. An issue has already been opened: https://github.com/openai/gym/issues/368


Answered 2022-01-05