Uniform State Abstraction For Reinforcement Learning
This work extends MultiGrid Reinforcement Learning to deep RL. It builds a high-level abstract MDP (AMDP) that can be solved by dynamic programming, and uses the resulting AMDP to shape the reward.
- The state abstraction is built by splitting the state space.
- The AMDP must be built before it can be used for reward shaping. It is constructed during an exploration phase, without updating the neural network.
- The AMDP reward is estimated from transitions collected during the exploration phase, or simply set to -1 per step, so the agent stops after reaching the goal states.
- One or more abstract states must be selected as goal states.
- The goal states in the AMDP are selected using a policy trained with value iteration on the AMDP.
- The evaluation shows good results, but only on very simple, low-dimensional cases.
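The pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid size, cell-based abstraction, discount factor, and goal placement are all assumed for the example. It builds a tiny AMDP by splitting a grid into coarse cells, assigns each abstract step a reward of -1, solves the AMDP with value iteration, and then uses the abstract value function as a potential for reward shaping (F(s, s') = γΦ(s') − Φ(s)).

```python
# Illustrative sketch only: abstraction, discount, and layout are assumptions,
# not taken from the paper.
GAMMA = 0.99
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def abstract_state(x, y, cell=2):
    # State abstraction by splitting the state space into coarse cells.
    return (x // cell, y // cell)

def build_amdp(size=4, cell=2):
    # Abstract states are the coarse cells of a size x size grid.
    n = size // cell
    states = sorted({abstract_state(x, y, cell)
                     for x in range(size) for y in range(size)})
    transitions = {}  # (abstract state, action index) -> next abstract state
    for (i, j) in states:
        for a, (di, dj) in enumerate(ACTIONS):
            ni = min(max(i + di, 0), n - 1)  # clamp at the grid border
            nj = min(max(j + dj, 0), n - 1)
            transitions[((i, j), a)] = (ni, nj)
    return states, transitions

def value_iteration(states, transitions, goal, reward=-1.0, tol=1e-8):
    # Solve the AMDP by dynamic programming; every step costs `reward`.
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue  # goal state is absorbing with value 0
            best = max(reward + GAMMA * V[transitions[(s, a)]]
                       for a in range(len(ACTIONS)))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def shaping_reward(V, s, s_next):
    # Potential-based shaping with Phi = abstract value function.
    return GAMMA * V[abstract_state(*s_next)] - V[abstract_state(*s)]

states, transitions = build_amdp()
V = value_iteration(states, transitions, goal=(1, 1))
# Crossing into a cell closer to the goal yields a positive shaping bonus.
print(shaping_reward(V, (1, 1), (2, 1)) > 0)  # → True
```

Using the abstract value function as the shaping potential keeps the optimal policy of the underlying MDP unchanged (potential-based shaping), which is why the AMDP only needs to be accurate at the coarse level.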