Uniform State Abstraction For Reinforcement Learning



This work extends the MultiGrid Reinforcement Learning approach to deep RL. It builds a high-level Abstract MDP (AMDP) that can be solved by dynamic programming, and uses the solved AMDP to shape the reward.


  • The state abstraction is built by splitting the state space.
  • The AMDP must be built before it can be used for reward shaping. It is constructed during an exploration phase in which the neural network is not updated.
  • The AMDP reward is estimated from the transitions collected in the exploration phase, or simply set to -1 per step, in which case the agent stops after reaching the goal states.
  • Requires that one or more abstract states be selected as goal states.
  • The AMDP is solved with value iteration, and the resulting policy/value function over abstract states is used to guide the agent toward the goal states.
  • The evaluation works well only on very simple, low-dimensional cases.
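The pipeline in the notes above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract transition structure, the uniform -1 reward, and the potential-based shaping bonus are assumptions based on the description, and all names are illustrative.

```python
import numpy as np

def value_iteration(n_states, transitions, goal_states, gamma=0.99, tol=1e-6):
    """Solve the AMDP by value iteration.

    transitions: dict mapping an abstract state to the set of abstract
    states reachable from it, as estimated from transitions collected in
    the exploration phase. Every non-goal step receives reward -1, so the
    optimal value is the negative discounted distance to the nearest goal.
    """
    V = np.zeros(n_states)
    while True:
        V_new = np.array([
            0.0 if s in goal_states else            # goals are absorbing, value 0
            max((-1.0 + gamma * V[s2] for s2 in transitions.get(s, ())),
                default=V[s])                       # no known successors: keep value
            for s in range(n_states)
        ])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def shaping_bonus(V, s_abs, s_abs_next, gamma=0.99):
    """Potential-based shaping F(s, s') = gamma * Phi(s') - Phi(s),
    using the AMDP value function as the potential Phi."""
    return gamma * V[s_abs_next] - V[s_abs]

# Example: a 4-state chain 0-1-2-3 with goal state 3.
T = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {3}}
V = value_iteration(4, T, {3})
```

On the chain example, moving from abstract state 0 toward 1 yields a positive shaping bonus, so the low-level agent is rewarded for progress toward the abstract goal even before the environment reward arrives.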

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation


The upper-level controller produces a goal for the lower-level controller, whose intrinsic reward is provided by an internal critic. The experiments are evaluated on Montezuma's Revenge with sparse rewards.
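The two-level control loop can be sketched as below. This is a hedged illustration of the structure only: the environment, both policies, and the goal test are stand-in stubs (the actual method trains a DQN at each level), and all names are assumptions.

```python
import random

def meta_controller(state, goals):
    # Stub: the upper level would pick the goal maximizing its Q-value.
    return random.choice(goals)

def controller(state, goal, actions):
    # Stub: the lower level would pick the action maximizing Q(state, goal, action).
    return random.choice(actions)

def run_episode(env, goals, actions, max_steps=100):
    """Run one episode of the hierarchical loop: the upper level sets a
    goal, the lower level acts until the goal is reached (intrinsic
    reward from the critic) or the episode ends (extrinsic reward)."""
    state = env.reset()
    total_extrinsic, steps, done = 0.0, 0, False
    while steps < max_steps and not done:
        goal = meta_controller(state, goals)            # upper level sets a goal
        while steps < max_steps:
            action = controller(state, goal, actions)   # lower level acts toward it
            state, extrinsic, done = env.step(action)
            total_extrinsic += extrinsic
            steps += 1
            intrinsic = 1.0 if state == goal else 0.0   # critic: 1 iff goal reached
            if intrinsic > 0 or done:
                break                                   # goal reached: pick a new one
    return total_extrinsic
```

The key design point is that the lower level optimizes the intrinsic goal-reaching reward, which is dense, while only the upper level faces the sparse extrinsic reward.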

Deep Abstract Q-Networks

