## Uniform State Abstraction For Reinforcement Learning

### Summary

This work extends MultiGrid Reinforcement Learning to deep RL. It builds a high-level abstract MDP (AMDP) that can be solved by dynamic programming, and uses the solved AMDP to shape the reward.
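The reward-shaping step can be sketched as standard potential-based shaping, using the AMDP's value function as the potential. This is a minimal sketch under assumptions: `abstract` (a hypothetical mapping from ground states to abstract states) and the value table `V` are placeholders for whatever the actual implementation provides.

```python
# Potential-based reward shaping driven by an abstract MDP's value function.
# `abstract(s)` maps a ground state to its abstract state, and `V` holds the
# AMDP values computed by dynamic programming -- both are assumed inputs.

GAMMA = 0.99  # discount factor (assumed value)

def shaped_reward(r, s, s_next, V, abstract):
    """Augment the environment reward r with a potential-based shaping term."""
    phi_s = V[abstract(s)]        # potential of the current state
    phi_next = V[abstract(s_next)]  # potential of the successor state
    return r + GAMMA * phi_next - phi_s
```

Because the shaping term is a difference of potentials, it leaves the optimal policy of the underlying MDP unchanged while densifying the reward signal.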

### Details

• The state abstraction is built by uniformly splitting the state space.
• The AMDP must be built before it can be used for reward shaping; it is constructed from an exploration phase, without updating the neural network.
• The AMDP reward is estimated from the transitions collected during the exploration phase, or simply set to -1 per step, with the agent stopping once it reaches a goal state.
• One or more abstract states must be selected as goal states.
• The goal states in the AMDP are selected using a policy trained with value iteration on the AMDP.
• The evaluation works well on very simple, low-dimensional cases.
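The AMDP-solving step in the bullets above can be sketched as tabular value iteration with a constant per-step reward of -1 and absorbing goal states. This is a minimal sketch, assuming the abstract transition table (estimated from transitions collected during exploration) and the goal set are given; the deterministic successor-list representation is a simplification.

```python
def value_iteration(transitions, goals, gamma=0.99, tol=1e-6):
    """Solve an AMDP where every step costs -1 and goal states are absorbing.

    transitions: dict mapping an abstract state to the list of abstract states
        reachable from it (estimated from exploration-phase transitions).
    goals: set of abstract states selected as goal states.
    Returns a dict of abstract-state values.
    """
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, succs in transitions.items():
            if s in goals:
                continue  # goal states are absorbing with value 0
            # Greedy backup: step cost -1 plus discounted best successor value.
            new_v = -1.0 + gamma * max(V[s2] for s2 in succs)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V
```

Because the AMDP is small (a coarse split of the state space), this dynamic-programming solve is cheap relative to training the deep RL agent, and the resulting values can serve as shaping potentials.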

## Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation