In [3]:

# %load /Users/facai/Study/book_notes/preconfig.py
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
sns.set(font='SimHei', font_scale=2.5)
plt.rcParams['axes.grid'] = False

def show_image(filename, figsize=None, res_dir=True):
    """Display an image file; by default the name is resolved under ./res/."""
    if figsize:
        plt.figure(figsize=figsize)

    if res_dir:
        filename = './res/{}'.format(filename)

    plt.imshow(plt.imread(filename))

Chapter 1: Introduction

Two distinguishing characteristics of reinforcement learning:

trial-and-error search
delayed reward

A Markov decision process captures three aspects of the problem:

sensation
action
goal

Challenges:

trade-off between exploration and exploitation
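
The trade-off can be made concrete with an epsilon-greedy rule, a common technique that these notes do not name: with probability epsilon the agent tries a random action (exploration), otherwise it picks the action with the highest current estimate (exploitation). The following is only a minimal sketch under stated assumptions; the names `epsilon_greedy` and `run_bandit`, the Gaussian reward noise, and the arm means are hypothetical and chosen for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, epsilon, steps=2000, seed=0):
    """Hypothetical multi-armed bandit: each arm pays its mean plus unit Gaussian noise."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)      # estimated value of each arm
    counts = [0] * len(true_means)   # number of times each arm was pulled
    total = 0.0
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]  # incremental sample average
        total += reward
    return q, counts, total / steps
```

Calling `run_bandit([0.0, 1.0, 0.5], epsilon=0.1)` returns the value estimates, the pull counts, and the average reward. With `epsilon=0.0` the agent never explores and can lock onto whichever arm happened to look best early on.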

Main subelements of a reinforcement learning system:

policy: the learning agent's way of behaving at a given time.
reward signal: defines the goal of the problem.
value function: what is good in the long run (foresight). Hard to estimate.
model of the environment: (optional) used to infer how the environment will behave.
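
To see how these pieces fit together, here is a small sketch of learning a value function with a temporal-difference style update, V(s) ← V(s) + α [V(s') − V(s)], in which each state's estimate is nudged toward the estimate of the state that follows it. Everything concrete here is an assumption made for illustration: the environment is a hypothetical five-state random walk, the policy is uniformly random, the reward signal is folded into the fixed terminal values (0 for the left end, 1 for the right end), and no model of the environment is used.

```python
import random


def td_update(values, state, next_state, alpha=0.1):
    """Move V(state) a fraction alpha toward V(next_state)."""
    values[state] += alpha * (values[next_state] - values[state])


def estimate_values(n_states=5, episodes=5000, alpha=0.05, seed=0):
    """Estimate state values on a hypothetical random-walk chain."""
    rng = random.Random(seed)
    # Value function: a table with one entry per state.
    # Terminal values are fixed (left end = 0, right end = 1); others start at 0.5.
    values = [0.5] * n_states
    values[0], values[-1] = 0.0, 1.0
    for _ in range(episodes):
        state = n_states // 2  # every episode starts in the middle
        while 0 < state < n_states - 1:
            # Policy: step left or right with equal probability.
            next_state = state + rng.choice((-1, 1))
            td_update(values, state, next_state, alpha)
            state = next_state
    return values
```

After enough episodes, states nearer the right end should carry higher estimates, which is the "long run" information that the immediate reward alone does not provide.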

Reinforcement learning compared with other approaches:

VS supervised learning: learns from interaction rather than from labeled examples.
VS unsupervised learning: maximizes a reward signal instead of trying to find hidden structure.
VS evolutionary methods: its search is guided by a value function, which is generally more efficient.

In [6]:

show_image('fig1_1.png', figsize=(12, 8))
