# Self-Supervised Attention-Aware Reinforcement Learning

## Main Idea

Select regions of interest in a self-supervised manner, without explicit annotations.

The attention is learned with a self-supervised loss.
The target image and source image are sampled randomly from a task.
Regions of interest form the foreground; everything else is background.
Note that the mask generator and the encoder share weights.

The auto-encoded feature is $$\hat{\Phi}\left(\boldsymbol{x}_{s}, \boldsymbol{x}_{t}\right) \triangleq\left(1-\Psi\left(\boldsymbol{x}_{s}\right)\right) \cdot\left(1-\Psi\left(\boldsymbol{x}_{t}\right)\right) \cdot \Phi\left(\boldsymbol{x}_{s}\right)+\Psi\left(\boldsymbol{x}_{t}\right) \cdot \Phi\left(\boldsymbol{x}_{t}\right)$$
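The composition above can be sketched as follows. This is a minimal NumPy sketch, not the paper's code: the function name, array shapes, and the assumption that $\Psi(\cdot)$ yields a soft mask in $[0, 1]$ broadcastable against the feature map $\Phi(\cdot)$ are all mine.

```python
import numpy as np

def composed_feature(psi_s, psi_t, phi_s, phi_t):
    """Blend source and target features with the learned masks:
    (1 - Psi(x_s)) * (1 - Psi(x_t)) * Phi(x_s) + Psi(x_t) * Phi(x_t).
    psi_*: soft attention masks in [0, 1]; phi_*: encoded feature maps.
    """
    return (1 - psi_s) * (1 - psi_t) * phi_s + psi_t * phi_t

# Toy check: where the target mask is all ones, only the target
# features survive; the source contribution is zeroed out.
rng = np.random.default_rng(0)
phi_s = rng.random((2, 2, 3))
phi_t = rng.random((2, 2, 3))
psi_s = np.zeros((2, 2, 3))
psi_t = np.ones((2, 2, 3))
out = composed_feature(psi_s, psi_t, phi_s, phi_t)
assert np.allclose(out, phi_t)
```

Intuitively, the target mask $\Psi(\boldsymbol{x}_t)$ pastes the target foreground on top, while the remaining regions are filled from the source features wherever neither mask claims them.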

I am not quite sure why, beyond summing the outputs of the two extractions, the attention term $\left(1-\Psi\left(\boldsymbol{x}_{s}\right)\right)$ is applied to the source features again.