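4.1 Improving word2vec (1)

word2vec has a problem: as the vocabulary grows, the amount of computation becomes enormous.

To solve this, we implement an Embedding layer. The idea behind the Embedding layer is to replace the product of a one-hot representation and the weight matrix with a layer that simply extracts the row (vector) corresponding to the word ID.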
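The equivalence between the one-hot matrix product and a row lookup can be checked with a small sketch. The sizes and the weight matrix `W` below are made-up values for illustration only.

```python
import numpy as np

vocab_size, hidden_size = 7, 3
# Hypothetical weight matrix: one row per word ID.
W = np.arange(vocab_size * hidden_size, dtype=np.float64).reshape(vocab_size, hidden_size)

word_id = 2
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

h_matmul = one_hot @ W   # full matrix product: cost grows with vocab_size
h_lookup = W[word_id]    # row lookup: cost depends only on hidden_size

print(h_matmul)  # [6. 7. 8.]
print(h_lookup)  # [6. 7. 8.]
```

Both give the same vector, but the lookup never touches the other rows of `W`.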

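4.1.2 Implementing the Embedding layer

In the forward pass, the layer extracts the specified row of the matrix. In the backward pass, it passes the upstream gradient to that same row of the weight gradient; all other rows receive zero.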

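import numpy as np


class Embedding:
    def __init__(self, W):
        self.params = [W]
        self.grads = [np.zeros_like(W)]
        self.idx = None  # word IDs seen in the last forward pass

    def forward(self, idx):
        W, = self.params
        self.idx = idx
        out = W[idx]  # pick the rows for the given word IDs
        return out

    def backward(self, dout):
        dW, = self.grads
        dW[...] = 0  # reset in place so the array shared via self.grads is kept
        # Accumulate with np.add.at so that repeated word IDs add up
        # instead of overwriting each other.
        np.add.at(dW, self.idx, dout)
        return None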
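Why the backward pass uses np.add.at rather than a plain indexed update can be seen when the same word ID appears more than once in a mini-batch. The following is a minimal sketch with made-up indices and gradients.

```python
import numpy as np

idx = np.array([0, 2, 0])            # word ID 0 appears twice
dout = np.array([[1.0, 1.0],
                 [2.0, 2.0],
                 [3.0, 3.0]])        # upstream gradient, one row per index

dW_buffered = np.zeros((4, 2))
dW_buffered[idx] += dout             # buffered update: the two contributions to row 0 are not summed

dW_add = np.zeros((4, 2))
np.add.at(dW_add, idx, dout)         # unbuffered update: duplicates accumulate

print(dW_add[0])  # [4. 4.]
print(dW_add[2])  # [2. 2.]
```

With the buffered form, row 0 ends up with only one of the two contributions, so the gradient for a repeated word would be too small.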

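4.2 Improving word2vec (2)

Next we speed up the processing after the hidden layer: the matrix product and the Softmax layer computation.

  • We use a technique called Negative Sampling. It keeps the amount of computation constant regardless of the vocabulary size.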

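4.2.1 Problems with the computation after the hidden layer

In the processing after the hidden layer, the following two steps take a large amount of computation time.

  • Product of the hidden-layer neurons and the weight matrix (Wout): with a hidden layer of size 100 and a weight matrix of size 100 × 1,000,000, this requires a very large amount of computation.
    • The matrix product needs to be made "lighter".
  • Softmax layer computation: the cost of Softmax grows as the vocabulary grows.
    • A "lighter" computation is needed in place of Softmax.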
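A rough count of multiply-add operations per training sample shows the scale of the problem. The sizes are those quoted above; `sample_size` is a hypothetical number of output word vectors evaluated in addition to the correct one, chosen only for illustration.

```python
hidden_size = 100
vocab_size = 1_000_000
sample_size = 5  # hypothetical: extra output words evaluated per sample

# Full output layer: one dot product of length hidden_size per vocabulary word.
full_output_macs = hidden_size * vocab_size

# Restricted output: only the correct word plus a few others.
sampled_macs = hidden_size * (1 + sample_size)

print(full_output_macs)  # 100000000
print(sampled_macs)      # 600
```

The second count does not depend on `vocab_size`, which is the property the rest of this section aims for.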

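4.2.2 From multi-class to binary classification

  • Multi-class classification: given "you" and "goodbye", the neural network was trained so that the probability of the correct word "say" becomes high.
    • When the context is "you" and "goodbye", what is the target word?
  • Binary classification: consider a question that can be answered with Yes/No.
    • When the context is "you" and "goodbye", is the target word "say"?
    • It is enough to extract only the column (word vector) corresponding to "say" and compute the dot product of that vector with the hidden-layer neurons.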
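The binary question above can be sketched as follows. The weights are random, and `say_id` is a hypothetical word ID for "say"; only the one column of the output weights is needed to obtain the score.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 7, 3
W_out = rng.standard_normal((hidden_size, vocab_size))  # columns are word vectors
h = rng.standard_normal(hidden_size)                    # hidden-layer neurons

say_id = 1  # hypothetical word ID for "say"

score_full = (h @ W_out)[say_id]      # computes every score, uses one
score_single = h @ W_out[:, say_id]   # computes only the score that is needed

# The sigmoid turns the score into the probability of "Yes".
prob = 1.0 / (1.0 + np.exp(-score_single))
print(np.isclose(score_full, score_single))  # True
```

In the implementation later in this section the weights are stored with one word vector per row, so the same extraction becomes a row lookup.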

4.2.3 Sigmoid function and cross-entropy error

The sigmoid function is expressed as follows.

$$y = \dfrac{1}{1+\exp(-x)}$$

As in the multi-class case, the loss function used with the sigmoid output is the cross-entropy error.

$$L=-\left(t\log{y}+(1-t)\log(1-y)\right)$$

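  • $y$ is the output of the sigmoid function
  • $t$ is the correct label (either 0 or 1)

When the error is large the network learns "a lot", and when the error is small it learns "a little". This is because the gradient of $L$ with respect to the sigmoid input $x$ is $y - t$.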
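The gradient $y - t$ can be checked numerically. This is a minimal sketch; the function names and the values of `x` and `t` are chosen here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(y, t):
    # L = -(t log y + (1 - t) log(1 - y))
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

def loss_from_score(x, t):
    return binary_cross_entropy(sigmoid(x), t)

x, t = 0.5, 1
y = sigmoid(x)

analytic = y - t  # gradient of L with respect to x
eps = 1e-5
numeric = (loss_from_score(x + eps, t) - loss_from_score(x - eps, t)) / (2 * eps)

print(abs(analytic - numeric) < 1e-6)  # True
```

The further $y$ is from $t$, the larger the magnitude of $y - t$, and so the larger the update.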

4.2.4 From multi-class to binary classification (implementation)

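class EmbeddingDot:
    def __init__(self, W):
        self.embed = Embedding(W)
        self.params = self.embed.params
        self.grads = self.embed.grads
        self.cache = None  # values kept from forward for use in backward

    def forward(self, h, idx):
        target_W = self.embed.forward(idx)  # (batch, hidden): one word vector per sample
        out = np.sum(target_W * h, axis=1)  # row-wise dot product, shape (batch,)

        self.cache = (h, target_W)
        return out

    def backward(self, dout):
        h, target_W = self.cache
        dout = dout.reshape(dout.shape[0], 1)  # (batch, 1) so it broadcasts over hidden

        dtarget_W = dout * h  # gradient for the selected word vectors
        self.embed.backward(dtarget_W)
        dh = dout * target_W  # gradient for the hidden-layer output
        return dh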
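The forward computation of EmbeddingDot on a mini-batch can be traced with concrete numbers. This is a standalone sketch with made-up values for `W`, `h`, and `idx`; it repeats only the row lookup and the row-wise dot product.

```python
import numpy as np

W = np.arange(21, dtype=np.float64).reshape(7, 3)  # rows are word vectors
h = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])                    # hidden-layer output, batch of 3
idx = np.array([0, 3, 1])                          # target word ID for each sample

target_W = W[idx]                    # shape (3, 3)
out = np.sum(target_W * h, axis=1)   # one score per sample, shape (3,)

print(out.tolist())  # [0.0, 10.0, 12.0]
```

Each score is the dot product of one row of `h` with the word vector selected for that sample.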

4.2.5 Negative Sampling

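With the approach so far, learning only targets the positive example "say": the network learns to output a probability close to 1 when the answer is "say".

However, nothing has been learned about words other than "say" (the negative examples), for which the output should be close to 0. Evaluating every negative example would bring back the cost that depends on the vocabulary size, so as an approximation we pick a few negative examples by sampling and use only those. The final loss is the sum of the loss for the positive example and the losses for the sampled negative examples.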

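4.2.6 Sampling method for Negative Sampling

Negative examples are sampled based on the statistics of the corpus, that is, according to how often each word appears.

We use NumPy's np.random.choice(); if a probability distribution is passed through the argument p, it samples according to those probabilities.

The sampling distribution is given by the following formula: each probability of the original distribution is raised to the power 0.75 and the result is renormalized. This slightly raises the probability of low-frequency words so that rare words are not ignored.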

$$P'(w_i) = \dfrac{P(w_i)^{0.75}}{\sum_{j}^{n}P(w_j)^{0.75}}$$
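The effect of the 0.75 power can be seen on a small distribution. The probabilities below are made-up values for a three-word vocabulary.

```python
import numpy as np

p = np.array([0.7, 0.29, 0.01])  # hypothetical unigram probabilities

new_p = np.power(p, 0.75)
new_p /= np.sum(new_p)           # renormalize so the probabilities sum to 1

# Draw five word IDs according to the adjusted distribution.
samples = np.random.choice(len(p), size=5, p=new_p)

print(np.round(new_p, 3))
```

The rarest word goes from 0.01 to roughly 0.027, while the most frequent word goes from 0.7 to roughly 0.64.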

4.2.7 Implementing Negative Sampling

class NegativeSamplingLoss:
    def __init__(self, W, corpus, power=0.75, sample_size=5):
        self.sample_size = sample_size
        self.sampler = UnigramSampler(corpus, power, sample_size)
        self.loss_layers = [SigmoidWithLoss() for _ in range(sample_size + 1)]
        self.embed_dot_layers = [EmbeddingDot(W) for _ in range(sample_size + 1)]

        self.params, self.grads = [], []
        for layer in self.embed_dot_layers:
            self.params += layer.params
            self.grads += layer.grads

    def forward(self, h, target):
        batch_size = target.shape[0]
        negative_sample = self.sampler.get_negative_sample(target)

        # Forward pass for the positive example
        score = self.embed_dot_layers[0].forward(h, target)
        correct_label = np.ones(batch_size, dtype=np.int32)
        loss = self.loss_layers[0].forward(score, correct_label)

        # Forward pass for the negative examples
        negative_label = np.zeros(batch_size, dtype=np.int32)
        for i in range(self.sample_size):
            negative_target = negative_sample[:, i]
            score = self.embed_dot_layers[1 + i].forward(h, negative_target)
            loss += self.loss_layers[1 + i].forward(score, negative_label)

        return loss

    def backward(self, dout=1):
        dh = 0
        for l0, l1 in zip(self.loss_layers, self.embed_dot_layers):
            dscore = l0.backward(dout)
            dh += l1.backward(dscore)

        return dh
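The class above relies on a UnigramSampler that is not shown in these notes. The following is a simplified stand-in, not that class: `sample_negatives` is a hypothetical helper that builds the 0.75-power distribution from the corpus and draws negative word IDs for each target, never drawing the target itself.

```python
import numpy as np

def sample_negatives(corpus, target, power=0.75, sample_size=5, rng=None):
    """Hypothetical helper: draw `sample_size` negative word IDs per target."""
    rng = np.random.default_rng() if rng is None else rng
    vocab_size = int(np.max(corpus)) + 1
    counts = np.bincount(corpus, minlength=vocab_size).astype(np.float64)
    word_p = counts ** power
    word_p /= word_p.sum()

    negatives = np.zeros((len(target), sample_size), dtype=np.int64)
    for i, t in enumerate(target):
        p = word_p.copy()
        p[t] = 0.0   # never draw the positive example as a negative
        p /= p.sum()
        negatives[i] = rng.choice(vocab_size, size=sample_size, replace=False, p=p)
    return negatives

corpus = np.array([0, 1, 2, 3, 4, 1, 2, 3])
target = np.array([1, 3, 0])
negatives = sample_negatives(corpus, target, sample_size=2,
                             rng=np.random.default_rng(0))
print(negatives.shape)  # (3, 2)
```

The result has one row per target and one column per negative example, which is the layout that `negative_sample[:, i]` in the forward pass expects. Because sampling is without replacement, `sample_size` must be smaller than the vocabulary size.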

4.3 Training the improved word2vec

4.3.1 Implementing the CBOW model

# coding: utf-8
import sys
sys.path.append('..')
import numpy as np
from common import config
# To run on a GPU, uncomment the line below (requires cupy)
# ===============================================
# config.GPU = True
# ===============================================
import pickle
from common.trainer import Trainer
from common.optimizer import Adam
from cbow import CBOW
from skip_gram import SkipGram
from common.util import create_contexts_target, to_cpu, to_gpu
from dataset import ptb


# Hyperparameters
window_size = 5
hidden_size = 100
batch_size = 100
max_epoch = 10

# Load the data
corpus, word_to_id, id_to_word = ptb.load_data('train')
vocab_size = len(word_to_id)

contexts, target = create_contexts_target(corpus, window_size)
if config.GPU:
    contexts, target = to_gpu(contexts), to_gpu(target)

# Create the model, optimizer, and trainer
model = CBOW(vocab_size, hidden_size, window_size, corpus)
# model = SkipGram(vocab_size, hidden_size, window_size, corpus)
optimizer = Adam()
trainer = Trainer(model, optimizer)

# Start training
trainer.fit(contexts, target, max_epoch, batch_size)
trainer.plot()

# Save the data needed for later use
word_vecs = model.word_vecs
if config.GPU:
    word_vecs = to_cpu(word_vecs)
params = {}
params['word_vecs'] = word_vecs.astype(np.float16)
params['word_to_id'] = word_to_id
params['id_to_word'] = id_to_word
pkl_file = 'cbow_params.pkl'  # or 'skipgram_params.pkl'
with open(pkl_file, 'wb') as f:
    pickle.dump(params, f, -1)
| epoch 1 |  iter 1 / 9295 | time 0[s] | loss 4.16
| epoch 1 |  iter 21 / 9295 | time 1[s] | loss 4.16
| epoch 1 |  iter 41 / 9295 | time 3[s] | loss 4.15
| epoch 1 |  iter 61 / 9295 | time 4[s] | loss 4.12
| epoch 1 |  iter 81 / 9295 | time 6[s] | loss 4.04
| epoch 1 |  iter 101 / 9295 | time 7[s] | loss 3.92
| epoch 1 |  iter 121 / 9295 | time 8[s] | loss 3.77
| epoch 1 |  iter 141 / 9295 | time 10[s] | loss 3.63
| epoch 1 |  iter 161 / 9295 | time 11[s] | loss 3.48
| epoch 1 |  iter 181 / 9295 | time 13[s] | loss 3.35
| epoch 1 |  iter 201 / 9295 | time 14[s] | loss 3.25
| epoch 1 |  iter 221 / 9295 | time 15[s] | loss 3.15
| epoch 1 |  iter 241 / 9295 | time 17[s] | loss 3.09
| epoch 1 |  iter 261 / 9295 | time 18[s] | loss 3.03
| epoch 1 |  iter 281 / 9295 | time 20[s] | loss 2.97
| epoch 1 |  iter 301 / 9295 | time 21[s] | loss 2.93
| epoch 1 |  iter 321 / 9295 | time 23[s] | loss 2.88
| epoch 1 |  iter 341 / 9295 | time 24[s] | loss 2.84
| epoch 1 |  iter 361 / 9295 | time 25[s] | loss 2.81
| epoch 1 |  iter 381 / 9295 | time 27[s] | loss 2.79
| epoch 1 |  iter 401 / 9295 | time 28[s] | loss 2.76
| epoch 1 |  iter 421 / 9295 | time 30[s] | loss 2.76
| epoch 1 |  iter 441 / 9295 | time 31[s] | loss 2.73
| epoch 1 |  iter 461 / 9295 | time 33[s] | loss 2.69
| epoch 1 |  iter 481 / 9295 | time 34[s] | loss 2.69
| epoch 1 |  iter 501 / 9295 | time 35[s] | loss 2.69
| epoch 1 |  iter 521 / 9295 | time 37[s] | loss 2.68
| epoch 1 |  iter 541 / 9295 | time 38[s] | loss 2.67
| epoch 1 |  iter 561 / 9295 | time 40[s] | loss 2.66
| epoch 1 |  iter 581 / 9295 | time 41[s] | loss 2.67
| epoch 1 |  iter 601 / 9295 | time 42[s] | loss 2.63
| epoch 1 |  iter 621 / 9295 | time 44[s] | loss 2.64
| epoch 1 |  iter 641 / 9295 | time 45[s] | loss 2.61
| epoch 1 |  iter 661 / 9295 | time 47[s] | loss 2.60
| epoch 1 |  iter 681 / 9295 | time 48[s] | loss 2.64
| epoch 1 |  iter 701 / 9295 | time 50[s] | loss 2.63
| epoch 1 |  iter 721 / 9295 | time 51[s] | loss 2.61
| epoch 1 |  iter 741 / 9295 | time 52[s] | loss 2.61
| epoch 1 |  iter 761 / 9295 | time 54[s] | loss 2.61
| epoch 1 |  iter 781 / 9295 | time 55[s] | loss 2.60
| epoch 1 |  iter 801 / 9295 | time 57[s] | loss 2.61
| epoch 1 |  iter 821 / 9295 | time 58[s] | loss 2.57
| epoch 1 |  iter 841 / 9295 | time 60[s] | loss 2.59
| epoch 1 |  iter 861 / 9295 | time 61[s] | loss 2.60
| epoch 1 |  iter 881 / 9295 | time 62[s] | loss 2.57
| epoch 1 |  iter 901 / 9295 | time 64[s] | loss 2.57
| epoch 1 |  iter 921 / 9295 | time 65[s] | loss 2.57
| epoch 1 |  iter 941 / 9295 | time 67[s] | loss 2.54
| epoch 1 |  iter 961 / 9295 | time 68[s] | loss 2.55
| epoch 1 |  iter 981 / 9295 | time 70[s] | loss 2.56
| epoch 1 |  iter 1001 / 9295 | time 71[s] | loss 2.54
| epoch 1 |  iter 1021 / 9295 | time 72[s] | loss 2.57
| epoch 1 |  iter 1041 / 9295 | time 74[s] | loss 2.55
| epoch 1 |  iter 1061 / 9295 | time 75[s] | loss 2.55
| epoch 1 |  iter 1081 / 9295 | time 77[s] | loss 2.55
| epoch 1 |  iter 1101 / 9295 | time 78[s] | loss 2.53
| epoch 1 |  iter 1121 / 9295 | time 80[s] | loss 2.54
| epoch 1 |  iter 1141 / 9295 | time 81[s] | loss 2.51
| epoch 1 |  iter 1161 / 9295 | time 82[s] | loss 2.55
| epoch 1 |  iter 1181 / 9295 | time 84[s] | loss 2.56
| epoch 1 |  iter 1201 / 9295 | time 85[s] | loss 2.55
| epoch 1 |  iter 1221 / 9295 | time 87[s] | loss 2.53
| epoch 1 |  iter 1241 / 9295 | time 88[s] | loss 2.54
| epoch 1 |  iter 1261 / 9295 | time 90[s] | loss 2.51
| epoch 1 |  iter 1281 / 9295 | time 91[s] | loss 2.52
| epoch 1 |  iter 1301 / 9295 | time 92[s] | loss 2.53
| epoch 1 |  iter 1321 / 9295 | time 94[s] | loss 2.53
| epoch 1 |  iter 1341 / 9295 | time 95[s] | loss 2.56
| epoch 1 |  iter 1361 / 9295 | time 97[s] | loss 2.52
| epoch 1 |  iter 1381 / 9295 | time 98[s] | loss 2.53
| epoch 1 |  iter 1401 / 9295 | time 100[s] | loss 2.52
| epoch 1 |  iter 1421 / 9295 | time 101[s] | loss 2.54
| epoch 1 |  iter 1441 / 9295 | time 103[s] | loss 2.54
| epoch 1 |  iter 1461 / 9295 | time 104[s] | loss 2.53
| epoch 1 |  iter 1481 / 9295 | time 105[s] | loss 2.52
| epoch 1 |  iter 1501 / 9295 | time 107[s] | loss 2.54
| epoch 1 |  iter 1521 / 9295 | time 108[s] | loss 2.53
| epoch 1 |  iter 1541 / 9295 | time 110[s] | loss 2.52
| epoch 1 |  iter 1561 / 9295 | time 111[s] | loss 2.47
| epoch 1 |  iter 1581 / 9295 | time 113[s] | loss 2.52
| epoch 1 |  iter 1601 / 9295 | time 114[s] | loss 2.52
| epoch 1 |  iter 1621 / 9295 | time 115[s] | loss 2.48
| epoch 1 |  iter 1641 / 9295 | time 117[s] | loss 2.50
| epoch 1 |  iter 1661 / 9295 | time 118[s] | loss 2.53
| epoch 1 |  iter 1681 / 9295 | time 120[s] | loss 2.50
| epoch 1 |  iter 1701 / 9295 | time 121[s] | loss 2.49
| epoch 1 |  iter 1721 / 9295 | time 122[s] | loss 2.49
| epoch 1 |  iter 1741 / 9295 | time 124[s] | loss 2.50
| epoch 1 |  iter 1761 / 9295 | time 125[s] | loss 2.48
| epoch 1 |  iter 1781 / 9295 | time 127[s] | loss 2.51
| epoch 1 |  iter 1801 / 9295 | time 128[s] | loss 2.51
| epoch 1 |  iter 1821 / 9295 | time 130[s] | loss 2.51
| epoch 1 |  iter 1841 / 9295 | time 131[s] | loss 2.50
| epoch 1 |  iter 1861 / 9295 | time 133[s] | loss 2.52
| epoch 1 |  iter 1881 / 9295 | time 134[s] | loss 2.50
| epoch 1 |  iter 1901 / 9295 | time 136[s] | loss 2.48
| epoch 1 |  iter 1921 / 9295 | time 137[s] | loss 2.49
| epoch 1 |  iter 1941 / 9295 | time 139[s] | loss 2.47
| epoch 1 |  iter 1961 / 9295 | time 140[s] | loss 2.49
| epoch 1 |  iter 1981 / 9295 | time 141[s] | loss 2.51
| epoch 1 |  iter 2001 / 9295 | time 143[s] | loss 2.49
| epoch 1 |  iter 2021 / 9295 | time 144[s] | loss 2.49
| epoch 1 |  iter 2041 / 9295 | time 146[s] | loss 2.46
| epoch 1 |  iter 2061 / 9295 | time 147[s] | loss 2.48
| epoch 1 |  iter 2081 / 9295 | time 149[s] | loss 2.48
| epoch 1 |  iter 2101 / 9295 | time 150[s] | loss 2.49
| epoch 1 |  iter 2121 / 9295 | time 152[s] | loss 2.46
| epoch 1 |  iter 2141 / 9295 | time 153[s] | loss 2.46
| epoch 1 |  iter 2161 / 9295 | time 155[s] | loss 2.48
| epoch 1 |  iter 2181 / 9295 | time 156[s] | loss 2.49
| epoch 1 |  iter 2201 / 9295 | time 158[s] | loss 2.48
| epoch 1 |  iter 2221 / 9295 | time 159[s] | loss 2.50
| epoch 1 |  iter 2241 / 9295 | time 160[s] | loss 2.49
| epoch 1 |  iter 2261 / 9295 | time 162[s] | loss 2.49
| epoch 1 |  iter 2281 / 9295 | time 163[s] | loss 2.50
| epoch 1 |  iter 2301 / 9295 | time 165[s] | loss 2.47
| epoch 1 |  iter 2321 / 9295 | time 166[s] | loss 2.46
| epoch 1 |  iter 2341 / 9295 | time 168[s] | loss 2.51
| epoch 1 |  iter 2361 / 9295 | time 169[s] | loss 2.48
| epoch 1 |  iter 2381 / 9295 | time 170[s] | loss 2.45
| epoch 1 |  iter 2401 / 9295 | time 172[s] | loss 2.46
| epoch 1 |  iter 2421 / 9295 | time 173[s] | loss 2.47
| epoch 1 |  iter 2441 / 9295 | time 175[s] | loss 2.48
| epoch 1 |  iter 2461 / 9295 | time 176[s] | loss 2.47
| epoch 1 |  iter 2481 / 9295 | time 178[s] | loss 2.45
| epoch 1 |  iter 2501 / 9295 | time 179[s] | loss 2.48
| epoch 1 |  iter 2521 / 9295 | time 181[s] | loss 2.47
| epoch 1 |  iter 2541 / 9295 | time 182[s] | loss 2.45
| epoch 1 |  iter 2561 / 9295 | time 184[s] | loss 2.47
| epoch 1 |  iter 2581 / 9295 | time 185[s] | loss 2.46
| epoch 1 |  iter 2601 / 9295 | time 187[s] | loss 2.45
| epoch 1 |  iter 2621 / 9295 | time 188[s] | loss 2.47
| epoch 1 |  iter 2641 / 9295 | time 190[s] | loss 2.47
| epoch 1 |  iter 2661 / 9295 | time 191[s] | loss 2.48
| epoch 1 |  iter 2681 / 9295 | time 193[s] | loss 2.47
| epoch 1 |  iter 2701 / 9295 | time 194[s] | loss 2.45
| epoch 1 |  iter 2721 / 9295 | time 195[s] | loss 2.44
| epoch 1 |  iter 2741 / 9295 | time 197[s] | loss 2.44
| epoch 1 |  iter 2761 / 9295 | time 198[s] | loss 2.49
| epoch 1 |  iter 2781 / 9295 | time 200[s] | loss 2.46
| epoch 1 |  iter 2801 / 9295 | time 201[s] | loss 2.46
| epoch 1 |  iter 2821 / 9295 | time 203[s] | loss 2.49
| epoch 1 |  iter 2841 / 9295 | time 204[s] | loss 2.45
| epoch 1 |  iter 2861 / 9295 | time 206[s] | loss 2.47
| epoch 1 |  iter 2881 / 9295 | time 207[s] | loss 2.45
| epoch 1 |  iter 2901 / 9295 | time 208[s] | loss 2.43
| epoch 1 |  iter 2921 / 9295 | time 210[s] | loss 2.45
| epoch 1 |  iter 2941 / 9295 | time 211[s] | loss 2.44
| epoch 1 |  iter 2961 / 9295 | time 213[s] | loss 2.47
| epoch 1 |  iter 2981 / 9295 | time 214[s] | loss 2.44
| epoch 1 |  iter 3001 / 9295 | time 216[s] | loss 2.43
| epoch 1 |  iter 3021 / 9295 | time 217[s] | loss 2.45
| epoch 1 |  iter 3041 / 9295 | time 219[s] | loss 2.44
| epoch 1 |  iter 3061 / 9295 | time 220[s] | loss 2.46
| epoch 1 |  iter 3081 / 9295 | time 222[s] | loss 2.43
| epoch 1 |  iter 3101 / 9295 | time 223[s] | loss 2.43
| epoch 1 |  iter 3121 / 9295 | time 225[s] | loss 2.43
| epoch 1 |  iter 3141 / 9295 | time 226[s] | loss 2.46
| epoch 1 |  iter 3161 / 9295 | time 227[s] | loss 2.42
| epoch 1 |  iter 3181 / 9295 | time 229[s] | loss 2.47
| epoch 1 |  iter 3201 / 9295 | time 230[s] | loss 2.44
| epoch 1 |  iter 3221 / 9295 | time 232[s] | loss 2.44
| epoch 1 |  iter 3241 / 9295 | time 233[s] | loss 2.45
| epoch 1 |  iter 3261 / 9295 | time 235[s] | loss 2.46
| epoch 1 |  iter 3281 / 9295 | time 236[s] | loss 2.41
| epoch 1 |  iter 3301 / 9295 | time 237[s] | loss 2.44
| epoch 1 |  iter 3321 / 9295 | time 239[s] | loss 2.42
| epoch 1 |  iter 3341 / 9295 | time 240[s] | loss 2.44
| epoch 1 |  iter 3361 / 9295 | time 242[s] | loss 2.42
| epoch 1 |  iter 3381 / 9295 | time 243[s] | loss 2.43
| epoch 1 |  iter 3401 / 9295 | time 245[s] | loss 2.45
| epoch 1 |  iter 3421 / 9295 | time 246[s] | loss 2.45
| epoch 1 |  iter 3441 / 9295 | time 248[s] | loss 2.44
| epoch 1 |  iter 3461 / 9295 | time 249[s] | loss 2.43
| epoch 1 |  iter 3481 / 9295 | time 250[s] | loss 2.44
| epoch 1 |  iter 3501 / 9295 | time 252[s] | loss 2.42
| epoch 1 |  iter 3521 / 9295 | time 253[s] | loss 2.44
| epoch 1 |  iter 3541 / 9295 | time 255[s] | loss 2.40
| epoch 1 |  iter 3561 / 9295 | time 256[s] | loss 2.41
| epoch 1 |  iter 3581 / 9295 | time 257[s] | loss 2.41
| epoch 1 |  iter 3601 / 9295 | time 259[s] | loss 2.44
| epoch 1 |  iter 3621 / 9295 | time 260[s] | loss 2.41
| epoch 1 |  iter 3641 / 9295 | time 262[s] | loss 2.43
| epoch 1 |  iter 3661 / 9295 | time 263[s] | loss 2.41
| epoch 1 |  iter 3681 / 9295 | time 265[s] | loss 2.43
| epoch 1 |  iter 3701 / 9295 | time 266[s] | loss 2.44
| epoch 1 |  iter 3721 / 9295 | time 267[s] | loss 2.43
| epoch 1 |  iter 3741 / 9295 | time 269[s] | loss 2.44
| epoch 1 |  iter 3761 / 9295 | time 270[s] | loss 2.41
| epoch 1 |  iter 3781 / 9295 | time 272[s] | loss 2.41
| epoch 1 |  iter 3801 / 9295 | time 273[s] | loss 2.41
| epoch 1 |  iter 3821 / 9295 | time 275[s] | loss 2.42
| epoch 1 |  iter 3841 / 9295 | time 276[s] | loss 2.38
| epoch 1 |  iter 3861 / 9295 | time 277[s] | loss 2.41
| epoch 1 |  iter 3881 / 9295 | time 279[s] | loss 2.41
| epoch 1 |  iter 3901 / 9295 | time 280[s] | loss 2.41
| epoch 1 |  iter 3921 / 9295 | time 282[s] | loss 2.41
| epoch 1 |  iter 3941 / 9295 | time 283[s] | loss 2.42
| epoch 1 |  iter 3961 / 9295 | time 284[s] | loss 2.38
| epoch 1 |  iter 3981 / 9295 | time 286[s] | loss 2.40
| epoch 1 |  iter 4001 / 9295 | time 287[s] | loss 2.37
| epoch 1 |  iter 4021 / 9295 | time 289[s] | loss 2.42
| epoch 1 |  iter 4041 / 9295 | time 290[s] | loss 2.39
| epoch 1 |  iter 4061 / 9295 | time 292[s] | loss 2.40
| epoch 1 |  iter 4081 / 9295 | time 293[s] | loss 2.39
| epoch 1 |  iter 4101 / 9295 | time 294[s] | loss 2.39
| epoch 1 |  iter 4121 / 9295 | time 296[s] | loss 2.37
| epoch 1 |  iter 4141 / 9295 | time 297[s] | loss 2.38
| epoch 1 |  iter 4161 / 9295 | time 299[s] | loss 2.43
| epoch 1 |  iter 4181 / 9295 | time 300[s] | loss 2.39
| epoch 1 |  iter 4201 / 9295 | time 301[s] | loss 2.40
| epoch 1 |  iter 4221 / 9295 | time 303[s] | loss 2.38
| epoch 1 |  iter 4241 / 9295 | time 304[s] | loss 2.41
| epoch 1 |  iter 4261 / 9295 | time 306[s] | loss 2.39
| epoch 1 |  iter 4281 / 9295 | time 307[s] | loss 2.41
| epoch 1 |  iter 4301 / 9295 | time 308[s] | loss 2.37
| epoch 1 |  iter 4321 / 9295 | time 310[s] | loss 2.37
| epoch 1 |  iter 4341 / 9295 | time 311[s] | loss 2.37
| epoch 1 |  iter 4361 / 9295 | time 313[s] | loss 2.38
| epoch 1 |  iter 4381 / 9295 | time 314[s] | loss 2.40
| epoch 1 |  iter 4401 / 9295 | time 316[s] | loss 2.38
| epoch 1 |  iter 4421 / 9295 | time 317[s] | loss 2.41
| epoch 1 |  iter 4441 / 9295 | time 318[s] | loss 2.38
| epoch 1 |  iter 4461 / 9295 | time 320[s] | loss 2.40
| epoch 1 |  iter 4481 / 9295 | time 321[s] | loss 2.42
| epoch 1 |  iter 4501 / 9295 | time 323[s] | loss 2.36
| epoch 1 |  iter 4521 / 9295 | time 324[s] | loss 2.36
| epoch 1 |  iter 4541 / 9295 | time 326[s] | loss 2.40
| epoch 1 |  iter 4561 / 9295 | time 327[s] | loss 2.37
| epoch 1 |  iter 4581 / 9295 | time 328[s] | loss 2.37
| epoch 1 |  iter 4601 / 9295 | time 330[s] | loss 2.34
| epoch 1 |  iter 4621 / 9295 | time 331[s] | loss 2.37
| epoch 1 |  iter 4641 / 9295 | time 333[s] | loss 2.37
| epoch 1 |  iter 4661 / 9295 | time 334[s] | loss 2.39
| epoch 1 |  iter 4681 / 9295 | time 336[s] | loss 2.38
| epoch 1 |  iter 4701 / 9295 | time 337[s] | loss 2.38
| epoch 1 |  iter 4721 / 9295 | time 338[s] | loss 2.39
| epoch 1 |  iter 4741 / 9295 | time 340[s] | loss 2.36
| epoch 1 |  iter 4761 / 9295 | time 341[s] | loss 2.39
| epoch 1 |  iter 4781 / 9295 | time 343[s] | loss 2.37
| epoch 1 |  iter 4801 / 9295 | time 344[s] | loss 2.39
| epoch 1 |  iter 4821 / 9295 | time 345[s] | loss 2.36
| epoch 1 |  iter 4841 / 9295 | time 347[s] | loss 2.39
| epoch 1 |  iter 4861 / 9295 | time 348[s] | loss 2.37
| epoch 1 |  iter 4881 / 9295 | time 350[s] | loss 2.36
| epoch 1 |  iter 4901 / 9295 | time 351[s] | loss 2.37
| epoch 1 |  iter 4921 / 9295 | time 353[s] | loss 2.38
| epoch 1 |  iter 4941 / 9295 | time 354[s] | loss 2.37
| epoch 1 |  iter 4961 / 9295 | time 356[s] | loss 2.35
| epoch 1 |  iter 4981 / 9295 | time 357[s] | loss 2.35
| epoch 1 |  iter 5001 / 9295 | time 359[s] | loss 2.34
| epoch 1 |  iter 5021 / 9295 | time 360[s] | loss 2.35
| epoch 1 |  iter 5041 / 9295 | time 362[s] | loss 2.36
| epoch 1 |  iter 5061 / 9295 | time 363[s] | loss 2.32
| epoch 1 |  iter 5081 / 9295 | time 365[s] | loss 2.37
| epoch 1 |  iter 5101 / 9295 | time 366[s] | loss 2.36
| epoch 1 |  iter 5121 / 9295 | time 367[s] | loss 2.35
| epoch 1 |  iter 5141 / 9295 | time 369[s] | loss 2.33
| epoch 1 |  iter 5161 / 9295 | time 370[s] | loss 2.35
| epoch 1 |  iter 5181 / 9295 | time 372[s] | loss 2.38
| epoch 1 |  iter 5201 / 9295 | time 373[s] | loss 2.35
| epoch 1 |  iter 5221 / 9295 | time 375[s] | loss 2.37
| epoch 1 |  iter 5241 / 9295 | time 376[s] | loss 2.34
| epoch 1 |  iter 5261 / 9295 | time 378[s] | loss 2.37
| epoch 1 |  iter 5281 / 9295 | time 379[s] | loss 2.30
| epoch 1 |  iter 5301 / 9295 | time 380[s] | loss 2.37
| epoch 1 |  iter 5321 / 9295 | time 382[s] | loss 2.38
| epoch 1 |  iter 5341 / 9295 | time 383[s] | loss 2.38
| epoch 1 |  iter 5361 / 9295 | time 385[s] | loss 2.34
| epoch 1 |  iter 5381 / 9295 | time 386[s] | loss 2.34
| epoch 1 |  iter 5401 / 9295 | time 387[s] | loss 2.35
| epoch 1 |  iter 5421 / 9295 | time 389[s] | loss 2.32
| epoch 1 |  iter 5441 / 9295 | time 390[s] | loss 2.39
| epoch 1 |  iter 5461 / 9295 | time 392[s] | loss 2.33
| epoch 1 |  iter 5481 / 9295 | time 393[s] | loss 2.36
| epoch 1 |  iter 5501 / 9295 | time 395[s] | loss 2.34
| epoch 1 |  iter 5521 / 9295 | time 396[s] | loss 2.35
| epoch 1 |  iter 5541 / 9295 | time 398[s] | loss 2.35
| epoch 1 |  iter 5561 / 9295 | time 399[s] | loss 2.36
| epoch 1 |  iter 5581 / 9295 | time 401[s] | loss 2.36
| epoch 1 |  iter 5601 / 9295 | time 402[s] | loss 2.36
| epoch 1 |  iter 5621 / 9295 | time 403[s] | loss 2.32
| epoch 1 |  iter 5641 / 9295 | time 405[s] | loss 2.30
| epoch 1 |  iter 5661 / 9295 | time 406[s] | loss 2.29
| epoch 1 |  iter 5681 / 9295 | time 408[s] | loss 2.30
| epoch 1 |  iter 5701 / 9295 | time 409[s] | loss 2.34
| epoch 1 |  iter 5721 / 9295 | time 411[s] | loss 2.36
| epoch 1 |  iter 5741 / 9295 | time 412[s] | loss 2.31
| epoch 1 |  iter 5761 / 9295 | time 413[s] | loss 2.32
| epoch 1 |  iter 5781 / 9295 | time 415[s] | loss 2.35
| epoch 1 |  iter 5801 / 9295 | time 416[s] | loss 2.31
| epoch 1 |  iter 5821 / 9295 | time 418[s] | loss 2.36
| epoch 1 |  iter 5841 / 9295 | time 419[s] | loss 2.32
| epoch 1 |  iter 5861 / 9295 | time 420[s] | loss 2.35
| epoch 1 |  iter 5881 / 9295 | time 422[s] | loss 2.34
| epoch 1 |  iter 5901 / 9295 | time 423[s] | loss 2.34
| epoch 1 |  iter 5921 / 9295 | time 425[s] | loss 2.33
| epoch 1 |  iter 5941 / 9295 | time 426[s] | loss 2.33
| epoch 1 |  iter 5961 / 9295 | time 428[s] | loss 2.34
| epoch 1 |  iter 5981 / 9295 | time 429[s] | loss 2.32
| epoch 1 |  iter 6001 / 9295 | time 430[s] | loss 2.33
| epoch 1 |  iter 6021 / 9295 | time 432[s] | loss 2.31
| epoch 1 |  iter 6041 / 9295 | time 433[s] | loss 2.31
| epoch 1 |  iter 6061 / 9295 | time 435[s] | loss 2.32
| epoch 1 |  iter 6081 / 9295 | time 436[s] | loss 2.30
| epoch 1 |  iter 6101 / 9295 | time 437[s] | loss 2.31
| epoch 1 |  iter 6121 / 9295 | time 439[s] | loss 2.32
| epoch 1 |  iter 6141 / 9295 | time 440[s] | loss 2.31
| epoch 1 |  iter 6161 / 9295 | time 442[s] | loss 2.34
| epoch 1 |  iter 6181 / 9295 | time 443[s] | loss 2.32
| epoch 1 |  iter 6201 / 9295 | time 444[s] | loss 2.29
| epoch 1 |  iter 6221 / 9295 | time 446[s] | loss 2.31
| epoch 1 |  iter 6241 / 9295 | time 447[s] | loss 2.33
| epoch 1 |  iter 6261 / 9295 | time 449[s] | loss 2.32
| epoch 1 |  iter 6281 / 9295 | time 450[s] | loss 2.33
| epoch 1 |  iter 6301 / 9295 | time 452[s] | loss 2.31
| epoch 1 |  iter 6321 / 9295 | time 453[s] | loss 2.32
| epoch 1 |  iter 6341 / 9295 | time 454[s] | loss 2.30
| epoch 1 |  iter 6361 / 9295 | time 456[s] | loss 2.28
| epoch 1 |  iter 6381 / 9295 | time 457[s] | loss 2.32
| epoch 1 |  iter 6401 / 9295 | time 459[s] | loss 2.30
| epoch 1 |  iter 6421 / 9295 | time 460[s] | loss 2.32
| epoch 1 |  iter 6441 / 9295 | time 461[s] | loss 2.30
| epoch 1 |  iter 6461 / 9295 | time 463[s] | loss 2.31
| epoch 1 |  iter 6481 / 9295 | time 464[s] | loss 2.31
| epoch 1 |  iter 6501 / 9295 | time 466[s] | loss 2.31
| epoch 1 |  iter 6521 / 9295 | time 467[s] | loss 2.29
| epoch 1 |  iter 6541 / 9295 | time 469[s] | loss 2.31
| epoch 1 |  iter 6561 / 9295 | time 470[s] | loss 2.29
| epoch 1 |  iter 6581 / 9295 | time 471[s] | loss 2.33
| epoch 1 |  iter 6601 / 9295 | time 473[s] | loss 2.28
| epoch 1 |  iter 6621 / 9295 | time 474[s] | loss 2.32
| epoch 1 |  iter 6641 / 9295 | time 476[s] | loss 2.29
| epoch 1 |  iter 6661 / 9295 | time 477[s] | loss 2.29
| epoch 1 |  iter 6681 / 9295 | time 478[s] | loss 2.30
| epoch 1 |  iter 6701 / 9295 | time 480[s] | loss 2.30
| epoch 1 |  iter 6721 / 9295 | time 481[s] | loss 2.32
| epoch 1 |  iter 6741 / 9295 | time 483[s] | loss 2.30
| epoch 1 |  iter 6761 / 9295 | time 484[s] | loss 2.30
| epoch 1 |  iter 6781 / 9295 | time 486[s] | loss 2.30
| epoch 1 |  iter 6801 / 9295 | time 487[s] | loss 2.29
| epoch 1 |  iter 6821 / 9295 | time 488[s] | loss 2.30
| epoch 1 |  iter 6841 / 9295 | time 490[s] | loss 2.28
| epoch 1 |  iter 6861 / 9295 | time 491[s] | loss 2.30
| epoch 1 |  iter 6881 / 9295 | time 493[s] | loss 2.26
| epoch 1 |  iter 6901 / 9295 | time 494[s] | loss 2.29
| epoch 1 |  iter 6921 / 9295 | time 495[s] | loss 2.28
| epoch 1 |  iter 6941 / 9295 | time 497[s] | loss 2.29
| epoch 1 |  iter 6961 / 9295 | time 498[s] | loss 2.28
| epoch 1 |  iter 6981 / 9295 | time 500[s] | loss 2.26
| epoch 1 |  iter 7001 / 9295 | time 501[s] | loss 2.31
| epoch 1 |  iter 7021 / 9295 | time 503[s] | loss 2.29
| epoch 1 |  iter 7041 / 9295 | time 504[s] | loss 2.26
| epoch 1 |  iter 7061 / 9295 | time 505[s] | loss 2.30
| epoch 1 |  iter 7081 / 9295 | time 507[s] | loss 2.28
| epoch 1 |  iter 7101 / 9295 | time 508[s] | loss 2.28
| epoch 1 |  iter 7121 / 9295 | time 510[s] | loss 2.31
| epoch 1 |  iter 7141 / 9295 | time 511[s] | loss 2.29
| epoch 1 |  iter 7161 / 9295 | time 512[s] | loss 2.27
| epoch 1 |  iter 7181 / 9295 | time 514[s] | loss 2.30
| epoch 1 |  iter 7201 / 9295 | time 516[s] | loss 2.29
| epoch 1 |  iter 7221 / 9295 | time 517[s] | loss 2.30
| epoch 1 |  iter 7241 / 9295 | time 519[s] | loss 2.29
| epoch 1 |  iter 7261 / 9295 | time 520[s] | loss 2.31
| epoch 1 |  iter 7281 / 9295 | time 522[s] | loss 2.26
| epoch 1 |  iter 7301 / 9295 | time 523[s] | loss 2.30
| epoch 1 |  iter 7321 / 9295 | time 525[s] | loss 2.27
| epoch 1 |  iter 7341 / 9295 | time 526[s] | loss 2.29
| epoch 1 |  iter 7361 / 9295 | time 528[s] | loss 2.32
| epoch 1 |  iter 7381 / 9295 | time 529[s] | loss 2.28
| epoch 1 |  iter 7401 / 9295 | time 531[s] | loss 2.30
| epoch 1 |  iter 7421 / 9295 | time 532[s] | loss 2.26
| epoch 1 |  iter 7441 / 9295 | time 534[s] | loss 2.28
| epoch 1 |  iter 7461 / 9295 | time 535[s] | loss 2.24
| epoch 1 |  iter 7481 / 9295 | time 537[s] | loss 2.26
| epoch 1 |  iter 7501 / 9295 | time 538[s] | loss 2.25
| epoch 1 |  iter 7521 / 9295 | time 540[s] | loss 2.24
| epoch 1 |  iter 7541 / 9295 | time 541[s] | loss 2.25
| epoch 1 |  iter 7561 / 9295 | time 543[s] | loss 2.32
| epoch 1 |  iter 7581 / 9295 | time 545[s] | loss 2.30
| epoch 1 |  iter 7601 / 9295 | time 546[s] | loss 2.29
| epoch 1 |  iter 7621 / 9295 | time 548[s] | loss 2.27
| epoch 1 |  iter 7641 / 9295 | time 549[s] | loss 2.27
| epoch 1 |  iter 7661 / 9295 | time 551[s] | loss 2.26
| epoch 1 |  iter 7681 / 9295 | time 553[s] | loss 2.24
| epoch 1 |  iter 7701 / 9295 | time 554[s] | loss 2.26
| epoch 1 |  iter 7721 / 9295 | time 556[s] | loss 2.28
| epoch 1 |  iter 7741 / 9295 | time 557[s] | loss 2.29
| epoch 1 |  iter 7761 / 9295 | time 559[s] | loss 2.28
| epoch 1 |  iter 7781 / 9295 | time 561[s] | loss 2.25
| epoch 1 |  iter 7801 / 9295 | time 563[s] | loss 2.28
| epoch 1 |  iter 7821 / 9295 | time 564[s] | loss 2.28
| epoch 1 |  iter 7841 / 9295 | time 566[s] | loss 2.26
| epoch 1 |  iter 7861 / 9295 | time 567[s] | loss 2.29
| epoch 1 |  iter 7881 / 9295 | time 569[s] | loss 2.25
| epoch 1 |  iter 7901 / 9295 | time 571[s] | loss 2.29
| epoch 1 |  iter 7921 / 9295 | time 572[s] | loss 2.23
| epoch 1 |  iter 7941 / 9295 | time 573[s] | loss 2.24
| epoch 1 |  iter 7961 / 9295 | time 575[s] | loss 2.27
| epoch 1 |  iter 7981 / 9295 | time 576[s] | loss 2.27
| epoch 1 |  iter 8001 / 9295 | time 578[s] | loss 2.25
| epoch 1 |  iter 8021 / 9295 | time 579[s] | loss 2.25
| epoch 1 |  iter 8041 / 9295 | time 581[s] | loss 2.26
| epoch 1 |  iter 8061 / 9295 | time 582[s] | loss 2.27
| epoch 1 |  iter 8081 / 9295 | time 584[s] | loss 2.29
| epoch 1 |  iter 8101 / 9295 | time 586[s] | loss 2.25
| epoch 1 |  iter 8121 / 9295 | time 587[s] | loss 2.27
| epoch 1 |  iter 8141 / 9295 | time 589[s] | loss 2.24
| epoch 1 |  iter 8161 / 9295 | time 590[s] | loss 2.26
| epoch 1 |  iter 8181 / 9295 | time 592[s] | loss 2.25
| epoch 1 |  iter 8201 / 9295 | time 593[s] | loss 2.25
| epoch 1 |  iter 8221 / 9295 | time 595[s] | loss 2.26
| epoch 1 |  iter 8241 / 9295 | time 596[s] | loss 2.24
| epoch 1 |  iter 8261 / 9295 | time 597[s] | loss 2.22
| epoch 1 |  iter 8281 / 9295 | time 599[s] | loss 2.20
| epoch 1 |  iter 8301 / 9295 | time 601[s] | loss 2.24
| epoch 1 |  iter 8321 / 9295 | time 602[s] | loss 2.25
| epoch 1 |  iter 8341 / 9295 | time 604[s] | loss 2.23
| epoch 1 |  iter 8361 / 9295 | time 605[s] | loss 2.23
| epoch 1 |  iter 8381 / 9295 | time 607[s] | loss 2.27
| epoch 1 |  iter 8401 / 9295 | time 608[s] | loss 2.24
| epoch 1 |  iter 8421 / 9295 | time 609[s] | loss 2.22
| epoch 1 |  iter 8441 / 9295 | time 611[s] | loss 2.25
| epoch 1 |  iter 8461 / 9295 | time 612[s] | loss 2.25
| epoch 1 |  iter 8481 / 9295 | time 614[s] | loss 2.25
| epoch 1 |  iter 8501 / 9295 | time 615[s] | loss 2.26
| epoch 1 |  iter 8521 / 9295 | time 617[s] | loss 2.25
| epoch 1 |  iter 8541 / 9295 | time 618[s] | loss 2.24
| epoch 1 |  iter 8561 / 9295 | time 620[s] | loss 2.22
| epoch 1 |  iter 8581 / 9295 | time 621[s] | loss 2.26
| epoch 1 |  iter 8601 / 9295 | time 623[s] | loss 2.24
| epoch 1 |  iter 8621 / 9295 | time 624[s] | loss 2.24
| epoch 1 |  iter 8641 / 9295 | time 625[s] | loss 2.28
| epoch 1 |  iter 8661 / 9295 | time 627[s] | loss 2.26
| epoch 1 |  iter 8681 / 9295 | time 629[s] | loss 2.21
| epoch 1 |  iter 8701 / 9295 | time 630[s] | loss 2.20
| epoch 1 |  iter 8721 / 9295 | time 631[s] | loss 2.24
| epoch 1 |  iter 8741 / 9295 | time 633[s] | loss 2.29
| epoch 1 |  iter 8761 / 9295 | time 634[s] | loss 2.24
| epoch 1 |  iter 8781 / 9295 | time 636[s] | loss 2.21
| epoch 1 |  iter 8801 / 9295 | time 637[s] | loss 2.20
| epoch 1 |  iter 8821 / 9295 | time 639[s] | loss 2.20
| epoch 1 |  iter 8841 / 9295 | time 640[s] | loss 2.24
| epoch 1 |  iter 8861 / 9295 | time 642[s] | loss 2.25
| epoch 1 |  iter 8881 / 9295 | time 643[s] | loss 2.26
| epoch 1 |  iter 8901 / 9295 | time 645[s] | loss 2.22
| epoch 1 |  iter 8921 / 9295 | time 646[s] | loss 2.23
| epoch 1 |  iter 8941 / 9295 | time 648[s] | loss 2.29
| epoch 1 |  iter 8961 / 9295 | time 649[s] | loss 2.23
| epoch 1 |  iter 8981 / 9295 | time 650[s] | loss 2.24
| epoch 1 |  iter 9001 / 9295 | time 652[s] | loss 2.24
| epoch 1 |  iter 9021 / 9295 | time 653[s] | loss 2.24
| epoch 1 |  iter 9041 / 9295 | time 655[s] | loss 2.25
| epoch 1 |  iter 9061 / 9295 | time 656[s] | loss 2.23
| epoch 1 |  iter 9081 / 9295 | time 657[s] | loss 2.22
| epoch 1 |  iter 9101 / 9295 | time 659[s] | loss 2.21
| epoch 1 |  iter 9121 / 9295 | time 660[s] | loss 2.24
| epoch 1 |  iter 9141 / 9295 | time 661[s] | loss 2.24
| epoch 1 |  iter 9161 / 9295 | time 663[s] | loss 2.25
| epoch 1 |  iter 9181 / 9295 | time 664[s] | loss 2.26
| epoch 1 |  iter 9201 / 9295 | time 666[s] | loss 2.22
| epoch 1 |  iter 9221 / 9295 | time 667[s] | loss 2.20
| epoch 1 |  iter 9241 / 9295 | time 668[s] | loss 2.24
| epoch 1 |  iter 9261 / 9295 | time 670[s] | loss 2.17
| epoch 1 |  iter 9281 / 9295 | time 671[s] | loss 2.25
| epoch 2 |  iter 1 / 9295 | time 672[s] | loss 2.23
| epoch 2 |  iter 21 / 9295 | time 674[s] | loss 2.20
| epoch 2 |  iter 41 / 9295 | time 675[s] | loss 2.20
| epoch 2 |  iter 61 / 9295 | time 677[s] | loss 2.16
| epoch 2 |  iter 81 / 9295 | time 678[s] | loss 2.20
| epoch 2 |  iter 101 / 9295 | time 680[s] | loss 2.17
| epoch 2 |  iter 121 / 9295 | time 681[s] | loss 2.19
| epoch 2 |  iter 141 / 9295 | time 683[s] | loss 2.20
| epoch 2 |  iter 161 / 9295 | time 684[s] | loss 2.21
| epoch 2 |  iter 181 / 9295 | time 686[s] | loss 2.18
| epoch 2 |  iter 201 / 9295 | time 687[s] | loss 2.18
| epoch 2 |  iter 221 / 9295 | time 688[s] | loss 2.16
| epoch 2 |  iter 241 / 9295 | time 690[s] | loss 2.16
| epoch 2 |  iter 261 / 9295 | time 692[s] | loss 2.17
| epoch 2 |  iter 281 / 9295 | time 693[s] | loss 2.16
| epoch 2 |  iter 301 / 9295 | time 694[s] | loss 2.17
| epoch 2 |  iter 321 / 9295 | time 696[s] | loss 2.18
| epoch 2 |  iter 341 / 9295 | time 698[s] | loss 2.18
| epoch 2 |  iter 361 / 9295 | time 699[s] | loss 2.19
| epoch 2 |  iter 381 / 9295 | time 701[s] | loss 2.16
| epoch 2 |  iter 401 / 9295 | time 702[s] | loss 2.19
| epoch 2 |  iter 421 / 9295 | time 704[s] | loss 2.16
| epoch 2 |  iter 441 / 9295 | time 706[s] | loss 2.19
| epoch 2 |  iter 461 / 9295 | time 707[s] | loss 2.18
| epoch 2 |  iter 481 / 9295 | time 709[s] | loss 2.16
| epoch 2 |  iter 501 / 9295 | time 711[s] | loss 2.18
| epoch 2 |  iter 521 / 9295 | time 712[s] | loss 2.19
| epoch 2 |  iter 541 / 9295 | time 714[s] | loss 2.18
| epoch 2 |  iter 561 / 9295 | time 716[s] | loss 2.14
| epoch 2 |  iter 581 / 9295 | time 717[s] | loss 2.22
| epoch 2 |  iter 601 / 9295 | time 719[s] | loss 2.18
| epoch 2 |  iter 621 / 9295 | time 721[s] | loss 2.16
| epoch 2 |  iter 641 / 9295 | time 722[s] | loss 2.16
| epoch 2 |  iter 661 / 9295 | time 723[s] | loss 2.17
| epoch 2 |  iter 681 / 9295 | time 725[s] | loss 2.16
| epoch 2 |  iter 701 / 9295 | time 726[s] | loss 2.19
| epoch 2 |  iter 721 / 9295 | time 727[s] | loss 2.18
| epoch 2 |  iter 741 / 9295 | time 729[s] | loss 2.21
| epoch 2 |  iter 761 / 9295 | time 730[s] | loss 2.13
| epoch 2 |  iter 781 / 9295 | time 732[s] | loss 2.11
| epoch 2 |  iter 801 / 9295 | time 733[s] | loss 2.16
| epoch 2 |  iter 821 / 9295 | time 734[s] | loss 2.17
| epoch 2 |  iter 841 / 9295 | time 736[s] | loss 2.16
| epoch 2 |  iter 861 / 9295 | time 737[s] | loss 2.17
| epoch 2 |  iter 881 / 9295 | time 739[s] | loss 2.16
| epoch 2 |  iter 901 / 9295 | time 740[s] | loss 2.17
| epoch 2 |  iter 921 / 9295 | time 741[s] | loss 2.14
| epoch 2 |  iter 941 / 9295 | time 743[s] | loss 2.16
| epoch 2 |  iter 961 / 9295 | time 744[s] | loss 2.16
| epoch 2 |  iter 981 / 9295 | time 746[s] | loss 2.15
| epoch 2 |  iter 1001 / 9295 | time 747[s] | loss 2.20
| epoch 2 |  iter 1021 / 9295 | time 749[s] | loss 2.14
| epoch 2 |  iter 1041 / 9295 | time 750[s] | loss 2.17
| epoch 2 |  iter 1061 / 9295 | time 752[s] | loss 2.14
| epoch 2 |  iter 1081 / 9295 | time 753[s] | loss 2.14
| epoch 2 |  iter 1101 / 9295 | time 754[s] | loss 2.18
| epoch 2 |  iter 1121 / 9295 | time 756[s] | loss 2.19
| epoch 2 |  iter 1141 / 9295 | time 757[s] | loss 2.15
| epoch 2 |  iter 1161 / 9295 | time 759[s] | loss 2.15
| epoch 2 |  iter 1181 / 9295 | time 760[s] | loss 2.18
| epoch 2 |  iter 1201 / 9295 | time 762[s] | loss 2.16
| epoch 2 |  iter 1221 / 9295 | time 763[s] | loss 2.18
| epoch 2 |  iter 1241 / 9295 | time 765[s] | loss 2.16
| epoch 2 |  iter 1261 / 9295 | time 766[s] | loss 2.14
| epoch 2 |  iter 1281 / 9295 | time 767[s] | loss 2.16
| epoch 2 |  iter 1301 / 9295 | time 769[s] | loss 2.12
| epoch 2 |  iter 1321 / 9295 | time 770[s] | loss 2.16
| epoch 2 |  iter 1341 / 9295 | time 772[s] | loss 2.19
| epoch 2 |  iter 1361 / 9295 | time 773[s] | loss 2.14
| epoch 2 |  iter 1381 / 9295 | time 775[s] | loss 2.18
| epoch 2 |  iter 1401 / 9295 | time 776[s] | loss 2.14
| epoch 2 |  iter 1421 / 9295 | time 777[s] | loss 2.15
| epoch 2 |  iter 1441 / 9295 | time 779[s] | loss 2.18
| epoch 2 |  iter 1461 / 9295 | time 780[s] | loss 2.14
| epoch 2 |  iter 1481 / 9295 | time 782[s] | loss 2.16
| epoch 2 |  iter 1501 / 9295 | time 783[s] | loss 2.19
| epoch 2 |  iter 1521 / 9295 | time 784[s] | loss 2.17
| epoch 2 |  iter 1541 / 9295 | time 786[s] | loss 2.16
| epoch 2 |  iter 1561 / 9295 | time 787[s] | loss 2.15
| epoch 2 |  iter 1581 / 9295 | time 789[s] | loss 2.16
| epoch 2 |  iter 1601 / 9295 | time 791[s] | loss 2.15
| epoch 2 |  iter 1621 / 9295 | time 792[s] | loss 2.11
| epoch 2 |  iter 1641 / 9295 | time 793[s] | loss 2.15
| epoch 2 |  iter 1661 / 9295 | time 795[s] | loss 2.19
| epoch 2 |  iter 1681 / 9295 | time 796[s] | loss 2.14
| epoch 2 |  iter 1701 / 9295 | time 798[s] | loss 2.15
| epoch 2 |  iter 1721 / 9295 | time 799[s] | loss 2.19
| epoch 2 |  iter 1741 / 9295 | time 800[s] | loss 2.15
| epoch 2 |  iter 1761 / 9295 | time 802[s] | loss 2.14
| epoch 2 |  iter 1781 / 9295 | time 803[s] | loss 2.12
| epoch 2 |  iter 1801 / 9295 | time 805[s] | loss 2.16
| epoch 2 |  iter 1821 / 9295 | time 806[s] | loss 2.14
| epoch 2 |  iter 1841 / 9295 | time 807[s] | loss 2.13
| epoch 2 |  iter 1861 / 9295 | time 809[s] | loss 2.12
| epoch 2 |  iter 1881 / 9295 | time 810[s] | loss 2.17
| epoch 2 |  iter 1901 / 9295 | time 811[s] | loss 2.15
| epoch 2 |  iter 1921 / 9295 | time 813[s] | loss 2.16
| epoch 2 |  iter 1941 / 9295 | time 814[s] | loss 2.13
| epoch 2 |  iter 1961 / 9295 | time 816[s] | loss 2.15
| epoch 2 |  iter 1981 / 9295 | time 817[s] | loss 2.14
| epoch 2 |  iter 2001 / 9295 | time 818[s] | loss 2.15
| epoch 2 |  iter 2021 / 9295 | time 820[s] | loss 2.17
| epoch 2 |  iter 2041 / 9295 | time 821[s] | loss 2.13
| epoch 2 |  iter 2061 / 9295 | time 823[s] | loss 2.13
| epoch 2 |  iter 2081 / 9295 | time 824[s] | loss 2.16
| epoch 2 |  iter 2101 / 9295 | time 825[s] | loss 2.17
| epoch 2 |  iter 2121 / 9295 | time 827[s] | loss 2.16
| epoch 2 |  iter 2141 / 9295 | time 828[s] | loss 2.16
| epoch 2 |  iter 2161 / 9295 | time 829[s] | loss 2.16
| epoch 2 |  iter 2181 / 9295 | time 831[s] | loss 2.13
| epoch 2 |  iter 2201 / 9295 | time 832[s] | loss 2.14
| epoch 2 |  iter 2221 / 9295 | time 834[s] | loss 2.15
| epoch 2 |  iter 2241 / 9295 | time 835[s] | loss 2.13
| epoch 2 |  iter 2261 / 9295 | time 836[s] | loss 2.17
| epoch 2 |  iter 2281 / 9295 | time 838[s] | loss 2.14
| epoch 2 |  iter 2301 / 9295 | time 839[s] | loss 2.12
| epoch 2 |  iter 2321 / 9295 | time 840[s] | loss 2.13
| epoch 2 |  iter 2341 / 9295 | time 842[s] | loss 2.14
| epoch 2 |  iter 2361 / 9295 | time 843[s] | loss 2.15
| epoch 2 |  iter 2381 / 9295 | time 844[s] | loss 2.13
| epoch 2 |  iter 2401 / 9295 | time 846[s] | loss 2.13
| epoch 2 |  iter 2421 / 9295 | time 847[s] | loss 2.12
| epoch 2 |  iter 2441 / 9295 | time 848[s] | loss 2.16
| epoch 2 |  iter 2461 / 9295 | time 850[s] | loss 2.16
| epoch 2 |  iter 2481 / 9295 | time 851[s] | loss 2.13
| epoch 2 |  iter 2501 / 9295 | time 852[s] | loss 2.10
| epoch 2 |  iter 2521 / 9295 | time 854[s] | loss 2.12
| epoch 2 |  iter 2541 / 9295 | time 855[s] | loss 2.12
| epoch 2 |  iter 2561 / 9295 | time 856[s] | loss 2.15
| epoch 2 |  iter 2581 / 9295 | time 858[s] | loss 2.10
| epoch 2 |  iter 2601 / 9295 | time 859[s] | loss 2.14
| epoch 2 |  iter 2621 / 9295 | time 860[s] | loss 2.12
| epoch 2 |  iter 2641 / 9295 | time 862[s] | loss 2.11
| epoch 2 |  iter 2661 / 9295 | time 863[s] | loss 2.12
| epoch 2 |  iter 2681 / 9295 | time 865[s] | loss 2.16
| epoch 2 |  iter 2701 / 9295 | time 866[s] | loss 2.14
| epoch 2 |  iter 2721 / 9295 | time 867[s] | loss 2.15
| epoch 2 |  iter 2741 / 9295 | time 869[s] | loss 2.13
| epoch 2 |  iter 2761 / 9295 | time 870[s] | loss 2.15
| epoch 2 |  iter 2781 / 9295 | time 871[s] | loss 2.16
| epoch 2 |  iter 2801 / 9295 | time 873[s] | loss 2.14
| epoch 2 |  iter 2821 / 9295 | time 874[s] | loss 2.11
| epoch 2 |  iter 2841 / 9295 | time 876[s] | loss 2.11
| epoch 2 |  iter 2861 / 9295 | time 877[s] | loss 2.09
| epoch 2 |  iter 2881 / 9295 | time 878[s] | loss 2.12
| epoch 2 |  iter 2901 / 9295 | time 880[s] | loss 2.12
| epoch 2 |  iter 2921 / 9295 | time 881[s] | loss 2.11
| epoch 2 |  iter 2941 / 9295 | time 882[s] | loss 2.09
| epoch 2 |  iter 2961 / 9295 | time 884[s] | loss 2.14
| epoch 2 |  iter 2981 / 9295 | time 885[s] | loss 2.13
| epoch 2 |  iter 3001 / 9295 | time 886[s] | loss 2.12
| epoch 2 |  iter 3021 / 9295 | time 888[s] | loss 2.14
| epoch 2 |  iter 3041 / 9295 | time 889[s] | loss 2.13
| epoch 2 |  iter 3061 / 9295 | time 891[s] | loss 2.08
| epoch 2 |  iter 3081 / 9295 | time 892[s] | loss 2.15
| epoch 2 |  iter 3101 / 9295 | time 893[s] | loss 2.14
| epoch 2 |  iter 3121 / 9295 | time 895[s] | loss 2.17
| epoch 2 |  iter 3141 / 9295 | time 896[s] | loss 2.12
| epoch 2 |  iter 3161 / 9295 | time 898[s] | loss 2.12
| epoch 2 |  iter 3181 / 9295 | time 899[s] | loss 2.08
| epoch 2 |  iter 3201 / 9295 | time 901[s] | loss 2.11
| epoch 2 |  iter 3221 / 9295 | time 902[s] | loss 2.12
| epoch 2 |  iter 3241 / 9295 | time 903[s] | loss 2.10
| epoch 2 |  iter 3261 / 9295 | time 905[s] | loss 2.14
| epoch 2 |  iter 3281 / 9295 | time 907[s] | loss 2.11
| epoch 2 |  iter 3301 / 9295 | time 908[s] | loss 2.11
| epoch 2 |  iter 3321 / 9295 | time 909[s] | loss 2.12
| epoch 2 |  iter 3341 / 9295 | time 911[s] | loss 2.13
| epoch 2 |  iter 3361 / 9295 | time 912[s] | loss 2.12
| epoch 2 |  iter 3381 / 9295 | time 913[s] | loss 2.13
| epoch 2 |  iter 3401 / 9295 | time 915[s] | loss 2.12
| epoch 2 |  iter 3421 / 9295 | time 916[s] | loss 2.12
| epoch 2 |  iter 3441 / 9295 | time 918[s] | loss 2.10
| epoch 2 |  iter 3461 / 9295 | time 919[s] | loss 2.11
| epoch 2 |  iter 3481 / 9295 | time 920[s] | loss 2.10
| epoch 2 |  iter 3501 / 9295 | time 922[s] | loss 2.10
| epoch 2 |  iter 3521 / 9295 | time 923[s] | loss 2.08
| epoch 2 |  iter 3541 / 9295 | time 924[s] | loss 2.12
| epoch 2 |  iter 3561 / 9295 | time 926[s] | loss 2.10
| epoch 2 |  iter 3581 / 9295 | time 927[s] | loss 2.11
| epoch 2 |  iter 3601 / 9295 | time 929[s] | loss 2.10
| epoch 2 |  iter 3621 / 9295 | time 930[s] | loss 2.11
| epoch 2 |  iter 3641 / 9295 | time 931[s] | loss 2.12
| epoch 2 |  iter 3661 / 9295 | time 933[s] | loss 2.08
| epoch 2 |  iter 3681 / 9295 | time 934[s] | loss 2.11
| epoch 2 |  iter 3701 / 9295 | time 935[s] | loss 2.11
| epoch 2 |  iter 3721 / 9295 | time 937[s] | loss 2.08
| epoch 2 |  iter 3741 / 9295 | time 938[s] | loss 2.11
| epoch 2 |  iter 3761 / 9295 | time 939[s] | loss 2.12
| epoch 2 |  iter 3781 / 9295 | time 941[s] | loss 2.13
| epoch 2 |  iter 3801 / 9295 | time 942[s] | loss 2.10
| epoch 2 |  iter 3821 / 9295 | time 943[s] | loss 2.14
| epoch 2 |  iter 3841 / 9295 | time 945[s] | loss 2.10
| epoch 2 |  iter 3861 / 9295 | time 946[s] | loss 2.06
| epoch 2 |  iter 3881 / 9295 | time 947[s] | loss 2.13
| epoch 2 |  iter 3901 / 9295 | time 949[s] | loss 2.10
| epoch 2 |  iter 3921 / 9295 | time 950[s] | loss 2.12
| epoch 2 |  iter 3941 / 9295 | time 951[s] | loss 2.09
| epoch 2 |  iter 3961 / 9295 | time 953[s] | loss 2.14
| epoch 2 |  iter 3981 / 9295 | time 954[s] | loss 2.12
| epoch 2 |  iter 4001 / 9295 | time 955[s] | loss 2.10
| epoch 2 |  iter 4021 / 9295 | time 957[s] | loss 2.09
| epoch 2 |  iter 4041 / 9295 | time 958[s] | loss 2.09
| epoch 2 |  iter 4061 / 9295 | time 960[s] | loss 2.05
| epoch 2 |  iter 4081 / 9295 | time 961[s] | loss 2.11
| epoch 2 |  iter 4101 / 9295 | time 962[s] | loss 2.13
| epoch 2 |  iter 4121 / 9295 | time 964[s] | loss 2.06
| epoch 2 |  iter 4141 / 9295 | time 965[s] | loss 2.09
| epoch 2 |  iter 4161 / 9295 | time 966[s] | loss 2.13
| epoch 2 |  iter 4181 / 9295 | time 968[s] | loss 2.12
| epoch 2 |  iter 4201 / 9295 | time 969[s] | loss 2.08
| epoch 2 |  iter 4221 / 9295 | time 970[s] | loss 2.12
| epoch 2 |  iter 4241 / 9295 | time 972[s] | loss 2.10
| epoch 2 |  iter 4261 / 9295 | time 973[s] | loss 2.09
| epoch 2 |  iter 4281 / 9295 | time 975[s] | loss 2.13
| epoch 2 |  iter 4301 / 9295 | time 976[s] | loss 2.04
| epoch 2 |  iter 4321 / 9295 | time 977[s] | loss 2.10
| epoch 2 |  iter 4341 / 9295 | time 979[s] | loss 2.09
| epoch 2 |  iter 4361 / 9295 | time 980[s] | loss 2.08
| epoch 2 |  iter 4381 / 9295 | time 982[s] | loss 2.10
| epoch 2 |  iter 4401 / 9295 | time 984[s] | loss 2.13
| epoch 2 |  iter 4421 / 9295 | time 985[s] | loss 2.07
| epoch 2 |  iter 4441 / 9295 | time 986[s] | loss 2.07
| epoch 2 |  iter 4461 / 9295 | time 988[s] | loss 2.08
| epoch 2 |  iter 4481 / 9295 | time 989[s] | loss 2.07
| epoch 2 |  iter 4501 / 9295 | time 990[s] | loss 2.08
| epoch 2 |  iter 4521 / 9295 | time 992[s] | loss 2.11
| epoch 2 |  iter 4541 / 9295 | time 993[s] | loss 2.08
| epoch 2 |  iter 4561 / 9295 | time 995[s] | loss 2.08
| epoch 2 |  iter 4581 / 9295 | time 996[s] | loss 2.08
| epoch 2 |  iter 4601 / 9295 | time 998[s] | loss 2.08
| epoch 2 |  iter 4621 / 9295 | time 999[s] | loss 2.10
| epoch 2 |  iter 4641 / 9295 | time 1001[s] | loss 2.06
| epoch 2 |  iter 4661 / 9295 | time 1002[s] | loss 2.11
| epoch 2 |  iter 4681 / 9295 | time 1004[s] | loss 2.09
| epoch 2 |  iter 4701 / 9295 | time 1005[s] | loss 2.10
| epoch 2 |  iter 4721 / 9295 | time 1006[s] | loss 2.11
| epoch 2 |  iter 4741 / 9295 | time 1008[s] | loss 2.08
| epoch 2 |  iter 4761 / 9295 | time 1009[s] | loss 2.09
| epoch 2 |  iter 4781 / 9295 | time 1011[s] | loss 2.10
| epoch 2 |  iter 4801 / 9295 | time 1012[s] | loss 2.09
| epoch 2 |  iter 4821 / 9295 | time 1014[s] | loss 2.12
| epoch 2 |  iter 4841 / 9295 | time 1015[s] | loss 2.10
| epoch 2 |  iter 4861 / 9295 | time 1016[s] | loss 2.12
| epoch 2 |  iter 4881 / 9295 | time 1018[s] | loss 2.10
| epoch 2 |  iter 4901 / 9295 | time 1019[s] | loss 2.08
| epoch 2 |  iter 4921 / 9295 | time 1021[s] | loss 2.13
| epoch 2 |  iter 4941 / 9295 | time 1022[s] | loss 2.09
| epoch 2 |  iter 4961 / 9295 | time 1024[s] | loss 2.06
| epoch 2 |  iter 4981 / 9295 | time 1025[s] | loss 2.07
| epoch 2 |  iter 5001 / 9295 | time 1026[s] | loss 2.11
| epoch 2 |  iter 5021 / 9295 | time 1028[s] | loss 2.11
| epoch 2 |  iter 5041 / 9295 | time 1029[s] | loss 2.10
| epoch 2 |  iter 5061 / 9295 | time 1030[s] | loss 2.10
| epoch 2 |  iter 5081 / 9295 | time 1031[s] | loss 2.06
| epoch 2 |  iter 5101 / 9295 | time 1033[s] | loss 2.09
| epoch 2 |  iter 5121 / 9295 | time 1034[s] | loss 2.10
| epoch 2 |  iter 5141 / 9295 | time 1035[s] | loss 2.08
| epoch 2 |  iter 5161 / 9295 | time 1037[s] | loss 2.09
| epoch 2 |  iter 5181 / 9295 | time 1038[s] | loss 2.08
| epoch 2 |  iter 5201 / 9295 | time 1040[s] | loss 2.04
| epoch 2 |  iter 5221 / 9295 | time 1041[s] | loss 2.09
| epoch 2 |  iter 5241 / 9295 | time 1042[s] | loss 2.01
| epoch 2 |  iter 5261 / 9295 | time 1044[s] | loss 2.03
| epoch 2 |  iter 5281 / 9295 | time 1045[s] | loss 2.08
| epoch 2 |  iter 5301 / 9295 | time 1046[s] | loss 2.06
| epoch 2 |  iter 5321 / 9295 | time 1048[s] | loss 2.10
| epoch 2 |  iter 5341 / 9295 | time 1049[s] | loss 2.08
| epoch 2 |  iter 5361 / 9295 | time 1050[s] | loss 2.04
| epoch 2 |  iter 5381 / 9295 | time 1052[s] | loss 2.07
| epoch 2 |  iter 5401 / 9295 | time 1053[s] | loss 2.07
| epoch 2 |  iter 5421 / 9295 | time 1055[s] | loss 2.08
| epoch 2 |  iter 5441 / 9295 | time 1056[s] | loss 2.06
| epoch 2 |  iter 5461 / 9295 | time 1057[s] | loss 2.02
| epoch 2 |  iter 5481 / 9295 | time 1059[s] | loss 2.10
| epoch 2 |  iter 5501 / 9295 | time 1060[s] | loss 2.07
| epoch 2 |  iter 5521 / 9295 | time 1062[s] | loss 2.06
| epoch 2 |  iter 5541 / 9295 | time 1063[s] | loss 2.07
| epoch 2 |  iter 5561 / 9295 | time 1064[s] | loss 2.06
| epoch 2 |  iter 5581 / 9295 | time 1066[s] | loss 2.12
| epoch 2 |  iter 5601 / 9295 | time 1067[s] | loss 2.09
| epoch 2 |  iter 5621 / 9295 | time 1068[s] | loss 2.09
| epoch 2 |  iter 5641 / 9295 | time 1070[s] | loss 2.05
| epoch 2 |  iter 5661 / 9295 | time 1071[s] | loss 2.08
| epoch 2 |  iter 5681 / 9295 | time 1072[s] | loss 2.05
| epoch 2 |  iter 5701 / 9295 | time 1074[s] | loss 2.10
| epoch 2 |  iter 5721 / 9295 | time 1075[s] | loss 2.05
| epoch 2 |  iter 5741 / 9295 | time 1077[s] | loss 2.06
| epoch 2 |  iter 5761 / 9295 | time 1078[s] | loss 2.09
| epoch 2 |  iter 5781 / 9295 | time 1080[s] | loss 2.06
| epoch 2 |  iter 5801 / 9295 | time 1081[s] | loss 2.06
| epoch 2 |  iter 5821 / 9295 | time 1082[s] | loss 2.05
| epoch 2 |  iter 5841 / 9295 | time 1084[s] | loss 2.06
| epoch 2 |  iter 5861 / 9295 | time 1085[s] | loss 2.05
| epoch 2 |  iter 5881 / 9295 | time 1086[s] | loss 2.10
| epoch 2 |  iter 5901 / 9295 | time 1088[s] | loss 2.06
| epoch 2 |  iter 5921 / 9295 | time 1089[s] | loss 2.05
| epoch 2 |  iter 5941 / 9295 | time 1090[s] | loss 2.12
| epoch 2 |  iter 5961 / 9295 | time 1092[s] | loss 2.05
| epoch 2 |  iter 5981 / 9295 | time 1093[s] | loss 2.08
| epoch 2 |  iter 6001 / 9295 | time 1094[s] | loss 2.06
| epoch 2 |  iter 6021 / 9295 | time 1096[s] | loss 2.05
| epoch 2 |  iter 6041 / 9295 | time 1097[s] | loss 2.02
| epoch 2 |  iter 6061 / 9295 | time 1098[s] | loss 2.07
| epoch 2 |  iter 6081 / 9295 | time 1100[s] | loss 2.08
| epoch 2 |  iter 6101 / 9295 | time 1101[s] | loss 2.06
| epoch 2 |  iter 6121 / 9295 | time 1103[s] | loss 2.08
| epoch 2 |  iter 6141 / 9295 | time 1104[s] | loss 2.09
| epoch 2 |  iter 6161 / 9295 | time 1105[s] | loss 2.03
| epoch 2 |  iter 6181 / 9295 | time 1106[s] | loss 2.04
| epoch 2 |  iter 6201 / 9295 | time 1108[s] | loss 2.10
| epoch 2 |  iter 6221 / 9295 | time 1109[s] | loss 2.11
| epoch 2 |  iter 6241 / 9295 | time 1110[s] | loss 2.09
| epoch 2 |  iter 6261 / 9295 | time 1112[s] | loss 2.06
| epoch 2 |  iter 6281 / 9295 | time 1113[s] | loss 2.09
| epoch 2 |  iter 6301 / 9295 | time 1115[s] | loss 2.07
| epoch 2 |  iter 6321 / 9295 | time 1116[s] | loss 2.04
| epoch 2 |  iter 6341 / 9295 | time 1117[s] | loss 2.05
| epoch 2 |  iter 6361 / 9295 | time 1119[s] | loss 2.07
| epoch 2 |  iter 6381 / 9295 | time 1120[s] | loss 2.06
| epoch 2 |  iter 6401 / 9295 | time 1121[s] | loss 2.06
| epoch 2 |  iter 6421 / 9295 | time 1123[s] | loss 2.04
| epoch 2 |  iter 6441 / 9295 | time 1125[s] | loss 2.05
| epoch 2 |  iter 6461 / 9295 | time 1126[s] | loss 2.07
| epoch 2 |  iter 6481 / 9295 | time 1127[s] | loss 2.07
| epoch 2 |  iter 6501 / 9295 | time 1129[s] | loss 2.05
| epoch 2 |  iter 6521 / 9295 | time 1130[s] | loss 2.05
| epoch 2 |  iter 6541 / 9295 | time 1132[s] | loss 2.03
| epoch 2 |  iter 6561 / 9295 | time 1133[s] | loss 2.03
| epoch 2 |  iter 6581 / 9295 | time 1134[s] | loss 2.10
| epoch 2 |  iter 6601 / 9295 | time 1136[s] | loss 2.04
| epoch 2 |  iter 6621 / 9295 | time 1137[s] | loss 2.05
| epoch 2 |  iter 6641 / 9295 | time 1139[s] | loss 2.06
| epoch 2 |  iter 6661 / 9295 | time 1140[s] | loss 2.03
| epoch 2 |  iter 6681 / 9295 | time 1142[s] | loss 2.04
| epoch 2 |  iter 6701 / 9295 | time 1143[s] | loss 2.03
| epoch 2 |  iter 6721 / 9295 | time 1145[s] | loss 2.06
| epoch 2 |  iter 6741 / 9295 | time 1146[s] | loss 2.04
| epoch 2 |  iter 6761 / 9295 | time 1148[s] | loss 2.03
| epoch 2 |  iter 6781 / 9295 | time 1149[s] | loss 2.06
| epoch 2 |  iter 6801 / 9295 | time 1151[s] | loss 2.12
| epoch 2 |  iter 6821 / 9295 | time 1152[s] | loss 2.07
| epoch 2 |  iter 6841 / 9295 | time 1154[s] | loss 2.05
| epoch 2 |  iter 6861 / 9295 | time 1155[s] | loss 2.04
| epoch 2 |  iter 6881 / 9295 | time 1156[s] | loss 2.05
| epoch 2 |  iter 6901 / 9295 | time 1158[s] | loss 2.01
| epoch 2 |  iter 6921 / 9295 | time 1159[s] | loss 2.04
| epoch 2 |  iter 6941 / 9295 | time 1161[s] | loss 2.05
| epoch 2 |  iter 6961 / 9295 | time 1162[s] | loss 2.05
| epoch 2 |  iter 6981 / 9295 | time 1164[s] | loss 2.06
| epoch 2 |  iter 7001 / 9295 | time 1165[s] | loss 2.01
| epoch 2 |  iter 7021 / 9295 | time 1166[s] | loss 2.08
| epoch 2 |  iter 7041 / 9295 | time 1168[s] | loss 2.05
| epoch 2 |  iter 7061 / 9295 | time 1169[s] | loss 2.04
| epoch 2 |  iter 7081 / 9295 | time 1171[s] | loss 2.05
| epoch 2 |  iter 7101 / 9295 | time 1172[s] | loss 2.04
| epoch 2 |  iter 7121 / 9295 | time 1174[s] | loss 2.05
| epoch 2 |  iter 7141 / 9295 | time 1175[s] | loss 2.09
| epoch 2 |  iter 7161 / 9295 | time 1176[s] | loss 2.08
| epoch 2 |  iter 7181 / 9295 | time 1178[s] | loss 2.04
| epoch 2 |  iter 7201 / 9295 | time 1179[s] | loss 2.07
| epoch 2 |  iter 7221 / 9295 | time 1181[s] | loss 2.08
| epoch 2 |  iter 7241 / 9295 | time 1182[s] | loss 2.04
| epoch 2 |  iter 7261 / 9295 | time 1184[s] | loss 2.06
| epoch 2 |  iter 7281 / 9295 | time 1185[s] | loss 2.04
| epoch 2 |  iter 7301 / 9295 | time 1187[s] | loss 2.01
| epoch 2 |  iter 7321 / 9295 | time 1188[s] | loss 2.07
| epoch 2 |  iter 7341 / 9295 | time 1189[s] | loss 2.06
| epoch 2 |  iter 7361 / 9295 | time 1191[s] | loss 2.05
| epoch 2 |  iter 7381 / 9295 | time 1192[s] | loss 2.08
| epoch 2 |  iter 7401 / 9295 | time 1194[s] | loss 2.01
| epoch 2 |  iter 7421 / 9295 | time 1195[s] | loss 2.03
| epoch 2 |  iter 7441 / 9295 | time 1197[s] | loss 2.03
| epoch 2 |  iter 7461 / 9295 | time 1198[s] | loss 2.03
| epoch 2 |  iter 7481 / 9295 | time 1200[s] | loss 2.06
| epoch 2 |  iter 7501 / 9295 | time 1201[s] | loss 2.08
| epoch 2 |  iter 7521 / 9295 | time 1202[s] | loss 2.07
| epoch 2 |  iter 7541 / 9295 | time 1204[s] | loss 1.99
| epoch 2 |  iter 7561 / 9295 | time 1205[s] | loss 2.02
| epoch 2 |  iter 7581 / 9295 | time 1207[s] | loss 2.03
| epoch 2 |  iter 7601 / 9295 | time 1208[s] | loss 2.05
| epoch 2 |  iter 7621 / 9295 | time 1210[s] | loss 2.05
| epoch 2 |  iter 7641 / 9295 | time 1211[s] | loss 2.04
| epoch 2 |  iter 7661 / 9295 | time 1213[s] | loss 2.01
| epoch 2 |  iter 7681 / 9295 | time 1214[s] | loss 2.02
| epoch 2 |  iter 7701 / 9295 | time 1216[s] | loss 2.07
| epoch 2 |  iter 7721 / 9295 | time 1217[s] | loss 2.05
| epoch 2 |  iter 7741 / 9295 | time 1219[s] | loss 2.03
| epoch 2 |  iter 7761 / 9295 | time 1220[s] | loss 2.03
| epoch 2 |  iter 7781 / 9295 | time 1222[s] | loss 2.01
| epoch 2 |  iter 7801 / 9295 | time 1223[s] | loss 2.03
| epoch 2 |  iter 7821 / 9295 | time 1225[s] | loss 2.03
| epoch 2 |  iter 7841 / 9295 | time 1226[s] | loss 2.01
| epoch 2 |  iter 7861 / 9295 | time 1227[s] | loss 2.05
| epoch 2 |  iter 7881 / 9295 | time 1229[s] | loss 2.01
| epoch 2 |  iter 7901 / 9295 | time 1230[s] | loss 2.04
| epoch 2 |  iter 7921 / 9295 | time 1232[s] | loss 2.04
| epoch 2 |  iter 7941 / 9295 | time 1233[s] | loss 2.05
| epoch 2 |  iter 7961 / 9295 | time 1235[s] | loss 2.04
| epoch 2 |  iter 7981 / 9295 | time 1236[s] | loss 2.01
| epoch 2 |  iter 8001 / 9295 | time 1238[s] | loss 2.03
| epoch 2 |  iter 8021 / 9295 | time 1239[s] | loss 2.06
| epoch 2 |  iter 8041 / 9295 | time 1241[s] | loss 2.05
| epoch 2 |  iter 8061 / 9295 | time 1242[s] | loss 1.99
| epoch 2 |  iter 8081 / 9295 | time 1244[s] | loss 2.03
| epoch 2 |  iter 8101 / 9295 | time 1245[s] | loss 2.02
| epoch 2 |  iter 8121 / 9295 | time 1247[s] | loss 2.01
| epoch 2 |  iter 8141 / 9295 | time 1248[s] | loss 2.02
| epoch 2 |  iter 8161 / 9295 | time 1250[s] | loss 2.06
| epoch 2 |  iter 8181 / 9295 | time 1251[s] | loss 2.03
| epoch 2 |  iter 8201 / 9295 | time 1252[s] | loss 2.03
| epoch 2 |  iter 8221 / 9295 | time 1254[s] | loss 2.05
| epoch 2 |  iter 8241 / 9295 | time 1255[s] | loss 2.06
| epoch 2 |  iter 8261 / 9295 | time 1257[s] | loss 2.04
| epoch 2 |  iter 8281 / 9295 | time 1258[s] | loss 2.04
| epoch 2 |  iter 8301 / 9295 | time 1260[s] | loss 2.09
| epoch 2 |  iter 8321 / 9295 | time 1261[s] | loss 2.03
| epoch 2 |  iter 8341 / 9295 | time 1263[s] | loss 2.02
| epoch 2 |  iter 8361 / 9295 | time 1264[s] | loss 2.03
| epoch 2 |  iter 8381 / 9295 | time 1265[s] | loss 2.02
| epoch 2 |  iter 8401 / 9295 | time 1267[s] | loss 2.05
| epoch 2 |  iter 8421 / 9295 | time 1268[s] | loss 2.05
| epoch 2 |  iter 8441 / 9295 | time 1270[s] | loss 2.03
| epoch 2 |  iter 8461 / 9295 | time 1271[s] | loss 2.03
| epoch 2 |  iter 8481 / 9295 | time 1273[s] | loss 2.06
| epoch 2 |  iter 8501 / 9295 | time 1274[s] | loss 2.00
| epoch 2 |  iter 8521 / 9295 | time 1275[s] | loss 2.03
| epoch 2 |  iter 8541 / 9295 | time 1277[s] | loss 2.00
| epoch 2 |  iter 8561 / 9295 | time 1278[s] | loss 2.03
| epoch 2 |  iter 8581 / 9295 | time 1280[s] | loss 2.02
| epoch 2 |  iter 8601 / 9295 | time 1281[s] | loss 2.01
| epoch 2 |  iter 8621 / 9295 | time 1283[s] | loss 2.05
| epoch 2 |  iter 8641 / 9295 | time 1284[s] | loss 2.04
| epoch 2 |  iter 8661 / 9295 | time 1285[s] | loss 2.06
| epoch 2 |  iter 8681 / 9295 | time 1287[s] | loss 2.04
| epoch 2 |  iter 8701 / 9295 | time 1288[s] | loss 2.02
| epoch 2 |  iter 8721 / 9295 | time 1290[s] | loss 2.02
| epoch 2 |  iter 8741 / 9295 | time 1291[s] | loss 2.03
| epoch 2 |  iter 8761 / 9295 | time 1293[s] | loss 2.04
| epoch 2 |  iter 8781 / 9295 | time 1294[s] | loss 2.02
| epoch 2 |  iter 8801 / 9295 | time 1296[s] | loss 1.98
| epoch 2 |  iter 8821 / 9295 | time 1297[s] | loss 2.00
| epoch 2 |  iter 8841 / 9295 | time 1299[s] | loss 2.06
| epoch 2 |  iter 8861 / 9295 | time 1300[s] | loss 2.06
| epoch 2 |  iter 8881 / 9295 | time 1302[s] | loss 2.06
| epoch 2 |  iter 8901 / 9295 | time 1303[s] | loss 2.05
| epoch 2 |  iter 8921 / 9295 | time 1305[s] | loss 2.02
| epoch 2 |  iter 8941 / 9295 | time 1306[s] | loss 2.01
| epoch 2 |  iter 8961 / 9295 | time 1308[s] | loss 2.02
| epoch 2 |  iter 8981 / 9295 | time 1309[s] | loss 2.05
| epoch 2 |  iter 9001 / 9295 | time 1311[s] | loss 2.00
| epoch 2 |  iter 9021 / 9295 | time 1312[s] | loss 2.00
| epoch 2 |  iter 9041 / 9295 | time 1314[s] | loss 2.01
| epoch 2 |  iter 9061 / 9295 | time 1315[s] | loss 2.03
| epoch 2 |  iter 9081 / 9295 | time 1317[s] | loss 2.00
| epoch 2 |  iter 9101 / 9295 | time 1318[s] | loss 2.02
| epoch 2 |  iter 9121 / 9295 | time 1320[s] | loss 2.01
| epoch 2 |  iter 9141 / 9295 | time 1321[s] | loss 2.02
| epoch 2 |  iter 9161 / 9295 | time 1323[s] | loss 2.04
| epoch 2 |  iter 9181 / 9295 | time 1324[s] | loss 2.07
| epoch 2 |  iter 9201 / 9295 | time 1326[s] | loss 1.99
| epoch 2 |  iter 9221 / 9295 | time 1327[s] | loss 2.01
| epoch 2 |  iter 9241 / 9295 | time 1329[s] | loss 2.00
| epoch 2 |  iter 9261 / 9295 | time 1330[s] | loss 2.07
| epoch 2 |  iter 9281 / 9295 | time 1332[s] | loss 2.01
| epoch 3 |  iter 1 / 9295 | time 1333[s] | loss 2.02
| epoch 3 |  iter 21 / 9295 | time 1334[s] | loss 1.96
| epoch 3 |  iter 41 / 9295 | time 1336[s] | loss 1.97
| epoch 3 |  iter 61 / 9295 | time 1337[s] | loss 1.96
| epoch 3 |  iter 81 / 9295 | time 1339[s] | loss 1.99
| epoch 3 |  iter 101 / 9295 | time 1340[s] | loss 1.91
| epoch 3 |  iter 121 / 9295 | time 1342[s] | loss 1.93
| epoch 3 |  iter 141 / 9295 | time 1343[s] | loss 1.96
| epoch 3 |  iter 161 / 9295 | time 1345[s] | loss 1.97
| epoch 3 |  iter 181 / 9295 | time 1346[s] | loss 1.89
| epoch 3 |  iter 201 / 9295 | time 1348[s] | loss 1.96
| epoch 3 |  iter 221 / 9295 | time 1349[s] | loss 1.96
| epoch 3 |  iter 241 / 9295 | time 1351[s] | loss 1.97
| epoch 3 |  iter 261 / 9295 | time 1352[s] | loss 1.94
| epoch 3 |  iter 281 / 9295 | time 1354[s] | loss 1.94
| epoch 3 |  iter 301 / 9295 | time 1355[s] | loss 1.96
| epoch 3 |  iter 321 / 9295 | time 1357[s] | loss 1.98
| epoch 3 |  iter 341 / 9295 | time 1358[s] | loss 1.95
| epoch 3 |  iter 361 / 9295 | time 1360[s] | loss 1.95
| epoch 3 |  iter 381 / 9295 | time 1361[s] | loss 1.96
| epoch 3 |  iter 401 / 9295 | time 1363[s] | loss 1.95
| epoch 3 |  iter 421 / 9295 | time 1364[s] | loss 1.95
| epoch 3 |  iter 441 / 9295 | time 1366[s] | loss 1.95
| epoch 3 |  iter 461 / 9295 | time 1367[s] | loss 1.97
| epoch 3 |  iter 481 / 9295 | time 1369[s] | loss 1.96
| epoch 3 |  iter 501 / 9295 | time 1370[s] | loss 1.94
| epoch 3 |  iter 521 / 9295 | time 1372[s] | loss 1.94
| epoch 3 |  iter 541 / 9295 | time 1373[s] | loss 1.97
| epoch 3 |  iter 561 / 9295 | time 1375[s] | loss 1.93
| epoch 3 |  iter 581 / 9295 | time 1376[s] | loss 1.96
| epoch 3 |  iter 601 / 9295 | time 1377[s] | loss 1.93
| epoch 3 |  iter 621 / 9295 | time 1379[s] | loss 1.94
| epoch 3 |  iter 641 / 9295 | time 1380[s] | loss 1.94
| epoch 3 |  iter 661 / 9295 | time 1382[s] | loss 1.98
| epoch 3 |  iter 681 / 9295 | time 1383[s] | loss 1.97
| epoch 3 |  iter 701 / 9295 | time 1385[s] | loss 1.96
| epoch 3 |  iter 721 / 9295 | time 1386[s] | loss 1.95
| epoch 3 |  iter 741 / 9295 | time 1388[s] | loss 1.97
| epoch 3 |  iter 761 / 9295 | time 1389[s] | loss 1.93
| epoch 3 |  iter 781 / 9295 | time 1391[s] | loss 1.94
| epoch 3 |  iter 801 / 9295 | time 1392[s] | loss 1.94
| epoch 3 |  iter 821 / 9295 | time 1394[s] | loss 1.99
| epoch 3 |  iter 841 / 9295 | time 1395[s] | loss 1.96
| epoch 3 |  iter 861 / 9295 | time 1396[s] | loss 1.97
| epoch 3 |  iter 881 / 9295 | time 1398[s] | loss 1.95
| epoch 3 |  iter 901 / 9295 | time 1399[s] | loss 1.96
| epoch 3 |  iter 921 / 9295 | time 1401[s] | loss 1.92
| epoch 3 |  iter 941 / 9295 | time 1402[s] | loss 1.95
| epoch 3 |  iter 961 / 9295 | time 1404[s] | loss 1.94
| epoch 3 |  iter 981 / 9295 | time 1405[s] | loss 1.96
| epoch 3 |  iter 1001 / 9295 | time 1407[s] | loss 1.96
| epoch 3 |  iter 1021 / 9295 | time 1408[s] | loss 1.97
| epoch 3 |  iter 1041 / 9295 | time 1410[s] | loss 1.97
| epoch 3 |  iter 1061 / 9295 | time 1411[s] | loss 1.91
| epoch 3 |  iter 1081 / 9295 | time 1413[s] | loss 1.96
| epoch 3 |  iter 1101 / 9295 | time 1414[s] | loss 1.92
| epoch 3 |  iter 1121 / 9295 | time 1416[s] | loss 1.93
| epoch 3 |  iter 1141 / 9295 | time 1417[s] | loss 1.96
| epoch 3 |  iter 1161 / 9295 | time 1418[s] | loss 1.95
| epoch 3 |  iter 1181 / 9295 | time 1420[s] | loss 1.90
| epoch 3 |  iter 1201 / 9295 | time 1421[s] | loss 1.96
| epoch 3 |  iter 1221 / 9295 | time 1423[s] | loss 1.97
| epoch 3 |  iter 1241 / 9295 | time 1425[s] | loss 1.96
| epoch 3 |  iter 1261 / 9295 | time 1426[s] | loss 1.94
| epoch 3 |  iter 1281 / 9295 | time 1427[s] | loss 1.95
| epoch 3 |  iter 1301 / 9295 | time 1429[s] | loss 1.92
| epoch 3 |  iter 1321 / 9295 | time 1430[s] | loss 1.92
| epoch 3 |  iter 1341 / 9295 | time 1432[s] | loss 1.93
| epoch 3 |  iter 1361 / 9295 | time 1434[s] | loss 1.96
| epoch 3 |  iter 1381 / 9295 | time 1435[s] | loss 1.99
| epoch 3 |  iter 1401 / 9295 | time 1437[s] | loss 1.93
| epoch 3 |  iter 1421 / 9295 | time 1438[s] | loss 1.94
| epoch 3 |  iter 1441 / 9295 | time 1440[s] | loss 1.92
| epoch 3 |  iter 1461 / 9295 | time 1441[s] | loss 1.97
| epoch 3 |  iter 1481 / 9295 | time 1442[s] | loss 1.98
| epoch 3 |  iter 1501 / 9295 | time 1444[s] | loss 1.95
| epoch 3 |  iter 1521 / 9295 | time 1445[s] | loss 1.94
| epoch 3 |  iter 1541 / 9295 | time 1446[s] | loss 1.96
| epoch 3 |  iter 1561 / 9295 | time 1448[s] | loss 1.95
| epoch 3 |  iter 1581 / 9295 | time 1449[s] | loss 1.95
| epoch 3 |  iter 1601 / 9295 | time 1450[s] | loss 1.95
| epoch 3 |  iter 1621 / 9295 | time 1452[s] | loss 1.97
| epoch 3 |  iter 1641 / 9295 | time 1453[s] | loss 1.97
| epoch 3 |  iter 1661 / 9295 | time 1454[s] | loss 1.95
| epoch 3 |  iter 1681 / 9295 | time 1456[s] | loss 1.97
| epoch 3 |  iter 1701 / 9295 | time 1457[s] | loss 1.94
| epoch 3 |  iter 1721 / 9295 | time 1458[s] | loss 1.94
| epoch 3 |  iter 1741 / 9295 | time 1460[s] | loss 1.93
| epoch 3 |  iter 1761 / 9295 | time 1461[s] | loss 1.94
| epoch 3 |  iter 1781 / 9295 | time 1462[s] | loss 1.97
| epoch 3 |  iter 1801 / 9295 | time 1464[s] | loss 1.98
| epoch 3 |  iter 1821 / 9295 | time 1465[s] | loss 1.96
| epoch 3 |  iter 1841 / 9295 | time 1467[s] | loss 1.93
| epoch 3 |  iter 1861 / 9295 | time 1468[s] | loss 1.98
| epoch 3 |  iter 1881 / 9295 | time 1469[s] | loss 1.97
| epoch 3 |  iter 1901 / 9295 | time 1471[s] | loss 1.94
| epoch 3 |  iter 1921 / 9295 | time 1472[s] | loss 1.94
| epoch 3 |  iter 1941 / 9295 | time 1473[s] | loss 1.96
| epoch 3 |  iter 1961 / 9295 | time 1475[s] | loss 1.93
| epoch 3 |  iter 1981 / 9295 | time 1476[s] | loss 1.93
| epoch 3 |  iter 2001 / 9295 | time 1477[s] | loss 1.97
| epoch 3 |  iter 2021 / 9295 | time 1479[s] | loss 1.94
| epoch 3 |  iter 2041 / 9295 | time 1480[s] | loss 1.95
| epoch 3 |  iter 2061 / 9295 | time 1481[s] | loss 1.98
| epoch 3 |  iter 2081 / 9295 | time 1483[s] | loss 1.97
| epoch 3 |  iter 2101 / 9295 | time 1484[s] | loss 1.95
| epoch 3 |  iter 2121 / 9295 | time 1485[s] | loss 1.96
| epoch 3 |  iter 2141 / 9295 | time 1486[s] | loss 1.94
| epoch 3 |  iter 2161 / 9295 | time 1488[s] | loss 1.94
| epoch 3 |  iter 2181 / 9295 | time 1489[s] | loss 1.97
| epoch 3 |  iter 2201 / 9295 | time 1490[s] | loss 1.96
| epoch 3 |  iter 2221 / 9295 | time 1492[s] | loss 1.97
| epoch 3 |  iter 2241 / 9295 | time 1493[s] | loss 1.95
| epoch 3 |  iter 2261 / 9295 | time 1495[s] | loss 1.94
| epoch 3 |  iter 2281 / 9295 | time 1496[s] | loss 1.94
| epoch 3 |  iter 2301 / 9295 | time 1497[s] | loss 1.95
| epoch 3 |  iter 2321 / 9295 | time 1499[s] | loss 1.90
| epoch 3 |  iter 2341 / 9295 | time 1500[s] | loss 1.95
| epoch 3 |  iter 2361 / 9295 | time 1502[s] | loss 1.96
| epoch 3 |  iter 2381 / 9295 | time 1503[s] | loss 1.93
| epoch 3 |  iter 2401 / 9295 | time 1505[s] | loss 1.89
| epoch 3 |  iter 2421 / 9295 | time 1506[s] | loss 1.93
| epoch 3 |  iter 2441 / 9295 | time 1508[s] | loss 1.93
| epoch 3 |  iter 2461 / 9295 | time 1509[s] | loss 1.90
| epoch 3 |  iter 2481 / 9295 | time 1510[s] | loss 1.91
| epoch 3 |  iter 2501 / 9295 | time 1512[s] | loss 1.98
| epoch 3 |  iter 2521 / 9295 | time 1513[s] | loss 1.94
| epoch 3 |  iter 2541 / 9295 | time 1514[s] | loss 1.96
| epoch 3 |  iter 2561 / 9295 | time 1516[s] | loss 1.90
| epoch 3 |  iter 2581 / 9295 | time 1517[s] | loss 1.93
| epoch 3 |  iter 2601 / 9295 | time 1519[s] | loss 1.94
| epoch 3 |  iter 2621 / 9295 | time 1520[s] | loss 1.93
| epoch 3 |  iter 2641 / 9295 | time 1522[s] | loss 1.91
| epoch 3 |  iter 2661 / 9295 | time 1523[s] | loss 1.96
| epoch 3 |  iter 2681 / 9295 | time 1525[s] | loss 1.94
| epoch 3 |  iter 2701 / 9295 | time 1526[s] | loss 1.99
| epoch 3 |  iter 2721 / 9295 | time 1527[s] | loss 1.95
| epoch 3 |  iter 2741 / 9295 | time 1529[s] | loss 1.95
| epoch 3 |  iter 2761 / 9295 | time 1530[s] | loss 1.95
| epoch 3 |  iter 2781 / 9295 | time 1532[s] | loss 1.94
| epoch 3 |  iter 2801 / 9295 | time 1533[s] | loss 1.98
| epoch 3 |  iter 2821 / 9295 | time 1534[s] | loss 1.96
| epoch 3 |  iter 2841 / 9295 | time 1536[s] | loss 1.93
| epoch 3 |  iter 2861 / 9295 | time 1537[s] | loss 1.97
| epoch 3 |  iter 2881 / 9295 | time 1539[s] | loss 1.94
| epoch 3 |  iter 2901 / 9295 | time 1541[s] | loss 1.96
| epoch 3 |  iter 2921 / 9295 | time 1542[s] | loss 1.95
| epoch 3 |  iter 2941 / 9295 | time 1544[s] | loss 1.92
| epoch 3 |  iter 2961 / 9295 | time 1545[s] | loss 1.90
| epoch 3 |  iter 2981 / 9295 | time 1547[s] | loss 1.90
| epoch 3 |  iter 3001 / 9295 | time 1548[s] | loss 1.94
| epoch 3 |  iter 3021 / 9295 | time 1550[s] | loss 1.92
| epoch 3 |  iter 3041 / 9295 | time 1551[s] | loss 1.98
| epoch 3 |  iter 3061 / 9295 | time 1553[s] | loss 1.90
| epoch 3 |  iter 3081 / 9295 | time 1554[s] | loss 1.94
| epoch 3 |  iter 3101 / 9295 | time 1556[s] | loss 1.96
| epoch 3 |  iter 3121 / 9295 | time 1557[s] | loss 1.96
| epoch 3 |  iter 3141 / 9295 | time 1558[s] | loss 1.96
| epoch 3 |  iter 3161 / 9295 | time 1560[s] | loss 1.95
| epoch 3 |  iter 3181 / 9295 | time 1561[s] | loss 1.92
| epoch 3 |  iter 3201 / 9295 | time 1563[s] | loss 1.94
| epoch 3 |  iter 3221 / 9295 | time 1564[s] | loss 1.92
| epoch 3 |  iter 3241 / 9295 | time 1566[s] | loss 1.94
| epoch 3 |  iter 3261 / 9295 | time 1567[s] | loss 1.94
| epoch 3 |  iter 3281 / 9295 | time 1569[s] | loss 1.91
| epoch 3 |  iter 3301 / 9295 | time 1570[s] | loss 1.92
| epoch 3 |  iter 3321 / 9295 | time 1572[s] | loss 1.93
| epoch 3 |  iter 3341 / 9295 | time 1573[s] | loss 1.90
| epoch 3 |  iter 3361 / 9295 | time 1575[s] | loss 1.93
| epoch 3 |  iter 3381 / 9295 | time 1576[s] | loss 1.95
| epoch 3 |  iter 3401 / 9295 | time 1578[s] | loss 1.95
| epoch 3 |  iter 3421 / 9295 | time 1579[s] | loss 1.93
| epoch 3 |  iter 3441 / 9295 | time 1580[s] | loss 1.95
| epoch 3 |  iter 3461 / 9295 | time 1582[s] | loss 1.97
| epoch 3 |  iter 3481 / 9295 | time 1583[s] | loss 1.96
| epoch 3 |  iter 3501 / 9295 | time 1585[s] | loss 1.94
| epoch 3 |  iter 3521 / 9295 | time 1586[s] | loss 2.00
| epoch 3 |  iter 3541 / 9295 | time 1588[s] | loss 1.93
| epoch 3 |  iter 3561 / 9295 | time 1589[s] | loss 1.93
| epoch 3 |  iter 3581 / 9295 | time 1590[s] | loss 1.98
| epoch 3 |  iter 3601 / 9295 | time 1592[s] | loss 1.92
| epoch 3 |  iter 3621 / 9295 | time 1593[s] | loss 1.95
| epoch 3 |  iter 3641 / 9295 | time 1594[s] | loss 1.92
| epoch 3 |  iter 3661 / 9295 | time 1596[s] | loss 1.91
| epoch 3 |  iter 3681 / 9295 | time 1597[s] | loss 1.92
| epoch 3 |  iter 3701 / 9295 | time 1599[s] | loss 1.92
| epoch 3 |  iter 3721 / 9295 | time 1600[s] | loss 1.96
| epoch 3 |  iter 3741 / 9295 | time 1602[s] | loss 1.92
| epoch 3 |  iter 3761 / 9295 | time 1603[s] | loss 1.94
| epoch 3 |  iter 3781 / 9295 | time 1604[s] | loss 1.96
| epoch 3 |  iter 3801 / 9295 | time 1606[s] | loss 1.94
| epoch 3 |  iter 3821 / 9295 | time 1607[s] | loss 1.92
| epoch 3 |  iter 3841 / 9295 | time 1608[s] | loss 1.93
| epoch 3 |  iter 3861 / 9295 | time 1610[s] | loss 1.90
| epoch 3 |  iter 3881 / 9295 | time 1611[s] | loss 1.92
| epoch 3 |  iter 3901 / 9295 | time 1612[s] | loss 1.96
| epoch 3 |  iter 3921 / 9295 | time 1614[s] | loss 1.95
| epoch 3 |  iter 3941 / 9295 | time 1615[s] | loss 1.96
| epoch 3 |  iter 3961 / 9295 | time 1617[s] | loss 1.91
| epoch 3 |  iter 3981 / 9295 | time 1618[s] | loss 1.93
| epoch 3 |  iter 4001 / 9295 | time 1619[s] | loss 1.92
| epoch 3 |  iter 4021 / 9295 | time 1621[s] | loss 1.95
| epoch 3 |  iter 4041 / 9295 | time 1622[s] | loss 1.94
| epoch 3 |  iter 4061 / 9295 | time 1623[s] | loss 1.91
| epoch 3 |  iter 4081 / 9295 | time 1625[s] | loss 1.93
| epoch 3 |  iter 4101 / 9295 | time 1626[s] | loss 1.93
| epoch 3 |  iter 4121 / 9295 | time 1628[s] | loss 1.91
| epoch 3 |  iter 4141 / 9295 | time 1629[s] | loss 1.92
| epoch 3 |  iter 4161 / 9295 | time 1631[s] | loss 1.91
| epoch 3 |  iter 4181 / 9295 | time 1632[s] | loss 1.92
| epoch 3 |  iter 4201 / 9295 | time 1633[s] | loss 1.89
| epoch 3 |  iter 4221 / 9295 | time 1635[s] | loss 1.92
| epoch 3 |  iter 4241 / 9295 | time 1636[s] | loss 1.93
| epoch 3 |  iter 4261 / 9295 | time 1638[s] | loss 1.95
| epoch 3 |  iter 4281 / 9295 | time 1639[s] | loss 1.92
| epoch 3 |  iter 4301 / 9295 | time 1641[s] | loss 1.92
| epoch 3 |  iter 4321 / 9295 | time 1642[s] | loss 1.96
| epoch 3 |  iter 4341 / 9295 | time 1644[s] | loss 1.93
| epoch 3 |  iter 4361 / 9295 | time 1645[s] | loss 1.94
| epoch 3 |  iter 4381 / 9295 | time 1647[s] | loss 1.95
| epoch 3 |  iter 4401 / 9295 | time 1648[s] | loss 1.92
| epoch 3 |  iter 4421 / 9295 | time 1649[s] | loss 1.90
| epoch 3 |  iter 4441 / 9295 | time 1651[s] | loss 1.89
| epoch 3 |  iter 4461 / 9295 | time 1652[s] | loss 1.96
| epoch 3 |  iter 4481 / 9295 | time 1654[s] | loss 1.93
| epoch 3 |  iter 4501 / 9295 | time 1655[s] | loss 1.92
| epoch 3 |  iter 4521 / 9295 | time 1657[s] | loss 1.94
| epoch 3 |  iter 4541 / 9295 | time 1658[s] | loss 1.93
| epoch 3 |  iter 4561 / 9295 | time 1659[s] | loss 1.93
| epoch 3 |  iter 4581 / 9295 | time 1661[s] | loss 1.95
| epoch 3 |  iter 4601 / 9295 | time 1662[s] | loss 1.92
| epoch 3 |  iter 4621 / 9295 | time 1664[s] | loss 1.91
| epoch 3 |  iter 4641 / 9295 | time 1665[s] | loss 1.91
| epoch 3 |  iter 4661 / 9295 | time 1667[s] | loss 1.95
| epoch 3 |  iter 4681 / 9295 | time 1668[s] | loss 1.89
| epoch 3 |  iter 4701 / 9295 | time 1669[s] | loss 1.96
| epoch 3 |  iter 4721 / 9295 | time 1671[s] | loss 1.91
| epoch 3 |  iter 4741 / 9295 | time 1672[s] | loss 1.93
| epoch 3 |  iter 4761 / 9295 | time 1674[s] | loss 1.94
| epoch 3 |  iter 4781 / 9295 | time 1675[s] | loss 1.98
| epoch 3 |  iter 4801 / 9295 | time 1677[s] | loss 1.93
| epoch 3 |  iter 4821 / 9295 | time 1678[s] | loss 1.93
| epoch 3 |  iter 4841 / 9295 | time 1680[s] | loss 1.92
| epoch 3 |  iter 4861 / 9295 | time 1681[s] | loss 1.94
| epoch 3 |  iter 4881 / 9295 | time 1683[s] | loss 1.92
| epoch 3 |  iter 4901 / 9295 | time 1684[s] | loss 1.94
| epoch 3 |  iter 4921 / 9295 | time 1686[s] | loss 1.93
| epoch 3 |  iter 4941 / 9295 | time 1687[s] | loss 1.93
| epoch 3 |  iter 4961 / 9295 | time 1689[s] | loss 1.91
| epoch 3 |  iter 4981 / 9295 | time 1690[s] | loss 1.91
| epoch 3 |  iter 5001 / 9295 | time 1692[s] | loss 1.92
| epoch 3 |  iter 5021 / 9295 | time 1693[s] | loss 1.93
| epoch 3 |  iter 5041 / 9295 | time 1695[s] | loss 1.87
| epoch 3 |  iter 5061 / 9295 | time 1696[s] | loss 1.91
| epoch 3 |  iter 5081 / 9295 | time 1698[s] | loss 1.92
| epoch 3 |  iter 5101 / 9295 | time 1699[s] | loss 1.94
| epoch 3 |  iter 5121 / 9295 | time 1701[s] | loss 1.95
| epoch 3 |  iter 5141 / 9295 | time 1702[s] | loss 1.92
| epoch 3 |  iter 5161 / 9295 | time 1704[s] | loss 1.90
| epoch 3 |  iter 5181 / 9295 | time 1705[s] | loss 1.92
| epoch 3 |  iter 5201 / 9295 | time 1707[s] | loss 1.93
| epoch 3 |  iter 5221 / 9295 | time 1708[s] | loss 1.87
| epoch 3 |  iter 5241 / 9295 | time 1710[s] | loss 1.89
| epoch 3 |  iter 5261 / 9295 | time 1711[s] | loss 1.89
| epoch 3 |  iter 5281 / 9295 | time 1713[s] | loss 1.95
| epoch 3 |  iter 5301 / 9295 | time 1714[s] | loss 1.95
| epoch 3 |  iter 5321 / 9295 | time 1716[s] | loss 1.89
| epoch 3 |  iter 5341 / 9295 | time 1717[s] | loss 1.92
| epoch 3 |  iter 5361 / 9295 | time 1719[s] | loss 1.90
| epoch 3 |  iter 5381 / 9295 | time 1720[s] | loss 1.92
| epoch 3 |  iter 5401 / 9295 | time 1722[s] | loss 1.95
| epoch 3 |  iter 5421 / 9295 | time 1723[s] | loss 1.93
| epoch 3 |  iter 5441 / 9295 | time 1725[s] | loss 1.89
| epoch 3 |  iter 5461 / 9295 | time 1726[s] | loss 1.87
| epoch 3 |  iter 5481 / 9295 | time 1728[s] | loss 1.95
| epoch 3 |  iter 5501 / 9295 | time 1729[s] | loss 1.95
| epoch 3 |  iter 5521 / 9295 | time 1731[s] | loss 1.94
| epoch 3 |  iter 5541 / 9295 | time 1732[s] | loss 1.90
| epoch 3 |  iter 5561 / 9295 | time 1733[s] | loss 1.93
| epoch 3 |  iter 5581 / 9295 | time 1735[s] | loss 1.93
| epoch 3 |  iter 5601 / 9295 | time 1736[s] | loss 1.93
| epoch 3 |  iter 5621 / 9295 | time 1738[s] | loss 1.92
| epoch 3 |  iter 5641 / 9295 | time 1739[s] | loss 1.92
| epoch 3 |  iter 5661 / 9295 | time 1741[s] | loss 1.88
| epoch 3 |  iter 5681 / 9295 | time 1742[s] | loss 1.91
| epoch 3 |  iter 5701 / 9295 | time 1743[s] | loss 1.95
| epoch 3 |  iter 5721 / 9295 | time 1745[s] | loss 1.93
| epoch 3 |  iter 5741 / 9295 | time 1746[s] | loss 1.90
| epoch 3 |  iter 5761 / 9295 | time 1748[s] | loss 1.93
| epoch 3 |  iter 5781 / 9295 | time 1749[s] | loss 1.92
| epoch 3 |  iter 5801 / 9295 | time 1750[s] | loss 1.92
| epoch 3 |  iter 5821 / 9295 | time 1752[s] | loss 1.88
| epoch 3 |  iter 5841 / 9295 | time 1753[s] | loss 1.91
| epoch 3 |  iter 5861 / 9295 | time 1755[s] | loss 1.95
| epoch 3 |  iter 5881 / 9295 | time 1756[s] | loss 1.92
| epoch 3 |  iter 5901 / 9295 | time 1758[s] | loss 1.95
| epoch 3 |  iter 5921 / 9295 | time 1759[s] | loss 1.93
| epoch 3 |  iter 5941 / 9295 | time 1760[s] | loss 1.97
| epoch 3 |  iter 5961 / 9295 | time 1762[s] | loss 1.96
| epoch 3 |  iter 5981 / 9295 | time 1763[s] | loss 1.93
| epoch 3 |  iter 6001 / 9295 | time 1765[s] | loss 1.89
| epoch 3 |  iter 6021 / 9295 | time 1766[s] | loss 1.89
| epoch 3 |  iter 6041 / 9295 | time 1768[s] | loss 1.94
| epoch 3 |  iter 6061 / 9295 | time 1769[s] | loss 1.91
| epoch 3 |  iter 6081 / 9295 | time 1771[s] | loss 1.92
| epoch 3 |  iter 6101 / 9295 | time 1772[s] | loss 1.90
| epoch 3 |  iter 6121 / 9295 | time 1774[s] | loss 1.92
| epoch 3 |  iter 6141 / 9295 | time 1775[s] | loss 1.92
| epoch 3 |  iter 6161 / 9295 | time 1777[s] | loss 1.93
| epoch 3 |  iter 6181 / 9295 | time 1779[s] | loss 1.92
| epoch 3 |  iter 6201 / 9295 | time 1780[s] | loss 1.95
| epoch 3 |  iter 6221 / 9295 | time 1782[s] | loss 1.90
| epoch 3 |  iter 6241 / 9295 | time 1784[s] | loss 1.95
| epoch 3 |  iter 6261 / 9295 | time 1785[s] | loss 1.94
| epoch 3 |  iter 6281 / 9295 | time 1787[s] | loss 1.92
| epoch 3 |  iter 6301 / 9295 | time 1788[s] | loss 1.93
| epoch 3 |  iter 6321 / 9295 | time 1790[s] | loss 1.92
| epoch 3 |  iter 6341 / 9295 | time 1791[s] | loss 1.91
| epoch 3 |  iter 6361 / 9295 | time 1793[s] | loss 1.95
| epoch 3 |  iter 6381 / 9295 | time 1794[s] | loss 1.91
| epoch 3 |  iter 6401 / 9295 | time 1796[s] | loss 1.91
| epoch 3 |  iter 6421 / 9295 | time 1797[s] | loss 1.95
| epoch 3 |  iter 6441 / 9295 | time 1799[s] | loss 1.96
| epoch 3 |  iter 6461 / 9295 | time 1800[s] | loss 1.93
| epoch 3 |  iter 6481 / 9295 | time 1802[s] | loss 1.93
| epoch 3 |  iter 6501 / 9295 | time 1803[s] | loss 1.89
| epoch 3 |  iter 6521 / 9295 | time 1804[s] | loss 1.89
| epoch 3 |  iter 6541 / 9295 | time 1806[s] | loss 1.93
| epoch 3 |  iter 6561 / 9295 | time 1807[s] | loss 1.91
| epoch 3 |  iter 6581 / 9295 | time 1809[s] | loss 1.88
| epoch 3 |  iter 6601 / 9295 | time 1810[s] | loss 1.93
| epoch 3 |  iter 6621 / 9295 | time 1812[s] | loss 1.89
| epoch 3 |  iter 6641 / 9295 | time 1814[s] | loss 1.90
| epoch 3 |  iter 6661 / 9295 | time 1815[s] | loss 1.90
| epoch 3 |  iter 6681 / 9295 | time 1817[s] | loss 1.86
| epoch 3 |  iter 6701 / 9295 | time 1819[s] | loss 1.89
| epoch 3 |  iter 6721 / 9295 | time 1820[s] | loss 1.88
| epoch 3 |  iter 6741 / 9295 | time 1822[s] | loss 1.93
| epoch 3 |  iter 6761 / 9295 | time 1823[s] | loss 1.89
| epoch 3 |  iter 6781 / 9295 | time 1825[s] | loss 1.89
| epoch 3 |  iter 6801 / 9295 | time 1827[s] | loss 1.87
| epoch 3 |  iter 6821 / 9295 | time 1828[s] | loss 1.89
| epoch 3 |  iter 6841 / 9295 | time 1830[s] | loss 1.88
| epoch 3 |  iter 6861 / 9295 | time 1832[s] | loss 1.88
| epoch 3 |  iter 6881 / 9295 | time 1833[s] | loss 1.91
| epoch 3 |  iter 6901 / 9295 | time 1835[s] | loss 1.93
| epoch 3 |  iter 6921 / 9295 | time 1836[s] | loss 1.90
| epoch 3 |  iter 6941 / 9295 | time 1838[s] | loss 1.93
| epoch 3 |  iter 6961 / 9295 | time 1839[s] | loss 1.93
| epoch 3 |  iter 6981 / 9295 | time 1841[s] | loss 1.94
| epoch 3 |  iter 7001 / 9295 | time 1843[s] | loss 1.92
| epoch 3 |  iter 7021 / 9295 | time 1845[s] | loss 1.88
| epoch 3 |  iter 7041 / 9295 | time 1847[s] | loss 1.92
| epoch 3 |  iter 7061 / 9295 | time 1849[s] | loss 1.90
| epoch 3 |  iter 7081 / 9295 | time 1851[s] | loss 1.90
| epoch 3 |  iter 7101 / 9295 | time 1853[s] | loss 1.92
| epoch 3 |  iter 7121 / 9295 | time 1855[s] | loss 1.90
| epoch 3 |  iter 7141 / 9295 | time 1858[s] | loss 1.93
| epoch 3 |  iter 7161 / 9295 | time 1860[s] | loss 1.91
| epoch 3 |  iter 7181 / 9295 | time 1862[s] | loss 1.90
| epoch 3 |  iter 7201 / 9295 | time 1863[s] | loss 1.92
| epoch 3 |  iter 7221 / 9295 | time 1865[s] | loss 1.91
| epoch 3 |  iter 7241 / 9295 | time 1866[s] | loss 1.94
| epoch 3 |  iter 7261 / 9295 | time 1868[s] | loss 1.86
| epoch 3 |  iter 7281 / 9295 | time 1869[s] | loss 1.92
| epoch 3 |  iter 7301 / 9295 | time 1870[s] | loss 1.90
| epoch 3 |  iter 7321 / 9295 | time 1872[s] | loss 1.94
| epoch 3 |  iter 7341 / 9295 | time 1873[s] | loss 1.93
| epoch 3 |  iter 7361 / 9295 | time 1874[s] | loss 1.93
| epoch 3 |  iter 7381 / 9295 | time 1876[s] | loss 1.89
| epoch 3 |  iter 7401 / 9295 | time 1877[s] | loss 1.89
| epoch 3 |  iter 7421 / 9295 | time 1878[s] | loss 1.88
| epoch 3 |  iter 7441 / 9295 | time 1880[s] | loss 1.90
| epoch 3 |  iter 7461 / 9295 | time 1881[s] | loss 1.90
| epoch 3 |  iter 7481 / 9295 | time 1882[s] | loss 1.92
| epoch 3 |  iter 7501 / 9295 | time 1884[s] | loss 1.88
| epoch 3 |  iter 7521 / 9295 | time 1885[s] | loss 1.91
| epoch 3 |  iter 7541 / 9295 | time 1886[s] | loss 1.89
| epoch 3 |  iter 7561 / 9295 | time 1888[s] | loss 1.89
| epoch 3 |  iter 7581 / 9295 | time 1889[s] | loss 1.93
| epoch 3 |  iter 7601 / 9295 | time 1891[s] | loss 1.89
| epoch 3 |  iter 7621 / 9295 | time 1892[s] | loss 1.87
| epoch 3 |  iter 7641 / 9295 | time 1894[s] | loss 1.88
| epoch 3 |  iter 7661 / 9295 | time 1895[s] | loss 1.87
| epoch 3 |  iter 7681 / 9295 | time 1897[s] | loss 1.91
| epoch 3 |  iter 7701 / 9295 | time 1899[s] | loss 1.93
| epoch 3 |  iter 7721 / 9295 | time 1900[s] | loss 1.90
| epoch 3 |  iter 7741 / 9295 | time 1902[s] | loss 1.88
| epoch 3 |  iter 7761 / 9295 | time 1903[s] | loss 1.90
| epoch 3 |  iter 7781 / 9295 | time 1905[s] | loss 1.91
| epoch 3 |  iter 7801 / 9295 | time 1906[s] | loss 1.91
| epoch 3 |  iter 7821 / 9295 | time 1908[s] | loss 1.91
| epoch 3 |  iter 7841 / 9295 | time 1909[s] | loss 1.87
| epoch 3 |  iter 7861 / 9295 | time 1911[s] | loss 1.89
| epoch 3 |  iter 7881 / 9295 | time 1912[s] | loss 1.86
| epoch 3 |  iter 7901 / 9295 | time 1914[s] | loss 1.89
| epoch 3 |  iter 7921 / 9295 | time 1915[s] | loss 1.91
| epoch 3 |  iter 7941 / 9295 | time 1917[s] | loss 1.88
| epoch 3 |  iter 7961 / 9295 | time 1919[s] | loss 1.89
| epoch 3 |  iter 7981 / 9295 | time 1920[s] | loss 1.87
| epoch 3 |  iter 8001 / 9295 | time 1922[s] | loss 1.91
| epoch 3 |  iter 8021 / 9295 | time 1923[s] | loss 1.91
| epoch 3 |  iter 8041 / 9295 | time 1925[s] | loss 1.89
| epoch 3 |  iter 8061 / 9295 | time 1927[s] | loss 1.88
| epoch 3 |  iter 8081 / 9295 | time 1929[s] | loss 1.92
| epoch 3 |  iter 8101 / 9295 | time 1930[s] | loss 1.87
| epoch 3 |  iter 8121 / 9295 | time 1932[s] | loss 1.92
| epoch 3 |  iter 8141 / 9295 | time 1934[s] | loss 1.90
| epoch 3 |  iter 8161 / 9295 | time 1936[s] | loss 1.89
| epoch 3 |  iter 8181 / 9295 | time 1938[s] | loss 1.89
| epoch 3 |  iter 8201 / 9295 | time 1939[s] | loss 1.93
| epoch 3 |  iter 8221 / 9295 | time 1941[s] | loss 1.93
| epoch 3 |  iter 8241 / 9295 | time 1943[s] | loss 1.91
| epoch 3 |  iter 8261 / 9295 | time 1945[s] | loss 1.91
| epoch 3 |  iter 8281 / 9295 | time 1946[s] | loss 1.90
| epoch 3 |  iter 8301 / 9295 | time 1948[s] | loss 1.88
| epoch 3 |  iter 8321 / 9295 | time 1949[s] | loss 1.89
| epoch 3 |  iter 8341 / 9295 | time 1951[s] | loss 1.91
| epoch 3 |  iter 8361 / 9295 | time 1953[s] | loss 1.86
| epoch 3 |  iter 8381 / 9295 | time 1954[s] | loss 1.86
| epoch 3 |  iter 8401 / 9295 | time 1956[s] | loss 1.91
| epoch 3 |  iter 8421 / 9295 | time 1957[s] | loss 1.90
| epoch 3 |  iter 8441 / 9295 | time 1958[s] | loss 1.88
| epoch 3 |  iter 8461 / 9295 | time 1960[s] | loss 1.90
| epoch 3 |  iter 8481 / 9295 | time 1961[s] | loss 1.91
| epoch 3 |  iter 8501 / 9295 | time 1962[s] | loss 1.94
| epoch 3 |  iter 8521 / 9295 | time 1964[s] | loss 1.88
| epoch 3 |  iter 8541 / 9295 | time 1965[s] | loss 1.89
| epoch 3 |  iter 8561 / 9295 | time 1966[s] | loss 1.93
| epoch 3 |  iter 8581 / 9295 | time 1968[s] | loss 1.92
| epoch 3 |  iter 8601 / 9295 | time 1969[s] | loss 1.89
| epoch 3 |  iter 8621 / 9295 | time 1971[s] | loss 1.91
| epoch 3 |  iter 8641 / 9295 | time 1972[s] | loss 1.89
| epoch 3 |  iter 8661 / 9295 | time 1973[s] | loss 1.88
| epoch 3 |  iter 8681 / 9295 | time 1975[s] | loss 1.90
| epoch 3 |  iter 8701 / 9295 | time 1976[s] | loss 1.92
| epoch 3 |  iter 8721 / 9295 | time 1978[s] | loss 1.88
| epoch 3 |  iter 8741 / 9295 | time 1979[s] | loss 1.90
| epoch 3 |  iter 8761 / 9295 | time 1980[s] | loss 1.90
| epoch 3 |  iter 8781 / 9295 | time 1982[s] | loss 1.89
| epoch 3 |  iter 8801 / 9295 | time 1983[s] | loss 1.91
| epoch 3 |  iter 8821 / 9295 | time 1984[s] | loss 1.92
| epoch 3 |  iter 8841 / 9295 | time 1986[s] | loss 1.89
| epoch 3 |  iter 8861 / 9295 | time 1987[s] | loss 1.91
| epoch 3 |  iter 8881 / 9295 | time 1988[s] | loss 1.95
| epoch 3 |  iter 8901 / 9295 | time 1990[s] | loss 1.87
| epoch 3 |  iter 8921 / 9295 | time 1991[s] | loss 1.86
| epoch 3 |  iter 8941 / 9295 | time 1992[s] | loss 1.92
| epoch 3 |  iter 8961 / 9295 | time 1993[s] | loss 1.90
| epoch 3 |  iter 8981 / 9295 | time 1995[s] | loss 1.86
| epoch 3 |  iter 9001 / 9295 | time 1996[s] | loss 1.91
| epoch 3 |  iter 9021 / 9295 | time 1997[s] | loss 1.88
| epoch 3 |  iter 9041 / 9295 | time 1999[s] | loss 1.92
| epoch 3 |  iter 9061 / 9295 | time 2000[s] | loss 1.91
| epoch 3 |  iter 9081 / 9295 | time 2001[s] | loss 1.87
| epoch 3 |  iter 9101 / 9295 | time 2003[s] | loss 1.88
| epoch 3 |  iter 9121 / 9295 | time 2004[s] | loss 1.86
| epoch 3 |  iter 9141 / 9295 | time 2006[s] | loss 1.90
| epoch 3 |  iter 9161 / 9295 | time 2007[s] | loss 1.92
| epoch 3 |  iter 9181 / 9295 | time 2009[s] | loss 1.90
| epoch 3 |  iter 9201 / 9295 | time 2011[s] | loss 1.88
| epoch 3 |  iter 9221 / 9295 | time 2013[s] | loss 1.89
| epoch 3 |  iter 9241 / 9295 | time 2014[s] | loss 1.88
| epoch 3 |  iter 9261 / 9295 | time 2016[s] | loss 1.89
| epoch 3 |  iter 9281 / 9295 | time 2018[s] | loss 1.88
| epoch 4 |  iter 1 / 9295 | time 2020[s] | loss 1.92
| epoch 4 |  iter 21 / 9295 | time 2022[s] | loss 1.75
| epoch 4 |  iter 41 / 9295 | time 2024[s] | loss 1.82
| epoch 4 |  iter 61 / 9295 | time 2026[s] | loss 1.83
| epoch 4 |  iter 81 / 9295 | time 2028[s] | loss 1.84
| epoch 4 |  iter 101 / 9295 | time 2030[s] | loss 1.81
| epoch 4 |  iter 121 / 9295 | time 2032[s] | loss 1.80
| epoch 4 |  iter 141 / 9295 | time 2034[s] | loss 1.84
| epoch 4 |  iter 161 / 9295 | time 2035[s] | loss 1.87
| epoch 4 |  iter 181 / 9295 | time 2037[s] | loss 1.80
| epoch 4 |  iter 201 / 9295 | time 2039[s] | loss 1.83
| epoch 4 |  iter 221 / 9295 | time 2040[s] | loss 1.85
| epoch 4 |  iter 241 / 9295 | time 2042[s] | loss 1.81
| epoch 4 |  iter 261 / 9295 | time 2043[s] | loss 1.80
| epoch 4 |  iter 281 / 9295 | time 2045[s] | loss 1.83
| epoch 4 |  iter 301 / 9295 | time 2047[s] | loss 1.85
| epoch 4 |  iter 321 / 9295 | time 2048[s] | loss 1.85
| epoch 4 |  iter 341 / 9295 | time 2050[s] | loss 1.85
| epoch 4 |  iter 361 / 9295 | time 2052[s] | loss 1.81
| epoch 4 |  iter 381 / 9295 | time 2053[s] | loss 1.83
| epoch 4 |  iter 401 / 9295 | time 2055[s] | loss 1.83
| epoch 4 |  iter 421 / 9295 | time 2056[s] | loss 1.83
| epoch 4 |  iter 441 / 9295 | time 2058[s] | loss 1.81
| epoch 4 |  iter 461 / 9295 | time 2060[s] | loss 1.83
| epoch 4 |  iter 481 / 9295 | time 2061[s] | loss 1.81
| epoch 4 |  iter 501 / 9295 | time 2062[s] | loss 1.83
| epoch 4 |  iter 521 / 9295 | time 2064[s] | loss 1.77
| epoch 4 |  iter 541 / 9295 | time 2065[s] | loss 1.83
| epoch 4 |  iter 561 / 9295 | time 2067[s] | loss 1.84
| epoch 4 |  iter 581 / 9295 | time 2068[s] | loss 1.75
| epoch 4 |  iter 601 / 9295 | time 2070[s] | loss 1.81
| epoch 4 |  iter 621 / 9295 | time 2071[s] | loss 1.82
| epoch 4 |  iter 641 / 9295 | time 2072[s] | loss 1.85
| epoch 4 |  iter 661 / 9295 | time 2074[s] | loss 1.85
| epoch 4 |  iter 681 / 9295 | time 2075[s] | loss 1.82
| epoch 4 |  iter 701 / 9295 | time 2077[s] | loss 1.84
| epoch 4 |  iter 721 / 9295 | time 2078[s] | loss 1.80
| epoch 4 |  iter 741 / 9295 | time 2080[s] | loss 1.81
| epoch 4 |  iter 761 / 9295 | time 2081[s] | loss 1.81
| epoch 4 |  iter 781 / 9295 | time 2083[s] | loss 1.79
| epoch 4 |  iter 801 / 9295 | time 2084[s] | loss 1.87
| epoch 4 |  iter 821 / 9295 | time 2086[s] | loss 1.79
| epoch 4 |  iter 841 / 9295 | time 2087[s] | loss 1.83
| epoch 4 |  iter 861 / 9295 | time 2088[s] | loss 1.80
| epoch 4 |  iter 881 / 9295 | time 2090[s] | loss 1.84
| epoch 4 |  iter 901 / 9295 | time 2091[s] | loss 1.82
| epoch 4 |  iter 921 / 9295 | time 2093[s] | loss 1.82
| epoch 4 |  iter 941 / 9295 | time 2094[s] | loss 1.79
| epoch 4 |  iter 961 / 9295 | time 2095[s] | loss 1.81
| epoch 4 |  iter 981 / 9295 | time 2097[s] | loss 1.81
| epoch 4 |  iter 1001 / 9295 | time 2098[s] | loss 1.84
| epoch 4 |  iter 1021 / 9295 | time 2100[s] | loss 1.82
| epoch 4 |  iter 1041 / 9295 | time 2101[s] | loss 1.86
| epoch 4 |  iter 1061 / 9295 | time 2103[s] | loss 1.84
| epoch 4 |  iter 1081 / 9295 | time 2104[s] | loss 1.81
| epoch 4 |  iter 1101 / 9295 | time 2105[s] | loss 1.84
| epoch 4 |  iter 1121 / 9295 | time 2107[s] | loss 1.83
| epoch 4 |  iter 1141 / 9295 | time 2108[s] | loss 1.80
| epoch 4 |  iter 1161 / 9295 | time 2110[s] | loss 1.80
| epoch 4 |  iter 1181 / 9295 | time 2111[s] | loss 1.79
| epoch 4 |  iter 1201 / 9295 | time 2113[s] | loss 1.82
| epoch 4 |  iter 1221 / 9295 | time 2114[s] | loss 1.85
| epoch 4 |  iter 1241 / 9295 | time 2116[s] | loss 1.81
| epoch 4 |  iter 1261 / 9295 | time 2117[s] | loss 1.86
| epoch 4 |  iter 1281 / 9295 | time 2119[s] | loss 1.77
| epoch 4 |  iter 1301 / 9295 | time 2120[s] | loss 1.79
| epoch 4 |  iter 1321 / 9295 | time 2122[s] | loss 1.82
| epoch 4 |  iter 1341 / 9295 | time 2123[s] | loss 1.83
| epoch 4 |  iter 1361 / 9295 | time 2125[s] | loss 1.84
| epoch 4 |  iter 1381 / 9295 | time 2126[s] | loss 1.85
| epoch 4 |  iter 1401 / 9295 | time 2127[s] | loss 1.82
| epoch 4 |  iter 1421 / 9295 | time 2128[s] | loss 1.83
| epoch 4 |  iter 1441 / 9295 | time 2130[s] | loss 1.85
| epoch 4 |  iter 1461 / 9295 | time 2131[s] | loss 1.80
| epoch 4 |  iter 1481 / 9295 | time 2132[s] | loss 1.79
| epoch 4 |  iter 1501 / 9295 | time 2134[s] | loss 1.82
| epoch 4 |  iter 1521 / 9295 | time 2136[s] | loss 1.83
| epoch 4 |  iter 1541 / 9295 | time 2137[s] | loss 1.83
| epoch 4 |  iter 1561 / 9295 | time 2139[s] | loss 1.85
| epoch 4 |  iter 1581 / 9295 | time 2140[s] | loss 1.82
| epoch 4 |  iter 1601 / 9295 | time 2142[s] | loss 1.82
| epoch 4 |  iter 1621 / 9295 | time 2143[s] | loss 1.83
| epoch 4 |  iter 1641 / 9295 | time 2144[s] | loss 1.85
| epoch 4 |  iter 1661 / 9295 | time 2146[s] | loss 1.86
| epoch 4 |  iter 1681 / 9295 | time 2147[s] | loss 1.83
| epoch 4 |  iter 1701 / 9295 | time 2149[s] | loss 1.88
| epoch 4 |  iter 1721 / 9295 | time 2150[s] | loss 1.82
| epoch 4 |  iter 1741 / 9295 | time 2152[s] | loss 1.85
| epoch 4 |  iter 1761 / 9295 | time 2153[s] | loss 1.85
| epoch 4 |  iter 1781 / 9295 | time 2154[s] | loss 1.78
| epoch 4 |  iter 1801 / 9295 | time 2156[s] | loss 1.82
| epoch 4 |  iter 1821 / 9295 | time 2157[s] | loss 1.80
| epoch 4 |  iter 1841 / 9295 | time 2159[s] | loss 1.81
| epoch 4 |  iter 1861 / 9295 | time 2160[s] | loss 1.85
| epoch 4 |  iter 1881 / 9295 | time 2162[s] | loss 1.83
| epoch 4 |  iter 1901 / 9295 | time 2163[s] | loss 1.79
| epoch 4 |  iter 1921 / 9295 | time 2164[s] | loss 1.78
| epoch 4 |  iter 1941 / 9295 | time 2166[s] | loss 1.83
| epoch 4 |  iter 1961 / 9295 | time 2167[s] | loss 1.84
| epoch 4 |  iter 1981 / 9295 | time 2169[s] | loss 1.82
| epoch 4 |  iter 2001 / 9295 | time 2171[s] | loss 1.80
| epoch 4 |  iter 2021 / 9295 | time 2173[s] | loss 1.80
| epoch 4 |  iter 2041 / 9295 | time 2176[s] | loss 1.81
| epoch 4 |  iter 2061 / 9295 | time 2179[s] | loss 1.81
| epoch 4 |  iter 2081 / 9295 | time 2180[s] | loss 1.81
| epoch 4 |  iter 2101 / 9295 | time 2182[s] | loss 1.81
| epoch 4 |  iter 2121 / 9295 | time 2184[s] | loss 1.80
| epoch 4 |  iter 2141 / 9295 | time 2185[s] | loss 1.78
| epoch 4 |  iter 2161 / 9295 | time 2187[s] | loss 1.84
| epoch 4 |  iter 2181 / 9295 | time 2189[s] | loss 1.81
| epoch 4 |  iter 2201 / 9295 | time 2190[s] | loss 1.87
| epoch 4 |  iter 2221 / 9295 | time 2192[s] | loss 1.84
| epoch 4 |  iter 2241 / 9295 | time 2193[s] | loss 1.78
| epoch 4 |  iter 2261 / 9295 | time 2195[s] | loss 1.79
| epoch 4 |  iter 2281 / 9295 | time 2197[s] | loss 1.81
| epoch 4 |  iter 2301 / 9295 | time 2198[s] | loss 1.82
| epoch 4 |  iter 2321 / 9295 | time 2201[s] | loss 1.83
| epoch 4 |  iter 2341 / 9295 | time 2203[s] | loss 1.85
| epoch 4 |  iter 2361 / 9295 | time 2204[s] | loss 1.80
| epoch 4 |  iter 2381 / 9295 | time 2206[s] | loss 1.85
| epoch 4 |  iter 2401 / 9295 | time 2208[s] | loss 1.82
| epoch 4 |  iter 2421 / 9295 | time 2209[s] | loss 1.86
| epoch 4 |  iter 2441 / 9295 | time 2211[s] | loss 1.82
| epoch 4 |  iter 2461 / 9295 | time 2213[s] | loss 1.82
| epoch 4 |  iter 2481 / 9295 | time 2214[s] | loss 1.84
| epoch 4 |  iter 2501 / 9295 | time 2216[s] | loss 1.80
| epoch 4 |  iter 2521 / 9295 | time 2217[s] | loss 1.79
| epoch 4 |  iter 2541 / 9295 | time 2219[s] | loss 1.81
| epoch 4 |  iter 2561 / 9295 | time 2220[s] | loss 1.83
| epoch 4 |  iter 2581 / 9295 | time 2222[s] | loss 1.87
| epoch 4 |  iter 2601 / 9295 | time 2224[s] | loss 1.81
| epoch 4 |  iter 2621 / 9295 | time 2225[s] | loss 1.81
| epoch 4 |  iter 2641 / 9295 | time 2227[s] | loss 1.81
| epoch 4 |  iter 2661 / 9295 | time 2228[s] | loss 1.82
| epoch 4 |  iter 2681 / 9295 | time 2230[s] | loss 1.79
| epoch 4 |  iter 2701 / 9295 | time 2231[s] | loss 1.77
| epoch 4 |  iter 2721 / 9295 | time 2233[s] | loss 1.83
| epoch 4 |  iter 2741 / 9295 | time 2235[s] | loss 1.82
| epoch 4 |  iter 2761 / 9295 | time 2236[s] | loss 1.86
| epoch 4 |  iter 2781 / 9295 | time 2238[s] | loss 1.80
| epoch 4 |  iter 2801 / 9295 | time 2240[s] | loss 1.84
| epoch 4 |  iter 2821 / 9295 | time 2241[s] | loss 1.78
| epoch 4 |  iter 2841 / 9295 | time 2243[s] | loss 1.83
| epoch 4 |  iter 2861 / 9295 | time 2244[s] | loss 1.81
| epoch 4 |  iter 2881 / 9295 | time 2246[s] | loss 1.83
| epoch 4 |  iter 2901 / 9295 | time 2248[s] | loss 1.83
| epoch 4 |  iter 2921 / 9295 | time 2249[s] | loss 1.80
| epoch 4 |  iter 2941 / 9295 | time 2251[s] | loss 1.83
| epoch 4 |  iter 2961 / 9295 | time 2252[s] | loss 1.86
| epoch 4 |  iter 2981 / 9295 | time 2254[s] | loss 1.84
| epoch 4 |  iter 3001 / 9295 | time 2256[s] | loss 1.79
| epoch 4 |  iter 3021 / 9295 | time 2258[s] | loss 1.84
| epoch 4 |  iter 3041 / 9295 | time 2260[s] | loss 1.81
| epoch 4 |  iter 3061 / 9295 | time 2261[s] | loss 1.78
| epoch 4 |  iter 3081 / 9295 | time 2263[s] | loss 1.84
| epoch 4 |  iter 3101 / 9295 | time 2264[s] | loss 1.82
| epoch 4 |  iter 3121 / 9295 | time 2266[s] | loss 1.80
| epoch 4 |  iter 3141 / 9295 | time 2268[s] | loss 1.85
| epoch 4 |  iter 3161 / 9295 | time 2269[s] | loss 1.81
| epoch 4 |  iter 3181 / 9295 | time 2271[s] | loss 1.85
| epoch 4 |  iter 3201 / 9295 | time 2272[s] | loss 1.83
| epoch 4 |  iter 3221 / 9295 | time 2274[s] | loss 1.79
| epoch 4 |  iter 3241 / 9295 | time 2275[s] | loss 1.82
| epoch 4 |  iter 3261 / 9295 | time 2277[s] | loss 1.82
| epoch 4 |  iter 3281 / 9295 | time 2279[s] | loss 1.86
| epoch 4 |  iter 3301 / 9295 | time 2280[s] | loss 1.80
| epoch 4 |  iter 3321 / 9295 | time 2282[s] | loss 1.80
| epoch 4 |  iter 3341 / 9295 | time 2283[s] | loss 1.83
| epoch 4 |  iter 3361 / 9295 | time 2285[s] | loss 1.80
| epoch 4 |  iter 3381 / 9295 | time 2287[s] | loss 1.83
| epoch 4 |  iter 3401 / 9295 | time 2288[s] | loss 1.83
| epoch 4 |  iter 3421 / 9295 | time 2290[s] | loss 1.85
| epoch 4 |  iter 3441 / 9295 | time 2291[s] | loss 1.84
| epoch 4 |  iter 3461 / 9295 | time 2293[s] | loss 1.83
| epoch 4 |  iter 3481 / 9295 | time 2294[s] | loss 1.82
| epoch 4 |  iter 3501 / 9295 | time 2296[s] | loss 1.81
| epoch 4 |  iter 3521 / 9295 | time 2298[s] | loss 1.82
| epoch 4 |  iter 3541 / 9295 | time 2299[s] | loss 1.81
| epoch 4 |  iter 3561 / 9295 | time 2301[s] | loss 1.82
| epoch 4 |  iter 3581 / 9295 | time 2302[s] | loss 1.82
| epoch 4 |  iter 3601 / 9295 | time 2304[s] | loss 1.80
| epoch 4 |  iter 3621 / 9295 | time 2305[s] | loss 1.81
| epoch 4 |  iter 3641 / 9295 | time 2307[s] | loss 1.79
| epoch 4 |  iter 3661 / 9295 | time 2309[s] | loss 1.83
| epoch 4 |  iter 3681 / 9295 | time 2311[s] | loss 1.82
| epoch 4 |  iter 3701 / 9295 | time 2312[s] | loss 1.87
| epoch 4 |  iter 3721 / 9295 | time 2314[s] | loss 1.82
| epoch 4 |  iter 3741 / 9295 | time 2315[s] | loss 1.80
| epoch 4 |  iter 3761 / 9295 | time 2317[s] | loss 1.82
| epoch 4 |  iter 3781 / 9295 | time 2318[s] | loss 1.84
| epoch 4 |  iter 3801 / 9295 | time 2320[s] | loss 1.79
| epoch 4 |  iter 3821 / 9295 | time 2321[s] | loss 1.79
| epoch 4 |  iter 3841 / 9295 | time 2323[s] | loss 1.86
| epoch 4 |  iter 3861 / 9295 | time 2324[s] | loss 1.85
| epoch 4 |  iter 3881 / 9295 | time 2326[s] | loss 1.79
| epoch 4 |  iter 3901 / 9295 | time 2327[s] | loss 1.81
| epoch 4 |  iter 3921 / 9295 | time 2329[s] | loss 1.82
| epoch 4 |  iter 3941 / 9295 | time 2331[s] | loss 1.82
| epoch 4 |  iter 3961 / 9295 | time 2332[s] | loss 1.85
| epoch 4 |  iter 3981 / 9295 | time 2334[s] | loss 1.85
| epoch 4 |  iter 4001 / 9295 | time 2335[s] | loss 1.83
| epoch 4 |  iter 4021 / 9295 | time 2337[s] | loss 1.82
| epoch 4 |  iter 4041 / 9295 | time 2338[s] | loss 1.83
| epoch 4 |  iter 4061 / 9295 | time 2340[s] | loss 1.79
| epoch 4 |  iter 4081 / 9295 | time 2341[s] | loss 1.80
| epoch 4 |  iter 4101 / 9295 | time 2343[s] | loss 1.81
| epoch 4 |  iter 4121 / 9295 | time 2344[s] | loss 1.79
| epoch 4 |  iter 4141 / 9295 | time 2346[s] | loss 1.81
| epoch 4 |  iter 4161 / 9295 | time 2347[s] | loss 1.84
| epoch 4 |  iter 4181 / 9295 | time 2349[s] | loss 1.82
| epoch 4 |  iter 4201 / 9295 | time 2350[s] | loss 1.82
| epoch 4 |  iter 4221 / 9295 | time 2352[s] | loss 1.82
| epoch 4 |  iter 4241 / 9295 | time 2354[s] | loss 1.82
| epoch 4 |  iter 4261 / 9295 | time 2355[s] | loss 1.83
| epoch 4 |  iter 4281 / 9295 | time 2357[s] | loss 1.81
| epoch 4 |  iter 4301 / 9295 | time 2358[s] | loss 1.81
| epoch 4 |  iter 4321 / 9295 | time 2360[s] | loss 1.86
| epoch 4 |  iter 4341 / 9295 | time 2362[s] | loss 1.79
| epoch 4 |  iter 4361 / 9295 | time 2363[s] | loss 1.83
| epoch 4 |  iter 4381 / 9295 | time 2365[s] | loss 1.81
| epoch 4 |  iter 4401 / 9295 | time 2366[s] | loss 1.80
| epoch 4 |  iter 4421 / 9295 | time 2368[s] | loss 1.83
| epoch 4 |  iter 4441 / 9295 | time 2369[s] | loss 1.83
| epoch 4 |  iter 4461 / 9295 | time 2371[s] | loss 1.80
| epoch 4 |  iter 4481 / 9295 | time 2373[s] | loss 1.82
| epoch 4 |  iter 4501 / 9295 | time 2374[s] | loss 1.87
| epoch 4 |  iter 4521 / 9295 | time 2376[s] | loss 1.78
| epoch 4 |  iter 4541 / 9295 | time 2377[s] | loss 1.82
| epoch 4 |  iter 4561 / 9295 | time 2379[s] | loss 1.80
| epoch 4 |  iter 4581 / 9295 | time 2381[s] | loss 1.85
| epoch 4 |  iter 4601 / 9295 | time 2382[s] | loss 1.80
| epoch 4 |  iter 4621 / 9295 | time 2384[s] | loss 1.83
| epoch 4 |  iter 4641 / 9295 | time 2385[s] | loss 1.79
| epoch 4 |  iter 4661 / 9295 | time 2387[s] | loss 1.80
| epoch 4 |  iter 4681 / 9295 | time 2389[s] | loss 1.79
| epoch 4 |  iter 4701 / 9295 | time 2390[s] | loss 1.82
| epoch 4 |  iter 4721 / 9295 | time 2392[s] | loss 1.85
| epoch 4 |  iter 4741 / 9295 | time 2393[s] | loss 1.80
| epoch 4 |  iter 4761 / 9295 | time 2395[s] | loss 1.84
| epoch 4 |  iter 4781 / 9295 | time 2397[s] | loss 1.81
| epoch 4 |  iter 4801 / 9295 | time 2398[s] | loss 1.82
| epoch 4 |  iter 4821 / 9295 | time 2400[s] | loss 1.80
| epoch 4 |  iter 4841 / 9295 | time 2401[s] | loss 1.82
| epoch 4 |  iter 4861 / 9295 | time 2403[s] | loss 1.81
| epoch 4 |  iter 4881 / 9295 | time 2405[s] | loss 1.77
| epoch 4 |  iter 4901 / 9295 | time 2406[s] | loss 1.82
| epoch 4 |  iter 4921 / 9295 | time 2408[s] | loss 1.80
| epoch 4 |  iter 4941 / 9295 | time 2409[s] | loss 1.83
| epoch 4 |  iter 4961 / 9295 | time 2411[s] | loss 1.81
| epoch 4 |  iter 4981 / 9295 | time 2413[s] | loss 1.79
| epoch 4 |  iter 5001 / 9295 | time 2414[s] | loss 1.86
| epoch 4 |  iter 5021 / 9295 | time 2416[s] | loss 1.85
| epoch 4 |  iter 5041 / 9295 | time 2418[s] | loss 1.83
| epoch 4 |  iter 5061 / 9295 | time 2419[s] | loss 1.83
| epoch 4 |  iter 5081 / 9295 | time 2421[s] | loss 1.81
| epoch 4 |  iter 5101 / 9295 | time 2422[s] | loss 1.83
| epoch 4 |  iter 5121 / 9295 | time 2424[s] | loss 1.83
| epoch 4 |  iter 5141 / 9295 | time 2426[s] | loss 1.85
| epoch 4 |  iter 5161 / 9295 | time 2427[s] | loss 1.82
| epoch 4 |  iter 5181 / 9295 | time 2429[s] | loss 1.81
| epoch 4 |  iter 5201 / 9295 | time 2431[s] | loss 1.81
| epoch 4 |  iter 5221 / 9295 | time 2432[s] | loss 1.82
| epoch 4 |  iter 5241 / 9295 | time 2434[s] | loss 1.78
| epoch 4 |  iter 5261 / 9295 | time 2436[s] | loss 1.80
| epoch 4 |  iter 5281 / 9295 | time 2437[s] | loss 1.81
| epoch 4 |  iter 5301 / 9295 | time 2439[s] | loss 1.81
| epoch 4 |  iter 5321 / 9295 | time 2441[s] | loss 1.80
| epoch 4 |  iter 5341 / 9295 | time 2442[s] | loss 1.84
| epoch 4 |  iter 5361 / 9295 | time 2444[s] | loss 1.82
| epoch 4 |  iter 5381 / 9295 | time 2445[s] | loss 1.81
| epoch 4 |  iter 5401 / 9295 | time 2447[s] | loss 1.77
| epoch 4 |  iter 5421 / 9295 | time 2449[s] | loss 1.79
| epoch 4 |  iter 5441 / 9295 | time 2450[s] | loss 1.80
| epoch 4 |  iter 5461 / 9295 | time 2452[s] | loss 1.83
| epoch 4 |  iter 5481 / 9295 | time 2454[s] | loss 1.80
| epoch 4 |  iter 5501 / 9295 | time 2455[s] | loss 1.82
| epoch 4 |  iter 5521 / 9295 | time 2457[s] | loss 1.84
| epoch 4 |  iter 5541 / 9295 | time 2458[s] | loss 1.81
| epoch 4 |  iter 5561 / 9295 | time 2460[s] | loss 1.79
| epoch 4 |  iter 5581 / 9295 | time 2462[s] | loss 1.80
| epoch 4 |  iter 5601 / 9295 | time 2463[s] | loss 1.82
| epoch 4 |  iter 5621 / 9295 | time 2465[s] | loss 1.77
| epoch 4 |  iter 5641 / 9295 | time 2466[s] | loss 1.79
| epoch 4 |  iter 5661 / 9295 | time 2468[s] | loss 1.80
| epoch 4 |  iter 5681 / 9295 | time 2470[s] | loss 1.81
| epoch 4 |  iter 5701 / 9295 | time 2471[s] | loss 1.78
| epoch 4 |  iter 5721 / 9295 | time 2473[s] | loss 1.81
| epoch 4 |  iter 5741 / 9295 | time 2474[s] | loss 1.81
| epoch 4 |  iter 5761 / 9295 | time 2476[s] | loss 1.79
| epoch 4 |  iter 5781 / 9295 | time 2477[s] | loss 1.82
| epoch 4 |  iter 5801 / 9295 | time 2479[s] | loss 1.82
| epoch 4 |  iter 5821 / 9295 | time 2481[s] | loss 1.82
| epoch 4 |  iter 5841 / 9295 | time 2482[s] | loss 1.78
| epoch 4 |  iter 5861 / 9295 | time 2483[s] | loss 1.85
| epoch 4 |  iter 5881 / 9295 | time 2485[s] | loss 1.80
| epoch 4 |  iter 5901 / 9295 | time 2486[s] | loss 1.80
| epoch 4 |  iter 5921 / 9295 | time 2488[s] | loss 1.80
| epoch 4 |  iter 5941 / 9295 | time 2490[s] | loss 1.82
| epoch 4 |  iter 5961 / 9295 | time 2491[s] | loss 1.82
| epoch 4 |  iter 5981 / 9295 | time 2492[s] | loss 1.79
| epoch 4 |  iter 6001 / 9295 | time 2494[s] | loss 1.82
| epoch 4 |  iter 6021 / 9295 | time 2495[s] | loss 1.81
| epoch 4 |  iter 6041 / 9295 | time 2497[s] | loss 1.80
| epoch 4 |  iter 6061 / 9295 | time 2498[s] | loss 1.81
| epoch 4 |  iter 6081 / 9295 | time 2500[s] | loss 1.84
| epoch 4 |  iter 6101 / 9295 | time 2501[s] | loss 1.85
| epoch 4 |  iter 6121 / 9295 | time 2502[s] | loss 1.80
| epoch 4 |  iter 6141 / 9295 | time 2504[s] | loss 1.81
| epoch 4 |  iter 6161 / 9295 | time 2505[s] | loss 1.80
| epoch 4 |  iter 6181 / 9295 | time 2507[s] | loss 1.79
| epoch 4 |  iter 6201 / 9295 | time 2508[s] | loss 1.81
| epoch 4 |  iter 6221 / 9295 | time 2509[s] | loss 1.81
| epoch 4 |  iter 6241 / 9295 | time 2511[s] | loss 1.85
| epoch 4 |  iter 6261 / 9295 | time 2512[s] | loss 1.82
| epoch 4 |  iter 6281 / 9295 | time 2514[s] | loss 1.83
| epoch 4 |  iter 6301 / 9295 | time 2515[s] | loss 1.85
| epoch 4 |  iter 6321 / 9295 | time 2517[s] | loss 1.79
| epoch 4 |  iter 6341 / 9295 | time 2518[s] | loss 1.81
| epoch 4 |  iter 6361 / 9295 | time 2519[s] | loss 1.78
| epoch 4 |  iter 6381 / 9295 | time 2521[s] | loss 1.85
| epoch 4 |  iter 6401 / 9295 | time 2522[s] | loss 1.84
| epoch 4 |  iter 6421 / 9295 | time 2524[s] | loss 1.83
| epoch 4 |  iter 6441 / 9295 | time 2525[s] | loss 1.78
| epoch 4 |  iter 6461 / 9295 | time 2526[s] | loss 1.84
| epoch 4 |  iter 6481 / 9295 | time 2528[s] | loss 1.81
| epoch 4 |  iter 6501 / 9295 | time 2529[s] | loss 1.81
| epoch 4 |  iter 6521 / 9295 | time 2531[s] | loss 1.81
| epoch 4 |  iter 6541 / 9295 | time 2532[s] | loss 1.83
| epoch 4 |  iter 6561 / 9295 | time 2533[s] | loss 1.81
| epoch 4 |  iter 6581 / 9295 | time 2535[s] | loss 1.78
| epoch 4 |  iter 6601 / 9295 | time 2536[s] | loss 1.82
| epoch 4 |  iter 6621 / 9295 | time 2538[s] | loss 1.81
| epoch 4 |  iter 6641 / 9295 | time 2539[s] | loss 1.82
| epoch 4 |  iter 6661 / 9295 | time 2541[s] | loss 1.82
| epoch 4 |  iter 6681 / 9295 | time 2542[s] | loss 1.81
| epoch 4 |  iter 6701 / 9295 | time 2543[s] | loss 1.83
| epoch 4 |  iter 6721 / 9295 | time 2545[s] | loss 1.82
| epoch 4 |  iter 6741 / 9295 | time 2546[s] | loss 1.83
| epoch 4 |  iter 6761 / 9295 | time 2547[s] | loss 1.77
| epoch 4 |  iter 6781 / 9295 | time 2549[s] | loss 1.81
| epoch 4 |  iter 6801 / 9295 | time 2550[s] | loss 1.78
| epoch 4 |  iter 6821 / 9295 | time 2552[s] | loss 1.81
| epoch 4 |  iter 6841 / 9295 | time 2553[s] | loss 1.81
| epoch 4 |  iter 6861 / 9295 | time 2555[s] | loss 1.81
| epoch 4 |  iter 6881 / 9295 | time 2556[s] | loss 1.78
| epoch 4 |  iter 6901 / 9295 | time 2557[s] | loss 1.83
| epoch 4 |  iter 6921 / 9295 | time 2559[s] | loss 1.81
| epoch 4 |  iter 6941 / 9295 | time 2560[s] | loss 1.83
| epoch 4 |  iter 6961 / 9295 | time 2562[s] | loss 1.80
| epoch 4 |  iter 6981 / 9295 | time 2563[s] | loss 1.75
| epoch 4 |  iter 7001 / 9295 | time 2564[s] | loss 1.81
| epoch 4 |  iter 7021 / 9295 | time 2566[s] | loss 1.83
| epoch 4 |  iter 7041 / 9295 | time 2567[s] | loss 1.82
| epoch 4 |  iter 7061 / 9295 | time 2569[s] | loss 1.80
| epoch 4 |  iter 7081 / 9295 | time 2570[s] | loss 1.82
| epoch 4 |  iter 7101 / 9295 | time 2571[s] | loss 1.84
| epoch 4 |  iter 7121 / 9295 | time 2573[s] | loss 1.80
| epoch 4 |  iter 7141 / 9295 | time 2574[s] | loss 1.80
| epoch 4 |  iter 7161 / 9295 | time 2576[s] | loss 1.81
| epoch 4 |  iter 7181 / 9295 | time 2577[s] | loss 1.79
| epoch 4 |  iter 7201 / 9295 | time 2578[s] | loss 1.82
| epoch 4 |  iter 7221 / 9295 | time 2580[s] | loss 1.83
| epoch 4 |  iter 7241 / 9295 | time 2581[s] | loss 1.81
| epoch 4 |  iter 7261 / 9295 | time 2583[s] | loss 1.78
| epoch 4 |  iter 7281 / 9295 | time 2584[s] | loss 1.79
| epoch 4 |  iter 7301 / 9295 | time 2585[s] | loss 1.79
| epoch 4 |  iter 7321 / 9295 | time 2587[s] | loss 1.78
| epoch 4 |  iter 7341 / 9295 | time 2588[s] | loss 1.80
| epoch 4 |  iter 7361 / 9295 | time 2590[s] | loss 1.80
| epoch 4 |  iter 7381 / 9295 | time 2591[s] | loss 1.83
| epoch 4 |  iter 7401 / 9295 | time 2593[s] | loss 1.75
| epoch 4 |  iter 7421 / 9295 | time 2594[s] | loss 1.79
| epoch 4 |  iter 7441 / 9295 | time 2595[s] | loss 1.80
| epoch 4 |  iter 7461 / 9295 | time 2597[s] | loss 1.81
| epoch 4 |  iter 7481 / 9295 | time 2598[s] | loss 1.79
| epoch 4 |  iter 7501 / 9295 | time 2600[s] | loss 1.78
| epoch 4 |  iter 7521 / 9295 | time 2601[s] | loss 1.82
| epoch 4 |  iter 7541 / 9295 | time 2602[s] | loss 1.81
| epoch 4 |  iter 7561 / 9295 | time 2604[s] | loss 1.83
| epoch 4 |  iter 7581 / 9295 | time 2605[s] | loss 1.81
| epoch 4 |  iter 7601 / 9295 | time 2607[s] | loss 1.79
| epoch 4 |  iter 7621 / 9295 | time 2608[s] | loss 1.85
| epoch 4 |  iter 7641 / 9295 | time 2609[s] | loss 1.81
| epoch 4 |  iter 7661 / 9295 | time 2611[s] | loss 1.80
| epoch 4 |  iter 7681 / 9295 | time 2612[s] | loss 1.82
| epoch 4 |  iter 7701 / 9295 | time 2614[s] | loss 1.80
| epoch 4 |  iter 7721 / 9295 | time 2615[s] | loss 1.81
| epoch 4 |  iter 7741 / 9295 | time 2616[s] | loss 1.76
| epoch 4 |  iter 7761 / 9295 | time 2618[s] | loss 1.81
| epoch 4 |  iter 7781 / 9295 | time 2619[s] | loss 1.78
| epoch 4 |  iter 7801 / 9295 | time 2621[s] | loss 1.76
| epoch 4 |  iter 7821 / 9295 | time 2622[s] | loss 1.78
| epoch 4 |  iter 7841 / 9295 | time 2623[s] | loss 1.85
| epoch 4 |  iter 7861 / 9295 | time 2625[s] | loss 1.76
| epoch 4 |  iter 7881 / 9295 | time 2626[s] | loss 1.76
| epoch 4 |  iter 7901 / 9295 | time 2628[s] | loss 1.86
| epoch 4 |  iter 7921 / 9295 | time 2629[s] | loss 1.82
| epoch 4 |  iter 7941 / 9295 | time 2630[s] | loss 1.79
| epoch 4 |  iter 7961 / 9295 | time 2632[s] | loss 1.78
| epoch 4 |  iter 7981 / 9295 | time 2633[s] | loss 1.76
| epoch 4 |  iter 8001 / 9295 | time 2635[s] | loss 1.80
| epoch 4 |  iter 8021 / 9295 | time 2636[s] | loss 1.82
| epoch 4 |  iter 8041 / 9295 | time 2637[s] | loss 1.81
| epoch 4 |  iter 8061 / 9295 | time 2639[s] | loss 1.81
| epoch 4 |  iter 8081 / 9295 | time 2640[s] | loss 1.79
| epoch 4 |  iter 8101 / 9295 | time 2642[s] | loss 1.81
| epoch 4 |  iter 8121 / 9295 | time 2643[s] | loss 1.78
| epoch 4 |  iter 8141 / 9295 | time 2645[s] | loss 1.84
| epoch 4 |  iter 8161 / 9295 | time 2646[s] | loss 1.82
| epoch 4 |  iter 8181 / 9295 | time 2647[s] | loss 1.79
| epoch 4 |  iter 8201 / 9295 | time 2649[s] | loss 1.79
| epoch 4 |  iter 8221 / 9295 | time 2650[s] | loss 1.79
| epoch 4 |  iter 8241 / 9295 | time 2652[s] | loss 1.76
| epoch 4 |  iter 8261 / 9295 | time 2653[s] | loss 1.85
| epoch 4 |  iter 8281 / 9295 | time 2654[s] | loss 1.79
| epoch 4 |  iter 8301 / 9295 | time 2655[s] | loss 1.83
| epoch 4 |  iter 8321 / 9295 | time 2657[s] | loss 1.80
| epoch 4 |  iter 8341 / 9295 | time 2658[s] | loss 1.79
| epoch 4 |  iter 8361 / 9295 | time 2659[s] | loss 1.81
| epoch 4 |  iter 8381 / 9295 | time 2660[s] | loss 1.78
| epoch 4 |  iter 8401 / 9295 | time 2662[s] | loss 1.77
| epoch 4 |  iter 8421 / 9295 | time 2663[s] | loss 1.80
| epoch 4 |  iter 8441 / 9295 | time 2664[s] | loss 1.80
| epoch 4 |  iter 8461 / 9295 | time 2665[s] | loss 1.77
| epoch 4 |  iter 8481 / 9295 | time 2667[s] | loss 1.82
| epoch 4 |  iter 8501 / 9295 | time 2668[s] | loss 1.75
| epoch 4 |  iter 8521 / 9295 | time 2669[s] | loss 1.80
| epoch 4 |  iter 8541 / 9295 | time 2670[s] | loss 1.74
| epoch 4 |  iter 8561 / 9295 | time 2672[s] | loss 1.80
| epoch 4 |  iter 8581 / 9295 | time 2673[s] | loss 1.80
| epoch 4 |  iter 8601 / 9295 | time 2674[s] | loss 1.82
| epoch 4 |  iter 8621 / 9295 | time 2676[s] | loss 1.82
| epoch 4 |  iter 8641 / 9295 | time 2677[s] | loss 1.79
| epoch 4 |  iter 8661 / 9295 | time 2678[s] | loss 1.74
| epoch 4 |  iter 8681 / 9295 | time 2679[s] | loss 1.77
| epoch 4 |  iter 8701 / 9295 | time 2681[s] | loss 1.78
| epoch 4 |  iter 8721 / 9295 | time 2682[s] | loss 1.81
| epoch 4 |  iter 8741 / 9295 | time 2683[s] | loss 1.78
| epoch 4 |  iter 8761 / 9295 | time 2684[s] | loss 1.77
| epoch 4 |  iter 8781 / 9295 | time 2686[s] | loss 1.79
| epoch 4 |  iter 8801 / 9295 | time 2687[s] | loss 1.81
| epoch 4 |  iter 8821 / 9295 | time 2688[s] | loss 1.74
| epoch 4 |  iter 8841 / 9295 | time 2689[s] | loss 1.77
| epoch 4 |  iter 8861 / 9295 | time 2691[s] | loss 1.82
| epoch 4 |  iter 8881 / 9295 | time 2692[s] | loss 1.81
| epoch 4 |  iter 8901 / 9295 | time 2693[s] | loss 1.80
| epoch 4 |  iter 8921 / 9295 | time 2694[s] | loss 1.82
| epoch 4 |  iter 8941 / 9295 | time 2696[s] | loss 1.84
| epoch 4 |  iter 8961 / 9295 | time 2697[s] | loss 1.82
| epoch 4 |  iter 8981 / 9295 | time 2698[s] | loss 1.78
| epoch 4 |  iter 9001 / 9295 | time 2699[s] | loss 1.76
| epoch 4 |  iter 9021 / 9295 | time 2701[s] | loss 1.79
| epoch 4 |  iter 9041 / 9295 | time 2702[s] | loss 1.79
| epoch 4 |  iter 9061 / 9295 | time 2703[s] | loss 1.77
| epoch 4 |  iter 9081 / 9295 | time 2704[s] | loss 1.76
| epoch 4 |  iter 9101 / 9295 | time 2706[s] | loss 1.79
| epoch 4 |  iter 9121 / 9295 | time 2707[s] | loss 1.84
| epoch 4 |  iter 9141 / 9295 | time 2708[s] | loss 1.79
| epoch 4 |  iter 9161 / 9295 | time 2709[s] | loss 1.78
| epoch 4 |  iter 9181 / 9295 | time 2711[s] | loss 1.79
| epoch 4 |  iter 9201 / 9295 | time 2712[s] | loss 1.79
| epoch 4 |  iter 9221 / 9295 | time 2713[s] | loss 1.80
| epoch 4 |  iter 9241 / 9295 | time 2714[s] | loss 1.80
| epoch 4 |  iter 9261 / 9295 | time 2716[s] | loss 1.79
| epoch 4 |  iter 9281 / 9295 | time 2717[s] | loss 1.78
| epoch 5 |  iter 1 / 9295 | time 2718[s] | loss 1.81
| epoch 5 |  iter 21 / 9295 | time 2719[s] | loss 1.73
| epoch 5 |  iter 41 / 9295 | time 2720[s] | loss 1.72
| epoch 5 |  iter 61 / 9295 | time 2722[s] | loss 1.70
| epoch 5 |  iter 81 / 9295 | time 2723[s] | loss 1.70
| epoch 5 |  iter 101 / 9295 | time 2724[s] | loss 1.72
| epoch 5 |  iter 121 / 9295 | time 2726[s] | loss 1.74
| epoch 5 |  iter 141 / 9295 | time 2727[s] | loss 1.73
| epoch 5 |  iter 161 / 9295 | time 2728[s] | loss 1.74
| epoch 5 |  iter 181 / 9295 | time 2729[s] | loss 1.71
| epoch 5 |  iter 201 / 9295 | time 2731[s] | loss 1.73
| epoch 5 |  iter 221 / 9295 | time 2732[s] | loss 1.73
| epoch 5 |  iter 241 / 9295 | time 2733[s] | loss 1.70
| epoch 5 |  iter 261 / 9295 | time 2734[s] | loss 1.74
| epoch 5 |  iter 281 / 9295 | time 2736[s] | loss 1.69
| epoch 5 |  iter 301 / 9295 | time 2737[s] | loss 1.74
| epoch 5 |  iter 321 / 9295 | time 2738[s] | loss 1.72
| epoch 5 |  iter 341 / 9295 | time 2739[s] | loss 1.70
| epoch 5 |  iter 361 / 9295 | time 2741[s] | loss 1.73
| epoch 5 |  iter 381 / 9295 | time 2742[s] | loss 1.73
| epoch 5 |  iter 401 / 9295 | time 2743[s] | loss 1.74
| epoch 5 |  iter 421 / 9295 | time 2745[s] | loss 1.72
| epoch 5 |  iter 441 / 9295 | time 2746[s] | loss 1.78
| epoch 5 |  iter 461 / 9295 | time 2747[s] | loss 1.70
| epoch 5 |  iter 481 / 9295 | time 2748[s] | loss 1.74
| epoch 5 |  iter 501 / 9295 | time 2750[s] | loss 1.71
| epoch 5 |  iter 521 / 9295 | time 2751[s] | loss 1.73
| epoch 5 |  iter 541 / 9295 | time 2752[s] | loss 1.70
| epoch 5 |  iter 561 / 9295 | time 2753[s] | loss 1.71
| epoch 5 |  iter 581 / 9295 | time 2755[s] | loss 1.73
| epoch 5 |  iter 601 / 9295 | time 2756[s] | loss 1.73
| epoch 5 |  iter 621 / 9295 | time 2757[s] | loss 1.74
| epoch 5 |  iter 641 / 9295 | time 2758[s] | loss 1.74
| epoch 5 |  iter 661 / 9295 | time 2760[s] | loss 1.72
| epoch 5 |  iter 681 / 9295 | time 2761[s] | loss 1.72
| epoch 5 |  iter 701 / 9295 | time 2762[s] | loss 1.75
| epoch 5 |  iter 721 / 9295 | time 2763[s] | loss 1.74
| epoch 5 |  iter 741 / 9295 | time 2765[s] | loss 1.70
| epoch 5 |  iter 761 / 9295 | time 2766[s] | loss 1.73
| epoch 5 |  iter 781 / 9295 | time 2767[s] | loss 1.74
| epoch 5 |  iter 801 / 9295 | time 2768[s] | loss 1.75
| epoch 5 |  iter 821 / 9295 | time 2770[s] | loss 1.73
| epoch 5 |  iter 841 / 9295 | time 2771[s] | loss 1.72
| epoch 5 |  iter 861 / 9295 | time 2772[s] | loss 1.74
| epoch 5 |  iter 881 / 9295 | time 2773[s] | loss 1.74
| epoch 5 |  iter 901 / 9295 | time 2775[s] | loss 1.72
| epoch 5 |  iter 921 / 9295 | time 2776[s] | loss 1.72
| epoch 5 |  iter 941 / 9295 | time 2777[s] | loss 1.74
| epoch 5 |  iter 961 / 9295 | time 2778[s] | loss 1.69
| epoch 5 |  iter 981 / 9295 | time 2780[s] | loss 1.74
| epoch 5 |  iter 1001 / 9295 | time 2781[s] | loss 1.73
| epoch 5 |  iter 1021 / 9295 | time 2782[s] | loss 1.69
| epoch 5 |  iter 1041 / 9295 | time 2784[s] | loss 1.75
| epoch 5 |  iter 1061 / 9295 | time 2785[s] | loss 1.74
| epoch 5 |  iter 1081 / 9295 | time 2786[s] | loss 1.71
| epoch 5 |  iter 1101 / 9295 | time 2787[s] | loss 1.69
| epoch 5 |  iter 1121 / 9295 | time 2789[s] | loss 1.70
| epoch 5 |  iter 1141 / 9295 | time 2790[s] | loss 1.75
| epoch 5 |  iter 1161 / 9295 | time 2791[s] | loss 1.73
| epoch 5 |  iter 1181 / 9295 | time 2792[s] | loss 1.75
| epoch 5 |  iter 1201 / 9295 | time 2794[s] | loss 1.77
| epoch 5 |  iter 1221 / 9295 | time 2795[s] | loss 1.71
| epoch 5 |  iter 1241 / 9295 | time 2796[s] | loss 1.74
| epoch 5 |  iter 1261 / 9295 | time 2797[s] | loss 1.72
| epoch 5 |  iter 1281 / 9295 | time 2799[s] | loss 1.71
| epoch 5 |  iter 1301 / 9295 | time 2800[s] | loss 1.73
| epoch 5 |  iter 1321 / 9295 | time 2801[s] | loss 1.75
| epoch 5 |  iter 1341 / 9295 | time 2802[s] | loss 1.73
| epoch 5 |  iter 1361 / 9295 | time 2804[s] | loss 1.70
| epoch 5 |  iter 1381 / 9295 | time 2805[s] | loss 1.72
| epoch 5 |  iter 1401 / 9295 | time 2806[s] | loss 1.75
| epoch 5 |  iter 1421 / 9295 | time 2807[s] | loss 1.75
| epoch 5 |  iter 1441 / 9295 | time 2809[s] | loss 1.71
| epoch 5 |  iter 1461 / 9295 | time 2810[s] | loss 1.73
| epoch 5 |  iter 1481 / 9295 | time 2811[s] | loss 1.70
| epoch 5 |  iter 1501 / 9295 | time 2812[s] | loss 1.71
| epoch 5 |  iter 1521 / 9295 | time 2814[s] | loss 1.71
| epoch 5 |  iter 1541 / 9295 | time 2815[s] | loss 1.73
| epoch 5 |  iter 1561 / 9295 | time 2816[s] | loss 1.69
| epoch 5 |  iter 1581 / 9295 | time 2817[s] | loss 1.72
| epoch 5 |  iter 1601 / 9295 | time 2819[s] | loss 1.74
| epoch 5 |  iter 1621 / 9295 | time 2820[s] | loss 1.72
| epoch 5 |  iter 1641 / 9295 | time 2821[s] | loss 1.72
| epoch 5 |  iter 1661 / 9295 | time 2822[s] | loss 1.73
| epoch 5 |  iter 1681 / 9295 | time 2824[s] | loss 1.73
| epoch 5 |  iter 1701 / 9295 | time 2825[s] | loss 1.73
| epoch 5 |  iter 1721 / 9295 | time 2826[s] | loss 1.72
| epoch 5 |  iter 1741 / 9295 | time 2827[s] | loss 1.74
| epoch 5 |  iter 1761 / 9295 | time 2829[s] | loss 1.73
| epoch 5 |  iter 1781 / 9295 | time 2830[s] | loss 1.70
| epoch 5 |  iter 1801 / 9295 | time 2831[s] | loss 1.70
| epoch 5 |  iter 1821 / 9295 | time 2832[s] | loss 1.72
| epoch 5 |  iter 1841 / 9295 | time 2834[s] | loss 1.73
| epoch 5 |  iter 1861 / 9295 | time 2835[s] | loss 1.76
| epoch 5 |  iter 1881 / 9295 | time 2836[s] | loss 1.74
| epoch 5 |  iter 1901 / 9295 | time 2837[s] | loss 1.71
| epoch 5 |  iter 1921 / 9295 | time 2839[s] | loss 1.74
| epoch 5 |  iter 1941 / 9295 | time 2840[s] | loss 1.73
| epoch 5 |  iter 1961 / 9295 | time 2841[s] | loss 1.74
| epoch 5 |  iter 1981 / 9295 | time 2842[s] | loss 1.76
| epoch 5 |  iter 2001 / 9295 | time 2844[s] | loss 1.77
| epoch 5 |  iter 2021 / 9295 | time 2845[s] | loss 1.76
| epoch 5 |  iter 2041 / 9295 | time 2846[s] | loss 1.66
| epoch 5 |  iter 2061 / 9295 | time 2847[s] | loss 1.78
| epoch 5 |  iter 2081 / 9295 | time 2849[s] | loss 1.77
| epoch 5 |  iter 2101 / 9295 | time 2850[s] | loss 1.74
| epoch 5 |  iter 2121 / 9295 | time 2851[s] | loss 1.71
| epoch 5 |  iter 2141 / 9295 | time 2852[s] | loss 1.72
| epoch 5 |  iter 2161 / 9295 | time 2854[s] | loss 1.74
| epoch 5 |  iter 2181 / 9295 | time 2855[s] | loss 1.72
| epoch 5 |  iter 2201 / 9295 | time 2856[s] | loss 1.74
| epoch 5 |  iter 2221 / 9295 | time 2858[s] | loss 1.70
| epoch 5 |  iter 2241 / 9295 | time 2859[s] | loss 1.70
| epoch 5 |  iter 2261 / 9295 | time 2860[s] | loss 1.73
| epoch 5 |  iter 2281 / 9295 | time 2861[s] | loss 1.74
| epoch 5 |  iter 2301 / 9295 | time 2863[s] | loss 1.72
| epoch 5 |  iter 2321 / 9295 | time 2864[s] | loss 1.76
| epoch 5 |  iter 2341 / 9295 | time 2865[s] | loss 1.73
| epoch 5 |  iter 2361 / 9295 | time 2866[s] | loss 1.73
| epoch 5 |  iter 2381 / 9295 | time 2868[s] | loss 1.71
| epoch 5 |  iter 2401 / 9295 | time 2869[s] | loss 1.73
| epoch 5 |  iter 2421 / 9295 | time 2870[s] | loss 1.75
| epoch 5 |  iter 2441 / 9295 | time 2871[s] | loss 1.69
| epoch 5 |  iter 2461 / 9295 | time 2873[s] | loss 1.76
| epoch 5 |  iter 2481 / 9295 | time 2874[s] | loss 1.73
| epoch 5 |  iter 2501 / 9295 | time 2875[s] | loss 1.72
| epoch 5 |  iter 2521 / 9295 | time 2876[s] | loss 1.74
| epoch 5 |  iter 2541 / 9295 | time 2878[s] | loss 1.72
| epoch 5 |  iter 2561 / 9295 | time 2879[s] | loss 1.72
| epoch 5 |  iter 2581 / 9295 | time 2880[s] | loss 1.73
| epoch 5 |  iter 2601 / 9295 | time 2881[s] | loss 1.71
| epoch 5 |  iter 2621 / 9295 | time 2883[s] | loss 1.73
| epoch 5 |  iter 2641 / 9295 | time 2884[s] | loss 1.73
| epoch 5 |  iter 2661 / 9295 | time 2885[s] | loss 1.71
| epoch 5 |  iter 2681 / 9295 | time 2886[s] | loss 1.72
| epoch 5 |  iter 2701 / 9295 | time 2888[s] | loss 1.75
| epoch 5 |  iter 2721 / 9295 | time 2889[s] | loss 1.70
| epoch 5 |  iter 2741 / 9295 | time 2890[s] | loss 1.71
| epoch 5 |  iter 2761 / 9295 | time 2891[s] | loss 1.73
| epoch 5 |  iter 2781 / 9295 | time 2893[s] | loss 1.74
| epoch 5 |  iter 2801 / 9295 | time 2894[s] | loss 1.70
| epoch 5 |  iter 2821 / 9295 | time 2895[s] | loss 1.73
| epoch 5 |  iter 2841 / 9295 | time 2897[s] | loss 1.73
| epoch 5 |  iter 2861 / 9295 | time 2898[s] | loss 1.73
| epoch 5 |  iter 2881 / 9295 | time 2899[s] | loss 1.71
| epoch 5 |  iter 2901 / 9295 | time 2900[s] | loss 1.78
| epoch 5 |  iter 2921 / 9295 | time 2902[s] | loss 1.71
| epoch 5 |  iter 2941 / 9295 | time 2903[s] | loss 1.73
| epoch 5 |  iter 2961 / 9295 | time 2904[s] | loss 1.71
| epoch 5 |  iter 2981 / 9295 | time 2905[s] | loss 1.75
| epoch 5 |  iter 3001 / 9295 | time 2907[s] | loss 1.73
| epoch 5 |  iter 3021 / 9295 | time 2908[s] | loss 1.74
| epoch 5 |  iter 3041 / 9295 | time 2909[s] | loss 1.72
| epoch 5 |  iter 3061 / 9295 | time 2910[s] | loss 1.74
| epoch 5 |  iter 3081 / 9295 | time 2912[s] | loss 1.72
| epoch 5 |  iter 3101 / 9295 | time 2913[s] | loss 1.73
| epoch 5 |  iter 3121 / 9295 | time 2914[s] | loss 1.65
| epoch 5 |  iter 3141 / 9295 | time 2915[s] | loss 1.76
| epoch 5 |  iter 3161 / 9295 | time 2917[s] | loss 1.73
| epoch 5 |  iter 3181 / 9295 | time 2918[s] | loss 1.71
| epoch 5 |  iter 3201 / 9295 | time 2919[s] | loss 1.67
| epoch 5 |  iter 3221 / 9295 | time 2921[s] | loss 1.79
| epoch 5 |  iter 3241 / 9295 | time 2922[s] | loss 1.71
| epoch 5 |  iter 3261 / 9295 | time 2923[s] | loss 1.69
| epoch 5 |  iter 3281 / 9295 | time 2924[s] | loss 1.74
| epoch 5 |  iter 3301 / 9295 | time 2926[s] | loss 1.70
| epoch 5 |  iter 3321 / 9295 | time 2927[s] | loss 1.75
| epoch 5 |  iter 3341 / 9295 | time 2928[s] | loss 1.73
| epoch 5 |  iter 3361 / 9295 | time 2929[s] | loss 1.72
| epoch 5 |  iter 3381 / 9295 | time 2931[s] | loss 1.73
| epoch 5 |  iter 3401 / 9295 | time 2932[s] | loss 1.71
| epoch 5 |  iter 3421 / 9295 | time 2933[s] | loss 1.72
| epoch 5 |  iter 3441 / 9295 | time 2934[s] | loss 1.71
| epoch 5 |  iter 3461 / 9295 | time 2936[s] | loss 1.71
| epoch 5 |  iter 3481 / 9295 | time 2937[s] | loss 1.73
| epoch 5 |  iter 3501 / 9295 | time 2938[s] | loss 1.73
| epoch 5 |  iter 3521 / 9295 | time 2940[s] | loss 1.73
| epoch 5 |  iter 3541 / 9295 | time 2941[s] | loss 1.75
| epoch 5 |  iter 3561 / 9295 | time 2942[s] | loss 1.74
| epoch 5 |  iter 3581 / 9295 | time 2943[s] | loss 1.69
| epoch 5 |  iter 3601 / 9295 | time 2945[s] | loss 1.71
| epoch 5 |  iter 3621 / 9295 | time 2946[s] | loss 1.70
| epoch 5 |  iter 3641 / 9295 | time 2947[s] | loss 1.71
| epoch 5 |  iter 3661 / 9295 | time 2949[s] | loss 1.73
| epoch 5 |  iter 3681 / 9295 | time 2950[s] | loss 1.72
| epoch 5 |  iter 3701 / 9295 | time 2952[s] | loss 1.69
| epoch 5 |  iter 3721 / 9295 | time 2953[s] | loss 1.74
| epoch 5 |  iter 3741 / 9295 | time 2955[s] | loss 1.73
| epoch 5 |  iter 3761 / 9295 | time 2956[s] | loss 1.74
| epoch 5 |  iter 3781 / 9295 | time 2958[s] | loss 1.74
| epoch 5 |  iter 3801 / 9295 | time 2959[s] | loss 1.74
| epoch 5 |  iter 3821 / 9295 | time 2961[s] | loss 1.77
| epoch 5 |  iter 3841 / 9295 | time 2962[s] | loss 1.71
| epoch 5 |  iter 3861 / 9295 | time 2964[s] | loss 1.76
| epoch 5 |  iter 3881 / 9295 | time 2965[s] | loss 1.73
| epoch 5 |  iter 3901 / 9295 | time 2967[s] | loss 1.75
| epoch 5 |  iter 3921 / 9295 | time 2968[s] | loss 1.70
| epoch 5 |  iter 3941 / 9295 | time 2970[s] | loss 1.77
| epoch 5 |  iter 3961 / 9295 | time 2971[s] | loss 1.71
| epoch 5 |  iter 3981 / 9295 | time 2973[s] | loss 1.73
| epoch 5 |  iter 4001 / 9295 | time 2974[s] | loss 1.71
| epoch 5 |  iter 4021 / 9295 | time 2976[s] | loss 1.76
| epoch 5 |  iter 4041 / 9295 | time 2977[s] | loss 1.73
| epoch 5 |  iter 4061 / 9295 | time 2978[s] | loss 1.75
| epoch 5 |  iter 4081 / 9295 | time 2980[s] | loss 1.72
| epoch 5 |  iter 4101 / 9295 | time 2981[s] | loss 1.70
| epoch 5 |  iter 4121 / 9295 | time 2983[s] | loss 1.69
| epoch 5 |  iter 4141 / 9295 | time 2984[s] | loss 1.75
| epoch 5 |  iter 4161 / 9295 | time 2986[s] | loss 1.74
| epoch 5 |  iter 4181 / 9295 | time 2987[s] | loss 1.71
| epoch 5 |  iter 4201 / 9295 | time 2989[s] | loss 1.74
| epoch 5 |  iter 4221 / 9295 | time 2990[s] | loss 1.72
| epoch 5 |  iter 4241 / 9295 | time 2992[s] | loss 1.73
| epoch 5 |  iter 4261 / 9295 | time 2993[s] | loss 1.74
| epoch 5 |  iter 4281 / 9295 | time 2995[s] | loss 1.77
| epoch 5 |  iter 4301 / 9295 | time 2996[s] | loss 1.71
| epoch 5 |  iter 4321 / 9295 | time 2998[s] | loss 1.74
| epoch 5 |  iter 4341 / 9295 | time 2999[s] | loss 1.71
| epoch 5 |  iter 4361 / 9295 | time 3001[s] | loss 1.70
| epoch 5 |  iter 4381 / 9295 | time 3003[s] | loss 1.75
| epoch 5 |  iter 4401 / 9295 | time 3004[s] | loss 1.73
| epoch 5 |  iter 4421 / 9295 | time 3005[s] | loss 1.69
| epoch 5 |  iter 4441 / 9295 | time 3007[s] | loss 1.71
| epoch 5 |  iter 4461 / 9295 | time 3008[s] | loss 1.72
| epoch 5 |  iter 4481 / 9295 | time 3010[s] | loss 1.71
| epoch 5 |  iter 4501 / 9295 | time 3011[s] | loss 1.75
| epoch 5 |  iter 4521 / 9295 | time 3013[s] | loss 1.73
| epoch 5 |  iter 4541 / 9295 | time 3014[s] | loss 1.74
| epoch 5 |  iter 4561 / 9295 | time 3016[s] | loss 1.74
| epoch 5 |  iter 4581 / 9295 | time 3017[s] | loss 1.71
| epoch 5 |  iter 4601 / 9295 | time 3018[s] | loss 1.75
| epoch 5 |  iter 4621 / 9295 | time 3020[s] | loss 1.74
| epoch 5 |  iter 4641 / 9295 | time 3021[s] | loss 1.76
| epoch 5 |  iter 4661 / 9295 | time 3023[s] | loss 1.70
| epoch 5 |  iter 4681 / 9295 | time 3024[s] | loss 1.74
| epoch 5 |  iter 4701 / 9295 | time 3026[s] | loss 1.74
| epoch 5 |  iter 4721 / 9295 | time 3027[s] | loss 1.74
| epoch 5 |  iter 4741 / 9295 | time 3028[s] | loss 1.73
| epoch 5 |  iter 4761 / 9295 | time 3030[s] | loss 1.71
| epoch 5 |  iter 4781 / 9295 | time 3031[s] | loss 1.74
| epoch 5 |  iter 4801 / 9295 | time 3033[s] | loss 1.70
| epoch 5 |  iter 4821 / 9295 | time 3034[s] | loss 1.74
| epoch 5 |  iter 4841 / 9295 | time 3036[s] | loss 1.73
| epoch 5 |  iter 4861 / 9295 | time 3037[s] | loss 1.71
| epoch 5 |  iter 4881 / 9295 | time 3038[s] | loss 1.73
| epoch 5 |  iter 4901 / 9295 | time 3040[s] | loss 1.71
| epoch 5 |  iter 4921 / 9295 | time 3041[s] | loss 1.70
| epoch 5 |  iter 4941 / 9295 | time 3043[s] | loss 1.75
| epoch 5 |  iter 4961 / 9295 | time 3044[s] | loss 1.73
| epoch 5 |  iter 4981 / 9295 | time 3046[s] | loss 1.71
| epoch 5 |  iter 5001 / 9295 | time 3047[s] | loss 1.76
| epoch 5 |  iter 5021 / 9295 | time 3049[s] | loss 1.73
| epoch 5 |  iter 5041 / 9295 | time 3050[s] | loss 1.70
| epoch 5 |  iter 5061 / 9295 | time 3051[s] | loss 1.69
| epoch 5 |  iter 5081 / 9295 | time 3053[s] | loss 1.71
| epoch 5 |  iter 5101 / 9295 | time 3054[s] | loss 1.72
| epoch 5 |  iter 5121 / 9295 | time 3056[s] | loss 1.69
| epoch 5 |  iter 5141 / 9295 | time 3057[s] | loss 1.67
| epoch 5 |  iter 5161 / 9295 | time 3059[s] | loss 1.72
| epoch 5 |  iter 5181 / 9295 | time 3060[s] | loss 1.71
| epoch 5 |  iter 5201 / 9295 | time 3061[s] | loss 1.71
| epoch 5 |  iter 5221 / 9295 | time 3063[s] | loss 1.74
| epoch 5 |  iter 5241 / 9295 | time 3064[s] | loss 1.73
| epoch 5 |  iter 5261 / 9295 | time 3066[s] | loss 1.69
| epoch 5 |  iter 5281 / 9295 | time 3067[s] | loss 1.73
| epoch 5 |  iter 5301 / 9295 | time 3069[s] | loss 1.71
| epoch 5 |  iter 5321 / 9295 | time 3070[s] | loss 1.71
| epoch 5 |  iter 5341 / 9295 | time 3072[s] | loss 1.72
| epoch 5 |  iter 5361 / 9295 | time 3073[s] | loss 1.72
| epoch 5 |  iter 5381 / 9295 | time 3074[s] | loss 1.76
| epoch 5 |  iter 5401 / 9295 | time 3076[s] | loss 1.69
| epoch 5 |  iter 5421 / 9295 | time 3077[s] | loss 1.74
| epoch 5 |  iter 5441 / 9295 | time 3079[s] | loss 1.75
| epoch 5 |  iter 5461 / 9295 | time 3080[s] | loss 1.73
| epoch 5 |  iter 5481 / 9295 | time 3082[s] | loss 1.69
| epoch 5 |  iter 5501 / 9295 | time 3083[s] | loss 1.74
| epoch 5 |  iter 5521 / 9295 | time 3084[s] | loss 1.74
| epoch 5 |  iter 5541 / 9295 | time 3086[s] | loss 1.76
| epoch 5 |  iter 5561 / 9295 | time 3087[s] | loss 1.72
| epoch 5 |  iter 5581 / 9295 | time 3089[s] | loss 1.74
| epoch 5 |  iter 5601 / 9295 | time 3090[s] | loss 1.73
| epoch 5 |  iter 5621 / 9295 | time 3092[s] | loss 1.72
| epoch 5 |  iter 5641 / 9295 | time 3093[s] | loss 1.72
| epoch 5 |  iter 5661 / 9295 | time 3095[s] | loss 1.74
| epoch 5 |  iter 5681 / 9295 | time 3096[s] | loss 1.74
| epoch 5 |  iter 5701 / 9295 | time 3097[s] | loss 1.74
| epoch 5 |  iter 5721 / 9295 | time 3099[s] | loss 1.73
| epoch 5 |  iter 5741 / 9295 | time 3100[s] | loss 1.73
| epoch 5 |  iter 5761 / 9295 | time 3102[s] | loss 1.71
| epoch 5 |  iter 5781 / 9295 | time 3103[s] | loss 1.74
| epoch 5 |  iter 5801 / 9295 | time 3105[s] | loss 1.72
| epoch 5 |  iter 5821 / 9295 | time 3106[s] | loss 1.75
| epoch 5 |  iter 5841 / 9295 | time 3107[s] | loss 1.75
| epoch 5 |  iter 5861 / 9295 | time 3109[s] | loss 1.73
| epoch 5 |  iter 5881 / 9295 | time 3110[s] | loss 1.70
| epoch 5 |  iter 5901 / 9295 | time 3112[s] | loss 1.72
| epoch 5 |  iter 5921 / 9295 | time 3113[s] | loss 1.76
| epoch 5 |  iter 5941 / 9295 | time 3114[s] | loss 1.72
| epoch 5 |  iter 5961 / 9295 | time 3116[s] | loss 1.73
| epoch 5 |  iter 5981 / 9295 | time 3117[s] | loss 1.73
| epoch 5 |  iter 6001 / 9295 | time 3119[s] | loss 1.70
| epoch 5 |  iter 6021 / 9295 | time 3120[s] | loss 1.69
| epoch 5 |  iter 6041 / 9295 | time 3121[s] | loss 1.72
| epoch 5 |  iter 6061 / 9295 | time 3123[s] | loss 1.69
| epoch 5 |  iter 6081 / 9295 | time 3124[s] | loss 1.77
| epoch 5 |  iter 6101 / 9295 | time 3126[s] | loss 1.71
| epoch 5 |  iter 6121 / 9295 | time 3127[s] | loss 1.72
| epoch 5 |  iter 6141 / 9295 | time 3129[s] | loss 1.75
| epoch 5 |  iter 6161 / 9295 | time 3130[s] | loss 1.76
| epoch 5 |  iter 6181 / 9295 | time 3131[s] | loss 1.73
| epoch 5 |  iter 6201 / 9295 | time 3133[s] | loss 1.71
| epoch 5 |  iter 6221 / 9295 | time 3134[s] | loss 1.68
| epoch 5 |  iter 6241 / 9295 | time 3136[s] | loss 1.73
| epoch 5 |  iter 6261 / 9295 | time 3137[s] | loss 1.73
| epoch 5 |  iter 6281 / 9295 | time 3139[s] | loss 1.70
| epoch 5 |  iter 6301 / 9295 | time 3140[s] | loss 1.70
| epoch 5 |  iter 6321 / 9295 | time 3141[s] | loss 1.72
| epoch 5 |  iter 6341 / 9295 | time 3143[s] | loss 1.72
| epoch 5 |  iter 6361 / 9295 | time 3144[s] | loss 1.71
| epoch 5 |  iter 6381 / 9295 | time 3146[s] | loss 1.71
| epoch 5 |  iter 6401 / 9295 | time 3147[s] | loss 1.74
| epoch 5 |  iter 6421 / 9295 | time 3149[s] | loss 1.73
| epoch 5 |  iter 6441 / 9295 | time 3150[s] | loss 1.72
| epoch 5 |  iter 6461 / 9295 | time 3152[s] | loss 1.73
| epoch 5 |  iter 6481 / 9295 | time 3153[s] | loss 1.75
| epoch 5 |  iter 6501 / 9295 | time 3154[s] | loss 1.70
| epoch 5 |  iter 6521 / 9295 | time 3156[s] | loss 1.73
| epoch 5 |  iter 6541 / 9295 | time 3157[s] | loss 1.72
| epoch 5 |  iter 6561 / 9295 | time 3159[s] | loss 1.70
| epoch 5 |  iter 6581 / 9295 | time 3160[s] | loss 1.74
| epoch 5 |  iter 6601 / 9295 | time 3162[s] | loss 1.74
| epoch 5 |  iter 6621 / 9295 | time 3163[s] | loss 1.72
| epoch 5 |  iter 6641 / 9295 | time 3165[s] | loss 1.71
| epoch 5 |  iter 6661 / 9295 | time 3166[s] | loss 1.69
| epoch 5 |  iter 6681 / 9295 | time 3167[s] | loss 1.72
| epoch 5 |  iter 6701 / 9295 | time 3169[s] | loss 1.71
| epoch 5 |  iter 6721 / 9295 | time 3170[s] | loss 1.76
| epoch 5 |  iter 6741 / 9295 | time 3172[s] | loss 1.73
| epoch 5 |  iter 6761 / 9295 | time 3173[s] | loss 1.72
| epoch 5 |  iter 6781 / 9295 | time 3175[s] | loss 1.75
| epoch 5 |  iter 6801 / 9295 | time 3176[s] | loss 1.73
| epoch 5 |  iter 6821 / 9295 | time 3177[s] | loss 1.74
| epoch 5 |  iter 6841 / 9295 | time 3179[s] | loss 1.74
| epoch 5 |  iter 6861 / 9295 | time 3180[s] | loss 1.71
| epoch 5 |  iter 6881 / 9295 | time 3182[s] | loss 1.71
| epoch 5 |  iter 6901 / 9295 | time 3183[s] | loss 1.71
| epoch 5 |  iter 6921 / 9295 | time 3184[s] | loss 1.69
| epoch 5 |  iter 6941 / 9295 | time 3186[s] | loss 1.79
| epoch 5 |  iter 6961 / 9295 | time 3187[s] | loss 1.73
| epoch 5 |  iter 6981 / 9295 | time 3189[s] | loss 1.73
| epoch 5 |  iter 7001 / 9295 | time 3190[s] | loss 1.75
| epoch 5 |  iter 7021 / 9295 | time 3192[s] | loss 1.72
| epoch 5 |  iter 7041 / 9295 | time 3193[s] | loss 1.76
| epoch 5 |  iter 7061 / 9295 | time 3195[s] | loss 1.72
| epoch 5 |  iter 7081 / 9295 | time 3196[s] | loss 1.70
| epoch 5 |  iter 7101 / 9295 | time 3197[s] | loss 1.74
| epoch 5 |  iter 7121 / 9295 | time 3199[s] | loss 1.76
| epoch 5 |  iter 7141 / 9295 | time 3200[s] | loss 1.75
| epoch 5 |  iter 7161 / 9295 | time 3202[s] | loss 1.75
| epoch 5 |  iter 7181 / 9295 | time 3203[s] | loss 1.73
| epoch 5 |  iter 7201 / 9295 | time 3205[s] | loss 1.72
| epoch 5 |  iter 7221 / 9295 | time 3206[s] | loss 1.76
| epoch 5 |  iter 7241 / 9295 | time 3208[s] | loss 1.75
| epoch 5 |  iter 7261 / 9295 | time 3209[s] | loss 1.69
| epoch 5 |  iter 7281 / 9295 | time 3210[s] | loss 1.72
| epoch 5 |  iter 7301 / 9295 | time 3212[s] | loss 1.72
| epoch 5 |  iter 7321 / 9295 | time 3213[s] | loss 1.70
| epoch 5 |  iter 7341 / 9295 | time 3215[s] | loss 1.70
| epoch 5 |  iter 7361 / 9295 | time 3216[s] | loss 1.75
| epoch 5 |  iter 7381 / 9295 | time 3218[s] | loss 1.72
| epoch 5 |  iter 7401 / 9295 | time 3219[s] | loss 1.71
| epoch 5 |  iter 7421 / 9295 | time 3220[s] | loss 1.71
| epoch 5 |  iter 7441 / 9295 | time 3222[s] | loss 1.73
| epoch 5 |  iter 7461 / 9295 | time 3223[s] | loss 1.71
| epoch 5 |  iter 7481 / 9295 | time 3225[s] | loss 1.72
| epoch 5 |  iter 7501 / 9295 | time 3226[s] | loss 1.71
| epoch 5 |  iter 7521 / 9295 | time 3228[s] | loss 1.72
| epoch 5 |  iter 7541 / 9295 | time 3230[s] | loss 1.69
| epoch 5 |  iter 7561 / 9295 | time 3232[s] | loss 1.75
| epoch 5 |  iter 7581 / 9295 | time 3233[s] | loss 1.73
| epoch 5 |  iter 7601 / 9295 | time 3235[s] | loss 1.72
| epoch 5 |  iter 7621 / 9295 | time 3237[s] | loss 1.69
| epoch 5 |  iter 7641 / 9295 | time 3239[s] | loss 1.75
| epoch 5 |  iter 7661 / 9295 | time 3241[s] | loss 1.73
| epoch 5 |  iter 7681 / 9295 | time 3243[s] | loss 1.70
| epoch 5 |  iter 7701 / 9295 | time 3244[s] | loss 1.75
| epoch 5 |  iter 7721 / 9295 | time 3246[s] | loss 1.77
| epoch 5 |  iter 7741 / 9295 | time 3247[s] | loss 1.72
| epoch 5 |  iter 7761 / 9295 | time 3249[s] | loss 1.70
| epoch 5 |  iter 7781 / 9295 | time 3250[s] | loss 1.72
| epoch 5 |  iter 7801 / 9295 | time 3252[s] | loss 1.73
| epoch 5 |  iter 7821 / 9295 | time 3253[s] | loss 1.77
| epoch 5 |  iter 7841 / 9295 | time 3255[s] | loss 1.71
| epoch 5 |  iter 7861 / 9295 | time 3257[s] | loss 1.73
| epoch 5 |  iter 7881 / 9295 | time 3258[s] | loss 1.73
| epoch 5 |  iter 7901 / 9295 | time 3260[s] | loss 1.75
| epoch 5 |  iter 7921 / 9295 | time 3261[s] | loss 1.77
| epoch 5 |  iter 7941 / 9295 | time 3263[s] | loss 1.68
| epoch 5 |  iter 7961 / 9295 | time 3265[s] | loss 1.72
| epoch 5 |  iter 7981 / 9295 | time 3266[s] | loss 1.74
| epoch 5 |  iter 8001 / 9295 | time 3268[s] | loss 1.67
| epoch 5 |  iter 8021 / 9295 | time 3270[s] | loss 1.77
| epoch 5 |  iter 8041 / 9295 | time 3271[s] | loss 1.73
| epoch 5 |  iter 8061 / 9295 | time 3272[s] | loss 1.71
| epoch 5 |  iter 8081 / 9295 | time 3274[s] | loss 1.69
| epoch 5 |  iter 8101 / 9295 | time 3275[s] | loss 1.72
| epoch 5 |  iter 8121 / 9295 | time 3276[s] | loss 1.77
| epoch 5 |  iter 8141 / 9295 | time 3278[s] | loss 1.71
| epoch 5 |  iter 8161 / 9295 | time 3279[s] | loss 1.71
| epoch 5 |  iter 8181 / 9295 | time 3280[s] | loss 1.67
| epoch 5 |  iter 8201 / 9295 | time 3282[s] | loss 1.74
| epoch 5 |  iter 8221 / 9295 | time 3283[s] | loss 1.72
| epoch 5 |  iter 8241 / 9295 | time 3284[s] | loss 1.74
| epoch 5 |  iter 8261 / 9295 | time 3285[s] | loss 1.71
| epoch 5 |  iter 8281 / 9295 | time 3287[s] | loss 1.72
| epoch 5 |  iter 8301 / 9295 | time 3288[s] | loss 1.76
| epoch 5 |  iter 8321 / 9295 | time 3290[s] | loss 1.73
| epoch 5 |  iter 8341 / 9295 | time 3291[s] | loss 1.72
| epoch 5 |  iter 8361 / 9295 | time 3292[s] | loss 1.72
| epoch 5 |  iter 8381 / 9295 | time 3294[s] | loss 1.71
| epoch 5 |  iter 8401 / 9295 | time 3295[s] | loss 1.70
| epoch 5 |  iter 8421 / 9295 | time 3297[s] | loss 1.71
| epoch 5 |  iter 8441 / 9295 | time 3298[s] | loss 1.70
| epoch 5 |  iter 8461 / 9295 | time 3300[s] | loss 1.73
| epoch 5 |  iter 8481 / 9295 | time 3301[s] | loss 1.72
| epoch 5 |  iter 8501 / 9295 | time 3302[s] | loss 1.70
| epoch 5 |  iter 8521 / 9295 | time 3304[s] | loss 1.75
| epoch 5 |  iter 8541 / 9295 | time 3305[s] | loss 1.70
| epoch 5 |  iter 8561 / 9295 | time 3307[s] | loss 1.72
| epoch 5 |  iter 8581 / 9295 | time 3308[s] | loss 1.72
| epoch 5 |  iter 8601 / 9295 | time 3309[s] | loss 1.72
| epoch 5 |  iter 8621 / 9295 | time 3311[s] | loss 1.71
| epoch 5 |  iter 8641 / 9295 | time 3312[s] | loss 1.75
| epoch 5 |  iter 8661 / 9295 | time 3314[s] | loss 1.74
| epoch 5 |  iter 8681 / 9295 | time 3315[s] | loss 1.75
| epoch 5 |  iter 8701 / 9295 | time 3317[s] | loss 1.71
| epoch 5 |  iter 8721 / 9295 | time 3318[s] | loss 1.73
| epoch 5 |  iter 8741 / 9295 | time 3319[s] | loss 1.68
| epoch 5 |  iter 8761 / 9295 | time 3321[s] | loss 1.68
| epoch 5 |  iter 8781 / 9295 | time 3323[s] | loss 1.76
| epoch 5 |  iter 8801 / 9295 | time 3324[s] | loss 1.73
| epoch 5 |  iter 8821 / 9295 | time 3326[s] | loss 1.70
| epoch 5 |  iter 8841 / 9295 | time 3327[s] | loss 1.74
| epoch 5 |  iter 8861 / 9295 | time 3328[s] | loss 1.75
| epoch 5 |  iter 8881 / 9295 | time 3330[s] | loss 1.68
| epoch 5 |  iter 8901 / 9295 | time 3331[s] | loss 1.71
| epoch 5 |  iter 8921 / 9295 | time 3333[s] | loss 1.71
| epoch 5 |  iter 8941 / 9295 | time 3334[s] | loss 1.73
| epoch 5 |  iter 8961 / 9295 | time 3336[s] | loss 1.73
| epoch 5 |  iter 8981 / 9295 | time 3337[s] | loss 1.72
| epoch 5 |  iter 9001 / 9295 | time 3338[s] | loss 1.69
| epoch 5 |  iter 9021 / 9295 | time 3340[s] | loss 1.76
| epoch 5 |  iter 9041 / 9295 | time 3341[s] | loss 1.71
| epoch 5 |  iter 9061 / 9295 | time 3343[s] | loss 1.67
| epoch 5 |  iter 9081 / 9295 | time 3344[s] | loss 1.74
| epoch 5 |  iter 9101 / 9295 | time 3346[s] | loss 1.70
| epoch 5 |  iter 9121 / 9295 | time 3347[s] | loss 1.69
| epoch 5 |  iter 9141 / 9295 | time 3348[s] | loss 1.72
| epoch 5 |  iter 9161 / 9295 | time 3350[s] | loss 1.73
| epoch 5 |  iter 9181 / 9295 | time 3351[s] | loss 1.69
| epoch 5 |  iter 9201 / 9295 | time 3353[s] | loss 1.73
| epoch 5 |  iter 9221 / 9295 | time 3355[s] | loss 1.74
| epoch 5 |  iter 9241 / 9295 | time 3356[s] | loss 1.72
| epoch 5 |  iter 9261 / 9295 | time 3358[s] | loss 1.74
| epoch 5 |  iter 9281 / 9295 | time 3359[s] | loss 1.72
| epoch 6 |  iter 1 / 9295 | time 3360[s] | loss 1.73
| epoch 6 |  iter 21 / 9295 | time 3362[s] | loss 1.64
| epoch 6 |  iter 41 / 9295 | time 3363[s] | loss 1.64
| epoch 6 |  iter 61 / 9295 | time 3365[s] | loss 1.68
| epoch 6 |  iter 81 / 9295 | time 3366[s] | loss 1.65
| epoch 6 |  iter 101 / 9295 | time 3368[s] | loss 1.63
| epoch 6 |  iter 121 / 9295 | time 3369[s] | loss 1.63
| epoch 6 |  iter 141 / 9295 | time 3370[s] | loss 1.68
| epoch 6 |  iter 161 / 9295 | time 3372[s] | loss 1.66
| epoch 6 |  iter 181 / 9295 | time 3373[s] | loss 1.63
| epoch 6 |  iter 201 / 9295 | time 3375[s] | loss 1.62
| epoch 6 |  iter 221 / 9295 | time 3376[s] | loss 1.66
| epoch 6 |  iter 241 / 9295 | time 3378[s] | loss 1.62
| epoch 6 |  iter 261 / 9295 | time 3379[s] | loss 1.64
| epoch 6 |  iter 281 / 9295 | time 3381[s] | loss 1.63
| epoch 6 |  iter 301 / 9295 | time 3382[s] | loss 1.67
| epoch 6 |  iter 321 / 9295 | time 3384[s] | loss 1.69
| epoch 6 |  iter 341 / 9295 | time 3386[s] | loss 1.66
| epoch 6 |  iter 361 / 9295 | time 3387[s] | loss 1.61
| epoch 6 |  iter 381 / 9295 | time 3389[s] | loss 1.62
| epoch 6 |  iter 401 / 9295 | time 3391[s] | loss 1.68
| epoch 6 |  iter 421 / 9295 | time 3393[s] | loss 1.65
| epoch 6 |  iter 441 / 9295 | time 3395[s] | loss 1.62
| epoch 6 |  iter 461 / 9295 | time 3396[s] | loss 1.66
| epoch 6 |  iter 481 / 9295 | time 3398[s] | loss 1.66
| epoch 6 |  iter 501 / 9295 | time 3400[s] | loss 1.66
| epoch 6 |  iter 521 / 9295 | time 3402[s] | loss 1.65
| epoch 6 |  iter 541 / 9295 | time 3404[s] | loss 1.64
| epoch 6 |  iter 561 / 9295 | time 3405[s] | loss 1.62
| epoch 6 |  iter 581 / 9295 | time 3407[s] | loss 1.59
| epoch 6 |  iter 601 / 9295 | time 3409[s] | loss 1.66
| epoch 6 |  iter 621 / 9295 | time 3411[s] | loss 1.62
| epoch 6 |  iter 641 / 9295 | time 3413[s] | loss 1.66
| epoch 6 |  iter 661 / 9295 | time 3414[s] | loss 1.68
| epoch 6 |  iter 681 / 9295 | time 3416[s] | loss 1.65
| epoch 6 |  iter 701 / 9295 | time 3418[s] | loss 1.64
| epoch 6 |  iter 721 / 9295 | time 3420[s] | loss 1.66
| epoch 6 |  iter 741 / 9295 | time 3421[s] | loss 1.61
| epoch 6 |  iter 761 / 9295 | time 3423[s] | loss 1.63
| epoch 6 |  iter 781 / 9295 | time 3425[s] | loss 1.66
| epoch 6 |  iter 801 / 9295 | time 3427[s] | loss 1.71
| epoch 6 |  iter 821 / 9295 | time 3429[s] | loss 1.65
| epoch 6 |  iter 841 / 9295 | time 3430[s] | loss 1.63
| epoch 6 |  iter 861 / 9295 | time 3432[s] | loss 1.67
| epoch 6 |  iter 881 / 9295 | time 3434[s] | loss 1.67
| epoch 6 |  iter 901 / 9295 | time 3436[s] | loss 1.63
| epoch 6 |  iter 921 / 9295 | time 3438[s] | loss 1.64
| epoch 6 |  iter 941 / 9295 | time 3439[s] | loss 1.60
| epoch 6 |  iter 961 / 9295 | time 3441[s] | loss 1.64
| epoch 6 |  iter 981 / 9295 | time 3442[s] | loss 1.66
| epoch 6 |  iter 1001 / 9295 | time 3443[s] | loss 1.68
| epoch 6 |  iter 1021 / 9295 | time 3445[s] | loss 1.63
| epoch 6 |  iter 1041 / 9295 | time 3446[s] | loss 1.64
| epoch 6 |  iter 1061 / 9295 | time 3448[s] | loss 1.62
| epoch 6 |  iter 1081 / 9295 | time 3449[s] | loss 1.62
| epoch 6 |  iter 1101 / 9295 | time 3451[s] | loss 1.65
| epoch 6 |  iter 1121 / 9295 | time 3452[s] | loss 1.67
| epoch 6 |  iter 1141 / 9295 | time 3454[s] | loss 1.65
| epoch 6 |  iter 1161 / 9295 | time 3455[s] | loss 1.64
| epoch 6 |  iter 1181 / 9295 | time 3457[s] | loss 1.67
| epoch 6 |  iter 1201 / 9295 | time 3458[s] | loss 1.64
| epoch 6 |  iter 1221 / 9295 | time 3460[s] | loss 1.62
| epoch 6 |  iter 1241 / 9295 | time 3461[s] | loss 1.60
| epoch 6 |  iter 1261 / 9295 | time 3462[s] | loss 1.62
| epoch 6 |  iter 1281 / 9295 | time 3464[s] | loss 1.68
| epoch 6 |  iter 1301 / 9295 | time 3465[s] | loss 1.64
| epoch 6 |  iter 1321 / 9295 | time 3467[s] | loss 1.63
| epoch 6 |  iter 1341 / 9295 | time 3468[s] | loss 1.61
| epoch 6 |  iter 1361 / 9295 | time 3470[s] | loss 1.65
| epoch 6 |  iter 1381 / 9295 | time 3471[s] | loss 1.60
| epoch 6 |  iter 1401 / 9295 | time 3473[s] | loss 1.66
| epoch 6 |  iter 1421 / 9295 | time 3474[s] | loss 1.65
| epoch 6 |  iter 1441 / 9295 | time 3476[s] | loss 1.68
| epoch 6 |  iter 1461 / 9295 | time 3477[s] | loss 1.63
| epoch 6 |  iter 1481 / 9295 | time 3478[s] | loss 1.65
| epoch 6 |  iter 1501 / 9295 | time 3480[s] | loss 1.66
| epoch 6 |  iter 1521 / 9295 | time 3481[s] | loss 1.65
| epoch 6 |  iter 1541 / 9295 | time 3483[s] | loss 1.64
| epoch 6 |  iter 1561 / 9295 | time 3484[s] | loss 1.68
| epoch 6 |  iter 1581 / 9295 | time 3485[s] | loss 1.66
| epoch 6 |  iter 1601 / 9295 | time 3487[s] | loss 1.62
| epoch 6 |  iter 1621 / 9295 | time 3488[s] | loss 1.67
| epoch 6 |  iter 1641 / 9295 | time 3490[s] | loss 1.64
| epoch 6 |  iter 1661 / 9295 | time 3491[s] | loss 1.65
| epoch 6 |  iter 1681 / 9295 | time 3493[s] | loss 1.62
| epoch 6 |  iter 1701 / 9295 | time 3494[s] | loss 1.67
| epoch 6 |  iter 1721 / 9295 | time 3496[s] | loss 1.63
| epoch 6 |  iter 1741 / 9295 | time 3497[s] | loss 1.65
| epoch 6 |  iter 1761 / 9295 | time 3498[s] | loss 1.66
| epoch 6 |  iter 1781 / 9295 | time 3500[s] | loss 1.65
| epoch 6 |  iter 1801 / 9295 | time 3501[s] | loss 1.64
| epoch 6 |  iter 1821 / 9295 | time 3503[s] | loss 1.69
| epoch 6 |  iter 1841 / 9295 | time 3504[s] | loss 1.63
| epoch 6 |  iter 1861 / 9295 | time 3506[s] | loss 1.64
| epoch 6 |  iter 1881 / 9295 | time 3507[s] | loss 1.61
| epoch 6 |  iter 1901 / 9295 | time 3509[s] | loss 1.65
| epoch 6 |  iter 1921 / 9295 | time 3511[s] | loss 1.64
| epoch 6 |  iter 1941 / 9295 | time 3512[s] | loss 1.66
| epoch 6 |  iter 1961 / 9295 | time 3514[s] | loss 1.66
| epoch 6 |  iter 1981 / 9295 | time 3515[s] | loss 1.64
| epoch 6 |  iter 2001 / 9295 | time 3516[s] | loss 1.66
| epoch 6 |  iter 2021 / 9295 | time 3518[s] | loss 1.61
| epoch 6 |  iter 2041 / 9295 | time 3519[s] | loss 1.60
| epoch 6 |  iter 2061 / 9295 | time 3520[s] | loss 1.67
| epoch 6 |  iter 2081 / 9295 | time 3522[s] | loss 1.66
| epoch 6 |  iter 2101 / 9295 | time 3523[s] | loss 1.64
| epoch 6 |  iter 2121 / 9295 | time 3524[s] | loss 1.67
| epoch 6 |  iter 2141 / 9295 | time 3526[s] | loss 1.66
| epoch 6 |  iter 2161 / 9295 | time 3528[s] | loss 1.62
| epoch 6 |  iter 2181 / 9295 | time 3529[s] | loss 1.65
| epoch 6 |  iter 2201 / 9295 | time 3530[s] | loss 1.65
| epoch 6 |  iter 2221 / 9295 | time 3532[s] | loss 1.66
| epoch 6 |  iter 2241 / 9295 | time 3533[s] | loss 1.65
| epoch 6 |  iter 2261 / 9295 | time 3535[s] | loss 1.63
| epoch 6 |  iter 2281 / 9295 | time 3536[s] | loss 1.66
| epoch 6 |  iter 2301 / 9295 | time 3537[s] | loss 1.64
| epoch 6 |  iter 2321 / 9295 | time 3538[s] | loss 1.64
| epoch 6 |  iter 2341 / 9295 | time 3540[s] | loss 1.64
| epoch 6 |  iter 2361 / 9295 | time 3541[s] | loss 1.65
| epoch 6 |  iter 2381 / 9295 | time 3542[s] | loss 1.65
| epoch 6 |  iter 2401 / 9295 | time 3544[s] | loss 1.63
| epoch 6 |  iter 2421 / 9295 | time 3545[s] | loss 1.67
| epoch 6 |  iter 2441 / 9295 | time 3546[s] | loss 1.70
| epoch 6 |  iter 2461 / 9295 | time 3548[s] | loss 1.63
| epoch 6 |  iter 2481 / 9295 | time 3549[s] | loss 1.68
| epoch 6 |  iter 2501 / 9295 | time 3550[s] | loss 1.65
| epoch 6 |  iter 2521 / 9295 | time 3551[s] | loss 1.66
| epoch 6 |  iter 2541 / 9295 | time 3553[s] | loss 1.64
| epoch 6 |  iter 2561 / 9295 | time 3554[s] | loss 1.62
| epoch 6 |  iter 2581 / 9295 | time 3555[s] | loss 1.70
| epoch 6 |  iter 2601 / 9295 | time 3557[s] | loss 1.63
| epoch 6 |  iter 2621 / 9295 | time 3558[s] | loss 1.69
| epoch 6 |  iter 2641 / 9295 | time 3559[s] | loss 1.65
| epoch 6 |  iter 2661 / 9295 | time 3561[s] | loss 1.66
| epoch 6 |  iter 2681 / 9295 | time 3562[s] | loss 1.67
| epoch 6 |  iter 2701 / 9295 | time 3564[s] | loss 1.62
| epoch 6 |  iter 2721 / 9295 | time 3565[s] | loss 1.67
| epoch 6 |  iter 2741 / 9295 | time 3567[s] | loss 1.68
| epoch 6 |  iter 2761 / 9295 | time 3569[s] | loss 1.65
| epoch 6 |  iter 2781 / 9295 | time 3570[s] | loss 1.65
| epoch 6 |  iter 2801 / 9295 | time 3571[s] | loss 1.69
| epoch 6 |  iter 2821 / 9295 | time 3573[s] | loss 1.63
| epoch 6 |  iter 2841 / 9295 | time 3574[s] | loss 1.66
| epoch 6 |  iter 2861 / 9295 | time 3576[s] | loss 1.64
| epoch 6 |  iter 2881 / 9295 | time 3577[s] | loss 1.67
| epoch 6 |  iter 2901 / 9295 | time 3579[s] | loss 1.64
| epoch 6 |  iter 2921 / 9295 | time 3580[s] | loss 1.64
| epoch 6 |  iter 2941 / 9295 | time 3582[s] | loss 1.61
| epoch 6 |  iter 2961 / 9295 | time 3583[s] | loss 1.69
| epoch 6 |  iter 2981 / 9295 | time 3585[s] | loss 1.66
| epoch 6 |  iter 3001 / 9295 | time 3586[s] | loss 1.65
| epoch 6 |  iter 3021 / 9295 | time 3588[s] | loss 1.64
| epoch 6 |  iter 3041 / 9295 | time 3589[s] | loss 1.67
| epoch 6 |  iter 3061 / 9295 | time 3591[s] | loss 1.68
| epoch 6 |  iter 3081 / 9295 | time 3592[s] | loss 1.66
| epoch 6 |  iter 3101 / 9295 | time 3594[s] | loss 1.64
| epoch 6 |  iter 3121 / 9295 | time 3595[s] | loss 1.63
| epoch 6 |  iter 3141 / 9295 | time 3597[s] | loss 1.68
| epoch 6 |  iter 3161 / 9295 | time 3598[s] | loss 1.67
| epoch 6 |  iter 3181 / 9295 | time 3600[s] | loss 1.67
| epoch 6 |  iter 3201 / 9295 | time 3601[s] | loss 1.65
| epoch 6 |  iter 3221 / 9295 | time 3602[s] | loss 1.68
| epoch 6 |  iter 3241 / 9295 | time 3604[s] | loss 1.63
| epoch 6 |  iter 3261 / 9295 | time 3605[s] | loss 1.64
| epoch 6 |  iter 3281 / 9295 | time 3607[s] | loss 1.64
| epoch 6 |  iter 3301 / 9295 | time 3608[s] | loss 1.70
| epoch 6 |  iter 3321 / 9295 | time 3609[s] | loss 1.65
| epoch 6 |  iter 3341 / 9295 | time 3611[s] | loss 1.65
| epoch 6 |  iter 3361 / 9295 | time 3612[s] | loss 1.66
| epoch 6 |  iter 3381 / 9295 | time 3614[s] | loss 1.69
| epoch 6 |  iter 3401 / 9295 | time 3615[s] | loss 1.64
| epoch 6 |  iter 3421 / 9295 | time 3616[s] | loss 1.65
| epoch 6 |  iter 3441 / 9295 | time 3618[s] | loss 1.63
| epoch 6 |  iter 3461 / 9295 | time 3619[s] | loss 1.70
| epoch 6 |  iter 3481 / 9295 | time 3621[s] | loss 1.68
| epoch 6 |  iter 3501 / 9295 | time 3622[s] | loss 1.66
| epoch 6 |  iter 3521 / 9295 | time 3624[s] | loss 1.67
| epoch 6 |  iter 3541 / 9295 | time 3625[s] | loss 1.64
| epoch 6 |  iter 3561 / 9295 | time 3627[s] | loss 1.65
| epoch 6 |  iter 3581 / 9295 | time 3628[s] | loss 1.67
| epoch 6 |  iter 3601 / 9295 | time 3630[s] | loss 1.61
| epoch 6 |  iter 3621 / 9295 | time 3631[s] | loss 1.66
| epoch 6 |  iter 3641 / 9295 | time 3633[s] | loss 1.64
| epoch 6 |  iter 3661 / 9295 | time 3634[s] | loss 1.66
| epoch 6 |  iter 3681 / 9295 | time 3636[s] | loss 1.64
| epoch 6 |  iter 3701 / 9295 | time 3637[s] | loss 1.66
| epoch 6 |  iter 3721 / 9295 | time 3638[s] | loss 1.65
| epoch 6 |  iter 3741 / 9295 | time 3640[s] | loss 1.65
| epoch 6 |  iter 3761 / 9295 | time 3641[s] | loss 1.68
| epoch 6 |  iter 3781 / 9295 | time 3643[s] | loss 1.67
| epoch 6 |  iter 3801 / 9295 | time 3644[s] | loss 1.69
| epoch 6 |  iter 3821 / 9295 | time 3646[s] | loss 1.65
| epoch 6 |  iter 3841 / 9295 | time 3647[s] | loss 1.63
| epoch 6 |  iter 3861 / 9295 | time 3649[s] | loss 1.67
| epoch 6 |  iter 3881 / 9295 | time 3650[s] | loss 1.66
| epoch 6 |  iter 3901 / 9295 | time 3652[s] | loss 1.65
| epoch 6 |  iter 3921 / 9295 | time 3653[s] | loss 1.65
| epoch 6 |  iter 3941 / 9295 | time 3655[s] | loss 1.66
| epoch 6 |  iter 3961 / 9295 | time 3656[s] | loss 1.67
| epoch 6 |  iter 3981 / 9295 | time 3658[s] | loss 1.68
| epoch 6 |  iter 4001 / 9295 | time 3659[s] | loss 1.63
| epoch 6 |  iter 4021 / 9295 | time 3660[s] | loss 1.65
| epoch 6 |  iter 4041 / 9295 | time 3662[s] | loss 1.65
| epoch 6 |  iter 4061 / 9295 | time 3663[s] | loss 1.63
| epoch 6 |  iter 4081 / 9295 | time 3665[s] | loss 1.67
| epoch 6 |  iter 4101 / 9295 | time 3666[s] | loss 1.67
| epoch 6 |  iter 4121 / 9295 | time 3668[s] | loss 1.66
| epoch 6 |  iter 4141 / 9295 | time 3669[s] | loss 1.63
| epoch 6 |  iter 4161 / 9295 | time 3670[s] | loss 1.64
| epoch 6 |  iter 4181 / 9295 | time 3672[s] | loss 1.68
| epoch 6 |  iter 4201 / 9295 | time 3673[s] | loss 1.65
| epoch 6 |  iter 4221 / 9295 | time 3675[s] | loss 1.64
| epoch 6 |  iter 4241 / 9295 | time 3676[s] | loss 1.63
| epoch 6 |  iter 4261 / 9295 | time 3678[s] | loss 1.70
| epoch 6 |  iter 4281 / 9295 | time 3679[s] | loss 1.64
| epoch 6 |  iter 4301 / 9295 | time 3681[s] | loss 1.66
| epoch 6 |  iter 4321 / 9295 | time 3682[s] | loss 1.65
| epoch 6 |  iter 4341 / 9295 | time 3683[s] | loss 1.66
| epoch 6 |  iter 4361 / 9295 | time 3685[s] | loss 1.64
| epoch 6 |  iter 4381 / 9295 | time 3686[s] | loss 1.67
| epoch 6 |  iter 4401 / 9295 | time 3688[s] | loss 1.68
| epoch 6 |  iter 4421 / 9295 | time 3689[s] | loss 1.67
| epoch 6 |  iter 4441 / 9295 | time 3691[s] | loss 1.63
| epoch 6 |  iter 4461 / 9295 | time 3692[s] | loss 1.67
| epoch 6 |  iter 4481 / 9295 | time 3694[s] | loss 1.68
| epoch 6 |  iter 4501 / 9295 | time 3695[s] | loss 1.67
| epoch 6 |  iter 4521 / 9295 | time 3697[s] | loss 1.69
| epoch 6 |  iter 4541 / 9295 | time 3698[s] | loss 1.60
| epoch 6 |  iter 4561 / 9295 | time 3699[s] | loss 1.67
| epoch 6 |  iter 4581 / 9295 | time 3701[s] | loss 1.68
| epoch 6 |  iter 4601 / 9295 | time 3702[s] | loss 1.64
| epoch 6 |  iter 4621 / 9295 | time 3704[s] | loss 1.63
| epoch 6 |  iter 4641 / 9295 | time 3705[s] | loss 1.63
| epoch 6 |  iter 4661 / 9295 | time 3707[s] | loss 1.68
| epoch 6 |  iter 4681 / 9295 | time 3708[s] | loss 1.68
| epoch 6 |  iter 4701 / 9295 | time 3710[s] | loss 1.68
| epoch 6 |  iter 4721 / 9295 | time 3711[s] | loss 1.61
| epoch 6 |  iter 4741 / 9295 | time 3713[s] | loss 1.69
| epoch 6 |  iter 4761 / 9295 | time 3714[s] | loss 1.66
| epoch 6 |  iter 4781 / 9295 | time 3715[s] | loss 1.66
| epoch 6 |  iter 4801 / 9295 | time 3717[s] | loss 1.64
| epoch 6 |  iter 4821 / 9295 | time 3718[s] | loss 1.67
| epoch 6 |  iter 4841 / 9295 | time 3720[s] | loss 1.65
| epoch 6 |  iter 4861 / 9295 | time 3721[s] | loss 1.66
| epoch 6 |  iter 4881 / 9295 | time 3723[s] | loss 1.65
| epoch 6 |  iter 4901 / 9295 | time 3724[s] | loss 1.69
| epoch 6 |  iter 4921 / 9295 | time 3726[s] | loss 1.64
| epoch 6 |  iter 4941 / 9295 | time 3727[s] | loss 1.65
| epoch 6 |  iter 4961 / 9295 | time 3729[s] | loss 1.62
| epoch 6 |  iter 4981 / 9295 | time 3730[s] | loss 1.65
| epoch 6 |  iter 5001 / 9295 | time 3731[s] | loss 1.59
| epoch 6 |  iter 5021 / 9295 | time 3733[s] | loss 1.63
| epoch 6 |  iter 5041 / 9295 | time 3734[s] | loss 1.66
| epoch 6 |  iter 5061 / 9295 | time 3736[s] | loss 1.64
| epoch 6 |  iter 5081 / 9295 | time 3737[s] | loss 1.65
| epoch 6 |  iter 5101 / 9295 | time 3739[s] | loss 1.64
| epoch 6 |  iter 5121 / 9295 | time 3741[s] | loss 1.65
| epoch 6 |  iter 5141 / 9295 | time 3742[s] | loss 1.70
| epoch 6 |  iter 5161 / 9295 | time 3743[s] | loss 1.64
| epoch 6 |  iter 5181 / 9295 | time 3745[s] | loss 1.67
| epoch 6 |  iter 5201 / 9295 | time 3746[s] | loss 1.64
| epoch 6 |  iter 5221 / 9295 | time 3748[s] | loss 1.66
| epoch 6 |  iter 5241 / 9295 | time 3749[s] | loss 1.68
| epoch 6 |  iter 5261 / 9295 | time 3751[s] | loss 1.67
| epoch 6 |  iter 5281 / 9295 | time 3752[s] | loss 1.66
| epoch 6 |  iter 5301 / 9295 | time 3754[s] | loss 1.63
| epoch 6 |  iter 5321 / 9295 | time 3755[s] | loss 1.65
| epoch 6 |  iter 5341 / 9295 | time 3757[s] | loss 1.68
| epoch 6 |  iter 5361 / 9295 | time 3758[s] | loss 1.62
| epoch 6 |  iter 5381 / 9295 | time 3760[s] | loss 1.59
| epoch 6 |  iter 5401 / 9295 | time 3761[s] | loss 1.66
| epoch 6 |  iter 5421 / 9295 | time 3763[s] | loss 1.66
| epoch 6 |  iter 5441 / 9295 | time 3764[s] | loss 1.69
| epoch 6 |  iter 5461 / 9295 | time 3766[s] | loss 1.64
| epoch 6 |  iter 5481 / 9295 | time 3767[s] | loss 1.63
| epoch 6 |  iter 5501 / 9295 | time 3769[s] | loss 1.66
| epoch 6 |  iter 5521 / 9295 | time 3771[s] | loss 1.70
| epoch 6 |  iter 5541 / 9295 | time 3772[s] | loss 1.65
| epoch 6 |  iter 5561 / 9295 | time 3774[s] | loss 1.65
| epoch 6 |  iter 5581 / 9295 | time 3775[s] | loss 1.62
| epoch 6 |  iter 5601 / 9295 | time 3777[s] | loss 1.68
| epoch 6 |  iter 5621 / 9295 | time 3778[s] | loss 1.64
| epoch 6 |  iter 5641 / 9295 | time 3780[s] | loss 1.64
| epoch 6 |  iter 5661 / 9295 | time 3781[s] | loss 1.64
| epoch 6 |  iter 5681 / 9295 | time 3783[s] | loss 1.66
| epoch 6 |  iter 5701 / 9295 | time 3784[s] | loss 1.63
| epoch 6 |  iter 5721 / 9295 | time 3785[s] | loss 1.67
| epoch 6 |  iter 5741 / 9295 | time 3787[s] | loss 1.64
| epoch 6 |  iter 5761 / 9295 | time 3788[s] | loss 1.65
| epoch 6 |  iter 5781 / 9295 | time 3790[s] | loss 1.63
| epoch 6 |  iter 5801 / 9295 | time 3791[s] | loss 1.67
| epoch 6 |  iter 5821 / 9295 | time 3793[s] | loss 1.65
| epoch 6 |  iter 5841 / 9295 | time 3794[s] | loss 1.65
| epoch 6 |  iter 5861 / 9295 | time 3796[s] | loss 1.65
| epoch 6 |  iter 5881 / 9295 | time 3797[s] | loss 1.62
| epoch 6 |  iter 5901 / 9295 | time 3799[s] | loss 1.66
| epoch 6 |  iter 5921 / 9295 | time 3800[s] | loss 1.64
| epoch 6 |  iter 5941 / 9295 | time 3801[s] | loss 1.65
| epoch 6 |  iter 5961 / 9295 | time 3803[s] | loss 1.67
| epoch 6 |  iter 5981 / 9295 | time 3804[s] | loss 1.65
| epoch 6 |  iter 6001 / 9295 | time 3806[s] | loss 1.69
| epoch 6 |  iter 6021 / 9295 | time 3807[s] | loss 1.67
| epoch 6 |  iter 6041 / 9295 | time 3809[s] | loss 1.65
| epoch 6 |  iter 6061 / 9295 | time 3810[s] | loss 1.62
| epoch 6 |  iter 6081 / 9295 | time 3812[s] | loss 1.69
| epoch 6 |  iter 6101 / 9295 | time 3813[s] | loss 1.62
| epoch 6 |  iter 6121 / 9295 | time 3815[s] | loss 1.65
| epoch 6 |  iter 6141 / 9295 | time 3816[s] | loss 1.63
| epoch 6 |  iter 6161 / 9295 | time 3818[s] | loss 1.65
| epoch 6 |  iter 6181 / 9295 | time 3819[s] | loss 1.66
| epoch 6 |  iter 6201 / 9295 | time 3821[s] | loss 1.68
| epoch 6 |  iter 6221 / 9295 | time 3822[s] | loss 1.68
| epoch 6 |  iter 6241 / 9295 | time 3824[s] | loss 1.64
| epoch 6 |  iter 6261 / 9295 | time 3825[s] | loss 1.70
| epoch 6 |  iter 6281 / 9295 | time 3827[s] | loss 1.64
| epoch 6 |  iter 6301 / 9295 | time 3828[s] | loss 1.64
| epoch 6 |  iter 6321 / 9295 | time 3830[s] | loss 1.65
| epoch 6 |  iter 6341 / 9295 | time 3831[s] | loss 1.63
| epoch 6 |  iter 6361 / 9295 | time 3833[s] | loss 1.61
| epoch 6 |  iter 6381 / 9295 | time 3834[s] | loss 1.65
| epoch 6 |  iter 6401 / 9295 | time 3836[s] | loss 1.62
| epoch 6 |  iter 6421 / 9295 | time 3837[s] | loss 1.69
| epoch 6 |  iter 6441 / 9295 | time 3838[s] | loss 1.62
| epoch 6 |  iter 6461 / 9295 | time 3840[s] | loss 1.65
| epoch 6 |  iter 6481 / 9295 | time 3841[s] | loss 1.64
| epoch 6 |  iter 6501 / 9295 | time 3843[s] | loss 1.66
| epoch 6 |  iter 6521 / 9295 | time 3844[s] | loss 1.70
| epoch 6 |  iter 6541 / 9295 | time 3846[s] | loss 1.65
| epoch 6 |  iter 6561 / 9295 | time 3847[s] | loss 1.69
| epoch 6 |  iter 6581 / 9295 | time 3849[s] | loss 1.64
| epoch 6 |  iter 6601 / 9295 | time 3850[s] | loss 1.64
| epoch 6 |  iter 6621 / 9295 | time 3852[s] | loss 1.65
| epoch 6 |  iter 6641 / 9295 | time 3853[s] | loss 1.62
| epoch 6 |  iter 6661 / 9295 | time 3855[s] | loss 1.63
| epoch 6 |  iter 6681 / 9295 | time 3856[s] | loss 1.68
| epoch 6 |  iter 6701 / 9295 | time 3858[s] | loss 1.66
| epoch 6 |  iter 6721 / 9295 | time 3859[s] | loss 1.63
| epoch 6 |  iter 6741 / 9295 | time 3861[s] | loss 1.66
| epoch 6 |  iter 6761 / 9295 | time 3862[s] | loss 1.66
| epoch 6 |  iter 6781 / 9295 | time 3864[s] | loss 1.66
| epoch 6 |  iter 6801 / 9295 | time 3865[s] | loss 1.65
| epoch 6 |  iter 6821 / 9295 | time 3867[s] | loss 1.65
| epoch 6 |  iter 6841 / 9295 | time 3868[s] | loss 1.63
| epoch 6 |  iter 6861 / 9295 | time 3869[s] | loss 1.63
| epoch 6 |  iter 6881 / 9295 | time 3871[s] | loss 1.66
| epoch 6 |  iter 6901 / 9295 | time 3872[s] | loss 1.62
| epoch 6 |  iter 6921 / 9295 | time 3874[s] | loss 1.61
| epoch 6 |  iter 6941 / 9295 | time 3875[s] | loss 1.70
| epoch 6 |  iter 6961 / 9295 | time 3877[s] | loss 1.60
| epoch 6 |  iter 6981 / 9295 | time 3878[s] | loss 1.65
| epoch 6 |  iter 7001 / 9295 | time 3880[s] | loss 1.62
| epoch 6 |  iter 7021 / 9295 | time 3882[s] | loss 1.65
| epoch 6 |  iter 7041 / 9295 | time 3883[s] | loss 1.60
| epoch 6 |  iter 7061 / 9295 | time 3885[s] | loss 1.67
| epoch 6 |  iter 7081 / 9295 | time 3886[s] | loss 1.68
| epoch 6 |  iter 7101 / 9295 | time 3888[s] | loss 1.65
| epoch 6 |  iter 7121 / 9295 | time 3889[s] | loss 1.66
| epoch 6 |  iter 7141 / 9295 | time 3891[s] | loss 1.68
| epoch 6 |  iter 7161 / 9295 | time 3892[s] | loss 1.61
| epoch 6 |  iter 7181 / 9295 | time 3894[s] | loss 1.73
| epoch 6 |  iter 7201 / 9295 | time 3895[s] | loss 1.67
| epoch 6 |  iter 7221 / 9295 | time 3897[s] | loss 1.65
| epoch 6 |  iter 7241 / 9295 | time 3898[s] | loss 1.67
| epoch 6 |  iter 7261 / 9295 | time 3899[s] | loss 1.65
| epoch 6 |  iter 7281 / 9295 | time 3901[s] | loss 1.62
| epoch 6 |  iter 7301 / 9295 | time 3902[s] | loss 1.70
| epoch 6 |  iter 7321 / 9295 | time 3904[s] | loss 1.67
| epoch 6 |  iter 7341 / 9295 | time 3905[s] | loss 1.67
| epoch 6 |  iter 7361 / 9295 | time 3907[s] | loss 1.62
| epoch 6 |  iter 7381 / 9295 | time 3909[s] | loss 1.68
| epoch 6 |  iter 7401 / 9295 | time 3910[s] | loss 1.63
| epoch 6 |  iter 7421 / 9295 | time 3912[s] | loss 1.63
| epoch 6 |  iter 7441 / 9295 | time 3913[s] | loss 1.61
| epoch 6 |  iter 7461 / 9295 | time 3915[s] | loss 1.67
| epoch 6 |  iter 7481 / 9295 | time 3916[s] | loss 1.64
| epoch 6 |  iter 7501 / 9295 | time 3917[s] | loss 1.67
| epoch 6 |  iter 7521 / 9295 | time 3919[s] | loss 1.68
| epoch 6 |  iter 7541 / 9295 | time 3920[s] | loss 1.65
| epoch 6 |  iter 7561 / 9295 | time 3922[s] | loss 1.64
| epoch 6 |  iter 7581 / 9295 | time 3923[s] | loss 1.67
| epoch 6 |  iter 7601 / 9295 | time 3925[s] | loss 1.68
| epoch 6 |  iter 7621 / 9295 | time 3926[s] | loss 1.65
| epoch 6 |  iter 7641 / 9295 | time 3928[s] | loss 1.66
| epoch 6 |  iter 7661 / 9295 | time 3929[s] | loss 1.65
| epoch 6 |  iter 7681 / 9295 | time 3931[s] | loss 1.61
| epoch 6 |  iter 7701 / 9295 | time 3932[s] | loss 1.65
| epoch 6 |  iter 7721 / 9295 | time 3934[s] | loss 1.66
| epoch 6 |  iter 7741 / 9295 | time 3935[s] | loss 1.62
| epoch 6 |  iter 7761 / 9295 | time 3937[s] | loss 1.70
| epoch 6 |  iter 7781 / 9295 | time 3938[s] | loss 1.68
| epoch 6 |  iter 7801 / 9295 | time 3940[s] | loss 1.66
| epoch 6 |  iter 7821 / 9295 | time 3941[s] | loss 1.66
| epoch 6 |  iter 7841 / 9295 | time 3943[s] | loss 1.69
| epoch 6 |  iter 7861 / 9295 | time 3944[s] | loss 1.66
| epoch 6 |  iter 7881 / 9295 | time 3946[s] | loss 1.65
| epoch 6 |  iter 7901 / 9295 | time 3947[s] | loss 1.67
| epoch 6 |  iter 7921 / 9295 | time 3949[s] | loss 1.68
| epoch 6 |  iter 7941 / 9295 | time 3950[s] | loss 1.63
| epoch 6 |  iter 7961 / 9295 | time 3952[s] | loss 1.65
| epoch 6 |  iter 7981 / 9295 | time 3953[s] | loss 1.65
| epoch 6 |  iter 8001 / 9295 | time 3955[s] | loss 1.65
| epoch 6 |  iter 8021 / 9295 | time 3957[s] | loss 1.64
| epoch 6 |  iter 8041 / 9295 | time 3958[s] | loss 1.65
| epoch 6 |  iter 8061 / 9295 | time 3960[s] | loss 1.64
| epoch 6 |  iter 8081 / 9295 | time 3961[s] | loss 1.68
| epoch 6 |  iter 8101 / 9295 | time 3963[s] | loss 1.65
| epoch 6 |  iter 8121 / 9295 | time 3965[s] | loss 1.67
| epoch 6 |  iter 8141 / 9295 | time 3966[s] | loss 1.63
| epoch 6 |  iter 8161 / 9295 | time 3968[s] | loss 1.65
| epoch 6 |  iter 8181 / 9295 | time 3970[s] | loss 1.63
| epoch 6 |  iter 8201 / 9295 | time 3971[s] | loss 1.66
| epoch 6 |  iter 8221 / 9295 | time 3973[s] | loss 1.64
| epoch 6 |  iter 8241 / 9295 | time 3974[s] | loss 1.65
| epoch 6 |  iter 8261 / 9295 | time 3976[s] | loss 1.69
| epoch 6 |  iter 8281 / 9295 | time 3977[s] | loss 1.61
| epoch 6 |  iter 8301 / 9295 | time 3979[s] | loss 1.70
| epoch 6 |  iter 8321 / 9295 | time 3981[s] | loss 1.62
| epoch 6 |  iter 8341 / 9295 | time 3983[s] | loss 1.68
| epoch 6 |  iter 8361 / 9295 | time 3984[s] | loss 1.68
| epoch 6 |  iter 8381 / 9295 | time 3986[s] | loss 1.65
| epoch 6 |  iter 8401 / 9295 | time 3988[s] | loss 1.65
| epoch 6 |  iter 8421 / 9295 | time 3989[s] | loss 1.66
| epoch 6 |  iter 8441 / 9295 | time 3991[s] | loss 1.63
| epoch 6 |  iter 8461 / 9295 | time 3993[s] | loss 1.63
| epoch 6 |  iter 8481 / 9295 | time 3995[s] | loss 1.65
| epoch 6 |  iter 8501 / 9295 | time 3996[s] | loss 1.65
| epoch 6 |  iter 8521 / 9295 | time 3998[s] | loss 1.71
| epoch 6 |  iter 8541 / 9295 | time 4000[s] | loss 1.65
| epoch 6 |  iter 8561 / 9295 | time 4001[s] | loss 1.61
| epoch 6 |  iter 8581 / 9295 | time 4003[s] | loss 1.66
| epoch 6 |  iter 8601 / 9295 | time 4005[s] | loss 1.64
| epoch 6 |  iter 8621 / 9295 | time 4006[s] | loss 1.63
| epoch 6 |  iter 8641 / 9295 | time 4008[s] | loss 1.66
| epoch 6 |  iter 8661 / 9295 | time 4010[s] | loss 1.63
| epoch 6 |  iter 8681 / 9295 | time 4012[s] | loss 1.64
| epoch 6 |  iter 8701 / 9295 | time 4013[s] | loss 1.67
| epoch 6 |  iter 8721 / 9295 | time 4015[s] | loss 1.64
| epoch 6 |  iter 8741 / 9295 | time 4017[s] | loss 1.67
| epoch 6 |  iter 8761 / 9295 | time 4018[s] | loss 1.65
| epoch 6 |  iter 8781 / 9295 | time 4020[s] | loss 1.66
| epoch 6 |  iter 8801 / 9295 | time 4022[s] | loss 1.70
| epoch 6 |  iter 8821 / 9295 | time 4023[s] | loss 1.63
| epoch 6 |  iter 8841 / 9295 | time 4025[s] | loss 1.63
| epoch 6 |  iter 8861 / 9295 | time 4027[s] | loss 1.65
| epoch 6 |  iter 8881 / 9295 | time 4028[s] | loss 1.65
| epoch 6 |  iter 8901 / 9295 | time 4030[s] | loss 1.65
| epoch 6 |  iter 8921 / 9295 | time 4032[s] | loss 1.70
| epoch 6 |  iter 8941 / 9295 | time 4034[s] | loss 1.63
| epoch 6 |  iter 8961 / 9295 | time 4035[s] | loss 1.63
| epoch 6 |  iter 8981 / 9295 | time 4037[s] | loss 1.60
| epoch 6 |  iter 9001 / 9295 | time 4039[s] | loss 1.65
| epoch 6 |  iter 9021 / 9295 | time 4040[s] | loss 1.65
| epoch 6 |  iter 9041 / 9295 | time 4042[s] | loss 1.63
| epoch 6 |  iter 9061 / 9295 | time 4044[s] | loss 1.63
| epoch 6 |  iter 9081 / 9295 | time 4046[s] | loss 1.66
| epoch 6 |  iter 9101 / 9295 | time 4047[s] | loss 1.69
| epoch 6 |  iter 9121 / 9295 | time 4049[s] | loss 1.66
| epoch 6 |  iter 9141 / 9295 | time 4051[s] | loss 1.64
| epoch 6 |  iter 9161 / 9295 | time 4052[s] | loss 1.66
| epoch 6 |  iter 9181 / 9295 | time 4054[s] | loss 1.68
| epoch 6 |  iter 9201 / 9295 | time 4056[s] | loss 1.60
| epoch 6 |  iter 9221 / 9295 | time 4057[s] | loss 1.64
| epoch 6 |  iter 9241 / 9295 | time 4059[s] | loss 1.66
| epoch 6 |  iter 9261 / 9295 | time 4060[s] | loss 1.64
| epoch 6 |  iter 9281 / 9295 | time 4061[s] | loss 1.62
| epoch 7 |  iter 1 / 9295 | time 4063[s] | loss 1.63
| epoch 7 |  iter 21 / 9295 | time 4064[s] | loss 1.56
| epoch 7 |  iter 41 / 9295 | time 4066[s] | loss 1.57
| epoch 7 |  iter 61 / 9295 | time 4067[s] | loss 1.57
| epoch 7 |  iter 81 / 9295 | time 4069[s] | loss 1.58
| epoch 7 |  iter 101 / 9295 | time 4070[s] | loss 1.55
| epoch 7 |  iter 121 / 9295 | time 4072[s] | loss 1.58
| epoch 7 |  iter 141 / 9295 | time 4073[s] | loss 1.55
| epoch 7 |  iter 161 / 9295 | time 4074[s] | loss 1.56
| epoch 7 |  iter 181 / 9295 | time 4076[s] | loss 1.56
| epoch 7 |  iter 201 / 9295 | time 4077[s] | loss 1.56
| epoch 7 |  iter 221 / 9295 | time 4079[s] | loss 1.56
| epoch 7 |  iter 241 / 9295 | time 4080[s] | loss 1.58
| epoch 7 |  iter 261 / 9295 | time 4081[s] | loss 1.60
| epoch 7 |  iter 281 / 9295 | time 4083[s] | loss 1.55
| epoch 7 |  iter 301 / 9295 | time 4084[s] | loss 1.55
| epoch 7 |  iter 321 / 9295 | time 4086[s] | loss 1.57
| epoch 7 |  iter 341 / 9295 | time 4087[s] | loss 1.55
| epoch 7 |  iter 361 / 9295 | time 4088[s] | loss 1.60
| epoch 7 |  iter 381 / 9295 | time 4090[s] | loss 1.53
| epoch 7 |  iter 401 / 9295 | time 4091[s] | loss 1.59
| epoch 7 |  iter 421 / 9295 | time 4093[s] | loss 1.60
| epoch 7 |  iter 441 / 9295 | time 4094[s] | loss 1.57
| epoch 7 |  iter 461 / 9295 | time 4095[s] | loss 1.58
| epoch 7 |  iter 481 / 9295 | time 4097[s] | loss 1.56
| epoch 7 |  iter 501 / 9295 | time 4098[s] | loss 1.57
| epoch 7 |  iter 521 / 9295 | time 4099[s] | loss 1.58
| epoch 7 |  iter 541 / 9295 | time 4101[s] | loss 1.51
| epoch 7 |  iter 561 / 9295 | time 4102[s] | loss 1.58
| epoch 7 |  iter 581 / 9295 | time 4104[s] | loss 1.58
| epoch 7 |  iter 601 / 9295 | time 4105[s] | loss 1.60
| epoch 7 |  iter 621 / 9295 | time 4106[s] | loss 1.59
| epoch 7 |  iter 641 / 9295 | time 4108[s] | loss 1.58
| epoch 7 |  iter 661 / 9295 | time 4109[s] | loss 1.59
| epoch 7 |  iter 681 / 9295 | time 4111[s] | loss 1.52
| epoch 7 |  iter 701 / 9295 | time 4112[s] | loss 1.58
| epoch 7 |  iter 721 / 9295 | time 4113[s] | loss 1.58
| epoch 7 |  iter 741 / 9295 | time 4115[s] | loss 1.58
| epoch 7 |  iter 761 / 9295 | time 4116[s] | loss 1.55
| epoch 7 |  iter 781 / 9295 | time 4118[s] | loss 1.57
| epoch 7 |  iter 801 / 9295 | time 4119[s] | loss 1.56
| epoch 7 |  iter 821 / 9295 | time 4120[s] | loss 1.63
| epoch 7 |  iter 841 / 9295 | time 4122[s] | loss 1.63
| epoch 7 |  iter 861 / 9295 | time 4123[s] | loss 1.61
| epoch 7 |  iter 881 / 9295 | time 4125[s] | loss 1.58
| epoch 7 |  iter 901 / 9295 | time 4126[s] | loss 1.60
| epoch 7 |  iter 921 / 9295 | time 4128[s] | loss 1.56
| epoch 7 |  iter 941 / 9295 | time 4130[s] | loss 1.59
| epoch 7 |  iter 961 / 9295 | time 4131[s] | loss 1.58
| epoch 7 |  iter 981 / 9295 | time 4133[s] | loss 1.60
| epoch 7 |  iter 1001 / 9295 | time 4134[s] | loss 1.60
| epoch 7 |  iter 1021 / 9295 | time 4136[s] | loss 1.62
| epoch 7 |  iter 1041 / 9295 | time 4137[s] | loss 1.56
| epoch 7 |  iter 1061 / 9295 | time 4139[s] | loss 1.57
| epoch 7 |  iter 1081 / 9295 | time 4140[s] | loss 1.59
| epoch 7 |  iter 1101 / 9295 | time 4142[s] | loss 1.57
| epoch 7 |  iter 1121 / 9295 | time 4143[s] | loss 1.59
| epoch 7 |  iter 1141 / 9295 | time 4145[s] | loss 1.62
| epoch 7 |  iter 1161 / 9295 | time 4146[s] | loss 1.59
| epoch 7 |  iter 1181 / 9295 | time 4148[s] | loss 1.53
| epoch 7 |  iter 1201 / 9295 | time 4149[s] | loss 1.57
| epoch 7 |  iter 1221 / 9295 | time 4151[s] | loss 1.57
| epoch 7 |  iter 1241 / 9295 | time 4152[s] | loss 1.58
| epoch 7 |  iter 1261 / 9295 | time 4154[s] | loss 1.56
| epoch 7 |  iter 1281 / 9295 | time 4155[s] | loss 1.58
| epoch 7 |  iter 1301 / 9295 | time 4157[s] | loss 1.58
| epoch 7 |  iter 1321 / 9295 | time 4158[s] | loss 1.54
| epoch 7 |  iter 1341 / 9295 | time 4160[s] | loss 1.59
| epoch 7 |  iter 1361 / 9295 | time 4161[s] | loss 1.56
| epoch 7 |  iter 1381 / 9295 | time 4162[s] | loss 1.54
| epoch 7 |  iter 1401 / 9295 | time 4164[s] | loss 1.58
| epoch 7 |  iter 1421 / 9295 | time 4165[s] | loss 1.58
| epoch 7 |  iter 1441 / 9295 | time 4167[s] | loss 1.59
| epoch 7 |  iter 1461 / 9295 | time 4168[s] | loss 1.57
| epoch 7 |  iter 1481 / 9295 | time 4170[s] | loss 1.61
| epoch 7 |  iter 1501 / 9295 | time 4171[s] | loss 1.59
| epoch 7 |  iter 1521 / 9295 | time 4173[s] | loss 1.54
| epoch 7 |  iter 1541 / 9295 | time 4174[s] | loss 1.56
| epoch 7 |  iter 1561 / 9295 | time 4176[s] | loss 1.57
| epoch 7 |  iter 1581 / 9295 | time 4177[s] | loss 1.62
| epoch 7 |  iter 1601 / 9295 | time 4179[s] | loss 1.56
| epoch 7 |  iter 1621 / 9295 | time 4180[s] | loss 1.62
| epoch 7 |  iter 1641 / 9295 | time 4182[s] | loss 1.56
| epoch 7 |  iter 1661 / 9295 | time 4183[s] | loss 1.57
| epoch 7 |  iter 1681 / 9295 | time 4185[s] | loss 1.55
| epoch 7 |  iter 1701 / 9295 | time 4186[s] | loss 1.61
| epoch 7 |  iter 1721 / 9295 | time 4188[s] | loss 1.56
| epoch 7 |  iter 1741 / 9295 | time 4189[s] | loss 1.57
| epoch 7 |  iter 1761 / 9295 | time 4191[s] | loss 1.61
| epoch 7 |  iter 1781 / 9295 | time 4192[s] | loss 1.63
| epoch 7 |  iter 1801 / 9295 | time 4194[s] | loss 1.59
| epoch 7 |  iter 1821 / 9295 | time 4196[s] | loss 1.62
| epoch 7 |  iter 1841 / 9295 | time 4197[s] | loss 1.59
| epoch 7 |  iter 1861 / 9295 | time 4199[s] | loss 1.61
| epoch 7 |  iter 1881 / 9295 | time 4200[s] | loss 1.55
| epoch 7 |  iter 1901 / 9295 | time 4201[s] | loss 1.59
| epoch 7 |  iter 1921 / 9295 | time 4203[s] | loss 1.58
| epoch 7 |  iter 1941 / 9295 | time 4204[s] | loss 1.56
| epoch 7 |  iter 1961 / 9295 | time 4206[s] | loss 1.61
| epoch 7 |  iter 1981 / 9295 | time 4207[s] | loss 1.60
| epoch 7 |  iter 2001 / 9295 | time 4209[s] | loss 1.56
| epoch 7 |  iter 2021 / 9295 | time 4210[s] | loss 1.58
| epoch 7 |  iter 2041 / 9295 | time 4211[s] | loss 1.57
| epoch 7 |  iter 2061 / 9295 | time 4213[s] | loss 1.61
| epoch 7 |  iter 2081 / 9295 | time 4214[s] | loss 1.61
| epoch 7 |  iter 2101 / 9295 | time 4216[s] | loss 1.57
| epoch 7 |  iter 2121 / 9295 | time 4217[s] | loss 1.60
| epoch 7 |  iter 2141 / 9295 | time 4218[s] | loss 1.55
| epoch 7 |  iter 2161 / 9295 | time 4220[s] | loss 1.56
| epoch 7 |  iter 2181 / 9295 | time 4221[s] | loss 1.59
| epoch 7 |  iter 2201 / 9295 | time 4223[s] | loss 1.59
| epoch 7 |  iter 2221 / 9295 | time 4224[s] | loss 1.57
| epoch 7 |  iter 2241 / 9295 | time 4226[s] | loss 1.56
| epoch 7 |  iter 2261 / 9295 | time 4227[s] | loss 1.57
| epoch 7 |  iter 2281 / 9295 | time 4229[s] | loss 1.59
| epoch 7 |  iter 2301 / 9295 | time 4230[s] | loss 1.59
| epoch 7 |  iter 2321 / 9295 | time 4232[s] | loss 1.61
| epoch 7 |  iter 2341 / 9295 | time 4233[s] | loss 1.59
| epoch 7 |  iter 2361 / 9295 | time 4234[s] | loss 1.60
| epoch 7 |  iter 2381 / 9295 | time 4236[s] | loss 1.56
| epoch 7 |  iter 2401 / 9295 | time 4237[s] | loss 1.62
| epoch 7 |  iter 2421 / 9295 | time 4238[s] | loss 1.59
| epoch 7 |  iter 2441 / 9295 | time 4240[s] | loss 1.56
| epoch 7 |  iter 2461 / 9295 | time 4241[s] | loss 1.60
| epoch 7 |  iter 2481 / 9295 | time 4243[s] | loss 1.62
| epoch 7 |  iter 2501 / 9295 | time 4244[s] | loss 1.59
| epoch 7 |  iter 2521 / 9295 | time 4245[s] | loss 1.56
| epoch 7 |  iter 2541 / 9295 | time 4247[s] | loss 1.59
| epoch 7 |  iter 2561 / 9295 | time 4248[s] | loss 1.55
| epoch 7 |  iter 2581 / 9295 | time 4249[s] | loss 1.53
| epoch 7 |  iter 2601 / 9295 | time 4251[s] | loss 1.58
| epoch 7 |  iter 2621 / 9295 | time 4252[s] | loss 1.62
| epoch 7 |  iter 2641 / 9295 | time 4254[s] | loss 1.60
| epoch 7 |  iter 2661 / 9295 | time 4255[s] | loss 1.59
| epoch 7 |  iter 2681 / 9295 | time 4256[s] | loss 1.63
| epoch 7 |  iter 2701 / 9295 | time 4258[s] | loss 1.55
| epoch 7 |  iter 2721 / 9295 | time 4259[s] | loss 1.58
| epoch 7 |  iter 2741 / 9295 | time 4261[s] | loss 1.59
| epoch 7 |  iter 2761 / 9295 | time 4262[s] | loss 1.56
| epoch 7 |  iter 2781 / 9295 | time 4264[s] | loss 1.60
| epoch 7 |  iter 2801 / 9295 | time 4265[s] | loss 1.55
| epoch 7 |  iter 2821 / 9295 | time 4267[s] | loss 1.60
| epoch 7 |  iter 2841 / 9295 | time 4268[s] | loss 1.60
| epoch 7 |  iter 2861 / 9295 | time 4269[s] | loss 1.60
| epoch 7 |  iter 2881 / 9295 | time 4271[s] | loss 1.55
| epoch 7 |  iter 2901 / 9295 | time 4272[s] | loss 1.61
| epoch 7 |  iter 2921 / 9295 | time 4274[s] | loss 1.60
| epoch 7 |  iter 2941 / 9295 | time 4275[s] | loss 1.59
| epoch 7 |  iter 2961 / 9295 | time 4277[s] | loss 1.62
| epoch 7 |  iter 2981 / 9295 | time 4278[s] | loss 1.60
| epoch 7 |  iter 3001 / 9295 | time 4279[s] | loss 1.57
| epoch 7 |  iter 3021 / 9295 | time 4281[s] | loss 1.59
| epoch 7 |  iter 3041 / 9295 | time 4282[s] | loss 1.58
| epoch 7 |  iter 3061 / 9295 | time 4284[s] | loss 1.58
| epoch 7 |  iter 3081 / 9295 | time 4286[s] | loss 1.59
| epoch 7 |  iter 3101 / 9295 | time 4287[s] | loss 1.60
| epoch 7 |  iter 3121 / 9295 | time 4289[s] | loss 1.60
| epoch 7 |  iter 3141 / 9295 | time 4290[s] | loss 1.57
| epoch 7 |  iter 3161 / 9295 | time 4292[s] | loss 1.62
| epoch 7 |  iter 3181 / 9295 | time 4293[s] | loss 1.58
| epoch 7 |  iter 3201 / 9295 | time 4295[s] | loss 1.56
| epoch 7 |  iter 3221 / 9295 | time 4296[s] | loss 1.56
| epoch 7 |  iter 3241 / 9295 | time 4298[s] | loss 1.57
| epoch 7 |  iter 3261 / 9295 | time 4299[s] | loss 1.58
| epoch 7 |  iter 3281 / 9295 | time 4301[s] | loss 1.57
| epoch 7 |  iter 3301 / 9295 | time 4302[s] | loss 1.58
| epoch 7 |  iter 3321 / 9295 | time 4304[s] | loss 1.55
| epoch 7 |  iter 3341 / 9295 | time 4306[s] | loss 1.58
| epoch 7 |  iter 3361 / 9295 | time 4307[s] | loss 1.56
| epoch 7 |  iter 3381 / 9295 | time 4308[s] | loss 1.62
| epoch 7 |  iter 3401 / 9295 | time 4310[s] | loss 1.60
| epoch 7 |  iter 3421 / 9295 | time 4311[s] | loss 1.61
| epoch 7 |  iter 3441 / 9295 | time 4313[s] | loss 1.58
| epoch 7 |  iter 3461 / 9295 | time 4315[s] | loss 1.59
| epoch 7 |  iter 3481 / 9295 | time 4316[s] | loss 1.61
| epoch 7 |  iter 3501 / 9295 | time 4318[s] | loss 1.58
| epoch 7 |  iter 3521 / 9295 | time 4320[s] | loss 1.60
| epoch 7 |  iter 3541 / 9295 | time 4321[s] | loss 1.61
| epoch 7 |  iter 3561 / 9295 | time 4322[s] | loss 1.63
| epoch 7 |  iter 3581 / 9295 | time 4324[s] | loss 1.54
| epoch 7 |  iter 3601 / 9295 | time 4325[s] | loss 1.61
| epoch 7 |  iter 3621 / 9295 | time 4326[s] | loss 1.60
| epoch 7 |  iter 3641 / 9295 | time 4328[s] | loss 1.63
| epoch 7 |  iter 3661 / 9295 | time 4329[s] | loss 1.59
| epoch 7 |  iter 3681 / 9295 | time 4330[s] | loss 1.54
| epoch 7 |  iter 3701 / 9295 | time 4332[s] | loss 1.58
| epoch 7 |  iter 3721 / 9295 | time 4333[s] | loss 1.64
| epoch 7 |  iter 3741 / 9295 | time 4335[s] | loss 1.55
| epoch 7 |  iter 3761 / 9295 | time 4336[s] | loss 1.59
| epoch 7 |  iter 3781 / 9295 | time 4337[s] | loss 1.57
| epoch 7 |  iter 3801 / 9295 | time 4339[s] | loss 1.62
| epoch 7 |  iter 3821 / 9295 | time 4340[s] | loss 1.60
| epoch 7 |  iter 3841 / 9295 | time 4341[s] | loss 1.59
| epoch 7 |  iter 3861 / 9295 | time 4343[s] | loss 1.59
| epoch 7 |  iter 3881 / 9295 | time 4344[s] | loss 1.57
| epoch 7 |  iter 3901 / 9295 | time 4345[s] | loss 1.58
| epoch 7 |  iter 3921 / 9295 | time 4347[s] | loss 1.56
| epoch 7 |  iter 3941 / 9295 | time 4348[s] | loss 1.60
| epoch 7 |  iter 3961 / 9295 | time 4349[s] | loss 1.58
| epoch 7 |  iter 3981 / 9295 | time 4350[s] | loss 1.62
| epoch 7 |  iter 4001 / 9295 | time 4352[s] | loss 1.58
| epoch 7 |  iter 4021 / 9295 | time 4353[s] | loss 1.57
| epoch 7 |  iter 4041 / 9295 | time 4354[s] | loss 1.62
| epoch 7 |  iter 4061 / 9295 | time 4356[s] | loss 1.58
| epoch 7 |  iter 4081 / 9295 | time 4357[s] | loss 1.58
| epoch 7 |  iter 4101 / 9295 | time 4358[s] | loss 1.61
| epoch 7 |  iter 4121 / 9295 | time 4360[s] | loss 1.56
| epoch 7 |  iter 4141 / 9295 | time 4361[s] | loss 1.62
| epoch 7 |  iter 4161 / 9295 | time 4362[s] | loss 1.62
| epoch 7 |  iter 4181 / 9295 | time 4363[s] | loss 1.55
| epoch 7 |  iter 4201 / 9295 | time 4365[s] | loss 1.56
| epoch 7 |  iter 4221 / 9295 | time 4366[s] | loss 1.59
| epoch 7 |  iter 4241 / 9295 | time 4367[s] | loss 1.64
| epoch 7 |  iter 4261 / 9295 | time 4369[s] | loss 1.58
| epoch 7 |  iter 4281 / 9295 | time 4370[s] | loss 1.63
| epoch 7 |  iter 4301 / 9295 | time 4371[s] | loss 1.59
| epoch 7 |  iter 4321 / 9295 | time 4373[s] | loss 1.59
| epoch 7 |  iter 4341 / 9295 | time 4374[s] | loss 1.61
| epoch 7 |  iter 4361 / 9295 | time 4375[s] | loss 1.57
| epoch 7 |  iter 4381 / 9295 | time 4377[s] | loss 1.60
| epoch 7 |  iter 4401 / 9295 | time 4378[s] | loss 1.62
| epoch 7 |  iter 4421 / 9295 | time 4379[s] | loss 1.63
| epoch 7 |  iter 4441 / 9295 | time 4381[s] | loss 1.61
| epoch 7 |  iter 4461 / 9295 | time 4382[s] | loss 1.58
| epoch 7 |  iter 4481 / 9295 | time 4383[s] | loss 1.58
| epoch 7 |  iter 4501 / 9295 | time 4384[s] | loss 1.62
| epoch 7 |  iter 4521 / 9295 | time 4386[s] | loss 1.60
| epoch 7 |  iter 4541 / 9295 | time 4387[s] | loss 1.62
| epoch 7 |  iter 4561 / 9295 | time 4388[s] | loss 1.56
| epoch 7 |  iter 4581 / 9295 | time 4390[s] | loss 1.58
| epoch 7 |  iter 4601 / 9295 | time 4391[s] | loss 1.60
| epoch 7 |  iter 4621 / 9295 | time 4392[s] | loss 1.61
| epoch 7 |  iter 4641 / 9295 | time 4394[s] | loss 1.57
| epoch 7 |  iter 4661 / 9295 | time 4395[s] | loss 1.61
| epoch 7 |  iter 4681 / 9295 | time 4396[s] | loss 1.61
| epoch 7 |  iter 4701 / 9295 | time 4398[s] | loss 1.58
| epoch 7 |  iter 4721 / 9295 | time 4399[s] | loss 1.61
| epoch 7 |  iter 4741 / 9295 | time 4400[s] | loss 1.56
| epoch 7 |  iter 4761 / 9295 | time 4402[s] | loss 1.57
| epoch 7 |  iter 4781 / 9295 | time 4403[s] | loss 1.55
| epoch 7 |  iter 4801 / 9295 | time 4404[s] | loss 1.60
| epoch 7 |  iter 4821 / 9295 | time 4406[s] | loss 1.59
| epoch 7 |  iter 4841 / 9295 | time 4407[s] | loss 1.53
| epoch 7 |  iter 4861 / 9295 | time 4408[s] | loss 1.61
| epoch 7 |  iter 4881 / 9295 | time 4409[s] | loss 1.57
| epoch 7 |  iter 4901 / 9295 | time 4411[s] | loss 1.59
| epoch 7 |  iter 4921 / 9295 | time 4412[s] | loss 1.59
| epoch 7 |  iter 4941 / 9295 | time 4413[s] | loss 1.60
| epoch 7 |  iter 4961 / 9295 | time 4415[s] | loss 1.61
| epoch 7 |  iter 4981 / 9295 | time 4416[s] | loss 1.59
| epoch 7 |  iter 5001 / 9295 | time 4417[s] | loss 1.58
| epoch 7 |  iter 5021 / 9295 | time 4418[s] | loss 1.56
| epoch 7 |  iter 5041 / 9295 | time 4420[s] | loss 1.57
| epoch 7 |  iter 5061 / 9295 | time 4421[s] | loss 1.58
| epoch 7 |  iter 5081 / 9295 | time 4422[s] | loss 1.56
| epoch 7 |  iter 5101 / 9295 | time 4424[s] | loss 1.62
| epoch 7 |  iter 5121 / 9295 | time 4425[s] | loss 1.61
| epoch 7 |  iter 5141 / 9295 | time 4426[s] | loss 1.57
| epoch 7 |  iter 5161 / 9295 | time 4428[s] | loss 1.61
| epoch 7 |  iter 5181 / 9295 | time 4429[s] | loss 1.63
| epoch 7 |  iter 5201 / 9295 | time 4430[s] | loss 1.62
| epoch 7 |  iter 5221 / 9295 | time 4431[s] | loss 1.58
| epoch 7 |  iter 5241 / 9295 | time 4433[s] | loss 1.58
| epoch 7 |  iter 5261 / 9295 | time 4434[s] | loss 1.57
| epoch 7 |  iter 5281 / 9295 | time 4436[s] | loss 1.60
| epoch 7 |  iter 5301 / 9295 | time 4437[s] | loss 1.54
| epoch 7 |  iter 5321 / 9295 | time 4438[s] | loss 1.63
| epoch 7 |  iter 5341 / 9295 | time 4440[s] | loss 1.62
| epoch 7 |  iter 5361 / 9295 | time 4441[s] | loss 1.58
| epoch 7 |  iter 5381 / 9295 | time 4442[s] | loss 1.59
| epoch 7 |  iter 5401 / 9295 | time 4443[s] | loss 1.60
| epoch 7 |  iter 5421 / 9295 | time 4445[s] | loss 1.60
| epoch 7 |  iter 5441 / 9295 | time 4446[s] | loss 1.57
| epoch 7 |  iter 5461 / 9295 | time 4447[s] | loss 1.62
| epoch 7 |  iter 5481 / 9295 | time 4449[s] | loss 1.63
| epoch 7 |  iter 5501 / 9295 | time 4450[s] | loss 1.59
| epoch 7 |  iter 5521 / 9295 | time 4451[s] | loss 1.61
| epoch 7 |  iter 5541 / 9295 | time 4453[s] | loss 1.57
| epoch 7 |  iter 5561 / 9295 | time 4454[s] | loss 1.59
| epoch 7 |  iter 5581 / 9295 | time 4455[s] | loss 1.55
| epoch 7 |  iter 5601 / 9295 | time 4456[s] | loss 1.54
| epoch 7 |  iter 5621 / 9295 | time 4458[s] | loss 1.63
| epoch 7 |  iter 5641 / 9295 | time 4459[s] | loss 1.57
| epoch 7 |  iter 5661 / 9295 | time 4460[s] | loss 1.60
| epoch 7 |  iter 5681 / 9295 | time 4462[s] | loss 1.64
| epoch 7 |  iter 5701 / 9295 | time 4463[s] | loss 1.58
| epoch 7 |  iter 5721 / 9295 | time 4464[s] | loss 1.56
| epoch 7 |  iter 5741 / 9295 | time 4465[s] | loss 1.64
| epoch 7 |  iter 5761 / 9295 | time 4467[s] | loss 1.61
| epoch 7 |  iter 5781 / 9295 | time 4468[s] | loss 1.57
| epoch 7 |  iter 5801 / 9295 | time 4469[s] | loss 1.55
| epoch 7 |  iter 5821 / 9295 | time 4471[s] | loss 1.59
| epoch 7 |  iter 5841 / 9295 | time 4472[s] | loss 1.59
| epoch 7 |  iter 5861 / 9295 | time 4473[s] | loss 1.58
| epoch 7 |  iter 5881 / 9295 | time 4475[s] | loss 1.59
| epoch 7 |  iter 5901 / 9295 | time 4476[s] | loss 1.60
| epoch 7 |  iter 5921 / 9295 | time 4477[s] | loss 1.61
| epoch 7 |  iter 5941 / 9295 | time 4479[s] | loss 1.57
| epoch 7 |  iter 5961 / 9295 | time 4480[s] | loss 1.61
| epoch 7 |  iter 5981 / 9295 | time 4481[s] | loss 1.60
| epoch 7 |  iter 6001 / 9295 | time 4482[s] | loss 1.61
| epoch 7 |  iter 6021 / 9295 | time 4484[s] | loss 1.59
| epoch 7 |  iter 6041 / 9295 | time 4485[s] | loss 1.61
| epoch 7 |  iter 6061 / 9295 | time 4486[s] | loss 1.61
| epoch 7 |  iter 6081 / 9295 | time 4488[s] | loss 1.59
| epoch 7 |  iter 6101 / 9295 | time 4489[s] | loss 1.58
| epoch 7 |  iter 6121 / 9295 | time 4490[s] | loss 1.64
| epoch 7 |  iter 6141 / 9295 | time 4492[s] | loss 1.63
| epoch 7 |  iter 6161 / 9295 | time 4493[s] | loss 1.65
| epoch 7 |  iter 6181 / 9295 | time 4494[s] | loss 1.59
| epoch 7 |  iter 6201 / 9295 | time 4496[s] | loss 1.58
| epoch 7 |  iter 6221 / 9295 | time 4497[s] | loss 1.57
| epoch 7 |  iter 6241 / 9295 | time 4498[s] | loss 1.58
| epoch 7 |  iter 6261 / 9295 | time 4499[s] | loss 1.62
| epoch 7 |  iter 6281 / 9295 | time 4501[s] | loss 1.60
| epoch 7 |  iter 6301 / 9295 | time 4502[s] | loss 1.57
| epoch 7 |  iter 6321 / 9295 | time 4503[s] | loss 1.60
| epoch 7 |  iter 6341 / 9295 | time 4505[s] | loss 1.59
| epoch 7 |  iter 6361 / 9295 | time 4506[s] | loss 1.58
| epoch 7 |  iter 6381 / 9295 | time 4507[s] | loss 1.60
| epoch 7 |  iter 6401 / 9295 | time 4509[s] | loss 1.60
| epoch 7 |  iter 6421 / 9295 | time 4510[s] | loss 1.62
| epoch 7 |  iter 6441 / 9295 | time 4511[s] | loss 1.57
| epoch 7 |  iter 6461 / 9295 | time 4513[s] | loss 1.57
| epoch 7 |  iter 6481 / 9295 | time 4514[s] | loss 1.62
| epoch 7 |  iter 6501 / 9295 | time 4515[s] | loss 1.60
| epoch 7 |  iter 6521 / 9295 | time 4517[s] | loss 1.60
| epoch 7 |  iter 6541 / 9295 | time 4518[s] | loss 1.61
| epoch 7 |  iter 6561 / 9295 | time 4519[s] | loss 1.61
| epoch 7 |  iter 6581 / 9295 | time 4520[s] | loss 1.58
| epoch 7 |  iter 6601 / 9295 | time 4522[s] | loss 1.60
| epoch 7 |  iter 6621 / 9295 | time 4523[s] | loss 1.58
| epoch 7 |  iter 6641 / 9295 | time 4524[s] | loss 1.59
| epoch 7 |  iter 6661 / 9295 | time 4526[s] | loss 1.60
| epoch 7 |  iter 6681 / 9295 | time 4527[s] | loss 1.62
| epoch 7 |  iter 6701 / 9295 | time 4528[s] | loss 1.62
| epoch 7 |  iter 6721 / 9295 | time 4530[s] | loss 1.56
| epoch 7 |  iter 6741 / 9295 | time 4531[s] | loss 1.59
| epoch 7 |  iter 6761 / 9295 | time 4532[s] | loss 1.62
| epoch 7 |  iter 6781 / 9295 | time 4534[s] | loss 1.64
| epoch 7 |  iter 6801 / 9295 | time 4535[s] | loss 1.56
| epoch 7 |  iter 6821 / 9295 | time 4537[s] | loss 1.55
| epoch 7 |  iter 6841 / 9295 | time 4538[s] | loss 1.55
| epoch 7 |  iter 6861 / 9295 | time 4539[s] | loss 1.58
| epoch 7 |  iter 6881 / 9295 | time 4540[s] | loss 1.55
| epoch 7 |  iter 6901 / 9295 | time 4542[s] | loss 1.61
| epoch 7 |  iter 6921 / 9295 | time 4543[s] | loss 1.61
| epoch 7 |  iter 6941 / 9295 | time 4544[s] | loss 1.57
| epoch 7 |  iter 6961 / 9295 | time 4546[s] | loss 1.63
| epoch 7 |  iter 6981 / 9295 | time 4547[s] | loss 1.59
| epoch 7 |  iter 7001 / 9295 | time 4548[s] | loss 1.58
| epoch 7 |  iter 7021 / 9295 | time 4550[s] | loss 1.61
| epoch 7 |  iter 7041 / 9295 | time 4551[s] | loss 1.58
| epoch 7 |  iter 7061 / 9295 | time 4552[s] | loss 1.59
| epoch 7 |  iter 7081 / 9295 | time 4554[s] | loss 1.58
| epoch 7 |  iter 7101 / 9295 | time 4555[s] | loss 1.59
| epoch 7 |  iter 7121 / 9295 | time 4556[s] | loss 1.58
| epoch 7 |  iter 7141 / 9295 | time 4557[s] | loss 1.59
| epoch 7 |  iter 7161 / 9295 | time 4559[s] | loss 1.57
| epoch 7 |  iter 7181 / 9295 | time 4560[s] | loss 1.60
| epoch 7 |  iter 7201 / 9295 | time 4561[s] | loss 1.59
| epoch 7 |  iter 7221 / 9295 | time 4563[s] | loss 1.59
| epoch 7 |  iter 7241 / 9295 | time 4564[s] | loss 1.59
| epoch 7 |  iter 7261 / 9295 | time 4565[s] | loss 1.58
| epoch 7 |  iter 7281 / 9295 | time 4567[s] | loss 1.57
| epoch 7 |  iter 7301 / 9295 | time 4569[s] | loss 1.61
| epoch 7 |  iter 7321 / 9295 | time 4570[s] | loss 1.60
| epoch 7 |  iter 7341 / 9295 | time 4572[s] | loss 1.63
| epoch 7 |  iter 7361 / 9295 | time 4573[s] | loss 1.63
| epoch 7 |  iter 7381 / 9295 | time 4575[s] | loss 1.61
| epoch 7 |  iter 7401 / 9295 | time 4576[s] | loss 1.68
| epoch 7 |  iter 7421 / 9295 | time 4577[s] | loss 1.60
| epoch 7 |  iter 7441 / 9295 | time 4579[s] | loss 1.60
| epoch 7 |  iter 7461 / 9295 | time 4580[s] | loss 1.58
| epoch 7 |  iter 7481 / 9295 | time 4581[s] | loss 1.57
| epoch 7 |  iter 7501 / 9295 | time 4582[s] | loss 1.57
| epoch 7 |  iter 7521 / 9295 | time 4584[s] | loss 1.62
| epoch 7 |  iter 7541 / 9295 | time 4586[s] | loss 1.64
| epoch 7 |  iter 7561 / 9295 | time 4587[s] | loss 1.58
| epoch 7 |  iter 7581 / 9295 | time 4588[s] | loss 1.59
| epoch 7 |  iter 7601 / 9295 | time 4590[s] | loss 1.55
| epoch 7 |  iter 7621 / 9295 | time 4591[s] | loss 1.58
| epoch 7 |  iter 7641 / 9295 | time 4592[s] | loss 1.61
| epoch 7 |  iter 7661 / 9295 | time 4594[s] | loss 1.60
| epoch 7 |  iter 7681 / 9295 | time 4595[s] | loss 1.61
| epoch 7 |  iter 7701 / 9295 | time 4596[s] | loss 1.64
| epoch 7 |  iter 7721 / 9295 | time 4598[s] | loss 1.58
| epoch 7 |  iter 7741 / 9295 | time 4599[s] | loss 1.60
| epoch 7 |  iter 7761 / 9295 | time 4600[s] | loss 1.63
| epoch 7 |  iter 7781 / 9295 | time 4601[s] | loss 1.60
| epoch 7 |  iter 7801 / 9295 | time 4603[s] | loss 1.58
| epoch 7 |  iter 7821 / 9295 | time 4604[s] | loss 1.59
| epoch 7 |  iter 7841 / 9295 | time 4605[s] | loss 1.59
| epoch 7 |  iter 7861 / 9295 | time 4607[s] | loss 1.61
| epoch 7 |  iter 7881 / 9295 | time 4608[s] | loss 1.58
| epoch 7 |  iter 7901 / 9295 | time 4609[s] | loss 1.60
| epoch 7 |  iter 7921 / 9295 | time 4610[s] | loss 1.58
| epoch 7 |  iter 7941 / 9295 | time 4612[s] | loss 1.58
| epoch 7 |  iter 7961 / 9295 | time 4613[s] | loss 1.61
| epoch 7 |  iter 7981 / 9295 | time 4614[s] | loss 1.57
| epoch 7 |  iter 8001 / 9295 | time 4616[s] | loss 1.54
| epoch 7 |  iter 8021 / 9295 | time 4617[s] | loss 1.60
| epoch 7 |  iter 8041 / 9295 | time 4618[s] | loss 1.55
| epoch 7 |  iter 8061 / 9295 | time 4619[s] | loss 1.61
| epoch 7 |  iter 8081 / 9295 | time 4621[s] | loss 1.59
| epoch 7 |  iter 8101 / 9295 | time 4622[s] | loss 1.62
| epoch 7 |  iter 8121 / 9295 | time 4623[s] | loss 1.59
| epoch 7 |  iter 8141 / 9295 | time 4625[s] | loss 1.63
| epoch 7 |  iter 8161 / 9295 | time 4626[s] | loss 1.60
| epoch 7 |  iter 8181 / 9295 | time 4627[s] | loss 1.58
| epoch 7 |  iter 8201 / 9295 | time 4629[s] | loss 1.60
| epoch 7 |  iter 8221 / 9295 | time 4630[s] | loss 1.56
| epoch 7 |  iter 8241 / 9295 | time 4631[s] | loss 1.55
| epoch 7 |  iter 8261 / 9295 | time 4632[s] | loss 1.57
| epoch 7 |  iter 8281 / 9295 | time 4634[s] | loss 1.57
| epoch 7 |  iter 8301 / 9295 | time 4635[s] | loss 1.62
| epoch 7 |  iter 8321 / 9295 | time 4636[s] | loss 1.58
| epoch 7 |  iter 8341 / 9295 | time 4638[s] | loss 1.59
| epoch 7 |  iter 8361 / 9295 | time 4639[s] | loss 1.57
| epoch 7 |  iter 8381 / 9295 | time 4640[s] | loss 1.63
| epoch 7 |  iter 8401 / 9295 | time 4641[s] | loss 1.56
| epoch 7 |  iter 8421 / 9295 | time 4643[s] | loss 1.62
| epoch 7 |  iter 8441 / 9295 | time 4644[s] | loss 1.61
| epoch 7 |  iter 8461 / 9295 | time 4645[s] | loss 1.58
| epoch 7 |  iter 8481 / 9295 | time 4647[s] | loss 1.60
| epoch 7 |  iter 8501 / 9295 | time 4648[s] | loss 1.64
| epoch 7 |  iter 8521 / 9295 | time 4649[s] | loss 1.59
| epoch 7 |  iter 8541 / 9295 | time 4650[s] | loss 1.59
| epoch 7 |  iter 8561 / 9295 | time 4652[s] | loss 1.61
| epoch 7 |  iter 8581 / 9295 | time 4653[s] | loss 1.59
| epoch 7 |  iter 8601 / 9295 | time 4654[s] | loss 1.59
| epoch 7 |  iter 8621 / 9295 | time 4655[s] | loss 1.65
| epoch 7 |  iter 8641 / 9295 | time 4657[s] | loss 1.59
| epoch 7 |  iter 8661 / 9295 | time 4658[s] | loss 1.61
| epoch 7 |  iter 8681 / 9295 | time 4659[s] | loss 1.58
| epoch 7 |  iter 8701 / 9295 | time 4661[s] | loss 1.61
| epoch 7 |  iter 8721 / 9295 | time 4662[s] | loss 1.58
| epoch 7 |  iter 8741 / 9295 | time 4663[s] | loss 1.61
| epoch 7 |  iter 8761 / 9295 | time 4665[s] | loss 1.59
| epoch 7 |  iter 8781 / 9295 | time 4666[s] | loss 1.58
| epoch 7 |  iter 8801 / 9295 | time 4667[s] | loss 1.58
| epoch 7 |  iter 8821 / 9295 | time 4668[s] | loss 1.61
| epoch 7 |  iter 8841 / 9295 | time 4670[s] | loss 1.56
| epoch 7 |  iter 8861 / 9295 | time 4671[s] | loss 1.63
| epoch 7 |  iter 8881 / 9295 | time 4672[s] | loss 1.59
| epoch 7 |  iter 8901 / 9295 | time 4674[s] | loss 1.62
| epoch 7 |  iter 8921 / 9295 | time 4675[s] | loss 1.56
| epoch 7 |  iter 8941 / 9295 | time 4676[s] | loss 1.60
| epoch 7 |  iter 8961 / 9295 | time 4677[s] | loss 1.58
| epoch 7 |  iter 8981 / 9295 | time 4679[s] | loss 1.62
| epoch 7 |  iter 9001 / 9295 | time 4680[s] | loss 1.59
| epoch 7 |  iter 9021 / 9295 | time 4681[s] | loss 1.60
| epoch 7 |  iter 9041 / 9295 | time 4682[s] | loss 1.59
| epoch 7 |  iter 9061 / 9295 | time 4684[s] | loss 1.56
| epoch 7 |  iter 9081 / 9295 | time 4685[s] | loss 1.63
| epoch 7 |  iter 9101 / 9295 | time 4686[s] | loss 1.58
| epoch 7 |  iter 9121 / 9295 | time 4688[s] | loss 1.60
| epoch 7 |  iter 9141 / 9295 | time 4689[s] | loss 1.60
| epoch 7 |  iter 9161 / 9295 | time 4690[s] | loss 1.60
| epoch 7 |  iter 9181 / 9295 | time 4692[s] | loss 1.56
| epoch 7 |  iter 9201 / 9295 | time 4693[s] | loss 1.58
| epoch 7 |  iter 9221 / 9295 | time 4694[s] | loss 1.61
| epoch 7 |  iter 9241 / 9295 | time 4695[s] | loss 1.57
| epoch 7 |  iter 9261 / 9295 | time 4697[s] | loss 1.60
| epoch 7 |  iter 9281 / 9295 | time 4698[s] | loss 1.59
| epoch 8 |  iter 1 / 9295 | time 4699[s] | loss 1.56
| epoch 8 |  iter 21 / 9295 | time 4700[s] | loss 1.52
| epoch 8 |  iter 41 / 9295 | time 4702[s] | loss 1.52
| epoch 8 |  iter 61 / 9295 | time 4703[s] | loss 1.53
| epoch 8 |  iter 81 / 9295 | time 4704[s] | loss 1.53
| epoch 8 |  iter 101 / 9295 | time 4705[s] | loss 1.48
| epoch 8 |  iter 121 / 9295 | time 4707[s] | loss 1.54
| epoch 8 |  iter 141 / 9295 | time 4708[s] | loss 1.49
| epoch 8 |  iter 161 / 9295 | time 4709[s] | loss 1.53
| epoch 8 |  iter 181 / 9295 | time 4711[s] | loss 1.50
| epoch 8 |  iter 201 / 9295 | time 4712[s] | loss 1.54
| epoch 8 |  iter 221 / 9295 | time 4713[s] | loss 1.49
| epoch 8 |  iter 241 / 9295 | time 4715[s] | loss 1.50
| epoch 8 |  iter 261 / 9295 | time 4716[s] | loss 1.49
| epoch 8 |  iter 281 / 9295 | time 4717[s] | loss 1.53
| epoch 8 |  iter 301 / 9295 | time 4718[s] | loss 1.49
| epoch 8 |  iter 321 / 9295 | time 4720[s] | loss 1.56
| epoch 8 |  iter 341 / 9295 | time 4721[s] | loss 1.52
| epoch 8 |  iter 361 / 9295 | time 4723[s] | loss 1.52
| epoch 8 |  iter 381 / 9295 | time 4724[s] | loss 1.49
| epoch 8 |  iter 401 / 9295 | time 4725[s] | loss 1.51
| epoch 8 |  iter 421 / 9295 | time 4726[s] | loss 1.48
| epoch 8 |  iter 441 / 9295 | time 4728[s] | loss 1.50
| epoch 8 |  iter 461 / 9295 | time 4729[s] | loss 1.54
| epoch 8 |  iter 481 / 9295 | time 4730[s] | loss 1.52
| epoch 8 |  iter 501 / 9295 | time 4732[s] | loss 1.54
| epoch 8 |  iter 521 / 9295 | time 4733[s] | loss 1.55
| epoch 8 |  iter 541 / 9295 | time 4734[s] | loss 1.48
| epoch 8 |  iter 561 / 9295 | time 4735[s] | loss 1.53
| epoch 8 |  iter 581 / 9295 | time 4737[s] | loss 1.56
| epoch 8 |  iter 601 / 9295 | time 4738[s] | loss 1.50
| epoch 8 |  iter 621 / 9295 | time 4739[s] | loss 1.53
| epoch 8 |  iter 641 / 9295 | time 4741[s] | loss 1.51
| epoch 8 |  iter 661 / 9295 | time 4742[s] | loss 1.53
| epoch 8 |  iter 681 / 9295 | time 4743[s] | loss 1.56
| epoch 8 |  iter 701 / 9295 | time 4744[s] | loss 1.58
| epoch 8 |  iter 721 / 9295 | time 4746[s] | loss 1.50
| epoch 8 |  iter 741 / 9295 | time 4747[s] | loss 1.52
| epoch 8 |  iter 761 / 9295 | time 4748[s] | loss 1.52
| epoch 8 |  iter 781 / 9295 | time 4749[s] | loss 1.54
| epoch 8 |  iter 801 / 9295 | time 4751[s] | loss 1.52
| epoch 8 |  iter 821 / 9295 | time 4752[s] | loss 1.54
| epoch 8 |  iter 841 / 9295 | time 4753[s] | loss 1.57
| epoch 8 |  iter 861 / 9295 | time 4755[s] | loss 1.53
| epoch 8 |  iter 881 / 9295 | time 4756[s] | loss 1.50
| epoch 8 |  iter 901 / 9295 | time 4757[s] | loss 1.51
| epoch 8 |  iter 921 / 9295 | time 4759[s] | loss 1.51
| epoch 8 |  iter 941 / 9295 | time 4760[s] | loss 1.50
| epoch 8 |  iter 961 / 9295 | time 4761[s] | loss 1.50
| epoch 8 |  iter 981 / 9295 | time 4762[s] | loss 1.52
| epoch 8 |  iter 1001 / 9295 | time 4764[s] | loss 1.54
| epoch 8 |  iter 1021 / 9295 | time 4765[s] | loss 1.50
| epoch 8 |  iter 1041 / 9295 | time 4766[s] | loss 1.52
| epoch 8 |  iter 1061 / 9295 | time 4768[s] | loss 1.57
| epoch 8 |  iter 1081 / 9295 | time 4769[s] | loss 1.50
| epoch 8 |  iter 1101 / 9295 | time 4770[s] | loss 1.51
| epoch 8 |  iter 1121 / 9295 | time 4771[s] | loss 1.54
| epoch 8 |  iter 1141 / 9295 | time 4773[s] | loss 1.52
| epoch 8 |  iter 1161 / 9295 | time 4774[s] | loss 1.51
| epoch 8 |  iter 1181 / 9295 | time 4775[s] | loss 1.50
| epoch 8 |  iter 1201 / 9295 | time 4777[s] | loss 1.50
| epoch 8 |  iter 1221 / 9295 | time 4778[s] | loss 1.52
| epoch 8 |  iter 1241 / 9295 | time 4779[s] | loss 1.50
| epoch 8 |  iter 1261 / 9295 | time 4780[s] | loss 1.52
| epoch 8 |  iter 1281 / 9295 | time 4782[s] | loss 1.51
| epoch 8 |  iter 1301 / 9295 | time 4783[s] | loss 1.49
| epoch 8 |  iter 1321 / 9295 | time 4784[s] | loss 1.52
| epoch 8 |  iter 1341 / 9295 | time 4786[s] | loss 1.52
| epoch 8 |  iter 1361 / 9295 | time 4787[s] | loss 1.50
| epoch 8 |  iter 1381 / 9295 | time 4788[s] | loss 1.53
| epoch 8 |  iter 1401 / 9295 | time 4790[s] | loss 1.48
| epoch 8 |  iter 1421 / 9295 | time 4791[s] | loss 1.52
| epoch 8 |  iter 1441 / 9295 | time 4792[s] | loss 1.49
| epoch 8 |  iter 1461 / 9295 | time 4793[s] | loss 1.51
| epoch 8 |  iter 1481 / 9295 | time 4795[s] | loss 1.51
| epoch 8 |  iter 1501 / 9295 | time 4796[s] | loss 1.54
| epoch 8 |  iter 1521 / 9295 | time 4797[s] | loss 1.52
| epoch 8 |  iter 1541 / 9295 | time 4799[s] | loss 1.56
| epoch 8 |  iter 1561 / 9295 | time 4800[s] | loss 1.55
| epoch 8 |  iter 1581 / 9295 | time 4801[s] | loss 1.52
| epoch 8 |  iter 1601 / 9295 | time 4802[s] | loss 1.55
| epoch 8 |  iter 1621 / 9295 | time 4804[s] | loss 1.53
| epoch 8 |  iter 1641 / 9295 | time 4805[s] | loss 1.54
| epoch 8 |  iter 1661 / 9295 | time 4806[s] | loss 1.53
| epoch 8 |  iter 1681 / 9295 | time 4807[s] | loss 1.52
| epoch 8 |  iter 1701 / 9295 | time 4809[s] | loss 1.51
| epoch 8 |  iter 1721 / 9295 | time 4810[s] | loss 1.51
| epoch 8 |  iter 1741 / 9295 | time 4811[s] | loss 1.51
| epoch 8 |  iter 1761 / 9295 | time 4813[s] | loss 1.51
| epoch 8 |  iter 1781 / 9295 | time 4814[s] | loss 1.54
| epoch 8 |  iter 1801 / 9295 | time 4815[s] | loss 1.52
| epoch 8 |  iter 1821 / 9295 | time 4817[s] | loss 1.51
| epoch 8 |  iter 1841 / 9295 | time 4818[s] | loss 1.55
| epoch 8 |  iter 1861 / 9295 | time 4819[s] | loss 1.56
| epoch 8 |  iter 1881 / 9295 | time 4820[s] | loss 1.53
| epoch 8 |  iter 1901 / 9295 | time 4822[s] | loss 1.50
| epoch 8 |  iter 1921 / 9295 | time 4823[s] | loss 1.54
| epoch 8 |  iter 1941 / 9295 | time 4824[s] | loss 1.49
| epoch 8 |  iter 1961 / 9295 | time 4826[s] | loss 1.57
| epoch 8 |  iter 1981 / 9295 | time 4827[s] | loss 1.49
| epoch 8 |  iter 2001 / 9295 | time 4828[s] | loss 1.49
| epoch 8 |  iter 2021 / 9295 | time 4830[s] | loss 1.51
| epoch 8 |  iter 2041 / 9295 | time 4831[s] | loss 1.50
| epoch 8 |  iter 2061 / 9295 | time 4833[s] | loss 1.57
| epoch 8 |  iter 2081 / 9295 | time 4834[s] | loss 1.49
| epoch 8 |  iter 2101 / 9295 | time 4835[s] | loss 1.53
| epoch 8 |  iter 2121 / 9295 | time 4836[s] | loss 1.54
| epoch 8 |  iter 2141 / 9295 | time 4838[s] | loss 1.55
| epoch 8 |  iter 2161 / 9295 | time 4839[s] | loss 1.53
| epoch 8 |  iter 2181 / 9295 | time 4840[s] | loss 1.54
| epoch 8 |  iter 2201 / 9295 | time 4842[s] | loss 1.55
| epoch 8 |  iter 2221 / 9295 | time 4843[s] | loss 1.49
| epoch 8 |  iter 2241 / 9295 | time 4844[s] | loss 1.56
| epoch 8 |  iter 2261 / 9295 | time 4845[s] | loss 1.52
| epoch 8 |  iter 2281 / 9295 | time 4847[s] | loss 1.51
| epoch 8 |  iter 2301 / 9295 | time 4848[s] | loss 1.54
| epoch 8 |  iter 2321 / 9295 | time 4849[s] | loss 1.49
| epoch 8 |  iter 2341 / 9295 | time 4851[s] | loss 1.54
| epoch 8 |  iter 2361 / 9295 | time 4852[s] | loss 1.51
| epoch 8 |  iter 2381 / 9295 | time 4853[s] | loss 1.53
| epoch 8 |  iter 2401 / 9295 | time 4854[s] | loss 1.50
| epoch 8 |  iter 2421 / 9295 | time 4856[s] | loss 1.51
| epoch 8 |  iter 2441 / 9295 | time 4857[s] | loss 1.52
| epoch 8 |  iter 2461 / 9295 | time 4858[s] | loss 1.55
| epoch 8 |  iter 2481 / 9295 | time 4860[s] | loss 1.61
| epoch 8 |  iter 2501 / 9295 | time 4861[s] | loss 1.55
| epoch 8 |  iter 2521 / 9295 | time 4862[s] | loss 1.54
| epoch 8 |  iter 2541 / 9295 | time 4863[s] | loss 1.49
| epoch 8 |  iter 2561 / 9295 | time 4865[s] | loss 1.52
| epoch 8 |  iter 2581 / 9295 | time 4866[s] | loss 1.54
| epoch 8 |  iter 2601 / 9295 | time 4867[s] | loss 1.54
| epoch 8 |  iter 2621 / 9295 | time 4869[s] | loss 1.53
| epoch 8 |  iter 2641 / 9295 | time 4870[s] | loss 1.55
| epoch 8 |  iter 2661 / 9295 | time 4871[s] | loss 1.53
| epoch 8 |  iter 2681 / 9295 | time 4872[s] | loss 1.53
| epoch 8 |  iter 2701 / 9295 | time 4874[s] | loss 1.53
| epoch 8 |  iter 2721 / 9295 | time 4875[s] | loss 1.52
| epoch 8 |  iter 2741 / 9295 | time 4877[s] | loss 1.55
| epoch 8 |  iter 2761 / 9295 | time 4878[s] | loss 1.52
| epoch 8 |  iter 2781 / 9295 | time 4880[s] | loss 1.53
| epoch 8 |  iter 2801 / 9295 | time 4881[s] | loss 1.52
| epoch 8 |  iter 2821 / 9295 | time 4882[s] | loss 1.53
| epoch 8 |  iter 2841 / 9295 | time 4884[s] | loss 1.57
| epoch 8 |  iter 2861 / 9295 | time 4885[s] | loss 1.58
| epoch 8 |  iter 2881 / 9295 | time 4887[s] | loss 1.54
| epoch 8 |  iter 2901 / 9295 | time 4888[s] | loss 1.51
| epoch 8 |  iter 2921 / 9295 | time 4890[s] | loss 1.51
| epoch 8 |  iter 2941 / 9295 | time 4891[s] | loss 1.55
| epoch 8 |  iter 2961 / 9295 | time 4893[s] | loss 1.52
| epoch 8 |  iter 2981 / 9295 | time 4894[s] | loss 1.53
| epoch 8 |  iter 3001 / 9295 | time 4896[s] | loss 1.55
| epoch 8 |  iter 3021 / 9295 | time 4897[s] | loss 1.57
| epoch 8 |  iter 3041 / 9295 | time 4898[s] | loss 1.52
| epoch 8 |  iter 3061 / 9295 | time 4900[s] | loss 1.53
| epoch 8 |  iter 3081 / 9295 | time 4901[s] | loss 1.54
| epoch 8 |  iter 3101 / 9295 | time 4902[s] | loss 1.54
| epoch 8 |  iter 3121 / 9295 | time 4903[s] | loss 1.52
| epoch 8 |  iter 3141 / 9295 | time 4905[s] | loss 1.49
| epoch 8 |  iter 3161 / 9295 | time 4906[s] | loss 1.54
| epoch 8 |  iter 3181 / 9295 | time 4907[s] | loss 1.51
| epoch 8 |  iter 3201 / 9295 | time 4909[s] | loss 1.51
| epoch 8 |  iter 3221 / 9295 | time 4910[s] | loss 1.53
| epoch 8 |  iter 3241 / 9295 | time 4911[s] | loss 1.54
| epoch 8 |  iter 3261 / 9295 | time 4913[s] | loss 1.56
| epoch 8 |  iter 3281 / 9295 | time 4914[s] | loss 1.54
| epoch 8 |  iter 3301 / 9295 | time 4915[s] | loss 1.57
| epoch 8 |  iter 3321 / 9295 | time 4917[s] | loss 1.55
| epoch 8 |  iter 3341 / 9295 | time 4918[s] | loss 1.52
| epoch 8 |  iter 3361 / 9295 | time 4919[s] | loss 1.49
| epoch 8 |  iter 3381 / 9295 | time 4921[s] | loss 1.54
| epoch 8 |  iter 3401 / 9295 | time 4922[s] | loss 1.51
| epoch 8 |  iter 3421 / 9295 | time 4924[s] | loss 1.56
| epoch 8 |  iter 3441 / 9295 | time 4925[s] | loss 1.55
| epoch 8 |  iter 3461 / 9295 | time 4926[s] | loss 1.56
| epoch 8 |  iter 3481 / 9295 | time 4928[s] | loss 1.54
| epoch 8 |  iter 3501 / 9295 | time 4929[s] | loss 1.54
| epoch 8 |  iter 3521 / 9295 | time 4931[s] | loss 1.53
| epoch 8 |  iter 3541 / 9295 | time 4932[s] | loss 1.54
| epoch 8 |  iter 3561 / 9295 | time 4933[s] | loss 1.53
| epoch 8 |  iter 3581 / 9295 | time 4935[s] | loss 1.57
| epoch 8 |  iter 3601 / 9295 | time 4936[s] | loss 1.51
| epoch 8 |  iter 3621 / 9295 | time 4938[s] | loss 1.55
| epoch 8 |  iter 3641 / 9295 | time 4939[s] | loss 1.52
| epoch 8 |  iter 3661 / 9295 | time 4940[s] | loss 1.54
| epoch 8 |  iter 3681 / 9295 | time 4942[s] | loss 1.57
| epoch 8 |  iter 3701 / 9295 | time 4943[s] | loss 1.50
| epoch 8 |  iter 3721 / 9295 | time 4944[s] | loss 1.56
| epoch 8 |  iter 3741 / 9295 | time 4946[s] | loss 1.56
| epoch 8 |  iter 3761 / 9295 | time 4947[s] | loss 1.55
| epoch 8 |  iter 3781 / 9295 | time 4949[s] | loss 1.52
| epoch 8 |  iter 3801 / 9295 | time 4950[s] | loss 1.52
| epoch 8 |  iter 3821 / 9295 | time 4951[s] | loss 1.52
| epoch 8 |  iter 3841 / 9295 | time 4953[s] | loss 1.52
| epoch 8 |  iter 3861 / 9295 | time 4954[s] | loss 1.52
| epoch 8 |  iter 3881 / 9295 | time 4955[s] | loss 1.49
| epoch 8 |  iter 3901 / 9295 | time 4957[s] | loss 1.54
| epoch 8 |  iter 3921 / 9295 | time 4958[s] | loss 1.53
| epoch 8 |  iter 3941 / 9295 | time 4959[s] | loss 1.55
| epoch 8 |  iter 3961 / 9295 | time 4961[s] | loss 1.54
| epoch 8 |  iter 3981 / 9295 | time 4962[s] | loss 1.53
| epoch 8 |  iter 4001 / 9295 | time 4963[s] | loss 1.56
| epoch 8 |  iter 4021 / 9295 | time 4965[s] | loss 1.52
| epoch 8 |  iter 4041 / 9295 | time 4967[s] | loss 1.53
| epoch 8 |  iter 4061 / 9295 | time 4968[s] | loss 1.54
| epoch 8 |  iter 4081 / 9295 | time 4970[s] | loss 1.50
| epoch 8 |  iter 4101 / 9295 | time 4971[s] | loss 1.51
| epoch 8 |  iter 4121 / 9295 | time 4972[s] | loss 1.52
| epoch 8 |  iter 4141 / 9295 | time 4974[s] | loss 1.59
| epoch 8 |  iter 4161 / 9295 | time 4975[s] | loss 1.49
| epoch 8 |  iter 4181 / 9295 | time 4977[s] | loss 1.54
| epoch 8 |  iter 4201 / 9295 | time 4978[s] | loss 1.52
| epoch 8 |  iter 4221 / 9295 | time 4980[s] | loss 1.55
| epoch 8 |  iter 4241 / 9295 | time 4981[s] | loss 1.52
| epoch 8 |  iter 4261 / 9295 | time 4983[s] | loss 1.56
| epoch 8 |  iter 4281 / 9295 | time 4985[s] | loss 1.54
| epoch 8 |  iter 4301 / 9295 | time 4986[s] | loss 1.54
| epoch 8 |  iter 4321 / 9295 | time 4987[s] | loss 1.56
| epoch 8 |  iter 4341 / 9295 | time 4989[s] | loss 1.55
| epoch 8 |  iter 4361 / 9295 | time 4990[s] | loss 1.53
| epoch 8 |  iter 4381 / 9295 | time 4992[s] | loss 1.53
| epoch 8 |  iter 4401 / 9295 | time 4993[s] | loss 1.55
| epoch 8 |  iter 4421 / 9295 | time 4995[s] | loss 1.53
| epoch 8 |  iter 4441 / 9295 | time 4996[s] | loss 1.56
| epoch 8 |  iter 4461 / 9295 | time 4997[s] | loss 1.53
| epoch 8 |  iter 4481 / 9295 | time 4999[s] | loss 1.52
| epoch 8 |  iter 4501 / 9295 | time 5000[s] | loss 1.54
| epoch 8 |  iter 4521 / 9295 | time 5002[s] | loss 1.50
| epoch 8 |  iter 4541 / 9295 | time 5003[s] | loss 1.56
| epoch 8 |  iter 4561 / 9295 | time 5004[s] | loss 1.55
| epoch 8 |  iter 4581 / 9295 | time 5006[s] | loss 1.52
| epoch 8 |  iter 4601 / 9295 | time 5007[s] | loss 1.55
| epoch 8 |  iter 4621 / 9295 | time 5009[s] | loss 1.51
| epoch 8 |  iter 4641 / 9295 | time 5010[s] | loss 1.54
| epoch 8 |  iter 4661 / 9295 | time 5012[s] | loss 1.53
| epoch 8 |  iter 4681 / 9295 | time 5013[s] | loss 1.53
| epoch 8 |  iter 4701 / 9295 | time 5014[s] | loss 1.54
| epoch 8 |  iter 4721 / 9295 | time 5016[s] | loss 1.53
| epoch 8 |  iter 4741 / 9295 | time 5017[s] | loss 1.54
| epoch 8 |  iter 4761 / 9295 | time 5018[s] | loss 1.52
| epoch 8 |  iter 4781 / 9295 | time 5020[s] | loss 1.56
| epoch 8 |  iter 4801 / 9295 | time 5021[s] | loss 1.52
| epoch 8 |  iter 4821 / 9295 | time 5022[s] | loss 1.54
| epoch 8 |  iter 4841 / 9295 | time 5024[s] | loss 1.53
| epoch 8 |  iter 4861 / 9295 | time 5025[s] | loss 1.54
| epoch 8 |  iter 4881 / 9295 | time 5026[s] | loss 1.55
| epoch 8 |  iter 4901 / 9295 | time 5028[s] | loss 1.51
| epoch 8 |  iter 4921 / 9295 | time 5029[s] | loss 1.51
| epoch 8 |  iter 4941 / 9295 | time 5030[s] | loss 1.55
| epoch 8 |  iter 4961 / 9295 | time 5032[s] | loss 1.55
| epoch 8 |  iter 4981 / 9295 | time 5033[s] | loss 1.58
| epoch 8 |  iter 5001 / 9295 | time 5034[s] | loss 1.56
| epoch 8 |  iter 5021 / 9295 | time 5036[s] | loss 1.56
| epoch 8 |  iter 5041 / 9295 | time 5037[s] | loss 1.56
| epoch 8 |  iter 5061 / 9295 | time 5039[s] | loss 1.52
| epoch 8 |  iter 5081 / 9295 | time 5040[s] | loss 1.55
| epoch 8 |  iter 5101 / 9295 | time 5042[s] | loss 1.51
| epoch 8 |  iter 5121 / 9295 | time 5043[s] | loss 1.55
| epoch 8 |  iter 5141 / 9295 | time 5044[s] | loss 1.52
| epoch 8 |  iter 5161 / 9295 | time 5045[s] | loss 1.54
| epoch 8 |  iter 5181 / 9295 | time 5047[s] | loss 1.54
| epoch 8 |  iter 5201 / 9295 | time 5048[s] | loss 1.53
| epoch 8 |  iter 5221 / 9295 | time 5050[s] | loss 1.52
| epoch 8 |  iter 5241 / 9295 | time 5051[s] | loss 1.57
| epoch 8 |  iter 5261 / 9295 | time 5053[s] | loss 1.54
| epoch 8 |  iter 5281 / 9295 | time 5054[s] | loss 1.52
| epoch 8 |  iter 5301 / 9295 | time 5056[s] | loss 1.55
| epoch 8 |  iter 5321 / 9295 | time 5057[s] | loss 1.55
| epoch 8 |  iter 5341 / 9295 | time 5059[s] | loss 1.52
| epoch 8 |  iter 5361 / 9295 | time 5060[s] | loss 1.56
| epoch 8 |  iter 5381 / 9295 | time 5062[s] | loss 1.56
| epoch 8 |  iter 5401 / 9295 | time 5064[s] | loss 1.52
| epoch 8 |  iter 5421 / 9295 | time 5065[s] | loss 1.55
| epoch 8 |  iter 5441 / 9295 | time 5067[s] | loss 1.52
| epoch 8 |  iter 5461 / 9295 | time 5068[s] | loss 1.54
| epoch 8 |  iter 5481 / 9295 | time 5070[s] | loss 1.52
| epoch 8 |  iter 5501 / 9295 | time 5071[s] | loss 1.52
| epoch 8 |  iter 5521 / 9295 | time 5073[s] | loss 1.48
| epoch 8 |  iter 5541 / 9295 | time 5074[s] | loss 1.53
| epoch 8 |  iter 5561 / 9295 | time 5076[s] | loss 1.53
| epoch 8 |  iter 5581 / 9295 | time 5077[s] | loss 1.60
| epoch 8 |  iter 5601 / 9295 | time 5079[s] | loss 1.52
| epoch 8 |  iter 5621 / 9295 | time 5080[s] | loss 1.53
| epoch 8 |  iter 5641 / 9295 | time 5082[s] | loss 1.52
| epoch 8 |  iter 5661 / 9295 | time 5084[s] | loss 1.52
| epoch 8 |  iter 5681 / 9295 | time 5085[s] | loss 1.55
| epoch 8 |  iter 5701 / 9295 | time 5087[s] | loss 1.53
| epoch 8 |  iter 5721 / 9295 | time 5088[s] | loss 1.53
| epoch 8 |  iter 5741 / 9295 | time 5090[s] | loss 1.52
| epoch 8 |  iter 5761 / 9295 | time 5091[s] | loss 1.56
| epoch 8 |  iter 5781 / 9295 | time 5093[s] | loss 1.55
| epoch 8 |  iter 5801 / 9295 | time 5095[s] | loss 1.54
| epoch 8 |  iter 5821 / 9295 | time 5096[s] | loss 1.54
| epoch 8 |  iter 5841 / 9295 | time 5097[s] | loss 1.58
| epoch 8 |  iter 5861 / 9295 | time 5099[s] | loss 1.53
| epoch 8 |  iter 5881 / 9295 | time 5100[s] | loss 1.52
| epoch 8 |  iter 5901 / 9295 | time 5101[s] | loss 1.55
| epoch 8 |  iter 5921 / 9295 | time 5103[s] | loss 1.55
| epoch 8 |  iter 5941 / 9295 | time 5104[s] | loss 1.55
| epoch 8 |  iter 5961 / 9295 | time 5105[s] | loss 1.54
| epoch 8 |  iter 5981 / 9295 | time 5107[s] | loss 1.56
| epoch 8 |  iter 6001 / 9295 | time 5108[s] | loss 1.54
| epoch 8 |  iter 6021 / 9295 | time 5110[s] | loss 1.56
| epoch 8 |  iter 6041 / 9295 | time 5111[s] | loss 1.54
| epoch 8 |  iter 6061 / 9295 | time 5113[s] | loss 1.57
| epoch 8 |  iter 6081 / 9295 | time 5114[s] | loss 1.56
| epoch 8 |  iter 6101 / 9295 | time 5116[s] | loss 1.50
| epoch 8 |  iter 6121 / 9295 | time 5117[s] | loss 1.51
| epoch 8 |  iter 6141 / 9295 | time 5119[s] | loss 1.52
| epoch 8 |  iter 6161 / 9295 | time 5120[s] | loss 1.54
| epoch 8 |  iter 6181 / 9295 | time 5122[s] | loss 1.55
| epoch 8 |  iter 6201 / 9295 | time 5123[s] | loss 1.55
| epoch 8 |  iter 6221 / 9295 | time 5124[s] | loss 1.55
| epoch 8 |  iter 6241 / 9295 | time 5126[s] | loss 1.54
| epoch 8 |  iter 6261 / 9295 | time 5127[s] | loss 1.55
| epoch 8 |  iter 6281 / 9295 | time 5129[s] | loss 1.52
| epoch 8 |  iter 6301 / 9295 | time 5130[s] | loss 1.54
| epoch 8 |  iter 6321 / 9295 | time 5132[s] | loss 1.54
| epoch 8 |  iter 6341 / 9295 | time 5133[s] | loss 1.52
| epoch 8 |  iter 6361 / 9295 | time 5135[s] | loss 1.55
| epoch 8 |  iter 6381 / 9295 | time 5136[s] | loss 1.55
| epoch 8 |  iter 6401 / 9295 | time 5137[s] | loss 1.52
| epoch 8 |  iter 6421 / 9295 | time 5139[s] | loss 1.54
| epoch 8 |  iter 6441 / 9295 | time 5140[s] | loss 1.50
| epoch 8 |  iter 6461 / 9295 | time 5142[s] | loss 1.51
| epoch 8 |  iter 6481 / 9295 | time 5143[s] | loss 1.54
| epoch 8 |  iter 6501 / 9295 | time 5145[s] | loss 1.52
| epoch 8 |  iter 6521 / 9295 | time 5146[s] | loss 1.56
| epoch 8 |  iter 6541 / 9295 | time 5147[s] | loss 1.52
| epoch 8 |  iter 6561 / 9295 | time 5149[s] | loss 1.52
| epoch 8 |  iter 6581 / 9295 | time 5151[s] | loss 1.56
| epoch 8 |  iter 6601 / 9295 | time 5152[s] | loss 1.56
| epoch 8 |  iter 6621 / 9295 | time 5154[s] | loss 1.52
| epoch 8 |  iter 6641 / 9295 | time 5155[s] | loss 1.52
| epoch 8 |  iter 6661 / 9295 | time 5157[s] | loss 1.54
| epoch 8 |  iter 6681 / 9295 | time 5158[s] | loss 1.52
| epoch 8 |  iter 6701 / 9295 | time 5160[s] | loss 1.53
| epoch 8 |  iter 6721 / 9295 | time 5161[s] | loss 1.54
| epoch 8 |  iter 6741 / 9295 | time 5163[s] | loss 1.50
| epoch 8 |  iter 6761 / 9295 | time 5164[s] | loss 1.53
| epoch 8 |  iter 6781 / 9295 | time 5166[s] | loss 1.58
| epoch 8 |  iter 6801 / 9295 | time 5167[s] | loss 1.53
| epoch 8 |  iter 6821 / 9295 | time 5169[s] | loss 1.57
| epoch 8 |  iter 6841 / 9295 | time 5170[s] | loss 1.54
| epoch 8 |  iter 6861 / 9295 | time 5172[s] | loss 1.51
| epoch 8 |  iter 6881 / 9295 | time 5173[s] | loss 1.53
| epoch 8 |  iter 6901 / 9295 | time 5175[s] | loss 1.57
| epoch 8 |  iter 6921 / 9295 | time 5176[s] | loss 1.55
| epoch 8 |  iter 6941 / 9295 | time 5179[s] | loss 1.54
| epoch 8 |  iter 6961 / 9295 | time 5180[s] | loss 1.54
| epoch 8 |  iter 6981 / 9295 | time 5182[s] | loss 1.55
| epoch 8 |  iter 7001 / 9295 | time 5184[s] | loss 1.50
| epoch 8 |  iter 7021 / 9295 | time 5185[s] | loss 1.51
| epoch 8 |  iter 7041 / 9295 | time 5187[s] | loss 1.55
| epoch 8 |  iter 7061 / 9295 | time 5189[s] | loss 1.55
| epoch 8 |  iter 7081 / 9295 | time 5190[s] | loss 1.54
| epoch 8 |  iter 7101 / 9295 | time 5192[s] | loss 1.53
| epoch 8 |  iter 7121 / 9295 | time 5194[s] | loss 1.54
| epoch 8 |  iter 7141 / 9295 | time 5195[s] | loss 1.53
| epoch 8 |  iter 7161 / 9295 | time 5197[s] | loss 1.53
| epoch 8 |  iter 7181 / 9295 | time 5198[s] | loss 1.51
| epoch 8 |  iter 7201 / 9295 | time 5200[s] | loss 1.55
| epoch 8 |  iter 7221 / 9295 | time 5201[s] | loss 1.53
| epoch 8 |  iter 7241 / 9295 | time 5203[s] | loss 1.56
| epoch 8 |  iter 7261 / 9295 | time 5205[s] | loss 1.59
| epoch 8 |  iter 7281 / 9295 | time 5207[s] | loss 1.49
| epoch 8 |  iter 7301 / 9295 | time 5208[s] | loss 1.55
| epoch 8 |  iter 7321 / 9295 | time 5210[s] | loss 1.57
| epoch 8 |  iter 7341 / 9295 | time 5211[s] | loss 1.54
| epoch 8 |  iter 7361 / 9295 | time 5213[s] | loss 1.51
| epoch 8 |  iter 7381 / 9295 | time 5214[s] | loss 1.54
| epoch 8 |  iter 7401 / 9295 | time 5216[s] | loss 1.57
| epoch 8 |  iter 7421 / 9295 | time 5217[s] | loss 1.55
| epoch 8 |  iter 7441 / 9295 | time 5218[s] | loss 1.62
| epoch 8 |  iter 7461 / 9295 | time 5220[s] | loss 1.55
| epoch 8 |  iter 7481 / 9295 | time 5221[s] | loss 1.57
| epoch 8 |  iter 7501 / 9295 | time 5222[s] | loss 1.59
| epoch 8 |  iter 7521 / 9295 | time 5224[s] | loss 1.53
| epoch 8 |  iter 7541 / 9295 | time 5225[s] | loss 1.52
| epoch 8 |  iter 7561 / 9295 | time 5226[s] | loss 1.52
| epoch 8 |  iter 7581 / 9295 | time 5228[s] | loss 1.54
| epoch 8 |  iter 7601 / 9295 | time 5229[s] | loss 1.52
| epoch 8 |  iter 7621 / 9295 | time 5230[s] | loss 1.57
| epoch 8 |  iter 7641 / 9295 | time 5232[s] | loss 1.50
| epoch 8 |  iter 7661 / 9295 | time 5234[s] | loss 1.57
| epoch 8 |  iter 7681 / 9295 | time 5235[s] | loss 1.59
| epoch 8 |  iter 7701 / 9295 | time 5236[s] | loss 1.52
| epoch 8 |  iter 7721 / 9295 | time 5238[s] | loss 1.54
| epoch 8 |  iter 7741 / 9295 | time 5240[s] | loss 1.55
| epoch 8 |  iter 7761 / 9295 | time 5241[s] | loss 1.48
| epoch 8 |  iter 7781 / 9295 | time 5243[s] | loss 1.55
| epoch 8 |  iter 7801 / 9295 | time 5244[s] | loss 1.51
| epoch 8 |  iter 7821 / 9295 | time 5246[s] | loss 1.56
| epoch 8 |  iter 7841 / 9295 | time 5247[s] | loss 1.53
| epoch 8 |  iter 7861 / 9295 | time 5249[s] | loss 1.57
| epoch 8 |  iter 7881 / 9295 | time 5250[s] | loss 1.50
| epoch 8 |  iter 7901 / 9295 | time 5252[s] | loss 1.53
| epoch 8 |  iter 7921 / 9295 | time 5253[s] | loss 1.56
| epoch 8 |  iter 7941 / 9295 | time 5254[s] | loss 1.55
| epoch 8 |  iter 7961 / 9295 | time 5256[s] | loss 1.55
| epoch 8 |  iter 7981 / 9295 | time 5257[s] | loss 1.57
| epoch 8 |  iter 8001 / 9295 | time 5258[s] | loss 1.50
| epoch 8 |  iter 8021 / 9295 | time 5260[s] | loss 1.54
| epoch 8 |  iter 8041 / 9295 | time 5261[s] | loss 1.55
| epoch 8 |  iter 8061 / 9295 | time 5262[s] | loss 1.57
| epoch 8 |  iter 8081 / 9295 | time 5264[s] | loss 1.55
| epoch 8 |  iter 8101 / 9295 | time 5265[s] | loss 1.56
| epoch 8 |  iter 8121 / 9295 | time 5266[s] | loss 1.57
| epoch 8 |  iter 8141 / 9295 | time 5268[s] | loss 1.56
| epoch 8 |  iter 8161 / 9295 | time 5269[s] | loss 1.50
| epoch 8 |  iter 8181 / 9295 | time 5271[s] | loss 1.55
| epoch 8 |  iter 8201 / 9295 | time 5272[s] | loss 1.53
| epoch 8 |  iter 8221 / 9295 | time 5273[s] | loss 1.52
| epoch 8 |  iter 8241 / 9295 | time 5275[s] | loss 1.56
| epoch 8 |  iter 8261 / 9295 | time 5276[s] | loss 1.53
| epoch 8 |  iter 8281 / 9295 | time 5277[s] | loss 1.51
| epoch 8 |  iter 8301 / 9295 | time 5279[s] | loss 1.49
| epoch 8 |  iter 8321 / 9295 | time 5280[s] | loss 1.56
| epoch 8 |  iter 8341 / 9295 | time 5282[s] | loss 1.52
| epoch 8 |  iter 8361 / 9295 | time 5283[s] | loss 1.58
| epoch 8 |  iter 8381 / 9295 | time 5285[s] | loss 1.54
| epoch 8 |  iter 8401 / 9295 | time 5287[s] | loss 1.53
| epoch 8 |  iter 8421 / 9295 | time 5288[s] | loss 1.52
| epoch 8 |  iter 8441 / 9295 | time 5289[s] | loss 1.54
| epoch 8 |  iter 8461 / 9295 | time 5291[s] | loss 1.53
| epoch 8 |  iter 8481 / 9295 | time 5292[s] | loss 1.55
| epoch 8 |  iter 8501 / 9295 | time 5293[s] | loss 1.54
| epoch 8 |  iter 8521 / 9295 | time 5295[s] | loss 1.55
| epoch 8 |  iter 8541 / 9295 | time 5296[s] | loss 1.52
| epoch 8 |  iter 8561 / 9295 | time 5298[s] | loss 1.54
| epoch 8 |  iter 8581 / 9295 | time 5299[s] | loss 1.51
| epoch 8 |  iter 8601 / 9295 | time 5301[s] | loss 1.59
| epoch 8 |  iter 8621 / 9295 | time 5302[s] | loss 1.58
| epoch 8 |  iter 8641 / 9295 | time 5303[s] | loss 1.58
| epoch 8 |  iter 8661 / 9295 | time 5305[s] | loss 1.54
| epoch 8 |  iter 8681 / 9295 | time 5306[s] | loss 1.51
| epoch 8 |  iter 8701 / 9295 | time 5308[s] | loss 1.53
| epoch 8 |  iter 8721 / 9295 | time 5309[s] | loss 1.54
| epoch 8 |  iter 8741 / 9295 | time 5310[s] | loss 1.57
| epoch 8 |  iter 8761 / 9295 | time 5312[s] | loss 1.53
| epoch 8 |  iter 8781 / 9295 | time 5313[s] | loss 1.52
| epoch 8 |  iter 8801 / 9295 | time 5314[s] | loss 1.54
| epoch 8 |  iter 8821 / 9295 | time 5316[s] | loss 1.53
| epoch 8 |  iter 8841 / 9295 | time 5317[s] | loss 1.55
| epoch 8 |  iter 8861 / 9295 | time 5319[s] | loss 1.52
| epoch 8 |  iter 8881 / 9295 | time 5320[s] | loss 1.53
| epoch 8 |  iter 8901 / 9295 | time 5321[s] | loss 1.54
| epoch 8 |  iter 8921 / 9295 | time 5323[s] | loss 1.59
| epoch 8 |  iter 8941 / 9295 | time 5324[s] | loss 1.55
| epoch 8 |  iter 8961 / 9295 | time 5325[s] | loss 1.57
| epoch 8 |  iter 8981 / 9295 | time 5327[s] | loss 1.55
| epoch 8 |  iter 9001 / 9295 | time 5328[s] | loss 1.52
| epoch 8 |  iter 9021 / 9295 | time 5329[s] | loss 1.56
| epoch 8 |  iter 9041 / 9295 | time 5331[s] | loss 1.53
| epoch 8 |  iter 9061 / 9295 | time 5332[s] | loss 1.54
| epoch 8 |  iter 9081 / 9295 | time 5334[s] | loss 1.56
| epoch 8 |  iter 9101 / 9295 | time 5335[s] | loss 1.54
| epoch 8 |  iter 9121 / 9295 | time 5336[s] | loss 1.56
| epoch 8 |  iter 9141 / 9295 | time 5338[s] | loss 1.56
| epoch 8 |  iter 9161 / 9295 | time 5339[s] | loss 1.51
| epoch 8 |  iter 9181 / 9295 | time 5340[s] | loss 1.53
| epoch 8 |  iter 9201 / 9295 | time 5342[s] | loss 1.56
| epoch 8 |  iter 9221 / 9295 | time 5343[s] | loss 1.58
| epoch 8 |  iter 9241 / 9295 | time 5344[s] | loss 1.54
| epoch 8 |  iter 9261 / 9295 | time 5346[s] | loss 1.54
| epoch 8 |  iter 9281 / 9295 | time 5347[s] | loss 1.56
| epoch 9 |  iter 1 / 9295 | time 5348[s] | loss 1.51
| epoch 9 |  iter 21 / 9295 | time 5349[s] | loss 1.47
| epoch 9 |  iter 41 / 9295 | time 5351[s] | loss 1.46
| epoch 9 |  iter 61 / 9295 | time 5352[s] | loss 1.50
| epoch 9 |  iter 81 / 9295 | time 5353[s] | loss 1.45
| epoch 9 |  iter 101 / 9295 | time 5355[s] | loss 1.44
| epoch 9 |  iter 121 / 9295 | time 5356[s] | loss 1.43
| epoch 9 |  iter 141 / 9295 | time 5358[s] | loss 1.47
| epoch 9 |  iter 161 / 9295 | time 5359[s] | loss 1.46
| epoch 9 |  iter 181 / 9295 | time 5361[s] | loss 1.46
| epoch 9 |  iter 201 / 9295 | time 5362[s] | loss 1.46
| epoch 9 |  iter 221 / 9295 | time 5364[s] | loss 1.49
| epoch 9 |  iter 241 / 9295 | time 5365[s] | loss 1.45
| epoch 9 |  iter 261 / 9295 | time 5367[s] | loss 1.50
| epoch 9 |  iter 281 / 9295 | time 5368[s] | loss 1.42
| epoch 9 |  iter 301 / 9295 | time 5370[s] | loss 1.47
| epoch 9 |  iter 321 / 9295 | time 5371[s] | loss 1.47
| epoch 9 |  iter 341 / 9295 | time 5373[s] | loss 1.49
| epoch 9 |  iter 361 / 9295 | time 5374[s] | loss 1.44
| epoch 9 |  iter 381 / 9295 | time 5376[s] | loss 1.45
| epoch 9 |  iter 401 / 9295 | time 5377[s] | loss 1.42
| epoch 9 |  iter 421 / 9295 | time 5379[s] | loss 1.42
| epoch 9 |  iter 441 / 9295 | time 5380[s] | loss 1.47
| epoch 9 |  iter 461 / 9295 | time 5382[s] | loss 1.40
| epoch 9 |  iter 481 / 9295 | time 5383[s] | loss 1.46
| epoch 9 |  iter 501 / 9295 | time 5385[s] | loss 1.45
| epoch 9 |  iter 521 / 9295 | time 5386[s] | loss 1.48
| epoch 9 |  iter 541 / 9295 | time 5387[s] | loss 1.45
| epoch 9 |  iter 561 / 9295 | time 5389[s] | loss 1.45
| epoch 9 |  iter 581 / 9295 | time 5390[s] | loss 1.47
| epoch 9 |  iter 601 / 9295 | time 5391[s] | loss 1.51
| epoch 9 |  iter 621 / 9295 | time 5393[s] | loss 1.46
| epoch 9 |  iter 641 / 9295 | time 5394[s] | loss 1.50
| epoch 9 |  iter 661 / 9295 | time 5395[s] | loss 1.46
| epoch 9 |  iter 681 / 9295 | time 5397[s] | loss 1.51
| epoch 9 |  iter 701 / 9295 | time 5398[s] | loss 1.46
| epoch 9 |  iter 721 / 9295 | time 5399[s] | loss 1.46
| epoch 9 |  iter 741 / 9295 | time 5400[s] | loss 1.43
| epoch 9 |  iter 761 / 9295 | time 5402[s] | loss 1.47
| epoch 9 |  iter 781 / 9295 | time 5403[s] | loss 1.46
| epoch 9 |  iter 801 / 9295 | time 5404[s] | loss 1.51
| epoch 9 |  iter 821 / 9295 | time 5406[s] | loss 1.48
| epoch 9 |  iter 841 / 9295 | time 5407[s] | loss 1.52
| epoch 9 |  iter 861 / 9295 | time 5408[s] | loss 1.47
| epoch 9 |  iter 881 / 9295 | time 5410[s] | loss 1.46
| epoch 9 |  iter 901 / 9295 | time 5411[s] | loss 1.46
| epoch 9 |  iter 921 / 9295 | time 5412[s] | loss 1.43
| epoch 9 |  iter 941 / 9295 | time 5413[s] | loss 1.47
| epoch 9 |  iter 961 / 9295 | time 5415[s] | loss 1.43
| epoch 9 |  iter 981 / 9295 | time 5416[s] | loss 1.48
| epoch 9 |  iter 1001 / 9295 | time 5417[s] | loss 1.49
| epoch 9 |  iter 1021 / 9295 | time 5419[s] | loss 1.48
| epoch 9 |  iter 1041 / 9295 | time 5420[s] | loss 1.46
| epoch 9 |  iter 1061 / 9295 | time 5421[s] | loss 1.46
| epoch 9 |  iter 1081 / 9295 | time 5422[s] | loss 1.46
| epoch 9 |  iter 1101 / 9295 | time 5424[s] | loss 1.48
| epoch 9 |  iter 1121 / 9295 | time 5425[s] | loss 1.46
| epoch 9 |  iter 1141 / 9295 | time 5426[s] | loss 1.49
| epoch 9 |  iter 1161 / 9295 | time 5428[s] | loss 1.48
| epoch 9 |  iter 1181 / 9295 | time 5429[s] | loss 1.46
| epoch 9 |  iter 1201 / 9295 | time 5430[s] | loss 1.47
| epoch 9 |  iter 1221 / 9295 | time 5431[s] | loss 1.46
| epoch 9 |  iter 1241 / 9295 | time 5433[s] | loss 1.46
| epoch 9 |  iter 1261 / 9295 | time 5434[s] | loss 1.50
| epoch 9 |  iter 1281 / 9295 | time 5435[s] | loss 1.45
| epoch 9 |  iter 1301 / 9295 | time 5437[s] | loss 1.46
| epoch 9 |  iter 1321 / 9295 | time 5438[s] | loss 1.43
| epoch 9 |  iter 1341 / 9295 | time 5439[s] | loss 1.50
| epoch 9 |  iter 1361 / 9295 | time 5441[s] | loss 1.47
| epoch 9 |  iter 1381 / 9295 | time 5442[s] | loss 1.48
| epoch 9 |  iter 1401 / 9295 | time 5443[s] | loss 1.45
| epoch 9 |  iter 1421 / 9295 | time 5444[s] | loss 1.46
| epoch 9 |  iter 1441 / 9295 | time 5446[s] | loss 1.47
| epoch 9 |  iter 1461 / 9295 | time 5447[s] | loss 1.46
| epoch 9 |  iter 1481 / 9295 | time 5448[s] | loss 1.45
| epoch 9 |  iter 1501 / 9295 | time 5450[s] | loss 1.45
| epoch 9 |  iter 1521 / 9295 | time 5451[s] | loss 1.48
| epoch 9 |  iter 1541 / 9295 | time 5452[s] | loss 1.47
| epoch 9 |  iter 1561 / 9295 | time 5453[s] | loss 1.46
| epoch 9 |  iter 1581 / 9295 | time 5455[s] | loss 1.52
| epoch 9 |  iter 1601 / 9295 | time 5456[s] | loss 1.48
| epoch 9 |  iter 1621 / 9295 | time 5457[s] | loss 1.48
| epoch 9 |  iter 1641 / 9295 | time 5459[s] | loss 1.43
| epoch 9 |  iter 1661 / 9295 | time 5460[s] | loss 1.48
| epoch 9 |  iter 1681 / 9295 | time 5461[s] | loss 1.47
| epoch 9 |  iter 1701 / 9295 | time 5462[s] | loss 1.48
| epoch 9 |  iter 1721 / 9295 | time 5464[s] | loss 1.46
| epoch 9 |  iter 1741 / 9295 | time 5465[s] | loss 1.47
| epoch 9 |  iter 1761 / 9295 | time 5466[s] | loss 1.48
| epoch 9 |  iter 1781 / 9295 | time 5467[s] | loss 1.44
| epoch 9 |  iter 1801 / 9295 | time 5469[s] | loss 1.52
| epoch 9 |  iter 1821 / 9295 | time 5470[s] | loss 1.48
| epoch 9 |  iter 1841 / 9295 | time 5471[s] | loss 1.46
| epoch 9 |  iter 1861 / 9295 | time 5473[s] | loss 1.46
| epoch 9 |  iter 1881 / 9295 | time 5474[s] | loss 1.48
| epoch 9 |  iter 1901 / 9295 | time 5475[s] | loss 1.49
| epoch 9 |  iter 1921 / 9295 | time 5476[s] | loss 1.46
| epoch 9 |  iter 1941 / 9295 | time 5478[s] | loss 1.47
| epoch 9 |  iter 1961 / 9295 | time 5479[s] | loss 1.48
| epoch 9 |  iter 1981 / 9295 | time 5480[s] | loss 1.47
| epoch 9 |  iter 2001 / 9295 | time 5482[s] | loss 1.43
| epoch 9 |  iter 2021 / 9295 | time 5483[s] | loss 1.46
| epoch 9 |  iter 2041 / 9295 | time 5484[s] | loss 1.49
| epoch 9 |  iter 2061 / 9295 | time 5485[s] | loss 1.46
| epoch 9 |  iter 2081 / 9295 | time 5487[s] | loss 1.47
| epoch 9 |  iter 2101 / 9295 | time 5488[s] | loss 1.43
| epoch 9 |  iter 2121 / 9295 | time 5489[s] | loss 1.50
| epoch 9 |  iter 2141 / 9295 | time 5490[s] | loss 1.45
| epoch 9 |  iter 2161 / 9295 | time 5492[s] | loss 1.52
| epoch 9 |  iter 2181 / 9295 | time 5493[s] | loss 1.47
| epoch 9 |  iter 2201 / 9295 | time 5494[s] | loss 1.50
| epoch 9 |  iter 2221 / 9295 | time 5496[s] | loss 1.47
| epoch 9 |  iter 2241 / 9295 | time 5497[s] | loss 1.43
| epoch 9 |  iter 2261 / 9295 | time 5498[s] | loss 1.45
| epoch 9 |  iter 2281 / 9295 | time 5499[s] | loss 1.49
| epoch 9 |  iter 2301 / 9295 | time 5501[s] | loss 1.48
| epoch 9 |  iter 2321 / 9295 | time 5502[s] | loss 1.48
| epoch 9 |  iter 2341 / 9295 | time 5503[s] | loss 1.51
| epoch 9 |  iter 2361 / 9295 | time 5504[s] | loss 1.44
| epoch 9 |  iter 2381 / 9295 | time 5506[s] | loss 1.49
| epoch 9 |  iter 2401 / 9295 | time 5507[s] | loss 1.50
| epoch 9 |  iter 2421 / 9295 | time 5508[s] | loss 1.49
| epoch 9 |  iter 2441 / 9295 | time 5509[s] | loss 1.55
| epoch 9 |  iter 2461 / 9295 | time 5511[s] | loss 1.52
| epoch 9 |  iter 2481 / 9295 | time 5512[s] | loss 1.48
| epoch 9 |  iter 2501 / 9295 | time 5513[s] | loss 1.50
| epoch 9 |  iter 2521 / 9295 | time 5515[s] | loss 1.45
| epoch 9 |  iter 2541 / 9295 | time 5516[s] | loss 1.45
| epoch 9 |  iter 2561 / 9295 | time 5517[s] | loss 1.44
| epoch 9 |  iter 2581 / 9295 | time 5518[s] | loss 1.50
| epoch 9 |  iter 2601 / 9295 | time 5520[s] | loss 1.43
| epoch 9 |  iter 2621 / 9295 | time 5521[s] | loss 1.51
| epoch 9 |  iter 2641 / 9295 | time 5522[s] | loss 1.48
| epoch 9 |  iter 2661 / 9295 | time 5523[s] | loss 1.48
| epoch 9 |  iter 2681 / 9295 | time 5525[s] | loss 1.44
| epoch 9 |  iter 2701 / 9295 | time 5526[s] | loss 1.50
| epoch 9 |  iter 2721 / 9295 | time 5527[s] | loss 1.48
| epoch 9 |  iter 2741 / 9295 | time 5528[s] | loss 1.52
| epoch 9 |  iter 2761 / 9295 | time 5530[s] | loss 1.50
| epoch 9 |  iter 2781 / 9295 | time 5531[s] | loss 1.49
| epoch 9 |  iter 2801 / 9295 | time 5532[s] | loss 1.49
| epoch 9 |  iter 2821 / 9295 | time 5534[s] | loss 1.50
| epoch 9 |  iter 2841 / 9295 | time 5535[s] | loss 1.47
| epoch 9 |  iter 2861 / 9295 | time 5536[s] | loss 1.47
| epoch 9 |  iter 2881 / 9295 | time 5537[s] | loss 1.52
| epoch 9 |  iter 2901 / 9295 | time 5539[s] | loss 1.51
| epoch 9 |  iter 2921 / 9295 | time 5540[s] | loss 1.50
| epoch 9 |  iter 2941 / 9295 | time 5541[s] | loss 1.50
| epoch 9 |  iter 2961 / 9295 | time 5542[s] | loss 1.44
| epoch 9 |  iter 2981 / 9295 | time 5544[s] | loss 1.48
| epoch 9 |  iter 3001 / 9295 | time 5545[s] | loss 1.51
| epoch 9 |  iter 3021 / 9295 | time 5546[s] | loss 1.46
| epoch 9 |  iter 3041 / 9295 | time 5547[s] | loss 1.50
| epoch 9 |  iter 3061 / 9295 | time 5549[s] | loss 1.44
| epoch 9 |  iter 3081 / 9295 | time 5550[s] | loss 1.45
| epoch 9 |  iter 3101 / 9295 | time 5551[s] | loss 1.51
| epoch 9 |  iter 3121 / 9295 | time 5553[s] | loss 1.46
| epoch 9 |  iter 3141 / 9295 | time 5554[s] | loss 1.47
| epoch 9 |  iter 3161 / 9295 | time 5555[s] | loss 1.53
| epoch 9 |  iter 3181 / 9295 | time 5556[s] | loss 1.48
| epoch 9 |  iter 3201 / 9295 | time 5558[s] | loss 1.46
| epoch 9 |  iter 3221 / 9295 | time 5559[s] | loss 1.51
| epoch 9 |  iter 3241 / 9295 | time 5560[s] | loss 1.48
| epoch 9 |  iter 3261 / 9295 | time 5561[s] | loss 1.48
| epoch 9 |  iter 3281 / 9295 | time 5563[s] | loss 1.47
| epoch 9 |  iter 3301 / 9295 | time 5564[s] | loss 1.47
| epoch 9 |  iter 3321 / 9295 | time 5565[s] | loss 1.48
| epoch 9 |  iter 3341 / 9295 | time 5567[s] | loss 1.49
| epoch 9 |  iter 3361 / 9295 | time 5568[s] | loss 1.47
| epoch 9 |  iter 3381 / 9295 | time 5569[s] | loss 1.52
| epoch 9 |  iter 3401 / 9295 | time 5570[s] | loss 1.44
| epoch 9 |  iter 3421 / 9295 | time 5572[s] | loss 1.47
| epoch 9 |  iter 3441 / 9295 | time 5573[s] | loss 1.50
| epoch 9 |  iter 3461 / 9295 | time 5574[s] | loss 1.48
| epoch 9 |  iter 3481 / 9295 | time 5575[s] | loss 1.49
| epoch 9 |  iter 3501 / 9295 | time 5577[s] | loss 1.50
| epoch 9 |  iter 3521 / 9295 | time 5578[s] | loss 1.50
| epoch 9 |  iter 3541 / 9295 | time 5579[s] | loss 1.46
| epoch 9 |  iter 3561 / 9295 | time 5581[s] | loss 1.50
| epoch 9 |  iter 3581 / 9295 | time 5582[s] | loss 1.51
| epoch 9 |  iter 3601 / 9295 | time 5583[s] | loss 1.50
| epoch 9 |  iter 3621 / 9295 | time 5584[s] | loss 1.48
| epoch 9 |  iter 3641 / 9295 | time 5586[s] | loss 1.47
| epoch 9 |  iter 3661 / 9295 | time 5587[s] | loss 1.47
| epoch 9 |  iter 3681 / 9295 | time 5588[s] | loss 1.50
| epoch 9 |  iter 3701 / 9295 | time 5589[s] | loss 1.48
| epoch 9 |  iter 3721 / 9295 | time 5591[s] | loss 1.49
| epoch 9 |  iter 3741 / 9295 | time 5592[s] | loss 1.45
| epoch 9 |  iter 3761 / 9295 | time 5593[s] | loss 1.50
| epoch 9 |  iter 3781 / 9295 | time 5595[s] | loss 1.48
| epoch 9 |  iter 3801 / 9295 | time 5596[s] | loss 1.47
| epoch 9 |  iter 3821 / 9295 | time 5597[s] | loss 1.43
| epoch 9 |  iter 3841 / 9295 | time 5598[s] | loss 1.52
| epoch 9 |  iter 3861 / 9295 | time 5600[s] | loss 1.48
| epoch 9 |  iter 3881 / 9295 | time 5601[s] | loss 1.53
| epoch 9 |  iter 3901 / 9295 | time 5602[s] | loss 1.45
| epoch 9 |  iter 3921 / 9295 | time 5603[s] | loss 1.47
| epoch 9 |  iter 3941 / 9295 | time 5605[s] | loss 1.49
| epoch 9 |  iter 3961 / 9295 | time 5606[s] | loss 1.49
| epoch 9 |  iter 3981 / 9295 | time 5607[s] | loss 1.54
| epoch 9 |  iter 4001 / 9295 | time 5608[s] | loss 1.49
| epoch 9 |  iter 4021 / 9295 | time 5610[s] | loss 1.47
| epoch 9 |  iter 4041 / 9295 | time 5611[s] | loss 1.51
| epoch 9 |  iter 4061 / 9295 | time 5612[s] | loss 1.50
| epoch 9 |  iter 4081 / 9295 | time 5614[s] | loss 1.46
| epoch 9 |  iter 4101 / 9295 | time 5615[s] | loss 1.51
| epoch 9 |  iter 4121 / 9295 | time 5616[s] | loss 1.45
| epoch 9 |  iter 4141 / 9295 | time 5617[s] | loss 1.49
| epoch 9 |  iter 4161 / 9295 | time 5619[s] | loss 1.49
| epoch 9 |  iter 4181 / 9295 | time 5620[s] | loss 1.48
| epoch 9 |  iter 4201 / 9295 | time 5621[s] | loss 1.47
| epoch 9 |  iter 4221 / 9295 | time 5622[s] | loss 1.47
| epoch 9 |  iter 4241 / 9295 | time 5624[s] | loss 1.48
| epoch 9 |  iter 4261 / 9295 | time 5625[s] | loss 1.53
| epoch 9 |  iter 4281 / 9295 | time 5626[s] | loss 1.48
| epoch 9 |  iter 4301 / 9295 | time 5628[s] | loss 1.46
| epoch 9 |  iter 4321 / 9295 | time 5629[s] | loss 1.52
| epoch 9 |  iter 4341 / 9295 | time 5630[s] | loss 1.48
| epoch 9 |  iter 4361 / 9295 | time 5632[s] | loss 1.50
| epoch 9 |  iter 4381 / 9295 | time 5633[s] | loss 1.48
| epoch 9 |  iter 4401 / 9295 | time 5635[s] | loss 1.46
| epoch 9 |  iter 4421 / 9295 | time 5636[s] | loss 1.49
| epoch 9 |  iter 4441 / 9295 | time 5637[s] | loss 1.53
| epoch 9 |  iter 4461 / 9295 | time 5639[s] | loss 1.45
| epoch 9 |  iter 4481 / 9295 | time 5640[s] | loss 1.44
| epoch 9 |  iter 4501 / 9295 | time 5641[s] | loss 1.43
| epoch 9 |  iter 4521 / 9295 | time 5643[s] | loss 1.49
| epoch 9 |  iter 4541 / 9295 | time 5644[s] | loss 1.49
| epoch 9 |  iter 4561 / 9295 | time 5645[s] | loss 1.48
| epoch 9 |  iter 4581 / 9295 | time 5647[s] | loss 1.50
| epoch 9 |  iter 4601 / 9295 | time 5648[s] | loss 1.45
| epoch 9 |  iter 4621 / 9295 | time 5649[s] | loss 1.50
| epoch 9 |  iter 4641 / 9295 | time 5651[s] | loss 1.53
| epoch 9 |  iter 4661 / 9295 | time 5652[s] | loss 1.50
| epoch 9 |  iter 4681 / 9295 | time 5654[s] | loss 1.48
| epoch 9 |  iter 4701 / 9295 | time 5655[s] | loss 1.51
| epoch 9 |  iter 4721 / 9295 | time 5656[s] | loss 1.49
| epoch 9 |  iter 4741 / 9295 | time 5658[s] | loss 1.48
| epoch 9 |  iter 4761 / 9295 | time 5659[s] | loss 1.46
| epoch 9 |  iter 4781 / 9295 | time 5660[s] | loss 1.46
| epoch 9 |  iter 4801 / 9295 | time 5661[s] | loss 1.46
| epoch 9 |  iter 4821 / 9295 | time 5663[s] | loss 1.46
| epoch 9 |  iter 4841 / 9295 | time 5664[s] | loss 1.49
| epoch 9 |  iter 4861 / 9295 | time 5665[s] | loss 1.51
| epoch 9 |  iter 4881 / 9295 | time 5667[s] | loss 1.46
| epoch 9 |  iter 4901 / 9295 | time 5668[s] | loss 1.50
| epoch 9 |  iter 4921 / 9295 | time 5669[s] | loss 1.51
| epoch 9 |  iter 4941 / 9295 | time 5671[s] | loss 1.48
| epoch 9 |  iter 4961 / 9295 | time 5672[s] | loss 1.50
| epoch 9 |  iter 4981 / 9295 | time 5673[s] | loss 1.47
| epoch 9 |  iter 5001 / 9295 | time 5674[s] | loss 1.46
| epoch 9 |  iter 5021 / 9295 | time 5676[s] | loss 1.52
| epoch 9 |  iter 5041 / 9295 | time 5677[s] | loss 1.48
| epoch 9 |  iter 5061 / 9295 | time 5678[s] | loss 1.49
| epoch 9 |  iter 5081 / 9295 | time 5680[s] | loss 1.45
| epoch 9 |  iter 5101 / 9295 | time 5681[s] | loss 1.51
| epoch 9 |  iter 5121 / 9295 | time 5682[s] | loss 1.51
| epoch 9 |  iter 5141 / 9295 | time 5684[s] | loss 1.50
| epoch 9 |  iter 5161 / 9295 | time 5685[s] | loss 1.47
| epoch 9 |  iter 5181 / 9295 | time 5686[s] | loss 1.47
| epoch 9 |  iter 5201 / 9295 | time 5688[s] | loss 1.49
| epoch 9 |  iter 5221 / 9295 | time 5689[s] | loss 1.46
| epoch 9 |  iter 5241 / 9295 | time 5690[s] | loss 1.52
| epoch 9 |  iter 5261 / 9295 | time 5692[s] | loss 1.47
| epoch 9 |  iter 5281 / 9295 | time 5693[s] | loss 1.50
| epoch 9 |  iter 5301 / 9295 | time 5694[s] | loss 1.50
| epoch 9 |  iter 5321 / 9295 | time 5696[s] | loss 1.46
| epoch 9 |  iter 5341 / 9295 | time 5697[s] | loss 1.45
| epoch 9 |  iter 5361 / 9295 | time 5698[s] | loss 1.47
| epoch 9 |  iter 5381 / 9295 | time 5700[s] | loss 1.51
| epoch 9 |  iter 5401 / 9295 | time 5701[s] | loss 1.52
| epoch 9 |  iter 5421 / 9295 | time 5702[s] | loss 1.55
| epoch 9 |  iter 5441 / 9295 | time 5703[s] | loss 1.49
| epoch 9 |  iter 5461 / 9295 | time 5705[s] | loss 1.52
| epoch 9 |  iter 5481 / 9295 | time 5706[s] | loss 1.50
| epoch 9 |  iter 5501 / 9295 | time 5707[s] | loss 1.50
| epoch 9 |  iter 5521 / 9295 | time 5709[s] | loss 1.52
| epoch 9 |  iter 5541 / 9295 | time 5710[s] | loss 1.50
| epoch 9 |  iter 5561 / 9295 | time 5711[s] | loss 1.48
| epoch 9 |  iter 5581 / 9295 | time 5712[s] | loss 1.48
| epoch 9 |  iter 5601 / 9295 | time 5714[s] | loss 1.51
| epoch 9 |  iter 5621 / 9295 | time 5715[s] | loss 1.47
| epoch 9 |  iter 5641 / 9295 | time 5716[s] | loss 1.49
| epoch 9 |  iter 5661 / 9295 | time 5718[s] | loss 1.48
| epoch 9 |  iter 5681 / 9295 | time 5719[s] | loss 1.52
| epoch 9 |  iter 5701 / 9295 | time 5720[s] | loss 1.47
| epoch 9 |  iter 5721 / 9295 | time 5721[s] | loss 1.48
| epoch 9 |  iter 5741 / 9295 | time 5723[s] | loss 1.50
| epoch 9 |  iter 5761 / 9295 | time 5724[s] | loss 1.48
| epoch 9 |  iter 5781 / 9295 | time 5725[s] | loss 1.48
| epoch 9 |  iter 5801 / 9295 | time 5727[s] | loss 1.53
| epoch 9 |  iter 5821 / 9295 | time 5728[s] | loss 1.48
| epoch 9 |  iter 5841 / 9295 | time 5729[s] | loss 1.46
| epoch 9 |  iter 5861 / 9295 | time 5730[s] | loss 1.50
| epoch 9 |  iter 5881 / 9295 | time 5732[s] | loss 1.52
| epoch 9 |  iter 5901 / 9295 | time 5733[s] | loss 1.51
| epoch 9 |  iter 5921 / 9295 | time 5734[s] | loss 1.46
| epoch 9 |  iter 5941 / 9295 | time 5735[s] | loss 1.48
| epoch 9 |  iter 5961 / 9295 | time 5737[s] | loss 1.53
| epoch 9 |  iter 5981 / 9295 | time 5738[s] | loss 1.47
| epoch 9 |  iter 6001 / 9295 | time 5739[s] | loss 1.51
| epoch 9 |  iter 6021 / 9295 | time 5741[s] | loss 1.51
| epoch 9 |  iter 6041 / 9295 | time 5742[s] | loss 1.49
| epoch 9 |  iter 6061 / 9295 | time 5743[s] | loss 1.47
| epoch 9 |  iter 6081 / 9295 | time 5744[s] | loss 1.49
| epoch 9 |  iter 6101 / 9295 | time 5746[s] | loss 1.49
| epoch 9 |  iter 6121 / 9295 | time 5747[s] | loss 1.51
| epoch 9 |  iter 6141 / 9295 | time 5748[s] | loss 1.50
| epoch 9 |  iter 6161 / 9295 | time 5750[s] | loss 1.49
| epoch 9 |  iter 6181 / 9295 | time 5751[s] | loss 1.50
| epoch 9 |  iter 6201 / 9295 | time 5752[s] | loss 1.48
| epoch 9 |  iter 6221 / 9295 | time 5753[s] | loss 1.49
| epoch 9 |  iter 6241 / 9295 | time 5755[s] | loss 1.52
| epoch 9 |  iter 6261 / 9295 | time 5756[s] | loss 1.48
| epoch 9 |  iter 6281 / 9295 | time 5757[s] | loss 1.44
| epoch 9 |  iter 6301 / 9295 | time 5758[s] | loss 1.47
| epoch 9 |  iter 6321 / 9295 | time 5760[s] | loss 1.46
| epoch 9 |  iter 6341 / 9295 | time 5761[s] | loss 1.46
| epoch 9 |  iter 6361 / 9295 | time 5762[s] | loss 1.49
| epoch 9 |  iter 6381 / 9295 | time 5764[s] | loss 1.48
| epoch 9 |  iter 6401 / 9295 | time 5765[s] | loss 1.54
| epoch 9 |  iter 6421 / 9295 | time 5766[s] | loss 1.48
| epoch 9 |  iter 6441 / 9295 | time 5767[s] | loss 1.47
| epoch 9 |  iter 6461 / 9295 | time 5769[s] | loss 1.47
| epoch 9 |  iter 6481 / 9295 | time 5770[s] | loss 1.53
| epoch 9 |  iter 6501 / 9295 | time 5771[s] | loss 1.48
| epoch 9 |  iter 6521 / 9295 | time 5773[s] | loss 1.50
| epoch 9 |  iter 6541 / 9295 | time 5774[s] | loss 1.46
| epoch 9 |  iter 6561 / 9295 | time 5775[s] | loss 1.52
| epoch 9 |  iter 6581 / 9295 | time 5776[s] | loss 1.45
| epoch 9 |  iter 6601 / 9295 | time 5778[s] | loss 1.48
| epoch 9 |  iter 6621 / 9295 | time 5779[s] | loss 1.47
| epoch 9 |  iter 6641 / 9295 | time 5780[s] | loss 1.49
| epoch 9 |  iter 6661 / 9295 | time 5782[s] | loss 1.50
| epoch 9 |  iter 6681 / 9295 | time 5783[s] | loss 1.51
| epoch 9 |  iter 6701 / 9295 | time 5784[s] | loss 1.48
| epoch 9 |  iter 6721 / 9295 | time 5785[s] | loss 1.49
| epoch 9 |  iter 6741 / 9295 | time 5787[s] | loss 1.44
| epoch 9 |  iter 6761 / 9295 | time 5788[s] | loss 1.50
| epoch 9 |  iter 6781 / 9295 | time 5789[s] | loss 1.48
| epoch 9 |  iter 6801 / 9295 | time 5790[s] | loss 1.47
| epoch 9 |  iter 6821 / 9295 | time 5792[s] | loss 1.49
| epoch 9 |  iter 6841 / 9295 | time 5793[s] | loss 1.47
| epoch 9 |  iter 6861 / 9295 | time 5794[s] | loss 1.48
| epoch 9 |  iter 6881 / 9295 | time 5796[s] | loss 1.50
| epoch 9 |  iter 6901 / 9295 | time 5797[s] | loss 1.50
| epoch 9 |  iter 6921 / 9295 | time 5798[s] | loss 1.49
| epoch 9 |  iter 6941 / 9295 | time 5799[s] | loss 1.50
| epoch 9 |  iter 6961 / 9295 | time 5801[s] | loss 1.49
| epoch 9 |  iter 6981 / 9295 | time 5802[s] | loss 1.51
| epoch 9 |  iter 7001 / 9295 | time 5803[s] | loss 1.50
| epoch 9 |  iter 7021 / 9295 | time 5805[s] | loss 1.48
| epoch 9 |  iter 7041 / 9295 | time 5806[s] | loss 1.48
| epoch 9 |  iter 7061 / 9295 | time 5807[s] | loss 1.50
| epoch 9 |  iter 7081 / 9295 | time 5808[s] | loss 1.52
| epoch 9 |  iter 7101 / 9295 | time 5810[s] | loss 1.54
| epoch 9 |  iter 7121 / 9295 | time 5811[s] | loss 1.49
| epoch 9 |  iter 7141 / 9295 | time 5812[s] | loss 1.51
| epoch 9 |  iter 7161 / 9295 | time 5813[s] | loss 1.51
| epoch 9 |  iter 7181 / 9295 | time 5815[s] | loss 1.52
| epoch 9 |  iter 7201 / 9295 | time 5816[s] | loss 1.44
| epoch 9 |  iter 7221 / 9295 | time 5817[s] | loss 1.48
| epoch 9 |  iter 7241 / 9295 | time 5819[s] | loss 1.51
| epoch 9 |  iter 7261 / 9295 | time 5820[s] | loss 1.54
| epoch 9 |  iter 7281 / 9295 | time 5821[s] | loss 1.47
| epoch 9 |  iter 7301 / 9295 | time 5822[s] | loss 1.52
| epoch 9 |  iter 7321 / 9295 | time 5824[s] | loss 1.49
| epoch 9 |  iter 7341 / 9295 | time 5825[s] | loss 1.47
| epoch 9 |  iter 7361 / 9295 | time 5826[s] | loss 1.50
| epoch 9 |  iter 7381 / 9295 | time 5828[s] | loss 1.50
| epoch 9 |  iter 7401 / 9295 | time 5829[s] | loss 1.49
| epoch 9 |  iter 7421 / 9295 | time 5830[s] | loss 1.53
| epoch 9 |  iter 7441 / 9295 | time 5831[s] | loss 1.49
| epoch 9 |  iter 7461 / 9295 | time 5833[s] | loss 1.51
| epoch 9 |  iter 7481 / 9295 | time 5834[s] | loss 1.49
| epoch 9 |  iter 7501 / 9295 | time 5835[s] | loss 1.50
| epoch 9 |  iter 7521 / 9295 | time 5837[s] | loss 1.48
| epoch 9 |  iter 7541 / 9295 | time 5838[s] | loss 1.47
| epoch 9 |  iter 7561 / 9295 | time 5839[s] | loss 1.53
| epoch 9 |  iter 7581 / 9295 | time 5840[s] | loss 1.44
| epoch 9 |  iter 7601 / 9295 | time 5842[s] | loss 1.49
| epoch 9 |  iter 7621 / 9295 | time 5843[s] | loss 1.51
| epoch 9 |  iter 7641 / 9295 | time 5844[s] | loss 1.52
| epoch 9 |  iter 7661 / 9295 | time 5846[s] | loss 1.45
| epoch 9 |  iter 7681 / 9295 | time 5847[s] | loss 1.53
| epoch 9 |  iter 7701 / 9295 | time 5848[s] | loss 1.47
| epoch 9 |  iter 7721 / 9295 | time 5849[s] | loss 1.51
| epoch 9 |  iter 7741 / 9295 | time 5851[s] | loss 1.47
| epoch 9 |  iter 7761 / 9295 | time 5852[s] | loss 1.50
| epoch 9 |  iter 7781 / 9295 | time 5853[s] | loss 1.52
| epoch 9 |  iter 7801 / 9295 | time 5854[s] | loss 1.52
| epoch 9 |  iter 7821 / 9295 | time 5856[s] | loss 1.51
| epoch 9 |  iter 7841 / 9295 | time 5857[s] | loss 1.48
| epoch 9 |  iter 7861 / 9295 | time 5858[s] | loss 1.53
| epoch 9 |  iter 7881 / 9295 | time 5860[s] | loss 1.48
| epoch 9 |  iter 7901 / 9295 | time 5861[s] | loss 1.47
| epoch 9 |  iter 7921 / 9295 | time 5862[s] | loss 1.53
| epoch 9 |  iter 7941 / 9295 | time 5863[s] | loss 1.52
| epoch 9 |  iter 7961 / 9295 | time 5865[s] | loss 1.48
| epoch 9 |  iter 7981 / 9295 | time 5866[s] | loss 1.49
| epoch 9 |  iter 8001 / 9295 | time 5867[s] | loss 1.50
| epoch 9 |  iter 8021 / 9295 | time 5869[s] | loss 1.52
| epoch 9 |  iter 8041 / 9295 | time 5870[s] | loss 1.51
| epoch 9 |  iter 8061 / 9295 | time 5871[s] | loss 1.47
| epoch 9 |  iter 8081 / 9295 | time 5872[s] | loss 1.52
| epoch 9 |  iter 8101 / 9295 | time 5874[s] | loss 1.49
| epoch 9 |  iter 8121 / 9295 | time 5875[s] | loss 1.50
| epoch 9 |  iter 8141 / 9295 | time 5876[s] | loss 1.50
| epoch 9 |  iter 8161 / 9295 | time 5878[s] | loss 1.49
| epoch 9 |  iter 8181 / 9295 | time 5879[s] | loss 1.53
| epoch 9 |  iter 8201 / 9295 | time 5880[s] | loss 1.47
| epoch 9 |  iter 8221 / 9295 | time 5881[s] | loss 1.49
| epoch 9 |  iter 8241 / 9295 | time 5883[s] | loss 1.49
| epoch 9 |  iter 8261 / 9295 | time 5884[s] | loss 1.52
| epoch 9 |  iter 8281 / 9295 | time 5885[s] | loss 1.48
| epoch 9 |  iter 8301 / 9295 | time 5887[s] | loss 1.48
| epoch 9 |  iter 8321 / 9295 | time 5888[s] | loss 1.50
| epoch 9 |  iter 8341 / 9295 | time 5889[s] | loss 1.52
| epoch 9 |  iter 8361 / 9295 | time 5890[s] | loss 1.50
| epoch 9 |  iter 8381 / 9295 | time 5892[s] | loss 1.49
| epoch 9 |  iter 8401 / 9295 | time 5893[s] | loss 1.49
| epoch 9 |  iter 8421 / 9295 | time 5894[s] | loss 1.46
| epoch 9 |  iter 8441 / 9295 | time 5896[s] | loss 1.53
| epoch 9 |  iter 8461 / 9295 | time 5897[s] | loss 1.51
| epoch 9 |  iter 8481 / 9295 | time 5898[s] | loss 1.51
| epoch 9 |  iter 8501 / 9295 | time 5899[s] | loss 1.49
| epoch 9 |  iter 8521 / 9295 | time 5901[s] | loss 1.52
| epoch 9 |  iter 8541 / 9295 | time 5902[s] | loss 1.49
| epoch 9 |  iter 8561 / 9295 | time 5903[s] | loss 1.50
| epoch 9 |  iter 8581 / 9295 | time 5905[s] | loss 1.53
| epoch 9 |  iter 8601 / 9295 | time 5906[s] | loss 1.50
| epoch 9 |  iter 8621 / 9295 | time 5907[s] | loss 1.45
| epoch 9 |  iter 8641 / 9295 | time 5908[s] | loss 1.52
| epoch 9 |  iter 8661 / 9295 | time 5910[s] | loss 1.55
| epoch 9 |  iter 8681 / 9295 | time 5911[s] | loss 1.53
| epoch 9 |  iter 8701 / 9295 | time 5912[s] | loss 1.51
| epoch 9 |  iter 8721 / 9295 | time 5914[s] | loss 1.51
| epoch 9 |  iter 8741 / 9295 | time 5915[s] | loss 1.49
| epoch 9 |  iter 8761 / 9295 | time 5916[s] | loss 1.49
| epoch 9 |  iter 8781 / 9295 | time 5917[s] | loss 1.48
| epoch 9 |  iter 8801 / 9295 | time 5919[s] | loss 1.49
| epoch 9 |  iter 8821 / 9295 | time 5920[s] | loss 1.46
| epoch 9 |  iter 8841 / 9295 | time 5921[s] | loss 1.49
| epoch 9 |  iter 8861 / 9295 | time 5922[s] | loss 1.51
| epoch 9 |  iter 8881 / 9295 | time 5924[s] | loss 1.46
| epoch 9 |  iter 8901 / 9295 | time 5925[s] | loss 1.50
| epoch 9 |  iter 8921 / 9295 | time 5926[s] | loss 1.49
| epoch 9 |  iter 8941 / 9295 | time 5928[s] | loss 1.48
| epoch 9 |  iter 8961 / 9295 | time 5929[s] | loss 1.50
| epoch 9 |  iter 8981 / 9295 | time 5930[s] | loss 1.53
| epoch 9 |  iter 9001 / 9295 | time 5931[s] | loss 1.52
| epoch 9 |  iter 9021 / 9295 | time 5933[s] | loss 1.47
| epoch 9 |  iter 9041 / 9295 | time 5934[s] | loss 1.47
| epoch 9 |  iter 9061 / 9295 | time 5935[s] | loss 1.49
| epoch 9 |  iter 9081 / 9295 | time 5937[s] | loss 1.47
| epoch 9 |  iter 9101 / 9295 | time 5938[s] | loss 1.51
| epoch 9 |  iter 9121 / 9295 | time 5939[s] | loss 1.50
| epoch 9 |  iter 9141 / 9295 | time 5940[s] | loss 1.51
| epoch 9 |  iter 9161 / 9295 | time 5942[s] | loss 1.48
| epoch 9 |  iter 9181 / 9295 | time 5943[s] | loss 1.52
| epoch 9 |  iter 9201 / 9295 | time 5944[s] | loss 1.47
| epoch 9 |  iter 9221 / 9295 | time 5946[s] | loss 1.46
| epoch 9 |  iter 9241 / 9295 | time 5947[s] | loss 1.48
| epoch 9 |  iter 9261 / 9295 | time 5948[s] | loss 1.48
| epoch 9 |  iter 9281 / 9295 | time 5949[s] | loss 1.49
| epoch 10 |  iter 1 / 9295 | time 5951[s] | loss 1.51
| epoch 10 |  iter 21 / 9295 | time 5952[s] | loss 1.40
| epoch 10 |  iter 41 / 9295 | time 5953[s] | loss 1.46
| epoch 10 |  iter 61 / 9295 | time 5954[s] | loss 1.39
| epoch 10 |  iter 81 / 9295 | time 5956[s] | loss 1.44
| epoch 10 |  iter 101 / 9295 | time 5957[s] | loss 1.40
| epoch 10 |  iter 121 / 9295 | time 5958[s] | loss 1.40
| epoch 10 |  iter 141 / 9295 | time 5959[s] | loss 1.44
| epoch 10 |  iter 161 / 9295 | time 5961[s] | loss 1.41
| epoch 10 |  iter 181 / 9295 | time 5962[s] | loss 1.43
| epoch 10 |  iter 201 / 9295 | time 5963[s] | loss 1.47
| epoch 10 |  iter 221 / 9295 | time 5965[s] | loss 1.40
| epoch 10 |  iter 241 / 9295 | time 5966[s] | loss 1.44
| epoch 10 |  iter 261 / 9295 | time 5967[s] | loss 1.43
| epoch 10 |  iter 281 / 9295 | time 5968[s] | loss 1.41
| epoch 10 |  iter 301 / 9295 | time 5970[s] | loss 1.43
| epoch 10 |  iter 321 / 9295 | time 5971[s] | loss 1.41
| epoch 10 |  iter 341 / 9295 | time 5972[s] | loss 1.41
| epoch 10 |  iter 361 / 9295 | time 5974[s] | loss 1.38
| epoch 10 |  iter 381 / 9295 | time 5975[s] | loss 1.45
| epoch 10 |  iter 401 / 9295 | time 5976[s] | loss 1.44
| epoch 10 |  iter 421 / 9295 | time 5978[s] | loss 1.40
| epoch 10 |  iter 441 / 9295 | time 5979[s] | loss 1.37
| epoch 10 |  iter 461 / 9295 | time 5980[s] | loss 1.40
| epoch 10 |  iter 481 / 9295 | time 5981[s] | loss 1.41
| epoch 10 |  iter 501 / 9295 | time 5983[s] | loss 1.46
| epoch 10 |  iter 521 / 9295 | time 5984[s] | loss 1.42
| epoch 10 |  iter 541 / 9295 | time 5985[s] | loss 1.43
| epoch 10 |  iter 561 / 9295 | time 5987[s] | loss 1.44
| epoch 10 |  iter 581 / 9295 | time 5988[s] | loss 1.41
| epoch 10 |  iter 601 / 9295 | time 5989[s] | loss 1.41
| epoch 10 |  iter 621 / 9295 | time 5990[s] | loss 1.45
| epoch 10 |  iter 641 / 9295 | time 5992[s] | loss 1.41
| epoch 10 |  iter 661 / 9295 | time 5993[s] | loss 1.48
| epoch 10 |  iter 681 / 9295 | time 5994[s] | loss 1.43
| epoch 10 |  iter 701 / 9295 | time 5996[s] | loss 1.42
| epoch 10 |  iter 721 / 9295 | time 5997[s] | loss 1.44
| epoch 10 |  iter 741 / 9295 | time 5998[s] | loss 1.46
| epoch 10 |  iter 761 / 9295 | time 6000[s] | loss 1.41
| epoch 10 |  iter 781 / 9295 | time 6001[s] | loss 1.40
| epoch 10 |  iter 801 / 9295 | time 6002[s] | loss 1.43
| epoch 10 |  iter 821 / 9295 | time 6004[s] | loss 1.44
| epoch 10 |  iter 841 / 9295 | time 6005[s] | loss 1.45
| epoch 10 |  iter 861 / 9295 | time 6006[s] | loss 1.45
| epoch 10 |  iter 881 / 9295 | time 6008[s] | loss 1.43
| epoch 10 |  iter 901 / 9295 | time 6009[s] | loss 1.39
| epoch 10 |  iter 921 / 9295 | time 6011[s] | loss 1.41
| epoch 10 |  iter 941 / 9295 | time 6012[s] | loss 1.43
| epoch 10 |  iter 961 / 9295 | time 6013[s] | loss 1.39
| epoch 10 |  iter 981 / 9295 | time 6015[s] | loss 1.43
| epoch 10 |  iter 1001 / 9295 | time 6016[s] | loss 1.43
| epoch 10 |  iter 1021 / 9295 | time 6017[s] | loss 1.44
| epoch 10 |  iter 1041 / 9295 | time 6019[s] | loss 1.40
| epoch 10 |  iter 1061 / 9295 | time 6020[s] | loss 1.47
| epoch 10 |  iter 1081 / 9295 | time 6021[s] | loss 1.44
| epoch 10 |  iter 1101 / 9295 | time 6023[s] | loss 1.44
| epoch 10 |  iter 1121 / 9295 | time 6024[s] | loss 1.42
| epoch 10 |  iter 1141 / 9295 | time 6025[s] | loss 1.44
| epoch 10 |  iter 1161 / 9295 | time 6027[s] | loss 1.43
| epoch 10 |  iter 1181 / 9295 | time 6028[s] | loss 1.44
| epoch 10 |  iter 1201 / 9295 | time 6029[s] | loss 1.43
| epoch 10 |  iter 1221 / 9295 | time 6031[s] | loss 1.42
| epoch 10 |  iter 1241 / 9295 | time 6032[s] | loss 1.46
| epoch 10 |  iter 1261 / 9295 | time 6033[s] | loss 1.44
| epoch 10 |  iter 1281 / 9295 | time 6035[s] | loss 1.42
| epoch 10 |  iter 1301 / 9295 | time 6036[s] | loss 1.39
| epoch 10 |  iter 1321 / 9295 | time 6037[s] | loss 1.44
| epoch 10 |  iter 1341 / 9295 | time 6039[s] | loss 1.46
| epoch 10 |  iter 1361 / 9295 | time 6040[s] | loss 1.43
| epoch 10 |  iter 1381 / 9295 | time 6041[s] | loss 1.42
| epoch 10 |  iter 1401 / 9295 | time 6042[s] | loss 1.41
| epoch 10 |  iter 1421 / 9295 | time 6044[s] | loss 1.43
| epoch 10 |  iter 1441 / 9295 | time 6045[s] | loss 1.44
| epoch 10 |  iter 1461 / 9295 | time 6046[s] | loss 1.43
| epoch 10 |  iter 1481 / 9295 | time 6047[s] | loss 1.40
| epoch 10 |  iter 1501 / 9295 | time 6049[s] | loss 1.45
| epoch 10 |  iter 1521 / 9295 | time 6050[s] | loss 1.45
| epoch 10 |  iter 1541 / 9295 | time 6051[s] | loss 1.40
| epoch 10 |  iter 1561 / 9295 | time 6053[s] | loss 1.45
| epoch 10 |  iter 1581 / 9295 | time 6054[s] | loss 1.47
| epoch 10 |  iter 1601 / 9295 | time 6055[s] | loss 1.46
| epoch 10 |  iter 1621 / 9295 | time 6057[s] | loss 1.40
| epoch 10 |  iter 1641 / 9295 | time 6058[s] | loss 1.43
| epoch 10 |  iter 1661 / 9295 | time 6060[s] | loss 1.43
| epoch 10 |  iter 1681 / 9295 | time 6061[s] | loss 1.43
| epoch 10 |  iter 1701 / 9295 | time 6063[s] | loss 1.40
| epoch 10 |  iter 1721 / 9295 | time 6064[s] | loss 1.47
| epoch 10 |  iter 1741 / 9295 | time 6066[s] | loss 1.42
| epoch 10 |  iter 1761 / 9295 | time 6067[s] | loss 1.44
| epoch 10 |  iter 1781 / 9295 | time 6069[s] | loss 1.39
| epoch 10 |  iter 1801 / 9295 | time 6070[s] | loss 1.40
| epoch 10 |  iter 1821 / 9295 | time 6071[s] | loss 1.44
| epoch 10 |  iter 1841 / 9295 | time 6073[s] | loss 1.42
| epoch 10 |  iter 1861 / 9295 | time 6074[s] | loss 1.46
| epoch 10 |  iter 1881 / 9295 | time 6075[s] | loss 1.44
| epoch 10 |  iter 1901 / 9295 | time 6077[s] | loss 1.41
| epoch 10 |  iter 1921 / 9295 | time 6078[s] | loss 1.45
| epoch 10 |  iter 1941 / 9295 | time 6080[s] | loss 1.44
| epoch 10 |  iter 1961 / 9295 | time 6081[s] | loss 1.43
| epoch 10 |  iter 1981 / 9295 | time 6082[s] | loss 1.40
| epoch 10 |  iter 2001 / 9295 | time 6084[s] | loss 1.40
| epoch 10 |  iter 2021 / 9295 | time 6085[s] | loss 1.44
| epoch 10 |  iter 2041 / 9295 | time 6086[s] | loss 1.43
| epoch 10 |  iter 2061 / 9295 | time 6088[s] | loss 1.44
| epoch 10 |  iter 2081 / 9295 | time 6089[s] | loss 1.45
| epoch 10 |  iter 2101 / 9295 | time 6090[s] | loss 1.47
| epoch 10 |  iter 2121 / 9295 | time 6091[s] | loss 1.41
| epoch 10 |  iter 2141 / 9295 | time 6093[s] | loss 1.49
| epoch 10 |  iter 2161 / 9295 | time 6094[s] | loss 1.45
| epoch 10 |  iter 2181 / 9295 | time 6096[s] | loss 1.42
| epoch 10 |  iter 2201 / 9295 | time 6097[s] | loss 1.41
| epoch 10 |  iter 2221 / 9295 | time 6098[s] | loss 1.43
| epoch 10 |  iter 2241 / 9295 | time 6100[s] | loss 1.46
| epoch 10 |  iter 2261 / 9295 | time 6101[s] | loss 1.41
| epoch 10 |  iter 2281 / 9295 | time 6102[s] | loss 1.43
| epoch 10 |  iter 2301 / 9295 | time 6103[s] | loss 1.46
| epoch 10 |  iter 2321 / 9295 | time 6105[s] | loss 1.42
| epoch 10 |  iter 2341 / 9295 | time 6106[s] | loss 1.43
| epoch 10 |  iter 2361 / 9295 | time 6107[s] | loss 1.44
| epoch 10 |  iter 2381 / 9295 | time 6109[s] | loss 1.41
| epoch 10 |  iter 2401 / 9295 | time 6110[s] | loss 1.44
| epoch 10 |  iter 2421 / 9295 | time 6111[s] | loss 1.42
| epoch 10 |  iter 2441 / 9295 | time 6113[s] | loss 1.42
| epoch 10 |  iter 2461 / 9295 | time 6114[s] | loss 1.42
| epoch 10 |  iter 2481 / 9295 | time 6115[s] | loss 1.43
| epoch 10 |  iter 2501 / 9295 | time 6117[s] | loss 1.45
| epoch 10 |  iter 2521 / 9295 | time 6118[s] | loss 1.46
| epoch 10 |  iter 2541 / 9295 | time 6119[s] | loss 1.43
| epoch 10 |  iter 2561 / 9295 | time 6121[s] | loss 1.43
| epoch 10 |  iter 2581 / 9295 | time 6122[s] | loss 1.44
| epoch 10 |  iter 2601 / 9295 | time 6123[s] | loss 1.45
| epoch 10 |  iter 2621 / 9295 | time 6125[s] | loss 1.41
| epoch 10 |  iter 2641 / 9295 | time 6126[s] | loss 1.43
| epoch 10 |  iter 2661 / 9295 | time 6127[s] | loss 1.45
| epoch 10 |  iter 2681 / 9295 | time 6129[s] | loss 1.42
| epoch 10 |  iter 2701 / 9295 | time 6130[s] | loss 1.46
| epoch 10 |  iter 2721 / 9295 | time 6131[s] | loss 1.47
| epoch 10 |  iter 2741 / 9295 | time 6133[s] | loss 1.40
| epoch 10 |  iter 2761 / 9295 | time 6134[s] | loss 1.44
| epoch 10 |  iter 2781 / 9295 | time 6136[s] | loss 1.44
| epoch 10 |  iter 2801 / 9295 | time 6137[s] | loss 1.42
| epoch 10 |  iter 2821 / 9295 | time 6139[s] | loss 1.44
| epoch 10 |  iter 2841 / 9295 | time 6140[s] | loss 1.44
| epoch 10 |  iter 2861 / 9295 | time 6141[s] | loss 1.45
| epoch 10 |  iter 2881 / 9295 | time 6143[s] | loss 1.44
| epoch 10 |  iter 2901 / 9295 | time 6144[s] | loss 1.42
| epoch 10 |  iter 2921 / 9295 | time 6145[s] | loss 1.46
| epoch 10 |  iter 2941 / 9295 | time 6147[s] | loss 1.41
| epoch 10 |  iter 2961 / 9295 | time 6148[s] | loss 1.48
| epoch 10 |  iter 2981 / 9295 | time 6149[s] | loss 1.42
| epoch 10 |  iter 3001 / 9295 | time 6150[s] | loss 1.43
| epoch 10 |  iter 3021 / 9295 | time 6152[s] | loss 1.44
| epoch 10 |  iter 3041 / 9295 | time 6153[s] | loss 1.38
| epoch 10 |  iter 3061 / 9295 | time 6154[s] | loss 1.41
| epoch 10 |  iter 3081 / 9295 | time 6156[s] | loss 1.45
| epoch 10 |  iter 3101 / 9295 | time 6157[s] | loss 1.47
| epoch 10 |  iter 3121 / 9295 | time 6158[s] | loss 1.39
| epoch 10 |  iter 3141 / 9295 | time 6160[s] | loss 1.41
| epoch 10 |  iter 3161 / 9295 | time 6161[s] | loss 1.43
| epoch 10 |  iter 3181 / 9295 | time 6162[s] | loss 1.43
| epoch 10 |  iter 3201 / 9295 | time 6164[s] | loss 1.41
| epoch 10 |  iter 3221 / 9295 | time 6165[s] | loss 1.42
| epoch 10 |  iter 3241 / 9295 | time 6166[s] | loss 1.42
| epoch 10 |  iter 3261 / 9295 | time 6167[s] | loss 1.41
| epoch 10 |  iter 3281 / 9295 | time 6169[s] | loss 1.43
| epoch 10 |  iter 3301 / 9295 | time 6170[s] | loss 1.43
| epoch 10 |  iter 3321 / 9295 | time 6171[s] | loss 1.44
| epoch 10 |  iter 3341 / 9295 | time 6173[s] | loss 1.45
| epoch 10 |  iter 3361 / 9295 | time 6174[s] | loss 1.47
| epoch 10 |  iter 3381 / 9295 | time 6175[s] | loss 1.43
| epoch 10 |  iter 3401 / 9295 | time 6177[s] | loss 1.43
| epoch 10 |  iter 3421 / 9295 | time 6178[s] | loss 1.44
| epoch 10 |  iter 3441 / 9295 | time 6179[s] | loss 1.43
| epoch 10 |  iter 3461 / 9295 | time 6181[s] | loss 1.39
| epoch 10 |  iter 3481 / 9295 | time 6182[s] | loss 1.43
| epoch 10 |  iter 3501 / 9295 | time 6183[s] | loss 1.42
| epoch 10 |  iter 3521 / 9295 | time 6185[s] | loss 1.44
| epoch 10 |  iter 3541 / 9295 | time 6186[s] | loss 1.44
| epoch 10 |  iter 3561 / 9295 | time 6187[s] | loss 1.41
| epoch 10 |  iter 3581 / 9295 | time 6189[s] | loss 1.44
| epoch 10 |  iter 3601 / 9295 | time 6190[s] | loss 1.44
| epoch 10 |  iter 3621 / 9295 | time 6191[s] | loss 1.44
| epoch 10 |  iter 3641 / 9295 | time 6193[s] | loss 1.42
| epoch 10 |  iter 3661 / 9295 | time 6194[s] | loss 1.40
| epoch 10 |  iter 3681 / 9295 | time 6196[s] | loss 1.47
| epoch 10 |  iter 3701 / 9295 | time 6197[s] | loss 1.42
| epoch 10 |  iter 3721 / 9295 | time 6198[s] | loss 1.44
| epoch 10 |  iter 3741 / 9295 | time 6200[s] | loss 1.45
| epoch 10 |  iter 3761 / 9295 | time 6201[s] | loss 1.46
| epoch 10 |  iter 3781 / 9295 | time 6202[s] | loss 1.46
| epoch 10 |  iter 3801 / 9295 | time 6204[s] | loss 1.40
| epoch 10 |  iter 3821 / 9295 | time 6205[s] | loss 1.48
| epoch 10 |  iter 3841 / 9295 | time 6207[s] | loss 1.46
| epoch 10 |  iter 3861 / 9295 | time 6208[s] | loss 1.44
| epoch 10 |  iter 3881 / 9295 | time 6209[s] | loss 1.44
| epoch 10 |  iter 3901 / 9295 | time 6211[s] | loss 1.46
| epoch 10 |  iter 3921 / 9295 | time 6212[s] | loss 1.49
| epoch 10 |  iter 3941 / 9295 | time 6213[s] | loss 1.44
| epoch 10 |  iter 3961 / 9295 | time 6215[s] | loss 1.43
| epoch 10 |  iter 3981 / 9295 | time 6216[s] | loss 1.42
| epoch 10 |  iter 4001 / 9295 | time 6217[s] | loss 1.44
| epoch 10 |  iter 4021 / 9295 | time 6219[s] | loss 1.47
| epoch 10 |  iter 4041 / 9295 | time 6220[s] | loss 1.46
| epoch 10 |  iter 4061 / 9295 | time 6221[s] | loss 1.44
| epoch 10 |  iter 4081 / 9295 | time 6222[s] | loss 1.46
| epoch 10 |  iter 4101 / 9295 | time 6224[s] | loss 1.43
| epoch 10 |  iter 4121 / 9295 | time 6225[s] | loss 1.43
| epoch 10 |  iter 4141 / 9295 | time 6226[s] | loss 1.46
| epoch 10 |  iter 4161 / 9295 | time 6228[s] | loss 1.43
| epoch 10 |  iter 4181 / 9295 | time 6229[s] | loss 1.45
| epoch 10 |  iter 4201 / 9295 | time 6230[s] | loss 1.43
| epoch 10 |  iter 4221 / 9295 | time 6231[s] | loss 1.46
| epoch 10 |  iter 4241 / 9295 | time 6233[s] | loss 1.46
| epoch 10 |  iter 4261 / 9295 | time 6234[s] | loss 1.41
| epoch 10 |  iter 4281 / 9295 | time 6235[s] | loss 1.44
| epoch 10 |  iter 4301 / 9295 | time 6237[s] | loss 1.45
| epoch 10 |  iter 4321 / 9295 | time 6238[s] | loss 1.42
| epoch 10 |  iter 4341 / 9295 | time 6239[s] | loss 1.44
| epoch 10 |  iter 4361 / 9295 | time 6241[s] | loss 1.42
| epoch 10 |  iter 4381 / 9295 | time 6242[s] | loss 1.45
| epoch 10 |  iter 4401 / 9295 | time 6243[s] | loss 1.44
| epoch 10 |  iter 4421 / 9295 | time 6245[s] | loss 1.43
| epoch 10 |  iter 4441 / 9295 | time 6246[s] | loss 1.42
| epoch 10 |  iter 4461 / 9295 | time 6247[s] | loss 1.43
| epoch 10 |  iter 4481 / 9295 | time 6249[s] | loss 1.45
| epoch 10 |  iter 4501 / 9295 | time 6250[s] | loss 1.48
| epoch 10 |  iter 4521 / 9295 | time 6251[s] | loss 1.46
| epoch 10 |  iter 4541 / 9295 | time 6253[s] | loss 1.45
| epoch 10 |  iter 4561 / 9295 | time 6254[s] | loss 1.44
| epoch 10 |  iter 4581 / 9295 | time 6255[s] | loss 1.44
| epoch 10 |  iter 4601 / 9295 | time 6257[s] | loss 1.45
| epoch 10 |  iter 4621 / 9295 | time 6258[s] | loss 1.39
| epoch 10 |  iter 4641 / 9295 | time 6259[s] | loss 1.44
| epoch 10 |  iter 4661 / 9295 | time 6261[s] | loss 1.47
| epoch 10 |  iter 4681 / 9295 | time 6262[s] | loss 1.45
| epoch 10 |  iter 4701 / 9295 | time 6263[s] | loss 1.43
| epoch 10 |  iter 4721 / 9295 | time 6264[s] | loss 1.42
| epoch 10 |  iter 4741 / 9295 | time 6266[s] | loss 1.40
| epoch 10 |  iter 4761 / 9295 | time 6267[s] | loss 1.43
| epoch 10 |  iter 4781 / 9295 | time 6268[s] | loss 1.42
| epoch 10 |  iter 4801 / 9295 | time 6270[s] | loss 1.43
| epoch 10 |  iter 4821 / 9295 | time 6271[s] | loss 1.45
| epoch 10 |  iter 4841 / 9295 | time 6272[s] | loss 1.43
| epoch 10 |  iter 4861 / 9295 | time 6274[s] | loss 1.42
| epoch 10 |  iter 4881 / 9295 | time 6275[s] | loss 1.44
| epoch 10 |  iter 4901 / 9295 | time 6276[s] | loss 1.42
| epoch 10 |  iter 4921 / 9295 | time 6278[s] | loss 1.46
| epoch 10 |  iter 4941 / 9295 | time 6279[s] | loss 1.47
| epoch 10 |  iter 4961 / 9295 | time 6280[s] | loss 1.44
| epoch 10 |  iter 4981 / 9295 | time 6282[s] | loss 1.40
| epoch 10 |  iter 5001 / 9295 | time 6283[s] | loss 1.46
| epoch 10 |  iter 5021 / 9295 | time 6284[s] | loss 1.42
| epoch 10 |  iter 5041 / 9295 | time 6286[s] | loss 1.39
| epoch 10 |  iter 5061 / 9295 | time 6287[s] | loss 1.48
| epoch 10 |  iter 5081 / 9295 | time 6289[s] | loss 1.42
| epoch 10 |  iter 5101 / 9295 | time 6290[s] | loss 1.40
| epoch 10 |  iter 5121 / 9295 | time 6292[s] | loss 1.49
| epoch 10 |  iter 5141 / 9295 | time 6294[s] | loss 1.44
| epoch 10 |  iter 5161 / 9295 | time 6295[s] | loss 1.41
| epoch 10 |  iter 5181 / 9295 | time 6297[s] | loss 1.43
| epoch 10 |  iter 5201 / 9295 | time 6298[s] | loss 1.43
| epoch 10 |  iter 5221 / 9295 | time 6300[s] | loss 1.41
| epoch 10 |  iter 5241 / 9295 | time 6301[s] | loss 1.44
| epoch 10 |  iter 5261 / 9295 | time 6303[s] | loss 1.46
| epoch 10 |  iter 5281 / 9295 | time 6304[s] | loss 1.44
| epoch 10 |  iter 5301 / 9295 | time 6305[s] | loss 1.48
| epoch 10 |  iter 5321 / 9295 | time 6307[s] | loss 1.46
| epoch 10 |  iter 5341 / 9295 | time 6308[s] | loss 1.48
| epoch 10 |  iter 5361 / 9295 | time 6310[s] | loss 1.42
| epoch 10 |  iter 5381 / 9295 | time 6311[s] | loss 1.49
| epoch 10 |  iter 5401 / 9295 | time 6312[s] | loss 1.48
| epoch 10 |  iter 5421 / 9295 | time 6314[s] | loss 1.43
| epoch 10 |  iter 5441 / 9295 | time 6315[s] | loss 1.45
| epoch 10 |  iter 5461 / 9295 | time 6316[s] | loss 1.43
| epoch 10 |  iter 5481 / 9295 | time 6318[s] | loss 1.46
| epoch 10 |  iter 5501 / 9295 | time 6319[s] | loss 1.46
| epoch 10 |  iter 5521 / 9295 | time 6321[s] | loss 1.46
| epoch 10 |  iter 5541 / 9295 | time 6322[s] | loss 1.42
| epoch 10 |  iter 5561 / 9295 | time 6323[s] | loss 1.50
| epoch 10 |  iter 5581 / 9295 | time 6325[s] | loss 1.44
| epoch 10 |  iter 5601 / 9295 | time 6326[s] | loss 1.47
| epoch 10 |  iter 5621 / 9295 | time 6328[s] | loss 1.44
| epoch 10 |  iter 5641 / 9295 | time 6329[s] | loss 1.48
| epoch 10 |  iter 5661 / 9295 | time 6330[s] | loss 1.47
| epoch 10 |  iter 5681 / 9295 | time 6332[s] | loss 1.48
| epoch 10 |  iter 5701 / 9295 | time 6333[s] | loss 1.44
| epoch 10 |  iter 5721 / 9295 | time 6335[s] | loss 1.41
| epoch 10 |  iter 5741 / 9295 | time 6336[s] | loss 1.49
| epoch 10 |  iter 5761 / 9295 | time 6337[s] | loss 1.46
| epoch 10 |  iter 5781 / 9295 | time 6339[s] | loss 1.45
| epoch 10 |  iter 5801 / 9295 | time 6340[s] | loss 1.46
| epoch 10 |  iter 5821 / 9295 | time 6342[s] | loss 1.43
| epoch 10 |  iter 5841 / 9295 | time 6343[s] | loss 1.48
| epoch 10 |  iter 5861 / 9295 | time 6344[s] | loss 1.45
| epoch 10 |  iter 5881 / 9295 | time 6346[s] | loss 1.44
| epoch 10 |  iter 5901 / 9295 | time 6347[s] | loss 1.44
| epoch 10 |  iter 5921 / 9295 | time 6348[s] | loss 1.48
| epoch 10 |  iter 5941 / 9295 | time 6350[s] | loss 1.48
| epoch 10 |  iter 5961 / 9295 | time 6351[s] | loss 1.44
| epoch 10 |  iter 5981 / 9295 | time 6353[s] | loss 1.45
| epoch 10 |  iter 6001 / 9295 | time 6354[s] | loss 1.45
| epoch 10 |  iter 6021 / 9295 | time 6355[s] | loss 1.44
| epoch 10 |  iter 6041 / 9295 | time 6357[s] | loss 1.43
| epoch 10 |  iter 6061 / 9295 | time 6358[s] | loss 1.43
| epoch 10 |  iter 6081 / 9295 | time 6359[s] | loss 1.46
| epoch 10 |  iter 6101 / 9295 | time 6361[s] | loss 1.44
| epoch 10 |  iter 6121 / 9295 | time 6362[s] | loss 1.47
| epoch 10 |  iter 6141 / 9295 | time 6364[s] | loss 1.49
| epoch 10 |  iter 6161 / 9295 | time 6365[s] | loss 1.45
| epoch 10 |  iter 6181 / 9295 | time 6366[s] | loss 1.48
| epoch 10 |  iter 6201 / 9295 | time 6368[s] | loss 1.45
| epoch 10 |  iter 6221 / 9295 | time 6369[s] | loss 1.39
| epoch 10 |  iter 6241 / 9295 | time 6371[s] | loss 1.44
| epoch 10 |  iter 6261 / 9295 | time 6372[s] | loss 1.42
| epoch 10 |  iter 6281 / 9295 | time 6373[s] | loss 1.45
| epoch 10 |  iter 6301 / 9295 | time 6375[s] | loss 1.47
| epoch 10 |  iter 6321 / 9295 | time 6376[s] | loss 1.44
| epoch 10 |  iter 6341 / 9295 | time 6377[s] | loss 1.47
| epoch 10 |  iter 6361 / 9295 | time 6379[s] | loss 1.42
| epoch 10 |  iter 6381 / 9295 | time 6380[s] | loss 1.44
| epoch 10 |  iter 6401 / 9295 | time 6382[s] | loss 1.45
| epoch 10 |  iter 6421 / 9295 | time 6383[s] | loss 1.43
| epoch 10 |  iter 6441 / 9295 | time 6384[s] | loss 1.45
| epoch 10 |  iter 6461 / 9295 | time 6386[s] | loss 1.44
| epoch 10 |  iter 6481 / 9295 | time 6387[s] | loss 1.45
| epoch 10 |  iter 6501 / 9295 | time 6389[s] | loss 1.46
| epoch 10 |  iter 6521 / 9295 | time 6390[s] | loss 1.48
| epoch 10 |  iter 6541 / 9295 | time 6391[s] | loss 1.44
| epoch 10 |  iter 6561 / 9295 | time 6393[s] | loss 1.43
| epoch 10 |  iter 6581 / 9295 | time 6394[s] | loss 1.43
| epoch 10 |  iter 6601 / 9295 | time 6395[s] | loss 1.45
| epoch 10 |  iter 6621 / 9295 | time 6397[s] | loss 1.45
| epoch 10 |  iter 6641 / 9295 | time 6398[s] | loss 1.45
| epoch 10 |  iter 6661 / 9295 | time 6399[s] | loss 1.47
| epoch 10 |  iter 6681 / 9295 | time 6401[s] | loss 1.45
| epoch 10 |  iter 6701 / 9295 | time 6402[s] | loss 1.42
| epoch 10 |  iter 6721 / 9295 | time 6404[s] | loss 1.44
| epoch 10 |  iter 6741 / 9295 | time 6405[s] | loss 1.42
| epoch 10 |  iter 6761 / 9295 | time 6406[s] | loss 1.48
| epoch 10 |  iter 6781 / 9295 | time 6407[s] | loss 1.49
| epoch 10 |  iter 6801 / 9295 | time 6409[s] | loss 1.45
| epoch 10 |  iter 6821 / 9295 | time 6410[s] | loss 1.46
| epoch 10 |  iter 6841 / 9295 | time 6411[s] | loss 1.44
| epoch 10 |  iter 6861 / 9295 | time 6413[s] | loss 1.46
| epoch 10 |  iter 6881 / 9295 | time 6414[s] | loss 1.47
| epoch 10 |  iter 6901 / 9295 | time 6415[s] | loss 1.47
| epoch 10 |  iter 6921 / 9295 | time 6417[s] | loss 1.44
| epoch 10 |  iter 6941 / 9295 | time 6418[s] | loss 1.44
| epoch 10 |  iter 6961 / 9295 | time 6419[s] | loss 1.44
| epoch 10 |  iter 6981 / 9295 | time 6421[s] | loss 1.47
| epoch 10 |  iter 7001 / 9295 | time 6422[s] | loss 1.49
| epoch 10 |  iter 7021 / 9295 | time 6423[s] | loss 1.43
| epoch 10 |  iter 7041 / 9295 | time 6425[s] | loss 1.45
| epoch 10 |  iter 7061 / 9295 | time 6426[s] | loss 1.47
| epoch 10 |  iter 7081 / 9295 | time 6427[s] | loss 1.45
| epoch 10 |  iter 7101 / 9295 | time 6429[s] | loss 1.44
| epoch 10 |  iter 7121 / 9295 | time 6430[s] | loss 1.44
| epoch 10 |  iter 7141 / 9295 | time 6431[s] | loss 1.45
| epoch 10 |  iter 7161 / 9295 | time 6433[s] | loss 1.43
| epoch 10 |  iter 7181 / 9295 | time 6434[s] | loss 1.47
| epoch 10 |  iter 7201 / 9295 | time 6435[s] | loss 1.45
| epoch 10 |  iter 7221 / 9295 | time 6437[s] | loss 1.43
| epoch 10 |  iter 7241 / 9295 | time 6438[s] | loss 1.40
| epoch 10 |  iter 7261 / 9295 | time 6439[s] | loss 1.42
| epoch 10 |  iter 7281 / 9295 | time 6441[s] | loss 1.43
| epoch 10 |  iter 7301 / 9295 | time 6442[s] | loss 1.44
| epoch 10 |  iter 7321 / 9295 | time 6443[s] | loss 1.46
| epoch 10 |  iter 7341 / 9295 | time 6445[s] | loss 1.44
| epoch 10 |  iter 7361 / 9295 | time 6446[s] | loss 1.49
| epoch 10 |  iter 7381 / 9295 | time 6447[s] | loss 1.40
| epoch 10 |  iter 7401 / 9295 | time 6449[s] | loss 1.44
| epoch 10 |  iter 7421 / 9295 | time 6450[s] | loss 1.46
| epoch 10 |  iter 7441 / 9295 | time 6452[s] | loss 1.45
| epoch 10 |  iter 7461 / 9295 | time 6453[s] | loss 1.45
| epoch 10 |  iter 7481 / 9295 | time 6454[s] | loss 1.43
| epoch 10 |  iter 7501 / 9295 | time 6455[s] | loss 1.46
| epoch 10 |  iter 7521 / 9295 | time 6457[s] | loss 1.50
| epoch 10 |  iter 7541 / 9295 | time 6458[s] | loss 1.47
| epoch 10 |  iter 7561 / 9295 | time 6460[s] | loss 1.48
| epoch 10 |  iter 7581 / 9295 | time 6461[s] | loss 1.46
| epoch 10 |  iter 7601 / 9295 | time 6463[s] | loss 1.44
| epoch 10 |  iter 7621 / 9295 | time 6464[s] | loss 1.46
| epoch 10 |  iter 7641 / 9295 | time 6466[s] | loss 1.47
| epoch 10 |  iter 7661 / 9295 | time 6467[s] | loss 1.44
| epoch 10 |  iter 7681 / 9295 | time 6469[s] | loss 1.44
| epoch 10 |  iter 7701 / 9295 | time 6470[s] | loss 1.46
| epoch 10 |  iter 7721 / 9295 | time 6472[s] | loss 1.45
| epoch 10 |  iter 7741 / 9295 | time 6473[s] | loss 1.42
| epoch 10 |  iter 7761 / 9295 | time 6475[s] | loss 1.45
| epoch 10 |  iter 7781 / 9295 | time 6476[s] | loss 1.44
| epoch 10 |  iter 7801 / 9295 | time 6478[s] | loss 1.48
| epoch 10 |  iter 7821 / 9295 | time 6479[s] | loss 1.47
| epoch 10 |  iter 7841 / 9295 | time 6481[s] | loss 1.46
| epoch 10 |  iter 7861 / 9295 | time 6482[s] | loss 1.46
| epoch 10 |  iter 7881 / 9295 | time 6484[s] | loss 1.49
| epoch 10 |  iter 7901 / 9295 | time 6485[s] | loss 1.46
| epoch 10 |  iter 7921 / 9295 | time 6486[s] | loss 1.48
| epoch 10 |  iter 7941 / 9295 | time 6488[s] | loss 1.48
| epoch 10 |  iter 7961 / 9295 | time 6489[s] | loss 1.48
| epoch 10 |  iter 7981 / 9295 | time 6490[s] | loss 1.45
| epoch 10 |  iter 8001 / 9295 | time 6492[s] | loss 1.49
| epoch 10 |  iter 8021 / 9295 | time 6493[s] | loss 1.43
| epoch 10 |  iter 8041 / 9295 | time 6494[s] | loss 1.44
| epoch 10 |  iter 8061 / 9295 | time 6495[s] | loss 1.45
| epoch 10 |  iter 8081 / 9295 | time 6497[s] | loss 1.43
| epoch 10 |  iter 8101 / 9295 | time 6498[s] | loss 1.46
| epoch 10 |  iter 8121 / 9295 | time 6499[s] | loss 1.49
| epoch 10 |  iter 8141 / 9295 | time 6501[s] | loss 1.43
| epoch 10 |  iter 8161 / 9295 | time 6502[s] | loss 1.43
| epoch 10 |  iter 8181 / 9295 | time 6503[s] | loss 1.43
| epoch 10 |  iter 8201 / 9295 | time 6505[s] | loss 1.43
| epoch 10 |  iter 8221 / 9295 | time 6506[s] | loss 1.47
| epoch 10 |  iter 8241 / 9295 | time 6507[s] | loss 1.49
| epoch 10 |  iter 8261 / 9295 | time 6509[s] | loss 1.46
| epoch 10 |  iter 8281 / 9295 | time 6510[s] | loss 1.43
| epoch 10 |  iter 8301 / 9295 | time 6511[s] | loss 1.46
| epoch 10 |  iter 8321 / 9295 | time 6513[s] | loss 1.44
| epoch 10 |  iter 8341 / 9295 | time 6514[s] | loss 1.48
| epoch 10 |  iter 8361 / 9295 | time 6515[s] | loss 1.48
| epoch 10 |  iter 8381 / 9295 | time 6517[s] | loss 1.48
| epoch 10 |  iter 8401 / 9295 | time 6518[s] | loss 1.46
| epoch 10 |  iter 8421 / 9295 | time 6519[s] | loss 1.42
| epoch 10 |  iter 8441 / 9295 | time 6520[s] | loss 1.46
| epoch 10 |  iter 8461 / 9295 | time 6522[s] | loss 1.45
| epoch 10 |  iter 8481 / 9295 | time 6523[s] | loss 1.42
| epoch 10 |  iter 8501 / 9295 | time 6524[s] | loss 1.45
| epoch 10 |  iter 8521 / 9295 | time 6526[s] | loss 1.45
| epoch 10 |  iter 8541 / 9295 | time 6527[s] | loss 1.44
| epoch 10 |  iter 8561 / 9295 | time 6528[s] | loss 1.47
| epoch 10 |  iter 8581 / 9295 | time 6530[s] | loss 1.49
| epoch 10 |  iter 8601 / 9295 | time 6531[s] | loss 1.47
| epoch 10 |  iter 8621 / 9295 | time 6532[s] | loss 1.45
| epoch 10 |  iter 8641 / 9295 | time 6534[s] | loss 1.48
| epoch 10 |  iter 8661 / 9295 | time 6535[s] | loss 1.43
| epoch 10 |  iter 8681 / 9295 | time 6536[s] | loss 1.47
| epoch 10 |  iter 8701 / 9295 | time 6538[s] | loss 1.46
| epoch 10 |  iter 8721 / 9295 | time 6539[s] | loss 1.44
| epoch 10 |  iter 8741 / 9295 | time 6540[s] | loss 1.42
| epoch 10 |  iter 8761 / 9295 | time 6541[s] | loss 1.44
| epoch 10 |  iter 8781 / 9295 | time 6543[s] | loss 1.45
| epoch 10 |  iter 8801 / 9295 | time 6544[s] | loss 1.47
| epoch 10 |  iter 8821 / 9295 | time 6545[s] | loss 1.46
| epoch 10 |  iter 8841 / 9295 | time 6547[s] | loss 1.45
| epoch 10 |  iter 8861 / 9295 | time 6548[s] | loss 1.47
| epoch 10 |  iter 8881 / 9295 | time 6550[s] | loss 1.48
| epoch 10 |  iter 8901 / 9295 | time 6551[s] | loss 1.43
| epoch 10 |  iter 8921 / 9295 | time 6552[s] | loss 1.45
| epoch 10 |  iter 8941 / 9295 | time 6553[s] | loss 1.47
| epoch 10 |  iter 8961 / 9295 | time 6555[s] | loss 1.43
| epoch 10 |  iter 8981 / 9295 | time 6556[s] | loss 1.47
| epoch 10 |  iter 9001 / 9295 | time 6557[s] | loss 1.44
| epoch 10 |  iter 9021 / 9295 | time 6559[s] | loss 1.45
| epoch 10 |  iter 9041 / 9295 | time 6560[s] | loss 1.42
| epoch 10 |  iter 9061 / 9295 | time 6561[s] | loss 1.45
| epoch 10 |  iter 9081 / 9295 | time 6563[s] | loss 1.40
| epoch 10 |  iter 9101 / 9295 | time 6564[s] | loss 1.49
| epoch 10 |  iter 9121 / 9295 | time 6565[s] | loss 1.45
| epoch 10 |  iter 9141 / 9295 | time 6567[s] | loss 1.48
| epoch 10 |  iter 9161 / 9295 | time 6568[s] | loss 1.48
| epoch 10 |  iter 9181 / 9295 | time 6569[s] | loss 1.42
| epoch 10 |  iter 9201 / 9295 | time 6571[s] | loss 1.47
| epoch 10 |  iter 9221 / 9295 | time 6572[s] | loss 1.46
| epoch 10 |  iter 9241 / 9295 | time 6574[s] | loss 1.45
| epoch 10 |  iter 9261 / 9295 | time 6575[s] | loss 1.44
| epoch 10 |  iter 9281 / 9295 | time 6576[s] | loss 1.45
<Figure size 640x480 with 1 Axes>
# Plot the learning curve (loss recorded during training)
trainer.plot()

4.4 Remaining topics on word2vec

4.4.1 Examples of applications using word2vec

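In natural language processing, distributed representations of words matter mainly because they enable transfer learning.

The usual practice is to train on a large corpus first, and then reuse the learned distributed representations in each individual task.

The advantage of distributed representations is that a word can be converted into a fixed-length vector. Once natural language has been converted into vectors, general machine learning methods can be applied to it.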
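As a rough illustration of the point above, the sketch below turns sentences of different lengths into vectors of the same length by averaging their word vectors (a bag-of-words approach). The vocabulary, the values in `word_vecs`, and the helper `sentence_to_vec` are all hypothetical and made up for this example; in practice the matrix would come from a word2vec model trained on a large corpus.

```python
import numpy as np

# Hypothetical toy embedding matrix: one row per word ID (values are made up,
# not learned by word2vec).
word_to_id = {"you": 0, "say": 1, "goodbye": 2, "and": 3, "i": 4, "hello": 5}
word_vecs = np.array([
    [0.9, 0.1, 0.0],  # you
    [0.1, 0.8, 0.1],  # say
    [0.0, 0.2, 0.9],  # goodbye
    [0.3, 0.3, 0.3],  # and
    [0.8, 0.2, 0.1],  # i
    [0.1, 0.1, 0.8],  # hello
])


def sentence_to_vec(sentence, word_to_id, word_vecs):
    """Average the vectors of the known words; unknown words are skipped."""
    ids = [word_to_id[w] for w in sentence.lower().split() if w in word_to_id]
    if not ids:
        # No known word: fall back to a zero vector of the same length.
        return np.zeros(word_vecs.shape[1])
    return word_vecs[ids].mean(axis=0)


short_vec = sentence_to_vec("you say goodbye", word_to_id, word_vecs)
long_vec = sentence_to_vec("you say goodbye and i say hello", word_to_id, word_vecs)

# Both sentences map to a vector with the same number of dimensions.
print(short_vec.shape, long_vec.shape)
```

Because every sentence ends up with the same dimensionality, the result could be passed to an ordinary classifier. Averaging ignores word order, so this is only the simplest option.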

4.5 Summary

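  • The computational cost of word2vec grows with the vocabulary size, so approximate computation is used to speed it up.
  • The Embedding layer extracts the row (word vector) that corresponds to a word ID. It replaces the product of a one-hot vector and the weight matrix, which reduces the computation.
  • Negative Sampling replaces multi-class classification with binary classification, which reduces the computation.
    • Besides the positive example, only a few negative examples are sampled and used for training.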
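  • Distributed representations of words capture word meaning: words used in similar contexts end up close to each other in the word vector space.
  • The distributed representations learned by word2vec have the property that analogy problems can be solved by adding and subtracting vectors.
  • word2vec is especially important for transfer learning: its distributed representations can be reused in a wide range of natural language processing tasks.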
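
The analogy property in the summary can be sketched as follows. This is a minimal example under stated assumptions: the five words and their three-dimensional vectors are hand-made for illustration (they are not output of a trained word2vec model), and `normalize` and `solve_analogy` are hypothetical helper names. The idea is to compute vec(b) - vec(a) + vec(c) and return the word whose vector has the highest cosine similarity to it.

```python
import numpy as np

# Hypothetical toy vectors (hand-made for illustration, not learned).
# Roughly: dimension 0 ~ "royalty", dimension 1 ~ "male", dimension 2 ~ "female".
word_to_id = {"king": 0, "man": 1, "woman": 2, "queen": 3, "apple": 4}
id_to_word = {i: w for w, i in word_to_id.items()}
word_vecs = np.array([
    [0.9, 0.8, 0.1],  # king
    [0.1, 0.9, 0.1],  # man
    [0.1, 0.1, 0.9],  # woman
    [0.9, 0.1, 0.8],  # queen
    [0.0, 0.3, 0.3],  # apple
])


def normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.sqrt((x ** 2).sum(axis=-1, keepdims=True))


def solve_analogy(a, b, c, word_to_id, id_to_word, word_vecs):
    """Answer "a is to b as c is to ?" using vec(b) - vec(a) + vec(c)."""
    unit_vecs = normalize(word_vecs)
    a_vec, b_vec, c_vec = (unit_vecs[word_to_id[w]] for w in (a, b, c))
    query = normalize(b_vec - a_vec + c_vec)

    # Cosine similarity between the query and every word vector.
    similarity = unit_vecs @ query

    # Exclude the three input words from the candidates.
    for w in (a, b, c):
        similarity[word_to_id[w]] = -np.inf
    return id_to_word[int(similarity.argmax())]


answer = solve_analogy("man", "king", "woman", word_to_id, id_to_word, word_vecs)
print(answer)  # queen
```

With real embeddings the same procedure would be run over the whole vocabulary, and the quality of the answers depends on the corpus and on how well the model was trained.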