Notebook

Prepare Environment¶

In [2]:

!pip install -q tensorflow-gpu==2.0.0-beta1
# !pip install -q tensorflow-gpu==1.15

     |████████████████████████████████| 348.9MB 42kB/s 
     |████████████████████████████████| 3.1MB 60.1MB/s 
     |████████████████████████████████| 501kB 57.3MB/s

In [3]:

import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import os
from sklearn.metrics import classification_report
from gensim.models import Word2Vec

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

In [4]:

tf.__version__

Out[4]:

'2.0.0-beta1'

In [0]:

# train and save word2vec model, this step can be removed after uploading the trained model to github
# !wget https://github.com/raqueeb/datasets/raw/master/bnwiki-texts.zip
# !unzip bnwiki-texts.zip

# preprocessed_text_file_path = 'bnwiki-texts-preprocessed.txt'

# lines_from_file = []
# with open(preprocessed_text_file_path, encoding='utf8') as text_file:
#     for line in text_file:
#         lines_from_file.append(line)

# tokenized_lines = []
# for single_line in lines_from_file:
#     tokenized_lines.append(single_line.split())

# model = Word2Vec(tokenized_lines, size=200, window=5, min_count=10)

In [0]:

# model.wv.most_similar('ছেলে', topn=5)

In [0]:

# model.wv.save_word2vec_format('bn-wiki-word2vec-300.txt', binary=False)

In [0]:

Dataset¶

In [6]:

!wget http://119.81.77.70:8090/bn-wiki-word2vec-300.txt

--2019-11-21 05:31:24--  http://119.81.77.70:8090/bn-wiki-word2vec-300.txt
Connecting to 119.81.77.70:8090... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2496996336 (2.3G) [text/plain]
Saving to: ‘bn-wiki-word2vec-300.txt’

bn-wiki-word2vec-30 100%[===================>]   2.33G  12.7MB/s    in 3m 7s   

2019-11-21 05:34:31 (12.8 MB/s) - ‘bn-wiki-word2vec-300.txt’ saved [2496996336/2496996336]

In [7]:

!ls

bn-wiki-word2vec-300.txt  sample_data

In [8]:

!wget https://raw.githubusercontent.com/tensorflow/hub/master/examples/text_embeddings_v2/export_v2.py
# !wget https://raw.githubusercontent.com/tensorflow/hub/master/examples/text_embeddings/export.py

--2019-11-21 05:34:37--  https://raw.githubusercontent.com/tensorflow/hub/master/examples/text_embeddings_v2/export_v2.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7603 (7.4K) [text/plain]
Saving to: ‘export_v2.py’

export_v2.py        100%[===================>]   7.42K  --.-KB/s    in 0s      

2019-11-21 05:34:37 (125 MB/s) - ‘export_v2.py’ saved [7603/7603]

In [9]:

!python export_v2.py --embedding_file=/content/bn-wiki-word2vec-300.txt --export_path=text_module --num_lines_to_ignore=1 
# !python export.py --embedding_file=/content/bn-wiki-word2vec-300.txt --export_path=text_module --num_lines_to_ignore=1 --preprocess_text=True

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
tcmalloc: large alloc 1607057408 bytes == 0x8a8f4000 @  0x7f564d8411e7 0x7f5649de0f71 0x7f5649e4455d 0x7f5649e47e28 0x7f5649e483e5 0x7f5649edefc2 0x50abc5 0x50c549 0x509ce8 0x50aa1d 0x50c549 0x5081d5 0x509647 0x5951c1 0x54a11f 0x551761 0x5aa69c 0x50ab53 0x50c549 0x509ce8 0x50aa1d 0x50c549 0x509ce8 0x50aa1d 0x50c549 0x509ce8 0x50aa1d 0x50c549 0x5081d5 0x50a020 0x50aa1d
2019-11-21 05:38:47.121986: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-21 05:38:47.191968: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-21 05:38:47.192557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2019-11-21 05:38:47.193585: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2019-11-21 05:38:47.193815: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2019-11-21 05:38:47.194494: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2019-11-21 05:38:47.194613: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2019-11-21 05:38:47.195317: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2019-11-21 05:38:47.196074: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2019-11-21 05:38:47.599728: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-21 05:38:47.599791: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-11-21 05:38:47.622365: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-11-21 05:38:47.802161: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-21 05:38:47.805560: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x322bf80 executing computations on platform CUDA. Devices:
2019-11-21 05:38:47.805652: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-11-21 05:38:47.879207: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000175000 Hz
2019-11-21 05:38:47.879573: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x322d9c0 executing computations on platform Host. Devices:
2019-11-21 05:38:47.879605: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-21 05:38:47.879684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-21 05:38:47.879694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/lookup_ops.py:1159: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1121 05:38:48.594012 140008648824704 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/lookup_ops.py:1159: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:tensorflow:Assets written to: text_module/assets
I1121 05:38:55.261133 140008648824704 builder_impl.py:770] Assets written to: text_module/assets

In [0]:

module_path = "text_module"
embedding_layer = hub.KerasLayer(module_path, trainable=False)

In [11]:

embedding_layer(['বাস বাস আমার '], ['আমার']).shape

Out[11]:

TensorShape([1, 300])

In [0]:

In [12]:

!wget http://119.81.77.70:8090/bangla-sentiment.neg
!wget http://119.81.77.70:8090/bangla-sentiment.pos

all_sentences = []
with open('bangla-sentiment.pos', encoding='utf8') as f:
    all_sentences.extend([(line.strip(), 'positive') for line in f])
        
with open('bangla-sentiment.neg', encoding='utf8') as f:
    all_sentences.extend([(line.strip(), 'negative') for line in f])

--2019-11-21 05:39:02--  http://119.81.77.70:8090/bangla-sentiment.neg
Connecting to 119.81.77.70:8090... connected.
HTTP request sent, awaiting response... 200 OK
Length: 363162 (355K) [application/octet-stream]
Saving to: ‘bangla-sentiment.neg’

bangla-sentiment.ne 100%[===================>] 354.65K   386KB/s    in 0.9s    

2019-11-21 05:39:04 (386 KB/s) - ‘bangla-sentiment.neg’ saved [363162/363162]

--2019-11-21 05:39:05--  http://119.81.77.70:8090/bangla-sentiment.pos
Connecting to 119.81.77.70:8090... connected.
HTTP request sent, awaiting response... 200 OK
Length: 220062 (215K) [application/octet-stream]
Saving to: ‘bangla-sentiment.pos’

bangla-sentiment.po 100%[===================>] 214.90K   235KB/s    in 0.9s    

2019-11-21 05:39:07 (235 KB/s) - ‘bangla-sentiment.pos’ saved [220062/220062]

We can check the distribution of labels in the training and validation examples after shuffling.

In [13]:

pos_count = 0
neg_count = 0
for sentence, label in all_sentences:
  if label =='positive':
      pos_count +=1
  else:
      neg_count +=1
print(pos_count)
print(neg_count)

2039
2520

In [0]:

import random

def generator():
    random.shuffle(all_sentences)    
    for sentence, label in all_sentences:
        if label =='positive':
            label = tf.keras.utils.to_categorical(1, num_classes=2)
        else:
            label = tf.keras.utils.to_categorical(0, num_classes=2)
        sentence_tensor = tf.constant(sentence, dtype=tf.dtypes.string)
        yield sentence_tensor, label

In [0]:

def make_dataset(train_size):
  data = tf.data.Dataset.from_generator(generator=generator, 
                                        output_types=(tf.string, tf.float32))
  train_size = 4000
  train_data = data.take(train_size)
  validation_data = data.skip(train_size)
  return train_data, validation_data

In [16]:

train_data, validation_data = make_dataset(0.80)

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:505: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

In [17]:

# get a single batch of 2 elements from train_data
next(iter(train_data.batch(2)))

Out[17]:

(<tf.Tensor: id=305, shape=(2,), dtype=string, numpy=
 array([b'\xe0\xa6\xae\xe0\xa6\xbe\xe0\xa6\x87\xe0\xa6\x9c\xe0\xa6\xbf\xe0\xa6\xaa\xe0\xa6\xbf \xe0\xa6\xa4\xe0\xa7\x87 \xe0\xa6\xa1\xe0\xa7\x81\xe0\xa6\x95\xe0\xa6\xb2\xe0\xa7\x87 \xe0\xa6\x9f\xe0\xa6\xbe\xe0\xa6\x95\xe0\xa6\xbe \xe0\xa6\x95\xe0\xa7\x87\xe0\xa6\x9f\xe0\xa7\x87 \xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x93\xe0\xa7\x9f\xe0\xa6\xbe \xe0\xa6\xb9\xe0\xa7\x9f \xe0\xa6\x95\xe0\xa7\x87\xe0\xa6\xa8?',
        b'\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x9f \xe0\xa6\xaa\xe0\xa7\x8d\xe0\xa6\xb0\xe0\xa6\xac\xe0\xa7\x8d\xe0\xa6\xb2\xe0\xa7\x87\xe0\xa6\xae \xe0\xa6\x9c\xe0\xa6\xbf\xe0\xa6\xaa\xe0\xa6\xbf'],
       dtype=object)>, <tf.Tensor: id=306, shape=(2, 2), dtype=float32, numpy=
 array([[1., 0.],
        [1., 0.]], dtype=float32)>)

In [0]:

sentences_in_a_single_batch, labels_in_a_single_batch = next(iter(train_data.batch(2)))

In [19]:

sentences_in_a_single_batch

Out[19]:

<tf.Tensor: id=320, shape=(2,), dtype=string, numpy=
array([b'\xe0\xa6\x86\xe0\xa6\xae\xe0\xa6\xbe\xe0\xa6\xb0 \xe0\xa6\x97\xe0\xa6\xbe \xe0\xa6\x95\xe0\xa6\xbe\xe0\xa6\x9f\xe0\xa6\xbe \xe0\xa6\xa6\xe0\xa6\xbf\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa7\x87 \xe0\xa6\x96\xe0\xa6\xac\xe0\xa6\xb0 \xe0\xa6\xaa\xe0\xa6\xa1\xe0\xa6\xbc\xe0\xa6\xa4\xe0\xa7\x87 \xe0\xa6\xaa\xe0\xa6\xbe\xe0\xa6\xb0\xe0\xa6\x9b\xe0\xa6\xbf \xe0\xa6\xa8\xe0\xa6\xbe',
       b'\xe0\xa6\xac\xe0\xa6\xbf\xe0\xa6\xb6\xe0\xa7\x8d\xe0\xa6\xac\xe0\xa7\x87\xe0\xa6\xb0 \xe0\xa6\xae\xe0\xa7\x81\xe0\xa6\xb8\xe0\xa6\xb2\xe0\xa6\xbf\xe0\xa6\xae \xe0\xa6\xa4\xe0\xa7\x8b\xe0\xa6\xae\xe0\xa6\xb0\xe0\xa6\xbe \xe0\xa6\x8f\xe0\xa6\x95 \xe0\xa6\xb9\xe0\xa6\x93'],
      dtype=object)>

In [20]:

sentences_in_a_single_batch.shape

Out[20]:

TensorShape([2])

In [21]:

labels_in_a_single_batch.shape

Out[21]:

TensorShape([2, 2])

In [0]:

sentence, label = next(iter(train_data.take(1)))

In [23]:

# numpy() returns the string as bytes. we need to decode it to read it
sentence.numpy().decode('utf8')

Out[23]:

'দোয়া করি সুস্থত হয়ে আবার সাভাবিক জিবন ফিরে পাবে'

In [24]:

# label after converted by to_categorical()
label.numpy() 

Out[24]:

array([0., 1.], dtype=float32)

Model Training and Evaluation¶

Model¶

In [0]:

def create_model():
  model = tf.keras.Sequential()
  model.add(embedding_layer)
  # model.add(tf.keras.layers.Flatten())
  # model.add(tf.keras.layers.SpatialDropout1D(0.2))
  # model.add(tf.keras.layers.LSTM(100, dropout=0.2, recurrent_dropout=0.2))
  # model.add(Dense(13, activation='softmax'))
  model.add(tf.keras.layers.Dense(256, activation="relu"))
  model.add(tf.keras.layers.Dense(128, activation="relu"))
  model.add(tf.keras.layers.Dense(2, activation="softmax"))
  model.compile(optimizer="adam",loss="categorical_crossentropy",metrics=['acc'])
  return model

In [0]:

model = create_model()
# Create earlystopping callback
# early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=3)

Training¶

In [27]:

batch_size = 256
history = model.fit(train_data.batch(batch_size), 
                    validation_data=validation_data.batch(batch_size), 
                    epochs=10,)

Epoch 1/10
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
16/16 [==============================] - 16s 1s/step - loss: 0.5320 - acc: 0.6640 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 2/10
16/16 [==============================] - 14s 853ms/step - loss: 0.3657 - acc: 0.8356 - val_loss: 0.3144 - val_acc: 0.8497
Epoch 3/10
16/16 [==============================] - 14s 853ms/step - loss: 0.3059 - acc: 0.8659 - val_loss: 0.2848 - val_acc: 0.8962
Epoch 4/10
16/16 [==============================] - 14s 875ms/step - loss: 0.2651 - acc: 0.8897 - val_loss: 0.2313 - val_acc: 0.9177
Epoch 5/10
16/16 [==============================] - 14s 862ms/step - loss: 0.2218 - acc: 0.9159 - val_loss: 0.1790 - val_acc: 0.9374
Epoch 6/10
16/16 [==============================] - 14s 855ms/step - loss: 0.1933 - acc: 0.9289 - val_loss: 0.2051 - val_acc: 0.9159
Epoch 7/10
16/16 [==============================] - 14s 849ms/step - loss: 0.1654 - acc: 0.9347 - val_loss: 0.1152 - val_acc: 0.9750
Epoch 8/10
16/16 [==============================] - 14s 850ms/step - loss: 0.1372 - acc: 0.9572 - val_loss: 0.1119 - val_acc: 0.9660
Epoch 9/10
16/16 [==============================] - 14s 854ms/step - loss: 0.1095 - acc: 0.9674 - val_loss: 0.0864 - val_acc: 0.9839
Epoch 10/10
16/16 [==============================] - 14s 851ms/step - loss: 0.0925 - acc: 0.9803 - val_loss: 0.0705 - val_acc: 0.9875

In [28]:

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
keras_layer (KerasLayer)     multiple                  200881800 
_________________________________________________________________
dense (Dense)                multiple                  77056     
_________________________________________________________________
dense_1 (Dense)              multiple                  32896     
_________________________________________________________________
dense_2 (Dense)              multiple                  258       
=================================================================
Total params: 200,992,010
Trainable params: 110,210
Non-trainable params: 200,881,800
_________________________________________________________________

Saving model¶

After training the model we can export it as a SavedModel to deploy or share with others.

In [29]:

tf.saved_model.save(model, export_dir="my_model")

INFO:tensorflow:Assets written to: my_model/assets

Prediction¶

In [37]:

sents = ['আমরা খুবি খুশি অফারটির জন্য', 'বই পড়তে পছন্দ করি', 'বই পড়তে পছন্দ করি না', 'আমার ভালো লাগছে না', 
         'আমার কষ্ট লাগছে', 'এই বইটা বেশ ভালো লাগছে', 'একটা দুর্ঘটনা ঘটে গেল',
         'জিপি আমার প্রিয় নেটওয়ার্ক', 'মোবাইল অপারেটর বেশ টাকা কাটে', 'আমাদের প্রতিদিনের সমস্যা নিয়ে ঝামেলায় আছি',
         'ঢাকা-সিলেটসহ আশপাশের সড়কের যানবাহন চলাচল বন্ধ হয়ে যায়',]
pred_dataset = tf.data.Dataset.from_tensor_slices(sents)
prediction = model.predict(np.array(sents))

for sentence, pred_sentiment in zip(sents, prediction.argmax(axis=1)):
  print("Sentence:{} - predicted: {}".format(sentence, pred_sentiment))

Sentence:আমরা খুবি খুশি অফারটির জন্য - predicted: 1
Sentence:বই পড়তে পছন্দ করি - predicted: 1
Sentence:বই পড়তে পছন্দ করি না - predicted: 0
Sentence:আমার ভালো লাগছে না - predicted: 0
Sentence:আমার কষ্ট লাগছে - predicted: 0
Sentence:এই বইটা বেশ ভালো লাগছে - predicted: 1
Sentence:একটা দুর্ঘটনা ঘটে গেল - predicted: 0
Sentence:জিপি আমার প্রিয় নেটওয়ার্ক - predicted: 1
Sentence:মোবাইল অপারেটর বেশ টাকা কাটে - predicted: 1
Sentence:আমাদের প্রতিদিনের সমস্যা নিয়ে ঝামেলায় আছি - predicted: 0
Sentence:ঢাকা-সিলেটসহ আশপাশের সড়কের যানবাহন চলাচল বন্ধ হয়ে যায় - predicted: 0