Nazokake with gensim

Install cython and gensim beforehand.

conda install cython
conda install gensim

Use text8 as the source corpus.

wget http://mattmahoney.net/dc/text8.zip -P /tmp
unzip /tmp/text8.zip -d /tmp

In [172]:

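from gensim.models import word2vec

# Read the unzipped text8 file as a training corpus.
data = word2vec.Text8Corpus('/tmp/text8')
# Train 200-dimensional vectors (gensim 4.0+ renamed `size` to `vector_size`).
model = word2vec.Word2Vec(data, size=200)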
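
For orientation: text8 is one long run of lowercase, space-separated words with no sentence boundaries, so a corpus reader has to cut it into fixed-size pseudo-sentences before training. The sketch below mimics that chunking in plain Python. The helper `chunk_words`, the sample text, and the chunk size of 4 are hypothetical and chosen for readability; this is not gensim's implementation or its default chunk size.

```python
import io

def chunk_words(stream, max_words):
    """Yield lists of at most max_words tokens from a whitespace-separated stream."""
    chunk = []
    for line in stream:
        for word in line.split():
            chunk.append(word)
            if len(chunk) == max_words:
                yield chunk
                chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk

# A short stand-in for the corpus file (hypothetical sample text).
sample = io.StringIO("one two three four five six seven eight nine ten")
for sentence in chunk_words(sample, max_words=4):
    print(sentence)
```

Each yielded list plays the role of one training "sentence"; the last one may be shorter than the rest.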

In [51]:

model.save("sample.model")

In [170]:

# Query through model.wv; calling most_similar on the model itself was removed in gensim 4.0.
model.wv.most_similar(positive=['dog','cat'])

Out[170]:

[('goat', 0.7966023683547974),
 ('bee', 0.7859479784965515),
 ('pig', 0.7830795645713806),
 ('bird', 0.7660524845123291),
 ('hound', 0.7580216526985168),
 ('panda', 0.7541525363922119),
 ('hamster', 0.7503507137298584),
 ('ass', 0.7488968372344971),
 ('haired', 0.7469390630722046),
 ('rat', 0.7466884851455688)]
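
As a rough illustration of what a query like the one above computes: with several positive words, the unit-length vectors of those words are averaged, and the remaining vocabulary is ranked by cosine similarity to that mean. The sketch below reproduces the idea in plain Python. The three-dimensional `toy_vectors` and the helper names are invented for illustration and have nothing to do with the trained model's actual values.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
toy_vectors = {
    "dog":  [0.9, 0.1, 0.0],
    "cat":  [0.8, 0.3, 0.0],
    "pig":  [0.7, 0.2, 0.1],
    "car":  [0.0, 0.1, 0.9],
    "road": [0.1, 0.0, 0.8],
}

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))

def toy_most_similar(positive, topn=3):
    """Rank words by cosine similarity to the mean of the query words."""
    units = [unit(toy_vectors[w]) for w in positive]
    mean = [sum(col) / len(units) for col in zip(*units)]
    scored = [(w, cosine(mean, v)) for w, v in toy_vectors.items()
              if w not in positive]  # the query words themselves are skipped
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

print(toy_most_similar(["dog", "cat"]))
```

In this toy vocabulary "pig" ranks first because its vector points in nearly the same direction as the dog/cat mean, while "car" and "road" lie along a different axis.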

In [121]:

wx = 'japanese'
x = model.wv[wx]

In [165]:

wy = 'smart'
y = model.wv[wy]

In [166]:

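# Collect the 500 words nearest to the vector x.
# most_similar returns (word, similarity) pairs.
sx = set()
for word, score in model.wv.most_similar(positive=[x], topn=500):
    sx.add(word)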

In [167]:

# Collect the 500 words nearest to the vector y.
sy = set()
for word, score in model.wv.most_similar(positive=[y], topn=500):
    sy.add(word)

In [168]:

# Words that are close to both query words.
sz = sx & sy

In [169]:

# Print a nazokake riddle, roughly: "What links <wx> and <wy>? The answer is ... <wz>."
for wz in sz:
    print(wx + "とかけまして" + wy + "と解く。その心は・・・" + wz + "でございます")

japaneseとかけましてsmartと解く。その心は・・・capcomでございます
japaneseとかけましてsmartと解く。その心は・・・starcraftでございます
japaneseとかけましてsmartと解く。その心は・・・doraemonでございます
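
The set intersection above treats every shared neighbor equally. One possible refinement, not part of the original notebook, is to rank the shared words by their combined similarity to both query words. The sketch below assumes neighbor lists in the `(word, similarity)` format that `most_similar` returns. The helper `shared_neighbors` and the scores are made up for illustration and are not real model output.

```python
def shared_neighbors(neighbors_x, neighbors_y):
    """Return words present in both neighbor lists, best combined score first."""
    scores_x = dict(neighbors_x)
    scores_y = dict(neighbors_y)
    common = scores_x.keys() & scores_y.keys()
    return sorted(common, key=lambda w: scores_x[w] + scores_y[w], reverse=True)

# Made-up neighbor lists in (word, similarity) form; the scores are hypothetical.
near_japanese = [("tokyo", 0.71), ("anime", 0.66), ("capcom", 0.52), ("doraemon", 0.50)]
near_smart = [("clever", 0.69), ("doraemon", 0.48), ("capcom", 0.41), ("gadget", 0.40)]

for wz in shared_neighbors(near_japanese, near_smart):
    print("japanese / smart -> " + wz)
```

With a real model, the two arguments would be the 500-item lists returned for `x` and `y`, and the first few entries of the result would be the strongest candidate answers.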
