The files used to generate the Hindi data are taken from this GitHub repo, which draws mainly on data from IIT Bombay.
Download the pre-trained word vectors (w2v) from this website; the training command below uses `cc.hi.300.vec`.
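Before a multi-day training run, it can help to confirm that the downloaded vectors load correctly. The sketch below assumes `cc.hi.300.vec` is the fastText Common Crawl file of that name, which uses the fastText text format: a header line with the vocabulary size and dimension, then one word per line followed by its vector components. The helper `load_vec` and its `limit` parameter are hypothetical and are not part of nball.py.

```python
from typing import Dict, List, Tuple


def load_vec(path: str, limit: int = 0) -> Tuple[int, int, Dict[str, List[float]]]:
    """Hypothetical helper (not part of nball.py): read a fastText-style .vec file.

    Assumes the first line is "<num_words> <dim>" and each following line is a
    word followed by `dim` space-separated floats. If `limit` > 0, stop after
    that many vectors, which is enough for a quick sanity check of a large file.
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        num_words, dim = (int(x) for x in f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            if parts == [""]:
                continue  # skip blank lines
            word, values = parts[0], parts[1:]
            if len(values) != dim:
                raise ValueError(f"{word!r}: expected {dim} values, got {len(values)}")
            vectors[word] = [float(v) for v in values]
            if limit and len(vectors) >= limit:
                break
    return num_words, dim, vectors
```

For example, `load_vec("/Users/<user-name>/data/cc.hi.300.vec", limit=1000)` should report a dimension of 300 if the file matches the one used in the command below.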
% before training, create an empty file nball.txt for the output
$ python nball.py --train_nball /Users/<user-name>/data/nball.txt --w2v /Users/<user-name>/data/cc.hi.300.vec --ws_child /Users/<user-name>/data/wordSenseChildren.txt --ws_catcode /Users/<user-name>/data/catCodes.txt --log log.txt
% --train_nball: output file of nball embeddings
% --w2v: file of pre-trained word embeddings
% --ws_child: file of parent-children relations among word-senses
% --ws_catcode: file of the parent location code of a word-sense in the tree structure
% --log: log file; it should be located in the same directory as the nball-embedding file
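The preparation steps above (creating the empty output file and pointing the flags at the input files) can be scripted. This is a minimal sketch, assuming training is launched with exactly the flags of the command above; the helper name `train_nball_argv` is hypothetical, and the sketch knows nothing about nball.py's internals (for instance, where it actually writes `log.txt`).

```python
import sys
from pathlib import Path
from typing import List


def train_nball_argv(nball_out: str, w2v: str, ws_child: str,
                     ws_catcode: str, log: str = "log.txt") -> List[str]:
    """Hypothetical helper: check inputs, create the empty output file,
    and return the nball.py training command using the flags in this README."""
    # Check the inputs up front, before starting a run that can take about 3 days.
    for path in (w2v, ws_child, ws_catcode):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    out = Path(nball_out)
    # Refuse to clobber an existing, non-empty result file.
    if out.exists() and out.stat().st_size > 0:
        raise FileExistsError(f"{out} is not empty")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.touch()  # the README asks for an empty nball.txt before training
    # Same flags as the documented command; the log name is passed through as-is.
    return [sys.executable, "nball.py",
            "--train_nball", str(out),
            "--w2v", w2v,
            "--ws_child", ws_child,
            "--ws_catcode", ws_catcode,
            "--log", log]
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the directory that contains nball.py.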
The training process can take around 3 days.
$ python nball.py --zero_energy <output-path> --ball <output-file> --ws_child /Users/<user-name>/data/wordSenseChildren.txt
% --zero_energy <output-path>: output path of the nballs of Experiment 1.1, e.g. `/Users/<user-name>/data/data_out`
% --ball <output-file>: the name of the output nball-embedding file
% --ws_child /Users/<user-name>/data/wordSenseChildren.txt: file of parent-children relations among word-senses
The check can take a long time, around 3 to 4 hours.
If zero energy is achieved, a large nball-embedding file is created at <output-path>/<output-file>; otherwise, the failed relations and word-senses are printed.
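Because success is signalled by the output file appearing, the check can be wrapped in a small script. A minimal sketch, assuming nball.py is run from the current directory with the flags documented above; `zero_energy_argv` and `zero_energy_achieved` are hypothetical helpers, and the exit status of nball.py is not used because this README does not document it.

```python
import os
import sys
from typing import List


def zero_energy_argv(output_path: str, output_file: str, ws_child: str) -> List[str]:
    """Hypothetical helper: build the zero-energy check command with the documented flags."""
    return [sys.executable, "nball.py",
            "--zero_energy", output_path,
            "--ball", output_file,
            "--ws_child", ws_child]


def zero_energy_achieved(output_path: str, output_file: str) -> bool:
    """Hypothetical helper: True if <output-path>/<output-file> exists after the check."""
    return os.path.isfile(os.path.join(output_path, output_file))
```

A file left over from an earlier run would also count as success here, so it is safer to start from an empty output path; if `zero_energy_achieved` returns False, the failed relations printed by nball.py are the place to look.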
Test result on the Ubuntu platform:
© T. Dong, C. Bauckhage. Licensed under CC BY-NC 4.0.
Acknowledgments: This material was prepared within the project P3ML which is funded by the Ministry of Education and Research of Germany (BMBF) under grant number 01/S17064. The authors gratefully acknowledge this support.