hindinballs:

Contributed by Praveen Yadav, GitHub Repo

Resources Used:

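The files used for Hindi data generation are taken from this GitHub repo, which draws mainly on data from IIT Bombay.
You need to download the pre-trained word embeddings (w2v) from this website; the training command in Experiment 1.1 uses the file cc.hi.300.vec.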
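Because the embedding file is large, it can be worth checking it before starting a multi-day training run. The sketch below is not part of this repository. It assumes the downloaded file is in the plain-text .vec layout (a header line with the vocabulary size and the dimension, then one word and its vector per line), and the function names `read_vec_header` and `check_vec_file` are hypothetical.

```python
def read_vec_header(path):
    """Return (vocab_size, dim) from the first line of a text-format .vec file.

    Assumption: the file starts with a 'count dim' header line.
    """
    with open(path, encoding="utf-8") as fh:
        first = fh.readline().split()
    if len(first) != 2:
        raise ValueError("expected a 'count dim' header line")
    return int(first[0]), int(first[1])


def check_vec_file(path, max_lines=1000):
    """Check that the first max_lines vectors have the declared dimension.

    Returns the number of vector lines that were checked.
    """
    _, dim = read_vec_header(path)
    checked = 0
    with open(path, encoding="utf-8") as fh:
        fh.readline()  # skip the header line
        for line in fh:
            if checked >= max_lines:
                break
            parts = line.split()
            # one token for the word, then dim float components
            if len(parts) != dim + 1:
                raise ValueError(f"line {checked + 2}: expected {dim} values")
            [float(x) for x in parts[1:]]  # raises ValueError on non-numeric data
            checked += 1
    return checked
```

For example, `read_vec_header("/Users/<user-name>/data/cc.hi.300.vec")` should report a dimension of 300 if the file matches its name.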

Experiment 1: Training and evaluating nball embeddings

Experiment 1.1: Training nball embeddings

Please also read the Informative Report, which explains how the Hindi data is structured and how to process it for this experiment. Alternatively, go through the data_preprocessing and data_generation notebooks for Hindi.

% you need to create an empty file nball.txt for output

$ python nball.py --train_nball /Users/<user-name>/data/nball.txt --w2v /Users/<user-name>/data/cc.hi.300.vec  --ws_child /Users/<user-name>/data/wordSenseChildren.txt  --ws_catcode /Users/<user-name>/data/glove/catCodes.txt  --log log.txt
% --train_nball: output file of nball embeddings
% --w2v: file of pre-trained word embeddings
% --ws_child: file of parent-children relations among word-senses
% --ws_catcode: file of the parent location code of a word-sense in the tree structure
% --log: log file, shall be located in the same directory as the file of nball embeddings
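The preparation step above (creating an empty output file and having all input files in place) can be sketched as a small helper. This is an illustration only, not code from the repository: `prepare_run` is a hypothetical name, and the directory layout under `data_dir` is assumed to mirror the paths in the example command.

```python
from pathlib import Path


def prepare_run(data_dir, output_name="nball.txt"):
    """Create the empty output file and report which input files are missing.

    Assumption: inputs live under data_dir as in the example command.
    Returns (output_path, list_of_flags_whose_file_is_missing).
    """
    data_dir = Path(data_dir)
    inputs = {
        "--w2v": data_dir / "cc.hi.300.vec",
        "--ws_child": data_dir / "wordSenseChildren.txt",
        "--ws_catcode": data_dir / "glove" / "catCodes.txt",
    }
    missing = [flag for flag, path in inputs.items() if not path.is_file()]

    out = data_dir / output_name
    # touch() creates the file if absent and leaves an existing file untouched
    out.touch(exist_ok=True)
    return out, missing
```

If the returned list is not empty, the listed flags point at files that still need to be downloaded or generated before running nball.py.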

exp1.1.jpeg

The training process can take around 3 days.

Experiment 1.2: Checking whether tree structures are perfectly embedded into word-embeddings

The main input is the output directory of nballs created in Experiment 1.1.
The shell command for running the check is:

$ python nball.py --zero_energy <output-path> --ball <output-file> --ws_child /Users/<user-name>/data/wordSenseChildren.txt
% --zero_energy <output-path> : output path of the nballs of Experiment 1.1, e.g. /Users/<user-name>/data/data_out
% --ball <output-file> : the name of the output nball-embedding file
% --ws_child /Users/<user-name>/data/wordSenseChildren.txt: file of parent-children relations among word-senses

The checking process can take a long time, around 3-4 hours.

result

If zero energy is achieved, a large nball-embedding file is created at <output-path>/<output-file>; otherwise, the failed relations and word-senses are printed.
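To make the idea of a failed relation concrete, the following sketch shows one geometric reading of the check: a child ball should lie inside its parent ball, and sibling balls should not overlap. It is not the implementation in nball.py. The in-memory representation (a ball as a `(center, radius)` pair, the tree as a dict from parent to children) and all names are assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contains(parent, child, tol=1e-9):
    """True if the child ball lies entirely inside the parent ball."""
    (pc, pr), (cc, cr) = parent, child
    return distance(pc, cc) + cr <= pr + tol


def disjoint(a, b, tol=1e-9):
    """True if the two balls do not overlap (touching counts as disjoint)."""
    (ac, ar), (bc, br) = a, b
    return distance(ac, bc) + tol >= ar + br


def failed_relations(balls, children):
    """List every parent-child or sibling relation that the balls violate."""
    failures = []
    for parent, kids in children.items():
        for kid in kids:
            if not contains(balls[parent], balls[kid]):
                failures.append(("contain", parent, kid))
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                if not disjoint(balls[a], balls[b]):
                    failures.append(("disjoint", a, b))
    return failures


# Hypothetical 2-D example; real nballs are high-dimensional.
balls = {
    "animal": ((0.0, 0.0), 10.0),
    "dog": ((3.0, 0.0), 2.0),
    "cat": ((-3.0, 0.0), 2.0),
}
tree = {"animal": ["dog", "cat"]}
print(failed_relations(balls, tree))
```

An empty list corresponds to the zero-energy case in this simplified picture; any entry names the relation that is violated.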

Test result on the Ubuntu platform: exp1.2.jpeg

Acknowledgments: This material was prepared within the project P3ML which is funded by the Ministry of Education and Research of Germany (BMBF) under grant number 01/S17064. The authors gratefully acknowledge this support.