(C) 2024 by Damir Cavar
Use the config widget on spaCy's website to generate a base configuration,
and save it as base_config.cfg in this folder.
Run the following command to create a full config:
!python -m spacy init fill-config ./base_config.cfg ./config.cfg
Go to the Universal Dependencies website and download the UD_Marathi-UFAL treebank (train and dev files). Convert the CoNLL-U files to the spaCy binary format using the following commands:
!python -m spacy convert ./mr_ufal-ud-dev.conllu ./dev.spacy --converter conllu --file-type spacy --seg-sents --morphology --merge-subtokens --lang mr
!python -m spacy convert ./mr_ufal-ud-train.conllu ./train.spacy --converter conllu --file-type spacy --seg-sents --morphology --merge-subtokens --lang mr
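Before or after converting, it can help to check that the downloaded CoNLL-U files look as expected. The following is a minimal sketch, not part of spaCy: the helper name `conllu_stats` is hypothetical, and it relies only on the CoNLL-U conventions that comment lines start with `#`, blank lines separate sentences, and the first tab-separated field is the token ID.

```python
from pathlib import Path


def conllu_stats(text):
    """Count sentences and syntactic words in CoNLL-U formatted text."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line terminates the current sentence.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." carry no tokens.
            continue
        token_id = line.split("\t")[0]
        # Count plain integer IDs only; skip multiword-token ranges ("1-2")
        # and empty nodes ("1.1").
        if token_id.isdigit():
            words += 1
        in_sentence = True
    if in_sentence:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return sentences, words


# File names as used in the convert commands of this notebook.
for name in ("mr_ufal-ud-train.conllu", "mr_ufal-ud-dev.conllu"):
    path = Path(name)
    if path.exists():
        n_sents, n_words = conllu_stats(path.read_text(encoding="utf-8"))
        print(f"{name}: {n_sents} sentences, {n_words} words")
    else:
        print(f"{name}: not found, skipping")
```

If a file reports zero sentences, the download was probably incomplete or saved under a different name.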
Run the training for Marathi:
!python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
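The command-line steps in this notebook (fill-config, convert, train) can also be collected in one script, which is convenient when rerunning the experiment outside Jupyter. The sketch below is one possible arrangement: the helper names `build_commands` and `run_all` are hypothetical, while the flags and file names are copied from the commands in this notebook. It only prints the commands unless `dry_run=False` is passed.

```python
import subprocess
import sys


def build_commands(lang="mr", prefix="mr_ufal-ud"):
    """Return the CLI steps from this notebook as argument lists."""
    py = [sys.executable, "-m", "spacy"]
    commands = [py + ["init", "fill-config", "./base_config.cfg", "./config.cfg"]]
    for split in ("dev", "train"):
        commands.append(py + [
            "convert", f"./{prefix}-{split}.conllu", f"./{split}.spacy",
            "--converter", "conllu", "--file-type", "spacy",
            "--seg-sents", "--morphology", "--merge-subtokens", "--lang", lang,
        ])
    commands.append(py + [
        "train", "config.cfg", "--output", "./output",
        "--paths.train", "./train.spacy", "--paths.dev", "./dev.spacy",
    ])
    return commands


def run_all(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


run_all(dry_run=True)
```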
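import spacy
# spacy train writes its pipelines to ./output/model-best and ./output/model-last,
# so load one of those subdirectories rather than ./output itself.
nlp = spacy.load("./output/model-best")
text = "त्याला एक मुलगा होता."
doc = nlp(text)
# Print one tab-separated row of attributes per token.
for token in doc:
    print("\t".join((token.text, str(token.idx), token.lemma_, token.pos_, token.tag_, token.dep_,
                     token.shape_, str(token.is_alpha), str(token.is_stop))))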
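The tab-separated output above is easier to read with a header row. A small sketch of one way to do that follows; `format_tokens` and `FIELDS` are hypothetical names, and the function only assumes that each token exposes the attributes printed in the loop above.

```python
# Attribute names in the same order as the columns printed above.
FIELDS = ("text", "idx", "lemma_", "pos_", "tag_", "dep_",
          "shape_", "is_alpha", "is_stop")


def format_tokens(tokens, fields=FIELDS):
    """Return a tab-separated table: a header line plus one row per token."""
    lines = ["\t".join(fields)]
    for token in tokens:
        # str() converts integer offsets and boolean flags for joining.
        lines.append("\t".join(str(getattr(token, f)) for f in fields))
    return "\n".join(lines)
```

With the `doc` object from the cell above, `print(format_tokens(doc))` prints the same rows preceded by the column names.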