Stemmer: Morphological Analyzer & Disambiguator

class vnlp.stemmer_morph_analyzer.stemmer_morph_analyzer.StemmerAnalyzer(evaluate=False)

StemmerAnalyzer Class.

This is a Morphological Disambiguator.

  • This is an implementation of the paper "The Role of Context in Neural Morphological Disambiguation".

  • It differs slightly from the original paper:

      • This version uses a GRU instead of an LSTM, which reduces the number of parameters by 25% with no noticeable performance penalty.

      • This version has an extra Dense layer before the output (p) layer.

      • During training, the positions of the candidates and labels are shuffled in every batch.

  • It achieves 0.9596 accuracy on ambiguous tokens and 0.9745 accuracy on all tokens on the trmorph2006 dataset, compared to 0.910 and 0.964 in the original paper.

  • For more details about the implementation, training procedure and evaluation metrics, see the ReadMe.
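As a rough check of the 25% figure, the sketch below counts the weights of a single recurrent layer in the textbook formulation, where every gate has an input kernel, a recurrent kernel and one bias vector. An LSTM has four such blocks and a GRU has three, so the GRU needs 3/4 of the LSTM's recurrent parameters. The layer sizes are hypothetical, and the count covers only the recurrent layer, not embeddings or Dense layers.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # Four blocks (input, forget, output gates and cell candidate),
    # each with an input kernel, a recurrent kernel and a bias vector.
    return 4 * (units * input_dim + units * units + units)


def gru_params(input_dim: int, units: int) -> int:
    # Three blocks (update gate, reset gate and candidate state),
    # each shaped like one LSTM block.
    return 3 * (units * input_dim + units * units + units)


# Hypothetical layer sizes, chosen only for illustration.
input_dim, units = 256, 128
lstm = lstm_params(input_dim, units)
gru = gru_params(input_dim, units)
print(lstm, gru, 1 - gru / lstm)  # 197120 147840 0.25
```

Keras' GRU layer defaults to reset_after=True, which keeps a second bias vector per gate, so with that variant the saving is slightly below 25%.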
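The candidate and label shuffling mentioned above can be pictured with a minimal sketch. It assumes that each token comes with a list of candidate analyses plus the index of the correct one; the function name and data layout are hypothetical and this is not VNLP's training code. Re-permuting the candidates every batch keeps the model from learning that the correct analysis sits at a fixed position.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_candidates(
    candidates: Sequence[str], label: int, rng: random.Random
) -> Tuple[List[str], int]:
    """Hypothetical helper: permute candidates, return the label's new index."""
    order = list(range(len(candidates)))
    rng.shuffle(order)
    # order[j] is the original index of the candidate now at position j.
    shuffled = [candidates[i] for i in order]
    new_label = order.index(label)
    return shuffled, new_label


# Illustrative candidates for the token "başla"; index 0 is the correct one.
rng = random.Random(0)
candidates = ["baş+Noun+A3sg+Pnon+Ins", "başla+Verb+Pos+Imp+A2sg"]
shuffled, label = shuffle_candidates(candidates, 0, rng)
print(shuffled, label)
```

In a training loop, a function like this would be called again for every token in every batch, so the same sentence is seen with different candidate orders across epochs.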

predict(sentence: str) → List[str]

High level user API for Morphological Disambiguation.

Parameters

sentence – Input text (a single sentence).

Returns

A list with one string per token, containing the selected stem followed by its morphological tags, joined by "+".

Example:

from vnlp import StemmerAnalyzer
stemmer = StemmerAnalyzer()
stemmer.predict("Üniversite sınavlarına canla başla çalışıyorlardı.")

['üniversite+Noun+A3sg+Pnon+Nom',
'sınav+Noun+A3pl+P3sg+Dat',
'can+Noun+A3sg+Pnon+Ins',
'baş+Noun+A3sg+Pnon+Ins',
'çalış+Verb+Pos+Prog1+A3pl+Past',
'.+Punc']
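Because each returned string joins the stem and its tags with "+", splitting it back into parts is straightforward. The sketch below works on the example output above without needing VNLP installed; split_analysis is a hypothetical helper, and it assumes the stem itself never contains a "+" character.

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    # Hypothetical helper: the first "+"-separated field is the stem,
    # the remaining fields are the morphological tags.
    stem, *tags = analysis.split("+")
    return stem, tags


# Output copied from the example above.
result = [
    "üniversite+Noun+A3sg+Pnon+Nom",
    "sınav+Noun+A3pl+P3sg+Dat",
    "can+Noun+A3sg+Pnon+Ins",
    "baş+Noun+A3sg+Pnon+Ins",
    "çalış+Verb+Pos+Prog1+A3pl+Past",
    ".+Punc",
]

stems = [split_analysis(a)[0] for a in result]
print(stems)  # ['üniversite', 'sınav', 'can', 'baş', 'çalış', '.']
```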