Stemmer: Morphological Analyzer & Disambiguator

class vnlp.stemmer_morph_analyzer.stemmer_morph_analyzer.StemmerAnalyzer(evaluate=False)

StemmerAnalyzer Class.

This is a Morphological Disambiguator.

  • This is an implementation of The Role of Context in Neural Morphological Disambiguation.

  • There are slight modifications to the original paper:

      • This version uses GRU instead of LSTM, which decreases the number of parameters by 25% with no actual performance penalty.

      • This version has an extra Dense layer before the output (p) layer.

      • During training, the positions of candidates and labels are shuffled in every batch.

  • It achieves 0.9467 accuracy on ambiguous tokens and 0.9664 accuracy on all tokens on the trmorph2006 dataset, compared to 0.910 and 0.964 in the original paper.

  • For more details about the implementation, training procedure and evaluation metrics, see ReadMe.
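The 25% figure for the GRU swap is consistent with the gate structure of the two cells: a textbook LSTM cell has four weight blocks (three gates plus the cell candidate), while a textbook GRU cell has three. The sketch below counts the parameters of a single recurrent layer under those textbook formulas. The layer sizes are hypothetical and are not taken from this model, and framework implementations may add extra bias terms, so the ratio in a real network can differ slightly.

```python
def recurrent_params(n_blocks: int, input_dim: int, hidden_dim: int) -> int:
    """Parameter count of one recurrent layer made of `n_blocks` weight blocks.

    Each block has an input kernel (input_dim x hidden_dim), a recurrent
    kernel (hidden_dim x hidden_dim), and a bias vector (hidden_dim).
    """
    per_block = input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim
    return n_blocks * per_block


# Hypothetical sizes, chosen only for illustration.
input_dim, hidden_dim = 32, 128

# LSTM: input, forget, and output gates plus the cell candidate.
lstm = recurrent_params(4, input_dim, hidden_dim)
# GRU: update and reset gates plus the hidden-state candidate.
gru = recurrent_params(3, input_dim, hidden_dim)

reduction = 1 - gru / lstm
print(f"LSTM: {lstm}, GRU: {gru}, reduction: {reduction:.0%}")
```

The count covers the recurrent layer only; embedding and dense layers are unaffected by replacing one cell type with the other.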

predict(sentence: str, batch_size: int = 64) → List[str]

High level user API for Morphological Disambiguation.

Parameters:
  • sentence – Input text (a sentence).

  • batch_size – Batch size (number of tokens to be predicted together). For long sentences, you can increase this for better performance on GPU. If you run into an out-of-memory (OOM) error, decrease it until the error disappears.

Returns:

List with one entry per token, each containing the selected stem and its morphological tags.
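The advice to lower batch_size after an OOM error can be automated. The following is a minimal sketch, not part of the library: predict_with_backoff is a hypothetical helper, and the exception type that signals OOM depends on the underlying framework, so it is passed in by the caller (MemoryError is only a placeholder default).

```python
from typing import Callable, List, Tuple, Type


def predict_with_backoff(
    predict_fn: Callable[..., List[str]],
    sentence: str,
    batch_size: int = 64,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> List[str]:
    """Call predict_fn, halving batch_size each time an OOM-type error is raised."""
    while True:
        try:
            return predict_fn(sentence, batch_size=batch_size)
        except oom_errors:
            if batch_size == 1:
                raise  # cannot shrink further; surface the original error
            batch_size = max(1, batch_size // 2)
```

Assuming stemmer is a StemmerAnalyzer instance, it would be called as predict_with_backoff(stemmer.predict, sentence, oom_errors=(...)), with the tuple filled in with whatever exception class the backend actually raises.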

Example:

from vnlp import StemmerAnalyzer
stemmer = StemmerAnalyzer()
stemmer.predict("Üniversite sınavlarına canla başla çalışıyorlardı.")

['üniversite+Noun+A3sg+Pnon+Nom',
'sınav+Noun+A3pl+P3sg+Dat',
'can+Noun+A3sg+Pnon+Ins',
'baş+Noun+A3sg+Pnon+Ins',
'çalış+Verb+Pos+Prog1+A3pl+Past',
'.+Punc']
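
Each returned string in the example above appears to join the stem and its tags with "+". The sketch below separates the two for downstream use. split_analysis is a hypothetical helper, and the "+"-joined layout is inferred from the example output rather than from a documented guarantee.

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split 'stem+Tag1+Tag2' into ('stem', ['Tag1', 'Tag2'])."""
    # Search from index 1 so that a token that is itself "+" keeps its stem.
    cut = analysis.find("+", 1)
    if cut == -1:
        return analysis, []
    return analysis[:cut], analysis[cut + 1:].split("+")


# Output copied from the example above.
analyses = [
    "üniversite+Noun+A3sg+Pnon+Nom",
    "sınav+Noun+A3pl+P3sg+Dat",
    "can+Noun+A3sg+Pnon+Ins",
    "baş+Noun+A3sg+Pnon+Ins",
    "çalış+Verb+Pos+Prog1+A3pl+Past",
    ".+Punc",
]

for stem, tags in map(split_analysis, analyses):
    print(stem, tags)
```

With this split, the first element of tags is the part-of-speech label (for example Noun or Verb) in every analysis shown here.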