Part of Speech Tagger

class vnlp.part_of_speech_tagger.part_of_speech_tagger.PoSTagger(model='SPUContextPoS', evaluate=False, *args)[source]

Main API class for Part of Speech Tagger implementations.

Available models: ['SPUContextPoS', 'TreeStackPoS']

To evaluate, initialize the class with the evaluate=True argument. This loads model weights that were not trained on the test sets.
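
As a minimal sketch of model selection and evaluation mode, using only the constructor arguments documented above:

from vnlp import PoSTagger

# Default model ('SPUContextPoS'), production weights
pos = PoSTagger()

# Explicit model choice, loading the evaluation weights that exclude the test sets
pos_eval = PoSTagger(model='TreeStackPoS', evaluate=True)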

predict(sentence: str) → List[Tuple[str, str]][source]

High level user API for Part of Speech Tagging.

Parameters:

sentence – Input text (sentence).

Returns:

List of (token, pos_label).

Example:

from vnlp import PoSTagger
pos = PoSTagger()
pos.predict("Oğuz'un kırmızı bir Astra'sı vardı.")

[("Oğuz'un", 'PROPN'),
 ('kırmızı', 'ADJ'),
 ('bir', 'DET'),
 ("Astra'sı", 'PROPN'),
 ('vardı', 'VERB'),
 ('.', 'PUNCT')]

SentencePiece Unigram Context Part of Speech Tagger

class vnlp.part_of_speech_tagger.spu_context_pos.SPUContextPoS(evaluate)[source]

SentencePiece Unigram Context Part of Speech Tagger class.

  • This is a context-aware, deep GRU-based Part of Speech Tagger that uses a SentencePiece Unigram tokenizer and pre-trained Word2Vec embeddings.

  • It achieves 0.9010 accuracy and 0.7623 F1 macro score on all test sets of Universal Dependencies 2.9.

  • For more details about the training procedure, dataset and evaluation metrics, see ReadMe.

predict(sentence: str) → List[Tuple[str, str]][source]
Parameters:

sentence – Input text (sentence).

Returns:

List of (token, pos_label).
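
As a minimal sketch of using this class directly rather than through the PoSTagger wrapper (the import path and the evaluate argument follow the class signature above):

from vnlp.part_of_speech_tagger.spu_context_pos import SPUContextPoS

# evaluate=False loads the production weights; evaluate=True loads the
# evaluation weights that exclude the test sets
spu_pos = SPUContextPoS(evaluate=False)
spu_pos.predict("Oğuz'un kırmızı bir Astra'sı vardı.")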

Tree-stack Part of Speech Tagger

class vnlp.part_of_speech_tagger.treestack_pos.TreeStackPoS(evaluate, stemmer_analyzer=None)[source]

Tree-stack Part of Speech Tagger class.

  • This Part of Speech Tagger is inspired by Tree-stack LSTM in Transition Based Dependency Parsing.

  • “Inspired” is emphasized because this implementation adopts the paper's approach of using morphological tags, pre-trained word embeddings and POS tags as inputs to the model, rather than implementing the exact network proposed in the paper.

  • It achieves 0.89 accuracy and 0.71 F1 macro score on the test sets of Universal Dependencies 2.9.

  • Input data is processed by nltk.tokenize.TreebankWordTokenizer.

  • For more details about the training procedure, dataset and evaluation metrics, see ReadMe.

predict(sentence: str) → List[Tuple[str, str]][source]
Parameters:

sentence – Input text (sentence).

Returns:

List of (token, pos_label).
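
As a minimal sketch of using this class directly (the import path and arguments follow the class signature above; it is assumed that an existing StemmerAnalyzer instance passed via stemmer_analyzer is reused rather than loaded a second time, and that the default None makes the class create its own):

from vnlp import StemmerAnalyzer
from vnlp.part_of_speech_tagger.treestack_pos import TreeStackPoS

# Assumption: an already-loaded StemmerAnalyzer can be shared with the tagger
stemmer_analyzer = StemmerAnalyzer()
ts_pos = TreeStackPoS(evaluate=False, stemmer_analyzer=stemmer_analyzer)
ts_pos.predict("Oğuz'un kırmızı bir Astra'sı vardı.")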