Part of Speech Tagger

class vnlp.part_of_speech_tagger.part_of_speech_tagger.PoSTagger(model='SPUContextPoS', evaluate=False, *args)[source]

Main API class for Part of Speech Tagger implementations.

Available models: ['SPUContextPoS', 'TreeStackPoS']

To evaluate, initialize the class with the evaluate=True argument. This loads model weights that were not trained on the test sets.
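
As a minimal sketch of model selection and evaluation mode, using only the constructor arguments documented above:

from vnlp import PoSTagger

# Default model ('SPUContextPoS'), production weights
pos = PoSTagger()

# Explicit model choice, loading the evaluation weights that exclude the test sets
pos_eval = PoSTagger(model='TreeStackPoS', evaluate=True)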

predict(sentence: str) → List[Tuple[str, str]][source]

High level user API for Part of Speech Tagging.

Parameters:

sentence – Input text (sentence).

Returns:

List of (token, pos_label).

Example:

from vnlp import PoSTagger
pos = PoSTagger()
pos.predict("Oğuz'un kırmızı bir Astra'sı vardı.")

[("Oğuz'un", 'PROPN'),
 ('kırmızı', 'ADJ'),
 ('bir', 'DET'),
 ("Astra'sı", 'PROPN'),
 ('vardı', 'VERB'),
 ('.', 'PUNCT')]

SentencePiece Unigram Context Part of Speech Tagger

class vnlp.part_of_speech_tagger.spu_context_pos.SPUContextPoS(evaluate)[source]

SentencePiece Unigram Context Part of Speech Tagger class.

  • This is a context-aware, deep GRU-based Part of Speech Tagger that uses a SentencePiece Unigram tokenizer and pre-trained Word2Vec embeddings.

  • It achieves 0.9010 accuracy and 0.7623 F1 macro score on all test sets of Universal Dependencies 2.9.

  • For more details about the training procedure, dataset and evaluation metrics, see ReadMe.

predict(sentence: str) → List[Tuple[str, str]][source]
Parameters:

sentence – Input text (sentence).

Returns:

List of (token, pos_label).
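
As a minimal sketch of using this class directly rather than through the PoSTagger wrapper (the import path and the evaluate argument follow the class signature above):

from vnlp.part_of_speech_tagger.spu_context_pos import SPUContextPoS

# evaluate=False loads the production weights; evaluate=True loads the
# evaluation weights that exclude the test sets
spu_pos = SPUContextPoS(evaluate=False)
spu_pos.predict("Oğuz'un kırmızı bir Astra'sı vardı.")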

Tree-stack Part of Speech Tagger

class vnlp.part_of_speech_tagger.treestack_pos.TreeStackPoS(evaluate, stemmer_analyzer=None)[source]

Tree-stack Part of Speech Tagger class.

  • This Part of Speech Tagger is inspired by Tree-stack LSTM in Transition Based Dependency Parsing.

  • “Inspired” is emphasized because this implementation adopts the paper's approach of using morphological tags, pre-trained word embeddings and POS tags as inputs to the model, rather than implementing the exact network proposed in the paper.

  • It achieves 0.89 accuracy and 0.71 F1 macro score on the test sets of Universal Dependencies 2.9.

  • Input data is processed by nltk.tokenize.TreebankWordTokenizer.

  • For more details about the training procedure, dataset and evaluation metrics, see ReadMe.

predict(sentence: str) → List[Tuple[str, str]][source]
Parameters:

sentence – Input text (sentence).

Returns:

List of (token, pos_label).
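
As a minimal sketch of using this class directly (the import path and arguments follow the class signature above; it is assumed that an existing StemmerAnalyzer instance passed via stemmer_analyzer is reused rather than loaded a second time, and that the default None makes the class create its own):

from vnlp import StemmerAnalyzer
from vnlp.part_of_speech_tagger.treestack_pos import TreeStackPoS

# Assumption: an already-loaded StemmerAnalyzer can be shared with the tagger
stemmer_analyzer = StemmerAnalyzer()
ts_pos = TreeStackPoS(evaluate=False, stemmer_analyzer=stemmer_analyzer)
ts_pos.predict("Oğuz'un kırmızı bir Astra'sı vardı.")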