Named Entity Recognizer

class vnlp.named_entity_recognizer.named_entity_recognizer.NamedEntityRecognizer(model='SPUContextNER', evaluate=False)

Main API class for Named Entity Recognizer implementations.

Available models: [‘SPUContextNER’, ‘CharNER’]

To evaluate a model, initialize the class with evaluate=True. This loads weights from a model that was not trained on the test sets.

predict(sentence: str, displacy_format: bool = False) -> List[Tuple[str, str]]

High level user API for Named Entity Recognition.

Parameters:
  • sentence – Input sentence/text.

  • displacy_format – When set to True, returns the result in spacy.displacy format so that it can be visualized.

Returns:

NER result as pairs of (token, entity).

Example:

from vnlp import NamedEntityRecognizer
ner = NamedEntityRecognizer()
ner.predict("Benim adım Melikşah, 29 yaşındayım, İstanbul'da ikamet ediyorum ve VNGRS AI Takımı'nda çalışıyorum.")

[('Benim', 'O'),
('adım', 'O'),
('Melikşah', 'PER'),
(',', 'O'),
('29', 'O'),
('yaşındayım', 'O'),
(',', 'O'),
("İstanbul'da", 'LOC'),
('ikamet', 'O'),
('ediyorum', 'O'),
('ve', 'O'),
('VNGRS', 'ORG'),
('AI', 'ORG'),
("Takımı'nda", 'ORG'),
('çalışıyorum', 'O'),
('.', 'O')]
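
The output above labels each token separately, so a multi-token entity such as the organization name appears as three consecutive 'ORG' pairs. The following is a minimal sketch of one way to merge such runs into whole entities. The helper name group_entities is hypothetical and not part of vnlp, and the input is a shortened copy of the result shown above.

```python
from itertools import groupby
from typing import List, Tuple


def group_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge runs of adjacent tokens that share the same non-'O' label."""
    spans = []
    for label, run in groupby(pairs, key=lambda pair: pair[1]):
        if label == "O":
            continue  # tokens outside any entity are dropped
        spans.append((" ".join(token for token, _ in run), label))
    return spans


# Shortened copy of the (token, entity) pairs from the example above.
result = [
    ("Melikşah", "PER"), (",", "O"), ("İstanbul'da", "LOC"), ("ve", "O"),
    ("VNGRS", "ORG"), ("AI", "ORG"), ("Takımı'nda", "ORG"), (".", "O"),
]
entities = group_entities(result)
print(entities)
# [('Melikşah', 'PER'), ("İstanbul'da", 'LOC'), ("VNGRS AI Takımı'nda", 'ORG')]
```

Because the labels carry no begin/inside markers, two distinct entities of the same type that happen to be adjacent would be merged into one span by this approach.
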

# Visualization with spaCy:
import spacy
from vnlp import NamedEntityRecognizer
ner = NamedEntityRecognizer()
result = ner.predict("İstanbul'dan Foça'ya giderken Zeynep ile Bursa'ya uğradık.", displacy_format=True)
spacy.displacy.render(result, style="ent", manual=True)
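
For orientation, spaCy's displacy accepts manually built input for style="ent" as a dictionary with "text", "ents" (character offsets plus a label) and "title" keys. The sketch below builds such a dictionary from (token, entity) pairs. It only illustrates that format: the helper to_displacy is hypothetical, and the exact structure that predict returns with displacy_format set to True may differ.

```python
from typing import Dict, List, Tuple


def to_displacy(text: str, pairs: List[Tuple[str, str]]) -> Dict:
    """Build spaCy's manual 'ent' input from (token, entity) pairs."""
    ents = []
    cursor = 0  # search position, so repeated tokens map to successive matches
    for token, label in pairs:
        start = text.find(token, cursor)
        if start == -1:
            continue  # token does not occur verbatim in the text; skip it
        end = start + len(token)
        cursor = end
        if label != "O":
            ents.append({"start": start, "end": end, "label": label})
    return {"text": text, "ents": ents, "title": None}


text = "Zeynep ile Bursa'ya uğradık."
pairs = [("Zeynep", "PER"), ("ile", "O"), ("Bursa'ya", "LOC"), ("uğradık", "O"), (".", "O")]
doc = to_displacy(text, pairs)
print(doc["ents"])
# [{'start': 0, 'end': 6, 'label': 'PER'}, {'start': 11, 'end': 19, 'label': 'LOC'}]
```
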

SentencePiece Unigram Context Named Entity Recognizer

class vnlp.named_entity_recognizer.spu_context_ner.SPUContextNER(evaluate)

SentencePiece Unigram Context Named Entity Recognizer class.

  • This is a context-aware, deep GRU-based Named Entity Recognizer that uses a SentencePiece Unigram tokenizer and pre-trained Word2Vec embeddings.

  • It achieves 0.9928 Accuracy and 0.9833 F1 score on test sets of “wikiann”, “gungor.ner” and “teghub-TurkishNER-BERT” datasets.

  • It achieves an F1 score of 0.9766 for “ORG”, 0.9852 for “PER” and 0.9742 for “LOC” entities. (Each score treats the entity of interest as the positive class and all other labels as the negative class.)

  • For more details about the training procedure, dataset and evaluation metrics, see ReadMe.
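
The per-entity scores above are computed one-vs-rest: the entity of interest is the positive class and every other label is negative. A minimal sketch of that calculation follows. It assumes token-level labels, the function name one_vs_rest_f1 is hypothetical, and the toy labels are made up for illustration. The actual evaluation procedure is described in the ReadMe.

```python
from typing import Sequence


def one_vs_rest_f1(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 score with `positive` as the positive class and all other labels negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
y_true = ["O", "PER", "O", "LOC", "ORG", "ORG", "O"]
y_pred = ["O", "PER", "O", "ORG", "ORG", "O", "O"]
org_f1 = one_vs_rest_f1(y_true, y_pred, "ORG")
print(org_f1)  # 0.5
```
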

predict(sentence: str, displacy_format: bool = False) -> List[Tuple[int, str, int, str]]
Parameters:
  • sentence – Input sentence/text.

  • displacy_format – When set to True, returns the result in spacy.displacy format so that it can be visualized.

Returns:

NER result as pairs of (token, entity).

CharNER

class vnlp.named_entity_recognizer.charner.CharNER(evaluate)

CharNER Named Entity Recognizer.

  • This is an implementation of CharNER: Character-Level Named Entity Recognition.

  • There are slight modifications to the original paper:

      • This version is trained for Turkish only.

      • This version takes the mode of the character-level predictions within each token instead of using a Viterbi decoder.

  • It achieves 0.9589 accuracy and a 0.9200 macro F1 score.

  • Input data is tokenized with nltk.tokenize.WordPunctTokenizer, so punctuation is split off into separate tokens.

  • Entity labels are: [‘O’, ‘PER’, ‘LOC’, ‘ORG’]

  • For more details about the training procedure, dataset and evaluation metrics, see ReadMe.
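
The mode operation mentioned in the list above can be sketched as follows: each character of a token receives its own predicted label, and the token takes whichever label occurs most often. This is a simplified illustration, not the library's code. The helper vote_per_token and the character labels are hypothetical, and ties here fall to the label seen first, which is an assumption.

```python
from collections import Counter
from typing import List, Tuple


def vote_per_token(tokens: List[str], char_labels: List[List[str]]) -> List[Tuple[str, str]]:
    """Assign each token the most frequent label among its characters."""
    results = []
    for token, labels in zip(tokens, char_labels):
        # most_common(1) returns [(label, count)] for the top label
        most_common_label, _ = Counter(labels).most_common(1)[0]
        results.append((token, most_common_label))
    return results


tokens = ["Ankara", "güzel"]
# One hypothetical predicted label per character of each token.
char_labels = [
    ["LOC", "LOC", "O", "LOC", "LOC", "LOC"],
    ["O", "O", "O", "O", "O"],
]
voted = vote_per_token(tokens, char_labels)
print(voted)
# [('Ankara', 'LOC'), ('güzel', 'O')]
```

Compared with a Viterbi decoder, this voting step looks at each token in isolation and does not model transitions between neighboring labels.
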

predict(text: str, displacy_format: bool = False) -> List[Tuple[str, str]]
Parameters:
  • text – Input text.

  • displacy_format – When set to True, returns the result in spacy.displacy format so that it can be visualized.

Returns:

NER result as pairs of (token, entity).
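
The tokens in the returned pairs follow the WordPunctTokenizer splitting noted in the class description. To give a feel for that splitting without installing NLTK, the sketch below uses the regular expression that, to the best of my knowledge, WordPunctTokenizer is built on. The helper name word_punct_tokenize is hypothetical.

```python
import re
from typing import List

# Runs of word characters, or runs of characters that are neither word
# characters nor whitespace (assumed to match NLTK's WordPunctTokenizer).
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text: str) -> List[str]:
    """Split text into alphanumeric runs and punctuation runs."""
    return WORD_PUNCT.findall(text)


tokens = word_punct_tokenize("Zeynep, Bursa'ya gitti.")
print(tokens)
# ['Zeynep', ',', 'Bursa', "'", 'ya', 'gitti', '.']
```

Note that the apostrophe in a suffixed proper noun such as Bursa'ya becomes its own token under this scheme.
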