Named Entity Recognizer

class vnlp.named_entity_recognizer.named_entity_recognizer.NamedEntityRecognizer(model='SPUContextNER', evaluate=False)

Main API class for Named Entity Recognizer implementations.

Available models: [‘SPUContextNER’, ‘CharNER’]

To evaluate a model, initialize the class with the argument evaluate=True. This loads model weights that were not trained on the test sets.

predict(sentence: str, displacy_format: bool = False) → List[Tuple[str, str]]

High level user API for Named Entity Recognition.

Parameters:

sentence – Input sentence/text.
displacy_format – When True, returns the result in spacy.displacy format so that it can be visualized.

Returns:

NER result as a list of (token, entity) pairs.

Example:

from vnlp import NamedEntityRecognizer
ner = NamedEntityRecognizer()
ner.predict("Benim adım Melikşah, 29 yaşındayım, İstanbul'da ikamet ediyorum ve VNGRS AI Takımı'nda çalışıyorum.")

[('Benim', 'O'),
('adım', 'O'),
('Melikşah', 'PER'),
(',', 'O'),
('29', 'O'),
('yaşındayım', 'O'),
(',', 'O'),
("İstanbul'da", 'LOC'),
('ikamet', 'O'),
('ediyorum', 'O'),
('ve', 'O'),
('VNGRS', 'ORG'),
('AI', 'ORG'),
("Takımı'nda", 'ORG'),
('çalışıyorum', 'O'),
('.', 'O')]

# Visualization with spaCy:
import spacy
from vnlp import NamedEntityRecognizer
ner = NamedEntityRecognizer()
# displacy_format=True returns the result in the format spacy.displacy expects
result = ner.predict("İstanbul'dan Foça'ya giderken Zeynep ile Bursa'ya uğradık.", displacy_format=True)
spacy.displacy.render(result, style="ent", manual=True)
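
The exact structure returned when displacy_format is True is not spelled out on this page. As a rough illustration only, the sketch below builds a dict in the manual "ent" input format that spaCy documents for displacy (keys "text", "ents" and "title") from plain (token, entity) pairs. The helper name to_displacy_ents and the sample pairs are hypothetical and are not part of VNLP.

```python
from typing import Dict, List, Tuple


def to_displacy_ents(text: str, pairs: List[Tuple[str, str]]) -> Dict:
    """Build a dict in spaCy's manual "ent" format from (token, entity) pairs."""
    ents: List[Dict] = []
    cursor = 0
    for token, label in pairs:
        # Locate the token at or after the cursor to get character offsets.
        start = text.index(token, cursor)
        end = start + len(token)
        cursor = end
        if label == "O":
            continue
        # Extend the previous span when only whitespace separates same-label tokens.
        if ents and ents[-1]["label"] == label and not text[ents[-1]["end"]:start].strip():
            ents[-1]["end"] = end
        else:
            ents.append({"start": start, "end": end, "label": label})
    return {"text": text, "ents": ents, "title": None}


# Hypothetical pairs for illustration, not actual model output.
text = "Zeynep Bursa'ya gitti."
pairs = [("Zeynep", "PER"), ("Bursa'ya", "LOC"), ("gitti", "O"), (".", "O")]
doc = to_displacy_ents(text, pairs)
print(doc["ents"])
# [{'start': 0, 'end': 6, 'label': 'PER'}, {'start': 7, 'end': 15, 'label': 'LOC'}]
```

A dict of this shape can be passed to spacy.displacy.render(doc, style="ent", manual=True).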

SentencePiece Unigram Context Named Entity Recognizer

class vnlp.named_entity_recognizer.spu_context_ner.SPUContextNER(evaluate)

SentencePiece Unigram Context Named Entity Recognizer class.

This is a context-aware, deep GRU-based named entity recognizer that uses a SentencePiece Unigram tokenizer and pre-trained Word2Vec embeddings.
It achieves 0.9928 accuracy and a 0.9833 F1 score on the test sets of the “wikiann”, “gungor.ner” and “teghub-TurkishNER-BERT” datasets.
Per entity, it achieves F1 scores of 0.9766 for “ORG”, 0.9852 for “PER” and 0.9742 for “LOC”, computed one-vs-rest (the entity of interest is the positive class and all other labels are the negative class).
For more details about the training procedure, datasets and evaluation metrics, see the ReadMe.
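
The per-entity scores above treat one label as positive and every other label as negative. The following is a minimal sketch of that one-vs-rest F1 computation at the token level. It is not the project's evaluation code; the function name one_vs_rest_f1 and the toy labels are made up for illustration, and token-level scoring is an assumption.

```python
from typing import Sequence


def one_vs_rest_f1(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 score with `positive` as the positive class and all other labels as negative."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy gold and predicted labels, made up for illustration.
y_true = ["O", "PER", "LOC", "ORG", "ORG", "O"]
y_pred = ["O", "PER", "O", "ORG", "LOC", "O"]

for label in ("PER", "LOC", "ORG"):
    print(label, round(one_vs_rest_f1(y_true, y_pred, label), 4))
# PER 1.0
# LOC 0.0
# ORG 0.6667
```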

predict(sentence: str, displacy_format: bool = False) → List[Tuple[int, str, int, str]]

Parameters:

sentence – Input sentence/text.
displacy_format – When True, returns the result in spacy.displacy format so that it can be visualized.

Returns:

NER result as a list of (token, entity) pairs.
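
The returned pairs label each token separately, so a multi-token entity such as an organization name appears as several consecutive pairs. The sketch below shows one possible way to merge such runs into whole entities, using the example output listed earlier on this page. The helper group_entities is hypothetical and not part of VNLP. Because the labels carry no begin/inside markers, two distinct adjacent entities of the same type would be merged into one.

```python
from itertools import groupby
from typing import List, Tuple


def group_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge runs of adjacent tokens that share the same non-'O' label."""
    entities = []
    for label, run in groupby(pairs, key=lambda pair: pair[1]):
        if label != "O":
            entities.append((" ".join(token for token, _ in run), label))
    return entities


# The example output shown for NamedEntityRecognizer.predict above.
result = [
    ("Benim", "O"), ("adım", "O"), ("Melikşah", "PER"), (",", "O"),
    ("29", "O"), ("yaşındayım", "O"), (",", "O"), ("İstanbul'da", "LOC"),
    ("ikamet", "O"), ("ediyorum", "O"), ("ve", "O"), ("VNGRS", "ORG"),
    ("AI", "ORG"), ("Takımı'nda", "ORG"), ("çalışıyorum", "O"), (".", "O"),
]

print(group_entities(result))
# [('Melikşah', 'PER'), ("İstanbul'da", 'LOC'), ("VNGRS AI Takımı'nda", 'ORG')]
```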

CharNER

class vnlp.named_entity_recognizer.charner.CharNER(evaluate)

CharNER Named Entity Recognizer.

This is an implementation of “CharNER: Character-Level Named Entity Recognition”.
It differs slightly from the original paper:
This version is trained for Turkish only.
This version takes the mode (most frequent label) of the character-level predictions within each token instead of using a Viterbi decoder.
It achieves 0.9589 accuracy and a 0.9200 macro F1 score.
Input text is tokenized with nltk.tokenize.WordPunctTokenizer, so each run of punctuation becomes a separate token.
Entity labels are: [‘O’, ‘PER’, ‘LOC’, ‘ORG’]
For more details about the training procedure, datasets and evaluation metrics, see the ReadMe.
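
The tokenization and mode voting described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the CharNER implementation: the regular expression mirrors the pattern used by NLTK's WordPunctTokenizer so the example runs without NLTK, the helper names are hypothetical, and the character-level labels are made up rather than produced by a model.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Runs of word characters or runs of punctuation, as in WordPunctTokenizer.
    return re.findall(r"\w+|[^\w\s]+", text)


def vote_token_labels(tokens: List[str], char_labels: List[List[str]]) -> List[Tuple[str, str]]:
    """char_labels[i] holds one predicted label per character of tokens[i]."""
    voted = []
    for token, labels in zip(tokens, char_labels):
        # most_common(1) returns the mode; on ties, the label seen first wins.
        mode_label, _ = Counter(labels).most_common(1)[0]
        voted.append((token, mode_label))
    return voted


tokens = tokenize("Zeynep Bursa'ya gitti.")
print(tokens)
# ['Zeynep', 'Bursa', "'", 'ya', 'gitti', '.']

# Made-up character-level predictions, one label per character of each token.
char_labels = [
    ["PER"] * 5 + ["O"],  # Zeynep
    ["LOC"] * 4 + ["O"],  # Bursa
    ["O"],                # '
    ["O", "O"],           # ya
    ["O"] * 5,            # gitti
    ["O"],                # .
]
print(vote_token_labels(tokens, char_labels))
# [('Zeynep', 'PER'), ('Bursa', 'LOC'), ("'", 'O'), ('ya', 'O'), ('gitti', 'O'), ('.', 'O')]
```

Taking the mode is simpler than Viterbi decoding, but it ignores transitions between neighbouring labels, so each token is decided independently.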

predict(text: str, displacy_format: bool = False) → List[Tuple[str, str]]

Parameters:

text – Input text.
displacy_format – When True, returns the result in spacy.displacy format so that it can be visualized.

Returns:

NER result as a list of (token, entity) pairs.