Dependency Parser

class vnlp.dependency_parser.dependency_parser.DependencyParser(model='SPUContextDP', evaluate=False)[source]

Main API class for Dependency Parser implementations.

Available models: ['SPUContextDP', 'TreeStackDP']

To evaluate a model, initialize the class with evaluate=True. This loads model weights that were not trained on the test sets.

predict(sentence: str, displacy_format: bool = False, pos_result: List[Tuple[str, str]] | None = None) → List[Tuple[int, str, int, str]][source]

High level user API for Dependency Parsing.

Parameters:
  • sentence – Input sentence.

  • displacy_format – When set to True, returns the result in spacy.displacy format to allow visualization.

  • pos_result – Part-of-speech tags. Used when displacy_format = True.

Returns:

List of (token_index, token, arc, label) tuples. token_index starts at 1, and arc is the token_index of the token's head (0 for the root).

Raises:

ValueError – Sentence is too long. Try again by splitting it into smaller pieces.
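
One way to act on this error is to catch it and retry on smaller pieces. The sketch below is not part of VNLP: parse_in_pieces and its punctuation-based splitter are hypothetical helpers, and predict stands for any callable that behaves like the method documented here. Token indices and arcs restart in every piece, so the result is a list of independent parses rather than a single tree.

```python
import re
from typing import Callable, List, Tuple

Parse = List[Tuple[int, str, int, str]]


def parse_in_pieces(predict: Callable[[str], Parse], text: str) -> List[Parse]:
    """Parse `text`, falling back to smaller pieces if it is rejected.

    Hypothetical helper, not part of VNLP.
    """
    try:
        return [predict(text)]
    except ValueError:
        # Split after punctuation that is followed by whitespace.
        pieces = [p for p in re.split(r"(?<=[.!?;,])\s+", text) if p]
        if len(pieces) < 2:
            raise  # nothing left to split on; surface the original error
        parses: List[Parse] = []
        for piece in pieces:
            parses.extend(parse_in_pieces(predict, piece))
        return parses
```

With the real parser this would be called as parse_in_pieces(DependencyParser().predict, long_text). Splitting at commas can separate tokens that depend on each other, so the pieces should be kept as large as the parser accepts.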

Example:

from vnlp import DependencyParser
dependency_parser = DependencyParser()
dependency_parser.predict("Onun için yol arkadaşlarımızı titizlikle seçer, kendilerini iyice sınarız.")

[(1, 'Onun', 6, 'obl'),
 (2, 'için', 1, 'case'),
 (3, 'yol', 4, 'nmod'),
 (4, 'arkadaşlarımızı', 6, 'obj'),
 (5, 'titizlikle', 6, 'obl'),
 (6, 'seçer', 10, 'parataxis'),
 (7, ',', 6, 'punct'),
 (8, 'kendilerini', 10, 'obj'),
 (9, 'iyice', 10, 'advmod'),
 (10, 'sınarız', 0, 'root'),
 (11, '.', 10, 'punct')]
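
In the output above, the third field of each tuple is the index of the token's head, and 0 marks the root. A minimal sketch of how that structure could be navigated follows; children_by_head is a hypothetical helper, not part of VNLP, and the input is the example output copied from above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Parse = List[Tuple[int, str, int, str]]

# Output of the example above.
parse: Parse = [
    (1, 'Onun', 6, 'obl'),
    (2, 'için', 1, 'case'),
    (3, 'yol', 4, 'nmod'),
    (4, 'arkadaşlarımızı', 6, 'obj'),
    (5, 'titizlikle', 6, 'obl'),
    (6, 'seçer', 10, 'parataxis'),
    (7, ',', 6, 'punct'),
    (8, 'kendilerini', 10, 'obj'),
    (9, 'iyice', 10, 'advmod'),
    (10, 'sınarız', 0, 'root'),
    (11, '.', 10, 'punct'),
]


def children_by_head(parse: Parse) -> Dict[int, List[int]]:
    """Map each head index to the indices of its direct dependents."""
    children: Dict[int, List[int]] = defaultdict(list)
    for index, _token, head, _label in parse:
        children[head].append(index)
    return dict(children)


children = children_by_head(parse)
tokens = {index: token for index, token, _head, _label in parse}
root = children[0][0]  # the token whose head index is 0
print(tokens[root])                         # sınarız
print([tokens[i] for i in children[root]])  # ['seçer', 'kendilerini', 'iyice', '.']
```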

# Visualization with spaCy:
import spacy
from vnlp import DependencyParser
dependency_parser = DependencyParser()
result = dependency_parser.predict("Oğuz'un kırmızı bir Astra'sı vardı.", displacy_format=True)
spacy.displacy.render(result, style="dep", manual=True)
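
For readers who already hold a plain (token_index, token, arc, label) list, the sketch below builds the dictionary layout that displaCy documents for manual dependency rendering: a "words" list and an "arcs" list with 0-based positions. It is an illustration only and may differ from what VNLP produces internally. to_displacy is a hypothetical helper, pos_result is assumed to hold (token, tag) pairs, and the input is a hand-written toy parse, not parser output.

```python
from typing import Dict, List, Optional, Tuple

Parse = List[Tuple[int, str, int, str]]


def to_displacy(parse: Parse,
                pos_result: Optional[List[Tuple[str, str]]] = None) -> Dict[str, list]:
    """Convert a parse into displaCy's manual 'dep' input (hypothetical helper)."""
    words = []
    for position, (_index, token, _head, _label) in enumerate(parse):
        # Assumption: pos_result is aligned with the parse and holds (token, tag).
        tag = pos_result[position][1] if pos_result else ""
        words.append({"text": token, "tag": tag})

    arcs = []
    for index, _token, head, label in parse:
        if head == 0:
            continue  # the root has no incoming arc
        dependent, governor = index - 1, head - 1  # displaCy counts from 0
        if dependent < governor:
            arcs.append({"start": dependent, "end": governor,
                         "label": label, "dir": "left"})
        else:
            arcs.append({"start": governor, "end": dependent,
                         "label": label, "dir": "right"})
    return {"words": words, "arcs": arcs}


# Hand-written toy input, not real parser output.
toy_parse: Parse = [(1, "kırmızı", 2, "amod"), (2, "araba", 0, "root")]
manual_input = to_displacy(toy_parse, pos_result=[("kırmızı", "ADJ"), ("araba", "NOUN")])
```

A dictionary of this shape can be passed to spacy.displacy.render with style="dep" and manual=True, as in the example above.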

SentencePiece Unigram Context Dependency Parser

class vnlp.dependency_parser.spu_context_dp.SPUContextDP(evaluate)[source]

SentencePiece Unigram Context Dependency Parser class.

  • This is a context-aware, deep GRU-based dependency parser that uses a SentencePiece Unigram tokenizer and pre-trained Word2Vec embeddings.

  • It achieves 0.7117 LAS (Labeled Attachment Score) and 0.8370 UAS (Unlabeled Attachment Score) on all test sets of Universal Dependencies 2.9.

  • For more details about the training procedure, dataset and evaluation metrics, see ReadMe.
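
To make the two metrics concrete: UAS is the fraction of tokens whose predicted head is correct, and LAS additionally requires the dependency label to be correct. The sketch below assumes that gold and predicted parses share the same tokenization. attachment_scores is a hypothetical helper and is not the evaluation script behind the figures above; the rows are invented for illustration.

```python
from typing import List, Tuple

Parse = List[Tuple[int, str, int, str]]


def attachment_scores(gold: Parse, predicted: Parse) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence; hypothetical helper."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    head_hits = 0
    label_hits = 0
    for (_, _, gold_head, gold_label), (_, _, pred_head, pred_label) in zip(gold, predicted):
        if gold_head == pred_head:
            head_hits += 1                 # counts towards UAS
            if gold_label == pred_label:
                label_hits += 1            # counts towards LAS as well
    total = len(gold)
    return head_hits / total, label_hits / total


# Invented rows: token 3 has a wrong label, token 4 has a wrong head.
gold = [(1, "a", 2, "det"), (2, "b", 0, "root"), (3, "c", 2, "obj"), (4, "d", 2, "punct")]
pred = [(1, "a", 2, "det"), (2, "b", 0, "root"), (3, "c", 2, "obl"), (4, "d", 3, "punct")]
print(attachment_scores(gold, pred))  # (0.75, 0.5)
```

Because a correct label only counts when the head is also correct, LAS can never exceed UAS, which matches the relation between the two scores reported above.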

predict(sentence: str, displacy_format: bool = False, pos_result: List[Tuple[str, str]] | None = None) → List[Tuple[int, str, int, str]][source]

Parameters:
  • sentence – Input sentence.

  • displacy_format – When set to True, returns the result in spacy.displacy format to allow visualization.

  • pos_result – Part-of-speech tags. Used when displacy_format = True.

Returns:

List of (token_index, token, arc, label).

Raises:

ValueError – Sentence is too long. Try again by splitting it into smaller pieces.

Tree-stack Dependency Parser

class vnlp.dependency_parser.treestack_dp.TreeStackDP(evaluate)[source]

Tree-stack Dependency Parser class.

  • This dependency parser is inspired by Tree-stack LSTM in Transition Based Dependency Parsing.

  • “Inspired” is the operative word: rather than implementing the exact network proposed in the paper, this implementation feeds morphological tags, pre-trained word embeddings and POS tags to the model as input.

  • It achieves 0.6914 LAS (Labeled Attachment Score) and 0.8048 UAS (Unlabeled Attachment Score) on all test sets of Universal Dependencies 2.9.

  • Input data is tokenized by nltk.tokenize.TreebankWordTokenizer.

  • For more details about the training procedure, dataset and evaluation metrics, see ReadMe.

predict(sentence: str, displacy_format: bool = False, *args) → List[Tuple[int, str, int, str]][source]

Parameters:
  • sentence – Input sentence.

  • displacy_format – When set to True, returns the result in spacy.displacy format to allow visualization.

Returns:

List of (token_index, token, arc, label).

Raises:

ValueError – Sentence is too long. Try again by splitting it into smaller pieces.