Dependency Parser
- class vnlp.dependency_parser.dependency_parser.DependencyParser(model='SPUContextDP', evaluate=False)
Main API class for Dependency Parser implementations.
Available models: [‘SPUContextDP’, ‘TreeStackDP’]
To evaluate the parser, initialize the class with evaluate=True. This loads the model weights that were trained without the test sets.
- predict(sentence: str, displacy_format: bool = False, pos_result: List[Tuple[str, str]] | None = None) -> List[Tuple[int, str, int, str]]
High level user API for Dependency Parsing.
- Parameters:
sentence – Input sentence.
displacy_format – When set True, returns the result in spacy.displacy format to allow visualization.
pos_result – Part of Speech tags. To be used when displacy_format = True.
- Returns:
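List of (token_index, token, arc, label) tuples. Judging by the example output, arc is the 1-based index of the token's head, with 0 marking the root.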
- Raises:
ValueError – Sentence is too long. Try again by splitting it into smaller pieces.
Example:
    from vnlp import DependencyParser
    dependency_parser = DependencyParser()
    dependency_parser.predict("Onun için yol arkadaşlarımızı titizlikle seçer, kendilerini iyice sınarız.")

    [(1, 'Onun', 6, 'obl'),
     (2, 'için', 1, 'case'),
     (3, 'yol', 4, 'nmod'),
     (4, 'arkadaşlarımızı', 6, 'obj'),
     (5, 'titizlikle', 6, 'obl'),
     (6, 'seçer', 10, 'parataxis'),
     (7, ',', 6, 'punct'),
     (8, 'kendilerini', 10, 'obj'),
     (9, 'iyice', 10, 'advmod'),
     (10, 'sınarız', 0, 'root'),
     (11, '.', 10, 'punct')]

    # Visualization with Spacy:
    import spacy
    from vnlp import DependencyParser
    dependency_parser = DependencyParser()
    result = dependency_parser.predict("Oğuz'un kırmızı bir Astra'sı vardı.", displacy_format=True)
    spacy.displacy.render(result, style="dep", manual=True)
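The tuples returned by predict can be navigated with plain Python. The following is a minimal sketch, assuming that arc is the 1-based index of the head token and that 0 marks the root, as the sample output suggests. The helpers `find_root` and `children_of` are hypothetical names used for illustration; they are not part of vnlp.

```python
from typing import Dict, List, Tuple

Parse = List[Tuple[int, str, int, str]]

# Sample output copied from the documentation example.
result: Parse = [
    (1, 'Onun', 6, 'obl'), (2, 'için', 1, 'case'), (3, 'yol', 4, 'nmod'),
    (4, 'arkadaşlarımızı', 6, 'obj'), (5, 'titizlikle', 6, 'obl'),
    (6, 'seçer', 10, 'parataxis'), (7, ',', 6, 'punct'),
    (8, 'kendilerini', 10, 'obj'), (9, 'iyice', 10, 'advmod'),
    (10, 'sınarız', 0, 'root'), (11, '.', 10, 'punct'),
]


def find_root(parse: Parse) -> Tuple[int, str, int, str]:
    """Return the first token whose head index is 0 (assumed to be the root)."""
    for item in parse:
        if item[2] == 0:
            return item
    raise ValueError("parse has no root token")


def children_of(parse: Parse) -> Dict[int, List[str]]:
    """Map each head index to the tokens attached to it, in sentence order."""
    children: Dict[int, List[str]] = {}
    for _, token, head, _ in parse:
        children.setdefault(head, []).append(token)
    return children


root = find_root(result)
print(root[1])                       # sınarız
print(children_of(result)[root[0]])  # ['seçer', 'kendilerini', 'iyice', '.']
```

Grouping by head index keeps the sketch independent of any particular tree library.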
SentencePiece Unigram Context Dependency Parser
- class vnlp.dependency_parser.spu_context_dp.SPUContextDP(evaluate)
SentencePiece Unigram Context Dependency Parser class.
This is a context-aware, deep GRU-based dependency parser that uses a SentencePiece Unigram tokenizer and pre-trained Word2Vec embeddings.
It achieves 0.7117 LAS (Labeled Attachment Score) and 0.8370 UAS (Unlabeled Attachment Score) on all test sets of Universal Dependencies 2.9.
For more details about the training procedure, dataset and evaluation metrics, see ReadMe.
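To make the two metrics concrete: UAS is the fraction of tokens whose predicted head is correct, and LAS is the fraction whose head and label are both correct. The sketch below computes both from two parses in the (token_index, token, arc, label) format. The function name `attachment_scores` and the tiny gold and predicted parses are invented for illustration; the evaluation described in the ReadMe may differ in details such as tokenization or punctuation handling.

```python
from typing import List, Tuple

Parse = List[Tuple[int, str, int, str]]


def attachment_scores(gold: Parse, pred: Parse) -> Tuple[float, float]:
    """Return (UAS, LAS) for two parses of the same token sequence."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty parse")
    # UAS counts tokens with the correct head index.
    head_hits = sum(g[2] == p[2] for g, p in zip(gold, pred))
    # LAS additionally requires the correct dependency label.
    labeled_hits = sum(g[2] == p[2] and g[3] == p[3] for g, p in zip(gold, pred))
    return head_hits / len(gold), labeled_hits / len(gold)


# Illustrative data only, not taken from any treebank.
gold = [(1, 'yol', 2, 'nmod'), (2, 'arkadaşlarımızı', 3, 'obj'),
        (3, 'seçer', 0, 'root'), (4, '.', 3, 'punct')]
pred = [(1, 'yol', 2, 'nmod'), (2, 'arkadaşlarımızı', 3, 'obl'),
        (3, 'seçer', 0, 'root'), (4, '.', 2, 'punct')]

uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.75 0.5
```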
- predict(sentence: str, displacy_format: bool = False, pos_result: List[Tuple[str, str]] | None = None) -> List[Tuple[int, str, int, str]]
- Parameters:
sentence – Input sentence.
displacy_format – When set True, returns the result in spacy.displacy format to allow visualization.
pos_result – Part of Speech tags. To be used when displacy_format = True.
- Returns:
List of (token_index, token, arc, label).
- Raises:
ValueError – Sentence is too long. Try again by splitting it into smaller pieces.
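For readers curious about what the displacy_format and pos_result parameters involve, the sketch below converts (token_index, token, arc, label) tuples into the dictionary layout that spaCy documents for `displacy.render(..., style="dep", manual=True)`. It is an illustration, not vnlp's own conversion code: `to_displacy` is a hypothetical helper, and it assumes that arc is a 1-based head index with 0 for the root and that pos_result holds (token, tag) pairs aligned with the parse.

```python
from typing import Dict, List, Optional, Tuple

Parse = List[Tuple[int, str, int, str]]


def to_displacy(parse: Parse,
                pos_result: Optional[List[Tuple[str, str]]] = None) -> Dict[str, list]:
    """Build a dict in spaCy's manual dependency-rendering layout."""
    if pos_result is not None and len(pos_result) != len(parse):
        raise ValueError("pos_result must have one entry per token")
    tags = [tag for _, tag in pos_result] if pos_result else [""] * len(parse)
    words = [{"text": token, "tag": tag}
             for (_, token, _, _), tag in zip(parse, tags)]
    arcs = []
    for index, _, head, label in parse:
        if head == 0:
            continue  # the root has no incoming arc
        dep, gov = index - 1, head - 1  # displaCy positions are 0-based
        if dep < gov:
            # Dependent precedes its head: the arrow points left.
            arcs.append({"start": dep, "end": gov, "label": label, "dir": "left"})
        else:
            arcs.append({"start": gov, "end": dep, "label": label, "dir": "right"})
    return {"words": words, "arcs": arcs}


# First tokens of the documented sample, with the root moved into range
# for brevity (illustrative data).
parse = [(1, 'Onun', 3, 'obl'), (2, 'için', 1, 'case'), (3, 'seçer', 0, 'root')]
doc = to_displacy(parse)
print(doc["arcs"])
```

The resulting dictionary could then be passed to `spacy.displacy.render(doc, style="dep", manual=True)` if spaCy is installed.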
Tree-stack Dependency Parser
- class vnlp.dependency_parser.treestack_dp.TreeStackDP(evaluate)
Tree-stack Dependency Parser class.
This dependency parser is inspired by the paper "Tree-stack LSTM in Transition Based Dependency Parsing".
"Inspired" is emphasized because this implementation feeds morphological tags, pre-trained word embeddings and POS tags to the model as input, rather than implementing the exact network proposed in the paper.
It achieves 0.6914 LAS (Labeled Attachment Score) and 0.8048 UAS (Unlabeled Attachment Score) on all test sets of Universal Dependencies 2.9.
Input data is tokenized with nltk.tokenize.TreebankWordTokenizer.
For more details about the training procedure, dataset and evaluation metrics, see ReadMe.
- predict(sentence: str, displacy_format: bool = False, *args) -> List[Tuple[int, str, int, str]]
- Parameters:
sentence – Input sentence.
displacy_format – When set True, returns the result in spacy.displacy format to allow visualization.
- Returns:
List of (token_index, token, arc, label).
- Raises:
ValueError – Sentence is too long. Try again by splitting it into smaller pieces.
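The ValueError above suggests splitting long input into smaller pieces. One way to do that is sketched below, under stated assumptions: `parse_with_fallback` is a hypothetical wrapper that accepts any callable with the shape of predict, and the punctuation-based split is a naive heuristic rather than a proper sentence segmenter. A stand-in parser with an artificial three-word limit replaces the real model so the example runs without vnlp installed.

```python
import re
from typing import Callable, List, Tuple

Parse = List[Tuple[int, str, int, str]]


def parse_with_fallback(predict: Callable[[str], Parse], text: str) -> List[Parse]:
    """Parse text as a whole; on ValueError, split it and parse each piece."""
    try:
        return [predict(text)]
    except ValueError:
        # Naive split on whitespace that follows punctuation.
        pieces = [p.strip() for p in re.split(r"(?<=[.!?;,])\s+", text) if p.strip()]
        if len(pieces) < 2:
            raise  # nothing left to split, so surface the original error
        results: List[Parse] = []
        for piece in pieces:
            results.extend(parse_with_fallback(predict, piece))
        return results


def fake_predict(sentence: str) -> Parse:
    """Stand-in for a real predict method, with an artificial 3-word limit."""
    words = sentence.split()
    if len(words) > 3:
        raise ValueError("Sentence is too long.")
    return [(i, w, 0, "dep") for i, w in enumerate(words, start=1)]


chunks = parse_with_fallback(fake_predict, "Bir iki üç, dört beş.")
print(len(chunks))  # 2
```

Each piece is parsed independently, so token indices restart at 1 in every chunk and no arc crosses a chunk boundary.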