Stopword Remover

class vnlp.stopword_remover.stopword_remover.StopwordRemover[source]

Stopword Remover class.

Provides static and dynamic stop word detection methods.

The static stop word list is taken from Zemberek, with some minor improvements.

add_to_stop_words(novel_stop_words: List[str])[source]

Updates self.stop_words by adding the given novel_stop_words to the existing dictionary.

Parameters:

novel_stop_words – Tokens to add to the existing stop_words dictionary.

Example:

from vnlp import StopwordRemover
stopword_remover = StopwordRemover()
stopword_remover.add_to_stop_words(['ama', 'aşı', 'gelip', 'eve'])

drop_stop_words(list_of_tokens: List[str]) → List[str][source]

Given a list of tokens, drops the stop words and returns a list of the remaining tokens.

Parameters:

list_of_tokens – List of input tokens.

Returns:

List of tokens with the stop words removed.

Example:

from vnlp import StopwordRemover
stopword_remover = StopwordRemover()
stopword_remover.drop_stop_words("acaba bugün kahvaltıda kahve yerine çay mı içsem ya da neyse süt içeyim".split())

['bugün', 'kahvaltıda', 'kahve', 'çay', 'içsem', 'süt', 'içeyim']
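
The filtering described above amounts to a membership test against the stop word collection. The following is a minimal sketch of that idea, not the library's code: it assumes a plain Python set stands in for the stop_words dictionary, and the names drop_stop_words_sketch and stop_words are hypothetical. The set contains only the six tokens that the example above shows being dropped.

```python
from typing import Iterable, List, Set


def drop_stop_words_sketch(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Keep every token that is not a stop word, preserving input order."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical stand-in for the static list: just the tokens that the
# documented example drops, not the full list shipped with the library.
stop_words = {"acaba", "yerine", "mı", "ya", "da", "neyse"}

tokens = "acaba bugün kahvaltıda kahve yerine çay mı içsem ya da neyse süt içeyim".split()
remaining = drop_stop_words_sketch(tokens, stop_words)
print(remaining)
```

Using a set keeps each membership check constant time on average, which matters once the token list grows to corpus size.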

dynamically_detect_stop_words(list_of_tokens: List[str], rare_words_freq: int = 0) → List[str][source]

Dynamically detects stop words and returns them as list of tokens.

For reasonable results, use a large corpus containing at least several hundred unique tokens.

Parameters:
  • list_of_tokens – List of input tokens.

  • rare_words_freq – Maximum frequency at which a word is still considered rare. The default value is 0, so no rare words are detected by default.

Returns:

List of dynamically detected stop words.

Raises:

ValueError – Raised when the input contains fewer than 3 unique tokens, the minimum required for dynamic stop word detection.

Example:

from vnlp import StopwordRemover
stopword_remover = StopwordRemover()
stopword_remover.dynamically_detect_stop_words("ben bugün gidip aşı olacağım sonra da eve gelip telefon açacağım aşı nasıl etkiledi eve gelip anlatırım aşı olmak bu dönemde çok ama ama ama ama çok önemli".split())

['ama', 'aşı', 'gelip', 'eve']
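
To make the roles of token frequency and rare_words_freq more concrete, here is a rough sketch of one possible frequency-based heuristic. It is an assumption for illustration only and not the library's algorithm, so it is not expected to reproduce the output above. The function name detect_stop_words_sketch and the "largest drop" cut-off rule are hypothetical; only the parameter name, the default of 0, and the minimum of 3 unique tokens come from the description above.

```python
from collections import Counter
from typing import List


def detect_stop_words_sketch(tokens: List[str], rare_words_freq: int = 0) -> List[str]:
    """Illustrative frequency heuristic; NOT the vnlp implementation."""
    counts = Counter(tokens)
    if len(counts) < 3:
        raise ValueError("Number of unique tokens must be at least 3.")

    # Rank words by descending frequency (ties keep first-seen order).
    ranked = counts.most_common()
    freqs = [count for _, count in ranked]

    # Assumed cut-off rule: split at the largest drop between neighbours.
    drops = [freqs[i] - freqs[i + 1] for i in range(len(freqs) - 1)]
    largest_drop = max(drops)
    # A flat distribution has no drop, so nothing is flagged as frequent.
    cut = drops.index(largest_drop) + 1 if largest_drop > 0 else 0
    detected = [word for word, _ in ranked[:cut]]

    # Optionally flag rare words as well: frequency <= rare_words_freq.
    if rare_words_freq > 0:
        detected += [word for word, count in ranked[cut:] if count <= rare_words_freq]
    return detected


tokens = "ve elma ve armut ve kiraz ile ve ile üzüm".split()
frequent_only = detect_stop_words_sketch(tokens)
with_rare = detect_stop_words_sketch(tokens, rare_words_freq=1)
print(frequent_only)
print(with_rare)
```

In this sketch, leaving rare_words_freq at 0 returns only the high-frequency words, while a positive value also flags words that occur at most that many times.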