vocabs
Concrete Vocab classes.
-
class deeppavlov.vocabs.wiki_sqlite.WikiSQLiteVocab(load_path: str, join_docs: bool = True, shuffle: bool = False, **kwargs)
Get content from an SQLite database by document ids.
- Parameters
load_path – path to a local DB file
join_docs – whether to join the extracted docs into one string separated by ' ' (a single space)
shuffle – whether to shuffle data or not
-
join_docs
whether to join the extracted docs into one string separated by ' ' (a single space)
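A minimal sketch of what "get content by document ids" with join_docs amounts to, using Python's standard sqlite3 module. The table name `documents`, its `id`/`text` columns and the helper `fetch_docs` are hypothetical, chosen for illustration only; they are not the schema or API that WikiSQLiteVocab actually uses, and shuffling is left out.

```python
import sqlite3
from typing import List, Union


def fetch_docs(conn: sqlite3.Connection, doc_ids: List[str],
               join_docs: bool = True) -> Union[str, List[str]]:
    """Hypothetical helper: look up document texts by id in an assumed `documents` table."""
    texts = []
    for doc_id in doc_ids:
        row = conn.execute(
            "SELECT text FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        if row is not None:  # ids missing from the DB are skipped in this sketch
            texts.append(row[0])
    # join_docs=True: one string with the docs separated by ' '
    return " ".join(texts) if join_docs else texts


# Build a throwaway in-memory DB with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("Moscow", "Moscow is the capital of Russia."),
     ("Paris", "Paris is the capital of France.")],
)

print(fetch_docs(conn, ["Moscow", "Paris"]))
# Moscow is the capital of Russia. Paris is the capital of France.
print(fetch_docs(conn, ["Paris"], join_docs=False))
# ['Paris is the capital of France.']
```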
-
class deeppavlov.vocabs.typos.RussianWordsVocab(data_dir: Union[pathlib.Path, str] = '', *args, **kwargs)
Implementation of StaticDictionary that builds its data from https://github.com/danakt/russian-words/
- Parameters
data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory.
-
dict_name
logical name of the dictionary
-
alphabet
set of all the characters used in this dictionary
-
words_set
set of all the words
-
words_trie
trie structure of all the words
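The data_dir rule stated above (relative paths are resolved against the pipeline's data directory, absolute paths are used as given) can be sketched with pathlib. `resolve_data_dir` and `pipeline_data_root` are hypothetical names for illustration; the library's own path handling may differ in details.

```python
from pathlib import Path
from typing import Union


def resolve_data_dir(data_dir: Union[Path, str],
                     pipeline_data_root: Union[Path, str]) -> Path:
    """Hypothetical helper: absolute paths pass through, relative ones hang off the data root."""
    path = Path(data_dir).expanduser()
    if path.is_absolute():
        return path
    # Relative (including the default '') -> interpreted under the pipeline's data directory
    return Path(pipeline_data_root) / path


# POSIX-style paths assumed in these examples.
print(resolve_data_dir("typos/russian", "/home/user/.deeppavlov"))
# /home/user/.deeppavlov/typos/russian
print(resolve_data_dir("/tmp/dicts", "/home/user/.deeppavlov"))
# /tmp/dicts
```

With the default data_dir of '', this sketch resolves to the data root itself.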
-
class deeppavlov.vocabs.typos.StaticDictionary(data_dir: Union[pathlib.Path, str] = '', *args, dictionary_name: str = 'dictionary', **kwargs)
Trie vocabulary used in spelling-correction algorithms.
- Parameters
data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory.
dictionary_name – logical name of the dictionary
raw_dictionary_path – path to the source file with the list of words
-
dict_name
logical name of the dictionary
-
alphabet
set of all the characters used in this dictionary
-
words_set
set of all the words
-
words_trie
trie structure of all the words
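One way to picture the alphabet, words_set and words_trie attributes: the sketch below builds all three from a plain word list, storing the trie as a dict that maps every prefix to the sorted list of its one-character extensions ('' is the root). It then walks the trie to collect the words under a prefix, the kind of lookup a spelling corrector can use to prune candidates. The dict layout and the helper names `build_static_dictionary` and `words_with_prefix` are assumptions for illustration, not necessarily the format StaticDictionary builds or stores.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_static_dictionary(
    words: Iterable[str],
) -> Tuple[Set[str], Set[str], Dict[str, List[str]]]:
    """Hypothetical builder returning (alphabet, words_set, words_trie)."""
    words_set = {w.strip() for w in words if w.strip()}  # drop blank lines
    alphabet = {ch for w in words_set for ch in w}
    trie: Dict[str, Set[str]] = defaultdict(set)
    for word in words_set:
        for i in range(len(word)):
            trie[word[:i]].add(word[:i + 1])  # link prefix -> prefix + next char
        trie.setdefault(word, set())  # leaves get an explicit empty entry
    words_trie = {prefix: sorted(children) for prefix, children in trie.items()}
    return alphabet, words_set, words_trie


def words_with_prefix(words_trie: Dict[str, List[str]], words_set: Set[str],
                      prefix: str) -> List[str]:
    """Depth-first walk of the trie collecting complete words that start with `prefix`."""
    if prefix not in words_trie:
        return []
    found, stack = [], [prefix]
    while stack:
        node = stack.pop()
        if node in words_set:
            found.append(node)
        stack.extend(words_trie.get(node, []))
    return sorted(found)


alphabet, words_set, words_trie = build_static_dictionary(["cat", "car", "cart", "dog"])
print(words_trie[""])                                  # ['c', 'd']
print(words_with_prefix(words_trie, words_set, "ca"))  # ['car', 'cart', 'cat']
```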
-
class deeppavlov.vocabs.typos.Wiki100KDictionary(data_dir: Union[pathlib.Path, str] = '', *args, **kwargs)
Implementation of StaticDictionary that builds its data from Wiktionary.
- Parameters
data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory.
-
dict_name
logical name of the dictionary
-
alphabet
set of all the characters used in this dictionary
-
words_set
set of all the words
-
words_trie
trie structure of all the words
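RussianWordsVocab and Wiki100KDictionary are both described as implementations of StaticDictionary that differ in where their words come from. The sketch below illustrates that pattern under stated assumptions: a hypothetical base class builds the word set once, caches it under data_dir, and delegates fetching the raw words to a hook that each subclass overrides. `CachedWordDictionary`, `_load_raw_words` and `TinyDictionary` are invented names, and the pickle cache is an assumption; a real subclass would download its word list rather than return a hard-coded one.

```python
import pickle
import tempfile
from pathlib import Path
from typing import List, Set


class CachedWordDictionary:
    """Hypothetical base class: build the word set once, then reuse the cached copy."""

    def __init__(self, data_dir: str, dictionary_name: str = "dictionary"):
        cache_dir = Path(data_dir) / dictionary_name
        cache_file = cache_dir / "words.pkl"
        if cache_file.exists():
            with cache_file.open("rb") as f:
                self.words_set: Set[str] = pickle.load(f)
        else:
            cache_dir.mkdir(parents=True, exist_ok=True)
            self.words_set = set(self._load_raw_words())  # source-specific hook
            with cache_file.open("wb") as f:
                pickle.dump(self.words_set, f)
        self.dict_name = dictionary_name
        self.alphabet = {ch for w in self.words_set for ch in w}

    def _load_raw_words(self) -> List[str]:
        raise NotImplementedError  # each concrete dictionary supplies its own source


class TinyDictionary(CachedWordDictionary):
    """Stand-in for a source-specific subclass; counts how often the source is hit."""

    load_calls = 0

    def __init__(self, data_dir: str):
        super().__init__(data_dir, dictionary_name="tiny")

    def _load_raw_words(self) -> List[str]:
        TinyDictionary.load_calls += 1
        return ["cat", "cot", "cut"]


with tempfile.TemporaryDirectory() as tmp:
    first = TinyDictionary(tmp)   # builds the set and writes tiny/words.pkl
    second = TinyDictionary(tmp)  # finds the cache, so the hook is not called again
    cached_words = sorted(second.words_set)

print(cached_words)               # ['cat', 'cot', 'cut']
print(TinyDictionary.load_calls)  # 1
```

Keeping the download step behind a single overridable hook means the caching and the derived attributes (such as alphabet) are written once in the base class.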