deeppavlov.models.spelling_correction
class deeppavlov.models.spelling_correction.brillmoore.ErrorModel(dictionary: deeppavlov.vocabs.typos.StaticDictionary, window: int = 1, candidates_count: int = 1, *args, **kwargs)
Component that uses a statistics-based error model to find the best candidates in a static dictionary. Based on "An Improved Error Model for Noisy Channel Spelling Correction" by Eric Brill and Robert C. Moore.
Parameters:
- dictionary – a StaticDictionary object
- window – maximum context window size
- candidates_count – maximum number of replacement candidates to return for every token in the input
Attributes:
- costs – logarithmic probabilities of character sequence replacements
- dictionary – a StaticDictionary object
- window – maximum context window size
- candidates_count – maximum number of replacement candidates to return for every token in the input
__call__(data: Iterable[Iterable[str]], *args, **kwargs) → List[List[List[Tuple[float, str]]]]
Propose candidates for tokens in sentences.
Parameters: data – batch of tokenized sentences
Returns: batch of lists of probabilities and candidates for every token
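The return shape of __call__ (a list of (score, candidate) pairs per token) can be illustrated with a small sketch that does not use DeepPavlov. The names rewrite_log_prob, propose, UNSEEN and the toy costs table are hypothetical. The scoring rule is a simplifying assumption, not the library's implementation: both strings are partitioned into substrings of at most window characters, identical substrings cost nothing, and replacements missing from costs get a fixed penalty.

```python
from math import log
from typing import Dict, Iterable, List, Tuple

UNSEEN = log(1e-4)  # assumed log-probability for replacements missing from `costs`


def rewrite_log_prob(candidate: str, typo: str,
                     costs: Dict[Tuple[str, str], float], window: int = 1) -> float:
    """Best total log-probability of rewriting `candidate` into `typo`."""
    n, m = len(candidate), len(typo)
    best = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            # Try every pair of trailing substrings up to `window` characters long.
            for a in range(min(window, i) + 1):
                for b in range(min(window, j) + 1):
                    if a == 0 and b == 0:
                        continue
                    src, dst = candidate[i - a:i], typo[j - b:j]
                    step = costs.get((src, dst), 0.0 if src == dst else UNSEEN)
                    best[i][j] = max(best[i][j], best[i - a][j - b] + step)
    return best[n][m]


def propose(token: str, dictionary: Iterable[str],
            costs: Dict[Tuple[str, str], float],
            window: int = 1, candidates_count: int = 1) -> List[Tuple[float, str]]:
    """Return the `candidates_count` best (log-probability, word) pairs."""
    # `dictionary` is a plain iterable of words here, standing in for StaticDictionary.
    scored = [(rewrite_log_prob(word, token, costs, window), word) for word in dictionary]
    return sorted(scored, reverse=True)[:candidates_count]


# Toy replacement table: writing "a" where "e" was intended has probability 0.1.
toy_costs = {("e", "a"): log(0.1)}
print(propose("hallo", ["hello", "halo"], toy_costs, candidates_count=2))
```

Scores are summed because they are logarithmic, so the best candidate is the one with the largest (least negative) total.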
class deeppavlov.models.spelling_correction.levenshtein.LevenshteinSearcherComponent(words: Iterable[str], max_distance: int = 1, error_probability: float = 0.0001, *args, **kwargs)
Component that finds replacement candidates for tokens within a set Damerau-Levenshtein distance.
Parameters:
- words – list of every correct word
- max_distance – maximum allowed Damerau-Levenshtein distance between source words and candidates
- error_probability – assigned probability for every edit
Attributes:
- max_distance – maximum allowed Damerau-Levenshtein distance between source words and candidates
- error_probability – assigned logarithmic probability for every edit
- vocab_penalty – logarithmic probability assigned to an out-of-vocabulary token being correct without changes
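As a rough illustration of how max_distance and error_probability interact, the sketch below scans a word list by brute force. It is not the component's implementation: osa_distance and candidates are hypothetical names, the distance is the optimal string alignment variant of Damerau-Levenshtein, and scoring a candidate as the number of edits times log(error_probability) is an assumption. vocab_penalty is not modeled.

```python
from math import log
from typing import Iterable, List, Tuple


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def candidates(token: str, words: Iterable[str], max_distance: int = 1,
               error_probability: float = 1e-4) -> List[Tuple[float, str]]:
    """Return (log-probability, word) pairs for words within `max_distance`."""
    log_error = log(error_probability)  # stored as a logarithm, as in the attribute above
    found = []
    for word in words:
        dist = osa_distance(token, word)
        if dist <= max_distance:
            found.append((dist * log_error, word))  # assumed: every edit costs the same
    return sorted(found, reverse=True)


print(candidates("teh", ["the", "ten", "tree"]))
```

A linear scan like this is only practical for small vocabularies; it is meant to show the scoring, not an efficient search.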
class deeppavlov.models.spelling_correction.electors.top1_elector.TopOneElector(*args, **kwargs)
Component that chooses the candidate with the highest base probability for every token.
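The selection rule can be sketched in a few lines over the (probability, candidate) structure that the candidate-proposing components return. elect_top_one is a hypothetical name, and the assumption that the result is a batch of token lists is not stated on this page.

```python
from typing import List, Tuple


def elect_top_one(batch: List[List[List[Tuple[float, str]]]]) -> List[List[str]]:
    """For every token, keep the candidate with the highest base score."""
    # max() on (score, word) tuples compares the score first.
    return [[max(token_candidates)[1] for token_candidates in sentence]
            for sentence in batch]


batch = [[[(-0.1, "hello"), (-2.0, "hallo")], [(-0.5, "world")]]]
print(elect_top_one(batch))  # [['hello', 'world']]
```

Each token is decided independently here, so neighbouring words have no influence on the choice.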
class deeppavlov.models.spelling_correction.electors.kenlm_elector.KenlmElector(load_path: pathlib.Path, beam_size: int = 4, *args, **kwargs)
Component that chooses a candidate with the highest product of base and language model probabilities.
Parameters:
- load_path – path to the kenlm model file
- beam_size – beam size for highest probability search
Attributes:
- lm – kenlm object
- beam_size – beam size for highest probability search
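The role of beam_size can be illustrated with a beam search that ranks partial sentences by the sum of base and language model log-probabilities (the logarithm of their product). This is a sketch under stated assumptions, not the component's code: beam_elect is a hypothetical name, and toy_lm_score with its BIGRAMS table is a made-up stand-in for a kenlm model so that the example runs without a model file.

```python
from typing import Callable, List, Tuple

Candidates = List[List[Tuple[float, str]]]  # per token: (base log-probability, word)

# Made-up bigram log-probabilities standing in for a real language model.
BIGRAMS = {("the", "cat"): -0.5, ("teh", "cat"): -6.0}


def toy_lm_score(words: List[str]) -> float:
    """Sum of bigram log-probabilities; unseen bigrams get a flat -3.0."""
    return sum(BIGRAMS.get(pair, -3.0) for pair in zip(words, words[1:]))


def beam_elect(sentence: Candidates, lm_score: Callable[[List[str]], float],
               beam_size: int = 4) -> List[str]:
    """Pick one candidate per token, keeping `beam_size` hypotheses per step."""
    beam: List[Tuple[float, List[str]]] = [(0.0, [])]  # (base log-prob sum, words)
    for token_candidates in sentence:
        expanded = [(base + score, words + [word])
                    for base, words in beam
                    for score, word in token_candidates]
        # Rank by base score plus language model score, then prune.
        expanded.sort(key=lambda item: item[0] + lm_score(item[1]), reverse=True)
        beam = expanded[:beam_size]
    return beam[0][1]


sentence = [[(-0.1, "teh"), (-0.7, "the")], [(-0.1, "cat")]]
print(beam_elect(sentence, toy_lm_score))  # ['the', 'cat']
```

In this toy input "teh" has the higher base score, yet "the" wins once the language model score is added. A larger beam keeps more hypotheses alive at the cost of more language model evaluations.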