Features¶
Models¶
NER model [docs]¶
There are two models for the Named Entity Recognition task in DeepPavlov: BERT-based and Bi-LSTM+CRF. Both models predict a tag (in BIO format) for each token in the input.
The BERT-based model is described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
The second model reproduces the architecture from the paper Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition, which is inspired by the Bi-LSTM+CRF architecture from https://arxiv.org/pdf/1603.01360.pdf.
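To illustrate what per-token BIO tags mean in practice, here is a minimal sketch that groups tagged tokens into entity spans. It is not DeepPavlov code: the function name `bio_to_spans`, the example sentence, and the tag labels `PER` and `LOC` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts here (an orphan I- tag is treated like B-).
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # An "O" tag closes any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical model output for one sentence.
tokens = ["Yuri", "Gagarin", "was", "born", "in", "Klushino"]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
```

With the tags above, the sketch yields one `PER` span for "Yuri Gagarin" and one `LOC` span for "Klushino".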
Dataset |
Lang |
Model |
Test F1 |
---|---|---|---|
Persons-1000 dataset with additional LOC and ORG markup (Collection 3) |
Ru |
97.7 |
|
95.1 |
|||
88.4 ± 0.5 |
|||
93.3 ± 0.3 |
|||
Ontonotes |
Multi |
88.8 |
|
En |
88.6 |
||
87.1 |
|||
CoNLL-2003 |
91.7 |
||
88.6 |
|||
89.9 |
Classification model [docs]¶
Models for word-level classification tasks (intents, sentiment, etc.). Shallow-and-wide CNN, Deep CNN, BiLSTM, BiLSTM with self-attention, and other architectures are available. The models also support multilabel text classification. Several pre-trained models are listed in the table below.
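The difference between single-label and multilabel classification can be sketched as two ways of turning per-class probabilities into a decision. This is an illustrative sketch only: the function names, the label names, and the 0.5 threshold are assumptions and are not taken from DeepPavlov.

```python
from typing import Dict, List


def multiclass_decision(probs: Dict[str, float]) -> str:
    """Single-label case: pick the one most probable class."""
    return max(probs, key=probs.get)


def multilabel_decision(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multilabel case: keep every class whose probability reaches the threshold."""
    return sorted(label for label, p in probs.items() if p >= threshold)


# Hypothetical per-class probabilities for one utterance.
probs = {"greeting": 0.81, "request_info": 0.64, "goodbye": 0.05}

print(multiclass_decision(probs))
print(multilabel_decision(probs))
```

In the multilabel case the result may contain several labels or none at all, which is why the threshold is usually tuned on validation data.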
Task |
Dataset |
Lang |
Model |
Metric |
Valid |
Test |
Downloads |
---|---|---|---|---|---|---|---|
Insult detection |
En |
ROC-AUC |
0.9327 |
0.8602 |
1.1 Gb |
||
Sentiment |
Accuracy |
0.6456 |
0.6715 |
400 Mb |
|||
Sentiment |
Ru |
0.9965 |
0.9961 |
6.2 Gb |
|||
F1-weighted |
0.6809 |
0.7193 |
1900 Mb |
||||
0.7548 |
0.7742 |
657 Mb |
|||||
0.703 ± 0.0031 |
0.7348 ± 0.0028 |
690 Mb |
|||||
0.7376 ± 0.0045 |
0.7645 ± 0.035 |
1.0 Gb |
As no intent recognition results had been published for DSTC-2 data, the presented model is compared on the SNIPS dataset instead. Model scores were evaluated in the same way as in [3], so that they can be compared with the results reported by the authors of the dataset. The results were achieved by tuning parameters and using embeddings trained on the Reddit dataset.
| Model | AddToPlaylist | BookRestaurant | GetWeather | PlayMusic | RateBook | SearchCreativeWork | SearchScreeningEvent |
|---|---|---|---|---|---|---|---|
| api.ai | 0.9931 | 0.9949 | 0.9935 | 0.9811 | 0.9992 | 0.9659 | 0.9801 |
| ibm.watson | 0.9931 | 0.9950 | 0.9950 | 0.9822 | 0.9996 | 0.9643 | 0.9750 |
| microsoft.luis | 0.9943 | 0.9935 | 0.9925 | 0.9815 | 0.9988 | 0.9620 | 0.9749 |
| wit.ai | 0.9877 | 0.9913 | 0.9921 | 0.9766 | 0.9977 | 0.9458 | 0.9673 |
| snips.ai | 0.9873 | 0.9921 | 0.9939 | 0.9729 | 0.9985 | 0.9455 | 0.9613 |
| recast.ai | 0.9894 | 0.9943 | 0.9910 | 0.9660 | 0.9981 | 0.9424 | 0.9539 |
| amazon.lex | 0.9930 | 0.9862 | 0.9825 | 0.9709 | 0.9981 | 0.9427 | 0.9581 |
| Shallow-and-wide CNN | 0.9956 | 0.9973 | 0.9968 | 0.9871 | 0.9998 | 0.9752 | 0.9854 |
Automatic spelling correction model [docs]¶
Pipelines that use candidate search in a static dictionary and an ARPA language model to correct spelling errors.
Note
About 4.4 GB of disk space is required for the Russian language model and about 7 GB for the English one.
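The general idea of dictionary candidate search followed by language-model scoring can be sketched as below. This is a toy illustration, not the DeepPavlov pipeline: it searches only candidates within one Damerau-Levenshtein edit, and it uses a tiny table of unigram log-probabilities (hypothetical values) as a stand-in for a real ARPA n-gram model.

```python
from typing import Dict, Set

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """All strings one Damerau-Levenshtein edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def correct(word: str, log_probs: Dict[str, float]) -> str:
    """Return the most probable dictionary word within one edit of `word`."""
    if word in log_probs:
        return word  # Already a known word, nothing to correct.
    candidates = {c for c in edits1(word) if c in log_probs}
    if not candidates:
        return word  # No candidate found, leave the word unchanged.
    return max(candidates, key=lambda c: log_probs[c])


# Toy dictionary with unigram log10 probabilities (hypothetical values).
LOG_PROBS = {"spelling": -3.2, "spewing": -5.9, "model": -2.7, "modal": -4.1}

print(correct("speling", LOG_PROBS))
print(correct("modle", LOG_PROBS))
```

A real pipeline would score candidates in sentence context with the n-gram model rather than word by word, which is what lets it choose between candidates that are equally close in edit distance.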
Comparison on the test set for the SpellRuEval competition on Automatic Spelling Correction for Russian:
| Correction method | Precision | Recall | F-measure | Speed (sentences/s) |
|---|---|---|---|---|
| Yandex.Speller | 83.09 | 59.86 | 69.59 | |
| | 53.26 | 53.74 | 53.50 | 29.3 |
| Hunspell + lm | 41.03 | 48.89 | 44.61 | 2.1 |
| JamSpell | 44.57 | 35.69 | 39.64 | 136.2 |
| Hunspell | 30.30 | 34.02 | 32.06 | 20.3 |
Ranking model [docs]¶
The main neural ranking model is based on LSTM-based deep learning models for non-factoid answer selection. The model ranks responses or contexts from a database by their relevance to the given context.
There are 3 alternative neural architectures available as well:
- Sequential Matching Network (SMN)
Based on the work Wu, Yu, et al. “Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots”. ACL. 2017.
- Deep Attention Matching Network (DAM)
- Deep Attention Matching Network + Universal Sentence Encoder v3 (DAM-USE-T)
Our new proposed architecture based on the works: Xiangyang Zhou, et al. “Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network”. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018 and Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, Ray Kurzweil. 2018a. Universal Sentence Encoder for English.
Available pre-trained models for ranking:
Dataset |
Model config |
Val |
Test |
|||
---|---|---|---|---|---|---|
R10@1 |
R10@1 |
R10@2 |
R10@5 |
Downloads |
||
74.32 |
74.46 |
86.77 |
97.38 |
2457 MB |
||
68.56 |
67.91 |
81.49 |
95.63 |
1609 MB |
||
66.5 |
66.6 |
– |
– |
396 MB |
||
65.73 |
65.74 |
– |
– |
1.1 Gb |
||
66.5 |
66.5 |
– |
– |
396 MB |
Available pre-trained models for paraphrase identification:
Dataset |
Model config |
Val (accuracy) |
Test (accuracy) |
Val (F1) |
Test (F1) |
Val (log_loss) |
Test (log_loss) |
Downloads |
---|---|---|---|---|---|---|---|---|
87.4 |
79.3 |
90.2 |
83.4 |
– |
– |
1330M |
||
90.2 |
84.9 |
92.3 |
87.9 |
– |
– |
1325M |
||
76.1 ± 0.2 |
64.5 ± 0.5 |
81.8 ± 0.2 |
73.9 ± 0.8 |
– |
– |
618M |
||
86.5 ± 0.5 |
78.9 ± 0.4 |
89.6 ± 0.3 |
83.2 ± 0.5 |
– |
– |
930M |
Comparison with other models on the Ubuntu Dialogue Corpus v2 (test):
| Model | R@1 | R@2 | R@5 |
|---|---|---|---|
| SMN last [Wu et al., 2017] | – | – | – |
| SMN last [DeepPavlov ranking_ubuntu_v2_mt_word2vec_smn] | 0.6791 | 0.8149 | 0.9563 |
| DAM [Zhou et al., 2018] | – | – | – |
| MRFN-FLS [Tao et al., 2019] | – | – | – |
| IMN [Gu et al., 2019] | 0.771 | 0.886 | 0.979 |
| IMN Ensemble [Gu et al., 2019] | 0.791 | 0.899 | 0.982 |
| DAM-USE-T [DeepPavlov ranking_ubuntu_v2_mt_word2vec_dam_transformer] | 0.7446 | 0.8677 | 0.9738 |
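The R10@k columns in the ranking tables are usually read as "recall at k out of 10 candidates": for each context the model scores 10 candidate responses, and the sample counts as a hit if the true response is among the k highest-scored ones. A minimal sketch of that computation follows. The function names and the convention that the true response sits at a known index are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def recall_at_k(scores: Sequence[float], true_index: int, k: int) -> float:
    """1.0 if the true candidate is ranked within the top k, else 0.0."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if true_index in ranked[:k] else 0.0


def mean_recall_at_k(samples: List[Tuple[Sequence[float], int]], k: int) -> float:
    """Average recall@k over (scores, true_index) samples."""
    return sum(recall_at_k(s, t, k) for s, t in samples) / len(samples)


# Three hypothetical samples, 10 candidate scores each, true response at index 0.
samples = [
    ([0.90, 0.1, 0.1, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35, 0.0], 0),  # true ranked 1st
    ([0.37, 0.8, 0.1, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35, 0.0], 0),  # true ranked 3rd
    ([0.50, 0.6, 0.1, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35, 0.0], 0),  # true ranked 2nd
]

for k in (1, 2, 5):
    print(k, mean_recall_at_k(samples, k))
```

For these toy samples the true response is ranked 1st, 3rd, and 2nd, so recall is 1/3 at k=1, 2/3 at k=2, and 1.0 at k=5.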
References:
Yu Wu, Wei Wu, Chen Xing, Ming Zhou, and Zhoujun Li. 2017. Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots. In ACL, pages 372–381. https://www.aclweb.org/anthology/P17-1046
Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu, and Hua Wu. 2018. Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1118–1127. http://aclweb.org/anthology/P18-1103
Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, and Rui Yan. 2019. Multi-Representation Fusion Network for Multi-turn Response Selection in Retrieval-based Chatbots. In WSDM’19. https://dl.acm.org/citation.cfm?id=3290985
Jia-Chen Gu, Zhen-Hua Ling, and Quan Liu. 2019. Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots. https://arxiv.org/abs/1901.01824
TF-IDF Ranker model [docs]¶
Based on Reading Wikipedia to Answer Open-Domain Questions. The model solves the task of document retrieval for a given query.
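As a rough illustration of TF-IDF document retrieval, the sketch below scores each document by the summed TF-IDF weight of the query terms it contains. It is a deliberately simplified assumption: plain whitespace unigrams and an in-memory list of three toy documents, which is far from what a ranker over a full Wikipedia dump would use. All names here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a simplification)."""
    return text.lower().split()


def build_idf(docs: List[str]) -> Dict[str, float]:
    """Inverse document frequency: log(N / number of docs containing the term)."""
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(len(docs) / df) for term, df in doc_freq.items()}


def rank(query: str, docs: List[str], top_n: int = 1) -> List[int]:
    """Return indices of the top_n documents by summed TF-IDF of query terms."""
    idf = build_idf(docs)
    scores = []
    for doc in docs:
        tf = Counter(tokenize(doc))
        scores.append(sum(tf[t] * idf.get(t, 0.0) for t in tokenize(query)))
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]


DOCS = [
    "moscow is the capital of russia",
    "the volga is the longest river in europe",
    "python is a programming language",
]

print(rank("longest river in europe", DOCS))
```

Terms that occur in every document get an IDF of zero, so common words such as "is" do not influence the ranking in this toy corpus.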
Dataset |
Model |
Wiki dump |
Recall@5 |
Downloads |
||
---|---|---|---|---|---|---|
enwiki (2018-02-11) |
75.6 |
33 GB |
Question Answering model [docs]¶
Models in this section solve the task of finding an answer to a question in a given context (SQuAD task format). There are two models for this task in DeepPavlov: BERT-based and R-Net. Both models predict the start and end positions of the answer in the given context.
The BERT-based model is described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
The R-Net model is based on R-NET: Machine Reading Comprehension with Self-matching Networks.
Dataset |
Model config |
lang |
EM (dev) |
F-1 (dev) |
Downloads |
---|---|---|---|---|---|
en |
80.88 |
88.49 |
806Mb |
||
en |
80.79 |
88.30 |
1.1 Gb |
||
en |
71.49 |
80.34 |
~2.5Gb |
||
ru |
66.30 ± 0.24 |
84.60 ± 0.11 |
1325Mb |
||
ru |
66.24 |
84.71 |
1.6 Gb |
||
ru |
60.62 |
80.04 |
~5Gb |
||
ru |
44.2 ± 0.46 |
65.1 ± 0.36 |
867Mb |
||
ru |
61.23 ± 0.42 |
80.36 ± 0.28 |
1.18Gb |
For cases where the answer is not necessarily present in the given context, there is the squad_noans model. It outputs an empty string if the context contains no answer.
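The EM and F-1 columns above follow the usual SQuAD-style definitions: exact match of the predicted and reference answer, and token-level F1 over their overlap. A simplified sketch is given below. It is an assumption-laden illustration: normalization is reduced to lowercasing and whitespace splitting, whereas the official SQuAD script also strips punctuation and articles, and the example strings are made up.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Simplified normalization: lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the predicted and the reference answer."""
    pred, gold = normalize(prediction), normalize(truth)
    if not pred or not gold:
        # Two empty answers agree (a correct "no answer"); otherwise it is a miss.
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Kazan", "kazan"))
print(f1_score("in the city of Kazan", "Kazan"))
print(f1_score("", ""))
```

The empty-string branch mirrors the no-answer setting: an empty prediction is scored as correct only when the reference answer is empty as well.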
Morphological tagging model [docs]¶
We have a BERT-based model for the Russian language. The model takes tokenized sentences as input and outputs the corresponding sequence of morphological labels in UD format.
| Dataset | Model | Word accuracy | Sent. accuracy | Download size (MB) |
|---|---|---|---|---|
| UD2.3 (Russian) | UD Pipe 2.3 (Straka et al., 2017) | 93.5 | | |
| | UD Pipe Future (Straka et al., 2018) | 96.90 | | |
| | | 97.83 | 72.02 | 661 |
Syntactic parsing model [docs]¶
We have a biaffine model for syntactic parsing based on RuBERT. It achieves the highest known labeled attachment score of 93.7% on the ru_syntagrus Russian corpus (version UD 2.3).
| Dataset | Model | UAS | LAS |
|---|---|---|---|
| UD2.3 (Russian) | UD Pipe 2.3 (Straka et al., 2017) | 90.3 | 89.0 |
| | UD Pipe Future (Straka, 2018) | 93.0 | 91.5 |
| | UDify (multilingual BERT) (Kondratyuk, 2018) | 94.8 | 93.1 |
| | | 95.2 | 93.7 |
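UAS (unlabeled attachment score) is the percentage of tokens whose predicted head is correct, and LAS (labeled attachment score) additionally requires the dependency label to match. A minimal sketch of both metrics is shown below. The `Arc` representation, the function name, and the toy sentence are assumptions for illustration.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent for aligned gold and predicted arcs."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head matches
    las_hits = sum(g == p for g, p in zip(gold, pred))  # head and label match
    return 100.0 * uas_hits / len(gold), 100.0 * las_hits / len(gold)


# Hypothetical 4-token sentence: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (3, "obj")]

uas, las = attachment_scores(gold, pred)
print(uas, las)
```

Because a labeled hit requires the head to be correct as well, LAS can never exceed UAS, which matches the pattern in the table above.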
Frequently Asked Questions (FAQ) model [docs]¶
A set of pipelines for the FAQ task: classifying an incoming question into a set of known questions and returning the prepared answer. You can build different pipelines based on tf-idf, weighted fastText, cosine similarity, and logistic regression.
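One of the simplest variants of such a pipeline, matching by cosine similarity over bag-of-words vectors, can be sketched as follows. This is an illustrative sketch rather than a DeepPavlov pipeline: it uses raw word counts instead of tf-idf or fastText weights, and the FAQ entries and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(question: str, faq: Dict[str, str]) -> str:
    """Return the prepared answer of the most similar known question."""
    query = Counter(question.lower().split())
    best = max(faq, key=lambda known: cosine(query, Counter(known.lower().split())))
    return faq[best]


# Hypothetical FAQ: known question -> prepared answer.
FAQ = {
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "what are your opening hours": "We are open from 9:00 to 18:00 on weekdays.",
}

print(answer("how can i reset the password", FAQ))
```

A practical pipeline would also set a minimum similarity below which no answer is returned, since this sketch always picks some known question.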
Skills¶
Goal-oriented bot [docs]¶
Based on the Hybrid Code Networks (HCNs) architecture from Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning (2017). It predicts responses in a goal-oriented dialog. The model is customizable: embeddings, the slot filler, and the intent classifier can be switched on and off on demand.
ODQA [docs]¶
An open-domain question answering skill. The skill accepts free-form questions about the world and outputs an answer based on its Wikipedia knowledge.
Dataset |
Model config |
Wiki dump |
F1 |
Downloads |
---|---|---|---|---|
enwiki (2018-02-11) |
35.89 |
9.7Gb |
||
ruwiki (2018-04-01) |
28.56 |
7.7Gb |
||
ruwiki (2018-04-01) |
37.83 |
4.3Gb |
AutoML¶
Hyperparameters optimization [docs]¶
Hyperparameter optimization by cross-validation for DeepPavlov models. It requires only small changes to a config file.
Embeddings¶
Pre-trained embeddings [docs]¶
Word vectors for the Russian language trained on joint Russian Wikipedia and Lenta.ru corpora.
Examples of some models¶
Run the insult detection model with the Telegram interface:
python -m deeppavlov telegram insults_kaggle_bert -d -t <TELEGRAM_TOKEN>
Run the insult detection model with the console interface:
python -m deeppavlov interact insults_kaggle_bert -d
Run the insult detection model with the REST API:
python -m deeppavlov riseapi insults_kaggle_bert -d
Predict, for every line in a file, whether it is an insult:
python -m deeppavlov predict insults_kaggle_bert -d --batch-size 15 < /data/in.txt > /data/out.txt
View the video demo of deploying a goal-oriented bot and a slot-filling model with the Telegram UI.