Intent Catcher
Overview
Intent Catcher is an NLP component for intent detection in Conversational AI systems.
It consists of an embedder, which is a Transformer model, and a number of dense layers that are fitted on top of the provided embeddings. The currently provided embeddings are Universal Sentence Encoder 1 and its larger version.
Intent Catcher was originally designed for high-level intent detection as part of the DREAM Socialbot, which the DeepPavlov team built for Alexa Prize 3.
Goals
The typical approach to building an ML-based intent classifier relies on providing a relatively large number of examples for each intent. This makes sense when the number of intents is relatively small and there is enough data (e.g., a small internal organizational chatbot), but it is questionable when the number of intents is large and the amount of available data is relatively small.
For Alexa Prize 3, the typical approach didn’t work. Alexa Prize socialbots are expected to react to a wide range of user intents in the open domain. The team needed a simple and fast way to add more intents while supplying only a relatively small number of examples for each new intent. Regular expressions alone wouldn’t be enough, but they could be used for up-sampling.
Intent Catcher was designed around the idea that, at the additional cost of requiring basic knowledge of regular expressions, users could provide a smaller number of examples in RegEx format to enable up-sampling. It also turned out that applying the RegEx patterns directly, in addition to training on the up-sampled dataset, was useful. Finally, punctuation needed to be checked as a way to distinguish statements from questions and the like.
Features
Up-sampling using RegEx-based format
Direct RegEx-based pattern matching
Additional checks for punctuation
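The three features above can be illustrated with a minimal Python sketch. It is not the DeepPavlov implementation: the restricted pattern syntax (flat `(a|b)` groups with an optional trailing `?`) and the helper names `upsample`, `matches_directly`, and `is_question` are assumptions made purely for illustration.

```python
import itertools
import re
from typing import List

# Flat group such as "(hi|hello)" with an optional trailing "?".
# Nested groups and other RegEx operators are deliberately unsupported
# in this sketch (an assumption, not the library's actual format).
_GROUP = re.compile(r"\(([^()]*)\)(\?)?")


def upsample(pattern: str) -> List[str]:
    """Expand a restricted RegEx-like pattern into every phrase it describes."""
    parts = []  # one list of alternatives per slot in the pattern
    pos = 0
    for m in _GROUP.finditer(pattern):
        if m.start() > pos:
            parts.append([pattern[pos:m.start()]])  # literal text between groups
        options = m.group(1).split("|")
        if m.group(2):  # "(...)?" means the whole group may be absent
            options.append("")
        parts.append(options)
        pos = m.end()
    if pos < len(pattern):
        parts.append([pattern[pos:]])  # trailing literal text
    phrases = ("".join(combo) for combo in itertools.product(*parts))
    # Normalise whitespace left behind by skipped optional groups.
    return sorted({" ".join(p.split()) for p in phrases})


def matches_directly(utterance: str, patterns: List[str]) -> bool:
    """Direct pattern matching: True if any pattern matches the whole utterance."""
    return any(re.fullmatch(p, utterance, flags=re.IGNORECASE) for p in patterns)


def is_question(utterance: str) -> bool:
    """Punctuation check: treat a trailing question mark as a question signal."""
    return utterance.rstrip().endswith("?")


if __name__ == "__main__":
    for phrase in upsample("(hi|hello) (there|friend)?"):
        print(phrase)
```

In this sketch the expanded phrases would serve as training examples for the classifier, while `matches_directly` and `is_question` act as cheap checks applied to the raw utterance.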
How Do I: Train My Intent Classifier
Dataset construction
A dataset can be constructed in two ways: by listing intents and their regular expressions in a .json file, or by using a regular .csv file. The JSON format is shown below:
{
"intent_1": ["regexp1", "regexp2"]
}
To use data in this format, don’t forget to add intent_catcher_reader as a dataset_reader in the model config.
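As a rough sketch of preparing such a file, the following Python snippet checks that every pattern compiles before writing the JSON. The intent names, patterns, helper names, and the `data_path` value are hypothetical. The `dataset_reader` fragment at the end only mirrors the sentence above and assumes the usual `class_name` key of DeepPavlov configs, so compare it with the shipped intent_catcher config before relying on it.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Dict, List


def validate_intents(intents: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems; an empty list means the data looks fine."""
    problems = []
    for intent, patterns in intents.items():
        if not patterns:
            problems.append(f"{intent}: no patterns given")
        for pattern in patterns:
            try:
                re.compile(pattern)  # catches syntax errors such as unbalanced parentheses
            except re.error as exc:
                problems.append(f"{intent}: invalid pattern {pattern!r} ({exc})")
    return problems


def save_intents(intents: Dict[str, List[str]], path: Path) -> None:
    """Write the intents in the {"intent": ["regexp", ...]} layout shown above."""
    problems = validate_intents(intents)
    if problems:
        raise ValueError("; ".join(problems))
    path.write_text(json.dumps(intents, indent=2, ensure_ascii=False), encoding="utf-8")


# Hypothetical example data, not taken from any shipped dataset.
EXAMPLE_INTENTS = {
    "greeting": ["(hi|hello|hey)( there)?"],
    "goodbye": ["(bye|goodbye)( for now)?"],
}

# Assumed shape of the config fragment; the path is a placeholder.
DATASET_READER_FRAGMENT = {
    "class_name": "intent_catcher_reader",
    "data_path": "path/to/your/dataset",
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "intents.json"
        save_intents(EXAMPLE_INTENTS, target)
        print(target.read_text(encoding="utf-8"))
```

Validating before saving surfaces a malformed pattern at dataset-construction time rather than later during training.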
Train and evaluate model
All the embeddings come pre-trained, so there is no need to install them separately. However, for both the Command Line Interface (CLI) and Python, you need to install the dependencies first. To do so, run:
python -m deeppavlov install intent_catcher
To use a pre-trained model from the CLI, run the following command:
python -m deeppavlov interact intent_catcher -d
where intent_catcher is the name of the config.
The provided config example is intent_catcher.
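If you prefer to drive these commands from a script, a small wrapper along the following lines may help. It is only a sketch: the helper names `deeppavlov_command` and `run` are made up here, and while `install` and `interact` come from the commands above, `train` and `evaluate` are assumed to be available as CLI modes in your installed DeepPavlov version.

```python
import subprocess
import sys
from typing import List


def deeppavlov_command(mode: str, config: str = "intent_catcher",
                       download: bool = False) -> List[str]:
    """Build the argument list for `python -m deeppavlov <mode> <config> [-d]`."""
    # sys.executable keeps the call inside the current Python environment.
    cmd = [sys.executable, "-m", "deeppavlov", mode, config]
    if download:
        cmd.append("-d")  # same flag as in the interact command above
    return cmd


def run(mode: str, config: str = "intent_catcher", download: bool = False) -> int:
    """Run one DeepPavlov CLI command and return its exit code."""
    return subprocess.run(deeppavlov_command(mode, config, download), check=False).returncode


if __name__ == "__main__":
    # Only print the commands here; call run(...) to actually execute them.
    for mode, download in [("install", False), ("train", False),
                           ("evaluate", False), ("interact", True)]:
        print(" ".join(deeppavlov_command(mode, download=download)))
```

Separating command construction from execution makes it easy to inspect what would be run before any dependencies are installed or any model is downloaded.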
How Do I: Integrate Intent Catcher into DeepPavlov Deepy
To integrate your Intent Catcher-based intent classifier into your Multiskill AI Assistant built using the DeepPavlov Conversational AI Stack, follow these steps:
Clone the Deepy repository
Replace docker-compose.yml in the root of the repository and pipeline_conf.json in the /agent/ subdirectory with the corresponding files from the deepy_adv Deepy Distribution
Clone the Tutorial Notebook
Replace its intents with custom intents that fit your project needs
Train the Intent Catcher model in your copy of the Tutorial Notebook
Download the saved data from your copy of the Tutorial Notebook and put it into the Intent Catcher
[Optional] If you do not need the Chit-Chat skill, remove it from both /agent/pipeline_conf.json and docker-compose.yml
Use the docker-compose up --build command to build and run your DeepPavlov-based Multiskill AI Assistant
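The optional step of removing a skill from pipeline_conf.json could be scripted roughly as follows. This is a sketch under a loud assumption: it presumes skills are listed under a `services` → `skills` mapping, which may not match your copy of the file, and the skill name `chitchat_skill` is a placeholder. The matching service entry in docker-compose.yml still has to be removed separately.

```python
import copy
import json
from typing import Any, Dict


def remove_skill(pipeline_conf: Dict[str, Any], skill_name: str) -> Dict[str, Any]:
    """Return a copy of the pipeline config without the given skill.

    Assumes the (unverified) layout {"services": {"skills": {name: {...}}}}.
    A missing section or skill is silently ignored.
    """
    conf = copy.deepcopy(pipeline_conf)  # never mutate the caller's dict
    skills = conf.get("services", {}).get("skills", {})
    skills.pop(skill_name, None)
    return conf


if __name__ == "__main__":
    # Hypothetical in-memory config standing in for /agent/pipeline_conf.json.
    example = {
        "services": {
            "skills": {
                "chitchat_skill": {"placeholder": True},
                "intent_responder": {"placeholder": True},
            }
        }
    }
    print(json.dumps(remove_skill(example, "chitchat_skill"), indent=2))
```

Working on a copy keeps the original configuration intact, so the result can be reviewed before it is written back to disk.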
Note
In future versions of the DeepPavlov Library, we will provide a more comprehensive update to the documentation to further simplify the process of adding DeepPavlov NLP components as annotators to Multiskill AI Assistants built using the DeepPavlov Conversational AI Stack. Stay tuned!