deeppavlov.models.ranking
Ranking classes.
class deeppavlov.models.ranking.bilstm_siamese_network.BiLSTMSiameseNetwork(len_vocab: int, seed: int = None, shared_weights: bool = True, embedding_dim: int = 300, reccurent: str = 'bilstm', hidden_dim: int = 300, max_pooling: bool = True, triplet_loss: bool = True, margin: float = 0.1, hard_triplets: bool = False, *args, **kwargs)[source]
The class implementing a siamese neural network with BiLSTM and max pooling. The model can be trained with a binary cross-entropy loss, or with a triplet loss using random or hard negative sampling.
- Parameters
len_vocab – The size of the vocabulary used to build the embedding layer.
seed – Random seed.
shared_weights – Whether to use shared weights in the model to encode contexts and responses.
embedding_dim – Dimensionality of token (word) embeddings.
reccurent – The type of the RNN cell. Possible values are lstm and bilstm.
hidden_dim – Dimensionality of the hidden state of the RNN cell. If reccurent equals bilstm, hidden_dim should be doubled to get the actual dimensionality.
max_pooling – Whether to use a max-pooling operation to get the context (response) vector representation. If False, the last hidden state of the RNN is used.
triplet_loss – Whether to use a model with triplet loss. If False, a model with cross-entropy loss is used.
margin – The margin parameter for the triplet loss. Only required if triplet_loss is set to True.
hard_triplets – Whether to use hard triplet sampling to train the model, i.e. to choose negative samples close to positive ones. If set to False, random sampling is used. Only required if triplet_loss is set to True.
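The effect of margin on the triplet loss can be sketched with plain numpy (a minimal illustration of the loss formula, not DeepPavlov code; the Euclidean distance and all names here are assumptions):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet loss: push the positive at least `margin`
    closer to the anchor than the negative (Euclidean distances)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, margin + d_pos - d_neg)

context = np.array([1.0, 0.0])
good_response = np.array([0.9, 0.1])   # close to the context
bad_response = np.array([-1.0, 0.5])   # far from the context

loss = triplet_loss(context, good_response, bad_response, margin=0.1)
# The negative is already farther than margin, so the loss is 0.0
```

With hard_triplets=True, the negative sample would instead be chosen close to the positive one, which makes the hinge term more likely to be active during training.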
class deeppavlov.models.ranking.bilstm_gru_siamese_network.BiLSTMGRUSiameseNetwork(len_vocab: int, seed: int = None, shared_weights: bool = True, embedding_dim: int = 300, reccurent: str = 'bilstm', hidden_dim: int = 300, max_pooling: bool = True, triplet_loss: bool = True, margin: float = 0.1, hard_triplets: bool = False, *args, **kwargs)[source]
The class implementing a siamese neural network with BiLSTM, GRU and max pooling. A GRU is used to take into account the multi-turn dialogue context.
- Parameters
len_vocab – The size of the vocabulary used to build the embedding layer.
seed – Random seed.
shared_weights – Whether to use shared weights in the model to encode contexts and responses.
embedding_dim – Dimensionality of token (word) embeddings.
reccurent – The type of the RNN cell. Possible values are lstm and bilstm.
hidden_dim – Dimensionality of the hidden state of the RNN cell. If reccurent equals bilstm, hidden_dim should be doubled to get the actual dimensionality.
max_pooling – Whether to use a max-pooling operation to get the context (response) vector representation. If False, the last hidden state of the RNN is used.
triplet_loss – Whether to use a model with triplet loss. If False, a model with cross-entropy loss is used.
margin – The margin parameter for the triplet loss. Only required if triplet_loss is set to True.
hard_triplets – Whether to use hard triplet sampling to train the model, i.e. to choose negative samples close to positive ones. If set to False, random sampling is used. Only required if triplet_loss is set to True.
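The difference between the two settings of max_pooling can be illustrated on a matrix of per-timestep RNN hidden states (a schematic numpy sketch; the array names and values are assumptions, not DeepPavlov internals):

```python
import numpy as np

# Hidden states of an RNN over 4 timesteps, hidden size 3
hidden_states = np.array([[0.1, 0.5, -0.2],
                          [0.7, 0.0,  0.3],
                          [0.2, 0.9, -0.1],
                          [0.4, 0.1,  0.8]])

# max_pooling=True: elementwise max over the time axis
pooled = hidden_states.max(axis=0)

# max_pooling=False: only the last hidden state is kept
last_state = hidden_states[-1]
```

Max pooling lets every timestep contribute to the sentence vector, while the last-state variant relies on the RNN carrying all relevant information to the final step.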
class deeppavlov.models.ranking.keras_siamese_model.KerasSiameseModel(learning_rate: float = 0.001, use_matrix: bool = True, emb_matrix: numpy.ndarray = None, max_sequence_length: int = None, dynamic_batch: bool = False, attention: bool = False, *args, **kwargs)[source]
The class implementing base functionality for siamese neural networks in Keras.
- Parameters
learning_rate – Learning rate.
use_matrix – Whether to use a trainable matrix with token (word) embeddings.
emb_matrix – An embedding matrix used to initialize the embedding layer of the model. Only used if use_matrix is set to True.
max_sequence_length – The maximum length of text sequences in tokens. Longer sequences will be truncated and shorter ones will be padded.
dynamic_batch – Whether to use dynamic batching. If True, the maximum length of a sequence for a batch will be equal to the maximum of all sequence lengths in this batch, but not higher than max_sequence_length.
attention – Whether any attention mechanism is used in the siamese network.
*args – Other parameters.
**kwargs – Other parameters.
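The interaction of dynamic_batch and max_sequence_length can be sketched as a padding helper (a minimal numpy illustration of the described behaviour; the function name and pad_id parameter are assumptions):

```python
import numpy as np

def pad_batch(token_id_seqs, max_sequence_length, dynamic_batch=True, pad_id=0):
    """Pad variable-length token-id sequences to a common length.

    With dynamic_batch=True the batch is padded to the longest sequence
    in it, capped at max_sequence_length; otherwise it is always padded
    to max_sequence_length. Longer sequences are truncated."""
    if dynamic_batch:
        length = min(max(len(s) for s in token_id_seqs), max_sequence_length)
    else:
        length = max_sequence_length
    batch = np.full((len(token_id_seqs), length), pad_id, dtype=np.int64)
    for i, seq in enumerate(token_id_seqs):
        trimmed = seq[:length]
        batch[i, :len(trimmed)] = trimmed
    return batch

dynamic = pad_batch([[3, 7, 9], [5]], max_sequence_length=10)      # shape (2, 3)
static = pad_batch([[3, 7, 9], [5]], max_sequence_length=5,
                   dynamic_batch=False)                            # shape (2, 5)
```

Dynamic batching saves computation on short batches while the cap keeps memory usage bounded.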
class deeppavlov.models.ranking.mpm_siamese_network.MPMSiameseNetwork(dense_dim: int = 50, perspective_num: int = 20, aggregation_dim: int = 200, recdrop_val: float = 0.0, inpdrop_val: float = 0.0, ldrop_val: float = 0.0, dropout_val: float = 0.0, *args, **kwargs)[source]
The class implementing a siamese neural network with bilateral multi-perspective matching. The network architecture is based on https://arxiv.org/abs/1702.03814.
- Parameters
dense_dim – Dimensionality of the dense layer.
perspective_num – Number of perspectives in multi-perspective matching layers.
aggregation_dim – Dimensionality of the hidden state in the second BiLSTM layer.
recdrop_val – Float between 0 and 1. A dropout value for the linear transformation of the recurrent state.
inpdrop_val – Float between 0 and 1. A dropout value for the linear transformation of the inputs.
ldrop_val – A dropout value of the dropout layer before the second BiLSTM layer.
dropout_val – A dropout value of the dropout layer after the second BiLSTM layer.
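The core multi-perspective matching operation (one score per perspective, as in the full-matching strategy of the paper above) can be sketched in numpy. This is a conceptual illustration only; the weight matrix W would be trainable in the real model, and all names here are assumptions:

```python
import numpy as np

def multi_perspective_match(v1, v2, W):
    """Each row of W reweights the dimensions of both vectors before a
    cosine similarity, yielding one matching score per perspective."""
    scores = []
    for w in W:                       # W: (perspective_num, hidden_dim)
        a, b = w * v1, w * v2         # elementwise reweighting
        scores.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return np.array(scores)           # shape: (perspective_num,)

rng = np.random.default_rng(0)
W = rng.uniform(0.5, 1.5, size=(20, 4))   # perspective_num=20, hidden dim 4
v = rng.normal(size=4)
scores = multi_perspective_match(v, v, W)  # identical vectors -> all scores 1.0
```

Because each perspective emphasizes different hidden dimensions, the perspective_num scores together form a richer matching signature than a single cosine similarity.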
class deeppavlov.models.ranking.siamese_model.SiameseModel(batch_size: int, num_context_turns: int = 1, *args, **kwargs)[source]
The class implementing base functionality for siamese neural networks.
- Parameters
batch_size – The size of a batch.
num_context_turns – The number of context turns in data samples.
*args – Other parameters.
**kwargs – Other parameters.
train_on_batch(samples_generator: Iterable[List[numpy.ndarray]], y: List[int]) → float[source]
This method is called by the trainer to make one training step on one batch. The number of samples returned by samples_generator is always equal to batch_size, so the method needs to: 1) accumulate data for all of the inputs of the model; 2) format the inputs of the model in a proper way using the self._make_batch function; 3) run the model with the provided inputs and ground truth labels (y) using the self._train_on_batch function; 4) return the mean loss value on the batch.
- Parameters
samples_generator (Iterable[List[np.ndarray]]) – A generator that returns lists of numpy arrays of words of all sentences represented as integers, with shape: (number_of_context_turns + 1, max_number_of_words_in_a_sentence).
y (List[int]) – A tuple of labels, with shape: (batch_size,).
- Returns
The value of the mean loss on the batch.
- Return type
float
__call__(samples_generator: Iterable[List[numpy.ndarray]]) → Union[numpy.ndarray, List[str]][source]
This method is called by the trainer to make one evaluation step on one batch.
- Parameters
samples_generator (Iterable[List[np.ndarray]]) – A generator that returns lists of numpy arrays of words of all sentences represented as integers, with shape: (number_of_context_turns + 1, max_number_of_words_in_a_sentence).
- Returns
Predictions for the batch of samples.
- Return type
np.ndarray
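The accumulate-then-stack pattern described for train_on_batch and __call__ can be sketched as follows (a schematic numpy illustration; make_batch here is a hypothetical stand-in for the internal self._make_batch step):

```python
import numpy as np

def make_batch(samples_generator):
    """Accumulate per-sample input lists from a generator and stack
    them into one array per model input."""
    inputs = None
    for sample in samples_generator:       # sample: one array per model input
        if inputs is None:
            inputs = [[] for _ in sample]
        for acc, arr in zip(inputs, sample):
            acc.append(arr)
    return [np.stack(acc) for acc in inputs]

# Two samples, each with (1 context turn + 1 response) = 2 inputs of 4 token ids
gen = iter([[np.array([1, 2, 3, 0]), np.array([4, 5, 0, 0])],
            [np.array([6, 7, 0, 0]), np.array([8, 9, 1, 0])]])
batch = make_batch(gen)   # 2 arrays, each of shape (2, 4)
```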
class deeppavlov.models.ranking.siamese_predictor.SiamesePredictor(model: deeppavlov.models.ranking.siamese_model.SiameseModel, batch_size: int, num_context_turns: int = 1, ranking: bool = True, attention: bool = False, responses: deeppavlov.core.data.simple_vocab.SimpleVocabulary = None, preproc_func: Callable = None, interact_pred_num: int = 3, *args, **kwargs)[source]
The class for ranking or paraphrase identification using the trained siamese network in the interact mode.
- Parameters
batch_size – The size of a batch.
num_context_turns – The number of context turns in data samples.
ranking – Whether to perform ranking. If set to False, paraphrase identification will be performed.
attention – Whether any attention mechanism is used in the siamese network. If False, pre-computed vectors of responses will be used to obtain a similarity score for the input context; otherwise, the whole siamese architecture will be used to obtain a similarity score for the input context and each particular response. The parameter is used only if ranking is set to True.
responses – An instance of SimpleVocabulary with all possible responses to perform ranking over. Used only if ranking is set to True.
preproc_func – The __call__ function of a SiamesePreprocessor.
interact_pred_num – The number of the most relevant responses to return. Used only if ranking is set to True.
**kwargs – Other parameters.
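In ranking mode the predictor essentially returns the interact_pred_num responses with the highest similarity scores. The selection step can be sketched as (a minimal numpy illustration with hypothetical names, not the actual DeepPavlov implementation):

```python
import numpy as np

def top_responses(scores, responses, interact_pred_num=3):
    """Return the most relevant responses, best first, given one
    similarity score per candidate response."""
    order = np.argsort(scores)[::-1][:interact_pred_num]
    return [responses[i] for i in order]

responses = ["hi there", "sure, why not", "no idea", "see you later"]
scores = np.array([0.2, 0.9, 0.1, 0.6])
best = top_responses(scores, responses, interact_pred_num=2)
# -> ["sure, why not", "see you later"]
```

With attention=False, the scores could be computed once against pre-computed response vectors; with attention=True, each context-response pair has to pass through the full siamese network before this selection step.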