deeppavlov.models.syntax_parser
-
class deeppavlov.models.syntax_parser.network.BertSyntaxParser(n_deps: int, keep_prob: float, bert_config_file: str, pretrained_bert: str = None, attention_probs_keep_prob: float = None, hidden_keep_prob: float = None, embeddings_dropout: float = 0.0, encoder_layer_ids: List[int] = (-1, ), encoder_dropout: float = 0.0, optimizer: str = None, weight_decay_rate: float = 1e-06, state_size: int = 256, use_birnn: bool = True, birnn_cell_type: str = 'lstm', birnn_hidden_size: int = 256, ema_decay: float = None, ema_variables_on_cpu: bool = True, predict_tags=False, n_tags=None, tag_weight=1.0, return_probas: bool = False, freeze_embeddings: bool = False, learning_rate: float = 0.001, bert_learning_rate: float = 2e-05, min_learning_rate: float = 1e-07, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: float = 1.0, **kwargs)
BERT-based model for syntax parsing. For each word the model predicts the index of its syntactic head and the label of the dependency between this head and the current word. See deeppavlov.models.bert.bert_sequence_tagger.BertSequenceNetwork for the description of inherited parameters.
- Parameters
n_deps – the number of distinct syntactic dependency labels
embeddings_dropout – dropout for embeddings in the biaffine layer
state_size – the size of the hidden state in the biaffine layer
dep_state_size – the size of the hidden state in the biaffine layer (listed here, but not present in the signature above)
use_birnn – whether to use a bidirectional RNN after the BERT layers. Setting it to True is recommended, since it leads to much higher performance, at least on large datasets
birnn_cell_type – the type of the bidirectional RNN cell, either 'lstm' or 'gru'
birnn_hidden_size – the number of hidden units in each direction of the BiRNN layer
return_probas – set this to True to get probability distributions instead of the most probable answers
predict_tags – whether to predict morphological tags together with syntactic information
n_tags – the number of morphological tags
tag_weight – the weight of the tag model loss in multitask training
-
__call__(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Union[Tuple[List[Union[List[int], numpy.ndarray]], List[List[int]]], Tuple[List[Union[List[int], numpy.ndarray]], List[List[int]], List[List[int]]]]
Predicts the outputs for a batch of inputs. By default (return_probas = False and predict_tags = False) it returns two output batches. The first is the batch of head indexes: i stands for the i-th word in the sequence, where numbering starts with 1, and 0 is predicted for the syntactic root of the sentence. The second is the batch of indexes of syntactic dependencies. If return_probas = True, the model returns the probability distribution over possible heads instead of the position of the most probable head; for a sentence of length k this output is an array of shape k × (k+1). If predict_tags = True, the model additionally returns the index of the most probable morphological tag for each word, and the batch of such indexes becomes the third output of the function.
- Returns
pred_heads_to_return: either a batch of the most probable head positions for each token (if return_probas = False) or a batch of probability distributions over token head positions
pred_deps: the indexes of token dependency relations
pred_tags: the indexes of token morphological tags (only if predict_tags = True)
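To make the return_probas = True output concrete, here is a minimal NumPy sketch that turns one sentence's k × (k+1) head-probability array back into head positions. It assumes that column j holds the probability of position j being the head, with column 0 standing for the root, matching the head numbering described above. decode_heads_greedy is a hypothetical helper, not part of DeepPavlov, and its independent per-word argmax does not guarantee a single root or a well-formed tree.

```python
import numpy as np


def decode_heads_greedy(head_probas: np.ndarray) -> list:
    """Greedily decode one sentence's (k, k+1) head-probability array.

    Row i belongs to word i+1; column j is assumed to be the probability
    that position j is the head, with column 0 standing for the root.
    Returns 1-based head positions, 0 meaning "root".
    """
    k, n_cols = head_probas.shape
    if n_cols != k + 1:
        raise ValueError(f"expected shape ({k}, {k + 1}), got {head_probas.shape}")
    # Independent argmax per word: no single-root or tree constraint is enforced.
    return [int(j) for j in head_probas.argmax(axis=1)]


# Toy 3-word sentence: word 2 is the root, words 1 and 3 attach to it.
probas = np.array([
    [0.1, 0.0, 0.8, 0.1],  # word 1 -> head 2
    [0.7, 0.1, 0.0, 0.2],  # word 2 -> root (0)
    [0.1, 0.1, 0.7, 0.1],  # word 3 -> head 2
])
print(decode_heads_greedy(probas))  # [2, 0, 2]
```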
-
deeppavlov.models.syntax_parser.network.gather_indexes(A: tensorflow.Tensor, B: tensorflow.Tensor) → tensorflow.Tensor
- Parameters
A – a tensor with data
B – an integer tensor with indexes
- Returns
answer, a tensor such that answer[i, j] = A[i, B[i, j]]. If B is one-dimensional, the output is answer[i] = A[i, B[i]].
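The indexing rule above can be mirrored in NumPy with advanced indexing. The sketch below, gather_indexes_np, is a hypothetical reference for the semantics only, not the TensorFlow implementation. It assumes the first axis of both A and B is the batch axis and keeps any trailing dimensions of A, so each answer[i, j] may itself be a vector of states.

```python
import numpy as np


def gather_indexes_np(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """NumPy reference for the semantics of gather_indexes.

    If B has shape (batch, m): answer[i, j] = A[i, B[i, j]].
    If B has shape (batch,):   answer[i] = A[i, B[i]].
    Any trailing dimensions of A are carried over unchanged.
    """
    batch_ids = np.arange(A.shape[0])
    if B.ndim == 1:
        return A[batch_ids, B]
    # Broadcast batch indices against B so that every B[i, j] picks from row i.
    return A[batch_ids[:, None], B]


A = np.array([[10, 11, 12],
              [20, 21, 22]])
B = np.array([[2, 0],
              [1, 1]])
print(gather_indexes_np(A, B).tolist())                 # [[12, 10], [21, 21]]
print(gather_indexes_np(A, np.array([0, 2])).tolist())  # [10, 22]
```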
-
deeppavlov.models.syntax_parser.network.biaffine_layer(deps: tensorflow.Tensor, heads: tensorflow.Tensor, deps_dim: int, heads_dim: int, output_dim: int, name: str = 'biaffine_layer') → tensorflow.Tensor
Implements a biaffine layer from [Dozat, Manning, 2016].
- Parameters
deps – the 3D tensor of dependency states
heads – the 3D tensor of head states
deps_dim – the dimension of dependency states
heads_dim – the dimension of head states
output_dim – the output dimension
name – the name of the layer
- Returns
answer, the output 3D tensor
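As an illustration of what a biaffine transformation computes, here is a NumPy sketch of the general form from Dozat and Manning: a bilinear term, plus a linear term for each input, plus a bias, giving output_dim scores per position (for example, one score per dependency label). The weight names U, W_dep, W_head and bias are hypothetical; the actual layer creates its own trainable TensorFlow variables, and its exact parametrization may differ. The sketch also assumes heads is aligned with deps, so that position l of heads holds the state of the head of word l.

```python
import numpy as np


def biaffine_layer_np(deps, heads, U, W_dep, W_head, bias):
    """Biaffine scores in the general form of Dozat & Manning (2016).

    deps:   (batch, length, deps_dim)          dependency states
    heads:  (batch, length, heads_dim)         head states, aligned with deps
    U:      (deps_dim, heads_dim, output_dim)  bilinear weights (hypothetical)
    W_dep:  (deps_dim, output_dim)             linear weights for deps (hypothetical)
    W_head: (heads_dim, output_dim)            linear weights for heads (hypothetical)
    bias:   (output_dim,)                      bias (hypothetical)
    Returns a (batch, length, output_dim) array of scores.
    """
    # x^T U_o y for every position and every output unit o.
    bilinear = np.einsum("bld,dho,blh->blo", deps, U, heads)
    # Linear contribution of each input on its own.
    linear = deps @ W_dep + heads @ W_head
    return bilinear + linear + bias


rng = np.random.default_rng(0)
batch, length, deps_dim, heads_dim, output_dim = 2, 5, 4, 3, 7
deps = rng.normal(size=(batch, length, deps_dim))
heads = rng.normal(size=(batch, length, heads_dim))
U = rng.normal(size=(deps_dim, heads_dim, output_dim))
W_dep = rng.normal(size=(deps_dim, output_dim))
W_head = rng.normal(size=(heads_dim, output_dim))
bias = np.zeros(output_dim)

scores = biaffine_layer_np(deps, heads, U, W_dep, W_head, bias)
print(scores.shape)  # (2, 5, 7)
labels = scores.argmax(axis=-1)  # highest-scoring label per word, shape (2, 5)
```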
-
deeppavlov.models.syntax_parser.network.biaffine_attention(deps: tensorflow.Tensor, heads: tensorflow.Tensor, name='biaffine_attention') → tensorflow.Tensor
Implements a trainable matching layer between two families of embeddings.
- Parameters
deps – the 3D tensor of dependency states
heads – the 3D tensor of head states
name – the name of the layer
- Returns
answer, a 3D tensor of pairwise scores between deps and heads
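The pairwise scoring can be sketched in NumPy as a bilinear form between every dependency state and every head state, plus a bias that depends only on the candidate head, in the style of Dozat and Manning's arc scorer. U and u are hypothetical weight names, and the bias term used by DeepPavlov's biaffine_attention may differ from this one; the sketch only shows how a (batch, length, length) score tensor arises from two 3D inputs.

```python
import numpy as np


def biaffine_attention_np(deps, heads, U, u):
    """Pairwise biaffine scores between all dependents and all candidate heads.

    deps:  (batch, length, dim)  dependency states
    heads: (batch, length, dim)  head states
    U:     (dim, dim)            bilinear weights (hypothetical)
    u:     (dim,)                head-side bias weights (hypothetical)
    Returns a (batch, length, length) array; entry [b, i, j] scores
    "position j is the head of position i" in sentence b.
    """
    # deps[b, i] @ U @ heads[b, j] for all pairs (i, j).
    bilinear = np.einsum("bid,de,bje->bij", deps, U, heads)
    # Bias that depends only on the candidate head j, broadcast over i.
    head_bias = (heads @ u)[:, None, :]
    return bilinear + head_bias


rng = np.random.default_rng(1)
deps = rng.normal(size=(2, 4, 3))
heads = rng.normal(size=(2, 4, 3))
U = rng.normal(size=(3, 3))
u = rng.normal(size=3)

scores = biaffine_attention_np(deps, heads, U, u)
print(scores.shape)  # (2, 4, 4)
best_heads = scores.argmax(axis=-1)  # highest-scoring head position per word
```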
-
class deeppavlov.models.syntax_parser.joint.JointTaggerParser(tagger: deeppavlov.core.common.chainer.Chainer, parser: deeppavlov.core.common.chainer.Chainer, output_format: str = 'ud', to_output_string: bool = False, *args, **kwargs)
A class to perform joint morphological and syntactic parsing. It is a simple wrapper that calls the tagging and parsing models and combines their results into a single output.
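- Parameters
tagger – the morphological tagger model (a Chainer instance)
parser – the syntax parser model (a Chainer instance)
output_format – the format of each word parse: 'ud' (a CoNLL-U-formatted string) or 'json' (a dictionary), as in the example below
to_output_string – whether to join the word parses of each sentence into a single string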
-
__call__(data: Union[List[str], List[List[str]]]) → Union[List[List[dict]], List[str], List[List[str]]]
Parses a batch of sentences.
- Parameters
data – either a batch of tokenized sentences or a batch of raw sentences
- Returns
answer, a batch of parsed sentences. A sentence parse is a list of single-word parses, and each word parse is either a CoNLL-U-formatted string or a dictionary. If self.to_output_string is False, a sentence parse is returned as is; otherwise it is returned as a single string in which each word parse starts on a new line.
>>> from deeppavlov.core.commands.infer import build_model
>>> model = build_model("ru_syntagrus_joint_parsing")
>>> batch = ["Девушка пела в церковном хоре.", "У этой задачи есть сложное решение."]
>>> # English: "The girl sang in the church choir.", "This problem has a complex solution."
>>> print(*model(batch), sep="\n\n")
1	Девушка	девушка	NOUN	_	Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing	2	nsubj	_	_
2	пела	петь	VERB	_	Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act	0	root	_	_
3	в	в	ADP	_	_	5	case	_	_
4	церковном	церковный	ADJ	_	Case=Loc|Degree=Pos|Gender=Masc|Number=Sing	5	amod	_	_
5	хоре	хор	NOUN	_	Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing	2	obl	_	_
6	.	.	PUNCT	_	_	2	punct	_	_

1	У	у	ADP	_	_	3	case	_	_
2	этой	этот	DET	_	Case=Gen|Gender=Fem|Number=Sing	3	det	_	_
3	задачи	задача	NOUN	_	Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing	4	obl	_	_
4	есть	быть	VERB	_	Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act	0	root	_	_
5	сложное	сложный	ADJ	_	Case=Nom|Degree=Pos|Gender=Neut|Number=Sing	6	amod	_	_
6	решение	решение	NOUN	_	Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing	4	nsubj	_	_
7	.	.	PUNCT	_	_	4	punct	_	_
>>> # Dirty hack: changing model parameters in code. Normally you should do this in the configuration file.
>>> model["main"].to_output_string = False
>>> model["main"].output_format = "json"
>>> for sent_parse in model(batch):
...     for word_parse in sent_parse:
...         print(word_parse)
...     print("")
{'id': '1', 'word': 'Девушка', 'lemma': 'девушка', 'upos': 'NOUN', 'feats': 'Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing', 'head': '2', 'deprel': 'nsubj'}
{'id': '2', 'word': 'пела', 'lemma': 'петь', 'upos': 'VERB', 'feats': 'Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act', 'head': '0', 'deprel': 'root'}
{'id': '3', 'word': 'в', 'lemma': 'в', 'upos': 'ADP', 'feats': '_', 'head': '5', 'deprel': 'case'}
{'id': '4', 'word': 'церковном', 'lemma': 'церковный', 'upos': 'ADJ', 'feats': 'Case=Loc|Degree=Pos|Gender=Masc|Number=Sing', 'head': '5', 'deprel': 'amod'}
{'id': '5', 'word': 'хоре', 'lemma': 'хор', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing', 'head': '2', 'deprel': 'obl'}
{'id': '6', 'word': '.', 'lemma': '.', 'upos': 'PUNCT', 'feats': '_', 'head': '2', 'deprel': 'punct'}

{'id': '1', 'word': 'У', 'lemma': 'у', 'upos': 'ADP', 'feats': '_', 'head': '3', 'deprel': 'case'}
{'id': '2', 'word': 'этой', 'lemma': 'этот', 'upos': 'DET', 'feats': 'Case=Gen|Gender=Fem|Number=Sing', 'head': '3', 'deprel': 'det'}
{'id': '3', 'word': 'задачи', 'lemma': 'задача', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing', 'head': '4', 'deprel': 'obl'}
{'id': '4', 'word': 'есть', 'lemma': 'быть', 'upos': 'VERB', 'feats': 'Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'head': '0', 'deprel': 'root'}
{'id': '5', 'word': 'сложное', 'lemma': 'сложный', 'upos': 'ADJ', 'feats': 'Case=Nom|Degree=Pos|Gender=Neut|Number=Sing', 'head': '6', 'deprel': 'amod'}
{'id': '6', 'word': 'решение', 'lemma': 'решение', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing', 'head': '4', 'deprel': 'nsubj'}
{'id': '7', 'word': '.', 'lemma': '.', 'upos': 'PUNCT', 'feats': '_', 'head': '4', 'deprel': 'punct'}
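If you need CoNLL-U rows from the dictionary output, the conversion is mechanical: each dictionary carries seven of the ten CoNLL-U columns, and the string output of the example fills the remaining XPOS, DEPS and MISC columns with '_'. The helpers below (word_parse_to_conllu, sentence_parse_to_conllu) are hypothetical and not part of DeepPavlov; they assume tab-separated columns, as the CoNLL-U format requires.

```python
# CoNLL-U column order; "word" is the key used for the FORM column in the JSON output.
CONLLU_COLUMNS = ("id", "word", "lemma", "upos", "xpos", "feats",
                  "head", "deprel", "deps", "misc")


def word_parse_to_conllu(word_parse: dict) -> str:
    """Turn one JSON-style word parse into a tab-separated CoNLL-U row.

    Keys missing from the dictionary (here: xpos, deps, misc) are
    filled with the CoNLL-U placeholder "_".
    """
    return "\t".join(str(word_parse.get(key, "_")) for key in CONLLU_COLUMNS)


def sentence_parse_to_conllu(sent_parse: list) -> str:
    """Join word rows so that each word parse starts on a new line."""
    return "\n".join(word_parse_to_conllu(w) for w in sent_parse)


word = {'id': '3', 'word': 'в', 'lemma': 'в', 'upos': 'ADP', 'feats': '_',
        'head': '5', 'deprel': 'case'}
print(word_parse_to_conllu(word))
```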