dataset_iterators¶
Concrete DatasetIterator classes.
-
class deeppavlov.dataset_iterators.basic_classification_iterator.BasicClassificationDatasetIterator(data: dict, fields_to_merge: Optional[List[str]] = None, merged_field: Optional[str] = None, field_to_split: Optional[str] = None, split_fields: Optional[List[str]] = None, split_proportions: Optional[List[float]] = None, seed: Optional[int] = None, shuffle: bool = True, split_seed: Optional[int] = None, stratify: Optional[bool] = None, *args, **kwargs)[source]¶
Gets a data dictionary from a DatasetReader instance, merges fields if necessary, and splits a field if necessary.
- Parameters
data – dictionary of data with fields "train", "valid" and "test" (or some of them)
fields_to_merge – list of fields (out of "train", "valid", "test") to merge
merged_field – name of the field (out of "train", "valid", "test") to save the merged fields to
field_to_split – name of the field (out of "train", "valid", "test") to split
split_fields – list of fields (out of "train", "valid", "test") to save the split fields to
split_proportions – list of corresponding proportions for splitting
seed – random seed for iterating
shuffle – whether to shuffle examples in batches
split_seed – random seed for splitting the dataset; if split_seed is None, the split is based on seed
stratify – whether to use a stratified split
*args – arguments
**kwargs – keyword arguments
-
data ¶ dictionary of data with fields "train", "valid" and "test" (or some of them)
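The field-splitting behaviour described by field_to_split, split_fields and split_proportions can be sketched in plain Python. This is a minimal illustration of the technique, not DeepPavlov's actual implementation; the helper name split_field is hypothetical.

```python
import random

def split_field(examples, proportions, seed=42):
    """Split a list of (x, y) examples into parts with the given
    proportions, shuffling with a fixed seed first (a sketch of the
    split logic described above, not DeepPavlov's code)."""
    examples = examples[:]
    random.Random(seed).shuffle(examples)
    parts, start = [], 0
    for p in proportions[:-1]:
        size = int(len(examples) * p)
        parts.append(examples[start:start + size])
        start += size
    parts.append(examples[start:])  # remainder goes to the last part
    return parts

# e.g. split a "train" field into new train/valid parts with 80/20 proportions
data = {"train": [(f"text {i}", i % 2) for i in range(10)]}
train, valid = split_field(data["train"], [0.8, 0.2])
```

Passing a split_seed-style argument instead of reusing the iteration seed keeps the train/valid boundary stable across runs even when batch shuffling changes.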
-
class deeppavlov.dataset_iterators.siamese_iterator.SiameseIterator(data: Dict[str, List[Tuple[Any, Any]]], seed: Optional[int] = None, shuffle: bool = True, *args, **kwargs)[source]¶
The class contains methods for iterating over a dataset for ranking in training, validation and test modes.
-
class deeppavlov.dataset_iterators.sqlite_iterator.SQLiteDataIterator(load_path: Union[str, pathlib.Path], batch_size: Optional[int] = None, shuffle: Optional[bool] = None, seed: Optional[int] = None, **kwargs)[source]¶
Iterates over an SQLite database: generates batches from SQLite data and retrieves document ids and documents.
- Parameters
load_path – a path to local DB file
batch_size – a number of samples in a single batch
shuffle – whether to shuffle data during batching
seed – random seed for data shuffling
-
connect ¶ a DB connection
-
db_name ¶ a DB name
-
doc_ids ¶ DB document ids
-
doc2index ¶ a dictionary of document indices and their titles
-
batch_size ¶ a number of samples in a single batch
-
shuffle ¶ whether to shuffle data during batching
-
random ¶ an instance of the Random class
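The batching pattern over an SQLite database can be sketched with the standard sqlite3 module: collect document ids once, then yield (optionally shuffled) batches of rows. This is illustrative stdlib code under an assumed one-table schema, not SQLiteDataIterator itself.

```python
import random
import sqlite3

# Assumed toy schema: a single "documents" table of (id, text) rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO documents VALUES (?, ?)",
                 [(f"doc{i}", f"body of document {i}") for i in range(5)])

# Collect all document ids up front, as the doc_ids attribute suggests.
doc_ids = [row[0] for row in conn.execute("SELECT id FROM documents")]

def gen_batches(ids, batch_size=2, shuffle=False, seed=0):
    """Yield batches of (id, text) rows; shuffling uses a seeded Random
    instance, mirroring the random attribute described above."""
    ids = ids[:]
    if shuffle:
        random.Random(seed).shuffle(ids)
    for i in range(0, len(ids), batch_size):
        batch_ids = ids[i:i + batch_size]
        placeholders = ",".join("?" * len(batch_ids))
        rows = conn.execute(
            f"SELECT id, text FROM documents WHERE id IN ({placeholders})",
            batch_ids).fetchall()
        yield rows

batches = list(gen_batches(doc_ids, batch_size=2))
```

Fetching only the ids eagerly and the row bodies per batch keeps memory bounded when the database is large.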
-
class deeppavlov.dataset_iterators.squad_iterator.SquadIterator(data: Dict[str, List[Tuple[Any, Any]]], seed: Optional[int] = None, shuffle: bool = True, *args, **kwargs)[source]¶
SquadIterator allows iterating over examples in SQuAD-like datasets. It is used to train torch_transformers_squad:TorchTransformersSquad. It extracts context, question, answer_text and the answer_start position from the dataset. An example from the dataset is a tuple of (context, question) and (answer_text, answer_start).
-
train ¶ train examples
-
valid ¶ validation examples
-
test ¶ test examples
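The example layout described above can be shown with a tiny hand-made sample (the strings here are invented for illustration): x is a (context, question) pair, y is an (answer_text, answer_start) pair, and answer_start is a character offset into the context.

```python
# A minimal SQuAD-style example, matching the tuple layout described above.
context = "DeepPavlov is an open-source conversational AI library."
question = "What is DeepPavlov?"
answer_text = "an open-source conversational AI library"
answer_start = context.index(answer_text)

x = (context, question)
y = (answer_text, answer_start)

# answer_start indexes into the context, so the answer span can be recovered:
recovered = context[answer_start:answer_start + len(answer_text)]
```

Keeping the offset character-based (rather than token-based) is what makes the format tokenizer-agnostic; span-extraction models map it to token positions themselves.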
-
class deeppavlov.dataset_iterators.typos_iterator.TyposDatasetIterator(data: Dict[str, List[Tuple[Any, Any]]], seed: Optional[int] = None, shuffle: bool = True, *args, **kwargs)[source]¶
Implementation of DataLearningIterator used for training ErrorModel.