dataset
-
class MapDataset(data, **kwargs)
Bases: paddle.fluid.dataloader.dataset.Dataset
Wraps a dataset-like object as an instance of Dataset and equips it with map and other utility methods. All non-magic methods of the raw object are also accessible.
Parameters
data (list or Dataset) – A dataset-like object. It can be a list or a subclass of Dataset.
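The remark about non-magic methods follows from how Python looks up attributes. Below is a minimal sketch of a list wrapper that forwards unknown attributes to the wrapped object. The class name WrappedDataset is hypothetical, and this is not PaddleNLP's MapDataset code. Forwarding through __getattr__ reaches ordinary methods such as list.count. It never reaches magic methods such as __len__, because Python looks those up on the class, so the wrapper has to define them itself.

```python
from typing import Any, List


class WrappedDataset:
    """Hypothetical wrapper around a list; not PaddleNLP's MapDataset."""

    def __init__(self, data: List[Any]) -> None:
        self.data = data

    # Python looks up magic methods such as __len__ and __getitem__ on the
    # class itself, so they must be defined here; __getattr__ never sees them.
    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> Any:
        return self.data[idx]

    # Called only when normal attribute lookup fails: forward the name
    # (e.g. list.count or list.index) to the wrapped object.
    def __getattr__(self, name: str) -> Any:
        if name == "data":
            # Avoid infinite recursion if self.data has not been set yet.
            raise AttributeError(name)
        return getattr(self.data, name)


ds = WrappedDataset(["pos", "neg", "pos"])
print(len(ds), ds[1], ds.count("pos"))  # 3 neg 2
```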
-
filter(fn)
Filters samples with the filter function and uses the filtered data to update this dataset.
Parameters
fn (callable) – A filter function that takes a sample as input and returns a boolean. Samples for which it returns False are discarded.
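The sketch below shows what "update this dataset" means for a list-backed dataset: the stored samples are replaced by the ones that pass the predicate. FilterableDataset is a hypothetical stand-in, not PaddleNLP's class. Returning self so that calls can be chained is a choice made for this sketch.

```python
from typing import Any, Callable, List


class FilterableDataset:
    """Hypothetical list-backed dataset; not PaddleNLP's MapDataset."""

    def __init__(self, data: List[Any]) -> None:
        self.data = list(data)

    def filter(self, fn: Callable[[Any], bool]) -> "FilterableDataset":
        # Replace the stored samples with those for which fn returns True.
        self.data = [sample for sample in self.data if fn(sample)]
        return self


ds = FilterableDataset([{"text": "good"}, {"text": ""}, {"text": "bad"}])
ds.filter(lambda sample: len(sample["text"]) > 0)
print(ds.data)  # [{'text': 'good'}, {'text': 'bad'}]
```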
-
shard(num_shards=None, index=None)
Uses the samples whose indices modulo num_shards equal index to update this dataset.
Parameters
num_shards (int, optional) – An integer representing the number of data shards. If None, num_shards defaults to the number of trainers. Default: None.
index (int, optional) – An integer representing the index of the current shard. If None, index defaults to the current trainer rank id. Default: None.
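The function below is a minimal sketch of modulo sharding over a list. It assumes the rule "keep position i when i % num_shards == index", which is what the num_shards and index parameters describe. The function name shard_list is hypothetical. Rather than falling back to the trainer count and rank, the sketch takes both values explicitly, so it runs without a distributed setup. The sketch does not reproduce any other behavior the library may have.

```python
from typing import Any, List


def shard_list(data: List[Any], num_shards: int, index: int) -> List[Any]:
    """Keep the samples whose position modulo num_shards equals index."""
    if not 0 <= index < num_shards:
        raise ValueError("index must satisfy 0 <= index < num_shards")
    # data[index::num_shards] selects positions index, index + num_shards, ...,
    # which are exactly the i with i % num_shards == index.
    return data[index::num_shards]


samples = list(range(10))
shards = [shard_list(samples, num_shards=3, index=i) for i in range(3)]
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

The shards are disjoint and together cover every sample. When the dataset size is not divisible by num_shards, they can differ in length by one.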
-
map(fn, lazy=True, batched=False)
Applies a function to the dataset to transform and update every sample.
Parameters
fn (callable) – The transformation to perform. It receives a single sample as its argument if batched is False; otherwise it receives all examples.
lazy (bool, optional) – If True, transformations are delayed and performed on demand. Otherwise, all samples are transformed at once. Note that if fn is stochastic, lazy should be True; otherwise you will get the same result in every epoch. Default: True.
batched (bool, optional) – If True, fn takes all examples as input and returns a collection of transformed examples. Note that if batched is True, the lazy option is ignored. Default: False.
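The sketch below models the three modes of map on a list. The class MappableDataset is a hypothetical stand-in and is not PaddleNLP's implementation.

- **Lazy mode** stores fn and re-applies it on every access.
- **Eager mode** transforms all samples once and stores the results.
- **Batched mode** passes the whole list to fn and ignores lazy.

A counter takes the place of a random augmentation here. It shows why a stochastic fn needs lazy=True: only the lazy dataset produces a fresh value each time a sample is read.

```python
import itertools
from typing import Any, Callable, List


class MappableDataset:
    """Hypothetical list-backed dataset; not PaddleNLP's MapDataset."""

    def __init__(self, data: List[Any]) -> None:
        self.data = list(data)
        self._pending: List[Callable[[Any], Any]] = []  # lazy transforms

    def _materialize(self) -> None:
        # Apply any pending lazy transforms so later eager steps see them.
        self.data = [self[i] for i in range(len(self))]
        self._pending = []

    def map(self, fn: Callable[[Any], Any], lazy: bool = True,
            batched: bool = False) -> "MappableDataset":
        if batched:
            # fn receives every sample at once; lazy is ignored.
            self._materialize()
            self.data = list(fn(self.data))
        elif lazy:
            # Defer: fn runs each time a sample is accessed.
            self._pending.append(fn)
        else:
            # Eager: transform all samples now and store the results.
            self._materialize()
            self.data = [fn(sample) for sample in self.data]
        return self

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> Any:
        sample = self.data[idx]
        for fn in self._pending:
            sample = fn(sample)
        return sample


lazy_ticker = itertools.count()
lazy_ds = MappableDataset(["a", "b"]).map(lambda s: f"{s}{next(lazy_ticker)}")
print(lazy_ds[0], lazy_ds[0])  # a0 a1  (fn re-runs on every access)

eager_ticker = itertools.count()
eager_ds = MappableDataset(["a", "b"]).map(lambda s: f"{s}{next(eager_ticker)}", lazy=False)
print(eager_ds[0], eager_ds[0])  # a0 a0  (computed once, then stored)
```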
-
-
class DatasetBuilder(lazy=None, name=None, **config)
Bases: object
A base class for all dataset builders. It provides a read() function that turns a data file into a MapDataset or an IterDataset. Subclasses should implement the _get_data() function to download the data file and the _read() function to read the data file into an Iterable of examples.
-
read(filename, split='train')
Returns a dataset containing all the examples that can be read from the file path. If self.lazy is False, this eagerly reads all instances from self._read() and returns a MapDataset. If self.lazy is True, this returns an IterDataset, which internally relies on the generator created from self._read() to lazily produce examples. In this case, your implementation of _read() must also be lazy (that is, it must not load all examples into memory at once).
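The sketch below illustrates the read()/_read() split and the lazy switch, with a generator as the _read() implementation. It is not PaddleNLP's DatasetBuilder, and several parts are assumptions:

- The class name TsvBuilder is hypothetical.
- The tab-separated "text, label" line format is made up for this example.
- read takes an iterable of lines (an open file or io.StringIO) instead of a filename, so the code runs without a data file.
- A plain list and the raw generator stand in for MapDataset and IterDataset.
- Downloading, the job of _get_data(), is omitted.

```python
import io
from typing import Any, Dict, Iterable, Iterator, List, Union


class TsvBuilder:
    """Hypothetical builder; not PaddleNLP's DatasetBuilder."""

    def __init__(self, lazy: bool = False) -> None:
        self.lazy = lazy

    def _read(self, lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
        # A generator: each line is parsed only when the next example is requested.
        for line in lines:
            text, label = line.rstrip("\n").split("\t")
            yield {"text": text, "label": int(label)}

    def read(self, lines: Iterable[str]) -> Union[List[Dict[str, Any]], Iterator[Dict[str, Any]]]:
        if self.lazy:
            # Stand-in for IterDataset: return the unconsumed generator.
            return self._read(lines)
        # Stand-in for MapDataset: load every example into memory now.
        return list(self._read(lines))


data = "great movie\t1\nboring plot\t0\n"
eager = TsvBuilder(lazy=False).read(io.StringIO(data))
lazy = TsvBuilder(lazy=True).read(io.StringIO(data))
print(len(eager), next(lazy))  # 2 {'text': 'great movie', 'label': 1}
```

Because _read() yields one example at a time, the lazy path never holds the whole file in memory. A _read() that built and returned a full list would defeat the lazy mode.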
-
-
class IterDataset(data, **kwargs)
Bases: paddle.fluid.dataloader.dataset.IterableDataset
Wraps a dataset-like object as an instance of Dataset and equips it with map and other utility methods. All non-magic methods of the raw object are also accessible.
Parameters
data (Iterable or Dataset) – A dataset-like object. It can be an Iterable or a subclass of Dataset.
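Unlike a map-style dataset, an iterable dataset has no length or indexing and produces samples only while it is being iterated. The sketch below is a hypothetical wrapper called StreamDataset, not PaddleNLP's IterDataset. Its map records transforms and applies them inside __iter__.

```python
from typing import Any, Callable, Iterable, Iterator, List


class StreamDataset:
    """Hypothetical iterable wrapper; not PaddleNLP's IterDataset."""

    def __init__(self, data: Iterable[Any]) -> None:
        self.data = data
        self._transforms: List[Callable[[Any], Any]] = []

    def map(self, fn: Callable[[Any], Any]) -> "StreamDataset":
        # Nothing runs yet; fn is recorded and applied during iteration.
        self._transforms.append(fn)
        return self

    def __iter__(self) -> Iterator[Any]:
        # Samples are produced one at a time; there is no len() or indexing.
        for sample in self.data:
            for fn in self._transforms:
                sample = fn(sample)
            yield sample


ds = StreamDataset(range(3)).map(lambda x: x * 10).map(str)
print(list(ds), list(ds))  # ['0', '10', '20'] ['0', '10', '20']
```

The second pass works because range can be iterated repeatedly. A wrapped generator object would be exhausted after the first pass, so a multi-epoch loop needs a re-iterable source.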
-
filter(fn)
Filters samples with the filter function and uses the filtered data to update this dataset.
Parameters
fn (callable) – A filter function that takes a sample as input and returns a boolean. Samples for which it returns False are discarded.
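For a stream, filtering can only happen as samples arrive. The generator below is a hypothetical sketch of that per-sample decision, not the library's code. Python's built-in filter() behaves the same way; the explicit loop just makes the logic visible. Because nothing is materialized, it works even on an unbounded stream.

```python
import itertools
from typing import Any, Callable, Iterable, Iterator


def lazy_filter(stream: Iterable[Any], fn: Callable[[Any], bool]) -> Iterator[Any]:
    """Yield only the samples for which fn returns True, one at a time."""
    for sample in stream:
        if fn(sample):
            yield sample


# itertools.count() never ends, yet this finishes: samples are tested only as
# islice pulls them, and nothing is stored in between.
evens = lazy_filter(itertools.count(), lambda n: n % 2 == 0)
print(list(itertools.islice(evens, 4)))  # [0, 2, 4, 6]
```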
-
shard(num_shards=None, index=None)
Uses the samples whose indices modulo num_shards equal index to update this dataset.
Parameters
num_shards (int, optional) – An integer representing the number of data shards. If None, num_shards defaults to the number of trainers. Default: None.
index (int, optional) – An integer representing the index of the current shard. If None, index defaults to the current trainer rank id. Default: None.
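A stream has no random access, so the sketch below applies the modulo rule to a running count of the samples seen so far. The function name shard_stream is hypothetical, and this is not PaddleNLP's implementation. The sketch assumes every trainer iterates the same stream in the same order. Under that assumption, each index value receives a disjoint share of the samples. num_shards and index are passed explicitly instead of being derived from the trainer setup.

```python
import itertools
from typing import Any, Iterable, Iterator


def shard_stream(stream: Iterable[Any], num_shards: int, index: int) -> Iterator[Any]:
    """Yield the samples whose running count modulo num_shards equals index."""
    # A stream has no indices, so count samples as they arrive instead.
    for count, sample in enumerate(stream):
        if count % num_shards == index:
            yield sample


print(list(shard_stream("abcdefg", num_shards=3, index=1)))  # ['b', 'e']
# The same selection expressed with the standard library:
print(list(itertools.islice("abcdefg", 1, None, 3)))  # ['b', 'e']
```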
-