sampler

class SamplerHelper(dataset, iterable=None)[source]

Bases: object

SamplerHelper helps construct an iterable sampler for use with DataLoader. It wraps a dataset and uses its __getitem__ method. Every SamplerHelper subclass has to provide an __iter__() method, which iterates over the indices of dataset elements, and a __len__() method that returns the length of the returned iterators. It can also serve as a batch iterator rather than an index iterator: when it is initialized with an iterable dataset, the iterator yields samples instead of indices.

Note: The __len__() method isn't strictly required by DataLoader, but is expected in any calculation involving the length of a DataLoader.

Parameters
  • dataset (Dataset) – Input dataset for SamplerHelper.

  • iterable (collections.Iterable|callable, optional) – Iterator of dataset. Default: None.
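The contract described above (an __iter__() that yields indices and a __len__() that reports how many) can be illustrated with a minimal sketch. IndexSampler is a hypothetical stand-in written for this example, not the library class, and the handling of the iterable argument is an assumption based on the parameter description.

```python
class IndexSampler:
    """Hypothetical stand-in for illustration; not the library class."""

    def __init__(self, dataset, iterable=None):
        self.data_source = dataset
        self.iterable = iterable

    def __iter__(self):
        if self.iterable is None:
            # Default: iterate over indices 0 .. len(dataset) - 1.
            return iter(range(len(self.data_source)))
        if callable(self.iterable):
            # Assumption: a callable is invoked to obtain a fresh iterator.
            return iter(self.iterable())
        return iter(self.iterable)

    def __len__(self):
        return len(self.data_source)


dataset = ["a", "bb", "ccc"]
sampler = IndexSampler(dataset)

# The sampler yields indices; samples are fetched through __getitem__.
indices = list(sampler)
samples = [dataset[i] for i in indices]
```

Keeping the sampler index-based means a transformation only has to reorder or group integers, and the samples themselves are fetched lazily.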

property length

Returns: the length of the SamplerHelper.

apply(fn)[source]

Applies a transformation to the sampler. Available transformations include shuffle, sort, fit and shard.

Parameters

fn (callable) – Transformation to be performed. It returns the transformed iterable (and data_source).

Returns

A new transformed object.

Return type

SamplerHelper

shuffle(buffer_size=-1, seed=None)[source]

Shuffles the dataset according to the given buffer size and random seed.

Parameters
  • buffer_size (int, optional) – Buffer size for the shuffle. If buffer_size < 0 or greater than the length of the dataset, buffer_size is set to the length of the dataset. Default: -1.

  • seed (int, optional) – Seed for the random number generator. Default: None.

Returns

SamplerHelper
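A buffered shuffle of this kind can be sketched in plain Python. The function buffered_shuffle below is hypothetical and uses the standard random module; it only illustrates the buffer_size semantics described above and is not the library's implementation.

```python
import random


def buffered_shuffle(indices, buffer_size=-1, seed=None):
    """Shuffle indices within consecutive buffers of at most buffer_size items."""
    rng = random.Random(seed)
    buf = []
    for idx in indices:
        buf.append(idx)
        # A non-positive buffer_size never triggers a flush here, so the
        # whole input ends up in a single buffer (a full shuffle).
        if buffer_size > 0 and len(buf) >= buffer_size:
            rng.shuffle(buf)
            yield from buf
            buf = []
    if buf:
        rng.shuffle(buf)
        yield from buf


# With buffer_size=4, indices 0..3 are shuffled among themselves, then 4..7,
# then the remaining 8..9.
shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
```

A small buffer bounds memory use but only mixes neighbouring items; a buffer as large as the dataset gives a full shuffle.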

sort(cmp=None, key=None, reverse=False, buffer_size=-1)[source]

Sorts samples according to the given callable cmp or key.

Parameters
  • cmp (callable, optional) – The comparison function. Default: None.

  • key (callable, optional) – Returns the element to be compared. Default: None.

  • reverse (bool, optional) – If True, sort in descending order; if False, sort in ascending order. Default: False.

  • buffer_size (int, optional) – Buffer size for the sort. If buffer_size < 0 or greater than the length of the data, buffer_size is set to the length of the data. Default: -1.

Returns

SamplerHelper
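The effect of buffer_size on sorting can be shown with a small sketch. The helper buffered_sort is hypothetical, and its key takes only an index (closing over the dataset); the exact arguments the library passes to key or cmp are not specified here, so treat that calling convention as an assumption.

```python
def buffered_sort(indices, key, reverse=False, buffer_size=-1):
    """Sort indices within consecutive buffers of at most buffer_size items."""
    buf = []
    for idx in indices:
        buf.append(idx)
        if buffer_size > 0 and len(buf) >= buffer_size:
            yield from sorted(buf, key=key, reverse=reverse)
            buf = []
    if buf:
        yield from sorted(buf, key=key, reverse=reverse)


dataset = ["ccc", "a", "bb", "dddd", "e"]


def by_length(idx):
    # Compare samples by their length.
    return len(dataset[idx])


# Whole-dataset sort (buffer_size=-1): shortest samples first.
fully_sorted = list(buffered_sort(range(len(dataset)), key=by_length))

# Sorting in windows of two only orders each pair locally.
windowed = list(buffered_sort(range(len(dataset)), key=by_length, buffer_size=2))
```

Sorting by length before batching is a common way to reduce padding, and a bounded buffer keeps the ordering approximately random across the dataset.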

batch(batch_size, drop_last=False, batch_size_fn=None, key=None)[source]

Produces a BatchSampler.

Parameters
  • batch_size (int) – Batch size.

  • drop_last (bool, optional) – Whether to drop the last mini-batch. Default: False.

  • batch_size_fn (callable, optional) – It accepts four arguments: the index into the data source, the length of the mini-batch, the size of the mini-batch so far, and the data source. It returns the updated size of the mini-batch so far. The returned value can in fact be anything; it is passed as the argument size_so_far to key. If None, the length of the mini-batch is used. Default: None.

  • key (callable, optional) – It accepts the size of the mini-batch so far and the length of the mini-batch, and returns the value to be compared with batch_size. If None, only the size of the mini-batch so far is compared with batch_size. Default: None.

Returns

SamplerHelper
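The interplay of batch_size_fn and key is easier to follow with an example. The sketch below is a hypothetical batch_indices function that follows the argument descriptions above; how an overflowing mini-batch is handled (emit everything except the last index and carry that index over) is an assumption of this sketch, not documented behavior.

```python
def batch_indices(indices, data_source, batch_size, drop_last=False,
                  batch_size_fn=None, key=None):
    """Group indices into mini-batches (illustrative sketch only)."""
    if batch_size_fn is None:
        # Default: the size so far is simply the number of samples.
        batch_size_fn = lambda idx, count, size_so_far, data: count
    if key is None:
        # Default: compare only the size so far with batch_size.
        key = lambda size_so_far, count: size_so_far

    minibatch, size_so_far = [], 0
    for idx in indices:
        minibatch.append(idx)
        size_so_far = batch_size_fn(idx, len(minibatch), size_so_far, data_source)
        measure = key(size_so_far, len(minibatch))
        if measure == batch_size:
            yield minibatch
            minibatch, size_so_far = [], 0
        elif measure > batch_size:
            if len(minibatch) == 1:
                raise ValueError("a single sample exceeds batch_size")
            # Assumption: emit all but the last index and carry it over.
            yield minibatch[:-1]
            minibatch = minibatch[-1:]
            size_so_far = batch_size_fn(idx, 1, 0, data_source)
    if minibatch and not drop_last:
        yield minibatch


dataset = ["aaaa", "bb", "c", "dddddd", "ee"]

# Fixed number of samples per batch.
fixed = list(batch_indices(range(5), dataset, batch_size=2))
dropped = list(batch_indices(range(5), dataset, batch_size=2, drop_last=True))

# Budget of 12 padded characters per batch: size_so_far tracks the longest
# sample, and key multiplies it by the number of samples in the batch.
longest = lambda idx, count, size_so_far, data: max(size_so_far, len(data[idx]))
padded = lambda size_so_far, count: size_so_far * count
budgeted = list(batch_indices(range(5), dataset, batch_size=12,
                              batch_size_fn=longest, key=padded))
```

Splitting the logic into a running statistic (batch_size_fn) and a comparison value (key) lets the same loop express both count-based and padded-size-based batching.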

shard(num_replicas=None, rank=None)[source]

Slices the data across multiple GPUs so that each training process handles its own portion.

Parameters
  • num_replicas (int, optional) – The number of training processes, which is also the number of GPU cards used in training. Default: None.

  • rank (int, optional) – The rank (index) of the current training process. Equal to the value of the environment variable PADDLE_TRAINER_ID. Default: None.

Returns

SamplerHelper
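One common way to slice indices across processes is round-robin assignment by position. The sketch below assumes that scheme; shard_indices is a hypothetical helper, and the description above does not say whether the library pads shards to equal length, so no padding is done here.

```python
def shard_indices(indices, num_replicas, rank):
    """Yield the indices whose position maps to this rank (round-robin)."""
    for position, idx in enumerate(indices):
        if position % num_replicas == rank:
            yield idx


# Simulate three training processes over seven samples. In a real job each
# process would compute only its own shard, with rank taken from the
# launcher (for example the PADDLE_TRAINER_ID environment variable).
num_replicas = 3
shards = [list(shard_indices(range(7), num_replicas, rank))
          for rank in range(num_replicas)]
```

Every index lands in exactly one shard, so the processes together cover the dataset without overlap. The shards can differ in length by one when the dataset size is not divisible by num_replicas.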

list()[source]

Produces a sampler backed by a list iterator when iter is called. Because list fetches all contents at once, the resulting sampler can report an accurate length.

Returns

SamplerHelper
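The reason materializing into a list yields an accurate length can be seen with plain Python generators. The names below are hypothetical and only illustrate the idea.

```python
def lazy_even_indices(n):
    """A lazy pipeline stage: its length is unknown until it is consumed."""
    for i in range(n):
        if i % 2 == 0:
            yield i


lazy = lazy_even_indices(7)
try:
    len(lazy)
    has_length = True
except TypeError:
    # Generators do not support len().
    has_length = False

# Materializing fetches every element once, so the length becomes exact.
materialized = list(lazy_even_indices(7))
exact_length = len(materialized)
```

The trade-off is memory: the whole sequence of indices (or batches) is held at once instead of being produced on demand.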