ChunkizeDataFilter

class myoverse.datasets.filters.generic.ChunkizeDataFilter(input_is_chunked, is_output=False, name=None, run_checks=True, *, chunk_size=None, chunk_shift=None, chunk_overlap=None)

Filter that chunks the input array into overlapping or non-overlapping segments.

This filter divides a continuous signal into chunks along the last dimension. It’s useful for preparing data for window-based analysis or for applying sliding window techniques.

Parameters:
  • input_is_chunked (bool) – Whether the input is already chunked. Must be False, because this filter only accepts unchunked input.

  • is_output (bool, optional) – Whether the filter is an output filter. If True, any dataset pipeline outputs the resulting signal. By default False.

  • name (str, optional) – The name of the filter, by default None.

  • run_checks (bool) –

    Whether to run the input checks when filtering, by default True. Setting this to False skips the checks and can improve performance.

    Warning

    If False, the user is responsible for ensuring that the input array is valid.

  • chunk_size (int) – The size of each chunk along the last dimension.

  • chunk_shift (int, optional) – The shift between the starts of consecutive chunks. If provided, chunk_overlap is ignored. A shift smaller than chunk_size produces overlapping chunks; the smaller the shift, the larger the overlap.

  • chunk_overlap (int, optional) – The overlap between consecutive chunks. Used only when chunk_shift is not provided, in which case the shift is chunk_size - chunk_overlap.

Raises:
  • ValueError – If input_is_chunked is True (this filter only accepts unchunked input).

  • ValueError – If chunk_size is not specified.

  • ValueError – If neither chunk_shift nor chunk_overlap is specified.

  • ValueError – If chunk_shift is less than 1.

  • ValueError – If chunk_overlap is less than 0 or greater than chunk_size.
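The parameter rules and errors listed above can be summarized in a small sketch. The helper `resolve_chunk_shift` is hypothetical and not part of myoverse; it only mirrors the documented behavior (including chunk_shift taking precedence when both values are given) and may differ from the actual checks, for example in the order in which errors are raised.

```python
def resolve_chunk_shift(chunk_size=None, chunk_shift=None, chunk_overlap=None):
    """Hypothetical helper mirroring the documented rules; not the myoverse code."""
    if chunk_size is None:
        raise ValueError("chunk_size must be specified.")
    if chunk_shift is None and chunk_overlap is None:
        raise ValueError("Either chunk_shift or chunk_overlap must be specified.")
    if chunk_shift is not None:
        # chunk_shift takes precedence, so chunk_overlap is not consulted here.
        if chunk_shift < 1:
            raise ValueError("chunk_shift must be at least 1.")
        return chunk_shift
    # Only chunk_overlap was given: apply the documented bounds as stated.
    if chunk_overlap < 0 or chunk_overlap > chunk_size:
        raise ValueError("chunk_overlap must be between 0 and chunk_size.")
    return chunk_size - chunk_overlap


shift_from_shift = resolve_chunk_shift(chunk_size=100, chunk_shift=100)
shift_from_overlap = resolve_chunk_shift(chunk_size=100, chunk_overlap=50)
shift_from_both = resolve_chunk_shift(chunk_size=100, chunk_shift=25, chunk_overlap=50)
```

With a chunk size of 100, an overlap of 50 resolves to a shift of 50, and an explicit shift of 25 wins over an overlap of 50.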

Examples

>>> import numpy as np
>>> from myoverse.datasets.filters.generic import ChunkizeDataFilter
>>> # Create data
>>> data = np.random.rand(10, 1000)
>>> # Create non-overlapping chunks
>>> no_overlap = ChunkizeDataFilter(
...     chunk_size=100,
...     chunk_shift=100,
...     input_is_chunked=False
... )
>>> chunked_data = no_overlap(data)  # shape: (10, 10, 100)
>>> # Create overlapping chunks
>>> with_overlap = ChunkizeDataFilter(
...     chunk_size=100,
...     chunk_overlap=50,
...     input_is_chunked=False
... )
>>> overlapped_data = with_overlap(data)  # shape: (19, 10, 100)

Notes

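The output shape will be (n_chunks, *original_dims, chunk_size), where original_dims are the input dimensions other than the last one and input_length is the size of the last dimension. The number of chunks is n_chunks = (input_length - chunk_size) // chunk_shift + 1 when chunk_shift is given, or n_chunks = (input_length - chunk_size) // (chunk_size - chunk_overlap) + 1 when only chunk_overlap is given.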

When both chunk_shift and chunk_overlap are provided, chunk_shift takes precedence.
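As a rough check of the shape formula, the following sketch reproduces the documented output shape with plain NumPy. The function `chunkize_sketch` is an illustrative stand-in, not the library's implementation; it assumes chunks are taken along the last axis in order and that the chunk axis is moved to the front.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def chunkize_sketch(data, chunk_size, chunk_shift):
    """Illustrative stand-in for the chunking step; not the myoverse code."""
    # Every window of length chunk_size along the last axis:
    # shape (*leading_dims, input_length - chunk_size + 1, chunk_size)
    windows = sliding_window_view(data, chunk_size, axis=-1)
    # Keep only the windows that start at multiples of chunk_shift.
    windows = windows[..., ::chunk_shift, :]
    # Move the chunk axis to the front: (n_chunks, *leading_dims, chunk_size)
    return np.moveaxis(windows, -2, 0)


data = np.random.rand(10, 1000)
chunk_size, chunk_overlap = 100, 50
chunk_shift = chunk_size - chunk_overlap  # shift implied by the overlap

chunks = chunkize_sketch(data, chunk_size, chunk_shift)
n_chunks = (data.shape[-1] - chunk_size) // chunk_shift + 1

print(chunks.shape, n_chunks)
```

For this input the formula gives 19 chunks, and the second chunk of the first row corresponds to samples 50 to 149 of that row.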

See also

_get_windows_with_shift

Efficient windowing function used in temporal filters

Methods

__init__(input_is_chunked[, is_output, ...])

_filter(input_array, **kwargs)

Chunk the input array into overlapping or non-overlapping segments.