DatasetCreator
- class myoverse.datasets.DatasetCreator(modalities, sampling_frequency=2048.0, tasks_to_use=(), save_path=PosixPath('dataset.zip'), test_ratio=0.2, val_ratio=0.2, time_chunk_size=256, debug_level=0)
Creates datasets stored in zarr for direct tensor loading.
Data is stored in zarr format with metadata for dimension names, enabling direct loading to GPU tensors with named dimensions.
- Parameters:
modalities (dict[str, Modality]) – Dictionary mapping modality names to Modality configs.
sampling_frequency (float) – Sampling frequency in Hz.
tasks_to_use (Sequence[str]) – Task keys to include (empty = all).
save_path (Path | str) – Output path for the zarr zip file (e.g., “dataset.zip”).
test_ratio (float) – Ratio for test split (0.0-1.0).
val_ratio (float) – Ratio for validation split (0.0-1.0).
time_chunk_size (int) – Chunk size along time dimension for zarr storage.
debug_level (int) – Debug output level (0=none, 1=text, 2=text+graphs).
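To make the time_chunk_size and sampling_frequency parameters concrete, the following minimal sketch works out how a recording of a given length would divide into time chunks. The helper chunk_layout_sketch is hypothetical and not part of myoverse; it assumes that chunks are cut only along the time axis and that a final partial chunk is allowed, which is how chunked array stores usually behave.

```python
import math


def chunk_layout_sketch(n_samples, time_chunk_size=256, sampling_frequency=2048.0):
    """Hypothetical helper: describe a time-chunked layout for one recording.

    Returns (number of chunks, duration of a full chunk in seconds,
    number of samples in the last chunk).
    """
    if n_samples <= 0 or time_chunk_size <= 0:
        raise ValueError("n_samples and time_chunk_size must be positive")

    # Number of chunks needed to cover the whole time axis.
    n_chunks = math.ceil(n_samples / time_chunk_size)
    # Duration covered by one full chunk.
    chunk_seconds = time_chunk_size / sampling_frequency
    # The last chunk may be shorter than time_chunk_size.
    last_chunk = n_samples - (n_chunks - 1) * time_chunk_size
    return n_chunks, chunk_seconds, last_chunk


n_chunks, chunk_seconds, last_chunk = chunk_layout_sketch(10_000)
print(n_chunks, chunk_seconds, last_chunk)
```

With the default values, one full chunk spans 256 / 2048 = 0.125 s of signal, so a smaller time_chunk_size gives finer-grained reads at the cost of more chunks to manage.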
Examples
>>> from myoverse.datasets import DatasetCreator, Modality
>>>
>>> creator = DatasetCreator(
...     modalities={
...         "emg": Modality(path="emg.pkl", dims=("channel", "time")),
...         "kinematics": Modality(path="kin.pkl", dims=("joint", "time")),
...     },
...     sampling_frequency=2048.0,
...     save_path="dataset.zip",
... )
>>> creator.create()
>>>
>>> # Load directly to GPU tensors
>>> from myoverse.datasets import DataModule
>>> dm = DataModule("dataset.zip", device="cuda")
Methods
__init__(modalities[, sampling_frequency, ...])
_extract_center(data, ratio) – Extract center portion of data along last axis.
_print_header([title]) – Print dataset summary.
_process_all_tasks(store)
_process_task(task, store) – Process a single task for all modalities.
_split_continuous(data) – Split continuous data along time axis (last dimension).
_store_array(group, name, data) – Store an array with time-chunked layout.
create() – Create the dataset.
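The descriptions of _extract_center and _split_continuous suggest a split made along the last (time) axis. The sketch below shows one plausible way such a split could work with NumPy. It is not the library's implementation: the function names extract_center_sketch and split_continuous_sketch are hypothetical, and the choice to hold out the center of the recording and divide it into test and validation segments is an assumption made for illustration.

```python
import numpy as np


def extract_center_sketch(data, ratio):
    """Return the central `ratio` fraction of `data` along the last axis,
    together with the start and stop indices of that slice."""
    n = data.shape[-1]
    k = int(round(n * ratio))
    start = (n - k) // 2
    stop = start + k
    return data[..., start:stop], start, stop


def split_continuous_sketch(data, test_ratio=0.2, val_ratio=0.2):
    """Assumed scheme: hold out the center block, then divide it into
    test and validation parts; the two outer parts form the training set."""
    held_out, start, stop = extract_center_sketch(data, test_ratio + val_ratio)
    # Training data: everything outside the held-out center block.
    train = np.concatenate([data[..., :start], data[..., stop:]], axis=-1)
    # First part of the center block is test, the remainder is validation.
    n_test = int(round(data.shape[-1] * test_ratio))
    test = held_out[..., :n_test]
    val = held_out[..., n_test:]
    return train, test, val


# Toy array with dims ("channel", "time"): 2 channels, 1000 samples.
emg = np.arange(2 * 1000, dtype=np.float32).reshape(2, 1000)
train, test, val = split_continuous_sketch(emg)
print(train.shape, test.shape, val.shape)
```

Slicing with `...` keeps all leading dimensions intact, so the same sketch applies to arrays with dims such as ("joint", "time") without modification.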