Creating a dataset

This example shows how to create a multi-modal dataset for training.

Creating a Dataset with Multiple Modalities

MyoVerse stores continuous data as arrays with named dimensions, using xarray and zarr. You can store any number of modalities. Which ones serve as inputs and which as targets is decided at training time, not when the dataset is created.
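To make "named dimensions" concrete, here is a minimal sketch in plain xarray. It is not MyoVerse's own storage code. It builds an EMG-like array with the same dims as the `emg` modality below. The zero-filled data, the 1 s length and the `time` coordinate in seconds are illustrative assumptions, and writing to zarr is omitted.

```python
import numpy as np
import xarray as xr

fs = 2044.0  # Hz, same rate as in the example below
n_samples = int(fs)  # 1 s of placeholder data (assumption for illustration)

# A 320-channel EMG-like recording whose axes carry names instead of bare positions.
emg = xr.DataArray(
    np.zeros((320, n_samples), dtype=np.float32),
    dims=("channel", "time"),
    coords={"time": np.arange(n_samples) / fs},  # time axis in seconds
)

# Operations refer to dimensions by name, so they do not depend on axis order.
first_window = emg.isel(time=slice(0, 100))    # first 100 samples
per_channel_mean = emg.mean(dim="time")        # collapses the "time" axis
time_major = emg.transpose("time", "channel")  # reorder axes by name

print(first_window.sizes["time"], per_channel_mean.dims, time_major.dims)
```

Because each axis is addressed by name, code that selects a time window or averages over time keeps working even if another modality, such as kinematics with `("joint", "xyz", "time")`, has a different number or order of axes.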

from pathlib import Path

import myoverse
from myoverse.datasets import DatasetCreator, Modality

# Locate the example data: first relative to the myoverse package
# (e.g. an editable install from a source checkout), then fall back
# to the current working directory.
_pkg_dir = Path(myoverse.__file__).parent.parent
DATA_DIR = _pkg_dir / "examples" / "data"
if not DATA_DIR.exists():
    DATA_DIR = Path.cwd() / "examples" / "data"

# Create a dataset with two modalities; each Modality names the axes of its array
creator = DatasetCreator(
    modalities={
        "emg": Modality(
            path=DATA_DIR / "emg.pkl",
            dims=("channel", "time"),  # one name per array axis, in order
        ),
        "kinematics": Modality(
            path=DATA_DIR / "kinematics.pkl",
            dims=("joint", "xyz", "time"),
        ),
    },
    sampling_frequency=2044.0,  # Hz
    tasks_to_use=["1", "2"],  # tasks to include
    save_path=DATA_DIR / "dataset.zip",  # where the finished dataset is written
    test_ratio=0.2,
    val_ratio=0.2,
    debug_level=1,
)

creator.create()
────────────────────────── STARTING DATASET CREATION ───────────────────────────

                             Dataset Configuration
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Parameter               ┃ Value                                              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Modalities              │ emg, kinematics                                    │
│ Sampling frequency (Hz) │ 2044.0                                             │
│ Save path               │ /home/runner/work/MyoVerse/MyoVerse/examples/data… │
│ Test ratio              │ 0.2                                                │
│ Validation ratio        │ 0.2                                                │
└─────────────────────────┴────────────────────────────────────────────────────┘

Processing 2 tasks: 1, 2

Dataset Structure
├── emg dims=('channel', 'time')
│   ├── Task 1: (320, 20440)
│   └── Task 2: (320, 20440)
└── kinematics dims=('joint', 'xyz', 'time')
    ├── Task 1: (21, 3, 20440)
    └── Task 2: (21, 3, 20440)

─────────────────────────────── PROCESSING TASKS ───────────────────────────────

  Processing task 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
────────────────────────── DATASET CREATION COMPLETED ──────────────────────────

                Dataset Summary
 Split       emg              kinematics
 training    1: (320, 16352)  1: (21, 3, 16352)
             2: (320, 16352)  2: (21, 3, 16352)
 validation  1: (320, 816)    1: (21, 3, 816)
             2: (320, 816)    2: (21, 3, 816)
 testing     1: (320, 3272)   1: (21, 3, 3272)
             2: (320, 3272)   2: (21, 3, 3272)

Total size: 26.09 MB
─────────────────── Dataset Creation Successfully Completed! ───────────────────
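The split sizes in the summary can be checked with a little arithmetic. The sketch below uses only numbers printed above: 20440 samples per task at 2044 Hz, and the per-split lengths. It shows that each task covers 10 s and that training receives exactly 80% of the samples. It also shows that the 816 validation samples are roughly 20% of the remaining 20%, not 20% of the whole task. This reading of `val_ratio` is an inference from the printed sizes, not something the output states.

```python
# Numbers copied from the console output above.
fs = 2044.0                          # sampling_frequency in Hz
total = 20440                        # samples per task along "time"
train, val, test = 16352, 816, 3272  # per-task split lengths
test_ratio, val_ratio = 0.2, 0.2     # as passed to DatasetCreator

# Each task is 10 s long, and the three splits account for every sample.
duration_s = total / fs
assert train + val + test == total

# Training gets exactly (1 - test_ratio) of the samples...
assert train == round(total * (1 - test_ratio))

# ...and validation + test together make up the held-out test_ratio portion.
held_out = val + test
assert held_out == round(total * test_ratio)

# Validation is close to val_ratio of that held-out portion
# (816 / 4088 is about 0.1996), not val_ratio of the whole task.
assert abs(val / held_out - val_ratio) < 0.01

print(f"duration per task: {duration_s:.1f} s")
print(f"train: {train / fs:.1f} s, held out: {held_out / fs:.1f} s")
```

In other words, with these settings each 10 s task yields about 8 s of training data and 2 s shared between validation and testing.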

Total running time of the script: (0 minutes 0.754 seconds)

Estimated memory usage: 626 MB
