Creating a dataset
This example shows how to create a multi-modal dataset for training.
Creating a Dataset with Multiple Modalities
MyoVerse stores continuous data with named dimensions (xarray + zarr). You can store any number of modalities; which ones serve as inputs and which as targets is decided at training time, not at storage time.
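To make the "decide at training time" idea concrete, here is a minimal sketch in plain Python. The helper `split_inputs_targets` and the toy `sample` dictionary are hypothetical illustrations and not part of the MyoVerse API; the only assumption is that a stored sample can be viewed as a mapping from modality name to data.

```python
# Hypothetical helper, NOT part of MyoVerse: choose inputs and targets by
# modality name when training starts, leaving the stored data untouched.
def split_inputs_targets(sample, input_names, target_names):
    missing = [n for n in (*input_names, *target_names) if n not in sample]
    if missing:
        raise KeyError(f"unknown modalities: {missing}")
    inputs = {name: sample[name] for name in input_names}
    targets = {name: sample[name] for name in target_names}
    return inputs, targets


# Toy stand-in for one stored window (real data would be arrays).
sample = {
    "emg": [[0.1, 0.2], [0.3, 0.4]],   # (channel, time)
    "kinematics": [[[1.0, 1.1]]],      # (joint, xyz, time)
}

# The same stored sample supports two different training setups.
emg_to_kin = split_inputs_targets(sample, ["emg"], ["kinematics"])
kin_to_emg = split_inputs_targets(sample, ["kinematics"], ["emg"])
```

Because the roles are chosen by name at this point, adding a third modality to storage later would not require changing how existing modalities are saved.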
from pathlib import Path

import myoverse
from myoverse.datasets import DatasetCreator, Modality

# Get the path to the data file
# Find data directory relative to myoverse package (works in all contexts)
_pkg_dir = Path(myoverse.__file__).parent.parent
DATA_DIR = _pkg_dir / "examples" / "data"
if not DATA_DIR.exists():
    DATA_DIR = Path.cwd() / "examples" / "data"

# Create dataset with multiple modalities
creator = DatasetCreator(
    modalities={
        "emg": Modality(
            path=DATA_DIR / "emg.pkl",
            dims=("channel", "time"),
        ),
        "kinematics": Modality(
            path=DATA_DIR / "kinematics.pkl",
            dims=("joint", "xyz", "time"),
        ),
    },
    sampling_frequency=2044.0,
    tasks_to_use=["1", "2"],
    save_path=DATA_DIR / "dataset.zip",
    test_ratio=0.2,
    val_ratio=0.2,
    debug_level=1,
)
creator.create()
────────────────────────── STARTING DATASET CREATION ───────────────────────────
Dataset Configuration
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Parameter               ┃ Value                                              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Modalities              │ emg, kinematics                                    │
│ Sampling frequency (Hz) │ 2044.0                                             │
│ Save path               │ /home/runner/work/MyoVerse/MyoVerse/examples/data… │
│ Test ratio              │ 0.2                                                │
│ Validation ratio        │ 0.2                                                │
└─────────────────────────┴────────────────────────────────────────────────────┘
Processing 2 tasks: 1, 2
Dataset Structure
├── emg dims=('channel', 'time')
│   ├── Task 1: (320, 20440)
│   └── Task 2: (320, 20440)
└── kinematics dims=('joint', 'xyz', 'time')
    ├── Task 1: (21, 3, 20440)
    └── Task 2: (21, 3, 20440)
─────────────────────────────── PROCESSING TASKS ───────────────────────────────
Processing task 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
────────────────────────── DATASET CREATION COMPLETED ──────────────────────────
Dataset Summary
Split       emg               kinematics
training    1: (320, 16352)   1: (21, 3, 16352)
            2: (320, 16352)   2: (21, 3, 16352)
validation  1: (320, 816)     1: (21, 3, 816)
            2: (320, 816)     2: (21, 3, 816)
testing     1: (320, 3272)    1: (21, 3, 3272)
            2: (320, 3272)    2: (21, 3, 3272)
Total size: 26.09 MB
─────────────────── Dataset Creation Successfully Completed! ───────────────────
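The split sizes in the summary can be sanity-checked with a few lines of arithmetic. The sketch below only uses numbers copied from the output above. The reading that `test_ratio` holds out 20% of each recording and `val_ratio` then takes roughly 20% of that held-out part is inferred from these numbers, not from documented behavior, and the exact rounding rule is not stated here.

```python
# Values copied from the console output above.
sampling_frequency = 2044.0  # Hz
total_samples = 20440        # time samples per task before splitting
split_samples = {"training": 16352, "validation": 816, "testing": 3272}

# The three splits together should cover the whole recording.
assert sum(split_samples.values()) == total_samples

# Everything that is not training data (validation + testing).
held_out = total_samples - split_samples["training"]

# Fraction of the recording in each split, and validation's share of the
# held-out part (an inferred interpretation of val_ratio, see the lead-in).
fractions = {name: n / total_samples for name, n in split_samples.items()}
val_share_of_held_out = split_samples["validation"] / held_out

# Duration of each split in seconds at the given sampling frequency.
durations = {name: n / sampling_frequency for name, n in split_samples.items()}

for name in split_samples:
    print(f"{name:<10} {split_samples[name]:>6} samples  "
          f"{durations[name]:6.3f} s  {fractions[name]:6.1%}")
print(f"validation share of held-out data: {val_share_of_held_out:.3f}")
```

The validation share comes out slightly below 0.2, which suggests the split boundaries are rounded or aligned in some way that this page does not spell out.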
Total running time of the script: (0 minutes 0.754 seconds)
Estimated memory usage: 626 MB