Creating a dataset

This example shows how to create a dataset for training a deep learning model.

We recreate the dataset used in our real-time paper [1].

from functools import partial
from pathlib import Path

import numpy as np
from scipy.signal import butter

from myoverse.datasets.filters.emg_augmentations import WaveletDecomposition
from myoverse.datasets.filters.generic import ApplyFunctionFilter, IndexDataFilter
from myoverse.datasets.filters.temporal import SOSFrequencyFilter
from myoverse.datasets.supervised import EMGDataset

dataset = EMGDataset(
    emg_data_path=Path(r"data/emg.pkl").resolve(),
    ground_truth_data_path=Path(r"data/kinematics.pkl").resolve(),
    sampling_frequency=2044.0,
    tasks_to_use=["1", "2"],
    save_path=Path(r"data/dataset.zarr").resolve(),
    emg_filter_pipeline_after_chunking=[
        [
            # Step 1: Butterworth band-stop (47.5-52.5 Hz) against 50 Hz
            # powerline interference.
            SOSFrequencyFilter(
                sos_filter_coefficients=butter(
                    4, [47.5, 52.5], "bandstop", output="sos", fs=2044
                ),
                is_output=True,
                name="Raw No Powerline",
            ),
            # Step 2: Butterworth low-pass at 20 Hz, next in the same chain.
            SOSFrequencyFilter(
                sos_filter_coefficients=butter(4, 20, "lowpass", output="sos", fs=2044),
                is_output=True,
                name="Raw No Powerline Lowpassed 20 Hz",
            ),
        ]
    ],
    emg_representations_to_filter_after_chunking=["Last"],
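    ground_truth_filter_pipeline_before_chunking=[
        [
            # Flatten to 63 rows; -1 lets NumPy infer the remaining axis.
            # Note: NumPy 2.1+ deprecates `newshape` in favor of `shape`.
            ApplyFunctionFilter(function=np.reshape, newshape=(63, -1)),
            # Keep rows 3..62, dropping the first three rows.
            IndexDataFilter(indices=(slice(3, 63),)),
        ]
    ],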
    ground_truth_representations_to_filter_before_chunking=["Input"],
    ground_truth_filter_pipeline_after_chunking=[
        [
            # Average each chunk over its last axis.
            ApplyFunctionFilter(
                function=partial(np.mean, axis=-1),
                is_output=True,
                name="Mean Kinematics per EMG Chunk",
            ),
        ]
    ],
    ground_truth_representations_to_filter_after_chunking=["Last"],
    testing_split_ratio=0.3,
    validation_split_ratio=0.1,
    augmentation_pipelines=[
        [WaveletDecomposition(nr_of_grids=5, is_output=True, level=2)]
    ],
)

dataset.create_dataset()
Filtering and splitting data: 100%|██████████| 2/2 [00:03<00:00,  1.90s/it]

Augmenting with [WaveletDecomposition (WaveletDecomposition)]: 100%|██████████| 317/317 [00:02<00:00, 125.18it/s]
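
The filter steps configured above can be tried in isolation with plain NumPy and SciPy. The sketch below is not MyoVerse code: it reuses the same `butter` coefficients on a synthetic signal and repeats the reshape, index, and mean steps on a random array. The zero-phase `sosfiltfilt` call, the test tones, the 21 x 3 x 1000 kinematics shape, and the chunk length of 192 samples are all assumptions made for illustration; MyoVerse may apply the coefficients differently.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 2044  # sampling frequency in Hz, as in the example above

# Same coefficients as the two SOSFrequencyFilter steps.
notch_sos = butter(4, [47.5, 52.5], "bandstop", output="sos", fs=FS)
lowpass_sos = butter(4, 20, "lowpass", output="sos", fs=FS)


def tone_amplitude(signal, freq, fs=FS):
    """Estimate the amplitude of the `freq` Hz component of a 1-D signal."""
    t = np.arange(signal.size) / fs
    return 2 * np.abs(np.mean(signal * np.exp(-2j * np.pi * freq * t)))


# Hypothetical test signal: 4 s of a 10 Hz tone plus 50 Hz interference.
t = np.arange(4 * FS) / FS
emg_like = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)

# Assumption: zero-phase filtering, chained in the same order as above.
no_powerline = sosfiltfilt(notch_sos, emg_like)
lowpassed = sosfiltfilt(lowpass_sos, no_powerline)

middle = slice(FS, 3 * FS)  # measure away from the edges (filter transients)
amp_50_before = tone_amplitude(emg_like[middle], 50)
amp_50_after = tone_amplitude(no_powerline[middle], 50)
amp_10_after = tone_amplitude(lowpassed[middle], 10)
print(f"50 Hz: {amp_50_before:.3f} -> {amp_50_after:.3g}")
print(f"10 Hz after both filters: {amp_10_after:.3f}")

# Ground-truth steps on hypothetical kinematics (21 x 3 x 1000 samples).
kinematics = np.random.default_rng(0).normal(size=(21, 3, 1000))
flat = np.reshape(kinematics, (63, -1))      # shape (63, 1000)
without_first_three = flat[(slice(3, 63),)]  # shape (60, 1000)
chunk = without_first_three[:, :192]         # one hypothetical chunk
mean_per_chunk = np.mean(chunk, axis=-1)     # shape (60,)
print(flat.shape, without_first_three.shape, mean_per_chunk.shape)
```

The 50 Hz tone is almost entirely removed by the band-stop step, while the 10 Hz tone passes both filters nearly unchanged because it lies below the 20 Hz cutoff.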

Default datasets are also available. The following example uses the EMBCDataset from [2].

from myoverse.datasets.defaults import EMBCDataset

dataset = EMBCDataset(
    emg_data_path=Path(r"data/emg.pkl").resolve(),
    ground_truth_data_path=Path(r"data/kinematics.pkl").resolve(),
    save_path=Path(r"data/dataset.zarr").resolve(),
    tasks_to_use=["1", "2"],
)

dataset.create_dataset()
Filtering and splitting data: 100%|██████████| 2/2 [00:02<00:00,  1.14s/it]

Augmenting with [GaussianNoise (GaussianNoise)]: 100%|██████████| 317/317 [00:03<00:00, 87.96it/s]

Augmenting with [MagnitudeWarping (MagnitudeWarping)]: 100%|██████████| 317/317 [00:01<00:00, 190.05it/s]

Augmenting with [WaveletDecomposition (WaveletDecomposition)]: 100%|██████████| 317/317 [00:02<00:00, 149.42it/s]

Total running time of the script: (0 minutes 19.022 seconds)

Estimated memory usage: 1362 MB
