Creating a dataset

This example shows how to create a dataset for training a deep learning model.

In this example, we build the dataset that was used in our real-time paper [1].

from functools import partial
from pathlib import Path

import numpy as np
from scipy.signal import butter

from doc_octopy.datasets.filters.emg_augmentations import WaveletDecomposition
from doc_octopy.datasets.filters.generic import ApplyFunctionFilter, IndexDataFilter
from doc_octopy.datasets.filters.temporal import SOSFrequencyFilter
from doc_octopy.datasets.supervised import EMGDataset

dataset = EMGDataset(
    emg_data_path=Path("data/emg.pkl").resolve(),
    ground_truth_data_path=Path("data/kinematics.pkl").resolve(),
    sampling_frequency=2044.0,
    tasks_to_use=["1", "2"],
    save_path=Path("data/dataset.zarr").resolve(),
    emg_filter_pipeline_after_chunking=[
        [
            # Band-stop between 47.5 and 52.5 Hz to suppress 50 Hz powerline interference.
            SOSFrequencyFilter(
                sos_filter_coefficients=butter(
                    4, [47.5, 52.5], "bandstop", output="sos", fs=2044
                ),
                is_output=True,
                name="Raw No Powerline",
            ),
            # 20 Hz low-pass; as the name indicates, it follows the powerline filter.
            SOSFrequencyFilter(
                sos_filter_coefficients=butter(4, 20, "lowpass", output="sos", fs=2044),
                is_output=True,
                name="Raw No Powerline Lowpassed 20 Hz",
            ),
        ]
    ],
    emg_representations_to_filter_after_chunking=["Last"],
    ground_truth_filter_pipeline_before_chunking=[
        [
            # Reshape to 63 rows, then keep rows 3 to 62 (the first three rows are dropped).
            ApplyFunctionFilter(function=np.reshape, newshape=(63, -1)),
            IndexDataFilter(indices=(slice(3, 63),)),
        ]
    ],
    ground_truth_representations_to_filter_before_chunking=["Input"],
    ground_truth_filter_pipeline_after_chunking=[
        [
            # Average over the last axis so each chunk yields one kinematics vector.
            ApplyFunctionFilter(
                function=partial(np.mean, axis=-1),
                is_output=True,
                name="Mean Kinematics per EMG Chunk",
            ),
        ]
    ],
    ground_truth_representations_to_filter_after_chunking=["Last"],
    testing_split_ratio=0.3,
    validation_split_ratio=0.1,
    augmentation_pipelines=[
        [WaveletDecomposition(nr_of_grids=5, is_output=True, level=2)]
    ],
)

dataset.create_dataset()
Filtering and splitting data: 100%|██████████| 2/2 [00:03<00:00,  1.82s/it]

Augmenting with [WaveletDecomposition (WaveletDecomposition)]: 100%|██████████| 317/317 [00:02<00:00, 129.77it/s]
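
To make the effect of the two EMG filters configured above more concrete, the following minimal sketch applies the same butter designs to a synthetic signal using SciPy only. The test signal, the helper rms, and the use of sosfiltfilt (zero-phase filtering) are assumptions made for illustration; how SOSFrequencyFilter applies the coefficients internally is not stated in this example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 2044  # sampling frequency in Hz, taken from the example above

# Same filter designs as in the dataset configuration.
bandstop_sos = butter(4, [47.5, 52.5], "bandstop", output="sos", fs=FS)
lowpass_sos = butter(4, 20, "lowpass", output="sos", fs=FS)

# Hypothetical test signal: a slow 5 Hz component, 50 Hz powerline hum,
# and a faster 200 Hz component.
t = np.arange(0, 4.0, 1 / FS)
slow = np.sin(2 * np.pi * 5 * t)
hum = 0.5 * np.sin(2 * np.pi * 50 * t)
fast = 0.5 * np.sin(2 * np.pi * 200 * t)
signal = slow + hum + fast

# Assumption: zero-phase filtering; the library may filter differently.
no_powerline = sosfiltfilt(bandstop_sos, signal)
lowpassed = sosfiltfilt(lowpass_sos, no_powerline)


def rms(x):
    """Root mean square of a 1-D array."""
    return float(np.sqrt(np.mean(np.square(x))))


# Compare on the central two seconds to stay clear of edge transients.
center = slice(FS, 3 * FS)
residual = lowpassed[center] - slow[center]
print("RMS before filtering:", rms(signal[center] - slow[center]))
print("RMS after filtering: ", rms(residual))
```

After both filters, only the slow component should remain to a good approximation, which is what the name "Raw No Powerline Lowpassed 20 Hz" suggests.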

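The ground-truth pipelines above can be traced with plain NumPy to see how the array shapes change. This is a sketch under stated assumptions: the raw layout of (21, 3, n_samples), the chunk length of 192 samples, and the idea that ApplyFunctionFilter calls its function on the data with the extra keyword arguments are all hypothetical and not confirmed by this example.

```python
from functools import partial

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw kinematics: 21 points x 3 coordinates x 1000 samples.
n_samples = 1000
raw_kinematics = rng.normal(size=(21, 3, n_samples))

# Before chunking, step 1: reshape to 63 rows. The shape is passed
# positionally here because newer NumPy releases deprecate `newshape`.
flat = np.reshape(raw_kinematics, (63, -1))  # shape (63, 1000)

# Before chunking, step 2: same index tuple as IndexDataFilter above,
# which keeps rows 3 to 62.
selected = flat[(slice(3, 63),)]  # shape (60, 1000)

# After chunking: average one (hypothetical) chunk over its last axis.
chunk = selected[:, 0:192]  # shape (60, 192)
mean_per_chunk = partial(np.mean, axis=-1)
chunk_mean = mean_per_chunk(chunk)  # shape (60,)

print(flat.shape, selected.shape, chunk_mean.shape)
```

Each chunk is thus reduced to a single vector of 60 values, matching the filter name "Mean Kinematics per EMG Chunk".
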
Default datasets are also available. The following example uses the EMBCDataset from [2].

from doc_octopy.datasets.defaults import EMBCDataset

dataset = EMBCDataset(
    emg_data_path=Path("data/emg.pkl").resolve(),
    ground_truth_data_path=Path("data/kinematics.pkl").resolve(),
    save_path=Path("data/dataset.zarr").resolve(),
    tasks_to_use=["1", "2"],
)

dataset.create_dataset()
Filtering and splitting data: 100%|██████████| 2/2 [00:02<00:00,  1.08s/it]

Augmenting with [GaussianNoise (GaussianNoise)]: 100%|██████████| 317/317 [00:03<00:00, 98.97it/s]

Augmenting with [MagnitudeWarping (MagnitudeWarping)]: 100%|██████████| 317/317 [00:01<00:00, 182.33it/s]

Augmenting with [WaveletDecomposition (WaveletDecomposition)]: 100%|██████████| 317/317 [00:02<00:00, 147.11it/s]

Total running time of the script: (0 minutes 18.210 seconds)

Estimated memory usage: 944 MB
