Datasets

Loader

class doc_octopy.datasets.loader.EMGDatasetLoader(data_path, seed=None, dataloader_parameters=None, shuffle_training_data=True, input_type=<class 'numpy.float32'>, ground_truth_type=<class 'numpy.float32'>, ground_truth_name='ground_truth')

Dataset loader for the EMG dataset.

Parameters:
  • data_path (Path) – The path to the zarr file

  • seed (Optional[int], optional) – The seed for the random number generator, by default None

  • dataloader_parameters (Dict[str, Any], optional) – The parameters for the DataLoader, by default None

  • shuffle_training_data (bool, optional) – Whether to shuffle the training data, by default True

  • input_type (numpy.dtype, optional) – The type of the input data, by default np.float32

  • ground_truth_type (numpy.dtype, optional) – The type of the ground_truth data, by default np.float32

  • ground_truth_name (str, optional) – The name of the ground truth data, by default “ground_truth”

Initializes the dataset.
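Together, seed and shuffle_training_data determine whether the order in which training samples are visited is reproducible. The sketch below illustrates the general idea with NumPy's random Generator. The helper training_order is hypothetical and not part of doc_octopy, which may seed and shuffle in a different way (for example, inside its DataLoader).

```python
import numpy as np


def training_order(n_samples, seed=None, shuffle_training_data=True):
    """Hypothetical helper: return the order in which training samples are visited.

    With shuffling enabled, a fixed seed yields the same permutation on every run;
    seed=None draws fresh entropy, so the order changes between runs.
    """
    indices = np.arange(n_samples)
    if shuffle_training_data:
        rng = np.random.default_rng(seed)
        rng.shuffle(indices)  # in-place permutation of the index array
    return indices


# Two runs created with the same seed visit the training data in the same order.
first = training_order(10, seed=42)
second = training_order(10, seed=42)
print(np.array_equal(first, second))  # True

# Without shuffling, samples are visited in their stored order.
print(training_order(5, shuffle_training_data=False))  # [0 1 2 3 4]
```

Validation and testing data are normally not shuffled, which is why only the training data has a shuffle switch.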

data_path

The path to the zarr file

Type:

Path

seed

The seed for the random number generator, by default None

Type:

Optional[int], optional

dataloader_parameters

The parameters for the DataLoader, by default None

Type:

Dict[str, Any], optional

shuffle_training_data

Whether to shuffle the training data, by default True

Type:

bool, optional

input_type

The type of the input data, by default np.float32

Type:

np.dtype, optional

ground_truth_type

The type of the ground truth data, by default np.float32

Type:

np.dtype, optional

ground_truth_name

The name of the ground truth data, by default “ground_truth”

Type:

str, optional

test_dataloader()

Returns the testing set as a DataLoader.

Returns:

The testing set

Return type:

DataLoader

train_dataloader()

Returns the training set as a DataLoader.

Returns:

The training set

Return type:

DataLoader

val_dataloader()

Returns the validation set as a DataLoader.

Returns:

The validation set

Return type:

DataLoader

Supervised Dataset

class doc_octopy.datasets.supervised.EMGDataset(emg_data_path=PosixPath('REPLACE ME'), emg_data={}, ground_truth_data_path=PosixPath('REPLACE ME'), ground_truth_data={}, ground_truth_data_type='kinematics', sampling_frequency=0.0, tasks_to_use=(), save_path=PosixPath('REPLACE ME'), emg_filter_pipeline_before_chunking=(), emg_representations_to_filter_before_chunking=(), emg_filter_pipeline_after_chunking=(), emg_representations_to_filter_after_chunking=(), ground_truth_filter_pipeline_before_chunking=(), ground_truth_representations_to_filter_before_chunking=(), ground_truth_filter_pipeline_after_chunking=(), ground_truth_representations_to_filter_after_chunking=(), chunk_size=192, chunk_shift=64, testing_split_ratio=0.2, validation_split_ratio=0.2, augmentation_pipelines=(), amount_of_chunks_to_augment_at_once=250, debug=False)

Class for creating a dataset from EMG and ground truth data.

Parameters:
  • emg_data_path (pathlib.Path) – Path to the EMG data file. It should be a pickle file containing a dictionary with the keys being the task number and the values being a numpy array of shape (n_channels, n_samples).

  • ground_truth_data_path (pathlib.Path) – Path to the ground truth data file. It should be a pickle file containing a dictionary with the keys being the task number and the values being a numpy array of custom shape (…, n_samples). The leading dimensions can be anything, but the last dimension should match the number of samples in the EMG data.

  • tasks_to_use (Sequence[str]) – Sequence of strings containing the task numbers to use. If empty, all tasks will be used.

  • save_path (pathlib.Path) – Path to save the dataset to. It should be a zarr file.

  • emg_filter_pipeline_before_chunking (list[list[FilterBaseClass]]) – Sequence of filter pipelines to apply to the EMG data before chunking. Each pipeline is a list of filters that inherit from FilterBaseClass.

  • emg_filter_pipeline_after_chunking (list[list[FilterBaseClass]]) – Sequence of filter pipelines to apply to the EMG data after chunking. Each pipeline is a list of filters that inherit from FilterBaseClass.

  • ground_truth_filter_pipeline_before_chunking (list[list[FilterBaseClass]]) – Sequence of filter pipelines to apply to the ground truth data before chunking. Each pipeline is a list of filters that inherit from FilterBaseClass.

  • ground_truth_filter_pipeline_after_chunking (list[list[FilterBaseClass]]) – Sequence of filter pipelines to apply to the ground truth data after chunking. Each pipeline is a list of filters that inherit from FilterBaseClass.

  • chunk_size (int) – Size of the chunks to create from the data.

  • chunk_shift (int) – Shift between the chunks.

  • testing_split_ratio (float) – Ratio of the data to use for testing. The data will be split in the middle. The first half will be used for training and the second half will be used for testing. If 0, no data will be used for testing.

  • validation_split_ratio (float) – Ratio of the data to use for validation. The data will be split in the middle. The first half will be used for training and the second half will be used for validation. If 0, no data will be used for validation.

  • augmentation_pipelines (list[list[EMGAugmentation]]) – Sequence of augmentation pipelines to apply to the training data. Each augmentation should inherit from EMGAugmentation.

  • amount_of_chunks_to_augment_at_once (int) – Number of chunks to augment at once. Processing chunks in batches speeds up augmentation.

  • emg_data (dict[str, ndarray])

  • ground_truth_data (dict[str, ndarray])

  • ground_truth_data_type (str)

  • sampling_frequency (float)

  • emg_representations_to_filter_before_chunking (list[str])

  • emg_representations_to_filter_after_chunking (list[str])

  • ground_truth_representations_to_filter_before_chunking (list[str])

  • ground_truth_representations_to_filter_after_chunking (list[str])

  • debug (bool)
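The emg_data_path and ground_truth_data_path files are pickled dictionaries keyed by task number, and each ground truth array must end in the same number of samples as the matching EMG array. The sketch below writes two such files and runs a pre-flight check before the paths are handed to EMGDataset. The helpers save_task_dict and load_and_check, the file names, and the array sizes are hypothetical placeholders; the class itself may validate its inputs differently.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def save_task_dict(path, data):
    """Pickle a {task: array} dictionary in the layout described above."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_and_check(emg_data_path, ground_truth_data_path, tasks_to_use=()):
    """Hypothetical pre-flight check: load both pickles and verify they line up."""
    with open(emg_data_path, "rb") as f:
        emg = pickle.load(f)
    with open(ground_truth_data_path, "rb") as f:
        ground_truth = pickle.load(f)

    # An empty tasks_to_use means "use every task".
    tasks = list(tasks_to_use) or list(emg.keys())
    for task in tasks:
        if task not in emg or task not in ground_truth:
            raise KeyError(f"task {task!r} is missing from one of the files")
        if emg[task].ndim != 2:
            raise ValueError(f"EMG for task {task!r} must be (n_channels, n_samples)")
        if emg[task].shape[-1] != ground_truth[task].shape[-1]:
            raise ValueError(f"sample counts differ for task {task!r}")
    return tasks


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Two tasks, each with 8 EMG channels and a (21, 3, n_samples) ground truth array.
    emg = {t: np.zeros((8, 500), dtype=np.float32) for t in ("1", "2")}
    gt = {t: np.zeros((21, 3, 500), dtype=np.float32) for t in ("1", "2")}
    save_task_dict(tmp_dir / "emg.pkl", emg)
    save_task_dict(tmp_dir / "ground_truth.pkl", gt)
    print(load_and_check(tmp_dir / "emg.pkl", tmp_dir / "ground_truth.pkl"))  # ['1', '2']
```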

create_dataset()

Creates the dataset.
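Two steps of dataset creation are governed by the parameters above: each recording is cut into overlapping chunks of chunk_size samples spaced chunk_shift samples apart, and the training chunks are passed to the augmentation pipelines amount_of_chunks_to_augment_at_once at a time. The sketch below illustrates both steps with NumPy, assuming chunks are taken along the last (sample) axis starting at sample 0 and that incomplete trailing windows are dropped. The helpers chunk, n_chunks and batches are hypothetical and not the library's implementation. Under these assumptions a recording of n_samples samples yields floor((n_samples - chunk_size) / chunk_shift) + 1 chunks, for example 13 chunks for 1000 samples with the defaults 192 and 64.

```python
import numpy as np


def chunk(data, chunk_size=192, chunk_shift=64):
    """Cut data of shape (..., n_samples) into overlapping windows.

    Returns an array of shape (n_chunks, ..., chunk_size). Trailing samples that
    do not fill a complete window are dropped.
    """
    n_samples = data.shape[-1]
    if n_samples < chunk_size:
        raise ValueError("recording is shorter than chunk_size")
    starts = range(0, n_samples - chunk_size + 1, chunk_shift)
    return np.stack([data[..., s:s + chunk_size] for s in starts])


def n_chunks(n_samples, chunk_size=192, chunk_shift=64):
    """Closed-form chunk count matching chunk() above."""
    return (n_samples - chunk_size) // chunk_shift + 1


def batches(chunks, amount_of_chunks_to_augment_at_once=250):
    """Yield consecutive slices of at most the given number of chunks."""
    step = amount_of_chunks_to_augment_at_once
    for start in range(0, len(chunks), step):
        yield chunks[start:start + step]


emg = np.random.default_rng(0).standard_normal((320, 1000)).astype(np.float32)
emg_chunks = chunk(emg)
print(emg_chunks.shape)  # (13, 320, 192)
print(n_chunks(1000))  # 13

# Ground truth of shape (21, 3, n_samples) is chunked along the same axis.
print(chunk(np.zeros((21, 3, 1000))).shape)  # (13, 21, 3, 192)

# 13 chunks processed 5 at a time.
print([len(b) for b in batches(emg_chunks, 5)])  # [5, 5, 3]
```

Because consecutive chunks overlap by chunk_size - chunk_shift samples (128 with the defaults), a smaller shift produces more training examples from the same recording at the cost of more redundancy between them.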

Default Supervised Datasets

class doc_octopy.datasets.defaults.CastelliniDataset(emg_data_path, ground_truth_data_path, save_path, tasks_to_use=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19'])

Dataset maker modeled on the Castellini paper [1]. This is not the official dataset maker used in that paper but our own implementation based on it.

Parameters:
  • emg_data_path (Path) – The path to the pickle file containing the EMG data. This should be a dictionary with the keys as the tasks in tasks_to_use and the values as the EMG data. The EMG data should be of shape (320, samples).

  • ground_truth_data_path (Path) – The path to the pickle file containing the ground truth data. This should be a dictionary with the keys as the tasks in tasks_to_use and the values as the ground truth data. The ground truth data should be of shape (21, 3, samples).

  • save_path (Path) – The path to save the dataset to. This should be a zarr file.

  • tasks_to_use (Sequence[str], optional) – The tasks to use, by default tasks '1' through '19'.
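To try the dataset maker without recorded data, input files with the documented layout can be generated synthetically. The sketch below writes random EMG of shape (320, samples) and ground truth of shape (21, 3, samples) for the default tasks '1' through '19'. The helper write_dummy_inputs, the file names, the sample count, and the random values are arbitrary placeholders; only the dictionary layout and the shapes come from the parameter descriptions. The EMBCDataset parameters below describe the same layout.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

TASKS = [str(i) for i in range(1, 20)]  # default tasks_to_use: '1' ... '19'


def write_dummy_inputs(directory, n_samples=2048, seed=0):
    """Hypothetical helper: write placeholder EMG and ground truth pickles."""
    rng = np.random.default_rng(seed)
    emg = {t: rng.standard_normal((320, n_samples)).astype(np.float32) for t in TASKS}
    ground_truth = {
        t: rng.standard_normal((21, 3, n_samples)).astype(np.float32) for t in TASKS
    }
    directory = Path(directory)
    emg_path = directory / "emg.pkl"
    ground_truth_path = directory / "ground_truth.pkl"
    with open(emg_path, "wb") as f:
        pickle.dump(emg, f)
    with open(ground_truth_path, "wb") as f:
        pickle.dump(ground_truth, f)
    return emg_path, ground_truth_path


with tempfile.TemporaryDirectory() as tmp_dir:
    emg_path, gt_path = write_dummy_inputs(tmp_dir, n_samples=256)
    with open(gt_path, "rb") as f:
        print(pickle.load(f)["19"].shape)  # (21, 3, 256)
```

The returned paths are what would be passed as emg_data_path and ground_truth_data_path.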

create_dataset()

Creates the dataset.

References

[1] Nowak, M., Vujaklija, I., Sturma, A., Castellini, C., Farina, D., 2023. Simultaneous and Proportional Real-Time Myocontrol of Up to Three Degrees of Freedom of the Wrist and Hand. IEEE Transactions on Biomedical Engineering 70, 459–469. https://doi.org/10/grc7qf

class doc_octopy.datasets.defaults.EMBCDataset(emg_data_path, ground_truth_data_path, save_path, tasks_to_use=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19'], debug=False)

Official dataset maker for the EMBC paper [1].

Parameters:
  • emg_data_path (Path) – The path to the pickle file containing the EMG data. This should be a dictionary with the keys as the tasks in tasks_to_use and the values as the EMG data. The EMG data should be of shape (320, samples).

  • ground_truth_data_path (Path) – The path to the pickle file containing the ground truth data. This should be a dictionary with the keys as the tasks in tasks_to_use and the values as the ground truth data. The ground truth data should be of shape (21, 3, samples).

  • save_path (Path) – The path to save the dataset to. This should be a zarr file.

  • tasks_to_use (Sequence[str], optional) – The tasks to use, by default EXPERIMENTS_TO_USE (tasks '1' through '19').

  • debug (bool)

create_dataset()

Creates the dataset.

References

[1] Sîmpetru, R.C., Osswald, M., Braun, D.I., Souza de Oliveira, D., Cakici, A.L., Del Vecchio, A., 2022. Accurate Continuous Prediction of 14 Degrees of Freedom of the Hand from Myoelectrical Signals through Convolutive Deep Learning, in: Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 702–706. https://doi.org/10/gq2f47