Package basics

This example is a brief introduction to the basic functionalities of the package.

Loading data

First we load the EMG example data and convert it to a DocOctoPy Data object. The Data object is the main object in the package and is used to store the data and apply filters to it.

import pickle as pkl
from doc_octopy.datatypes import EMGData

emg_data = {}
with open("data/emg.pkl", "rb") as f:
    for k, v in pkl.load(f).items():
        emg_data[k] = EMGData(v, sampling_frequency=2044)

print(emg_data)
{'1': EMGData; Sampling frequency: 2044 Hz; (0) Input (320, 20440), '2': EMGData; Sampling frequency: 2044 Hz; (0) Input (320, 20440)}

Looking at one specific task for simplicity

The example data contains EMG from two different tasks labeled as “1” and “2”. In the following we will only look at task one to explain the filtering functionalities.

task_one_data = emg_data["1"]

print(task_one_data)
--
EMGData
Sampling frequency: 2044 Hz
(0) Input (320, 20440)
--

Understanding the saving format

The EMGData object has a input_data attribute that stores the raw data.

Note

The raw data is stored as a dictionary where the keys are “data” and “filter_sequence”. The “data” key stores the raw data and the “filter_sequence” key stores the filter sequence applied to the data. For the raw data the filter sequence is always “Raw”. However, when filters are applied the filter sequence is updated.

print(task_one_data.input_data)
[[  53   63   74 ... -160 -173 -116]
 [ -20   -8   12 ... -194 -198 -128]
 [ -17   -4   14 ... -180 -187 -116]
 ...
 [ -89 -100 -133 ... -200 -234 -212]
 [ -27  -25  -53 ... -156 -203 -210]
 [ -11   -7  -27 ... -143 -200 -216]]

Plotting the raw data

We can plot the raw data using matplotlib.

import matplotlib.pyplot as plt
import numpy as np

raw_emg = task_one_data.input_data

# set plt font size
plt.rcParams.update({"font.size": 14})

for channel in range(raw_emg.shape[0]):
    plt.plot(raw_emg[channel], color="black", alpha=0.1)

plt.title("Raw EMG data")
plt.ylabel("Amplitude (a. u.)")
plt.xticks(
    np.arange(0, raw_emg.shape[-1] + 1, 2044).astype(int),
    np.arange(0, raw_emg.shape[-1] / 2044 + 1, 1).astype(int),
)
plt.xlabel("Time (s)")

plt.tight_layout()
plt.show()
Raw EMG data

Attributes of the EMGData object

Any Data object, of which EMGData is inheriting from, posses a processed_representations attribute where filtered data will be stored.

Note

We refer to a filtered data as a representation.

At the beginning this attribute is empty.

print(task_one_data.processed_representations)
{'Input': array([[  53,   63,   74, ..., -160, -173, -116],
       [ -20,   -8,   12, ..., -194, -198, -128],
       [ -17,   -4,   14, ..., -180, -187, -116],
       ...,
       [ -89, -100, -133, ..., -200, -234, -212],
       [ -27,  -25,  -53, ..., -156, -203, -210],
       [ -11,   -7,  -27, ..., -143, -200, -216]], dtype=int16)}

Applying a filter

The EMGData object has a method called apply_filter that applies a filter to the data. For example, we can apply a 4th order 20 HZ lowpass filter to the data.

from scipy.signal import butter
from doc_octopy.datasets.filters.temporal import SOSFrequencyFilter

sos_filter_coefficients = butter(4, 20, "lowpass", output="sos", fs=2044)

Creating the filter

Each filter has a parameter input_is_chunked that specifies if the input data is chunked or not. This must be set explicitly as some filters can only be used on either chunked or non-chunked data. Further having the user specify this parameter forces them to think about the data they are working with.

lowpass_filter = SOSFrequencyFilter(
    sos_filter_coefficients, is_output=True, name="Lowpass"
)
print(lowpass_filter)
Lowpass (SOSFrequencyFilter)

Applying the filter

To apply the filter we call the apply_filter method on the EMGData object.

task_one_data.apply_filter(
    lowpass_filter, representation_to_filter="Last"
)

print(task_one_data.processed_representations)
{'Input': array([[  53,   63,   74, ..., -160, -173, -116],
       [ -20,   -8,   12, ..., -194, -198, -128],
       [ -17,   -4,   14, ..., -180, -187, -116],
       ...,
       [ -89, -100, -133, ..., -200, -234, -212],
       [ -27,  -25,  -53, ..., -156, -203, -210],
       [ -11,   -7,  -27, ..., -143, -200, -216]], dtype=int16), 'Lowpass': array([[  30.41515398,   28.55359452,   26.67929727, ...,  -21.69369799,
         -21.72851447,  -21.75680579],
       [ -45.92706148,  -45.11724327,  -44.3360931 , ...,  -31.30362556,
         -31.32936372,  -31.35041695],
       [ -43.78353654,  -42.16849119,  -40.58739862, ...,   -5.6463695 ,
          -5.66186106,   -5.67475859],
       ...,
       [-110.89048115, -104.30581101,  -97.80368913, ...,  -15.93949333,
         -15.93448988,  -15.93147574],
       [ -56.62476067,  -49.13379821,  -41.74402497, ...,   42.30042609,
          42.27745456,   42.25766496],
       [ -39.179818  ,  -32.0000522 ,  -24.91885117, ...,   51.63322883,
          51.606299  ,   51.58321195]])}

Accessing the filtered data

The filtered data is saved in the processed_data attribute of the EMGData object. Processed_data is a dictionary where the keys are the names of the filters sequence applied to the data.

The key of the last filter sequence always is marked with “(Output)”. This is the data that will be outputted by the dataset pipeline.

In our example the key is (Output) Raw->SOSFrequencyFilter.

The data can be accessed by indexing the processed_data attribute or by indexing the EMGData directly. In case you do not want to index using the filter sequence name, you can retrieve the last processed data by indexing with “Last”.

print(task_one_data)

print(
    np.allclose(
        task_one_data.processed_representations["Lowpass"],
        task_one_data["Last"],
    )
)
--
EMGData
Sampling frequency: 2044 Hz
(0) Input (320, 20440)

Filter(s):
(1 | 1) (Output) Lowpass (320, 20440)
--
True

Total running time of the script: (0 minutes 5.183 seconds)

Estimated memory usage: 553 MB

Gallery generated by Sphinx-Gallery