DataSplitter#
- class myoverse.datasets.utils.DataSplitter(test_ratio=0.2, val_ratio=0.2)[source]#
Handles splitting data into training, testing, and validation sets.
The splitting strategy extracts data from the middle of the array, which is useful for time-series data where you want to avoid temporal leakage at the boundaries.
- Parameters:
Examples
>>> splitter = DataSplitter(test_ratio=0.2, val_ratio=0.2) >>> result = splitter.split(data) >>> print(result.sizes) (800, 100, 100)
Methods
__init__([test_ratio, val_ratio])_split_middle(data, ratio)Split data by extracting a portion from the middle.
Validate split ratios.
split(data)Split data into training, testing, and validation sets.
split_dict(data_dict)Split multiple arrays with the same split indices.
- split(data)[source]#
Split data into training, testing, and validation sets.
The split is performed by extracting from the middle of the data: 1. First, test data is extracted from the center 2. Then, validation data is extracted from the center of test data
- Parameters:
data (np.ndarray) – Data to split. First dimension is assumed to be samples.
- Returns:
Named tuple with training, testing, and validation arrays.
- Return type:
SplitResult