MinimumNaNSplit
- class MinimumNaNSplit(n_splits: int, n_repeats: int = 10, random_state: int = None, min_non_nan: int = 2, which: str = 'train')
A Repeated Stratified KFold iterator that splits the data into sections.
This class splits the data into sections, checking that the training set never has fewer than the specified number of non-NaN values.
- Parameters:
n_splits (int) – The number of splits.
n_repeats (int, optional) – The number of times to repeat the splits, by default 10.
random_state (int, optional) – The random state to use, by default None.
Examples
>>> import numpy as np
>>> np.random.seed(0)
>>> X = np.vstack((np.arange(1, 9).reshape(4, 2), np.full((4, 2), np.nan)))
>>> y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
>>> msn = MinimumNaNSplit(2, 3)
>>> for train, test in msn.split(X, y):
...     print("train:", train, "test:", test)
train: [2 3 4 5] test: [0 1 6 7]
train: [0 1 6 7] test: [2 3 4 5]
train: [2 3 4 5] test: [0 1 6 7]
train: [0 1 6 7] test: [2 3 4 5]
train: [2 3 4 5] test: [0 1 6 7]
train: [0 1 6 7] test: [2 3 4 5]
>>> msn = MinimumNaNSplit(2, 3, which='test', min_non_nan=1)
>>> for train, test in msn.split(X, y):
...     print("train:", train, "test:", test)
train: [1 3 4 7] test: [0 2 5 6]
train: [0 2 5 6] test: [1 3 4 7]
train: [0 3 5 7] test: [1 2 4 6]
train: [1 2 4 6] test: [0 3 5 7]
train: [1 2 5 6] test: [0 3 4 7]
train: [0 3 4 7] test: [1 2 5 6]
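To inspect a split by hand, one can count how many rows of a given index set are free of NaN values and compare that count with `min_non_nan`. The helper below is a dependency-free sketch using plain lists; `count_nan_free_rows` is a hypothetical name, and counting whole rows across all classes is an assumption, since the class's exact criterion (for instance, whether it counts per class) is not stated here.

```python
import math


def count_nan_free_rows(X, indices):
    """Count the rows of X, among `indices`, that contain no NaN."""
    return sum(
        1 for i in indices
        if not any(math.isnan(v) for v in X[i])
    )


nan = float("nan")
# Same layout as the example above: four complete rows, then four all-NaN rows.
X = [[1, 2], [3, 4], [5, 6], [7, 8]] + [[nan, nan] for _ in range(4)]

train = [2, 3, 4, 5]  # first training split printed in the example above
n_valid = count_nan_free_rows(X, train)
# Assumed criterion: the total NaN-free row count must reach min_non_nan (2 here).
print(n_valid, n_valid >= 2)
```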
- static oversample(arr: numpy.ndarray, func: callable = <function mixup>, axis: int = 1, copy: bool = True, seed=None) → ndarray
Oversample NaN rows using func.
- Parameters:
- Return type:
ndarray
Examples
>>> np.random.seed(0)
>>> arr = np.array([[1, 2], [4, 5], [7, 8],
...                 [float("nan"), float("nan")]])
>>> MinimumNaNSplit.oversample(arr, norm, 0)
array([[1.        , 2.        ],
       [4.        , 5.        ],
       [7.        , 8.        ],
       [8.32102813, 5.98018098]])
>>> MinimumNaNSplit.oversample(arr, mixup, 0, seed=42)
array([[1.        , 2.        ],
       [4.        , 5.        ],
       [7.        , 8.        ],
       [5.24946679, 6.24946679]])
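The mixup result above is consistent with each NaN row being replaced by a weighted average of two complete rows. The following is a minimal sketch of that general idea using only the standard library; it is not the library's `mixup` function, and the name `fill_nan_rows_mixup`, the `alpha` parameter, and the Beta-distributed weight are assumptions chosen for illustration.

```python
import math
import random


def fill_nan_rows_mixup(rows, alpha=1.0, seed=None):
    """Return a copy of `rows` in which every row containing NaN is replaced
    by a convex combination of two randomly chosen NaN-free rows."""
    rng = random.Random(seed)
    valid = [r for r in rows if not any(math.isnan(v) for v in r)]
    if len(valid) < 2:
        raise ValueError("need at least two NaN-free rows to mix")
    filled = []
    for row in rows:
        if any(math.isnan(v) for v in row):
            a, b = rng.sample(valid, 2)          # two distinct donor rows
            lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
            row = [lam * x + (1 - lam) * y for x, y in zip(a, b)]
        filled.append(list(row))
    return filled


nan = float("nan")
arr = [[1, 2], [4, 5], [7, 8], [nan, nan]]
out = fill_nan_rows_mixup(arr, seed=42)
print(out[3])  # a point on the segment between two of the complete rows
```

Because every complete row in this data has its second value one greater than its first, any convex combination keeps that property, which matches the pattern in the documented mixup output.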
- shuffle_labels(arr: ndarray, labels: ndarray, trials_ax: int = 0, min_trials: int = 1)
Shuffle the labels while ensuring that the minimum number of non-NaN trials is kept.
- Parameters:
Examples
>>> np.random.seed(0)
>>> arr = np.array([[[1, 2], [4, 5], [7, 8],
...                  [float("nan"), float("nan")]]])
>>> labels = np.array([0, 0, 1, 1])
>>> MinimumNaNSplit(1).shuffle_labels(arr, labels, 1, 1)
>>> labels
array([1, 1, 0, 0])
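One way to read the constraint is that labels are reshuffled in place until every label is attached to at least `min_trials` NaN-free trials. The sketch below illustrates that reading with plain lists and rejection sampling; the function name, the `max_tries` guard, and the retry strategy are assumptions and may differ from how `shuffle_labels` actually works.

```python
import math
import random


def shuffle_labels_keep_min(trials, labels, min_trials=1, seed=None, max_tries=1000):
    """Shuffle `labels` in place until every label is assigned to at least
    `min_trials` NaN-free trials; raise if no such ordering is found."""
    rng = random.Random(seed)
    # True for each trial that contains no NaN value.
    valid = [not any(math.isnan(v) for v in t) for t in trials]
    for _ in range(max_tries):
        rng.shuffle(labels)
        counts = {lab: 0 for lab in set(labels)}
        for lab, ok in zip(labels, valid):
            counts[lab] += ok  # bool adds as 0 or 1
        if all(c >= min_trials for c in counts.values()):
            return
    raise RuntimeError("no valid shuffle found")


nan = float("nan")
trials = [[1, 2], [4, 5], [7, 8], [nan, nan]]
labels = [0, 0, 1, 1]
shuffle_labels_keep_min(trials, labels, min_trials=1, seed=0)
print(labels)
```

As in the documented example, the call returns nothing and the result is read back from `labels`.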
- split(X, y=None, groups=None)
Generate indices to split data into training and test set.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.
y (array-like of shape (n_samples,)) – The target variable for supervised learning problems. Stratification is done based on the y labels.
groups (object) – Always ignored, exists for compatibility.
- Yields:
train (ndarray) – The training set indices for that split.
test (ndarray) – The testing set indices for that split.
Notes
Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
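The effect of a fixed seed, and of stratifying on y, can be illustrated with a small stand-alone generator. This is a dependency-free sketch, not the implementation used by this class or by scikit-learn; `repeated_stratified_kfold` is a hypothetical helper that only takes the labels, since y alone is enough to build the splits.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(y, n_splits=2, n_repeats=3, random_state=None):
    """Yield (train, test) index lists, spreading each label evenly over folds."""
    rng = random.Random(random_state)
    by_label = defaultdict(list)
    for i, label in enumerate(y):
        by_label[label].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for label in sorted(by_label):
            members = by_label[label][:]
            rng.shuffle(members)
            # Deal the shuffled members of this label round-robin into the folds.
            for pos, idx in enumerate(members):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            yield train, test


y = [0, 0, 1, 1, 0, 0, 1, 1]
first = list(repeated_stratified_kfold(y, random_state=0))
second = list(repeated_stratified_kfold(y, random_state=0))
print(first == second)  # the same integer seed reproduces the same splits
```

With `random_state=None` the generator draws a fresh seed on each call, so repeated calls would generally produce different splits.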