I have a big numpy array and want to split it into groups of consecutive rows. I have read this solution but it could not help me. The target column can have several values, but I know which value I want to split on. In my simplified example the target column is the third one and I want to split on the value 2. This is my array:
```python
import numpy as np

big_array = np.array([[0., 10., 2.],
                      [2., 6., 2.],
                      [3., 1., 7.1],
                      [3.3, 6., 7.8],
                      [4., 5., 2.],
                      [6., 6., 2.],
                      [7., 1., 2.],
                      [8., 5., 2.1]])
```
Rows that have this value (2) make one split. The next rows (the third and fourth), which are not 2, make another one. Further down in my data set the value 2 appears again, and those rows make another split; the remaining non-2 row (the last one) forms a final split. The final result should look like this:
```python
spl_array = [np.array([[0., 10., 2.],
                       [2., 6., 2.]]),
             np.array([[3., 1., 7.1],
                       [3.3, 6., 7.8]]),
             np.array([[4., 5., 2.],
                       [6., 6., 2.],
                       [7., 1., 2.]]),
             np.array([[8., 5., 2.1]])]
```
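To make the grouping rule concrete, here is a minimal sketch that flags each row by whether its third column (index 2) equals 2. Each run of consecutive identical flags corresponds to one of the splits above. The name `mask` is only illustrative.

```python
import numpy as np

big_array = np.array([[0., 10., 2.], [2., 6., 2.], [3., 1., 7.1], [3.3, 6., 7.8],
                      [4., 5., 2.], [6., 6., 2.], [7., 1., 2.], [8., 5., 2.1]])

# True where the target (third) column equals 2, False otherwise
mask = big_array[:, 2] == 2
print(mask)  # [ True  True False False  True  True  True False]
```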
Thanks in advance for any help.
Answer
First, flag each row according to whether it has the value 2 in the target column. This gives an array of True and False values; convert it to zeros and ones. Then check where consecutive values differ: for example, the differences of [0, 0, 1, 1, 0] are [0, 1, 0, -1], and every nonzero entry marks a boundary between two groups.
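A small sketch of the difference step using `np.diff`. It also shows why converting to integers first is useful: on a boolean array `np.diff` returns True/False (whether neighbours differ) rather than signed steps, which still marks boundaries but loses the direction of the change.

```python
import numpy as np

flags = np.array([0, 0, 1, 1, 0])
# Each element minus the one before it
changes = np.diff(flags)
print(changes)  # [ 0  1  0 -1]

# On booleans, np.diff only reports whether neighbours differ
bool_changes = np.diff(np.array([False, False, True, True, False]))
print(bool_changes)  # [False  True False  True]
```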
Use `np.where` to find the indices of those nonzero differences. Each one is the last row of a group, so adding 1 gives the first row of the next group.
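Applied to the example data, a sketch of this step (assuming the third column is the target) gives the rows where each new group starts. `np.where` returns a tuple with one index array per dimension, hence the `[0]`.

```python
import numpy as np

big_array = np.array([[0., 10., 2.], [2., 6., 2.], [3., 1., 7.1], [3.3, 6., 7.8],
                      [4., 5., 2.], [6., 6., 2.], [7., 1., 2.], [8., 5., 2.1]])

flags = (big_array[:, 2] == 2).astype(int)  # [1 1 0 0 1 1 1 0]
# Indices of nonzero differences are the last row of each group;
# +1 turns them into the first row of the following group
boundaries = np.where(np.diff(flags) != 0)[0] + 1
print(boundaries)  # [2 4 7]
```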
Finally, insert index 0 at the front and the length of the big array at the end, so that consecutive pairs of indices can be zipped into the left and right bounds of each slice.
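As an alternative to padding the index list and zipping it by hand, `np.split` accepts the interior boundary indices directly and returns the pieces between them. This swaps `np.split` in for the manual loop; it is a sketch, not the answerer's code.

```python
import numpy as np

big_array = np.array([[0., 10., 2.], [2., 6., 2.], [3., 1., 7.1], [3.3, 6., 7.8],
                      [4., 5., 2.], [6., 6., 2.], [7., 1., 2.], [8., 5., 2.1]])

flags = (big_array[:, 2] == 2).astype(int)
boundaries = np.where(np.diff(flags) != 0)[0] + 1
# np.split cuts before each index in `boundaries`, so no 0 / len() padding is needed
spl_array = np.split(big_array, boundaries)
print([part.shape for part in spl_array])  # [(2, 3), (2, 3), (3, 3), (1, 3)]
```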
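```python
import numpy as np

big_array = np.array([[0., 10., 2.],
                      [2., 6., 2.],
                      [3., 1., 7.1],
                      [3.3, 6., 7.8],
                      [4., 5., 2.],
                      [6., 6., 2.],
                      [7., 1., 2.],
                      [8., 5., 2.1]])

# 1 for rows whose target (third) column is 2, 0 otherwise.
# Checking only that column avoids matching a 2 in any other column,
# which `2 in row` would do.
idx = (big_array[:, 2] == 2).astype(int)

# A nonzero difference marks the last row of a group; +1 gives the first row of the next
slices = list(np.where(np.diff(idx) != 0)[0] + 1)
slices.insert(0, 0)            # first group starts at row 0
slices.append(len(big_array))  # last group ends after the final row

result = []
for left, right in zip(slices[:-1], slices[1:]):
    result.append(big_array[left:right])

'''
[array([[ 0., 10.,  2.],
       [ 2.,  6.,  2.]]),
 array([[3. , 1. , 7.1],
       [3.3, 6. , 7.8]]),
 array([[4., 5., 2.],
       [6., 6., 2.],
       [7., 1., 2.]]),
 array([[8. , 5. , 2.1]])]
'''
```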
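If you need this for other columns or values, the steps can be wrapped in a small helper. The function name `split_on_value` is hypothetical. It also swaps the exact `==` comparison for `np.isclose`, on the assumption that your real data may contain floating-point noise (e.g. 1.9999999999 instead of 2); with the default tolerances 2.1 is still treated as "not 2".

```python
import numpy as np


def split_on_value(arr, col, value):
    """Split `arr` into runs of consecutive rows whose column `col`
    is, or is not, (approximately) equal to `value`."""
    if len(arr) == 0:
        return []
    # np.isclose tolerates small floating-point differences
    flags = np.isclose(arr[:, col], value).astype(int)
    # Positions where the flag changes, shifted to the first row of each new run
    boundaries = np.flatnonzero(np.diff(flags)) + 1
    return np.split(arr, boundaries)


big_array = np.array([[0., 10., 2.], [2., 6., 2.], [3., 1., 7.1], [3.3, 6., 7.8],
                      [4., 5., 2.], [6., 6., 2.], [7., 1., 2.], [8., 5., 2.1]])

parts = split_on_value(big_array, 2, 2.0)
print(len(parts))  # 4
```

Using `np.split` keeps the pieces as views into `big_array`, so no data is copied; call `.copy()` on a piece if you plan to modify it independently.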