
Separate array from large array in numpy by column condition

I want to check whether the values in columns a, b are 1, 2 and the values in columns c, d are 3, 4, and print only the rows where that holds.

   a       b   c      d         e        f
[[  1.    2   3.     4         1.      9.935]
 [  1.    2   3.     4         0.9     9.403]
 [  1.    2   3.     4         0.8     8.785] 
 [  1.    2   10.    15        0.8   192.523]
 [  1.    2   10.    15        0.7   176.913]
 [  1.    2   10.    15        0.6   158.936]]

What I am currently doing is:

xx2 = a[np.where(a[:,0] == 1)]
print(xx2)

But it prints all the rows where the first column is 1, regardless of the other columns.


Answer

You can slice your array and then use row equality checks:

>>> mask = (a[:, 0:4] == [1,2,3,4]).all(axis=1)
>>> a[mask]
array([[1.   , 2.   , 3.   , 4.   , 1.   , 9.935],
       [1.   , 2.   , 3.   , 4.   , 0.9  , 9.403],
       [1.   , 2.   , 3.   , 4.   , 0.8  , 8.785]])
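For comparison, here is a minimal sketch of an equivalent approach that stays closer to the attempt in the question: one condition per column, combined with `&`. The names `mask_and`, `mask_all` and `subset` are illustrative only, and the array is the one from the question.

```python
import numpy as np

a = np.array([
    [1, 2, 3, 4, 1, 9.935],
    [1, 2, 3, 4, 0.9, 9.403],
    [1, 2, 3, 4, 0.8, 8.785],
    [1, 2, 10, 15, 0.8, 192.523],
    [1, 2, 10, 15, 0.7, 176.913],
    [1, 2, 10, 15, 0.6, 158.936]])

# One boolean condition per column, combined element-wise with &.
# The parentheses are required because & binds tighter than ==.
mask_and = (a[:, 0] == 1) & (a[:, 1] == 2) & (a[:, 2] == 3) & (a[:, 3] == 4)

# The slice-and-compare mask selects the same rows.
mask_all = (a[:, 0:4] == [1, 2, 3, 4]).all(axis=1)

subset = a[mask_and]
print(subset)
```

The `&` form is fine for two or three columns; the slice-and-compare form scales better as the number of columns grows.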

BTW, it is always a good idea to provide an example that can be reproduced by simple copy/paste. It took me more time to adapt your example than to figure out the answer (each took under a minute, so we are good).

Reproducible setup

import numpy as np

a = np.array([
    [1, 2, 3, 4, 1, 9.935],
    [1, 2, 3, 4, 0.9, 9.403],
    [1, 2, 3, 4, 0.8, 8.785],
    [1, 2, 10, 15, 0.8, 192.523],
    [1, 2, 10, 15, 0.7, 176.913],
    [1, 2, 10, 15, 0.6, 158.936]])

Explanation

Slice/index the array to retain just the columns you want to check against:

>>> a[:, :4]
array([[ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.],
       [ 1.,  2., 10., 15.],
       [ 1.,  2., 10., 15.],
       [ 1.,  2., 10., 15.]])

Note that, in your case, the four columns are consecutive. What if they weren’t? Say we want to check that (d,a,c,b) == (4,1,3,2)? In that case, specify the selection as a tuple on the second dimension:

>>> a[:, (3,0,2,1)]
array([[ 4.,  1.,  3.,  2.],
       [ 4.,  1.,  3.,  2.],
       [ 4.,  1.,  3.,  2.],
       [15.,  1., 10.,  2.],
       [15.,  1., 10.,  2.],
       [15.,  1., 10.,  2.]])

Compare each row of the selected columns to your desired target. The == operator broadcasts the target across all rows:

>>> a[:, (3,0,2,1)] == [4,1,3,2]
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [False,  True, False,  True],
       [False,  True, False,  True],
       [False,  True, False,  True]])
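To make the broadcasting step concrete, here is a small sketch on a two-row excerpt of the selected columns. The names `selected`, `target`, `elementwise` and `explicit` are made up for illustration; the `np.tile` line only spells out what broadcasting does implicitly and is not needed in practice.

```python
import numpy as np

# Two rows of the selected columns (d, a, c, b), shape (2, 4).
selected = np.array([
    [4., 1., 3., 2.],
    [15., 1., 10., 2.]])
target = np.array([4, 1, 3, 2])  # shape (4,)

# Broadcasting stretches target along the row axis, so every row is
# compared element by element with the same four target values.
elementwise = selected == target
print(elementwise.shape)

# Writing the broadcast out explicitly gives the same result.
explicit = selected == np.tile(target, (selected.shape[0], 1))
print(np.array_equal(elementwise, explicit))
```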

But we want all values (on each row) to match, so:

>>> mask = (a[:, (3,0,2,1)] == [4,1,3,2]).all(axis=1)
>>> mask
array([ True,  True,  True, False, False, False])

From that point, you can just select a[mask] and get your subset array where all the selected columns match your desired target.
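The steps above can be wrapped into a small function. This is only a sketch: `select_rows` and `select_rows_close` are hypothetical helper names, not NumPy functions. The second variant swaps the exact `==` for `np.isclose`, which may be preferable when the compared columns hold computed floats rather than exact values.

```python
import numpy as np

def select_rows(arr, cols, target):
    """Return the rows of arr whose values in cols equal target exactly."""
    mask = (arr[:, list(cols)] == np.asarray(target)).all(axis=1)
    return arr[mask]

def select_rows_close(arr, cols, target):
    """Same selection, but comparing with np.isclose's default tolerance."""
    mask = np.isclose(arr[:, list(cols)], target).all(axis=1)
    return arr[mask]

a = np.array([
    [1, 2, 3, 4, 1, 9.935],
    [1, 2, 3, 4, 0.9, 9.403],
    [1, 2, 3, 4, 0.8, 8.785],
    [1, 2, 10, 15, 0.8, 192.523],
    [1, 2, 10, 15, 0.7, 176.913],
    [1, 2, 10, 15, 0.6, 158.936]])

# Consecutive columns a, b, c, d.
first = select_rows(a, (0, 1, 2, 3), (1, 2, 3, 4))

# Non-consecutive order d, a, c, b, matching the second group of rows.
second = select_rows(a, (3, 0, 2, 1), (15, 1, 10, 2))

# Tolerance-based variant; selects the same rows as `first` here.
close = select_rows_close(a, (0, 1, 2, 3), (1, 2, 3, 4))

print(first)
print(second)
```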

User contributions licensed under: CC BY-SA