The objective is to select multiple rows of a specific pandas column using a NumPy boolean mask.
The following code does the task:
```python
import numpy as np
import pandas as pd

np.random.seed(0)

# Note: ('x') is just the string 'x'; a one-element tuple would be ('x',)
h = ((), ('x'), (), ('y'), (), (), ())

drop_idx = [n for n, dl in enumerate(h) if len(dl)]

df = pd.DataFrame(np.arange(7), columns=['class'])
df.reset_index(inplace=True)

df2 = pd.DataFrame(np.arange(5), columns=[('feature', 'ch1')])

# h mixes empty tuples and strings, so NumPy >= 1.24 needs dtype=object here;
# astype(bool) then applies Python truthiness: () -> False, 'x' -> True
idx_true = np.invert(np.array(h, dtype=object).astype(bool))
g = df[idx_true.tolist()].reset_index(drop=True)
df2['dlabel'] = g['class']
```
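A side note on h: ('x') in parentheses is just the string 'x', so h mixes empty tuples with strings. On NumPy 1.24 and later, np.array(h) on such ragged input is expected to raise ValueError unless dtype=object is passed. The sketch below is a minimal illustration (not part of the original question) that checks both points and builds the same mask:

```python
import numpy as np

h = ((), ('x'), (), ('y'), (), (), ())

# Parentheses alone do not create a tuple; a one-element tuple needs a trailing comma.
print(type(h[1]).__name__, type(('x',)).__name__)  # str tuple

# h is "ragged" (empty tuples next to strings). Passing dtype=object keeps it
# as a 1-D array of the original Python objects instead of raising.
arr = np.array(h, dtype=object)
print(arr.shape)  # (7,)

# astype(bool) applies Python truthiness per element: () -> False, 'x' -> True,
# and np.invert on a bool array is a logical NOT.
idx_true = np.invert(arr.astype(bool))
print(idx_true.tolist())  # [True, False, True, False, True, True, True]
```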
However, I wonder whether the above code can be shortened further, especially these lines:
```python
idx_true = np.invert(np.array(h, dtype=object).astype(bool))
g = df[idx_true.tolist()].reset_index(drop=True)
df2['dlabel'] = g['class']
```
Currently, pandas throws an error if I use the NumPy boolean array directly, without first converting it to a list:
```python
df[idx_true.tolist()]
```
Is there something I am missing, or is this the only way to achieve the intended objective?
Answer
You can simply use:
```python
df2['dlabel'] = df.loc[idx_true, 'class'].values
```
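The .values at the end matters. df.loc[idx_true, 'class'] accepts the NumPy boolean mask directly, but it returns a Series that keeps the original row labels 0, 2, 4, 5, 6, and assigning a Series to a DataFrame column aligns on index labels rather than on position. A minimal sketch that rebuilds the question's frames (the hard-coded mask is assumed to equal the question's idx_true):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(7), columns=['class'])
df2 = pd.DataFrame(np.arange(5), columns=[('feature', 'ch1')])
idx_true = np.array([True, False, True, False, True, True, True])

# A NumPy boolean mask works directly as the row selector in .loc
selected = df.loc[idx_true, 'class']
print(selected.index.tolist())     # [0, 2, 4, 5, 6]

# Assigning the Series aligns on labels: df2 has labels 0..4, and labels
# 1 and 3 are missing from `selected`, so those rows become NaN.
aligned = df2.copy()
aligned['dlabel'] = selected
print(aligned['dlabel'].tolist())  # [0.0, nan, 2.0, nan, 4.0]

# .values (or .to_numpy()) drops the index, so values are assigned by position.
df2['dlabel'] = selected.values
print(df2['dlabel'].tolist())      # [0, 2, 4, 5, 6]
```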
You don't even need to convert h to a NumPy array:
```python
df2['dlabel'] = df.loc[[not bool(x) for x in h], 'class'].values
```
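The question already computes drop_idx, the positions of the non-empty entries in h, but never uses it. As an alternative that is not part of the answer above, here is a sketch using DataFrame.drop, assuming df keeps its default RangeIndex so that row labels equal row positions:

```python
import numpy as np
import pandas as pd

h = ((), ('x'), (), ('y'), (), (), ())
df = pd.DataFrame(np.arange(7), columns=['class'])
df2 = pd.DataFrame(np.arange(5), columns=[('feature', 'ch1')])

# Positions whose entry in h is non-empty (the same list built in the question)
drop_idx = [n for n, dl in enumerate(h) if len(dl)]

# Drop those rows by label (label == position under the assumed RangeIndex),
# then assign by position with to_numpy() to sidestep index alignment.
df2['dlabel'] = df.drop(index=drop_idx)['class'].to_numpy()
print(df2)
```

This reads the intent ("drop the flagged rows") directly, at the cost of relying on labels matching positions; with a non-default index, the boolean-mask versions above are the safer choice.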
Output:

```
   (feature, ch1)  dlabel
0               0       0
1               1       2
2               2       4
3               3       5
4               4       6
```