
Select only available rows of a pandas dataframe

Let's say I have the following pandas DataFrame:

import pandas as pd
d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = ['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

Now I have a list of labels:

select = ['c', 'a', 'x']

Clearly, the label 'x' is not in the index of my original df. How can I select rows of df based on select, keeping only the labels that exist, without raising an error? In this case, I want only the rows for 'c' and 'a', in that order.

Any pointers would be very helpful.


Answer

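You could use reindex followed by dropna. reindex adds an all-NaN row for each label that is missing from the index, and dropna then removes those rows: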

out = df.reindex(select).dropna()
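
One caveat worth sketching: a plain dropna() also removes rows whose NaN values were already in the data, not only the rows that reindex added. The frame below is a hypothetical variant of the question's df with a NaN in row 'b', and the names reindexed, dropped_any and dropped_all are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's df: row 'b' already holds a NaN.
b = pd.Series([0.0, np.nan, 2.0], index=["a", "b", "c"])
df = pd.DataFrame({"A": 1.0, "B": b})
select = ["c", "b", "x"]

# reindex adds an all-NaN row for the missing label 'x'.
reindexed = df.reindex(select)

# Default dropna() drops any row containing a NaN: 'x' and also 'b'.
dropped_any = reindexed.dropna()

# how="all" drops only rows in which every column is NaN: just 'x'.
dropped_all = reindexed.dropna(how="all")

print(dropped_any.index.tolist())
print(dropped_all.index.tolist())
```

If the data may legitimately contain NaN, dropna(how="all") or filtering the labels before reindexing is probably the safer choice.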

You could also filter select before reindexing, so that no missing labels are ever requested:

out = df.reindex([i for i in select if i in df.index])

Output:

     A    B          C
c  1.0  2.0 2013-01-02
a  1.0  0.0 2013-01-02
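
As a further sketch of the same filtering idea, the membership check can be done with Index.isin and the rows picked with .loc. This is an alternative, not part of the original answer, and the names wanted and present are illustrative.

```python
import pandas as pd

# Same data as in the question.
e = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])
df = pd.DataFrame({"A": 1.0, "B": e, "C": pd.Timestamp("20130102")})
select = ["c", "a", "x"]

# Keep only the requested labels that exist in df.index,
# preserving the order in which they appear in select.
wanted = pd.Index(select)
present = wanted[wanted.isin(df.index)]

# .loc is safe here because every label in present exists.
out = df.loc[present]
print(out)
```

Because no NaN rows are created along the way, this keeps the original column dtypes, whereas reindexing with missing labels upcasts integer columns to float.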
User contributions licensed under: CC BY-SA