Let's say I have the following pandas DataFrame:
import pandas as pd

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})
Now I have another list:
select = ['c', 'a', 'x']
Clearly, the element 'x' is not available in my original df. How can I select rows of df based on select, keeping only the rows that exist and without raising an error? In this case, I want to select only the rows corresponding to 'c' and 'a', maintaining this order.
Any pointer will be very helpful.
Answer
You could use reindex + dropna:
out = df.reindex(select).dropna()
You could also filter select before reindex, which avoids dropping rows of df that already contain NaN:
out = df.reindex([i for i in select if i in df.index])
Output:
     A    B          C
c  1.0  2.0 2013-01-02
a  1.0  0.0 2013-01-02
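The two approaches give the same result here, but they can differ when df itself contains missing values. The sketch below is only an illustration: it swaps in a modified df with a NaN in row 'b' and a different select list (both are assumptions, not part of the question) and filters the labels with Index.isin instead of a list comprehension.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's df: row 'b' has a missing value in column B.
e = pd.Series([0.0, np.nan, 2.0], index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

# Hypothetical selection: 'c' and 'b' exist in df, 'x' does not.
select = ['c', 'b', 'x']

# reindex + dropna removes the unknown label 'x',
# but also removes 'b' because that row already held a NaN.
via_dropna = df.reindex(select).dropna()

# Filtering the labels first keeps every existing row, in the order given by select.
wanted = pd.Index(select)
via_filter = df.loc[wanted[wanted.isin(df.index)]]

print(list(via_dropna.index))  # ['c']
print(list(via_filter.index))  # ['c', 'b']
```

If df is known to have no missing values, reindex + dropna is the shorter option; otherwise filtering the labels first is the safer choice.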