Fastest way to get all first-matched rows given a sequence of column values in Pandas

Question

Say I have a Pandas dataframe with 10 rows and 2 columns. I am given a sequence of 'col1' values in a numpy array: I want to find the rows that have the first occurrence of 3, 1 and 2 in 'col1', and then get the corresponding 'col2' values in order. Right now I am u…

Accepted Answer

Option 1

Perhaps faster than what I suggested earlier (below: option 2):

    df.groupby('col1').first().reindex(nums)

          col2
    col1
    3      0.3
    1      0.9
    2      0.7

Option 2

First get the matches for col1 by using Series.isin and select from the df based on the mask. Now, apply df.groupby and get the first non-null entry for each group. Finally, apply df.reindex to sort the values.

    df[df['col1'].isin(nums)].groupby('col1').first().reindex(nums)

          col2
    col1
    3      0.3
    1      0.9
    2      0.7

If a value cannot be found, you'll end up with a NaN. E.g.

    df.iloc[1,0] = 6 # there's no '2' in `col1` anymore
    df[df['col1'].isin(nums)].groupby('col1').first().reindex(nums)

          col2
    col1
    3      0.3
    1      0.9
    2      NaN
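For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The original dataframe is not shown in the post, so the 10-row frame below is hypothetical, chosen only so that the first matches for 3, 1 and 2 give 0.3, 0.9 and 0.7 as in the output above. The `drop_duplicates` line is an extra alternative, not part of the answer itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data (the question's actual frame is not shown).
# The only '2' in col1 sits in row 1, matching the NaN example in the answer.
df = pd.DataFrame({
    "col1": [5, 2, 3, 1, 3, 1, 4, 3, 5, 4],
    "col2": [0.1, 0.7, 0.3, 0.9, 0.5, 0.2, 0.4, 0.6, 0.8, 0.0],
})
nums = np.array([3, 1, 2])

# Option 1: first entry per group, then reorder to follow nums.
opt1 = df.groupby("col1").first().reindex(nums)

# Option 2: filter to the wanted values first, then do the same.
opt2 = df[df["col1"].isin(nums)].groupby("col1").first().reindex(nums)

# Alternative (an assumption, not from the answer): keep the first row
# per col1 value, index by col1, then reorder to follow nums.
alt = df.drop_duplicates("col1").set_index("col1").reindex(nums)

# Missing value: once the only '2' is overwritten, reindex yields NaN for it.
df_missing = df.copy()
df_missing.iloc[1, 0] = 6
missing = df_missing.groupby("col1").first().reindex(nums)

print(opt1)
print(missing)
```

One difference worth keeping in mind: `GroupBy.first()` returns the first non-null entry of each column per group, while `drop_duplicates` keeps the first row as it is, NaN included. The two agree here only because the sample data has no missing values in col2. Which variant is fastest depends on the data, so it is worth timing them on your own frame.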


Answer