How to sort a dataframe by the first occurrence of each unique element in a column?

The dataframe is

import pandas as pd

df = pd.DataFrame({"necmi": [0, 3, 14, 15, 2, 71, 8, 2, -1],
                   "fehmi": ["trial", "error", "manifest", "trial", "no", "only", "error", "no", "no"]})

and it looks like this:

   necmi     fehmi
0      0     trial
1      3     error
2     14  manifest
3     15     trial
4      2        no
5     71      only
6      8     error
7      2        no
8     -1        no

I’d like to sort this df by the fehmi column, in the order in which each value first occurs, so that rows with the same value end up grouped together. The desired output is

   necmi     fehmi
0      0     trial
1     15     trial
2      3     error
3      8     error
4     14  manifest
5      2        no
6      2        no
7     -1        no
8     71      only

This is because trial appears first in df, so its rows are gathered together. Then comes error, so its rows follow, and so on.

I tried a groupby with sort=False, since that seemed natural, but..

df.groupby("fehmi", sort=False)

I imagine the groups are almost in the form I need, but the result is a “groupby object” and I cannot get it into a regular dataframe. I tried this to get the groups back as they are:

df.groupby("fehmi", sort=False).apply(lambda s: s)

but it gives the original df back!

Answer

factorize + argsort

import numpy as np
df.iloc[np.argsort(df['fehmi'].factorize()[0], kind='stable')]

   necmi     fehmi
0      0     trial
3     15     trial
1      3     error
6      8     error
2     14  manifest
4      2        no
7      2        no
8     -1        no
5     71      only
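
To make the one-liner easier to follow, here is a step-by-step sketch of the same idea. The variable names (codes, uniques, order, result, alt) are only illustrative. It assumes a stable sort is wanted, so that rows within each group keep their original order, and it resets the index to match the desired output in the question. The last line is an alternative sketch that concatenates the groups of a groupby with sort=False. It is not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "necmi": [0, 3, 14, 15, 2, 71, 8, 2, -1],
    "fehmi": ["trial", "error", "manifest", "trial", "no", "only", "error", "no", "no"],
})

# factorize labels each value with an integer in order of first appearance:
# trial -> 0, error -> 1, manifest -> 2, no -> 3, only -> 4
codes, uniques = df["fehmi"].factorize()

# A stable argsort orders rows by code while keeping the original
# relative order of rows that share the same code.
order = np.argsort(codes, kind="stable")

# Reorder the rows positionally, then renumber the index 0..n-1
# (drop reset_index to keep the original row labels instead).
result = df.iloc[order].reset_index(drop=True)

# Alternative sketch: with sort=False, groupby yields the groups in order
# of first appearance, so concatenating them gives the same row order.
alt = pd.concat([g for _, g in df.groupby("fehmi", sort=False)], ignore_index=True)

print(result)
```

Without the reset_index call, the rows keep their original labels (0, 3, 1, 6, ...), as in the output shown above.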
User contributions licensed under: CC BY-SA