The DataFrame is:
df = pd.DataFrame({"necmi": [0, 3, 14, 15, 2, 71, 8, 2, -1], "fehmi": ["trial", "error", "manifest", "trial", "no", "only", "error", "no", "no"]})
It is:
   necmi     fehmi
0      0     trial
1      3     error
2     14  manifest
3     15     trial
4      2        no
5     71      only
6      8     error
7      2        no
8     -1        no
So I'd like to sort this df by the fehmi column, in order of the first occurrence of each entry, so that equal entries end up grouped together. The desired output is:
   necmi     fehmi
0      0     trial
1     15     trial
2      3     error
3      8     error
4     14  manifest
5      2        no
6      2        no
7     -1        no
8     71      only
because we saw trial first in df, so we gather its entries together. Then we saw error, so those go together, and so on.
I attempted a groupby with sort=False, as that seemed natural, but..
df.groupby("fehmi", sort=False)
I imagine the groups are almost in the form I need, but this is a "groupby object" and I cannot get it into the form I need. I tried this to get the groups as they are:
df.groupby("fehmi", sort=False).apply(lambda s: s)
but it gives the original df back!
Answer
factorize + argsort
import numpy as np
df.iloc[np.argsort(df['fehmi'].factorize()[0], kind='stable')]  # stable sort keeps the original row order inside each group
   necmi     fehmi
0      0     trial
3     15     trial
1      3     error
6      8     error
2     14  manifest
4      2        no
7      2        no
8     -1        no
5     71      only
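The one-liner above can be unpacked into its steps, which may make it easier to see why it works. This is only a sketch: the intermediate names (codes, uniques, order, result, alt) are mine, the reset_index call is added to match the 0..8 index in the question's desired output, and the pd.concat variant at the end is a different technique included just as a cross-check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "necmi": [0, 3, 14, 15, 2, 71, 8, 2, -1],
    "fehmi": ["trial", "error", "manifest", "trial", "no", "only", "error", "no", "no"],
})

# factorize labels each distinct value with an integer, numbered in order of
# first appearance: trial -> 0, error -> 1, manifest -> 2, no -> 3, only -> 4
codes, uniques = df["fehmi"].factorize()

# argsort returns the row positions that would sort those codes; a stable
# sort keeps rows with equal codes in their original relative order
order = np.argsort(codes, kind="stable")

# reorder the rows by position, then renumber the index (optional)
result = df.iloc[order].reset_index(drop=True)

# cross-check with another approach: groupby(sort=False) yields the groups
# in first-seen order, so concatenating them gives the same frame
alt = pd.concat(
    [group for _, group in df.groupby("fehmi", sort=False)],
    ignore_index=True,
)

print(result)
```

Dropping the reset_index call keeps the original row labels, as in the output shown above.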