Skip to content
Advertisement

Reorder subset of columns in pandas dataframe with natural sorting

I have the following dataframe:

df = pd.DataFrame({
   'tmp': ['A', 'A', 'B', 'Z', 'D', 'C'],
   'F1': [2, 1, 9, 8, 7, 4],
   'F20': [0, 1, 9, 4, 2, 3],
   'F3': ['a', 'B', 'c', 'D', 'e', 'F'],
   'aabb': ['a', 'B', 'c', 'D', 'e', 'F']
})
---
  tmp  F1  F20 F3 aabb
0   A   2    0  a    a
1   A   1    1  B    B
2   B   9    9  c    c
3   Z   8    4  D    D
4   D   7    2  e    e
5   C   4    3  F    F

and I would like to sort only the columns with the F in this way:

   tmp  F1  F3  F20 aabb
0   A   2    a   0    a
1   A   1    B   1    B
2   B   9    c   9    c
3   Z   8    D   4    D
4   D   7    e   2    e
5   C   4    F   3    F

How could I do?

(edit) The columns with the “F” can vary both in quantity and in the values that follow the F (in my case I have about 100 columns like those) The columns with F are always grouped but the number before and after is variable

Advertisement

Answer

You can use natsort for natural sorting and a mask to handle only the F columns:

# pip install natsort
from natsort import natsorted

cols = df.columns.to_numpy(copy=True)
m = df.columns.str.fullmatch('Fd+')
cols[m] = natsorted(cols[m])

df_sorted = df[cols]

Alternative without natsort:

num = df.columns.str.extract('F(d+)', expand=False).astype(float)
cols = df.columns.to_numpy(copy=True)
m = num.notna()
order = np.argsort(num[m])
cols[m] = cols[m][order]

df_sorted = df[cols]

output:

  tmp  F1 F3  F20 aabb
0   A   2  a    0    a
1   A   1  B    1    B
2   B   9  c    9    c
3   Z   8  D    4    D
4   D   7  e    2    e
5   C   4  F    3    F
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement