I have the following dataframe:
df = pd.DataFrame({ 'tmp': ['A', 'A', 'B', 'Z', 'D', 'C'], 'F1': [2, 1, 9, 8, 7, 4], 'F20': [0, 1, 9, 4, 2, 3], 'F3': ['a', 'B', 'c', 'D', 'e', 'F'], 'aabb': ['a', 'B', 'c', 'D', 'e', 'F'] }) --- tmp F1 F20 F3 aabb 0 A 2 0 a a 1 A 1 1 B B 2 B 9 9 c c 3 Z 8 4 D D 4 D 7 2 e e 5 C 4 3 F F
and I would like to sort only the columns with the F in this way:
tmp F1 F3 F20 aabb 0 A 2 a 0 a 1 A 1 B 1 B 2 B 9 c 9 c 3 Z 8 D 4 D 4 D 7 e 2 e 5 C 4 F 3 F
How could I do?
(edit) The columns with the “F” can vary both in quantity and in the values that follow the F (in my case I have about 100 columns like those) The columns with F are always grouped but the number before and after is variable
Advertisement
Answer
You can use natsort
for natural sorting and a mask to handle only the F columns:
# pip install natsort from natsort import natsorted cols = df.columns.to_numpy(copy=True) m = df.columns.str.fullmatch('Fd+') cols[m] = natsorted(cols[m]) df_sorted = df[cols]
Alternative without natsort
:
num = df.columns.str.extract('F(d+)', expand=False).astype(float) cols = df.columns.to_numpy(copy=True) m = num.notna() order = np.argsort(num[m]) cols[m] = cols[m][order] df_sorted = df[cols]
output:
tmp F1 F3 F20 aabb 0 A 2 a 0 a 1 A 1 B 1 B 2 B 9 c 9 c 3 Z 8 D 4 D 4 D 7 e 2 e 5 C 4 F 3 F