I have this dataframe (all strings):
      to_sort data
0     Belgien   a2
1      Zürich   b2
2    dänemark   c2
3          20   d2
4         100   e2
5  Österreich   f2
I want to sort it so that German umlauts, lowercase letters, and numbers are all ordered correctly:
      to_sort data
3          20   d2
4         100   e2
0     Belgien   a2
2    dänemark   c2
5  Österreich   f2
1      Zürich   b2
Here is my code to generate the dataframe and result:
import io, pandas as pd

t = io.StringIO("""
to_sort|data
Belgien|a2
Zürich|b2
dänemark|c2
20|d2
100|e2
Österreich|f2""")
df = pd.read_csv(t, sep='|')

df = df.sort_values(by='to_sort', key=lambda col: col.str.lower().str.normalize('NFD'))
The result is almost correct, but the numbers are sorted in the wrong order; 20 should come before 100:
      to_sort data
4         100   e2
3          20   d2
0     Belgien   a2
2    dänemark   c2
5  Österreich   f2
1      Zürich   b2
How can I fix the number sorting, while maintaining all the other characteristics?
Answer
Use the approach from the last example in the DataFrame.sort_values documentation, which combines natsort with the key parameter:
import numpy as np
from natsort import index_natsorted

# index_natsorted returns the positions that would sort the values naturally;
# np.argsort turns those positions into a rank per row, which sort_values can use as a key.
f = lambda col: np.argsort(index_natsorted(col.str.lower().str.normalize('NFD')))
df = df.sort_values(by='to_sort', key=f)
print(df)

      to_sort data
3          20   d2
4         100   e2
0     Belgien   a2
2    dänemark   c2
5  Österreich   f2
1      Zürich   b2
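If installing natsort is not an option, the same idea can be sketched with only the standard library. The original key compares "100" and "20" character by character, so "1" sorts before "2". The sketch below splits each value into text and digit runs and compares the digit runs as integers. The helper name natural_key is hypothetical (it is not part of pandas or natsort), and this is a simplified illustration that only handles ASCII digits, not a full replacement for natsort:

```python
import re
import unicodedata


def natural_key(value: str):
    """Hypothetical helper: lowercase, NFD-normalize, and compare digit runs as integers."""
    text = unicodedata.normalize("NFD", value.lower())
    # Splitting on a capturing group keeps the digit runs; they land at odd indices.
    parts = re.split(r"([0-9]+)", text)
    # Tag each chunk so integers and strings are never compared with each other:
    # (0, int) for digit runs, (1, str) for text, which also puts numbers first.
    return [
        (0, int(part)) if index % 2 == 1 else (1, part)
        for index, part in enumerate(parts)
        if part
    ]


values = ["Belgien", "Zürich", "dänemark", "20", "100", "Österreich"]
result = sorted(values, key=natural_key)
print(result)
# ['20', '100', 'Belgien', 'dänemark', 'Österreich', 'Zürich']
```

One way to apply this to the DataFrame above, assuming the default unique index, would be df.loc[sorted(df.index, key=lambda i: natural_key(df.at[i, 'to_sort']))].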