I have this dataframe (all strings):
to_sort data 0 Belgien a2 1 Zürich b2 2 dänemark c2 3 20 d2 4 100 e2 5 Österreich f2
I want to sort it so that German umlauts are correct, also lowercase and numbers are correct:
to_sort data 3 20 d2 4 100 e2 0 Belgien a2 2 dänemark c2 5 Österreich f2 1 Zürich b2
Here is my code to generate the dataframe and result:
import io, pandas as pd t = io.StringIO(""" to_sort|data Belgien|a2 Zürich|b2 dänemark|c2 20|d2 100|e2 Österreich|f2""") df = pd.read_csv(t, sep='|') df = df.sort_values(by='to_sort', key=lambda col: col.str.lower().str.normalize('NFD'))
The result is almost correct, but the numbers are sorted in the wrong order, 20 should be before 200:
to_sort data 4 100 e2 3 20 d2 0 Belgien a2 2 dänemark c2 5 Österreich f2 1 Zürich b2
How can I fix the number sorting, while maintaining all the other characteristics?
Advertisement
Answer
Use solution from last sample data in DataFrame.sort_values
:
from natsort import index_natsorted f = lambda col: np.argsort(index_natsorted(col.str.lower().str.normalize('NFD'))) df = df.sort_values(by='to_sort', key=f ) print (df) to_sort data 3 20 d2 4 100 e2 0 Belgien a2 2 dänemark c2 5 Österreich f2 1 Zürich b2