Sort dataframe by multiple columns while ignoring case




I want to sort a dataframe by multiple columns like this:

df.sort_values( by=[ 'A', 'B', 'C', 'D', 'E' ], inplace=True )

However, I found that Python sorts the uppercase values first and then the lowercase ones.

I tried this:

df.sort_values( by=[ 'A', 'B', 'C', 'D', 'E' ], inplace=True, key=lambda x: x.str.lower() )

but I get this error:

TypeError: sort_values() got an unexpected keyword argument 'key'

I could convert all the columns to lowercase, but I want to keep the values as they are.

Any hints?

Answer

According to the docs for DataFrame.sort_values, the key parameter requires pandas 1.1.0 or later, so you need to upgrade pandas:

key – callable, optional

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.

New in version 1.1.0.
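
Because the key function is called once per column in by, it can branch on the column's dtype, which helps when only some of the sort columns hold strings. A minimal sketch, assuming pandas 1.1.0 or later; the frame and the helper name lower_if_string are made up for illustration and are not part of pandas:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical data: one numeric column and one mixed-case string column
df = pd.DataFrame({
    'n': [2, 1, 1, 2],
    'name': ['b', 'A', 'a', 'B'],
})

def lower_if_string(col):
    # Called once for each column listed in `by`; must return a Series
    # of the same shape as its input
    if is_string_dtype(col):
        return col.str.lower()
    return col  # leave non-string columns untouched

out = df.sort_values(by=['n', 'name'], key=lower_if_string)
print(out)
```

A plain lambda x: x.str.lower() would raise on the numeric column, since the .str accessor only works on string values.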

Sample:

import pandas as pd

df = pd.DataFrame({
        'A':list('MmMJJj'),
        'B':list('aYAbCc')
})
# Lowercase each column only for comparison; the stored values keep their case
df.sort_values(by=['A', 'B'], inplace=True, key=lambda x: x.str.lower())
print(df)
   A  B
3  J  b
4  J  C
5  j  c
0  M  a
2  M  A
1  m  Y
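
If upgrading is not an option, one possible workaround is to sort a lowercased copy of the sort columns and then reorder the original frame by the resulting index. This is only a sketch, not part of the answer above, and it assumes the index labels are unique:

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('MmMJJj'),
    'B': list('aYAbCc'),
})
cols = ['A', 'B']

# Lowercased copy of the sort columns only; df itself is not modified
lowered = df[cols].apply(lambda s: s.str.lower())

# Sort the copy, then use its index order to reorder the original rows
order = lowered.sort_values(by=cols).index
df_sorted = df.loc[order]
print(df_sorted)
```

The original values keep their case because only the temporary copy is lowercased.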


Source: stackoverflow