I have a pandas DataFrame whose index I want to sort naturally. Natsort doesn’t seem to work. Sorting the labels before building the DataFrame doesn’t help, because the manipulations I do to the DataFrame afterwards mess up the order. Any thoughts on how I can re-sort the index naturally?
    from natsort import natsorted
    import pandas as pd

    # An unsorted list of strings
    a = ['0hr', '128hr', '72hr', '48hr', '96hr']

    # Sorted incorrectly
    b = sorted(a)

    # Naturally sorted
    c = natsorted(a)

    # Use a as the index for a DataFrame
    df = pd.DataFrame(index=a)

    # Sorted incorrectly (sort_index replaces DataFrame.sort, which newer pandas removed)
    df2 = df.sort_index()

    # Natsort doesn't seem to work
    df3 = natsorted(df)

    print(a)
    print(b)
    print(c)
    print(df.index)
    print(df2.index)
    print(df3.index)
Answer
If you want to sort the df, naturally sort the index (or the original list) and assign the result directly to the index of the df. Don't pass the df itself to natsorted as an arg, because that yields an empty list:
    In [7]: df.index = natsorted(a)
            df.index
    Out[7]: Index(['0hr', '48hr', '72hr', '96hr', '128hr'], dtype='object')
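Note that df.index = natsorted(df.index) also works. Either assignment only replaces the labels and does not move any data, which is fine here because the df has no columns.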
If you pass the df as an arg, it yields an empty list in this case because the df is empty (it has no columns). Otherwise it returns the column labels sorted, which is not what you want:
    In [10]: natsorted(df)
    Out[10]: []
EDIT
If you want to sort the index so that the data is reordered along with the index, then use reindex:
    In [13]: df = pd.DataFrame(index=a, data=np.arange(5))
             df
    Out[13]:
           0
    0hr    0
    128hr  1
    72hr   2
    48hr   3
    96hr   4

    In [14]: df = df*2
             df
    Out[14]:
           0
    0hr    0
    128hr  2
    72hr   4
    48hr   6
    96hr   8

    In [15]: df.reindex(index=natsorted(df.index))
    Out[15]:
           0
    0hr    0
    48hr   6
    72hr   4
    96hr   8
    128hr  2
Note that you have to assign the result of reindex to either a new df or to itself; it does not accept the inplace param.