What I would like to achieve
I have a DataFrame whose indices are “ID (int) + underscore (_) + name (str)”. I would like to sort the data based on the ID.
import pandas as pd data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]] index = ['11_ItemName', '0_ItemName', '1_ItemName', '2_ItemName', '10_ItemName', '20_ItemName', '101_ItemName'] columns = ['column1', 'column2'] df = pd.DataFrame(data, index=index, columns=columns) print(df) ## Output # column1 column2 # 11_ItemName 1 2 # 0_ItemName 3 4 # 1_ItemName 5 6 # 2_ItemName 7 8 # 10_ItemName 9 10 # 20_ItemName 11 12 # 101_ItemName 13 14 # print(DO SOMETHING!) ## Expected output # column1 column2 # 0_ItemName 3 4 # 1_ItemName 5 6 # 2_ItemName 7 8 # 10_ItemName 9 10 # 11_ItemName 1 2 # 20_ItemName 11 12 # 101_ItemName 13 14
What I tested
I tried to use sort_index
and failed.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_index.html
To be honest, I’m stuck at the debug of lambda
.
df(sort_index()) ## Output # column1 column2 # 0_ItemName 3 4 # 101_ItemName 13 14 # 10_ItemName 9 10 # 11_ItemName 1 2 # 1_ItemName 5 6 # 20_ItemName 11 12 # 2_ItemName 7 8 print(df.sort_index(key=(lambda x: int(x.str.split('_')[0])))) # TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list'
Environment
Python 3.10.5
Pandas 1.4.3
Advertisement
Answer
Try df.sort_index
with custom key=
:
df = df.sort_index( key=lambda idx: idx.str.split("_") .str[0] .astype(int) ) print(df)
Prints:
column1 column2 0_ItemName 3 4 1_ItemName 5 6 2_ItemName 7 8 10_ItemName 9 10 11_ItemName 1 2 20_ItemName 11 12 101_ItemName 13 14