What I would like to achieve
I have a DataFrame whose index labels have the form “ID (int) + underscore (_) + name (str)”. I would like to sort the rows numerically by the ID.
import pandas as pd

data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]]
index = ['11_ItemName', '0_ItemName', '1_ItemName', '2_ItemName', '10_ItemName', '20_ItemName', '101_ItemName']
columns = ['column1', 'column2']

df = pd.DataFrame(data, index=index, columns=columns)
print(df)
## Output
#               column1  column2
# 11_ItemName         1        2
# 0_ItemName          3        4
# 1_ItemName          5        6
# 2_ItemName          7        8
# 10_ItemName         9       10
# 20_ItemName        11       12
# 101_ItemName       13       14

# print(DO SOMETHING!)
## Expected output
#               column1  column2
# 0_ItemName          3        4
# 1_ItemName          5        6
# 2_ItemName          7        8
# 10_ItemName         9       10
# 11_ItemName         1        2
# 20_ItemName        11       12
# 101_ItemName       13       14
What I tested
I tried to use sort_index and failed: the default sort orders the labels as strings, and my attempt at a numeric key raises a TypeError.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_index.html
To be honest, I’m stuck debugging the lambda.
print(df.sort_index())
## Output
#               column1  column2
# 0_ItemName          3        4
# 101_ItemName       13       14
# 10_ItemName         9       10
# 11_ItemName         1        2
# 1_ItemName          5        6
# 20_ItemName        11       12
# 2_ItemName          7        8

print(df.sort_index(key=(lambda x: int(x.str.split('_')[0]))))
# TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list'
Environment
Python 3.10.5
Pandas 1.4.3
Answer
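Try df.sort_index with a custom key=. The key function receives the whole index at once rather than one label at a time, so it has to use vectorized .str operations: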
df = df.sort_index(
    key=lambda idx: idx.str.split("_")
    .str[0]
    .astype(int)
)
print(df)
Prints:
              column1  column2
0_ItemName          3        4
1_ItemName          5        6
2_ItemName          7        8
10_ItemName         9       10
11_ItemName         1        2
20_ItemName        11       12
101_ItemName       13       14
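To see why the lambda in the question raised the TypeError, it may help to look at what the key function actually receives. The following is only a minimal sketch on a smaller four-row frame (an assumption made for brevity, not the data from the question); the names parts, first_label_parts and ids are illustrative.

```python
import pandas as pd

# Smaller stand-in for the frame in the question (assumed for illustration).
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=["11_ItemName", "0_ItemName", "2_ItemName", "10_ItemName"],
    columns=["column1", "column2"],
)

# sort_index passes the whole Index to `key`, not one label at a time.
parts = df.index.str.split("_")    # Index whose elements are lists

# Plain [0] picks the list belonging to the FIRST label only.
# The lambda in the question effectively called int() on this list.
first_label_parts = parts[0]

# .str[0] takes the first element of EVERY list, label by label.
ids = parts.str[0].astype(int)

# The key must return one sortable value per label, like `ids` does.
sorted_df = df.sort_index(key=lambda idx: idx.str.split("_").str[0].astype(int))
print(sorted_df)
```

Here parts[0] is a single Python list, which is what int() rejected, whereas parts.str[0] is an index-wide operation that yields one ID per row. The original frame is left unchanged because sort_index returns a new DataFrame.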