Skip to content
Advertisement

Pandas: Reorder values in one column within the groups that are created based on another column

My data is already sorted by groups based on one column. And within each group I am trying to reorder the values in another column. The below example illustrates what I mean:

Currently my data looks like this, its sorted by the carrier name. The 3rd column shows what type of products they’re holding. However, I would like chicken to be first on the list for every carrier, so that it looks like the second table below.

Name Amount Product
Carrier A 1 Apples
Carrier A 9 Oranges
Carrier A 8 Bananas
Carrier A 10 Chicken
Carrier A 28 Total
Carrier B 9 Apples
Carrier B 4 Oranges
Carrier B 6 Bananas
Carrier B 9 Chicken
Carrier B 28 Total
Carrier C 5 Apples
Carrier C 8 Oranges
Carrier C 9 Bananas
Carrier C 10 Chicken
Carrier C 32 Total

The below table is what I am trying to achieve.

Name Amount Product
Carrier A 10 Chicken
Carrier A 1 Apples
Carrier A 9 Oranges
Carrier A 8 Bananas
Carrier A 18 Total
Carrier B 9 Chicken
Carrier B 9 Apples
Carrier B 4 Oranges
Carrier B 6 Bananas
Carrier B 19 Total
Carrier C 10 Chicken
Carrier C 5 Apples
Carrier C 8 Oranges
Carrier C 9 Bananas
Carrier C 22 Total

Is there a way to do this with pandas? Instinctively I looked at feeding df.sort_value both the Name and Product columns, but realized that it can only sort by ascending and descending order.

Also the order of values in the “Name” column must be preserved for my task. In this example, its already ordered alphabetically, but its not necessarily the case nor my intention in my actual dataset.

Any insight would be appreciated!

Edit: added details for additional clarity

Advertisement

Answer

Use pd.CategoricalDType to accomplish that:

# create your ordered list
cat = ['Chicken'] + sorted([p for p in df["Product"].unique()
                                  if not p in ['Chicken', 'Total']]) + ['Total']

# set as category
df = df.astype({'Product': pd.CategoricalDtype(cat, ordered=True)})
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Name     15 non-null     object
 1   Amount   15 non-null     int64
 2   Product  15 non-null     category
dtypes: category(1), int64(1), object(1)
memory usage: 595.0+ bytes
>>> df.sort_values(['Name', 'Product'])
         Name  Amount  Product
3   Carrier A      10  Chicken
0   Carrier A       1   Apples
2   Carrier A       8  Bananas
1   Carrier A       9  Oranges
4   Carrier A      28    Total
8   Carrier B       9  Chicken
5   Carrier B       9   Apples
7   Carrier B       6  Bananas
6   Carrier B       4  Oranges
9   Carrier B      28    Total
13  Carrier C      10  Chicken
10  Carrier C       5   Apples
12  Carrier C       9  Bananas
11  Carrier C       8  Oranges
14  Carrier C      32    Total
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement