My data is already sorted by groups based on one column. And within each group I am trying to reorder the values in another column. The below example illustrates what I mean:
Currently my data looks like this, its sorted by the carrier name. The 3rd column shows what type of products they’re holding. However, I would like chicken to be first on the list for every carrier, so that it looks like the second table below.
Name | Amount | Product |
---|---|---|
Carrier A | 1 | Apples |
Carrier A | 9 | Oranges |
Carrier A | 8 | Bananas |
Carrier A | 10 | Chicken |
Carrier A | 28 | Total |
Carrier B | 9 | Apples |
Carrier B | 4 | Oranges |
Carrier B | 6 | Bananas |
Carrier B | 9 | Chicken |
Carrier B | 28 | Total |
Carrier C | 5 | Apples |
Carrier C | 8 | Oranges |
Carrier C | 9 | Bananas |
Carrier C | 10 | Chicken |
Carrier C | 32 | Total |
The below table is what I am trying to achieve.
Name | Amount | Product |
---|---|---|
Carrier A | 10 | Chicken |
Carrier A | 1 | Apples |
Carrier A | 9 | Oranges |
Carrier A | 8 | Bananas |
Carrier A | 18 | Total |
Carrier B | 9 | Chicken |
Carrier B | 9 | Apples |
Carrier B | 4 | Oranges |
Carrier B | 6 | Bananas |
Carrier B | 19 | Total |
Carrier C | 10 | Chicken |
Carrier C | 5 | Apples |
Carrier C | 8 | Oranges |
Carrier C | 9 | Bananas |
Carrier C | 22 | Total |
Is there a way to do this with pandas? Instinctively I looked at feeding df.sort_value both the Name and Product columns, but realized that it can only sort by ascending and descending order.
Also the order of values in the “Name” column must be preserved for my task. In this example, its already ordered alphabetically, but its not necessarily the case nor my intention in my actual dataset.
Any insight would be appreciated!
Edit: added details for additional clarity
Advertisement
Answer
Use pd.CategoricalDType
to accomplish that:
# create your ordered list
cat = ['Chicken'] + sorted([p for p in df["Product"].unique()
if not p in ['Chicken', 'Total']]) + ['Total']
# set as category
df = df.astype({'Product': pd.CategoricalDtype(cat, ordered=True)})
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 15 non-null object
1 Amount 15 non-null int64
2 Product 15 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 595.0+ bytes
>>> df.sort_values(['Name', 'Product'])
Name Amount Product
3 Carrier A 10 Chicken
0 Carrier A 1 Apples
2 Carrier A 8 Bananas
1 Carrier A 9 Oranges
4 Carrier A 28 Total
8 Carrier B 9 Chicken
5 Carrier B 9 Apples
7 Carrier B 6 Bananas
6 Carrier B 4 Oranges
9 Carrier B 28 Total
13 Carrier C 10 Chicken
10 Carrier C 5 Apples
12 Carrier C 9 Bananas
11 Carrier C 8 Oranges
14 Carrier C 32 Total