My data is already sorted by groups based on one column. And within each group I am trying to reorder the values in another column. The below example illustrates what I mean:
Currently my data looks like this, its sorted by the carrier name. The 3rd column shows what type of products they’re holding. However, I would like chicken to be first on the list for every carrier, so that it looks like the second table below.
Name | Amount | Product |
---|---|---|
Carrier A | 1 | Apples |
Carrier A | 9 | Oranges |
Carrier A | 8 | Bananas |
Carrier A | 10 | Chicken |
Carrier A | 28 | Total |
Carrier B | 9 | Apples |
Carrier B | 4 | Oranges |
Carrier B | 6 | Bananas |
Carrier B | 9 | Chicken |
Carrier B | 28 | Total |
Carrier C | 5 | Apples |
Carrier C | 8 | Oranges |
Carrier C | 9 | Bananas |
Carrier C | 10 | Chicken |
Carrier C | 32 | Total |
The below table is what I am trying to achieve.
Name | Amount | Product |
---|---|---|
Carrier A | 10 | Chicken |
Carrier A | 1 | Apples |
Carrier A | 9 | Oranges |
Carrier A | 8 | Bananas |
Carrier A | 18 | Total |
Carrier B | 9 | Chicken |
Carrier B | 9 | Apples |
Carrier B | 4 | Oranges |
Carrier B | 6 | Bananas |
Carrier B | 19 | Total |
Carrier C | 10 | Chicken |
Carrier C | 5 | Apples |
Carrier C | 8 | Oranges |
Carrier C | 9 | Bananas |
Carrier C | 22 | Total |
Is there a way to do this with pandas? Instinctively I looked at feeding df.sort_value both the Name and Product columns, but realized that it can only sort by ascending and descending order.
Also the order of values in the “Name” column must be preserved for my task. In this example, its already ordered alphabetically, but its not necessarily the case nor my intention in my actual dataset.
Any insight would be appreciated!
Edit: added details for additional clarity
Advertisement
Answer
Use pd.CategoricalDType
to accomplish that:
# create your ordered list cat = ['Chicken'] + sorted([p for p in df["Product"].unique() if not p in ['Chicken', 'Total']]) + ['Total'] # set as category df = df.astype({'Product': pd.CategoricalDtype(cat, ordered=True)})
>>> df.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 15 entries, 0 to 14 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 15 non-null object 1 Amount 15 non-null int64 2 Product 15 non-null category dtypes: category(1), int64(1), object(1) memory usage: 595.0+ bytes
>>> df.sort_values(['Name', 'Product']) Name Amount Product 3 Carrier A 10 Chicken 0 Carrier A 1 Apples 2 Carrier A 8 Bananas 1 Carrier A 9 Oranges 4 Carrier A 28 Total 8 Carrier B 9 Chicken 5 Carrier B 9 Apples 7 Carrier B 6 Bananas 6 Carrier B 4 Oranges 9 Carrier B 28 Total 13 Carrier C 10 Chicken 10 Carrier C 5 Apples 12 Carrier C 9 Bananas 11 Carrier C 8 Oranges 14 Carrier C 32 Total