Skip to content
Advertisement

Sort the products based on the frequency of changes in customer demand

Imagine following dataframe is given.

import pandas as pd 
products = ['Apple', 'Apple', 'Carrot', 'Eggplant', 'Eggplant']
customer_demand_date = ['2023-01-01', '2023-01-07', '2023-01-01', '2023-01-01', '2023-01-07', '2023-01-14']
col_02_2023 = [0, 20, 0, 0, 0, 10]
col_03_2023 = [20, 30, 10, 0, 10, 0]
col_04_2023 = [10, 40, 50, 30, 40, 10]
col_05_2023 = [40, 40, 60, 50, 60, 20]

data = {'Products': products,
        'customer_demand_date': customer_demand_date,
        '02_2023': col_02_2023,
        '03_2023': col_03_2023,
        '04_2023': col_04_2023,
        '05_2023': col_05_2023}

df = pd.DataFrame(data)

print(df) 

Products customer_demand_date  02_2023  03_2023  04_2023  05_2023
0    Apple           2023-01-01        0       20       10       40
1    Apple           2023-01-07       20       30       40       40
2   Carrot           2023-01-01        0       10       50       60
3      Egg           2023-01-01        0        0       30       50
4      Egg           2023-01-07        0       10       40       60
5      Egg           2023-01-14        0        0       10       20

I have columns products, custome_demand_date (every week there is new customer demand for products per upcoming months) and months with quantity demand. How can I determine which product has experienced the most frequent changes in customer demand over the months, and sort the products in descending order of frequency of change? I have tried to group by product, accumulate the demand quantity but none of them can analyze the data both horizontally (per customer demand date) and vertically (per months). Desired output:

Sorted products      Ranking(or %, or count of changes) 
Egg                  1 (or 70% or 13)   
Apple                2 (or 52% or 8)
Carrot               3 (22% or 3)

Either ranking or % of change frequency or count of changes.

  • Note: percentages in desired output are random numbers

I’d really appreciate if you have any clever approach to solve this problem? Thanks

Advertisement

Answer

One way is to define a function that counts horizontal and vertical changes which you can apply to each group individually.

import pandas as pd
from io import StringIO

def change_freq(x, months):
    # count horizontal changes
    chngs_horizontal = x[months].diff(axis=1).fillna(0).astype(bool).sum().sum()
    # count vertical changes
    chngs_vertical = x[months].diff(axis=0).fillna(0).astype(bool).sum().sum()
    return chngs_horizontal+chngs_vertical

# sample data
data = StringIO("""
Products,customer_demand_date,02_2023,03_2023,04_2023,05_2023
Apple,2023-01-01,0,20,10,40
Apple,2023-01-07,20,30,40,40
Carrot,2023-01-01,0,10,50,60
Egg,2023-01-01,0,0,30,50
Egg,2023-01-07,0,10,40,60
Egg,2023-01-14,0,0,10,20
""")

df = pd.read_csv(data, sep=",")

# count horizontal and vertical changes by product
result = df.groupby('Products').apply(change_freq, ['02_2023','03_2023','04_2023','05_2023'])
result = result.sort_values(ascending=False).to_frame('count_changes')
result['rank'] = result['count_changes'].rank(ascending=False)

This returns

          count_changes  rank
Products
Egg                  13   1.0
Apple                 8   2.0
Carrot                3   3.0
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement