Imagine following dataframe is given.
import pandas as pd products = ['Apple', 'Apple', 'Carrot', 'Eggplant', 'Eggplant'] customer_demand_date = ['2023-01-01', '2023-01-07', '2023-01-01', '2023-01-01', '2023-01-07', '2023-01-14'] col_02_2023 = [0, 20, 0, 0, 0, 10] col_03_2023 = [20, 30, 10, 0, 10, 0] col_04_2023 = [10, 40, 50, 30, 40, 10] col_05_2023 = [40, 40, 60, 50, 60, 20] data = {'Products': products, 'customer_demand_date': customer_demand_date, '02_2023': col_02_2023, '03_2023': col_03_2023, '04_2023': col_04_2023, '05_2023': col_05_2023} df = pd.DataFrame(data) print(df) Products customer_demand_date 02_2023 03_2023 04_2023 05_2023 0 Apple 2023-01-01 0 20 10 40 1 Apple 2023-01-07 20 30 40 40 2 Carrot 2023-01-01 0 10 50 60 3 Egg 2023-01-01 0 0 30 50 4 Egg 2023-01-07 0 10 40 60 5 Egg 2023-01-14 0 0 10 20
I have columns products, custome_demand_date (every week there is new customer demand for products per upcoming months) and months with quantity demand. How can I determine which product has experienced the most frequent changes in customer demand over the months, and sort the products in descending order of frequency of change? I have tried to group by product, accumulate the demand quantity but none of them can analyze the data both horizontally (per customer demand date) and vertically (per months). Desired output:
Sorted products Ranking(or %, or count of changes) Egg 1 (or 70% or 13) Apple 2 (or 52% or 8) Carrot 3 (22% or 3)
Either ranking or % of change frequency or count of changes.
- Note: percentages in desired output are random numbers
I’d really appreciate if you have any clever approach to solve this problem? Thanks
Advertisement
Answer
One way is to define a function that counts horizontal and vertical changes which you can apply to each group individually.
import pandas as pd from io import StringIO def change_freq(x, months): # count horizontal changes chngs_horizontal = x[months].diff(axis=1).fillna(0).astype(bool).sum().sum() # count vertical changes chngs_vertical = x[months].diff(axis=0).fillna(0).astype(bool).sum().sum() return chngs_horizontal+chngs_vertical # sample data data = StringIO(""" Products,customer_demand_date,02_2023,03_2023,04_2023,05_2023 Apple,2023-01-01,0,20,10,40 Apple,2023-01-07,20,30,40,40 Carrot,2023-01-01,0,10,50,60 Egg,2023-01-01,0,0,30,50 Egg,2023-01-07,0,10,40,60 Egg,2023-01-14,0,0,10,20 """) df = pd.read_csv(data, sep=",") # count horizontal and vertical changes by product result = df.groupby('Products').apply(change_freq, ['02_2023','03_2023','04_2023','05_2023']) result = result.sort_values(ascending=False).to_frame('count_changes') result['rank'] = result['count_changes'].rank(ascending=False)
This returns
count_changes rank Products Egg 13 1.0 Apple 8 2.0 Carrot 3 3.0