Skip to content
Advertisement

how to divide revenue between check_in_date and check_out_date, and turn those dates into single column named date

I have an example of my dataset like this :

import pandas as pd
df = pd.DataFrame({'check_in':['2020-02-27','2020-02-28'],'check_out':['2020-02-29','2020-03-02'],'revenue':[100,66]})
df

    check_in    check_out   revenue
0   2020-02-27  2020-02-29  100
1   2020-02-28  2020-03-02  66

and I want to turn it into something like this :

    date        revenue
0   2020-02-27  50
1   2020-02-28  72 
2   2020-02-29  22
2   2020-03-01  22

The check_out date is not included in the range; so the first period is 2 days (27 and 28) with 50 revenue each.

Advertisement

Answer

Another method to solve this is first get difference between the out and in dates and then add the difference as Timedelta, . Then use the len of the range to divide the revenue to split and then groupby with sum


Solution:

a = df['check_out'].sub(df['check_in']).dt.days
b = a.map(range) #range automatically excludes the last entry
c = df['check_in'].add(pd.to_timedelta(b.explode(),unit='days'))

out = (c.to_frame('date').assign(revenue=df['revenue'].div(a))
       .groupby("date")['revenue'].sum().reset_index())

print(out)

        date  revenue
0 2020-02-27     50.0
1 2020-02-28     72.0
2 2020-02-29     22.0
3 2020-03-01     22.0

Considering your actual columns are datetime (if not you can convert then using below)

df[['check_in','check_out']] = df[['check_in','check_out']].apply(pd.to_datetime)
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement