I have an example of my dataset like this:

    import pandas as pd

    df = pd.DataFrame({
        'check_in': ['2020-02-27', '2020-02-28'],
        'check_out': ['2020-02-29', '2020-03-02'],
        'revenue': [100, 66],
    })
    df

         check_in   check_out  revenue
    0  2020-02-27  2020-02-29      100
    1  2020-02-28  2020-03-02       66
and I want to turn it into something like this:

             date  revenue
    0  2020-02-27       50
    1  2020-02-28       72
    2  2020-02-29       22
    3  2020-03-01       22
The check_out date is not included in the range, so the first period is 2 days (27 and 28) with 50 revenue each.
Answer
Another way to solve this is to first take the difference in days between the check-out and check-in dates, turn that count into a range of day offsets, and add each offset to check_in as a Timedelta. Then divide the revenue by the number of days to split it evenly, and finally groupby the date with sum.
Solution:
    a = df['check_out'].sub(df['check_in']).dt.days  # number of days per stay
    b = a.map(range)  # range automatically excludes the last entry
    c = df['check_in'].add(pd.to_timedelta(b.explode(), unit='days'))
    out = (c.to_frame('date').assign(revenue=df['revenue'].div(a))
            .groupby("date")['revenue'].sum().reset_index())

    print(out)

            date  revenue
    0 2020-02-27     50.0
    1 2020-02-28     72.0
    2 2020-02-29     22.0
    3 2020-03-01     22.0
This assumes your actual columns are datetime; if not, you can convert them using the line below:

    df[['check_in','check_out']] = df[['check_in','check_out']].apply(pd.to_datetime)
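For comparison, here is a more explicit (and slower) sketch of the same idea that may be easier to follow step by step. It is not the method above but a plain-loop variant using `pd.date_range`; the helper name `split_revenue_by_day` is hypothetical, and it assumes every stay lasts at least one day (stays of zero or negative length are skipped).

```python
import pandas as pd


def split_revenue_by_day(df):
    """Hypothetical helper: spread each stay's revenue evenly over its days."""
    df = df.copy()
    # Make sure both date columns are real datetimes
    df[['check_in', 'check_out']] = df[['check_in', 'check_out']].apply(pd.to_datetime)

    rows = []
    for check_in, check_out, revenue in zip(df['check_in'], df['check_out'], df['revenue']):
        n_days = (check_out - check_in).days  # check_out itself is excluded
        if n_days <= 0:
            continue  # assumption: skip stays with no billable day
        # One row per day, starting at check_in, n_days long
        for day in pd.date_range(check_in, periods=n_days, freq='D'):
            rows.append({'date': day, 'revenue': revenue / n_days})

    # Sum the per-day shares of all stays that overlap on the same date
    return pd.DataFrame(rows).groupby('date', as_index=False)['revenue'].sum()


df = pd.DataFrame({
    'check_in': ['2020-02-27', '2020-02-28'],
    'check_out': ['2020-02-29', '2020-03-02'],
    'revenue': [100, 66],
})
out = split_revenue_by_day(df)
print(out)
```

The vectorized version above should be preferred for large frames; the loop is only meant to make the day-by-day split visible.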