In a df comprised of the columns asset_id, event_start_date, event_end_date
,
I wish to add a forth column datediff
that for each asset_id
will capture how many days passed between a end_date
and the following start_date
for the same asset_id
, but in case that following start_date
is earlier than the current end_date
, I would like to capture the difference between the two start_dates. Dataset is sorted by (asset_id, start_date asc).
In Excel it would look something like:
I tried:
JavaScript
x
5
1
events['datediff'] = df.groupby('asset_id').apply(lambda x: x['event_start_date'].shift(-1)-x['event_end_date'] if
2
x['event_start_date'].shift(-1)>x['event_end_date'] else x['event_start_date'].shift(-1)-x['event_start_date'] ).
3
fillna(pd.Timedelta(seconds=0)).reset_index(drop=True)
4
5
But this is:
- not working. Throwing
ValueError: The truth value of a Series is ambiguous.
- so un-elegant.
Thanks!
Advertisement
Answer
JavaScript
1
20
20
1
df = pd.DataFrame({
2
'asset_id':[0,0,1,1],
3
'event_start_date':['2019-07-08','2019-07-11','2019-07-15','2019-07-25'],
4
'event_end_date':['2019-07-08','2019-07-23','2019-07-29','2019-07-25']
5
})
6
7
df['event_end_date'] = pd.to_datetime(df['event_end_date'])
8
df['event_start_date'] = pd.to_datetime(df['event_start_date'])
9
10
11
df['next_start']=df.groupby('asset_id')['event_start_date'].shift(-1)
12
13
df['date_diff'] = np.where(
14
df['next_start']>df['event_end_date'],
15
(df['next_start']-df['event_end_date']).dt.days,
16
(df['next_start']-df['event_start_date']).dt.days
17
)
18
19
df = df.drop(columns=['next_start']).fillna(0)
20