Conditional lambda in pandas returns ValueError

Tags: , ,

In a df comprised of the columns asset_id, event_start_date, event_end_date, I wish to add a forth column datediff that for each asset_id will capture how many days passed between a end_date and the following start_date for the same asset_id, but in case that following start_date is earlier than the current end_date, I would like to capture the difference between the two start_dates. Dataset is sorted by (asset_id, start_date asc).

In Excel it would look something like:

enter image description here

I tried:

  events['datediff'] = df.groupby('asset_id').apply(lambda x:  x['event_start_date'].shift(-1)-x['event_end_date'] if
    x['event_start_date'].shift(-1)>x['event_end_date'] else  x['event_start_date'].shift(-1)-x['event_start_date'] ).

But this is:

  1. not working. Throwing ValueError: The truth value of a Series is ambiguous.
  2. so un-elegant.



df = pd.DataFrame({

df['event_end_date'] = pd.to_datetime(df['event_end_date'])
df['event_start_date'] = pd.to_datetime(df['event_start_date'])


df['date_diff'] = np.where(

df = df.drop(columns=['next_start']).fillna(0)

Source: stackoverflow