I am trying to calculate the number of days that have elapsed since the launch of a marketing campaign. I have one row per date for each marketing campaign in my DataFrame (df) and all dates start from the same day (though there is not a data point for each day for each campaign). In column ‘b’ I have the date relating to the data points of interest (dateime64[ns]) and in column ‘c’ I have the launch date of the marketing campaign (dateime64[ns]). I would like the resulting calculation to return n/a (or np.NaN or a suitable alternative) when column ‘b’ is earlier than column ‘c’, else I would like the calculation to return the difference the two dates.
Campaign | Date | Launch Date | Desired Column |
---|---|---|---|
A | 2019-09-01 | 2022-12-01 | n/a |
A | 2019-09-02 | 2022-12-01 | n/a |
B | 2019-09-01 | 2019-09-01 | 0 |
B | 2019-09-25 | 2019-09-01 | 24 |
When I try:
df['Days Since Launch'] = df['Date'] - df['Launch Date']
What I would hope returns a negative value actually returns a positive one, thus leading to duplicate values when I have dates that are 10 days prior and 10 days after the launch date.
When I try:
df['Days Since Launch'] = np.where(df['Date'] < df['Launch Date'], XXX, df['Date'] - df['Launch Date'])
Where XXX has to be the same data type as the two input columns, so I can’t enter np.NaN because the calculation will fail, nor can I enter a date as this will still leave the same issue that i want to solve. IF statements do not work as the “truth value of a Series is ambiguous”. Any ideas?
Advertisement
Answer
You can use a direct subtraction and conversion to days with dt.days
, then mask the negative values with where
:
s = pd.to_datetime(df['Date']).sub(pd.to_datetime(df['Launch Date'])).dt.days # or, if already datetime: #s = df['Date'].sub(df['Launch Date']).dt.days df['Desired Column'] = s.where(s.ge(0))
Alternative closer to your initial attempt, using mask
:
df['Desired Column'] = (df['Date'].sub(df['Launch Date']) .mask(df['Date'] < df['Launch Date']) )
Output:
Campaign Date Launch Date Desired Column 0 A 2019-09-01 2022-12-01 NaN 1 A 2019-09-02 2022-12-01 NaN 2 B 2019-09-01 2019-09-01 0.0 3 B 2019-09-25 2019-09-01 24.0