pandas subtracting value in another column from previous row

Question

I have a DataFrame (named df) sorted by identifier, id_number and contract_year_month, in that order. I would like to add a column named 'date_difference' that consists of contract_year_month minus the collection_year_month from the previous row, within the same identifier and id_number (e.g. 2018-01-08 minus 2018-01-09). I already converted the type of contract_year_month and

Accepted Answer

Here is one potential way to do this.

First create a boolean mask, then use numpy.where and Series.shift to create the column date_difference:

    import numpy as np

    # True for every row except the first of each (identifier, id_number) pair
    mask = df.duplicated(['identifier', 'id_number'])
    df['date_difference'] = np.where(mask, (df['contract_year_month'] -
                                            df['collection_year_month'].shift(1)).dt.days, np.nan)

[output]

      identifier  id_number contract_year_month collection_year_month  date_difference
    0       K001          1          2018-01-03            2018-01-09              NaN
    1       K001          1          2018-01-08            2018-01-10             -1.0
    2       K001          2          2018-01-01            2018-01-05              NaN
    3       K001          2          2018-01-15            2018-01-18             10.0
    4       K002          4          2018-01-04            2018-01-07              NaN
    5       K002          4          2018-01-09            2018-01-15              2.0
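To try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the values in the answer's output, on the assumption that both date columns are already datetime64. The second column, date_difference_grouped, is not part of the accepted answer: it is an alternative that shifts within each (identifier, id_number) group, so it does not need a separate mask. Like the original, it assumes the frame is already sorted by contract_year_month within each group.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the answer's output (assumed datetime64 columns).
df = pd.DataFrame({
    "identifier": ["K001", "K001", "K001", "K001", "K002", "K002"],
    "id_number": [1, 1, 2, 2, 4, 4],
    "contract_year_month": pd.to_datetime(
        ["2018-01-03", "2018-01-08", "2018-01-01",
         "2018-01-15", "2018-01-04", "2018-01-09"]),
    "collection_year_month": pd.to_datetime(
        ["2018-01-09", "2018-01-10", "2018-01-05",
         "2018-01-18", "2018-01-07", "2018-01-15"]),
})

# Approach from the answer: mask out the first row of each group,
# then subtract the previous row's collection date.
mask = df.duplicated(["identifier", "id_number"])
df["date_difference"] = np.where(
    mask,
    (df["contract_year_month"] - df["collection_year_month"].shift(1)).dt.days,
    np.nan,
)

# Alternative (an assumption, not from the answer): shift inside each group,
# so the first row of every group becomes NaT and therefore NaN days.
prev_collection = (
    df.groupby(["identifier", "id_number"])["collection_year_month"].shift(1)
)
df["date_difference_grouped"] = (
    df["contract_year_month"] - prev_collection
).dt.days

print(df)
```

Both columns should agree on this data. The grouped version may be easier to read when the grouping keys change, since the "previous row" is defined per group rather than by a global shift plus a mask.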

Answer