Skip to content
Advertisement

How to calculate difference in DATE based on status of another column?

I have the following dataset,

D_DATE       BIN Number   Disposition    Unit Assigned        
2018-01-04    10005      SWO Issued      PLUMBING DIVISION     
2016-06-23    10005      SWO Issued      SCAFFOLD UNIT         
2016-06-23    10005      SWO Rescinded   SCAFFOLD UNIT         
2018-01-17    10005      SWO Rescinded   PLUMBING DIVISION  
2019-01-04    10006      SWO Rescinded   BEST SQUAD 
2018-12-21    10006      SWO Issued      BEST SQUAD            

I want to create a new field name “duration” which will capture the time difference between SWO issued and rescinded for each BIN number. Note that, each BIN number can show up multiple times based on Date and different Unit. So, each unit can issue SWO on the same BIN number.

I am trying to get a output like this which will calculate the date difference in days on the BIN number by Unit and based on SWO Issued and Rescinded.

Output:

    D_DATE       BIN Number   Disposition    Unit Assigned         Duration
    2018-01-04    10005      SWO Issued      PLUMBING DIVISION     13 Days
    2016-06-23    10005      SWO Issued      SCAFFOLD UNIT         0 days
    2016-06-23    10005      SWO Rescinded   SCAFFOLD UNIT         0 days
    2018-01-17    10005      SWO Rescinded   PLUMBING DIVISION     13 days
    2019-01-04    10006      SWO Rescinded   BEST SQUAD            14 days
    2018-12-21    10006      SWO Issued      BEST SQUAD            14 days

I used this but it groupby on Unit rather than giving me individual BIN Number by each unit and their dispositions status,

df2 = df2.groupby('BIN Number')['D_DATE'].agg(['max','min'])

df2["Duration"] = df2['max'].sub(df2['min']).dt.days
print(df2)

Appreciate any help

Advertisement

Answer

IIUC,

you need to groupby and use diff to calculate the delta in time values,

you can then merge onto your original df.

m = df.groupby(['BIN Number','Disposition','Unit Assigned']).agg(date_min=('D_DATE','min'),
                                            date_max=('D_DATE','max')).groupby(level=[0,2])
                                            .diff().reset_index().dropna().drop('Disposition',axis=1)

df2 = pd.merge(df,m,on=['BIN Number','Unit Assigned'],how='left')

print(df2)

      D_DATE  BIN Number    Disposition      Unit Assigned date_min date_max
0 2018-01-04       10005     SWO Issued  PLUMBING DIVISION  13 days  13 days
1 2016-06-23       10005     SWO Issued      SCAFFOLD UNIT   0 days   0 days
2 2016-06-23       10005  SWO Rescinded      SCAFFOLD UNIT   0 days   0 days
3 2018-01-17       10005  SWO Rescinded  PLUMBING DIVISION  13 days  13 days
4 2019-01-04       10006  SWO Rescinded         BEST SQUAD  14 days  14 days
5 2018-12-21       10006     SWO Issued         BEST SQUAD  14 days  14 days
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement