I have the following dataset,
D_DATE BIN Number Disposition Unit Assigned 2018-01-04 10005 SWO Issued PLUMBING DIVISION 2016-06-23 10005 SWO Issued SCAFFOLD UNIT 2016-06-23 10005 SWO Rescinded SCAFFOLD UNIT 2018-01-17 10005 SWO Rescinded PLUMBING DIVISION 2019-01-04 10006 SWO Rescinded BEST SQUAD 2018-12-21 10006 SWO Issued BEST SQUAD
I want to create a new field name “duration” which will capture the time difference between SWO issued and rescinded for each BIN number. Note that, each BIN number can show up multiple times based on Date and different Unit. So, each unit can issue SWO on the same BIN number.
I am trying to get a output like this which will calculate the date difference in days on the BIN number by Unit and based on SWO Issued and Rescinded.
Output:
D_DATE BIN Number Disposition Unit Assigned Duration 2018-01-04 10005 SWO Issued PLUMBING DIVISION 13 Days 2016-06-23 10005 SWO Issued SCAFFOLD UNIT 0 days 2016-06-23 10005 SWO Rescinded SCAFFOLD UNIT 0 days 2018-01-17 10005 SWO Rescinded PLUMBING DIVISION 13 days 2019-01-04 10006 SWO Rescinded BEST SQUAD 14 days 2018-12-21 10006 SWO Issued BEST SQUAD 14 days
I used this but it groupby on Unit rather than giving me individual BIN Number by each unit and their dispositions status,
df2 = df2.groupby('BIN Number')['D_DATE'].agg(['max','min']) df2["Duration"] = df2['max'].sub(df2['min']).dt.days print(df2)
Appreciate any help
Advertisement
Answer
IIUC,
you need to groupby and use diff
to calculate the delta in time values,
you can then merge onto your original df.
m = df.groupby(['BIN Number','Disposition','Unit Assigned']).agg(date_min=('D_DATE','min'), date_max=('D_DATE','max')).groupby(level=[0,2]) .diff().reset_index().dropna().drop('Disposition',axis=1) df2 = pd.merge(df,m,on=['BIN Number','Unit Assigned'],how='left')
print(df2) D_DATE BIN Number Disposition Unit Assigned date_min date_max 0 2018-01-04 10005 SWO Issued PLUMBING DIVISION 13 days 13 days 1 2016-06-23 10005 SWO Issued SCAFFOLD UNIT 0 days 0 days 2 2016-06-23 10005 SWO Rescinded SCAFFOLD UNIT 0 days 0 days 3 2018-01-17 10005 SWO Rescinded PLUMBING DIVISION 13 days 13 days 4 2019-01-04 10006 SWO Rescinded BEST SQUAD 14 days 14 days 5 2018-12-21 10006 SWO Issued BEST SQUAD 14 days 14 days