I have a dataset where I would like to replace and update values within a column when a data condition is met.
Data
ID Date State File AA Q122 NY closed AA Q222 NY closed AA Q322 NY closed AA Q422 NY closed AA Q123 NY closed AA Q223 NY closed AA Q323 NY closed AA Q423 NY closed BB Q124 CA closed BB Q224 CA open BB Q324 CA open BB Q424 CA open BB Q125 CA open BB Q225 CA open BB Q325 CA open BB Q425 CA open
Desired
ID Date State File AA Q122 NY yes AA Q222 NY yes AA Q322 NY yes AA Q422 NY yes AA Q123 NY yes AA Q223 NY closed AA Q323 NY closed AA Q423 NY closed BB Q124 CA closed BB Q224 CA open BB Q324 CA open BB Q424 CA open BB Q125 CA open BB Q225 CA open BB Q325 CA open BB Q425 CA open
Doing
df.loc[(df['ID'].eq('AA')) & (df['Date'] <= 'Q123'), 'File'] = 'yes' **However this is not capturing the quarter order and assigns the string 'yes' to rows that do not meet the condition of less than or = to Q123.**
Still researching, any suggestion is appreciated- Perhaps I need to convert quarters to datetime longdate and base the condition off of this column.
Advertisement
Answer
here is one way to do it
to apply the condition, Date need to be in Year and Quarter, and then applying np.where condition. However, you need to put the condition in the same format i.e. YYQ (year, quarter)
df['File']= np.where( ( df['ID'] =='AA') & ((df['Date'].str[-2:] + df['Date'].str[1:2]).astype(int) <=231) , 'yes' , df['File'] ) df
ID Date State File 0 AA Q122 NY yes 1 AA Q222 NY yes 2 AA Q322 NY yes 3 AA Q422 NY yes 4 AA Q123 NY yes 5 AA Q223 NY closed 6 AA Q323 NY closed 7 AA Q423 NY closed 8 BB Q124 CA closed 9 BB Q224 CA open 10 BB Q324 CA open 11 BB Q424 CA open 12 BB Q125 CA open 13 BB Q225 CA open 14 BB Q325 CA open 15 BB Q425 CA open