Verify if elements of pandas columns have been shuffled

Question

I have the following df: The above df represents the lines in a csv file where the del_el is an add_el on another line. I want to add a column action in which the value would be "replace" if, for the same (name, id), the del_el is equal to the add_el column on another line_number. Desired output Sa…

Accepted Answer

The solution I came up with consists of grouping the rows by name and id and aggregating the added and deleted columns (add_el and del_el, renamed for simplicity) into lists.

    import pandas as pd

    # Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map
    res = df.groupby(['name', 'id']).agg(tuple).applymap(list).reset_index()

I then create a replaced column with a list comprehension that returns the set intersection of the added and deleted elements, or the string 'NaN' when the intersection is empty.

    res['replaced'] = [(set(a) & set(b)) if len(set(a) & set(b)) != 0 else 'NaN'
                       for a, b in zip(res.added, res.deleted)]
    res = res[['name', 'id', 'replaced']]  # selecting necessary columns

I merge the result with the original dataframe so that each row carries the set intersection for its (name, id) group.

    res_final = pd.merge(df, res, on=['name', 'id'])  # merging with original df

Finally, I create a function that checks whether the deleted element appears in the replaced column. If it does, the label "replace" is returned; otherwise, the function returns the action that was already there. To ensure that we are not looking at elements on the same row, I verify that the action isn't 'none' (based on the code in my question post).

    def is_it_replaced(row):
        # Comparison is done on string representations of both values
        if str(row['deleted']) in str(row['replaced']) and str(row['action']) != 'none':
            return 'replace'
        else:
            return str(row['action'])

    res_final['action_type'] = res_final.apply(is_it_replaced, axis=1)
    res_final = res_final.drop(columns=['action', 'replaced'])  # final cleanup

Good: it works.

Bad: it's slow, especially if your dataframe is big, since both the list comprehension and the row-wise apply loop in Python. It is preferable to avoid list comprehension.
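One possible way to address the speed concern is to replace the list comprehension and the row-wise apply with a self-merge. The following is only a sketch, not the method from the answer above: the sample rows, the action values, and the helper column name added_on_line are assumptions, since the original dataframe is not shown in the post.

```python
import pandas as pd

# Hypothetical sample data (the post's dataframe is not available).
df = pd.DataFrame({
    "line_number": [1, 2, 3, 4, 5, 6],
    "name": ["a", "a", "a", "b", "b", "b"],
    "id": [1, 1, 1, 2, 2, 2],
    "added": ["x", "y", None, "z", None, None],
    "deleted": [None, "x", "q", None, "w", "x"],
    "action": ["add", "change", "delete", "add", "delete", "delete"],
})

# Right-hand side: every non-null added element, keyed like a deleted element.
# Null keys are dropped first because pandas merges treat NaN keys as equal.
adds = df.loc[df["added"].notna(), ["name", "id", "added", "line_number"]]
adds = adds.rename(columns={"added": "deleted", "line_number": "added_on_line"})

# Pair each deleted element with rows of the same (name, id) that added it.
pairs = df.loc[df["deleted"].notna()].merge(adds, on=["name", "id", "deleted"])

# Keep only matches that come from a different line.
pairs = pairs[pairs["line_number"] != pairs["added_on_line"]]

# Label the matching rows "replace" and keep the existing action elsewhere.
is_replaced = df["line_number"].isin(pairs["line_number"])
df["action_type"] = df["action"].where(~is_replaced, "replace")

print(df[["line_number", "name", "id", "added", "deleted", "action_type"]])
```

In this sketch only line 2 is labelled "replace", because its deleted element "x" was added on line 1 for the same (name, id). Line 6 also deletes "x" but belongs to a different group, so it keeps its original action. The merge compares values exactly, whereas the function above compares string representations.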

Answer