Verify if elements of pandas columns have been shuffled

I have the following df:

JavaScript

The df above represents the lines of a CSV file in which a del_el on one line can appear as an add_el on another line. I want to add an action column whose value is “replace” when, for the same (name, id), the del_el equals the add_el on a different line_number.

Desired output

JavaScript

Sample code to recreate the input df

JavaScript

In my current solution, I define the actions as follows, using the tuple format ((add_el_name, added_el_ver), (del_el_name, del_el_ver)):

JavaScript

Code of my current solution:

JavaScript

My current solution only checks whether the name and version of the deleted element differ from those of the added element. I need the “replace” action to check whether the (del_el, del_ver) pair is added on another line_number of the same (name, id).

Answer

The solution I came up with consists of grouping the rows by name and id and aggregating the added and deleted columns into lists (I dropped the version for simplicity).

JavaScript
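A minimal sketch of this grouping step, assuming the columns are named add_el and del_el as in the question. The sample values and the output column names added and deleted are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
df = pd.DataFrame({
    "name": ["a", "a", "a", "b"],
    "id": [1, 1, 1, 2],
    "line_number": [10, 11, 12, 20],
    "add_el": ["x", "y", "z", "p"],
    "del_el": ["y", "w", "x", "p"],
})

# One row per (name, id); every added and deleted element of the group
# is collected into a Python list.
grouped = (
    df.groupby(["name", "id"], as_index=False)
      .agg(added=("add_el", list), deleted=("del_el", list))
)
```

Each row of grouped now holds everything that was added and everything that was deleted for one (name, id) pair.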

I then create a replaced column with a list comprehension that returns the set intersection of the added and deleted elements.

JavaScript
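The intersection step could look roughly like this. The grouped frame is built by hand here so the snippet runs on its own; its values are invented, and the column names added, deleted and replaced are assumptions.

```python
import pandas as pd

# Hypothetical result of the grouping step: one row per (name, id).
grouped = pd.DataFrame({
    "name": ["a", "b"],
    "id": [1, 2],
    "added": [["x", "y", "z"], ["p"]],
    "deleted": [["y", "w", "x"], ["p"]],
})

# Elements that are both added and deleted somewhere within the same group.
grouped["replaced"] = [
    set(added) & set(deleted)
    for added, deleted in zip(grouped["added"], grouped["deleted"])
]
```

Note that the intersection alone cannot tell whether the matching add and delete sit on the same row, which is why a later check is still needed.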

I merge the result with the original dataframe so I have the set intersection in each row.

JavaScript
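A sketch of the merge, again with invented data and assumed column names. A left merge on (name, id) keeps every original row and attaches the group's intersection set to it.

```python
import pandas as pd

# Hypothetical original rows (values invented for illustration).
df = pd.DataFrame({
    "name": ["a", "a", "a", "b"],
    "id": [1, 1, 1, 2],
    "line_number": [10, 11, 12, 20],
    "add_el": ["x", "y", "z", "p"],
    "del_el": ["y", "w", "x", "p"],
})

# Hypothetical per-group intersection sets from the previous step.
grouped = pd.DataFrame({
    "name": ["a", "b"],
    "id": [1, 2],
    "replaced": [{"x", "y"}, {"p"}],
})

# Broadcast each group's set back onto every row of that group.
merged = df.merge(grouped[["name", "id", "replaced"]], on=["name", "id"], how="left")
```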

Finally, I create a function that checks whether the deleted element appears in the set intersection column replaced. If it does, the row is labelled “replace”; otherwise the function returns the action that was already there. To make sure the match is not between elements on the same row, I also check that the action isn’t none (based on the code in my question post).

JavaScript
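The labelling function might be sketched as follows. It assumes that the question's code leaves action missing (None) when a row adds and deletes the same element; the "modify" label and the sample values are invented stand-ins for whatever actions the question's code produces.

```python
import pandas as pd

# Hypothetical merged frame: "modify" is a made-up placeholder action, and
# the last row has no action because it adds and deletes the same element.
merged = pd.DataFrame({
    "name": ["a", "a", "a", "b"],
    "id": [1, 1, 1, 2],
    "line_number": [10, 11, 12, 20],
    "add_el": ["x", "y", "z", "p"],
    "del_el": ["y", "w", "x", "p"],
    "action": ["modify", "modify", "modify", None],
    "replaced": [{"x", "y"}, {"x", "y"}, {"x", "y"}, {"p"}],
})

def label_replace(row):
    # Rows without an action are skipped, so a same-row match is not relabelled.
    if pd.notna(row["action"]) and row["del_el"] in row["replaced"]:
        return "replace"
    return row["action"]

merged["action"] = merged.apply(label_replace, axis=1)
```

`pd.notna` is used instead of an identity check against None because pandas may store the missing value as NaN, depending on the column dtype.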

Good: it works. Bad: it’s slow, especially if your dataframe is big. It is preferable to avoid list comprehensions.
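One possible way to avoid the list comprehension and the row-wise function is a self-merge. This is a different technique from the one described above, not a tested drop-in replacement: it matches each del_el against add_el values of the same (name, id) and keeps only matches on a different line_number. Column names, sample values and the "modify" label are assumptions, and the sketch relies on a default integer index.

```python
import pandas as pd

# Hypothetical data; "modify" is a made-up placeholder action.
df = pd.DataFrame({
    "name": ["a", "a", "a", "b"],
    "id": [1, 1, 1, 2],
    "line_number": [10, 11, 12, 20],
    "add_el": ["x", "y", "z", "p"],
    "del_el": ["y", "w", "x", "p"],
    "action": ["modify", "modify", "modify", None],
})

# Left side: each row's deleted element, tagged with its original row index.
left = df.reset_index()[["index", "name", "id", "line_number", "del_el"]]

# Right side: every added element, renamed so it joins against del_el.
right = df[["name", "id", "line_number", "add_el"]].rename(
    columns={"add_el": "del_el", "line_number": "added_on"}
)

# Pair each deletion with additions of the same element in the same group,
# then drop pairs where both happen on the same line.
pairs = left.merge(right, on=["name", "id", "del_el"])
pairs = pairs[pairs["line_number"] != pairs["added_on"]]

# Rows whose deleted element is added on another line get the "replace" label.
is_replace = df.index.isin(pairs["index"])
df.loc[is_replace, "action"] = "replace"
```

Because the comparison on line_number excludes same-row matches directly, this version does not depend on the action column being empty for those rows.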
