Can’t replace duplicate values with new values in an xlsx file with pandas

I have an xlsx file containing a lot of data. However, the data contains duplicate values in a column named UniversalIDS, which I want to replace with randomly generated IDs using pandas.

So far I’ve tried several approaches that I found by searching, but none of them worked. For example, I tried this:

JavaScript

I also tried other alternatives seen on this site, for example:

JavaScript

This didn’t work either:

JavaScript

This is a snippet of the xlsx data:

JavaScript

As can be seen above, there are duplicate values in the UniversalIDS column. The data also has other columns, but I have left them out and kept only the column causing the problem, for simplicity.

So my question is: how can I replace the duplicate values in the UniversalIDS column with new unique IDs?

Answer

Your expression:

JavaScript

is valid Python, but it assigns a single UUID to all duplicated elements, which means those elements will still be duplicates of one another after it has been executed. Instead, create a Series with a distinct UUID for each duplicated row:

JavaScript
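
As a minimal sketch of that idea (not necessarily the exact code from the answer), the snippet below builds one UUID per duplicated row and assigns them through a Series aligned on the index. The helper name `replace_duplicate_ids`, the sample values, and the extra `Name` column are made up for illustration; it also assumes the IDs are stored as strings and that the first occurrence of each value should keep its original ID.

```python
import uuid

import pandas as pd


def replace_duplicate_ids(df, column="UniversalIDS"):
    """Return a copy of df where repeated values in `column` get fresh UUIDs."""
    out = df.copy()
    # True for every occurrence after the first; first occurrences keep their ID.
    mask = out[column].duplicated(keep="first")
    if mask.any():
        # One uuid4 string per duplicated row, indexed by those rows so that
        # the assignment aligns each new ID with exactly one row.
        new_ids = pd.Series(
            [str(uuid.uuid4()) for _ in range(int(mask.sum()))],
            index=out.index[mask],
        )
        out.loc[mask, column] = new_ids
    return out


# Hypothetical sample data standing in for the spreadsheet contents.
df = pd.DataFrame(
    {
        "UniversalIDS": ["a1", "b2", "a1", "c3", "b2", "a1"],
        "Name": ["A", "B", "C", "D", "E", "F"],
    }
)
fixed = replace_duplicate_ids(df)
print(fixed)
```

With a real workbook, the frame would typically come from `pd.read_excel` and be written back with `DataFrame.to_excel`; the file names are left to you.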
User contributions licensed under: CC BY-SA