
In Pandas, how to create a unique ID based on the combination of many columns?

I have a very large dataset that looks like this:

JavaScript

I need to create an ID variable that is unique for every B-C combination. That is, the output should be

JavaScript

I actually don't care whether the index starts at zero, or whether the value for the missing columns is 0 or any other number. I just want something fast that does not take a lot of memory and can be sorted quickly. I use:

JavaScript

but the output is float64 and takes a lot of memory. Can we do better? Thanks!


Answer

I think you can use factorize:

JavaScript
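As a minimal sketch of the factorize idea (not necessarily the exact code originally posted here): the sample data below is made up, only the column names B and C come from the question, and the "|" separator assumes that character never appears in the values. `pd.factorize` returns integer codes in order of first appearance, which can then be stored as a compact integer dtype instead of float64.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names B and C follow the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["x", "x", "y", "y", "x"],
    "C": ["p", "p", "q", "r", "p"],
})

# Combine B and C into one key per row. The "|" separator is an assumption:
# it must not occur in the data, otherwise distinct pairs could collide.
key = df["B"].astype(str) + "|" + df["C"].astype(str)

# pd.factorize returns (codes, uniques); codes are integers starting at 0,
# numbered in order of first appearance of each distinct key.
codes, uniques = pd.factorize(key)

# Store the ID as int32 (4 bytes per row) rather than float64 (8 bytes per row).
df["ID"] = codes.astype(np.int32)

print(df)
```

Note that `astype(str)` turns missing values into the string "nan", so rows with a missing B or C still receive an ID of their own, which matches the question's statement that any value is acceptable there.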
User contributions licensed under: CC BY-SA