In Pandas, how to create a unique ID based on the combination of many columns?

Question

I have a very large dataset. I need to create an ID variable that is unique for every B-C combination. I don't care whether the index starts at zero or not, or whether the value for rows with missing columns is 0 or any other number. I just want something

Accepted Answer

I think you can use factorize:

    df['combined_id'] = pd.factorize(df.B + df.C)[0]
    print(df)

                B              C  combined_id
    0  john smith  indiana jones            0
    1    john doe   duck mc duck            1
    2  adam smith         batman            2
    3    john doe   duck mc duck            1
    4         NaN            NaN           -1

Rows where the concatenated value is NaN get the sentinel -1.
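One caveat with plain concatenation is that different B-C pairs can produce the same string, for example "ab" + "c" and "a" + "bc". The sketch below is a self-contained version of the factorize idea that joins the columns with a separator first. The separator `SEP`, the two extra rows at the end of the frame, and the name `plain_ids` are illustrative assumptions, not part of the original answer, and the approach only works if the separator never occurs in the data.

```python
import numpy as np
import pandas as pd

# Sample frame from the answer, plus two hypothetical rows ("ab", "c") and
# ("a", "bc") added only to illustrate the collision.
df = pd.DataFrame({
    "B": ["john smith", "john doe", "adam smith", "john doe", np.nan, "ab", "a"],
    "C": ["indiana jones", "duck mc duck", "batman", "duck mc duck", np.nan, "c", "bc"],
})

# Plain concatenation, as in the answer: the last two rows both become "abc"
# and therefore share one ID.
plain_ids = pd.factorize(df["B"] + df["C"])[0]

# Variant: join with a separator that is assumed never to appear in the data.
SEP = "\x1f"  # ASCII unit separator (an assumption, pick what suits your data)
joined = df["B"].str.cat(df["C"], sep=SEP)  # NaN if either side is missing

# factorize numbers values in order of first appearance; NaN maps to -1.
df["combined_id"] = pd.factorize(joined)[0]

print(df)
```

With the separator, the last two rows get different IDs, while rows with a missing B or C still get -1.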

Answer