
Replace Values of Multiple Columns in Pandas Dataframe More Efficiently

I have a DataFrame, df, in which I would like to replace several values:

user1  user2  user3
apple  yoo    apple
mango  ram    mango

Instead of doing

df['user1'] = df['user1'].replace(['apple','mango'], [0, 1])
df['user3'] = df['user3'].replace(['apple','mango'], [0, 1])
df['user2'] = df['user2'].replace(['yoo','ram'], [2, 3])


to get the final DataFrame of

user1  user2  user3
0      2      0
1      3      1

Is there any way to make the code above more efficient, so that I can change the values of apple, mango, yoo and ram with one line of code?


Answer

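If you need to number the unique values across the columns with consecutive integers (0, 1, 2, ... in order of first appearance, column by column), use: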

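import pandas as pd
cols = ['user1','user2','user3']
# Flatten the selected columns into one Series, stacked column by column
s = df[cols].unstack()
# factorize numbers each distinct value in order of first appearance;
# unstack(0) moves the column names back to the columns
df[cols] = pd.Series(pd.factorize(s)[0], index=s.index).unstack(0)
print(df)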
   user1  user2  user3
0      0      2      0
1      1      3      1
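
If the target numbers are already known, as in the question, another option is to pass a single dictionary to DataFrame.replace. This is only a sketch and not the approach from the answer above. The name mapping and the sample frame are made up here to mirror the question's data, and the trailing astype(int) is an assumption added so that the result has an integer dtype regardless of how the installed pandas version handles dtypes after replace.

```python
import pandas as pd

# Sample frame rebuilt from the question's data
df = pd.DataFrame({
    'user1': ['apple', 'mango'],
    'user2': ['yoo', 'ram'],
    'user3': ['apple', 'mango'],
})

# Hypothetical lookup table: old value -> new value, applied to every column
mapping = {'apple': 0, 'mango': 1, 'yoo': 2, 'ram': 3}

# One call replaces all four values in all columns;
# astype(int) makes the integer dtype explicit
result = df.replace(mapping).astype(int)
print(result)
```

Unlike the factorize approach, the dictionary fixes the number assigned to each value explicitly, so the numbering does not depend on the order in which values appear.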
User contributions licensed under: CC BY-SA