I have two data frames: first:
l1=[123,345,546,245] l2=["a","a","b","b"] l3=["j","k","l","pp"] l4=["m","n","m","n"] df1=pd.DataFrame(list(zip(l1,l2,l3,l4)),columns=['id','X','Y','Z']) df1.head()
second:
l1=["X","X","Y","Y","Y","Y","Z","Z"] l2=["a","b","j","k","l","pp","m","n"] l3=["1","2","1","2","3","4","1","2"] df2=pd.DataFrame(list(zip(l1,l2,l3)),columns=["labelnames","levels","labels"]) df2
I need to do label encoding to the first data frame values by using the second data frame. I tried converting the second data frame to a dictionary and then replacing the first data frame values.
such as;
dict= {"X" : { a:1 , b:2} "Y": {j:1,k:2,l:3, pp:4}, "Z" : {m:1,n:2}}
I couldn’t print the proper dictionary format.
I want to see the output like:
l1=[123,345,546,245] l2=["1","1","2","2"] l3=["1","2","3","4"] l4=["1","2","1","2"] df1=pd.DataFrame(list(zip(l1,l2,l3,l4)),columns=['id','X','Y','Z']) df1.head()
So, How can I replace the first data frame values( X, Y, Z) with second data frame labels?
Advertisement
Answer
here is one way to do it, using melt, and pivot
# melt the dataframe to make it columner # merge with the reference dataframe # finally pivot to get back to the original format (df.melt(id_vars=['id'], var_name='key' ).merge(df2, left_on=['key','value'], right_on=['labelnames','levels'], how='left' ) .pivot(index='id', columns='key', values='labels') .reset_index() .rename_axis(columns=None) )
id X Y Z 0 123 1 1 1 1 245 2 4 2 2 345 1 2 2 3 546 2 3 1