Create a new column based on conditions in Python

I have a ref table df_ref like this:

col1 col2 ref
a    b    a,b
c    d    c,d

I need to create a new column in another table based on the ref table. The table looks like this:

col1 col2 
a    b   
a    NULL 
NULL b 
a    NULL 
a    NULL 
c    d  
c    NULL 
NULL NULL 

The output table df_org looks like:

col1  col2   ref
a     b      a,b
a     NULL   a,b
NULL  b      a,b
a     NULL   a,b
a     NULL   a,b
c     d      c,d
c     NULL   c,d
NULL  NULL   NULL

If a value in col1 or col2 can be found in the ref table, the row should take the ref value from the matching row of the ref table. If both col1 and col2 are NULL, nothing can be found in the ref table, so the result should just be NULL. I tried this code, but it doesn't work:

df_org['ref']=np.where(((df_org['col1'].isin(df_ref['ref'])) | 
         (df_org['col2'].isin(df_ref['ref']))  
          ), df_ref['ref'], 'NULL')

ValueError: operands could not be broadcast together with shapes

Answer

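np.where fails here because the condition has one entry per row of df_org (8), while df_ref['ref'] has one entry per row of df_ref (2), so the shapes cannot be broadcast together. Instead, perform two merges, one on col1 and one on col2, and combine them (df is the table without the ref column):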

df_org = (
    # look up ref by col1
    df.merge(df_ref.drop('col2', axis=1), on='col1', how='left')
    # fill the rows still missing ref with a lookup by col2
    .combine_first(df.merge(df_ref.drop('col1', axis=1), on='col2', how='left'))
)

Output:

  col1 col2  ref
0    a    b  a,b
1    a  NaN  a,b
2  NaN    b  a,b
3    a  NaN  a,b
4    a  NaN  a,b
5    c    d  c,d
6    c  NaN  c,d
7  NaN  NaN  NaN
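
To check the approach end to end, here is a minimal self-contained sketch that rebuilds both tables from the question and applies the two merges. It assumes that NULL in the question means a missing value (NaN) and that each key appears only once in df_ref. The names by_col1, by_col2, ref_by_col1, ref_by_col2 and ref_alt are made up for illustration. The last part is an alternative using Series.map, which is not part of the answer above but should give the same ref column under those assumptions.

```python
import numpy as np
import pandas as pd

# Reference table from the question.
df_ref = pd.DataFrame({
    "col1": ["a", "c"],
    "col2": ["b", "d"],
    "ref": ["a,b", "c,d"],
})

# Table to annotate; NULL is assumed to be a missing value (NaN).
df = pd.DataFrame({
    "col1": ["a", "a", np.nan, "a", "a", "c", "c", np.nan],
    "col2": ["b", np.nan, "b", np.nan, np.nan, "d", np.nan, np.nan],
})

# Look up ref by col1, then by col2 (left merges keep every row of df).
by_col1 = df.merge(df_ref.drop(columns="col2"), on="col1", how="left")
by_col2 = df.merge(df_ref.drop(columns="col1"), on="col2", how="left")

# Wherever the col1 lookup found nothing, take the col2 lookup instead.
df_org = by_col1.combine_first(by_col2)
print(df_org)

# Alternative sketch (an assumption, not from the answer): map each column
# through a lookup Series and fall back from col1 to col2.
ref_by_col1 = df_ref.set_index("col1")["ref"]
ref_by_col2 = df_ref.set_index("col2")["ref"]
ref_alt = df["col1"].map(ref_by_col1).fillna(df["col2"].map(ref_by_col2))
```

If a key could appear more than once in df_ref, the merges would duplicate rows and the map variant would raise on the non-unique index, so deduplicate df_ref first in that case.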
User contributions licensed under: CC BY-SA