I have a ref table df_ref like this:
col1 col2 ref
a    b    a,b
c    d    c,d
I need to create a new column in another table based on the ref table. That table looks like this:
col1 col2
a    b
a    NULL
NULL b
a    NULL
a    NULL
c    d
c    NULL
NULL NULL
The output table df_org should look like this:
col1 col2 ref
a    b    a,b
a    NULL a,b
NULL b    a,b
a    NULL a,b
a    NULL a,b
c    d    c,d
c    NULL c,d
NULL NULL NULL
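To experiment with this, the two tables can be rebuilt as pandas DataFrames. This is a minimal sketch that assumes NULL stands for a real missing value (NaN) rather than the string 'NULL', and it calls the input table `df`.

```python
import numpy as np
import pandas as pd

# Lookup table: each (col1, col2) pair maps to a combined "ref" label.
df_ref = pd.DataFrame({
    "col1": ["a", "c"],
    "col2": ["b", "d"],
    "ref": ["a,b", "c,d"],
})

# Input table; NULL is assumed to be a genuine missing value (np.nan).
df = pd.DataFrame({
    "col1": ["a", "a", np.nan, "a", "a", "c", "c", np.nan],
    "col2": ["b", np.nan, "b", np.nan, np.nan, "d", np.nan, np.nan],
})

print(df_ref)
print(df)
```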
If the value in col1 or col2 can be found in the ref table, the new column should take the ref value from that row of the ref table. If both col1 and col2 are NULL, nothing can be matched in the ref table, so the result should just be NULL. I tried this code, but it doesn't work:
df_org['ref'] = np.where((df_org['col1'].isin(df_ref['ref'])) |
                         (df_org['col2'].isin(df_ref['ref'])),
                         df_ref['ref'], 'NULL')
ValueError: operands could not be broadcast together with shapes
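The error comes from np.where, which broadcasts its condition, x and y arguments to one common shape. The condition has one entry per row of df_org, but df_ref['ref'] has one entry per row of df_ref (8 vs 2 rows for the sample tables shown above), so they cannot be broadcast together. Separately, isin(df_ref['ref']) tests exact membership, so a single value such as 'a' is never found among combined labels like 'a,b'. A small sketch reproducing both points, assuming NULL is NaN:

```python
import numpy as np
import pandas as pd

# Sample tables from the question; NULL is assumed to be NaN.
df_ref = pd.DataFrame({"col1": ["a", "c"], "col2": ["b", "d"], "ref": ["a,b", "c,d"]})
df_org = pd.DataFrame({
    "col1": ["a", "a", np.nan, "a", "a", "c", "c", np.nan],
    "col2": ["b", np.nan, "b", np.nan, np.nan, "d", np.nan, np.nan],
})

# isin() checks exact membership, so "a" or "b" never equals "a,b" or "c,d".
cond = df_org["col1"].isin(df_ref["ref"]) | df_org["col2"].isin(df_ref["ref"])
print(cond.any())  # prints False: no row matches

# np.where broadcasts condition, x and y to one shape; a length-8 condition
# and a length-2 Series cannot be broadcast together.
raised = False
try:
    np.where(cond, df_ref["ref"], "NULL")
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```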
Answer
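You want to perform two left merges, one on col1 and one on col2, and combine them with combine_first so that a ref missing from the col1 merge is filled in from the col2 merge (here `df` is the input table):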
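df_org = (
    # look up ref by col1 (col2 dropped from df_ref so only col1 is the key)
    df.merge(df_ref.drop('col2', axis=1), on='col1', how='left')
    # fill any remaining NaN in ref with the lookup by col2
      .combine_first(df.merge(df_ref.drop('col1', axis=1), on='col2', how='left'))
)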
output:
  col1 col2  ref
0    a    b  a,b
1    a  NaN  a,b
2  NaN    b  a,b
3    a  NaN  a,b
4    a  NaN  a,b
5    c    d  c,d
6    c  NaN  c,d
7  NaN  NaN  NaN
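A different technique that should give the same ref column on this sample data is to look up each key column with Series.map and fall back from the col1 lookup to the col2 lookup with fillna. This is a sketch, not the method used in the answer above; it assumes NULL is NaN and that the key values in df_ref's col1 and col2 are unique, since map needs a lookup Series with unique keys. The names `ref_by_col1` and `ref_by_col2` are illustrative.

```python
import numpy as np
import pandas as pd

df_ref = pd.DataFrame({"col1": ["a", "c"], "col2": ["b", "d"], "ref": ["a,b", "c,d"]})
df = pd.DataFrame({
    "col1": ["a", "a", np.nan, "a", "a", "c", "c", np.nan],
    "col2": ["b", np.nan, "b", np.nan, np.nan, "d", np.nan, np.nan],
})

# Lookup Series indexed by each key column (keys assumed unique).
ref_by_col1 = df_ref.set_index("col1")["ref"]
ref_by_col2 = df_ref.set_index("col2")["ref"]

df_org = df.copy()
# Try col1 first; where that finds nothing (NaN), use the col2 lookup instead.
df_org["ref"] = df["col1"].map(ref_by_col1).fillna(df["col2"].map(ref_by_col2))
print(df_org)
```

Unlike a merge, map never adds rows to the result. If df_ref had duplicate keys, the merge version would duplicate the matching input rows, whereas map expects unique keys in the first place.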