I have a ref table df_ref like this:

    col1  col2  ref
    a     b     a,b
    c     d     c,d
I need to create a new column in another table based on the ref table. The table looks like this:

    col1  col2
    a     b
    a     NULL
    NULL  b
    a     NULL
    a     NULL
    c     d
    c     NULL
    NULL  NULL
The output table df_org should look like this:

    col1  col2  ref
    a     b     a,b
    a     NULL  a,b
    NULL  b     a,b
    a     NULL  a,b
    a     NULL  a,b
    c     d     c,d
    c     NULL  c,d
    NULL  NULL  NULL
If the value in either col1 or col2 can be found in the ref table, the new column should take the ref value from the matching row of the ref table. If col1 and col2 are both NULL, nothing can be found in the ref table, so it should just return NULL. I use this code, but it doesn’t work:
    df_org['ref'] = np.where(
        df_org['col1'].isin(df_ref['ref']) | df_org['col2'].isin(df_ref['ref']),
        df_ref['ref'],
        'NULL',
    )
ValueError: operands could not be broadcast together with shapes
Answer
You want to perform two merges and combine them:
    # df is assumed to be the input table (col1 and col2 only)
    df_org = (
        df.merge(df_ref.drop('col2', axis=1), on='col1', how='left')
          .combine_first(df.merge(df_ref.drop('col1', axis=1), on='col2', how='left'))
    )
output:
      col1 col2  ref
    0    a    b  a,b
    1    a  NaN  a,b
    2  NaN    b  a,b
    3    a  NaN  a,b
    4    a  NaN  a,b
    5    c    d  c,d
    6    c  NaN  c,d
    7  NaN  NaN  NaN
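
If you want to run this end to end, here is a minimal sketch that rebuilds both tables from the question and applies the two merges. It assumes NULL is stored as NaN, and the intermediate names by_col1 and by_col2 are only illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd

# Reference table from the question.
df_ref = pd.DataFrame({
    "col1": ["a", "c"],
    "col2": ["b", "d"],
    "ref": ["a,b", "c,d"],
})

# Input table; NULL is represented as NaN here (an assumption).
df = pd.DataFrame({
    "col1": ["a", "a", np.nan, "a", "a", "c", "c", np.nan],
    "col2": ["b", np.nan, "b", np.nan, np.nan, "d", np.nan, np.nan],
})

# Look up ref by col1 only, then by col2 only (left joins keep every row of df).
by_col1 = df.merge(df_ref.drop("col2", axis=1), on="col1", how="left")
by_col2 = df.merge(df_ref.drop("col1", axis=1), on="col2", how="left")

# Fill the gaps left by the col1 lookup with the result of the col2 lookup.
df_org = by_col1.combine_first(by_col2)
print(df_org)
```

As for the original attempt, np.where needs its three arguments to have compatible shapes. The condition has one entry per row of df_org (8 here) while df_ref['ref'] has only 2, so they cannot be broadcast together. Note also that isin against df_ref['ref'] compares values like 'a' with strings like 'a,b', so it would not match even if the shapes lined up.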