I am facing the below challenge.
For instance, let the dummy dataframes be,
col1_a | A | B | C a | 1 | 4 | 7 b | 2 | 5 | 8 c | 3 | 6 | 9
Let another dataframe be,
col1_b |col2 b | B c | C a | A
The output dataframe should be the following,
col1_b | col2 | output b | B | 5 c | C | 9 a | A | 1
My train of thought was to create dictionary(s), in this case, would be,
A, B, C ({'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9})
followed by this function,
def func_test1(row): if row['col2'] == 'B': return test1_df.col1_b.map(B) elif row['col2'] == 'C': return test1_df.col1_b.map(C) elif test1_df['col2'] == 'A': return test1_df.col1_b.map(A) test1_df['output'] = test1_df.apply(func_test1, axis=1)
I am always getting the following error,
('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 2')
Also I think this is not an efficient solution at all.
Are there pandas inbuilt function(s) to achieve the goal?
Advertisement
Answer
Use DataFrame.join
with unpivot df1
by DataFrame.set_index
with DataFrame.stack
and rename
for new column name:
df = df2.join(df1.set_index('col1_a').stack().rename('output'), on=['col1_b','col2']) print (df) col1_b col2 output 0 b B 5 1 c C 9 2 a A 1
Another idea is use DataFrame.melt
, rename
column and DataFrame.merge
:
df1 = (df1.melt('col1_a', var_name='col2', value_name='output') .rename(columns={'col1_a':'col1_b'})) df = df2.merge(df1, on=['col1_b','col2'], how='left') print (df) col1_b col2 output 0 b B 5 1 c C 9 2 a A 1