I am facing the below challenge.
For instance, let the dummy dataframes be,
col1_a | A | B | C
a | 1 | 4 | 7
b | 2 | 5 | 8
c | 3 | 6 | 9
Let another dataframe be,
col1_b |col2
b | B
c | C
a | A
The output dataframe should be the following,
col1_b | col2 | output
b | B | 5
c | C | 9
a | A | 1
My train of thought was to create dictionary(s), in this case, would be,
A, B, C
({'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9})
followed by this function,
def func_test1(row):
if row['col2'] == 'B':
return test1_df.col1_b.map(B)
elif row['col2'] == 'C':
return test1_df.col1_b.map(C)
elif test1_df['col2'] == 'A':
return test1_df.col1_b.map(A)
test1_df['output'] = test1_df.apply(func_test1, axis=1)
I am always getting the following error,
('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 2')
Also I think this is not an efficient solution at all.
Are there pandas inbuilt function(s) to achieve the goal?
Advertisement
Answer
Use DataFrame.join with unpivot df1 by DataFrame.set_index with DataFrame.stack and rename for new column name:
df = df2.join(df1.set_index('col1_a').stack().rename('output'), on=['col1_b','col2'])
print (df)
col1_b col2 output
0 b B 5
1 c C 9
2 a A 1
Another idea is use DataFrame.melt, rename column and DataFrame.merge:
df1 = (df1.melt('col1_a', var_name='col2', value_name='output')
.rename(columns={'col1_a':'col1_b'}))
df = df2.merge(df1, on=['col1_b','col2'], how='left')
print (df)
col1_b col2 output
0 b B 5
1 c C 9
2 a A 1