Skip to content
Advertisement

How to perform index/match excel function equivalent using pandas?

I am facing the below challenge.

For instance, let the dummy dataframes be,

 col1_a | A | B | C
    a   | 1 | 4 | 7
    b   | 2 | 5 | 8
    c   | 3 | 6 | 9

Let another dataframe be,

 col1_b |col2
    b   | B
    c   | C
    a   | A

The output dataframe should be the following,

 col1_b | col2 | output
    b   |  B   |  5
    c   |  C   |  9
    a   |  A   |  1

My train of thought was to create dictionary(s), in this case, would be,

A, B, C

({'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9})

followed by this function,

def func_test1(row):
    if row['col2'] == 'B':
        return test1_df.col1_b.map(B)
    elif row['col2'] == 'C':
        return test1_df.col1_b.map(C)
    elif test1_df['col2'] == 'A':
        return test1_df.col1_b.map(A)
    
test1_df['output'] = test1_df.apply(func_test1, axis=1)

I am always getting the following error,

 ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 2')

Also I think this is not an efficient solution at all.

Are there pandas inbuilt function(s) to achieve the goal?

Advertisement

Answer

Use DataFrame.join with unpivot df1 by DataFrame.set_index with DataFrame.stack and rename for new column name:

df = df2.join(df1.set_index('col1_a').stack().rename('output'), on=['col1_b','col2'])
print (df)
  col1_b col2  output
0      b    B       5
1      c    C       9
2      a    A       1

Another idea is use DataFrame.melt, rename column and DataFrame.merge:

df1 = (df1.melt('col1_a', var_name='col2', value_name='output')
          .rename(columns={'col1_a':'col1_b'}))
    
df = df2.merge(df1, on=['col1_b','col2'], how='left')
print (df)
  col1_b col2  output
0      b    B       5
1      c    C       9
2      a    A       1
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement