Skip to content
Advertisement

Passing multiple dataframes as arguments to a function

I have been scratching my head since morning on how to tackle this problem. I need to pass multiple columns from multiple dataframes to a function as arguments.

Example : Df1

A B C
1 11 111,333
2 22 222
3 33 nan

Df2

D E
a 111
b 333

Now, I want all the rows from Df2(col E) which are not present in the Df1 (Col C) along with DF1(A,B) concatenated.

The output I want is below:

newcol C
2_22 222

I have written below code snippet to do the same but I’m stuck at how to apply it to the dataframes.

def get_wrong_ls(a,b,c,d):
    wrong_l_list = []
    wrong_l_dict = {}
    if c is not None:
        c_list = str(c).split(",")
        for ele in c_list:
            if ele not in d:
                wrong_l_dict[str(a)+"_"+str(b)] = c
            wrong_l_list.append(wrong_l_dict)
    return pd.DataFrame(wrong_l_list)

it returns correct output when the strings are passed. But I’m stuck at how to pass the dataframes directly. The below code behaves weirdly and also the “nan” in the col C doesn’t get filtered out.

fd = get_wrong_ls(df1['A'],df1['B'],df1['C'],df2['E'])

Please help.

Advertisement

Answer

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [11, 22, 33], 'C': ['111,333', '222', np.nan]})
df2 = pd.DataFrame({'D': ['a', 'b'], 'E': [111, 333]})

def get_wrong_ls(df1, df2):
    df1.dropna(inplace=True)
    df1['C'] = df1['C'].str.split(',')
    df1 = df1.explode('C').astype(int)
    out = df1[~df1['C'].isin(df2['E'])].copy()
    out['newcol'] = out['A'].astype(str) + '_' + out['B'].astype(str)

    return out[['newcol','C']]

get_wrong_ls(df1,df2)

Output

   newcol   C
1   2_22    222
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement