I have been scratching my head all morning over how to tackle this problem: I need to pass multiple columns from multiple dataframes to a function as arguments.
Example: Df1

| A | B  | C       |
|---|----|---------|
| 1 | 11 | 111,333 |
| 2 | 22 | 222     |
| 3 | 33 | nan     |
Df2

| D | E   |
|---|-----|
| a | 111 |
| b | 333 |
Now I want all the rows from Df1 whose Col C values are not present in Df2 (Col E), with Df1's A and B concatenated into a new column. For example, 111 and 333 from Df1's first row both appear in Df2's Col E, so that row drops out; 222 does not appear in Col E, so that row is kept and A and B are joined as 2_22; the nan row should simply be dropped.
The output I want is below:

| newcol | C   |
|--------|-----|
| 2_22   | 222 |
I have written the below code snippet to do this, but I'm stuck on how to apply it to the dataframes.
```python
import pandas as pd

def get_wrong_ls(a, b, c, d):
    wrong_l_list = []
    wrong_l_dict = {}
    if c is not None:
        c_list = str(c).split(",")
        for ele in c_list:
            if ele not in d:
                wrong_l_dict[str(a) + "_" + str(b)] = c
                wrong_l_list.append(wrong_l_dict)
    return pd.DataFrame(wrong_l_list)
```
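For example, with plain values taken from the sample rows (a quick sketch; a plain list stands in for Df2's Col E here):

```python
# Hypothetical scalar calls mirroring the sample rows; uses get_wrong_ls defined above.
print(get_wrong_ls(2, 22, '222', ['111', '333']))
#   2_22
# 0  222

print(get_wrong_ls(1, 11, '111,333', ['111', '333']))
# Empty DataFrame
# Columns: []
# Index: []
```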
It returns the correct output when individual strings are passed, but I'm stuck on how to pass the dataframe columns directly. The call below behaves weirdly, and the "nan" in Col C doesn't get filtered out either:

```python
fd = get_wrong_ls(df1['A'], df1['B'], df1['C'], df2['E'])
```
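From what I can tell, the trouble is that whole Series objects get passed where the function expects single values. A few quick checks with the sample data:

```python
import pandas as pd
import numpy as np

e = pd.Series([111, 333])                    # stands in for df2['E']
print(111 in e)                              # False: `in` on a Series tests the index labels (0, 1), not the values
print(111 in e.values)                       # True: membership against the actual values

c = pd.Series(['111,333', '222', np.nan])    # stands in for df1['C']
print(str(c).split(","))                     # splits the printed repr of the whole column, not one cell's text

print(np.nan is not None)                    # True: the `is not None` guard never filters out NaN either
```

So the per-row logic never actually runs when the columns are handed in directly. Please help.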
Answer
```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [11, 22, 33], 'C': ['111,333', '222', np.nan]})
df2 = pd.DataFrame({'D': ['a', 'b'], 'E': [111, 333]})

def get_wrong_ls(df1, df2):
    df1.dropna(inplace=True)                    # drop the row whose C is NaN
    df1['C'] = df1['C'].str.split(',')          # '111,333' -> ['111', '333']
    df1 = df1.explode('C').astype(int)          # one row per C value, converted back to int
    out = df1[~df1['C'].isin(df2['E'])].copy()  # keep C values not present in df2['E']
    out['newcol'] = out['A'].astype(str) + '_' + out['B'].astype(str)
    return out[['newcol', 'C']]

get_wrong_ls(df1, df2)
```
Output
```
  newcol    C
1   2_22  222
```
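One caveat worth noting: `get_wrong_ls` above calls `dropna(inplace=True)` on the `df1` that is passed in, so the caller's frame loses its NaN row as a side effect. If you want to leave the inputs untouched, the same logic can run on a copy; a sketch (the name `get_wrong_ls_copy` is just for illustration):

```python
def get_wrong_ls_copy(df1, df2):
    tmp = df1.dropna(subset=['C']).copy()       # drop rows where C is NaN, without touching df1
    tmp['C'] = tmp['C'].str.split(',')          # '111,333' -> ['111', '333']
    tmp = tmp.explode('C')                      # one row per individual C value
    tmp['C'] = tmp['C'].astype(int)             # compare as integers, matching df2['E']
    out = tmp[~tmp['C'].isin(df2['E'])].copy()  # keep C values absent from df2['E']
    out['newcol'] = out['A'].astype(str) + '_' + out['B'].astype(str)
    return out[['newcol', 'C']]
```

The result is the same frame as in the output above, but `df1` keeps all three of its rows afterwards.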