I have been scratching my head all morning over how to tackle this problem: I need to pass multiple columns from multiple dataframes to a function as arguments.
Example: Df1

| A | B  | C       |
|---|----|---------|
| 1 | 11 | 111,333 |
| 2 | 22 | 222     |
| 3 | 33 | nan     |
Df2

| D | E   |
|---|-----|
| a | 111 |
| b | 333 |
Now I want all the rows from Df1 whose Col C values are not present in Df2 (Col E), with Df1's A and B concatenated into a new column. For example, 111 and 333 from Df1's first row both appear in Df2's Col E, so that row drops out; 222 does not appear in Col E, so that row is kept and A and B are joined as 2_22; the nan row should simply be dropped.
The output I want is below:

| newcol | C   |
|--------|-----|
| 2_22   | 222 |
I have written the below code snippet to do this, but I'm stuck on how to apply it to the dataframes.
```python
import pandas as pd

def get_wrong_ls(a, b, c, d):
    wrong_l_list = []
    wrong_l_dict = {}
    if c is not None:
        c_list = str(c).split(",")
        for ele in c_list:
            if ele not in d:
                wrong_l_dict[str(a) + "_" + str(b)] = c
                wrong_l_list.append(wrong_l_dict)
    return pd.DataFrame(wrong_l_list)
```
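For example, with plain values taken from the sample rows (a quick sketch; a plain list stands in for Df2's Col E here):

```python
# Hypothetical scalar calls mirroring the sample rows; uses get_wrong_ls defined above.
print(get_wrong_ls(2, 22, '222', ['111', '333']))
#   2_22
# 0  222

print(get_wrong_ls(1, 11, '111,333', ['111', '333']))
# Empty DataFrame
# Columns: []
# Index: []
```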
It returns the correct output when individual strings are passed, but I'm stuck on how to pass the dataframe columns directly. The call below behaves weirdly, and the "nan" in Col C doesn't get filtered out either:

```python
fd = get_wrong_ls(df1['A'], df1['B'], df1['C'], df2['E'])
```
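From what I can tell, the trouble is that whole Series objects get passed where the function expects single values. A few quick checks with the sample data:

```python
import pandas as pd
import numpy as np

e = pd.Series([111, 333])                    # stands in for df2['E']
print(111 in e)                              # False: `in` on a Series tests the index labels (0, 1), not the values
print(111 in e.values)                       # True: membership against the actual values

c = pd.Series(['111,333', '222', np.nan])    # stands in for df1['C']
print(str(c).split(","))                     # splits the printed repr of the whole column, not one cell's text

print(np.nan is not None)                    # True: the `is not None` guard never filters out NaN either
```

So the per-row logic never actually runs when the columns are handed in directly. Please help.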
Answer
```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [11, 22, 33], 'C': ['111,333', '222', np.nan]})
df2 = pd.DataFrame({'D': ['a', 'b'], 'E': [111, 333]})

def get_wrong_ls(df1, df2):
    df1.dropna(inplace=True)                    # drop the row whose C is NaN
    df1['C'] = df1['C'].str.split(',')          # '111,333' -> ['111', '333']
    df1 = df1.explode('C').astype(int)          # one row per C value, converted back to int
    out = df1[~df1['C'].isin(df2['E'])].copy()  # keep C values not present in df2['E']
    out['newcol'] = out['A'].astype(str) + '_' + out['B'].astype(str)
    return out[['newcol', 'C']]

get_wrong_ls(df1, df2)
```
Output
```
  newcol    C
1   2_22  222
```
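One caveat worth noting: `get_wrong_ls` above calls `dropna(inplace=True)` on the `df1` that is passed in, so the caller's frame loses its NaN row as a side effect. If you want to leave the inputs untouched, the same logic can run on a copy; a sketch (the name `get_wrong_ls_copy` is just for illustration):

```python
def get_wrong_ls_copy(df1, df2):
    tmp = df1.dropna(subset=['C']).copy()       # drop rows where C is NaN, without touching df1
    tmp['C'] = tmp['C'].str.split(',')          # '111,333' -> ['111', '333']
    tmp = tmp.explode('C')                      # one row per individual C value
    tmp['C'] = tmp['C'].astype(int)             # compare as integers, matching df2['E']
    out = tmp[~tmp['C'].isin(df2['E'])].copy()  # keep C values absent from df2['E']
    out['newcol'] = out['A'].astype(str) + '_' + out['B'].astype(str)
    return out[['newcol', 'C']]
```

The result is the same frame as in the output above, but `df1` keeps all three of its rows afterwards.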