Skip to content
Advertisement

How to filter column names from multiindex dataframe for a specific condition?

df1 = pd.DataFrame(
{
    "empid" : [1,2,3,4,5,6],
    "empname" : ['a', 'b','c','d','e','f'],
    "empcity" : ['aa','bb','cc','dd','ee','ff']
})
df1

df2 = pd.DataFrame(
{
    "empid" : [1,2,3,4,5,6],
    "empname" : ['a', 'b','m','d','n','f'],
    "empcity" : ['aa','bb','cc','ddd','ee','fff']
})
df2

df_all = pd.concat([df1.set_index('empid'),df2.set_index('empid')],axis='columns',keys=['first','second'])
df_all

df_final = df_all.swaplevel(axis = 'columns')[df1.columns[1:]]
df_final

orig = df1.columns[1:].tolist()
print (orig)
['empname', 'empcity']

df_final = (df_all.stack()
                  .assign(comparions=lambda x: x['first'].eq(x['second']))
                  .unstack()
                  .swaplevel(axis = 'columns')
                  .reindex(orig, axis=1, level=0))
print (df_final)

How to filter level[0] column name list where comparions = False from the dataframe df_final(consider there are more than 300 column like this at level 0)

enter image description here

Advertisement

Answer

First test if in level comparions are all Trues by DataFrame.xs with DataFrame.all:

s = df_final.xs('comparions', level=1, axis=1).all()

And then invert mask for test at least one False with filter indices:

L = s.index[~s].tolist()
print (L)
['empname', 'empcity']
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement