Boolean Masking on a Pandas Dataframe where columns may not exist

Question

I have a dataframe called compare that looks like this:

Resident	1xdisc	1xdisc_doc	conpark	parking	parking_doc	conmil	conmil_doc	pest	pest_doc	pet	pet1x	pet_doc	rent	rent_doc	stlc	storage	trash	trash_doc	water	water_doc
John	0	-500	0	50	50	0	0	3	3	0	0	0	1803	1803	0	0	30	30	0	0
Cheldone	-500	0	0	50	50	0	0

Accepted Answer

If the columns don't always exist, one option is to add the missing columns. I don't think that is a good idea, since you would have to replicate the corresponding columns, which increases the size of the dataframe.

Another approach is to filter the column names themselves and keep only the column pairs that exist.

Given DataFrame:

>>> df.head(3)
  Resident  1xdisc  1xdisc_doc  conpark  parking  parking_doc  conmil  conmil_doc  pest  pest_doc  pet  pet1x  pet_doc  rent  rent_doc  stlc  storage  trash  trash_doc  water  water_doc
0   Acacia       0           0        0        0            0       0        -500   3.0       3.0    0      0       70  2067      2067     0        0     15         15      0           0
1   ashley       0           0        0        0            0       0           0   3.0       3.0    0      0        0  2067      2067     0        0     15         15      0           0
2   Sheila       0           0        0        0            0       0           0   0.0       0.0    0      0        0  1574      1574     0        0      0          0      0           0

Take out the column pairs:

>>> maskingCols = [(col[:-4], col) for col in df if col[:-4] in df and col.endswith('_doc')]
>>> maskingCols
[('1xdisc', '1xdisc_doc'), ('parking', 'parking_doc'), ('conmil', 'conmil_doc'), ('pest', 'pest_doc'), ('pet', 'pet_doc'), ('rent', 'rent_doc'), ('trash', 'trash_doc')]

Now that you have the column pairs, you can build the expression required to mask the dataframe:

>>> "|".join(f"(df['{col1}'] != df['{col2}'])" for col1, col2 in maskingCols)
"(df['1xdisc'] != df['1xdisc_doc'])|(df['parking'] != df['parking_doc'])|(df['conmil'] != df['conmil_doc'])|(df['pest'] != df['pest_doc'])|(df['pet'] != df['pet_doc'])|(df['rent'] != df['rent_doc'])|(df['trash'] != df['trash_doc'])"

You can pass this expression string to Python's eval function to evaluate it:

>>> eval("|".join(f"(df['{col1}'] != df['{col2}'])" for col1, col2 in maskingCols))

You can also combine this mask with other criteria:

>>> eval("|".join(f"(df['{col1}'] != df['{col2}'])" for col1, col2 in maskingCols)) | ((df['1xdisc']!=df['1xdisc_doc']) & (df['conpark']!=df['1xdisc']))
0     True
1    False
2    False
3    False
4     True
5    False
6    False
7    False
8     True
9    False
dtype: bool

Use it to index the dataframe and get the rows you want:

>>> df[eval("|".join(f"(df['{col1}'] != df['{col2}'])" for col1, col2 in maskingCols)) | ((df['1xdisc']!=df['1xdisc_doc']) & (df['conpark']!=df['1xdisc']))]

OUTPUT:

   Resident  1xdisc  1xdisc_doc  conpark  parking  parking_doc  conmil  conmil_doc  pest  pest_doc  pet  pet1x  pet_doc  rent  rent_doc  stlc  storage  trash  trash_doc  water  water_doc
0    Acacia       0           0        0        0            0       0        -500   3.0       3.0    0      0       70  2067      2067     0        0     15         15      0           0
4  Danielle       0           0        0        0            0       0           0   0.0       0.0    0      0        0  1422         0     0        0      0          0      0           0
8   Shajuan       0           0        0        0            0       0           0   0.0       0.0    0      0        0  1768         0     0        0      0          0      0           0
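The same pair-filtering idea can be written without building a string for eval. What follows is a minimal sketch of that alternative, not the answer's own method: it ORs the per-pair comparisons together with functools.reduce. The sample rows come from the question's data, restricted to a few columns. The names pairs, no_mismatch, and mismatched are illustrative, and leaving out the "water" column (so that "water_doc" has no partner) is an assumption made here to imitate a missing column.

```python
from functools import reduce
import operator

import pandas as pd

# Sample rows from the question's data (subset of columns).
df = pd.DataFrame({
    "Resident": ["John", "Cheldone", "Dieu"],
    "1xdisc": [0, -500, -300],
    "1xdisc_doc": [-500, 0, -300],
    "parking": [50, 50, 0],
    "parking_doc": [50, 50, 0],
    "rent": [1803, 1565, 1372],
    "rent_doc": [1803, 1565, 1372],
    # Assumption for illustration: "water" is absent, so this column has no partner.
    "water_doc": [0, 0, 0],
})

suffix = "_doc"

# Keep only (base, base_doc) pairs where both columns exist in the frame.
pairs = [
    (col[:-len(suffix)], col)
    for col in df.columns
    if col.endswith(suffix) and col[:-len(suffix)] in df.columns
]

# OR the per-pair "values differ" comparisons together.
# Starting from an all-False Series keeps this valid even when no pairs exist.
no_mismatch = pd.Series(False, index=df.index)
mask = reduce(operator.or_, (df[a] != df[b] for a, b in pairs), no_mismatch)

# Rows where at least one column disagrees with its "_doc" counterpart.
mismatched = df[mask]
print(mismatched["Resident"].tolist())  # ['John', 'Cheldone']
```

Extra criteria can be combined with mask using | and & in the same way as in the answer, and because no string is evaluated, column names with quotes or other unusual characters need no escaping.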

Resident	1xdisc	1xdisc_doc	parking	parking_doc	pest	pest_doc	rent	rent_doc	trash	trash_doc
John	0	-500	50	50	3	3	1803	1803	30	30
Cheldone	-500	0	50	50	1.25	1.25	1565	1565	30	30
Dieu	-300	-300	0	0	3	3	1372	1372	18	18

