Skip to content
Advertisement

Change certain values in a dataframe column based on conditions on several columns

Let’s take this sample dataframe :

df = pd.DataFrame({'Category':['A','B','B','B','A'], 'Subcategory':['C','C','E','D','D']})

  Category Subcategory
0        A           C
1        B           C
2        B           E
3        B           D
4        A           D

I would like to replace the “B” values in Category by “B2” where there is a C or a D in Subcategory. I tried the following but I get the error “The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()” :

df['Category']=np.where((df['Category']=='B') and (df['Subcategory']=='C' or df['Subcategory']== 'D'),'B2',df['Category'])

I know that some similar questions are already asked on Stackoverflow but I can’t figure out where I am wrong. Could you please help me to understand my error ?

Expected output :

  Category Subcategory
0        A           C
1       B2           C
2        B           E
3       B2           D
4        A           D

Advertisement

Answer

Use bitwise operator and brackets with every conditions. Refer this for explanation

df['Category']=np.where((df['Category']=='B')
                        & ((df['Subcategory']=='C')
                           | (df['Subcategory']== 'D')),'B2',df['Category'])
df

Category    Subcategory
0   A   C
1   B2  C
2   B   E
3   B2  D
4   A   D

And better to use isin to check multiple values, df["Subcategory"].isin(['C', 'D'])

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement